Decision implementation with integrated data quality monitoring

US11816079B2 · US · B2

Patent metadata
Field | Value
Publication number | US-11816079-B2
Application number | US-202117359849-A
Country | US
Kind code | B2
Filing date | Jun 28, 2021
Priority date | Jun 28, 2021
Publication date | Nov 14, 2023
Grant date | Nov 14, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Computer-implemented methods and systems include downstream execution for individual rule-based flagging of upstream data quality errors by receiving upstream data from a plurality of sources, identifying a downstream task to be executed, applying a plurality of rules to the upstream data, generating a plurality of outputs including at least one output for each of the plurality of rules applied to the upstream data, each of the plurality of outputs being associated with a corresponding rule of the plurality of rules, identifying a tagged population based on the plurality of outputs, determining that at least one of the plurality of outputs does not meet a corresponding rule threshold, and activating the downstream execution for the tagged population after at least one of (i) updating the corresponding rule threshold or (ii) overriding an error.
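The flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (`Rule`, `apply_rules`, `tag_population`, `failing_rules`, `may_activate`) is hypothetical, and it assumes that a rule's output is the set of matched user ids and that a rule threshold is an allowed range for the size of that set. The patent text does not specify either of those choices.

```python
from dataclasses import dataclass
from typing import AbstractSet, Callable, Dict, List, Set


@dataclass
class Rule:
    name: str
    matches: Callable[[dict], bool]  # does this user record satisfy the rule?
    inclusive: bool                  # True: adds users; False: removes users
    min_count: int                   # assumed threshold: lowest plausible match count
    max_count: int                   # assumed threshold: highest plausible match count


def apply_rules(users: List[dict], rules: List[Rule]) -> Dict[str, Set[str]]:
    """Produce one output per rule: the ids of the users the rule matched."""
    return {r.name: {u["id"] for u in users if r.matches(u)} for r in rules}


def tag_population(outputs: Dict[str, Set[str]], rules: List[Rule]) -> Set[str]:
    """Union of inclusive-rule matches minus union of exclusive-rule matches."""
    included: Set[str] = set()
    excluded: Set[str] = set()
    for r in rules:
        (included if r.inclusive else excluded).update(outputs[r.name])
    return included - excluded


def failing_rules(outputs: Dict[str, Set[str]], rules: List[Rule]) -> List[str]:
    """Names of rules whose output size falls outside the rule's own threshold range."""
    return [r.name for r in rules
            if not r.min_count <= len(outputs[r.name]) <= r.max_count]


def may_activate(outputs: Dict[str, Set[str]], rules: List[Rule],
                 overridden: AbstractSet[str] = frozenset()) -> bool:
    """The downstream task may run only if every failing rule has been overridden."""
    return all(name in overridden for name in failing_rules(outputs, rules))


users = [
    {"id": "u1", "active": True, "opted_out": False},
    {"id": "u2", "active": True, "opted_out": True},
    {"id": "u3", "active": False, "opted_out": False},
]
rules = [
    Rule("active", lambda u: u["active"], inclusive=True, min_count=2, max_count=3),
    Rule("opted_out", lambda u: u["opted_out"], inclusive=False, min_count=0, max_count=0),
]

outputs = apply_rules(users, rules)
tagged = tag_population(outputs, rules)
blocked = not may_activate(outputs, rules)              # "opted_out" is out of range
after_override = may_activate(outputs, rules, {"opted_out"})
rules[1].max_count = 1                                  # update only that rule's threshold
after_update = may_activate(outputs, rules)
```

Because each rule carries its own threshold and is reported by name, a failure can be traced to a single rule, and that rule's threshold can be changed without touching the others.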

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented downstream execution method for individual rule-based flagging of upstream data quality errors, comprising: receiving upstream data, corresponding to an overall population of users, from a plurality of sources each source selected from one of a relational database, a non-relational database, or a file system; identifying a downstream task to be executed, the downstream task being associated with at least a portion of the overall population of users; applying a plurality of rules to the upstream data, each of said plurality of rules determining the inclusion or exclusion of a portion of the overall population of users; generating a plurality of outputs including at least one output for each of the plurality of rules applied to the upstream data, each of the plurality of outputs being associated with a corresponding rule of the plurality of rules; identifying a tagged population of users based on the plurality of outputs, the tagged population being a subset of the overall population of users; determining that at least one of the plurality of outputs does not meet a corresponding rule threshold; and activating an execution of the downstream task for the tagged population of users after at least one of (i) updating the corresponding rule threshold or (ii) overriding an error generated based on the determining that the at least one of the plurality of outputs does not meeting the corresponding rule threshold.

2. The method of claim 1, further comprising generating a graphical representation of the at least one of the plurality of outputs that does not meet the corresponding rule threshold, the graphical representation comprising an indication of the corresponding rule threshold.

3. The method of claim 1, wherein the upstream data comprises one or more of user account information, user behavior information, user action information, user status, or user changes.

4. The method of claim 1, wherein the upstream data comprises one or more of a system status, a system profile, and a system action.

5. The method of claim 1, wherein the corresponding rule threshold is generated by a machine learning model.

6. The method of claim 5, wherein the machine learning model is updated based on the execution of the downstream task.

7. The method of claim 5, wherein the machine learning model is generated based on training data comprising data from past execution of the downstream task.

8. The method of claim 5, wherein the machine learning model is generated based on training data from attributes associated with the corresponding rule.

9. The method of claim 1, further comprising organizing the upstream data based at least on a type of at least a subset of the upstream data.

10. The method of claim 9, wherein the organized upstream data associates a plurality of data points with a corresponding user.

11. The method of claim 1 further comprising modifying a first corresponding threshold of a first rule independently from modifying a second corresponding threshold of a second rule.

12. A computer-implemented downstream execution method, comprising: receiving source data from each of a plurality of sources each source selected from one of a relational database, a non-relational database, or a file system; identifying a downstream task to be executed, the downstream task being associated with at least a portion of an overall population of users; applying a plurality of rules to each of the source data from the plurality of sources, each of said plurality of rules being either inclusive or exclusive and selected from a pool of available rules or generated based on the downstream task, each or a subset of the plurality of rules being applied at each of the source data from the plurality of sources; generating a plurality of outputs including at least one output for each of the plurality of rules applied to each of the source data; determining that at least one of the plurality of outputs from a first source of the plurality of sources does not meet a corresponding rule threshold; flagging the first source based on the at least one of the plurality outputs not meeting the corresponding rule threshold; identifying a plurality of usable sources from the plurality of sources, the usable sources excluding the first source; identifying a tagged population of users based on the plurality of outputs associated with the usable sources, the tagged population of users being a subset of the overall population of users; and activating an execution of the downstream task for the tagged population of users.

13. The method of claim 12, further comprising: identifying a last known valid source; and including the last known valid source in the plurality of usable sources.

14. The method of claim 13, wherein the last known valid source is a previous version of the first source.

15. The method of claim 14, wherein the last known valid source previously met the corresponding rule threshold.

16. The method of claim 12, wherein more than one of the plurality of sources comprise data about a same user.

17. The method of claim 12, wherein the corresponding rule threshold is generated by a machine learning model.

18. The method of claim 12, further comprising organizing the source data based at least on a type of at least a subset of the source data.

19. The method of claim 18, wherein the organized source data associates a plurality of data points with a corresponding user.

20. A system comprising: a data storage device storing processor-readable instructions; and a processor operatively connected to the data storage device and configured to execute the instructions to perform operations that include: receiving source data from each of a plurality of sources each source selected from one of a relational database, a non-relational database, or a file system; identifying a downstream task to be executed, the downstream task being associated with at least a portion of an overall population of users; applying a plurality of rules to each of the source data from the plurality of sources, each of said plurality of rules being either inclusive or exclusive and selected from a pool of available rules or generated based on the downstream task, each or a subset of the plurality of rules being applied at each of the source data from the plurality of sources; generating a plurality of outputs including at least one output for each of the plurality of rules applied to each of the source data; determining that at least one of the plurality of outputs from a first source of the plurality of sources does not meet a corresponding rule threshold; flagging the first source based on the at least one of the plurality outputs not meeting the corresponding rule threshold; identifying a plurality of usable sources from the plurality of sources, the usable sources excluding the first source; identifying a tagged population of users based on the plurality of outputs associated with the usable sources, the tagged population of users being a subset of the overall population of users; and activating an execution of the downstream task for the tagged population of users.
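The per-source flagging described in claims 12 to 15 can be sketched in Python as follows. This is an illustrative reading, not the claimed implementation. The helper names (`non_null_rate`, `split_sources`, `with_fallback`) are hypothetical, and the sketch assumes each rule returns a numeric output that must reach a minimum threshold.

```python
from typing import Callable, Dict, List, Tuple

Records = List[dict]
# A check pairs a rule (records -> numeric output) with an assumed minimum threshold.
Check = Tuple[Callable[[Records], float], float]


def non_null_rate(field: str) -> Callable[[Records], float]:
    """Example rule: fraction of records in which `field` is present and not None."""
    def rule(records: Records) -> float:
        if not records:
            return 0.0
        return sum(r.get(field) is not None for r in records) / len(records)
    return rule


def split_sources(sources: Dict[str, Records],
                  checks: List[Check]) -> Tuple[Dict[str, Records], List[str]]:
    """Apply every check to every source; flag sources with any output below threshold."""
    usable: Dict[str, Records] = {}
    flagged: List[str] = []
    for name, records in sources.items():
        if all(rule(records) >= threshold for rule, threshold in checks):
            usable[name] = records
        else:
            flagged.append(name)
    return usable, flagged


def with_fallback(usable: Dict[str, Records], flagged: List[str],
                  last_valid: Dict[str, Records]) -> Dict[str, Records]:
    """Substitute a flagged source with its last known valid version, if one exists."""
    result = dict(usable)
    for name in flagged:
        if name in last_valid:
            result[name] = last_valid[name]
    return result


sources = {
    "crm": [{"id": "u1", "email": "a@x.com"}, {"id": "u2", "email": "b@x.com"}],
    "events": [{"id": "u1", "email": None}, {"id": "u2", "email": None}],
}
checks = [(non_null_rate("email"), 0.9)]
last_valid = {"events": [{"id": "u1", "email": "a@x.com"}]}

usable, flagged = split_sources(sources, checks)
final_sources = with_fallback(usable, flagged, last_valid)
```

In this sketch the tagged population would then be computed only from `final_sources`, so one bad feed does not block the downstream task for users covered by the remaining sources.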

Assignees

Capital One Services, LLC

Inventors

Classifications

  • G06F16/215 (Primary)

    Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors · CPC title

  • where the reporting involves the use of self describing data formats, i.e. metadata, markup languages, human readable formats · CPC title

  • Performance evaluation by statistical analysis · CPC title

  • Data format conversion from or to a database · CPC title

  • Machine learning · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11816079B2 cover?
Computer-implemented methods and systems include downstream execution for individual rule-based flagging of upstream data quality errors by receiving upstream data from a plurality of sources, identifying a downstream task to be executed, applying a plurality of rules to the upstream data, generating a plurality of outputs including at least one output for each of the plurality of rules applied…
Who is the assignee on this patent?
Capital One Services, LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/215. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 14, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).