Data quality analysis
US-2016364434-A1 · Dec 15, 2016 · US
US10657120B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10657120-B2 |
| Application number | US-201615283846-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 3, 2016 |
| Priority date | Oct 3, 2016 |
| Publication date | May 19, 2020 |
| Grant date | May 19, 2020 |
Official abstract text for this publication.
A system includes an interface and one or more processors. The interface receives a dataset comprising a plurality of variable length input records, each input record comprising a plurality of fields. The one or more processors compare each input record to a plurality of predetermined record types to determine whether the input record matches one or more of the predetermined record types. Upon a determination that the input record matches one or more of the predetermined record types, the one or more processors determine one or more rules applicable to the input record. The one or more rules are determined based on the predetermined record types that match the input record. The one or more processors apply the one or more rules applicable to the input record. The one or more rules determine the quality of the input record based on a structure of one or more of the fields of the input record and/or a value of one or more of the fields of the input record. The one or more processors determine a data quality of the dataset, the data quality determined based at least in part on the result of applying the one or more rules applicable to the input record.
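The per-record flow in the abstract (match a record against record types, collect the rules for the matched types, apply structure and value rules, then score the dataset) can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: records are assumed to arrive already split into lists of string fields, record types are modeled as hypothetical predicate functions, the helpers `format_rule`, `range_rule`, `not_null_rule`, and `dataset_quality` are invented for illustration, and dataset quality is assumed to be the fraction of records that pass every applicable rule. The abstract does not say what happens to a record that matches no type; in this sketch it simply has no applicable rules.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Assumption: each variable-length input record has already been split into
# a list of string fields.
Record = List[str]

# A rule returns True when the record passes and False on a data quality error.
Rule = Callable[[Record], bool]


@dataclass
class RecordType:
    name: str
    matches: Callable[[Record], bool]  # hypothetical matching predicate


def format_rule(index: int, pattern: str) -> Rule:
    """Structure rule: field `index` exists and fully matches `pattern`."""
    compiled = re.compile(pattern)
    return lambda rec: index < len(rec) and compiled.fullmatch(rec[index]) is not None


def range_rule(index: int, low: float, high: float) -> Rule:
    """Value rule: field `index` parses as a number in [low, high]."""
    def check(rec: Record) -> bool:
        if index >= len(rec):
            return False
        try:
            return low <= float(rec[index]) <= high
        except ValueError:
            return False
    return check


def not_null_rule(index: int) -> Rule:
    """Value rule: field `index` is present and not blank."""
    return lambda rec: index < len(rec) and rec[index].strip() != ""


def dataset_quality(records: Sequence[Record],
                    record_types: Sequence[RecordType],
                    rules_by_type: Dict[str, List[Rule]]) -> float:
    """Assumed metric: fraction of records that pass every applicable rule."""
    if not records:
        return 1.0
    clean = 0
    for rec in records:
        matched = [rt.name for rt in record_types if rt.matches(rec)]
        # Applicable rules are the combined rules of every matched type.
        # A record that matches no type gets no rules and counts as clean here.
        applicable = [rule for name in matched for rule in rules_by_type.get(name, [])]
        if all(rule(rec) for rule in applicable):
            clean += 1
    return clean / len(records)


# Usage with made-up records: "H" header and "D" detail record types.
types = [
    RecordType("header", lambda r: r[:1] == ["H"]),
    RecordType("detail", lambda r: r[:1] == ["D"]),
]
rules = {
    "header": [format_rule(1, r"\d{8}")],                  # e.g. a YYYYMMDD date
    "detail": [not_null_rule(1), range_rule(2, 0, 10_000)],
}
data = [
    ["H", "20161003"],
    ["D", "acct-1", "250.00"],
    ["D", "", "99"],        # blank account field: null-value error
    ["D", "acct-3", "-5"],  # amount below range: value error
]
print(dataset_quality(data, types, rules))  # 0.5
```

Two of the four sample records fail a rule, so the sketch reports a quality of 0.5. Selecting rules per matched record type keeps, for example, detail-record range checks from being applied to header records.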
Opening claim text (preview).
The invention claimed is:
1. A system, comprising: an interface and one or more processors, the interface configured to: receive, from an upstream progression point configured to receive, process, and transmit data, a dataset comprising a plurality of variable length input records, each input record comprising a plurality of fields; for each input record, the one or more processors configured to: compare the input record to a plurality of predetermined record types to determine whether the input record matches one or more of the predetermined record types; upon a determination that the input record matches one or more of the predetermined record types, determine one or more rules applicable to the input record, the one or more rules determined based on the predetermined record types that match the input record; and determine whether the input record comprises a data quality error by applying the one or more rules applicable to the input record, wherein the one or more rules determine whether the input record comprises the data quality error based on one or both of a structure of one or more of the fields of the input record and a value of one or more of the fields of the input record; and the one or more processors further configured to: determine a data quality of the dataset associated with whether the input records comprise greater than a threshold number of data quality errors, the data quality determined based at least in part on the result of determining, for each input record, whether the input record comprises the data quality error; determine that the data quality of the received dataset is below a predetermined data quality; in response to determining that the data quality is below the predetermined data quality, communicate, to a data hub, the determined data quality and a return code indicating that the data quality is below the predetermined data quality; following communicating the data quality and the return code to the data hub, receive, from the data hub, a request to reprocess the dataset before the dataset is provided to a downstream progression point, the request based on a comparison of the communicated data quality and a data quality determined at the upstream progression point; determine a file size associated with the received dataset; determine, based on the determined file size, an updated rule indicating that a subsequently received dataset should have a larger file size than the determined file size; and store the updated rule.
2. The system of claim 1, wherein the one or more rules that determine the quality of the input record based on a structure of one or more of the fields of the input record comprise at least one of: a rule that determines whether one or more of the plurality of fields is in a particular format; and a rule that determines whether the dataset comprises a number of input records that falls within a predetermined numerical range.
3. The system of claim 1, wherein the one or more rules that determine the quality of the input record based on a value of one or more of the fields of the input record comprise at least one of: a rule that determines whether the value of one or more of the plurality of fields is within a certain numerical range; and a rule that determines whether one or more of the plurality of fields comprises a null value.
4. The system of claim 1, wherein the one or more rules comprise a subset of rules selected from a group of rules available to a data quality utility, the subset of rules determined from a control file configured to include or exclude rules based on constraints of the system.
5. The system of claim 1, wherein at least one of the rules determines whether the structure or the value of one or more of the fields of the input record falls within a predetermined tolerance, the predetermined tolerance determined based on applying the at least one rule to at least one previous dataset before applying the at least one rule to the current dataset.
6. The system of claim 1, wherein the system is further configured to communicate the dataset and information indicating the data quality of the dataset to a data hub.
7. A non-transitory computer readable medium comprising logic that, when executed by an image routing processor, is operable to: receive, from an upstream progression point configured to receive, process, and transmit data, a dataset comprising a plurality of variable length input records, each input record comprising a plurality of fields; for each input record, compare the input record to a plurality of predetermined record types to determine whether the input record matches one or more of the predetermined record types; for each input record, upon a determination that the input record matches one or more of the predetermined record types, determine one or more rules applicable to the input record, the one or more rules determined based on the predetermined record types that match the input record; for each input record, determine whether the input record comprises a data quality error by applying the one or more rules applicable to the input record, wherein the one or more rules determine whether the input record comprises the data quality error based on one or both of a structure of one or more of the fields of the input record and a value of one or more of the fields of the input record; and determine a data quality of the dataset associated with whether the input records comprise greater than a threshold number of data quality errors, the data quality determined based at least in part on the result of determining, for each input record, whether the input record comprises the data quality error; determine that the data quality of the received dataset is below a predetermined data quality; in response to determining that the data quality is below the predetermined data quality, communicate, to a data hub, the determined data quality and a return code indicating that the data quality is below the predetermined data quality; following communicating the data quality and the return code to the data hub, receive, from the data hub, a request to reprocess the dataset before the dataset is provided to a downstream progression point, the request based on a comparison of the communicated data quality and a data quality determined at the upstream progression point; determine a file size associated with the received dataset; determine, based on the determined file size, an updated rule indicating that a subsequently received dataset should have a larger file size than the determined file size; and store the updated rule.
8. The computer readable medium of claim 7, wherein the one or more rules that determine the quality of the input record based on a structure of one or more of the fields of the input record comprise at least one of: a rule that determines whether one or more of the plurality of fields is in a particular format; and a rule that determines whether the dataset comprises a number of input records that falls within a predetermined numerical range.
9. The computer readable medium of claim 7, wherein the one or more rules that determine the quality of the input record based on a value of one or more of the fields of the input record comprise at least one of: a rule that determines whether the value of one or more of the plurality of fields is within a certain numerical range; and a rule that determines whether one or more of the plurality of fields comprises a null value.
10. The computer readable medium of claim 7, wherein the one or more rules comprise a subset of rules selected from a group of rules available to a data quality utility, the subset of rules determined from a control file c
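Claim 1 goes beyond the abstract: it counts records with data quality errors against a threshold, reports the quality and a return code to a data hub, and stores a rule that the next dataset should be larger than the file just received. Claims 4 and 5 add rule selection through a control file and tolerances derived from earlier datasets. The Python sketch below illustrates those pieces under loudly stated assumptions: the return-code values, the control-file shape (optional `include`/`exclude` lists), the min/max-plus-slack tolerance, and every function name are hypothetical; the data hub is stood in for by a plain list; and the hub's reprocess request and its comparison with upstream quality are not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

# Hypothetical return codes: the claims call for "a return code indicating that
# the data quality is below the predetermined data quality" but give no values.
RC_OK = 0
RC_BELOW_QUALITY = 8


@dataclass
class QualityReport:
    error_count: int
    return_code: int

    @property
    def below_quality(self) -> bool:
        return self.return_code == RC_BELOW_QUALITY


def evaluate(error_flags: Sequence[bool], max_errors: int) -> QualityReport:
    """Below quality when more than `max_errors` records carry an error."""
    errors = sum(1 for flagged in error_flags if flagged)
    code = RC_BELOW_QUALITY if errors > max_errors else RC_OK
    return QualityReport(errors, code)


def select_rules(available: Iterable[str], control: Dict[str, List[str]]) -> List[str]:
    """Claim 4 style subset from a hypothetical control file with optional
    "include" and "exclude" lists; no "include" list means include everything."""
    names = list(available)
    include = set(control.get("include", names))
    exclude = set(control.get("exclude", []))
    return [name for name in names if name in include and name not in exclude]


def tolerance_from_history(previous: Sequence[float], slack: float = 0.1) -> Tuple[float, float]:
    """Claim 5 style tolerance (assumed form): the min/max seen in earlier
    datasets, widened by `slack` on each side."""
    low, high = min(previous), max(previous)
    return low * (1 - slack), high * (1 + slack)


def larger_file_rule(file_size: int) -> Callable[[int], bool]:
    """Claim 1's last steps: the next dataset should be larger than this one."""
    return lambda next_size: next_size > file_size


# Usage with made-up numbers.
report = evaluate([False, True, True, False, True], max_errors=2)
hub_messages = []  # a plain list standing in for the data hub
if report.below_quality:
    hub_messages.append({"errors": report.error_count, "return_code": report.return_code})

rules = select_rules(["format", "range", "not_null", "record_count"],
                     {"exclude": ["record_count"]})
low, high = tolerance_from_history([1000, 1200, 1100])  # earlier record counts
rule_store = {"min_file_size": larger_file_rule(4096)}   # file size in bytes

print(hub_messages)                # [{'errors': 3, 'return_code': 8}]
print(rules)                       # ['format', 'range', 'not_null']
print(round(low), round(high))     # 900 1320
print(rule_store["min_file_size"](4096), rule_store["min_file_size"](5000))  # False True
```

Modeling the stored file-size rule as a predicate keeps it directly checkable against the next dataset's size; a real system would presumably persist it in whatever rule store its data quality utility uses.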
Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor · CPC title
Ensuring data consistency and integrity · CPC title