Verifying data consistency
US-9542406-B1 · Jan 10, 2017 · US
US10956403B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10956403-B2 |
| Application number | US-201816192684-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 15, 2018 |
| Priority date | Feb 1, 2016 |
| Publication date | Mar 23, 2021 |
| Grant date | Mar 23, 2021 |
Official abstract text for this publication.
A method for verifying data consistency between update-in-place data structures and append-only data structures containing change histories associated with the update-in-place data structures is provided. The method includes loading data from an update-in-place data structure to a first set of hash buckets in a processing platform, loading data from append-only data structures to a second set of hash buckets in the processing platform, performing a bucket-level comparison between the data in the first set of hash buckets and the data in the second set of hash buckets, and generating a report based on the bucket-level comparison.
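The flow in the abstract (bucket both sides with a common hash, compare bucket by bucket, report differences) can be illustrated with a minimal Python sketch. It is not the patented implementation: the bucket count, the SHA-256 based hash and checksum, the `(key, op, row)` log format, the replay of the log to a latest state, and the difference-type names are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict

NUM_BUCKETS = 8  # hypothetical bucket count, chosen for illustration


def bucket_of(key: str) -> int:
    """Common hash function applied to keys from both sides (assumed SHA-256)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def row_checksum(row: dict) -> str:
    """Checksum over a row's values, taken in sorted column order."""
    payload = "|".join(f"{col}={row[col]}" for col in sorted(row))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def load_table(rows: dict) -> dict:
    """Load current rows (key -> row) into buckets of key -> checksum."""
    buckets = defaultdict(dict)
    for key, row in rows.items():
        buckets[bucket_of(key)][key] = row_checksum(row)
    return buckets


def load_history(log: list) -> dict:
    """Replay an append-only log of (key, op, row) entries, then bucket the
    latest state per key. The log format is an assumption of this sketch."""
    latest = {}
    for key, op, row in log:
        if op == "delete":
            latest.pop(key, None)
        else:
            latest[key] = row
    return load_table(latest)


def compare_buckets(table_buckets: dict, history_buckets: dict) -> list:
    """Compare only buckets that share the same bucket number."""
    report = []
    for b in range(NUM_BUCKETS):
        left = table_buckets.get(b, {})
        right = history_buckets.get(b, {})
        for key in sorted(left.keys() | right.keys()):
            lc, rc = left.get(key), right.get(key)
            if lc == rc:
                continue
            if rc is None:
                kind = "missing_in_history"
            elif lc is None:
                kind = "missing_in_table"
            else:
                kind = "checksum_mismatch"
            report.append({"key": key, "type": kind,
                           "table_checksum": lc, "history_checksum": rc})
    return report


# Hypothetical data: "b" differs, "c" is absent from the log, "d" from the table.
table = {"a": {"v": 1}, "b": {"v": 2}, "c": {"v": 3}}
log = [("a", "insert", {"v": 1}), ("b", "insert", {"v": 1}),
       ("d", "insert", {"v": 4})]
report = compare_buckets(load_table(table), load_history(log))
```

Because both sides use the same hash function, a key always lands in the same bucket number on each side, so buckets can be compared independently of one another; that independence is what would allow the comparison to run in parallel.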
Opening claim text (preview).
What is claimed is:
1. A method for verifying data consistency between update-in-place data structures and append-only data structures containing change histories associated with the update-in-place data structures, the method comprising: performing a bucket-level comparison between the data in the first set of hash buckets and the data in the second set of hash buckets; generating an intermediate report based on the bucket-level comparison, wherein generating the intermediate report based on the bucket-level comparison comprises: determining an update occurred to the first update-in-place data structure during the bucket-level comparison; identifying transient differences between the first update-in-place data structure and the append-only data structures, wherein the transient differences comprise differences caused by either in-flight transactions, by rollback transactions, or by in-flight transactions and by rollback transactions committed at the first update-in-place data structure after loading the data from the first update-in-place data structure to the first set of hash buckets in the processing platform; and removing the transient differences from the intermediate report listing differences between the first update-in-place data structure and the append-only data structures; and generating a final report based on the intermediate report and removal of the identified transient differences, wherein the final report comprises persistent differences between the first update-in-place data structure and the append-only data structures and omits the identified transient differences removed from the intermediate report, wherein the final report is generated for live comparison of the first update-in-place data structure and the append-only data structures, and wherein the differences are inserted into a second update-in-place data structure that is associated with the first update-in-place data structure.
2. The method according to claim 1, wherein the data from the append-only data structures comprises a second set of key values that corresponds to rows of data in the append-only data structures, and wherein loading the data from the append-only data structures to the second set of hash buckets is based on a second set of hash values associated with the second set of key values.
3. The method according to claim 2, wherein the first set of hash values and the second set of hash values are determined based on a common hash function.
4. The method according to claim 3, wherein the bucket-level comparison is performed on buckets from the first set of hash buckets sharing common hash values with buckets from the second set of hash buckets.
5. The method according to claim 4, wherein the data from the update-in-place data structure further comprises a first set of checksum values that corresponds to rows of data in the update-in-place data structure, and wherein the data from append-only data structures further comprises a second set of checksum values that corresponds to rows of data in the append-only data structures.
6. The method according to claim 5, wherein the report based on the bucket-level comparison comprises: a third set of key values corresponding to differences between the update-in-place data structure and the append-only data structures; a set of difference types corresponding to the third set of key values; checksum values from the first set of checksum values corresponding to the third set of key values; and checksum values from the second set of checksum values corresponding to the third set of key values.
7. The method according to claim 4, further comprising: generating a first set of checksum values that corresponds to rows of data in the update-in-place data structure; and generating a second set of checksum values that corresponds to rows of data in the append-only data structures, wherein generating the first set of checksum values and generating the second set of checksum values are based on a common function.
8. The method according to claim 1, wherein generating the report based on the bucket-level comparison comprises: determining an update occurred to the update-in-place data structure during the bucket-level comparison.
9. The method according to claim 8, wherein generating the report based on the bucket-level comparison comprises: identifying transient differences between the update-in-place data structure and the append-only data structures, wherein the transient differences are differences caused by in-flight transactions committed at the update-in-place data structure after loading the data from the update-in-place data structure to the first set of hash buckets in the processing platform; and removing the transient differences from an intermediate report listing differences between the update-in-place data structure and the append-only data structures.
10. The method according to claim 8, wherein generating the report based on the bucket-level comparison comprises: identifying transient differences between the update-in-place data structure and the append-only data structures, wherein the transient differences are differences caused by rollback transactions committed at the update-in-place data structure after loading the data from the update-in-place data structure to the first set of hash buckets in the processing platform; and removing the transient differences from an intermediate report listing differences between the update-in-place data structure and the append-only data structures.
11. The method according to claim 1, wherein loading the data from the update-in-place data structure to the first set of hash buckets and loading the data from the append-only data structures to the second set of hash buckets is performed in parallel.
12. The method according to claim 1, wherein the bucket-level comparison is performed in parallel.
13. The method according to claim 1, wherein the update-in-place data structure is a relational database management systems (RDBMS).
14. A computer program product for verifying data consistency between update-in-place data structures and append-only data structures containing change histories associated with the update-in-place data structures, the computer program product comprising at least one computer readable non-transitory storage medium having computer readable program instructions thereon for execution by a processor, the computer readable program instructions comprising program instructions for: performing a bucket-level comparison between the data in the first set of hash buckets and the data in the second set of hash buckets; generating an intermediate report based on the bucket-level comparison, wherein generating the intermediate report based on the bucket-level comparison comprises: determining an update occurred to the first update-in-place data structure during the bucket-level comparison; identifying transient differences between the first update-in-place data structure and the append-only data structures, wherein the transient differences comprise differences caused by either in-flight transactions, by rollback transactions, or by in-flight transactions and by rollback transactions committed at the first update-in-place data structure after loading the data from the first update-in-place data structure to the first set of hash buckets in the processing platform; and removing the transient differences from the intermediate report listing differences between the first update-in-place data structure and the append-only data structures; and generating a final report based on the intermediate report and removal of the identif
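The claims describe turning an intermediate report into a final report by removing transient differences, that is, differences caused by in-flight or rollback transactions that land after the table was loaded into buckets. The claim text shown here does not say how such differences are identified, so the following Python sketch makes an assumption: a difference counts as transient if its key was updated during the comparison and a fresh re-read of both sides no longer disagrees. The names `remove_transient`, `updated_keys`, and `recheck` are hypothetical.

```python
from typing import Callable, Optional, Tuple

Checksums = Tuple[Optional[str], Optional[str]]


def remove_transient(intermediate: list,
                     updated_keys: set,
                     recheck: Callable[[str], Checksums]) -> list:
    """Return the persistent differences from an intermediate report.

    intermediate: list of difference records, each with a "key" field.
    updated_keys: keys known to have changed during the comparison
                  (how this set is obtained is outside this sketch).
    recheck:      callable returning fresh (table_checksum, history_checksum)
                  for a key, read after the comparison finished.
    """
    final = []
    for diff in intermediate:
        key = diff["key"]
        if key in updated_keys:
            table_sum, history_sum = recheck(key)
            if table_sum == history_sum:
                # Both sides agree on re-read: treat the difference as transient.
                continue
        final.append(diff)
    return final


# Hypothetical data: "b" converged after an in-flight update, "c" did not.
intermediate = [
    {"key": "b", "type": "checksum_mismatch"},
    {"key": "c", "type": "missing_in_history"},
]
fresh = {"b": ("x1", "x1"), "c": ("y1", None)}
final = remove_transient(intermediate, {"b", "c"}, lambda k: fresh[k])
```

Restricting the re-read to keys that changed during the comparison keeps the second pass small, which matters when the comparison runs against a live table.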
Updating · CPC title
Append-only file systems, e.g. using logs or journals to store data · CPC title
Optimistic concurrency control · CPC title
Hash tables · CPC title
Change logging, detection, and notification (replication G06F16/27) · CPC title