Difference determination in a database environment

US9529881B2 · US · B2

Patent metadata
Publication number: US-9529881-B2
Application number: US-201414314507-A
Country: US
Kind code: B2
Filing date: Jun 25, 2014
Priority date: Jun 14, 2013
Publication date: Dec 27, 2016
Grant date: Dec 27, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques are disclosed to determine differences between a source table and a target table in a database environment, as being persistent or transient. A first set of differences between the source table and the target table is determined at a first point in time. A second set of differences between the source table and the target table is determined at a second point in time subsequent to the first point in time. At least one of a set of persistent differences and a set of transient differences is determined. The set of persistent differences includes a set intersection of the first and second sets of differences, the set intersection being filtered based on matching non-key values of the differences. The set of transient differences includes a relative complement of the second set of differences in the first set of differences.
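The set operations named in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical representation that is not taken from the patent: each set of differences is a dictionary keyed by row key, with the non-key values recorded for that difference as the value. The function name `classify` and the sample rows are likewise illustrative only.

```python
from typing import Dict, Hashable, Tuple

# Hypothetical representation (an assumption, not from the patent):
# row key -> non-key values recorded for that difference.
DiffSet = Dict[Hashable, Tuple]


def classify(first: DiffSet, second: DiffSet) -> Tuple[DiffSet, DiffSet]:
    """Split differences into (persistent, transient).

    persistent: keys in both sets whose non-key values match
                (set intersection, filtered on non-key values).
    transient:  keys in the first set that are absent from the second
                (relative complement of the second set in the first).
    """
    shared_keys = first.keys() & second.keys()
    persistent = {k: second[k] for k in shared_keys if first[k] == second[k]}
    transient = {k: first[k] for k in first.keys() - second.keys()}
    return persistent, transient


# Differences observed at the first and second points in time.
first = {1: ("alice", 10), 2: ("bob", 20), 3: ("carol", 30)}
second = {1: ("alice", 10), 3: ("carol", 31)}

persistent, transient = classify(first, second)
print(sorted(persistent))  # [1]
print(sorted(transient))   # [2]
```

Key 3 appears in both sets but with different non-key values, so in this sketch it is neither persistent nor transient; the claims treat such cases as tentative differences to be resolved by a later comparison.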

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method to programmatically determine differences between a source table and a target table in a database environment, as being persistent or transient, and based on sets of differences from different points in time, the computer-implemented method comprising: determining a first set of differences between the source table and the target table at a first point in time, wherein the database environment comprises a replication environment; determining a second set of differences between the source table and the target table at a second point in time subsequent to the first point in time by at least a predefined interval; determining, by operation of one or more computer processors, without suspending application access to the source and target tables, and without any changing any schema of the source and target tables: (i) a set of persistent differences comprising a set intersection of the first and second sets of differences, wherein the set intersection is filtered based on matching non-key values of differences in the set intersection; and (ii) a set of transient differences comprising a relative complement of the second set of differences in the first set of differences; outputting an indication of each difference in the set of persistent differences as being persistent; and outputting an indication of each difference in the set of transient differences as being transient.

2. The computer-implemented method of claim 1, wherein the predefined interval comprises a replication latency interval, wherein each of the set of persistent differences and the set of transient differences is determined based further on difference types of the first and second sets of differences, wherein the difference types are determined based on comparing non-key values of the first and second sets of differences, wherein the first set of differences is generated via a first comparison operation comparing a set of rows between the source and target tables, wherein the second set of differences is generated via a second comparison operation restricted to comparing a subset of rows between the source and target tables, to which the first set of differences pertains, wherein the subset of rows is smaller than the set of rows.

3. The computer-implemented method of claim 2, wherein at least one given set selected from set of persistent differences and the set of transient differences is determined based on checksums generated for rows to which the given set of differences pertains, wherein the method further comprises: determining a set of tentative differences by filtering the second set of differences based on non-matching non-key values of the differences, wherein at least one difference in the set of tentative differences is subsequently determined to be a persistent difference or a transient difference.

4. The computer-implemented method of claim 3, wherein the second comparison operation is further restricted based on a specified block range, wherein the at least one difference is determined to be a persistent difference or a transient difference based on the second set of differences and a third set of differences, wherein the third set of differences is determined between the source table and the target table at a third point in time subsequent to the second point in time by at least the predefined interval.

5. The computer-implemented method of claim 4, wherein the second comparison operation includes reuse of one or more partitioning queries generated in the first comparison operation, wherein the transient differences comprise false-positive differences resulting from an asynchronous property of the replication environment.

6. The computer-implemented method of claim 5, wherein each of the first comparison operation and the second comparison operation is performed via coordination among a pool of threads including a plurality of merger threads and a difference reporter thread, wherein the false-positive differences comprise changes at the source table that have yet to be propagated to the target table, due to the asynchronous property of the replication environment.

7. The computer-implemented method of claim 6, wherein the second comparison operation includes one or more multi-row insert operations, whereby multiple difference determinations between the source and target tables are used to distinguish between persistent differences and transient differences of the source and target tables; wherein the replication environment comprises an active-active replication environment; wherein each of the first comparison operation and the second comparison operation is based at least in part on a composite checksum aggregating two row-based checksums from the source table and the target table, respectively; wherein the pool of threads further includes a main thread, a partitioner thread, and a plurality of worker threads.

8. The computer-implemented method of claim 1, wherein the transient differences comprise false-positive differences resulting from an asynchronous property of the replication environment.

9. The computer-implemented method of claim 8, wherein the false-positive differences comprise changes at the source table that have yet to be propagated to the target table, due to the asynchronous property of the replication environment.

10. The computer-implemented method of claim 1, wherein the replication environment comprises an active-active replication environment.

11. The computer-implemented method of claim 1, wherein the predefined interval comprises a replication latency interval.

12. The computer-implemented method of claim 1, wherein each of the set of persistent differences and the set of transient differences is determined based further on difference types of the first and second sets of differences.

13. The computer-implemented method of claim 1, further comprising determining difference types of the first and second sets of differences, based on comparing non-key values of the first and second sets of differences.

14. The computer-implemented method of claim 1, wherein the first set of differences is generated via a first comparison operation comparing a set of rows between the source and target tables.

15. The computer-implemented method of claim 1, wherein the second set of differences is generated via a second comparison operation restricted to comparing a subset of rows between the source and target tables, to which the first set of differences pertains.

16. The computer-implemented method of claim 1, wherein at least one given set selected from set of persistent differences and the set of transient differences is determined based on checksums generated for rows to which the given set of differences pertains.

17. The computer-implemented method of claim 1, further comprising: determining a set of tentative differences by filtering the second set of differences based on non-matching non-key values of the differences, wherein at least one difference in the set of tentative differences is subsequently determined to be a persistent difference or a transient difference.

18. The computer-implemented method of claim 1, wherein at least one difference is determined to be a persistent difference or a transient difference based on the second set of differences and a third set of differences, wherein the third set of differences is determined between the source table and the target table at a third point in time subsequent to the second point in time by at least the predefined interval.

19.
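The dependent claims mention row checksums, a second comparison restricted to the rows flagged by the first, and a set of tentative differences. The sketch below shows one way these ideas could fit together. Everything concrete in it is an assumption made for illustration: the in-memory tables, the use of CRC32 from Python's `zlib` as the row checksum, the pairing of source and target checksums to stand in for a composite checksum, and the function names `row_checksum`, `compare`, and `tentative`. The claim preview does not specify any of these.

```python
import zlib
from typing import Dict, Hashable, Iterable, Optional, Tuple

Table = Dict[Hashable, Tuple]                # row key -> non-key values
Diff = Tuple[Optional[int], Optional[int]]   # (source checksum, target checksum)


def row_checksum(values: Optional[Tuple]) -> Optional[int]:
    """CRC32 over a row's non-key values; None when the row is missing."""
    if values is None:
        return None
    return zlib.crc32(repr(values).encode("utf-8"))


def compare(source: Table, target: Table,
            keys: Optional[Iterable[Hashable]] = None) -> Dict[Hashable, Diff]:
    """Return rows whose source and target checksums differ.

    When `keys` is given, only those rows are compared, which models a
    second comparison restricted to rows flagged by the first one.
    """
    if keys is None:
        keys = source.keys() | target.keys()
    diffs: Dict[Hashable, Diff] = {}
    for key in keys:
        pair = (row_checksum(source.get(key)), row_checksum(target.get(key)))
        if pair[0] != pair[1]:
            diffs[key] = pair
    return diffs


def tentative(first: Dict[Hashable, Diff],
              second: Dict[Hashable, Diff]) -> Dict[Hashable, Diff]:
    """Second-pass differences whose checksums do not match the first pass."""
    return {k: d for k, d in second.items() if first.get(k) != d}


source = {1: ("a",), 2: ("b",), 3: ("c",)}
target = {1: ("a",), 2: ("x",), 3: ("d",)}
first = compare(source, target)                     # full comparison

# Between the two passes: row 2 changes again at the source,
# and replication catches up for row 3.
source[2] = ("y",)
target[3] = ("c",)
second = compare(source, target, keys=first.keys())  # restricted comparison

print(sorted(first))                     # [2, 3]
print(sorted(second))                    # [2]
print(sorted(tentative(first, second)))  # [2]
```

In this example row 3 drops out of the second set, so it would be reported as transient. Row 2 still differs but with a new source checksum, so it stays tentative until a third comparison, taken after another interval, settles it as persistent or transient.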

Assignees

IBM

Inventors

Classifications

  • G06F16/27 · Primary

    Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor · CPC title

  • Physics · mapped topic

  • G06F16/273 · Primary

    Asynchronous replication or reconciliation · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9529881B2 cover?
Techniques are disclosed to determine differences between a source table and a target table in a database environment, as being persistent or transient. A first set of differences between the source table and the target table is determined at a first point in time. A second set of differences between the source table and the target table is determined at a second point in time subsequent to the…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/27. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 27, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).