Database table comparison

US9600513B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9600513-B2
Application numberUS-201113156629-A
CountryUS
Kind codeB2
Filing dateJun 9, 2011
Priority dateJun 9, 2011
Publication dateMar 21, 2017
Grant dateMar 21, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques are disclosed for comparing database tables. In one embodiment, the database tables are partitioned. Queries are generated for retrieving each partition. For each generated query, a stored procedure is invoked, using the respective generated query as an input parameter to the stored procedure. The stored procedure is configured to generate a checksum based on the partition retrieved from executing the respective query. The application compares the generated checksums to determine if the partitions and/or tables are consistent.

First claim

Opening claim text (preview).

What is claimed is: 1. A computer-implemented method to reduce an amount of network traffic incurred in comparing database tables replicated across a network, the database tables including a first database table and a second database table, the computer-implemented method comprising: identifying at least one key range, each key range of which pertains to a first target partition of a respective one of the first database table and the second database table; generating, based on the identified at least one key range and for each the first and second database tables, a respective database query configured to retrieve the first target partition from the respective database table; invoking, for each database table, a respective stored procedure stored in the respective database table's database, using the respective database query generated for the respective database table as an input parameter to the respective stored procedure, wherein the respective stored procedure is configured to compute a respective first target partition checksum based on locally accessing the first target partition in the respective database table, wherein the stored procedure for the first database table and the stored procedure for the second database table are remotely invoked; and programmatically determining, by operation of one or more computer processors, whether the first target partition is consistent across the first and second database tables, based on a comparison between the first target partition checksums computed from the first and second database tables, respectively, and without requiring any data in the first target partition to be transmitted across the network. 2. The computer-implemented method of claim 1 , wherein the first database table is stored in a first database, wherein the second database table is stored in a second database operatively connected to the first database via the network; wherein whether the first target partition is consistent across the first and second database tables is determined without requiring any data in the first target partition in the first database table to be transmitted across the network and without requiring any data in the first target partition in the second database table to be transmitted across the network. 3. The computer-implemented method of claim 2 , wherein the database tables are replicated across the network, wherein each of the first database table and the second database table comprises a plurality of partitions; wherein the identifying, generating, invoking, and determining steps are repeated for each partition in the plurality of partitions and using a distinct key range specific to the respective partition, in order to determine whether data stored in the respective partition is consistently across the first and second database tables, thereby determining whether the database tables are consistently replicated across the network. 4. The computer-implemented method of claim 3 , wherein each of the stored procedure for the first database table and the stored procedure for the second database table is remotely invoked by an application configured to determine whether the first target partition is consistent across the first and second database tables, wherein the application is operatively connected to a respective database management system (DBMS) configured to manage each database, wherein the application is configured to: determine that the first target partition is consistent across the first and second database tables upon determining that the first target partition checksum from the first database table matches the first target partition checksum from the second database table; and determine that the first target partition is not consistent across the first and second database tables upon determining that the first target partition checksum from the first database table does not match the first target partition checksum from the second database table. 5. The computer-implemented method of claim 4 , wherein the amount of network traffic incurred in comparing the database tables replicated across the network is reduced by virtue of checksum generation logic being implemented in the stored procedures rather than in the application, wherein the plurality of partitions include at least three partitions, wherein the application is configured to: determine that the database tables are consistently replicated across the network upon determining a partition checksum match for each partition between the first and second database tables; and determine that the database tables are not consistently replicated across the network upon determining a partition checksum mismatch for at least one partition between the first and second database tables. 6. The computer-implemented method of claim 5 , wherein the application includes a pool of threads comprising: (i) a main thread, (ii) a partitioner thread; (iii) a reporter thread; (iv) at least two worker threads for each partition; and (v) at least one merger thread for each partition; (vi) a plurality of cleaner threads; and (vii) a plurality of validator threads; wherein the application is configured to determine whether the database tables are consistently replicated via a plurality of stages of processing including a preprocessing stage, a differencing stage, and a cleanup stage; wherein the preprocessing stage includes verifying, by the main thread and the validator threads, whether the database tables satisfy a plurality of preconditions; wherein the differencing stage includes partitioning the database tables and identifying differences therein, the differencing stage comprising a plurality of sub-stages including a partitioning sub-stage, a discovery sub-stage, and a reporting sub-stage; wherein the cleanup stage includes deleting, by the cleaner threads and from each of the database tables, one or more temporary tables associated with the worker threads; wherein the partitioning sub-stage includes generating the plurality of partitions, wherein the discovery sub-stage includes identifying differences between the database tables, wherein the reporting sub-stage includes performing a predefined action based on the identified differences between the database tables. 7. The computer-implemented method of claim 6 , wherein the partitioner thread is configured to, for each partition: determine a respective pair of boundary key values defining a key range of the respective partition, the pair of boundary key values including a minimum key value and a maximum key value; and generate, based on the respective pair of boundary key values, a respective database query to retrieve the respective partition, wherein each database query is stored in a partitioning queue for use in the discovery sub-stage; wherein the at least two worker threads include: (i) a first worker thread configured to invoke the stored procedure for the first database table and (ii) a second worker thread configured to invoke the stored procedure for the second database table; wherein each worker thread receives, from the merger thread for the given partition, the database query generated for the respective database table. 8. The computer-implemented method of claim 7 , wherein multiple target partitions of the first and second database tables are compared in parallel by virtue of the pool of threads included in the application, wherein the respective stored procedure for each database table is configured to, for a given partition in the respective database table: compute a plurality of row checksums including a respective row checksum for each of a plurality of rows in the given partition of the respective database table; and compute a partition checksum for the given partition, based o

Assignees

Inventors

Classifications

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9600513B2 cover?
Techniques are disclosed for comparing database tables. In one embodiment, the database tables are partitioned. Queries are generated for retrieving each partition. For each generated query, a stored procedure is invoked, using the respective generated query as an input parameter to the stored procedure. The stored procedure is configured to generate a checksum based on the partition retrieved …
Who is the assignee on this patent?
Bourbonnais Serge, Kutsch Marcel, Li Xiao, and 2 more
What technology area does this patent fall under?
Primary CPC classification G06F16/2365. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Mar 21 2017 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).