Dataset reconciliation through partitioning and polynomial interpolation

US10528595B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-10528595-B2
Application numberUS-201715486746-A
CountryUS
Kind codeB2
Filing dateApr 13, 2017
Priority dateApr 13, 2017
Publication dateJan 7, 2020
Grant dateJan 7, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for synchronizing datasets comprising the steps of: (1) partitioning each dataset into a plurality of bins according to a first partitioning rule, wherein each bin contains a random subset of elements of symmetric difference taken from a universe of all possible elements, (2) performing a first round of polynomial interpolation (PI) at a first encoding threshold on each bin of the first-partitioned datasets, wherein if any bin contains a number of elements that is less than or equal to the first encoding threshold the elements contained therein are decoded during the first PI round, and wherein if any bin contains a number of elements that is greater than the first encoding threshold the elements contained therein are not decoded during the first PI round; and (3) synchronizing the datasets based on the decoded elements.

First claim

Opening claim text (preview).

We claim: 1. A method for synchronizing first and second datasets respectively stored in first and second computers, the method comprising the following steps: partitioning each of the first and second datasets into a plurality of bins according to a first partitioning rule, wherein each bin contains a random subset of elements of symmetric difference (hereafter referred to as elements) taken from a universe of all possible elements that could be contained in the first and a second datasets; performing a first round of polynomial interpolation (PI) at a first encoding threshold on each bin of the first-partitioned datasets with the first and second computers, wherein if any bin contains a number of elements that is less than or equal to the first encoding threshold the elements contained therein are decoded during the first PI round, and wherein if any bin contains a number of elements that is greater than the first encoding threshold the elements contained therein are not decoded during the first PI round; and synchronizing the first and second datasets based on the decoded elements. 2. The method of claim 1 , further comprising the following steps which are performed prior to the synchronizing step: partitioning the first and a second datasets into a plurality of bins according to a second partitioning rule, wherein each bin contains a random subset of elements taken from the universe of all possible elements that could be contained in the first and a second datasets; and performing a second round of PI at a second encoding threshold on each bin of the second-partitioned datasets with the first and second computers, wherein if any bin contains a number of elements that is less than or equal to the second encoding threshold the elements contained therein are decoded during the second PI round, and wherein if any bin contains a number of elements that is greater than the second encoding threshold the elements contained therein are not decoded during the second PI round. 3. The method of claim 2 , further comprising the step of removing encoded elements that were decoded under the first PI round from the second partition bins prior to the decoding performed during the second PI round. 4. The method of claim 3 , wherein the second encoding threshold is less than the first encoding threshold. 5. The method of claim 4 , further comprising the step of transmitting in the same transmission between the first and second computers the encoded bins from the first and second PI rounds such that the dataset synchronization is performed in a single communication exchange between the first and second computers. 6. The method of claim 5 , wherein the transmission is performed in a low-bandwidth environment. 7. The method of claim 5 , wherein the size of the symmetric difference is unknown before the first PI round. 8. The method of claim 1 , wherein each bin contains a random subset of elements according to a binomial distribution. 9. The method of claim 4 , further comprising the following step which is performed prior to the synchronizing step: continuing to partition and perform rounds of PI until all elements have been decoded. 10. The method of claim 5 , wherein the datasets are highly structured data that form a common operation picture, which is composed of discrete data elements known as tracks such that after the single communication exchange the tracks are synchronized. 11. A method for synchronizing first and second datasets S A and S B respectively stored in first and second computers, wherein the first and second datasets S A and S B are sets of binary data strings of length b, the method comprising the following steps: partitioning each of the first and second datasets S A and S B into =d/log(d) bins according to a first partitioning rule, wherein each bin contains n/ strings, where d is the absolute value of symmetric difference between the first and second datasets S A and S B , and n is an index, and wherein each bin contains a random subset of elements of the symmetric difference (hereafter referred to as elements) taken from a universe 2 b of all possible elements that could be contained in the first and a second datasets S A and S B ; performing a first round of polynomial interpolation (PI) at a first encoding threshold M on each bin of the first-partitioned datasets with the first and second computers by computing characteristic functions, one for each bin, and evaluating each function at the threshold of M points; computing a hash consisting of H bits for each bin; partitioning each of the first and second datasets S A and S B into =d/log(d) bins according to a second partitioning rule, wherein each bin contains n/ strings; performing a second round of PI at a second encoding threshold fM on each bin of the second-partitioned datasets with the first and second computers by computing characteristic functions, one for each bin, and evaluating each function at the threshold of fM points, where 0<f<1; computing a hash consisting of fH bits for each bin; and iteratively repeating the above steps with different partitioning rules such that the encoding threshold and the length of the hashes are reduced by a factor of f with each iteration until the number encoding threshold is less than or equal to log(log(d)). 12. The method of claim 11 , further comprising the step of transmitting results from all rounds of PI and all hashes performed by the first computer in a single transmission to the second computer and vice-versa. 13. The method of claim 12 , wherein the transmissions are performed in a low-bandwidth environment. 14. The method of claim 12 , wherein d is unknown before the first PI round. 15. The method of claim 14 , wherein the datasets are highly structured data that form a common operation picture, which is composed of discrete data elements known as tracks such that after the single transmission from each of the first and second computers the tracks are synchronized. 16. A method for synchronizing first and second datasets respectively stored in first and second computers, the method comprising the following steps: performing first partitions of the first and second datasets by the first and second computers respectively such that each dataset is partitioned into a plurality of bins according to a first partitioning rule, wherein each bin contains a random subset of elements of symmetric difference (hereafter referred to as elements); performing a first round of polynomial interpolation (PI) at a first encoding threshold on each bin of the first-partitioned datasets with the first and second computers; recording encodings from the first round of PI at the first and second computers; performing second partitions of the first and second datasets by the first and second computers respectively such that each dataset is partitioned into a plurality of bins according to a second partitioning rule, wherein each bin contains a random subset of elements; performing a second round of PI at a second encoding threshold on each bin of the second-partitioned datasets with the first and second computers; recording encodings from the second round of PI at the first and second computers; and communicating the encodings from the first and second rounds of PI at the first and second computers in a single communication exchange between the first and second computers, wherein for any given round of PI if any bin contains a number of elements that is less than or equal to the corresponding encoding threshold, the elements contained therein are decoded by the receiving co

Assignees

Inventors

Classifications

  • Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes · CPC title

  • Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor · CPC title

  • G06F16/278Primary

    Data partitioning, e.g. horizontal or vertical partitioning · CPC title

  • Synchronous replication · CPC title

  • Asynchronous replication or reconciliation · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10528595B2 cover?
A method for synchronizing datasets comprising the steps of: (1) partitioning each dataset into a plurality of bins according to a first partitioning rule, wherein each bin contains a random subset of elements of symmetric difference taken from a universe of all possible elements, (2) performing a first round of polynomial interpolation (PI) at a first encoding threshold on each bin of the firs…
Who is the assignee on this patent?
Spawar Systems Ct Pacific, Us Navy
What technology area does this patent fall under?
Primary CPC classification G06F16/278. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Jan 07 2020 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).