Communication and message-efficient protocol for computing the intersection between different sets of data

US9438705B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9438705-B2
Application numberUS-201314107097-A
CountryUS
Kind codeB2
Filing dateDec 16, 2013
Priority dateDec 16, 2013
Publication dateSep 6, 2016
Grant dateSep 6, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments relate to data processing. A method includes analyzing a plurality of data items in a relational database, where different portions of the data items are stored in a plurality of servers. The method also includes determining a maximum size of a subset of the data items stored in each of at least two servers among the plurality of servers, calculating a logarithm function based on the maximum size of the subset of the data items in each of the two servers, and calculating a highest number of sequences of communications between the two servers such that when the logarithmic function is iteratively applied, a value of the logarithmic function remains smaller than one. A protocol is then generated between the two servers for performing an intersection operation using the highest number of sequences calculated.

First claim

Opening claim text (preview).

The invention claimed is: 1. A method comprising: analyzing a plurality of data items in a relational database, wherein different portions of the data items are stored in a plurality of servers; determining a maximum size of a subset of the data items stored in each of at least two servers among the plurality of servers; calculating a logarithm function based on the maximum size of the subset of the data items in each of the at least two servers; calculating a highest number of sequences of communications between the at least two servers such that when the logarithmic function is iteratively calculated, a result of the logarithmic function remains smaller than 1; and generating a protocol between the two at least servers for performing an intersection operation using the highest number of sequences calculated, wherein the intersection operation determines common data items stored in the at least two servers through communications between the at least two servers such that the number of times the communications are exchanged does not exceed the highest number of sequences calculated. 2. The method of claim 1 , wherein the intersection operation is used to compute a number of distinct elements in a union of the data items between the at least two servers. 3. The method of claim 1 , wherein the protocol is generated by Jaccard similarity. 4. The method of claim 1 , wherein the highest number of sequences is defined as O(k) bits where k is the maximum size of the subset of the data items in each of the at least two servers. 5. The method of claim 4 , wherein the maximum size of the subset of the data items is defined as n and the logarithmic function is defined as O (k log n) bits of communication. 6. The method of claim 4 , wherein the value of n is alterable when new data items are added and when existing data items are removed from the relational database such that O (k log n) is further calculated by the logarithmic function O(log*k) for the at least two servers. 7. The method of claim 6 , wherein the at least two servers deterministically exchange their inputs using only O(k log(n/k)) bits of communication. 8. The method of claim 7 , wherein the at least two servers hash their data items in elements of O(log k)-bit strings. 9. The method of claim 8 , wherein the at least two servers exchange the hashed values, in order to determine which elements are in the intersection of their data items. 10. The method of claim 9 , wherein the logarithmic function includes an upper bound and a lower bound in its functional calculation. 11. The method of claim 10 , wherein the intersection operation is generated by considering both the upper bound and the lower bound in the logarithmic function. 12. A computer program product comprising a computer readable storage medium having program code embodied therewith, the program code is executable by a processor to: analyze a plurality of data items in a relational database, wherein different portions of the data items are stored in a plurality of servers; determine a maximum size of a subset of the data items stored in each of at least two servers among the plurality of servers; calculate a logarithm function based on the maximum size of the subset of the data items in each of the at least two servers; calculate a highest number of sequences of communications between the at least two servers such that when the logarithmic function is iteratively calculated, a result of the logarithmic function remains smaller than 1; and generate a protocol between the at least two servers for performing an intersection operation using the highest number of sequences calculated, wherein the intersection operation determines common data items stored in the at least two servers through the communications between the at least two servers such that the number of times communications are exchanged does not exceed the highest number of sequences calculated. 13. The computer program product of claim 12 , wherein the highest number of sequences is defined as O(k) bits where k is the maximum size of the subset of the data items in each of the at least two servers. 14. The computer program product of claim 13 , wherein the maximum size of the subset of the data items is defined as n and the logarithmic function is defined as O (k log n) bits of communication. 15. The computer program product of claim 13 , wherein the value of n is alterable when new data items are added and when existing data items are removed from the relational database such that O (k log n) is further calculated by the logarithmic function O (log* k) for the at least two servers. 16. The computer program product of claim 15 , wherein the at least two servers deterministically exchange their inputs using only O(k log(n/k)) bits of communication. 17. The computer program product of claim 16 , wherein the at least two servers hash their data items in elements of O(logk)-bit strings. 18. The computer program product of claim 17 , wherein the at least two servers exchange the hashed values, in order to determine which elements are in the intersection of their data items. 19. The computer program product of claim 18 , wherein the logarithmic function includes an upper bound and a lower bound in its functional calculation and the intersection operation is generated by considering both the upper bound and the lower bound in the logarithmic function. 20. A system comprising: a memory having computer readable computer instructions; and a processor for executing the computer readable instructions, the instructions including: determining a maximum size of a subset of data items stored in each of at least two servers among a plurality of servers, the plurality of servers communicate with one another; calculating a logarithm function based on the maximum size of the subset of the data items in each of the at least two servers; calculating a highest number of sequences of communications between the at least two servers such that when the logarithmic function is iteratively calculated, a value result of the logarithmic function remains smaller than 1; and generating a protocol between the two servers for performing an intersection operation using the highest number of sequences calculated, wherein the intersection operation determines common data items stored in the at least two servers through communications between the at least two servers such that the number of times the communications are exchanged does not exceed the highest number of sequences calculated.

Assignees

Inventors

Classifications

  • Physics · mapped topic

  • H04L69/166Primary

    IP fragmentation; TCP segmentation · CPC title

  • Physics · mapped topic

  • H04L69/26Primary

    Special purpose or proprietary protocols or architectures (network applications for proprietary or special purpose networking environments H04L67/12) · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9438705B2 cover?
Embodiments relate to data processing. A method includes analyzing a plurality of data items in a relational database, where different portions of the data items are stored in a plurality of servers. The method also includes determining a maximum size of a subset of the data items stored in each of at least two servers among the plurality of servers, calculating a logarithm function based on th…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification H04L69/166. Mapped technology areas include Electricity.
When was this patent published?
Publication date Tue Sep 06 2016 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).