Communication and message-efficient protocol for computing the intersection between different sets of data

US2016173660A1 · US · A1

Patent metadata
FieldValue
Publication numberUS-2016173660-A1
Application numberUS-201615064187-A
CountryUS
Kind codeA1
Filing dateMar 8, 2016
Priority dateDec 16, 2013
Publication dateJun 16, 2016
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments relate to data processing. A method includes analyzing a plurality of data items in a relational database, where different portions of the data items are stored in a plurality of servers. The method also includes determining a maximum size of a subset of the data items stored in each of at least two servers among the plurality of servers, calculating a logarithm function based on the maximum size of the subset of the data items in each of the two servers, and calculating a highest number of sequences of communications between the two servers such that when the logarithmic function is iteratively applied, a value of the logarithmic function remains smaller than one. A protocol is then generated between the two servers for performing an intersection operation using the highest number of sequences calculated.

First claim

Opening claim text (preview).

What is claimed: 1 . A method comprising: analyzing a plurality of data items in a relational database, wherein different portions of the data items are stored in a plurality of servers; determining a maximum size of a subset of the data items stored in each of at least two servers among the plurality of servers; calculating a logarithm function based on the maximum size of the subset of the data items in each of the two servers, wherein the logarithmic function includes an upper bound and a lower bound in its functional calculation; calculating a highest number of sequences of communications between the two servers such that when the logarithmic function is iteratively applied, a value of the logarithmic function remains smaller than 1, wherein the highest number of sequences is defined as O(k) bits where k is the maximum size of the subset of the data items in each of the two servers, wherein the maximum size of the subset of the data items is defined as n and the logarithmic function is defined as O(k log n) bits of communication, wherein the value of n is alterable when at least one of a new data item is added and when an existing data item is removed from the relational database such that O(k log n) is further calculated by the logarithmic function O(log*k) for the two servers, wherein the two servers deterministically exchange their inputs using only O(k log(n/k)) bits of communication, wherein the two servers hash their data items in elements of O(log k)-bit strings, and wherein the two servers exchange the hashed values in order to determine which elements are in the intersection of their data items; and generating a protocol between the two servers for performing an intersection operation using the highest number of sequences calculated, wherein the intersection operation determines common data items stored in the two servers through the communications between the two servers such that the number of times the communications are exchanged does not exceed the highest number of sequences calculated, and wherein the intersection operation is used to compute a number of distinct elements in a union of the data items between the two servers.

Assignees

Inventors

Classifications

  • for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS] · CPC title

  • H04L69/24Primary

    Negotiation of communication capabilities · CPC title

  • H04L69/166Primary

    IP fragmentation; TCP segmentation · CPC title

  • of multimedia data, e.g. slideshows comprising image and additional audio data (retrieval of still image data G06F16/50; retrieval of audio data G06F16/60; retrieval of video data G06F16/70) · CPC title

  • Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016173660A1 cover?
Embodiments relate to data processing. A method includes analyzing a plurality of data items in a relational database, where different portions of the data items are stored in a plurality of servers. The method also includes determining a maximum size of a subset of the data items stored in each of at least two servers among the plurality of servers, calculating a logarithm function based on th…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification H04L69/24. Mapped technology areas include Electricity.
When was this patent published?
Publication date Thu Jun 16 2016 00:00:00 GMT+0000 (Coordinated Universal Time) (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).