Partitioned join with dense inner table representation

US9953057B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9953057-B2
Application numberUS-201514743225-A
CountryUS
Kind codeB2
Filing dateJun 18, 2015
Priority dateJun 18, 2015
Publication dateApr 24, 2018
Grant dateApr 24, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

To perform a join operation on database objects, data structures contained in a first database object are distributed across database partitions in accordance with a partitioning scheme. Data structures of the first database object are associated with respective indices computed complementarily to the partitioning scheme. Other indices are computed from the respective data structures of a second database object. The join operation is performed at each of the database partitions on the data structures in the respective first and second database objects having the indices and the other indices in common.

First claim

Opening claim text (preview).

What is claimed is: 1. An apparatus to perform a join operation on a plurality of database objects comprising: a plurality of processing nodes, each comprising a processor and a data storage unit, the processing nodes being communicatively coupled one to another and configured to: distribute data structures contained in a first database object across a plurality of database partitions in accordance with a partitioning scheme, the database partitions being stored in respective data storage units of the processing nodes, wherein each of the plurality of database partitions uniquely corresponds to a partition processing module configured to perform one or more operations on the corresponding database partition in parallel with one or more other partition processing modules to complete a common task on the first database object; associate the data structures of the first database object with indices computed complementary to the partitioning scheme; compute other indices from the data structures contained in a second database object; and perform a join operation at each of the database partitions on the data structures in the respective first and second database objects having the indices and the other indices in common. 2. The apparatus of claim 1 , wherein the processing nodes are further configured to: perform, as the partitioning scheme, a hash function on key values identifying data in the respective first and second database objects on which the join operation is predicated; distribute the first and second database objects across the database partitions in accordance with the hashed key values. 3. The apparatus of claim 2 , wherein the processing nodes are further configured to: determine whether data values in the data structure of the first database object meet a density criterion. 4. The apparatus of claim 3 , wherein the processing nodes are further configured to: construct a memory array that associates each of the data structures of the first database object with the respective indices in response to the data values meeting the density criterion. 5. The apparatus of claim 3 , wherein the processing nodes are further configured to: construct a hash table that associates the hashed key values with the data structures in response to the data values failing to meet the density criterion. 6. The apparatus of claim 1 , wherein the processing nodes are further configured to: distribute the data structures in the first and second database objects in accordance with a modulus operation K MOD N, where K is the key value and N is the number of partitions over which the data structures are distributed; and compute the index and the other index in accordance with an integer division operation K/N. 7. The apparatus of claim 1 , wherein the processing nodes are further configured to: distribute the data structures in the first and second database objects in accordance with a modulus operation (K MOD P) MOD N, where K is the key value, N is the number of partitions over which the data structures are distributed and P is a predetermined prime number; and compute the index and the other index in accordance with an integer division operation ((K/P)*S)+((K MOD P)/N), where S is a scale factor. 8. The apparatus of claim 7 , wherein the processing nodes are further configured to: determine the scale factor, S, from one of: S≥ 1+(( P− 1)/ N )); and S≥ 1+( P/N ); wherein P is the prime number, and N is the quantity of partitions. 9. The computer readable medium of claim 1 , having other processing instructions thereon that, when executed by the processors, cause the processors to: distribute the data structures in the first and second database objects in accordance with a modulus operation (K MOD P) MOD N, where K is the key value, N is the number of partitions over which the data structures are distributed and P is a predetermined prime number; and compute the index and the other index in accordance with an integer division operation ((K/P)*S)+((K MOD P)/N), where S is a scale factor determined from one of: S≥ 1+(( P− 1)/ N )); and S≥ 1+( P/N ). 10. A tangible, non-transient computer readable medium having encoded thereon processing instructions that, when executed by one or more processors, cause the processors to: distribute data structures contained in a first database object across a plurality of database partitions in accordance with a partitioning scheme, wherein each of the plurality of database partitions uniquely corresponds to a partition processing module configured to perform one or more operations on the corresponding database partition in parallel with one or more other partition processing modules to complete a common task on the first database object; associate data structures of the first database object with indices computed complementary to the partitioning scheme; compute other indices from the respective data structures contained in a second database object; and perform a join operation at each of the database partitions on the data structures in the respective first and second database objects having the indices and the other indices in common. 11. The computer readable medium of claim 10 , having other processing instructions thereon that, when executed by the processors, cause the processors to: perform, as the partitioning scheme, a hash function on key values identifying data in the respective first and second database objects on which the join operation is predicated; distribute the first and second database objects across the database partitions in accordance with the hashed key values. 12. The computer readable medium of claim 10 , having other processing instructions thereon that, when executed by the processors, cause the processors to: distribute the data structures in the first and second database objects in accordance with a modulus operation K MOD N, where K is the key value and N is the number of partitions over which the data structures are distributed; and compute the index and the other index in accordance with an integer division operation K/N.

Assignees

Inventors

Classifications

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9953057B2 cover?
To perform a join operation on database objects, data structures contained in a first database object are distributed across database partitions in accordance with a partitioning scheme. Data structures of the first database object are associated with respective indices computed complementarily to the partitioning scheme. Other indices are computed from the respective data structures of a secon…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/2456. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Apr 24 2018 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).