Optimizing an order of execution of multiple join operations

US10061804B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-10061804-B2
Application numberUS-201715796980-A
CountryUS
Kind codeB2
Filing dateOct 30, 2017
Priority dateDec 4, 2012
Publication dateAug 28, 2018
Grant dateAug 28, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method, system, and/or computer program product optimizes an order of execution of column join operations. A first partitioning of the first data column splits the first data column into first subsets of rows. A second partitioning of the second data column splits the second data column into a second subsets of rows. Cardinalities of sub-tables derived by a respective joining of the subsets of rows of the first and second data columns are estimated, based on the first and second value frequency information. An order of execution of multiple join operations is then optimized based on the estimated cardinalities of the sub-tables.

First claim

Opening claim text (preview).

What is claimed is: 1. A method for optimizing an order of execution of multiple join operations based on at least a first data column and a second data column in a database system having multiple processing units, the method comprising: providing, by one or more processors, at least a first partitioning of the first data column, wherein said at least the first partitioning splits the first data column into a plurality of first subsets of rows, each of the first subsets of rows being correlated with a distinct processing unit from the multiple processing units, wherein each of the first subsets of rows are handled by processing units that differ from one another; providing, by one or more processors, at least a second partitioning of the second data column, wherein said at least the second partitioning splits the second data column into a plurality of second subsets of rows, each of the second subsets of rows being correlated with a distinct processing unit from the multiple processing units, wherein each of the second subsets of rows are handled by processing units that differ from one another; estimating, by one or more processors, cardinalities of sub-tables derived by a respective joining of a subset of rows of the first data column and a subset of rows of the second data column which are processed by a same processing unit from the multiple processing units, wherein the cardinalities of the sub-tables describe a quantity of rows in the sub-tables, and wherein the cardinalities of the sub-tables derived by the respective joining of the subset of rows of the first data column and the subset of rows of the second data column are estimated according to: F ⁡ ( a , b ) = ∑ i = 0 n - 1 ⁢ ( max ⁡ ( ∫ x i x i + 1 ⁢ f ⁡ ( x ) ⁢ dx , 0 ) u 1 i * max ⁡ ( ∫ x i x i + 1 ⁢ g ⁡ ( x ) ⁢ dx , 0 ) u 2 i ) * max ⁡ ( u 1 i , u 2 i ) where x 0 =a; x n =b; x i ∈[a,b]; f(x) is a density distribution function of a first data column R; g(x) is a density distribution function of a second data column S; a, b are a starting row and end row of respective columns R and S which are incorporated into a join; U 1 i is a number of unique values of the first data column R in a respective interval i; U 2 i is a number of unique values of the second data column S in the respective interval i; F(a,b) is an estimated number of rows of a resulting joined table T by joining the first and second data columns R, S in an interval [a,b]; max ⁡ ( ∫ x i x i + 1 ⁢ f ⁡

Assignees

Inventors

Classifications

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10061804B2 cover?
A computer-implemented method, system, and/or computer program product optimizes an order of execution of column join operations. A first partitioning of the first data column splits the first data column into first subsets of rows. A second partitioning of the second data column splits the second data column into a second subsets of rows. Cardinalities of sub-tables derived by a respective joi…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/24544. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Aug 28 2018 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).