Optimal operator placement for distributed query processing

US9892163B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9892163-B2
Application numberUS-201414450114-A
CountryUS
Kind codeB2
Filing dateAug 1, 2014
Priority dateNov 26, 2013
Publication dateFeb 13, 2018
Grant dateFeb 13, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Total global minimum costs can be determined for multiple sub-plans for completing a multi-operation database process to be performed in a distributed database management system that includes a plurality of nodes. The multiple sub-plans can include different distributions of node locations of a plurality of operators among the plurality of nodes. An optimal plan having a lowest total minimum global cost can be selected from the multiple sub-plans.

First claim

Opening claim text (preview).

What is claimed is: 1. A computer program product comprising a non-transitory machine-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform operations comprising: generating, for a multi-operation database process to be performed in a distributed database management system comprising a plurality of computing nodes comprising respective data processors and having respective node locations, a plurality of sub-plans, each of the plurality of sub-plans comprising a different distribution of node locations of a plurality of operators among the plurality of computing nodes, the plurality of operators being those necessary to complete the multi-operation database process; calculating a total minimum global cost for each sub-plan of the plurality of sub-plans; and selecting an optimal plan from the plurality of sub-plans, the optimal plan having a lowest total minimum global cost, the selecting including pruning at least one sub-plan from the plurality of sub-plans by at least eliminating at least one sub-plan having a cost greater than a best sub-tree cost for a location of a node location; wherein the plurality of computing nodes includes a first node and a second node, and a first sub-plan of the plurality of sub-plans includes a first join operation and a second join operation, wherein: the first join operation comprises transferring a first number of rows of a first table from the second node to the first node, and joining the first number of rows with a second number of rows of a second table at the first node, the first number of rows being greater than the second number of rows, and the second join operation comprises joining a result of the first join operation with a third number of rows of a third table at the first node, the third number of rows greater than the first number of rows. 2. The computer program product as in claim 1 , the pruning further comprising: tracking, for each operator of the plurality of operators, a best sub-tree cost for each node location, one or more child locations corresponding to the best sub-tree costs, and one or more child plans corresponding to the best sub-tree costs. 3. The computer program product as in claim 1 , the pruning further comprising eliminating at least one sub-plan having a cost greater than a best sub-plan cost for a second sub-plan that involves action at another node location plus a transfer cost for transferring one or more tables between node locations as part of the second sub-plan. 4. The computer program product as in claim 1 , wherein the calculating of the total minimum global cost for each of the plurality of sub-plans further comprises quantifying child table transfer counts associated with at least one operation of the multi-operation database process. 5. The computer program product as in claim 1 , wherein the calculating of the total minimum global cost for each of the plurality of sub-plans further comprises quantifying child table transfer counts associated with at least one operation of assuming placement of a physical operator on either of at least two nodes generates a same amount of output. 6. The computer program product as in claim 1 , wherein the calculating of the total minimum global cost for each of the plurality of sub-plans further comprises assuming that each computing node of the plurality of computing nodes has a same and symmetric network configuration. 7. A system comprising: computer hardware configured to perform operations comprising: generating, for a multi-operation database process to be performed in a distributed database management system comprising a plurality of computing nodes comprising respective data processors and having respective node locations, a plurality of sub-plans, each of the plurality of sub-plans comprising a different distribution of node locations of a plurality of operators among the plurality of computing nodes, the plurality of operators being those necessary to complete the multi-operation database process; calculating a total minimum global cost for each sub-plan of the plurality of sub-plans; and selecting an optimal plan from the plurality of sub-plans, the optimal plan having a lowest total minimum global cost, the selecting including pruning at least one sub-plan from the plurality of sub-plans by at least eliminating at least one sub-plan having a cost greater than a best sub-tree cost for a location of a node location; wherein the plurality of computing nodes includes a first node and a second node, and a first sub-plan of the plurality of sub-plans includes a first join operation and a second join operation, wherein: the first join operation comprises transferring a first number of rows of a first table from the second node to the first node, and joining the first number of rows with a second number of rows of a second table at the first node, the first number of rows being greater than the second number of rows, and the second join operation comprises joining a result of the first join operation with a third number of rows of a third table at the first node, the third number of rows greater than the first number of rows. 8. The system as in claim 7 , the pruning further comprising: tracking, for each operator of the plurality of operators, a best sub-tree cost for each node location, one or more child locations corresponding to the best sub-tree costs, and one or more child plans corresponding to the best sub-tree costs. 9. The system as in claim 7 , the pruning further comprising: eliminating at least one sub-plan having a cost greater than a best sub-plan cost for a second sub-plan that involves action at another node location plus a transfer cost for transferring one or more tables between node locations as part of the second sub-plan. 10. The system as in claim 7 , wherein the calculating of the total minimum global cost for each of the plurality of sub-plans further comprises quantifying child table transfer counts associated with at least one operation of the multi-operation database process. 11. The system as in claim 7 , wherein the calculating of the total minimum global cost for each of the plurality of sub-plans further comprises quantifying child table transfer counts associated with at least one operation of assuming placement of a physical operator on either of at least two nodes generates a same amount of output. 12. The system as in claim 7 , wherein the calculating of the total minimum global cost for each of the plurality of sub-plans further comprises assuming that each node of the plurality of computing nodes has a same and symmetric network configuration. 13. The system as in claim 7 , wherein the computer hardware comprises: a programmable processor; and a machine-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform at least some of the operations. 14. A computer-implemented method comprising: generating, for a multi-operation database process to be performed in a distributed database management system comprising a plurality of computing nodes comprising respective data processors and having respective node locations, a plurality of sub-plans, each of the plurality of sub-plans comprising a different distribution of node locations of a plurality of operators among the plurality of computing nodes, the plurality of operators being those necessary to complete the multi-operation database process; calculating a total minimum global cost for each sub-plan of the plurality of sub-plans; and selecting an optimal plan from t

Assignees

Inventors

Classifications

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9892163B2 cover?
Total global minimum costs can be determined for multiple sub-plans for completing a multi-operation database process to be performed in a distributed database management system that includes a plurality of nodes. The multiple sub-plans can include different distributions of node locations of a plurality of operators among the plurality of nodes. An optimal plan having a lowest total minimum gl…
Who is the assignee on this patent?
Kim Ki Hong, Hwang Sangyong, Wi Sung Heun, and 4 more
What technology area does this patent fall under?
Primary CPC classification G06F16/24545. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Feb 13 2018 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).