Database group-by query cardinality estimation
US-2024143586-A1 · May 2, 2024 · US
US9892163B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9892163-B2 |
| Application number | US-201414450114-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 1, 2014 |
| Priority date | Nov 26, 2013 |
| Publication date | Feb 13, 2018 |
| Grant date | Feb 13, 2018 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Total global minimum costs can be determined for multiple sub-plans for completing a multi-operation database process to be performed in a distributed database management system that includes a plurality of nodes. The multiple sub-plans can include different distributions of node locations of a plurality of operators among the plurality of nodes. An optimal plan having a lowest total minimum global cost can be selected from the multiple sub-plans.
Opening claim text (preview).
What is claimed is: 1. A computer program product comprising a non-transitory machine-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform operations comprising: generating, for a multi-operation database process to be performed in a distributed database management system comprising a plurality of computing nodes comprising respective data processors and having respective node locations, a plurality of sub-plans, each of the plurality of sub-plans comprising a different distribution of node locations of a plurality of operators among the plurality of computing nodes, the plurality of operators being those necessary to complete the multi-operation database process; calculating a total minimum global cost for each sub-plan of the plurality of sub-plans; and selecting an optimal plan from the plurality of sub-plans, the optimal plan having a lowest total minimum global cost, the selecting including pruning at least one sub-plan from the plurality of sub-plans by at least eliminating at least one sub-plan having a cost greater than a best sub-tree cost for a location of a node location; wherein the plurality of computing nodes includes a first node and a second node, and a first sub-plan of the plurality of sub-plans includes a first join operation and a second join operation, wherein: the first join operation comprises transferring a first number of rows of a first table from the second node to the first node, and joining the first number of rows with a second number of rows of a second table at the first node, the first number of rows being greater than the second number of rows, and the second join operation comprises joining a result of the first join operation with a third number of rows of a third table at the first node, the third number of rows greater than the first number of rows. 2. The computer program product as in claim 1 , the pruning further comprising: tracking, for each operator of the plurality of operators, a best sub-tree cost for each node location, one or more child locations corresponding to the best sub-tree costs, and one or more child plans corresponding to the best sub-tree costs. 3. The computer program product as in claim 1 , the pruning further comprising eliminating at least one sub-plan having a cost greater than a best sub-plan cost for a second sub-plan that involves action at another node location plus a transfer cost for transferring one or more tables between node locations as part of the second sub-plan. 4. The computer program product as in claim 1 , wherein the calculating of the total minimum global cost for each of the plurality of sub-plans further comprises quantifying child table transfer counts associated with at least one operation of the multi-operation database process. 5. The computer program product as in claim 1 , wherein the calculating of the total minimum global cost for each of the plurality of sub-plans further comprises quantifying child table transfer counts associated with at least one operation of assuming placement of a physical operator on either of at least two nodes generates a same amount of output. 6. The computer program product as in claim 1 , wherein the calculating of the total minimum global cost for each of the plurality of sub-plans further comprises assuming that each computing node of the plurality of computing nodes has a same and symmetric network configuration. 7. A system comprising: computer hardware configured to perform operations comprising: generating, for a multi-operation database process to be performed in a distributed database management system comprising a plurality of computing nodes comprising respective data processors and having respective node locations, a plurality of sub-plans, each of the plurality of sub-plans comprising a different distribution of node locations of a plurality of operators among the plurality of computing nodes, the plurality of operators being those necessary to complete the multi-operation database process; calculating a total minimum global cost for each sub-plan of the plurality of sub-plans; and selecting an optimal plan from the plurality of sub-plans, the optimal plan having a lowest total minimum global cost, the selecting including pruning at least one sub-plan from the plurality of sub-plans by at least eliminating at least one sub-plan having a cost greater than a best sub-tree cost for a location of a node location; wherein the plurality of computing nodes includes a first node and a second node, and a first sub-plan of the plurality of sub-plans includes a first join operation and a second join operation, wherein: the first join operation comprises transferring a first number of rows of a first table from the second node to the first node, and joining the first number of rows with a second number of rows of a second table at the first node, the first number of rows being greater than the second number of rows, and the second join operation comprises joining a result of the first join operation with a third number of rows of a third table at the first node, the third number of rows greater than the first number of rows. 8. The system as in claim 7 , the pruning further comprising: tracking, for each operator of the plurality of operators, a best sub-tree cost for each node location, one or more child locations corresponding to the best sub-tree costs, and one or more child plans corresponding to the best sub-tree costs. 9. The system as in claim 7 , the pruning further comprising: eliminating at least one sub-plan having a cost greater than a best sub-plan cost for a second sub-plan that involves action at another node location plus a transfer cost for transferring one or more tables between node locations as part of the second sub-plan. 10. The system as in claim 7 , wherein the calculating of the total minimum global cost for each of the plurality of sub-plans further comprises quantifying child table transfer counts associated with at least one operation of the multi-operation database process. 11. The system as in claim 7 , wherein the calculating of the total minimum global cost for each of the plurality of sub-plans further comprises quantifying child table transfer counts associated with at least one operation of assuming placement of a physical operator on either of at least two nodes generates a same amount of output. 12. The system as in claim 7 , wherein the calculating of the total minimum global cost for each of the plurality of sub-plans further comprises assuming that each node of the plurality of computing nodes has a same and symmetric network configuration. 13. The system as in claim 7 , wherein the computer hardware comprises: a programmable processor; and a machine-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform at least some of the operations. 14. A computer-implemented method comprising: generating, for a multi-operation database process to be performed in a distributed database management system comprising a plurality of computing nodes comprising respective data processors and having respective node locations, a plurality of sub-plans, each of the plurality of sub-plans comprising a different distribution of node locations of a plurality of operators among the plurality of computing nodes, the plurality of operators being those necessary to complete the multi-operation database process; calculating a total minimum global cost for each sub-plan of the plurality of sub-plans; and selecting an optimal plan from t
Selectivity estimation or determination · CPC title
Plan optimisation · CPC title
Join order optimisation · CPC title
Physics · mapped topic
Physics · mapped topic
Related publications grouped by family.
Answers are generated from the same data shown on this page.