Bloom filter costing estimation

US9454574B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9454574-B2
Application numberUS-201414229211-A
CountryUS
Kind codeB2
Filing dateMar 28, 2014
Priority dateMar 28, 2014
Publication dateSep 27, 2016
Grant dateSep 27, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Bloom filter cost estimation engine for improved performance and accuracy is described. An example method includes building an execution plan for a join operation having a plurality of levels, where the execution plan includes a top join operator at a top level, a leaf scan operator on a bottom level, and one or more intermediate operators between the top level and the bottom level. A row reduction effect of applying a Bloom filter is determined by simulating a semi-join operation over table statistic representation at each of the plurality of levels of the execution plan. A cost savings of the join operation is calculated based on the row reduction effect at the each of the plurality of the levels.

First claim

Opening claim text (preview).

What is claimed is: 1. A method, comprising: building, by a computing device, an execution plan for a join operation having a plurality of levels, including a top level comprising a top join operator, a bottom level comprising a leaf scan operator, and an intermediate level comprising an intermediate operator between the top level and the bottom level; determining, by the computing device, a row reduction effect of applying a Bloom filter by simulating a semi-join operation over a table statistic representation at each of the plurality of levels of the execution plan; calculating, by the computing device, a cost savings of applying the Bloom filter to the join operation based on the row reduction effect at the each of the plurality of levels; and integrating, by the computing device, the Bloom filter to the execution plan upon determining that the cost savings of applying the Bloom filter to the join operation exceeds a joint cost, wherein the joint cost is based on a cost associated with build side and probe side of the Bloom filter. 2. The method of claim 1 , wherein the table statistic representation comprises a histogram of data distribution variables on a table involved in the join operation and row count variables associated with the table. 3. The method of claim 1 , where the simulating the semi-join operation comprising: simulating the semi-join operation between the table statistic representation at the each of the plurality of levels and a build side of an original derived table. 4. The method of claim 3 , further comprising: generating a new table statistic representation at the each of the plurality of levels, based on corresponding data distribution variables and row count variables. 5. The method of claim 4 , further comprising: updating the row count variable by adding a false positive rate to the row count variable at the each of the plurality of levels. 6. The method of claim 1 , wherein the integrating further comprises: integrating the Bloom filter to the execution plan upon determination that a final probing reduced row count is less than 90% of an original scan row count and ( ( c n - c n ′ ) c n > 10 ⁢ % , wherein Cn is a top join cost associated with the top join operator without applying the Bloom filter, and Cn′ is the top join cost after applying the Bloom filter. 7. The method of claim 1 , wherein the join operation is an equi-join operation. 8. The method of claim 1 , wherein the join operation comprises a hash join, a sort merge join, and a reformatting based nested loop join. 9. A system, comprising: a memory; and at least one processor coupled to the memory and configured to execute a plurality of modules, the modules comprising: a plan builder, configured to build art execution plan for a join operation having a plurality of levels, including a top level comprising a top join operator, a bottom level comprising a leaf scan operator, and an intermediate level comprising an intermediate operator between the top level and the bottom level, a operation simulator, configured to determine a row reduction effect of applying a Bloom filter by simulating a semi-join operation over a table statistic representation at each of the plurality of levels of the execution plan; a cost calculator, configured to calculate a cost savings of applying the Bloom filter to the join operation based on the row reduction effect at the each of the plurality of levels; and a plan integrator, configured to integrate the Bloom filter to the execution plan upon determining that the cost savings of applying the Bloom filter to the join operation exceeds a joint cost, wherein the joint cost is based on a cost associated with build side and probe side of the Bloom filter. 10. The system of claim 9 , wherein the table statistic representation comprises a histogram of data distribution variables on a table involved in the join operation and row count variables associated with the table. 11. The system of claim 9 , where the operation simulator is further configured to: simulate the semi-join operation between the table statistic representation at the each of the plurality of levels and a build side of an original derived table. 12. The system of claim 11 , where the operation simulator is further configured to: generate a new table statistic representation at the each of the plurality of levels, based on corresponding data distribution variables and row count variables. 13. The system of claim 12 , where the operation simulator is further configured to: update the row count variable by adding a false positive rate to the row count variable at the each of the plurality of levels. 14. The system of claim 9 , wherein the plan integrator is further configured to: integrate the Bloom filter to the execution plan upon determination that a final probing reduced row count is less than 90% of an original scan row count and ( ( c n - c n ′ ) c n > 10 ⁢ % , wherein Cn is a top join cost associated with the top join operator without applying the Bloom filter, and Cn′ is the top join cost after applying the Bloom filter. 15. The system of claim 9 , wherein the join operation is an equi-join operation. 16. The system of claim 9 , wherein the join operation comprises a hash join, a sort merge join, and a reformatting based nested loop join. 17. A non-transitory computer readable storage medium having instructions encoded thereon, execution of which, by a processor, cause the processor to perform operations comprising: building, by a computing device, an execution plan for a join operation having a plurality of levels, including a top level comprising a top join operator, a bottom level comprising a leaf scan operator, and an intermediate level comprising an intermediate operator between the top level and the bottom level; determining, by the computing device, a row reduction effec

Assignees

Inventors

Classifications

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9454574B2 cover?
Bloom filter cost estimation engine for improved performance and accuracy is described. An example method includes building an execution plan for a join operation having a plurality of levels, where the execution plan includes a top join operator at a top level, a leaf scan operator on a bottom level, and one or more intermediate operators between the top level and the bottom level. A row reduc…
Who is the assignee on this patent?
Cheng Xun, Sybase Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/24545. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Sep 27 2016 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).