Pipeline level optimization of aggregation operators in a query plan during runtime

US11144550B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11144550-B2
Application numberUS-202016857817-A
CountryUS
Kind codeB2
Filing dateApr 24, 2020
Priority dateSep 25, 2019
Publication dateOct 12, 2021
Grant dateOct 12, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The subject technology receives a query plan, the query plan comprising a set of query operations, the set of query operations including at least one aggregation and a join operation, the join operation including a build side and a probe side. The subject technology inserts an aggregation operator below the probe side of the join operation. The subject technology causes the build side of the join operation to generate a hash table. The subject technology causes the build side of the join operation to generate a bloom filter based at least in part on the hash table and provide information, corresponding to properties of the build side, to a bloom filter. Based at least in part on the information, the subject technology determines at least one property of the join operation to determine whether to switch the aggregation operator to a pass through mode.

First claim

Opening claim text (preview).

What is claimed is: 1. A method comprising: receiving a query plan, the query plan comprising a set of query operations, the set of query operations including at least one aggregation and a join operation, the join operation including a build side and a probe side, each of the query operation in the set of query operations included as different nodes in a tree structure corresponding to the query plan, the build side of the join operation providing information comprising a build side join key cardinality and a number of distinct values of a join key; providing information, corresponding to properties of the build side, to a bloom filter, the bloom filter utilized to pass the information to an aggregation operator; based at least in part on the information from the bloom filter, determining, during executing of the query plan, at least one property of the join operation to determine whether to switch the aggregation operator to a pass through mode, the at least one property comprising at least a reduction rate, the reduction rate being based on a ratio of a first number of records that a duplicate-removal operation has removed to a second number of records that the duplicate-removal operation has ingested; switching, in response to the reduction rate being below a threshold value, the aggregation operator to the pass through mode during runtime of the query plan, the switching facilitating a reduction in utilization of computing resources by forgoing performing the aggregation operator; and, while the aggregation operator is in the pass through mode, an input stream of data goes through the aggregation operator without being analyzed, and the input stream of data matches an output stream of data flowing out of the aggregation operator; and after switching the aggregation operator to the pass through mode, sending new data received from the bloom filter to a second query operation included in the query plan, the new data being pass through the aggregation operator without performing the aggregation operation operator, the second query operation utilizing the new data to perform a particular operation of the query plan, the second query operation being above the aggregation operator in the tree structure corresponding to the query plan, the second query operation corresponding to a first node and the aggregation operator corresponding to a second node, the second query operation comprising a second aggregation operator different than the aggregation operator. 2. The method of claim 1 , wherein determining the at least one property of the join operation comprises: causing the aggregation operator to determine an explosiveness of the join operation based on the information from the build side of the join operation. 3. The method of claim 2 , wherein the explosiveness of the join operation is based at least in part on a number of distinct values of a join key as indicated by the bloom filter. 4. The method of claim 2 , further comprising: defining a threshold explosiveness for the aggregation operator such that the aggregation operator turns off at runtime in response to the join operation exceeding the threshold explosiveness. 5. The method of claim 1 , further comprising: inserting multiple aggregation operators within the query plan, wherein each of the multiple aggregation operators is configured to automatically turn off at runtime in response to determining an explosiveness of the join operation exceeds a threshold explosiveness or fails to meet a threshold reduction rate for removing duplicate values. 6. A system comprising: at least one processor; and a memory device including instructions, which when executed by the at least one processor, cause the at least one processor to perform operations comprising: receiving a query plan, the query plan comprising a set of query operations, the set of query operations including at least one aggregation and a join operation, the join operation including a build side and a probe side, each of the query operation in the set of query operations included as different nodes in a tree structure corresponding to the query plan, the build side of the join operation providing information comprising a build side join key cardinality and a number of distinct values of a loin key; providing information, corresponding to properties of the build side, to a bloom filter, the bloom filter utilized to pass the information to an aggregation operator; based at least in part on the information from the bloom filter, determining, during executing of the query plan, at least one property of the join operation to determine whether to switch the aggregation operator to a pass through mode, the at least one property comprising at least a reduction rate, the reduction rate being based on a ratio of a first number of records that a duplicate-removal operation has removed to a second number of records that the duplicate-removal operation has ingested; switching, in response to the reduction rate being below a threshold value, the aggregation operator to the pass through mode during runtime of the query plan, the switching facilitating a reduction in utilization of computing resources by forgoing performing the aggregation operator; and, while the aggregation operator is in the pass through mode, an input stream of data goes through the aggregation operator without being analyzed, and the input stream of data matches an output stream of data flowing out of the aggregation operator; and after switching the aggregation operator to the pass through mode, sending new data received from the bloom filter to a second query operation included in the query plan, the new data being pass through the aggregation operator without performing the aggregation operation operator, the second query operation utilizing the new data to perform a particular operation of the query plan, the second query operation being above the aggregation operator in the tree structure corresponding to the query plan, the second query operation corresponding to a first node and the aggregation operator corresponding to a second node, the second query operation comprising a second aggregation operator different than the aggregation operator. 7. The system of claim 6 , wherein determining the at least one property of the join operation further causes the at least one processor to perform operations further comprising: inserting an aggregation operator below the probe side of the join operation; causing the build side of the join operation to generate a hash table; causing the build side of the join operation to generate a bloom filter based at least in part on the hash table; and causing the aggregation operator to determine an explosiveness of the join operation based on the information from the build side of the join operation. 8. The system of claim 7 , wherein the explosiveness of the join operation is based at least in part on a number of distinct values of a join key as indicated by the bloom filter. 9. The system of claim 7 , wherein the memory device includes further instructions, which when executed by the at least one processor, further cause the at least one processor to perform operations comprising: defining a threshold explosiveness for the aggregation operator such that the aggregation operator turns off at runtime in response to the join operation exceeding the threshold explosiveness. 10. The system of claim 7 , wherein the memory device includes further instructions, which when executed by the at least one processor, further cause the at least one processor to perform operations comprising: causing the aggregation operator to determine a locally observed reduction rate of the aggregation operator.

Assignees

Inventors

Classifications

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11144550B2 cover?
The subject technology receives a query plan, the query plan comprising a set of query operations, the set of query operations including at least one aggregation and a join operation, the join operation including a build side and a probe side. The subject technology inserts an aggregation operator below the probe side of the join operation. The subject technology causes the build side of the jo…
Who is the assignee on this patent?
Snowflake Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/24542. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Oct 12 2021 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).