Query optimization method in distributed query engine and apparatus thereof
US-2017270162-A1 · Sep 21, 2017 · US
US11971888B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11971888-B2 |
| Application number | US-202117180323-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 19, 2021 |
| Priority date | Sep 25, 2019 |
| Publication date | Apr 30, 2024 |
| Grant date | Apr 30, 2024 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
The subject technology receives a query plan, the query plan comprising a set of query operations, the set of query operations including at least one aggregation and at least one join operation. The subject technology analyzes the query plan to identify an aggregation that is redundant. The subject technology removes the aggregation based at least in part on the analyzing. The subject technology determines at least one aggregation property corresponding to at least one query operation of the query plan. The subject technology inserts at least one adaptive aggregation operator in the query plan based at least in part on the at least one aggregation property, the at least one aggregation property comprising a set of aggregation properties. The subject technology provides a modified query plan based at least in part on the inserted at least one adaptive aggregation operator in the query plan.
Opening claim text (preview).
What is claimed is: 1. A method comprising: receiving a query plan, the query plan comprising a set of query operations, the set of query operations including at least one aggregation and at least one join operation; analyzing the query plan to identify an aggregation that is redundant; removing the aggregation based at least in part on the analyzing; determining at least one aggregation property corresponding to at least one query operation of the query plan; inserting at least one adaptive aggregation operator in the query plan based at least in part on the at least one aggregation property, the at least one aggregation property comprising a set of aggregation properties, the set of aggregation properties comprising at least a splitability property, the splitability property indicating that the at least one adaptive aggregation operator is inserted below one side of the at least one join operation, the adaptive aggregation operator being configurable between at least two different modes based at least on a threshold reduction rate, the at least two different modes including a passthrough mode and a processing mode, the passthrough mode configuring the adaptive aggregation operator to output the input to the adaptive aggregation operator, the processing mode configuring the adaptive aggregation operator to process the input to the adaptive aggregation operator generating an output that is different than the input to the adaptive aggregation operator; and generating a modified query plan based at least in part on the inserted at least one adaptive aggregation operator in the query plan, the generated modified query plan configured to, during the execution of the modified query plan, (1) identify a reduction rate during execution of the modified query plan, and (2) based on the identified reduction rate and the threshold reduction rate, switch configuration of the adaptive aggregation operator between the passthrough mode and the processing mode, the reduction rate indicative of a runtime cost-efficiency metric for the execution of the modified query plan. 2. The method of claim 1 , wherein analyzing the query plan to identify the aggregation that is redundant comprises: determining that a first aggregation is guaranteed when a second aggregation is included in the query plan, the first aggregation and the second aggregation corresponding to different query operations at separate portions of the query plan, the query plan comprising a tree structure with multiple nodes, each node corresponding to a particular query operation; and removing, as part of a pull-up phase, the first aggregation from the query plan. 3. The method of claim 1 , wherein determining the at least one aggregation property corresponding to at least one query operation of the query plan comprises: determining the at least one aggregation property for associating to the at least one query operation of the query plan; and assigning, as part of a push-down phase, the at least one aggregation property to the at least one query operation of the query plan. 4. The method of claim 3 , wherein each aggregation property is assigned to a particular query operation of the query plan, and a number of query operations that are assigned an aggregation property corresponds to a set of query operations in the query plan that are semantically equivalent when evaluating a particular aggregation at each query operation of the set of query operations. 5. The method of claim 1 , wherein the splitability property is based at least in part on a vector of aggregation operations including an aggregation operation that accesses attributes from one of two mutually exclusive subsets of input attributes, and the set of aggregation properties further comprises a decomposability property, or a duplicate-sensitivity property. 6. The method of claim 1 , wherein inserting the at least one adaptive aggregation operator in the query plan comprises: analyzing a particular aggregation property to determine a type of aggregation operator to assign to a particular query operation of the query plan; and generating at least one adaptive aggregation operation based at least in part on a threshold value, the threshold value corresponding to a number of keys or key lengths associated with the aggregation. 7. A system comprising: at least one processor; and a memory device including instructions, which when executed by the at least one processor, cause the at least one processor to perform operations comprising: receiving a query plan, the query plan comprising a set of query operations, the set of query operations including at least one aggregation and at least one join operation; analyzing the query plan to identify an aggregation that is redundant; removing the aggregation based at least in part on the analyzing; determining at least one aggregation property corresponding to at least one query operation of the query plan; inserting at least one adaptive aggregation operator in the query plan based at least in part on the at least one aggregation property, the at least one aggregation property comprising a set of aggregation properties, the set of aggregation properties comprising at least a splitability property, the splitability property indicating that the at least one adaptive aggregation operator is inserted below one side of the at least one join operation, the adaptive aggregation operator being configurable between at least two different modes based at least on a threshold reduction rate, the at least two modes including a passthrough mode and a processing mode, the passthrough mode configuring the adaptive aggregation operator to output the input to the adaptive aggregation operator, the processing mode configuring the adaptive aggregation operator to process the input to the adaptive aggregation operator generating an output that is different than the input to the adaptive aggregation operator; and generating a modified query plan based at least in part on the inserted at least one adaptive aggregation operator in the query plan, the generated modified query plan configured to, during the execution of the modified query plan, (1) identify a reduction rate during execution of the modified query plan, and (2) based on the identified reduction rate and the threshold reduction rate, switch configuration of the adaptive aggregation operator between the passthrough mode and the processing mode, the reduction rate indicative of a runtime cost-efficiency metric for the execution of the modified query plan. 8. The system of claim 7 , wherein analyzing the query plan to identify the aggregation that is redundant further causes the at least one processor to perform operations further comprising: determining that a first aggregation is guaranteed when a second aggregation is included in the query plan, the first aggregation and the second aggregation corresponding to different query operations at separate portions of the query plan, the query plan comprising a tree structure with multiple nodes, each node corresponding to a particular query operation; and removing, as part of a pull-up phase, the first aggregation from the query plan. 9. The system of claim 7 , wherein determining the at least one aggregation property corresponding to at least one query operation of the query plan further causes the at least one processor to perform operations further comprising: determining the at least one aggregation property for associating to the at least one query operation of the query plan; and assigning, as part of a push-down phase, the at least one aggregation property to the at least one query operation of the query plan. 10. The system of claim 9 , wherein each aggregation prop
Plan optimisation · CPC title
Hash tables · CPC title
of operators · CPC title
Join order optimisation · CPC title
Selectivity estimation or determination · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.