Dynamic operation scheduling for distributed data processing
US-2018314733-A1 · Nov 1, 2018 · US
US11301469B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11301469-B2 |
| Application number | US-202017013439-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 4, 2020 |
| Priority date | Nov 6, 2016 |
| Publication date | Apr 12, 2022 |
| Grant date | Apr 12, 2022 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
A method dynamically selects query execution operators. A database engine receives a query, parses the query to form a query execution tree, and compiles the tree to form a first executable plan that includes in-memory operators. The database engine executes the first plan, including executing in-memory operators in parallel. While executing a first in-memory operator, insufficient memory is detected. In response, the database engine aborts the execution, and recompiles the query tree in two ways, forming a second executable plan that replaces the first in-memory operator with a first spooling operator. The first spooling operator executes within a fixed volatile memory budget and swaps to non-volatile memory according to the budget. A third executable plan retains the first in-memory operator, but schedules it to run serially. The database engine selects either the second plan or the third plan, and executes the selected plan to return results for the query.
Opening claim text (preview).
What is claimed is: 1. A method for dynamically selecting query execution operators, comprising: at a computing device having one or more processors, volatile memory, and non-volatile memory, executing one or more programs to retrieve data from a database, including: receiving a query; parsing the query to form a query execution tree; compiling the query execution tree to form a first executable plan that includes a plurality of in-memory operators that execute within the volatile memory without swapping to the non-volatile memory; initiating execution of the first executable plan, including initiating execution of a first plurality of in-memory operators in the first executable plan to run in parallel; while executing a first in-memory operator of the first plurality, detecting insufficient memory to complete execution of the first in-memory operator; and in response to detecting insufficient memory to complete execution of the first in-memory operator: aborting execution of the first executable plan; recompiling the query execution tree to form a second executable plan that replaces the first in-memory operator with a first spooling operator, wherein the first spooling operator executes within a fixed volatile memory budget and is configured to swap to non-volatile memory according to the fixed volatile memory budget; recompiling the query execution tree to form a third executable plan that retains the first in-memory operator, but schedules the first in-memory operator to execute not in parallel with the other in-memory operators of the first plurality; selecting a final executable plan to be the second executable plan or the third executable plan, according to estimated available volatile memory; executing the final executable plan to identify a set of results from the database that is responsive to the query; and returning the set of results. 2. The method of claim 1 , wherein: selecting the final executable plan is further based on analyzing historical information that includes execution statistics of the first in-memory operator and/or similar in-memory operators. 3. The method of claim 1 , wherein forming the second executable plan further replaces a second in-memory operator of the first plurality with a second spooling operator. 4. The method of claim 1 , wherein the third execution plan schedules one or more in-memory operators of the first plurality, in addition to the first in-memory operator, to execute not in parallel with the other in-memory operators of the first plurality. 5. The method of claim 1 , wherein aborting execution of the first executable plan includes: identifying a first portion of the first executable plan that has completed execution before the detection of insufficient memory; and storing intermediate results corresponding to the first portion; wherein executing the final executable plan includes reusing the stored intermediate results. 6. The method of claim 5 , wherein data storage for the first executable plan is in a first format that is different from a second data storage format used by the final executable plan, and storing the intermediate results comprises transforming data from the first format to the second format. 7. The method of claim 1 , further comprising: prior to compiling the query execution tree: analyzing historical information that includes execution statistics of one or more queries similar to the query to determine if usage of in-memory operators caused insufficient memory to complete execution; and in accordance with a determination that usage of in-memory operators caused insufficient memory to complete execution, compiling the query execution tree so that the first executable plan includes at least one spooling operator. 8. A database engine, comprising: one or more computing devices, each having one or more processors, memory, and one or more programs stored in the memory, wherein the one or more programs are configured for execution by the one or more processors and comprise instructions for: receiving a query; parsing the query to form a query execution tree; compiling the query execution tree to form a first executable plan that includes a plurality of in-memory operators that execute within the volatile memory without swapping to the non-volatile memory; initiating execution of the first executable plan, including initiating execution of a first plurality of in-memory operators in the first executable plan to run in parallel; while executing a first in-memory operator of the first plurality, detecting insufficient memory to complete execution of the first in-memory operator; and in response to detecting insufficient memory to complete execution of the first in-memory operator: aborting execution of the first executable plan; recompiling the query execution tree to form a second executable plan that replaces the first in-memory operator with a first spooling operator, wherein the first spooling operator executes within a fixed volatile memory budget and is configured to swap to non-volatile memory according to the fixed volatile memory budget; recompiling the query execution tree to form a third executable plan that retains the first in-memory operator, but schedules the first in-memory operator to execute not in parallel with the other in-memory operators of the first plurality; selecting a final executable plan to be the second executable plan or the third executable plan, according to estimated available volatile memory; executing the final executable plan to identify a set of results from the database that is responsive to the query; and returning the set of results. 9. The database engine of claim 8 , wherein: selecting the final executable plan is further based on analyzing historical information that includes execution statistics of the first in-memory operator and/or similar in-memory operators. 10. The database engine of claim 8 , wherein forming the second executable plan further replaces a second in-memory operator of the first plurality with a second spooling operator. 11. The database engine of claim 8 , wherein the third execution plan schedules one or more in-memory operators of the first plurality, in addition to the first in-memory operator, to execute not in parallel with the other in-memory operators of the first plurality. 12. The database engine of claim 8 , wherein aborting execution of the first executable plan includes: identifying a first portion of the first executable plan that has completed execution before the detection of insufficient memory; and storing intermediate results corresponding to the first portion; wherein executing the final executable plan includes reusing the stored intermediate results. 13. The database engine of claim 12 , wherein data storage for the first executable plan is in a first format that is different from a second data storage format used by the final executable plan, and storing the intermediate results comprises transforming data from the first format to the second format. 14. The database engine of claim 8 , wherein the one or more programs further comprise instructions for: prior to compiling the query execution tree: analyzing historical information that includes execution statistics of one or more queries similar to the query to determine if usage of in-memory operators caused insufficient memory to complete execution; and in accordance with a determination that usage of in-memory operators caused insufficient memory to complete execution, compiling the query execution tree so that the first executable plan includes at least one spooling operator.
Run-time optimisation · CPC title
Plan optimisation · CPC title
Query execution · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.