Dynamic operation scheduling for distributed data processing
US-2018314733-A1 · Nov 1, 2018 · US
US11475005B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11475005-B2 |
| Application number | US-202017064490-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 6, 2020 |
| Priority date | Dec 21, 2018 |
| Publication date | Oct 18, 2022 |
| Grant date | Oct 18, 2022 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
A database engine includes one or more computing devices, each having one or more processors and memory. The memory stores programs configured for execution by the processors. The database engine receives a database query from a client, and parses the database query to build a query operator tree. The query operator tree includes a plurality of query operators. The database engine performs one or more optimization passes on the query operator tree, including a deduplication optimization pass, to form an optimized execution plan. The deduplication optimization pass includes determining that a first query operator is equivalent to a second query operator during a traversal of the query operator tree, and replacing the second query operator with a link to reuse results from the first query operator. The database engine executes the optimized execution plan to retrieve a result set from the database and returns the result set to the client.
Opening claim text (preview).
What is claimed is: 1. A database engine, comprising: one or more computing devices, each having one or more processors and memory, wherein the memory stores one or more programs configured for execution by the one or more processors and the one or more programs comprise instructions for: receiving a database query from a client; parsing the database query to build a query operator tree, the query operator tree including a plurality of query operators; performing one or more optimization passes on the query operator tree, including a deduplication optimization pass, to form an optimized execution plan, the deduplication optimization pass including: determining that a first query operator is equivalent to a second query operator during a traversal of the query operator tree; and replacing the second query operator with a link to reuse results from the first query operator; executing the optimized execution plan to retrieve a result set from the database; and returning the result set to the client. 2. The database engine of claim 1 , wherein the one or more programs further comprise instructions for computing a list of dependencies among the query operators in the tree, wherein the traversal comprises a breadth-first post-order traversal of the query operator tree based on the list of dependencies so that query operators that do not have dependencies are visited before query operators with dependencies. 3. The database engine of claim 2 , wherein the one or more programs further comprise instructions for: in accordance with a determination that a third query operator has no equivalent query operators in the query operator tree, updating the list of dependencies to specify that the third query operator has dependencies so that a parent of the third query operator is not selected during the breadth-first post-order traversal. 4. The database engine of claim 1 , wherein the deduplication optimization pass further includes determining that the first query operator can be materialized. 5. The database engine of claim 4 , wherein the first query operator is one of: a GROUPBy operator, a GROUPJOIN operator, a SORT operator, a WINDOW operator, or a TEMP operator. 6. The database engine of claim 1 , wherein the one or more programs further comprise instructions for performing a tree refactoring optimization pass before the deduplication optimization pass to refactor the query operator tree so as to increase duplicate query operators in the query operator tree. 7. The database engine of claim 1 , wherein determining if the first query operator is equivalent to the second query operator comprises: determining if input operators of the first query operator and the second query operator are equivalent; and determining if the first query operator and the second query operator have equivalent properties. 8. The database engine of claim 7 , wherein determining if the first query operator is equivalent to the second query operator uses commutativity and associativity rules. 9. A non-transitory computer-readable storage medium storing one or more programs configured for execution by a computer system having one or more processors and memory, the one or more programs comprising instructions for: receiving a database query from a client; parsing the database query to build a query operator tree, the query operator tree including a plurality of query operators; performing one or more optimization passes on the query operator tree, including a deduplication optimization pass, to form an optimized execution plan, the deduplication optimization pass including: determining that a first query operator is equivalent to a second query operator during a traversal of the query operator tree; and replacing the second query operator with a link to reuse results from the first query operator; executing the optimized execution plan to retrieve a result set from the database; and returning the result set to the client. 10. The computer-readable storage medium of claim 9 , wherein the deduplication optimization pass further includes determining that the first query operator can be materialized. 11. The computer-readable storage medium of claim 9 , wherein the one or more programs further comprise instructions for performing a tree refactoring optimization pass before the deduplication optimization pass to refactor the query operator tree so as to increase duplicate query operators in the query operator tree. 12. The computer-readable storage medium of claim 9 , wherein determining if the first query operator is equivalent to the second query operator comprises: determining if input operators of the first query operator and the second query operator are equivalent; and determining if the first query operator and the second query operator have equivalent properties. 13. A method of retrieving data from a database, comprising: at a computer system having one or more computing devices, each computing device having one or more processors and memory storing one or more programs configured for execution by the one or more processors: receiving a database query from a client; parsing the database query to build a query operator tree, the query operator tree including a plurality of query operators; performing one or more optimization passes on the query operator tree, including a deduplication optimization pass, to form an optimized execution plan, the deduplication optimization pass including: determining that a first query operator is equivalent to a second query operator during a traversal of the query operator tree; and replacing the second query operator with a link to reuse results from the first query operator; executing the optimized execution plan to retrieve a result set from the database; and returning the result set to the client. 14. The method of claim 13 , further comprising computing a list of dependencies among the query operators in the tree, wherein the traversal comprises a breadth-first post-order traversal of the query operator tree based on the list of dependencies so that query operators that do not have dependencies are visited before query operators with dependencies. 15. The method of claim 14 , further comprising: in accordance with a determination that a third query operator has no equivalent query operators in the query operator tree, updating the list of dependencies to specify that the third query operator has dependencies so that a parent of the third query operator is not selected during the breadth-first post-order traversal. 16. The method of claim 13 , wherein the deduplication optimization pass further includes determining that the first query operator can be materialized. 17. The method of claim 16 , wherein the first query operator is one of: a GROUPBy operator, a GROUPJOIN operator, a SORT operator, a WINDOW operator, or a TEMP operator. 18. The method of claim 13 , further comprising performing a tree refactoring optimization pass before the deduplication optimization pass to refactor the query operator tree so as to increase duplicate query operators in the query operator tree. 19. The method of claim 13 , wherein determining if the first query operator is equivalent to the second query operator comprises: determining if input operators of the first query operator and the second query operator are equivalent; and determining if the first query operator and the second query operator have equivalent properties. 20. The method of claim 19 , wherein determining if the first query
using cached or materialised query results · CPC title
Plan optimisation · CPC title
Optimisation of common expressions · CPC title
of operators · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.