Parallel processing database tree structure
US-2015379078-A1 · Dec 31, 2015 · US
US9229979B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9229979-B2 |
| Application number | US-201213710470-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 11, 2012 |
| Priority date | Dec 11, 2012 |
| Publication date | Jan 5, 2016 |
| Grant date | Jan 5, 2016 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
The present invention extends to methods, systems, and computer program products for optimizing parallel queries using interesting distributions. For each logical operator in an SQL server MEMO, in a top down manner from a root operator to the leaf operators, interesting distributions for the operators can be identified based on the properties of the operators. Identified interesting distributions can be propagated down to lower operators by annotating the lower operators with the interesting distributions. Thus, a SQL server MEMO can be annotated with interesting distributions propagated top down from root to leaf logical operators to generate an annotated SQL server MEMO. Parallel query plans can then be generated from the annotated SQL server MEMO in a bottom up manner from leaf operators to a root operator. Annotated interesting properties can be used to prune operators, thereby facilitating a more tractable search space for a parallel query plan.
Opening claim text (preview).
What is claimed: 1. At a computer system, the computer system including one or more processors and system memory, the computer system connected to a plurality of compute nodes configured in a shared-nothing architecture, a distributed database distributed across the plurality of compute nodes, each compute node in the plurality of compute nodes maintaining a portion of the database in a local database instance, a method for identifying and propagating interesting properties within a query plan search space, the method comprising: accessing a query plan search space for a query of the distributed database, the query plan search space including a plurality of groups of logical operators arranged in a hierarchically structure, the hierarchical structure including a root group, one or more intermediate groups, and one or more leaf groups, each group of logical operators including one or more logical operators on one or more input groups; and formulating an annotated query plan search space by, for at least one group selected from among the root group and the one or more intermediate groups: for at least one child group of the at least one group: identifying a distribution property indicating an interesting type of distribution relevant to the child group, the distribution property identifying a column that data for a parent group of the child group is distributed on; and annotating the child group with the interesting type of distribution by attaching an indication of the identified column to the child group within the hierarchical structure to propagate the identified interesting type of distribution down to the child group for use in subsequent query plan pruning based on the annotated query plan search space. 2. The method of claim 1 , wherein identifying an interesting type of distribution relevant to the child group comprises identifying one or more of: a hash-distribution on equi-join predicates for joins, a hash distribution of group-by/partitioning columns for grouping/partitioning operators, a replicated distribution for a join operator, a replicated distribution for a grouping operator, a replicated distribution for a partitioning operator, and an indication that a table is located on a control node of the distributed database. 3. The method of claim 1 , wherein identifying a distribution property indicating an interesting distribution relevant to the child group comprises identifying a distribution property for a top operator. 4. The method of claim 3 , wherein identifying a distribution property for a top operator comprises identifying a distribution property that includes a combination of a replicated distribution and an indication that a table is located on a control node of the distributed database. 5. The method of claim 1 , wherein identifying a distribution property indicating an interesting distribution relevant to the child group comprises identifying a distribution property for an insert operator, the insert operator inserting rows from a source select statement into a table, the table being hash distributed on the identified column. 6. The method of claim 5 , wherein identifying a distribution property for an insert operator comprises identifying the hash distribution of the identified column as an interesting distribution for the insert operator. 7. The method of claim 1 , wherein identifying a distribution property indicating an interesting distribution relevant to the child group comprises identifying an inherited distribution property that originated at and was previously propagated down to the group from a parent group of the group. 8. The method of claim 7 , wherein annotating the child group comprises attaching the indication of the identified column to the child group within the hierarchical structure to further propagate the inherited interesting distribution. 9. The method of claim 1 , wherein identifying a distribution property indicating an interesting distribution relevant to the child group comprises identifying distribution property generated by the one or more logical operators in the child group. 10. The method of claim 9 , wherein annotating the interesting distribution to the child group comprises attaching a plurality of indications to the child group, each of the plurality of indications identifying a column at a different parent group of the child group. 11. At a computer system, the computer system including one or more processors and system memory, the computer system connected to a plurality of compute nodes configured in a shared-nothing architecture, a distributed database distributed across the plurality of compute nodes, each compute node in the plurality of compute nodes maintaining a portion of the database in a local database instance, a method for pruning a search space of query plans, the method comprising: accessing an annotated query plan search space for a query of the distributed database, the annotated query plan search space including a plurality of groups of logical operators arranged in a hierarchically structure, the hierarchical structure including a root group, one or more intermediate groups, and one or more leaf groups, each group of logical operators including one or more logical operators on one or more input groups, each of one or more groups in the annotated query plan search space annotated with indication of an interesting type of distribution by having at least one attached indication of an identified column a parent group of the group is distributed on, the identified column relevant to the group and propagated down from the parent group to annotate the group; and for each of the plurality of groups, starting at the leaf groups and in a bottom up manner: for each logical operator in the group: examining a plurality of possible input physical operators for implementing the logical operator; for each of the possible physical input operators, inserting a corresponding appropriate data movement operator to make the logical operator distribution compatible; costing each of the plurality of possible input physical operators, including corresponding inserted data movement operators; and pruning the plurality of possible physical operators by: retaining the physical operator and corresponding inserted movement operator with the overall cheapest cost; retaining the physical operator and corresponding inserted movement operator with the cheapest cost that has an output distribution matching an attached indication of an interesting type of distribution propagated down from a parent group; and removing any other physical operators. 12. The method of claim 11 , wherein at least one group having an attached indication of an interesting distribution, has an attached indication of an interesting distribution selected from among: a hash-distribution on equi-join predicates for joins, a hash distribution of group-by/partitioning columns for grouping/partitioning operators, a replicated distribution for a join operator, a replicated distribution for a grouping operator, a replicated distribution for a partitioning operator, and an indication that a table is located on a control node of the distributed database. 13. The method of claim 11 , further comprising generating a pruned query plan search space from the pruned pluralities of possible physical operators. 14. The method of claim 11 , wherein each of one or more groups in the annotated query plan search space annotated with indication of an interesting type of distribution by having at least one attached indication of an identified column a parent group of the group is distributed on comprises at least one of the one or more
Physics · mapped topic
of parallel queries · CPC title
Join order optimisation · CPC title
using data annotations, e.g. user-defined metadata · CPC title
Unary operations; Data partitioning operations · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.