Optimizing parallel queries using interesting distributions

US9229979B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9229979-B2
Application numberUS-201213710470-A
CountryUS
Kind codeB2
Filing dateDec 11, 2012
Priority dateDec 11, 2012
Publication dateJan 5, 2016
Grant dateJan 5, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present invention extends to methods, systems, and computer program products for optimizing parallel queries using interesting distributions. For each logical operator in an SQL server MEMO, in a top down manner from a root operator to the leaf operators, interesting distributions for the operators can be identified based on the properties of the operators. Identified interesting distributions can be propagated down to lower operators by annotating the lower operators with the interesting distributions. Thus, a SQL server MEMO can be annotated with interesting distributions propagated top down from root to leaf logical operators to generate an annotated SQL server MEMO. Parallel query plans can then be generated from the annotated SQL server MEMO in a bottom up manner from leaf operators to a root operator. Annotated interesting properties can be used to prune operators, thereby facilitating a more tractable search space for a parallel query plan.

First claim

Opening claim text (preview).

What is claimed: 1. At a computer system, the computer system including one or more processors and system memory, the computer system connected to a plurality of compute nodes configured in a shared-nothing architecture, a distributed database distributed across the plurality of compute nodes, each compute node in the plurality of compute nodes maintaining a portion of the database in a local database instance, a method for identifying and propagating interesting properties within a query plan search space, the method comprising: accessing a query plan search space for a query of the distributed database, the query plan search space including a plurality of groups of logical operators arranged in a hierarchically structure, the hierarchical structure including a root group, one or more intermediate groups, and one or more leaf groups, each group of logical operators including one or more logical operators on one or more input groups; and formulating an annotated query plan search space by, for at least one group selected from among the root group and the one or more intermediate groups: for at least one child group of the at least one group: identifying a distribution property indicating an interesting type of distribution relevant to the child group, the distribution property identifying a column that data for a parent group of the child group is distributed on; and annotating the child group with the interesting type of distribution by attaching an indication of the identified column to the child group within the hierarchical structure to propagate the identified interesting type of distribution down to the child group for use in subsequent query plan pruning based on the annotated query plan search space. 2. The method of claim 1 , wherein identifying an interesting type of distribution relevant to the child group comprises identifying one or more of: a hash-distribution on equi-join predicates for joins, a hash distribution of group-by/partitioning columns for grouping/partitioning operators, a replicated distribution for a join operator, a replicated distribution for a grouping operator, a replicated distribution for a partitioning operator, and an indication that a table is located on a control node of the distributed database. 3. The method of claim 1 , wherein identifying a distribution property indicating an interesting distribution relevant to the child group comprises identifying a distribution property for a top operator. 4. The method of claim 3 , wherein identifying a distribution property for a top operator comprises identifying a distribution property that includes a combination of a replicated distribution and an indication that a table is located on a control node of the distributed database. 5. The method of claim 1 , wherein identifying a distribution property indicating an interesting distribution relevant to the child group comprises identifying a distribution property for an insert operator, the insert operator inserting rows from a source select statement into a table, the table being hash distributed on the identified column. 6. The method of claim 5 , wherein identifying a distribution property for an insert operator comprises identifying the hash distribution of the identified column as an interesting distribution for the insert operator. 7. The method of claim 1 , wherein identifying a distribution property indicating an interesting distribution relevant to the child group comprises identifying an inherited distribution property that originated at and was previously propagated down to the group from a parent group of the group. 8. The method of claim 7 , wherein annotating the child group comprises attaching the indication of the identified column to the child group within the hierarchical structure to further propagate the inherited interesting distribution. 9. The method of claim 1 , wherein identifying a distribution property indicating an interesting distribution relevant to the child group comprises identifying distribution property generated by the one or more logical operators in the child group. 10. The method of claim 9 , wherein annotating the interesting distribution to the child group comprises attaching a plurality of indications to the child group, each of the plurality of indications identifying a column at a different parent group of the child group. 11. At a computer system, the computer system including one or more processors and system memory, the computer system connected to a plurality of compute nodes configured in a shared-nothing architecture, a distributed database distributed across the plurality of compute nodes, each compute node in the plurality of compute nodes maintaining a portion of the database in a local database instance, a method for pruning a search space of query plans, the method comprising: accessing an annotated query plan search space for a query of the distributed database, the annotated query plan search space including a plurality of groups of logical operators arranged in a hierarchically structure, the hierarchical structure including a root group, one or more intermediate groups, and one or more leaf groups, each group of logical operators including one or more logical operators on one or more input groups, each of one or more groups in the annotated query plan search space annotated with indication of an interesting type of distribution by having at least one attached indication of an identified column a parent group of the group is distributed on, the identified column relevant to the group and propagated down from the parent group to annotate the group; and for each of the plurality of groups, starting at the leaf groups and in a bottom up manner: for each logical operator in the group: examining a plurality of possible input physical operators for implementing the logical operator; for each of the possible physical input operators, inserting a corresponding appropriate data movement operator to make the logical operator distribution compatible; costing each of the plurality of possible input physical operators, including corresponding inserted data movement operators; and pruning the plurality of possible physical operators by: retaining the physical operator and corresponding inserted movement operator with the overall cheapest cost; retaining the physical operator and corresponding inserted movement operator with the cheapest cost that has an output distribution matching an attached indication of an interesting type of distribution propagated down from a parent group; and removing any other physical operators. 12. The method of claim 11 , wherein at least one group having an attached indication of an interesting distribution, has an attached indication of an interesting distribution selected from among: a hash-distribution on equi-join predicates for joins, a hash distribution of group-by/partitioning columns for grouping/partitioning operators, a replicated distribution for a join operator, a replicated distribution for a grouping operator, a replicated distribution for a partitioning operator, and an indication that a table is located on a control node of the distributed database. 13. The method of claim 11 , further comprising generating a pruned query plan search space from the pruned pluralities of possible physical operators. 14. The method of claim 11 , wherein each of one or more groups in the annotated query plan search space annotated with indication of an interesting type of distribution by having at least one attached indication of an identified column a parent group of the group is distributed on comprises at least one of the one or more

Assignees

Inventors

Classifications

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9229979B2 cover?
The present invention extends to methods, systems, and computer program products for optimizing parallel queries using interesting distributions. For each logical operator in an SQL server MEMO, in a top down manner from a root operator to the leaf operators, interesting distributions for the operators can be identified based on the properties of the operators. Identified interesting distributi…
Who is the assignee on this patent?
Microsoft Corp, Microsoft Technology Licensing Llc
What technology area does this patent fall under?
Primary CPC classification G06F17/30445. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Jan 05 2016 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).