Transforming a query to reuse stored data
US-2015088919-A1 · Mar 26, 2015 · US
US9990398B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9990398-B2 |
| Application number | US-201314033285-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 20, 2013 |
| Priority date | Sep 20, 2013 |
| Publication date | Jun 5, 2018 |
| Grant date | Jun 5, 2018 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Methods, computer systems, and stored instructions are described herein for inferring dimensional metadata from content of a query that references tables. A server analyzes the content of the query to determine which table(s) of the referenced tables could be fact tables and/or which table(s) of the referenced tables could be dimension tables. The server transforms the query to a transformed query that operates on a candidate fact table and candidate dimension table(s) of the referenced tables other than the candidate fact table. The transformed query, when executed, processes at least some data from dimension(s) using the candidate dimension table(s) before processing at least some data from the candidate fact table. Alternatively or additionally, the server generates an execution plan that operates on the candidate fact table and the candidate dimension table(s). The execution plan, when performed, processes data from dimension(s) using the candidate dimension table(s) before processing data from the candidate fact table.
Opening claim text (preview).
What is claimed is: 1. A method comprising: storing a query that references multiple tables; analyzing operations the query specifies on the multiple tables to detect at least one of: two particular tables of the multiple tables could be dimension tables by detecting a join of the two particular tables of said multiple tables that does not comprise an equijoin, or one or more particular tables of the multiple tables could be fact tables by detecting at least one of: a minimum operation, a maximum operation, an average operation, a summation operation, an online analytical processing (OLAP) function based on a table of said multiple tables, or a group by operation based on multiple columns of a table of said multiple tables; identifying one or more candidate dimension tables of the multiple tables at least in part by determining, based at least in part on content of the query, that a particular candidate fact table of the multiple tables appears in one or more equijoins with the one or more candidate dimension tables; based at least in part on determining which of the multiple tables could be fact tables, generating an execution plan for the query, the execution plan operating on the particular candidate fact table and the one or more candidate dimension tables; wherein the execution plan, when performed, processes at least some data from at least one dimension using at least one of the one or more candidate dimension tables before processing at least some data from the particular candidate fact table; wherein the method is performed by one or more computing devices. 2. The method of claim 1 , wherein determining which of the multiple tables could be fact tables comprises determining which of the multiple tables are referenced by aggregation operators. 3. The method of claim 1 , wherein determining which of the multiple tables could be fact tables does not account for sizes of the multiple tables. 4. The method of claim 1 , wherein generating an execution plan for the query comprises generating multiple execution plans, the multiple execution plans including one or more execution plans for each of one or more candidate fact tables of the multiple tables, the method further comprising selecting, from among the multiple execution plans, the execution plan that operates on the candidate fact table of the one or more candidate fact tables and the one or more candidate dimension tables based at least in part on a size of the candidate fact table. 5. The method of claim 1 , wherein analyzing operations the query specifies on the multiple tables comprises detecting that one or more particular tables of the multiple tables could be fact tables by detecting at least one of: an online analytical processing (OLAP) function based on a table of said multiple tables, or a group by operation based on multiple columns of a table of said multiple tables. 6. The method of claim 1 , wherein detecting that one or more particular tables of the multiple tables could be fact tables comprises detecting a minimum operation, a maximum operation, an average operation, or a summation operation. 7. A method comprising: storing a query that references multiple tables; analyzing operations the query specifies on the multiple tables to detect at least one of: two particular tables of the multiple tables could be dimension tables by detecting a join of the two particular tables of said multiple tables that does not comprise an equijoin, or one or more particular tables of the multiple tables could be fact tables by detecting at least one of: a minimum operation, a maximum operation, an average operation, a summation operation, an online analytical processing (OLAP) function based on a table of said multiple tables, or a group by operation based on multiple columns of a table of said multiple tables; identifying one or more candidate dimension tables of the multiple tables at least in part by determining, based at least in part on content of the query, that a particular candidate fact table of the multiple tables appears in one or more equijoins with the one or more candidate dimension tables; based at least in part on determining which of the multiple tables could be fact tables, transforming the query to a transformed query that operates on the particular candidate fact table and the one or more candidate dimension tables wherein the transformed query, when executed, processes at least some data from at least one dimension using at least one of the one or more candidate dimension tables before processing at least some data from the particular candidate fact table; wherein the method is performed by one or more computing devices. 8. The method of claim 7 , wherein determining which of the multiple tables could be fact tables comprises determining which of the multiple tables are referenced by aggregation operators. 9. The method of claim 7 , wherein determining which of the multiple tables could be fact tables does not account for sizes of the multiple tables. 10. The method of claim 7 , wherein the query is transformed into multiple transformed queries, including one or more transformed queries for each of one or more candidate fact tables of the multiple tables, the method further comprising selecting, from among the multiple transformed queries, the transformed query that operates on the candidate fact table of the one or more candidate fact tables and the one or more candidate dimension tables based at least in part on a size of the candidate fact table. 11. The method of claim 7 , further comprising: determining that the query includes a particular fact key that is joined to a particular dimension key and that the query includes a group by operation that references the particular fact key; and based at least in part on determining that the query includes the particular fact key that is joined to the particular dimension key and that the query includes a group by operation that references the particular fact key, replacing the particular fact key in the group by operation with the particular dimension key. 12. The method of claim 7 , further comprising: determining that the query includes a maximum operation or minimum operation that references a group by operation and a particular column of a table of the one or more candidate dimension tables; and based at least in part on determining that the query includes the maximum operation or minimum operation that references the particular column, moving the particular column out of the maximum operation or minimum operation and into the group by operation. 13. The method of claim 7 , further comprising: determining that the query includes one or more redundant join operations; and based at least in part on determining that the query includes the one or more redundant join operations, removing at least one of the one or more redundant join operations from the query. 14. One or more non-transitory computer-readable storage media storing sequences of instructions which, when executed by one or more processors, cause: storing a query that references multiple tables analyzing operations the query specifies on the multiple tables to detect at least one of: two particular tables of the multiple tables could be dimension tables by detecting a join of the two particular tables of said multiple tables that does not comprise an equijoin, or one or more particular tables of the multiple tables could be fact tables by detecting at least one of: a minimum operation, a maximum operation, an average operation, a summation operation, an online analytical processing (OLAP) function based on a ta
Join order optimisation · CPC title
Physics · mapped topic
Related publications grouped by family.
Answers are generated from the same data shown on this page.