Optimization of database queries including grouped aggregation functions

US9244974B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9244974-B2
Application numberUS-85518607-A
CountryUS
Kind codeB2
Filing dateSep 14, 2007
Priority dateSep 14, 2007
Publication dateJan 26, 2016
Grant dateJan 26, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of the invention provide techniques for generating predicted cardinality statistics for grouped aggregation functions included in database queries. In general, characteristics of a database query are determined, and are then supplied to a probability function configured to generate a predicted cardinality statistic. The generated statistic represents a prediction of the probable cardinality of the results of a grouped aggregation function in the event that the query is executed. The predicted cardinality statistic may be used by a query optimizer to determine an efficient query plan for executing the database query.

First claim

Opening claim text (preview).

What is claimed is: 1. A computer-implemented method to optimize database queries having grouped aggregation functions, the computer-implemented method comprising: determining characteristics of a database query having a grouped aggregation function; generating, by operation of one or more computer processors, a predicted cardinality statistic based on a probability function and the determined characteristics and without executing the database query, wherein the predicted cardinality statistic predicts a probable cardinality of results of the grouped aggregation function; selecting, from a plurality of candidate query plans and based at least on the predicted cardinality statistic, a first candidate query plan to use in executing the database query, thereby optimizing the database query based on the predicted cardinality statistic; and executing the selected query plan in order to generate a set of query results, wherein the set of query results is returned to a requesting entity. 2. The computer-implemented method of claim 1 , wherein the probability function is configured to determine a probability that multiple independent events will occur. 3. The computer-implemented method of claim 1 , wherein the probability function is configured to determine a probability that multiple dependent events will occur. 4. The computer-implemented method of claim 1 , wherein the characteristics of the database query are selected from the group comprising: (i) a number of rows returned by the database query if executed without grouping, (ii) a number of groups returned by the database query if executed with grouping, and (iii) an average number of rows per group. 5. The computer-implemented method of claim 1 , wherein the database query is composed in the Structured Query Language (SQL) query language. 6. The computer-implemented method of claim 5 , wherein the grouped aggregation function included in the database query is selected from the COUNT, AVG, SUM, MIN, MAX, VARIANCE, and STANDARD_DEVIATION aggregation functions. 7. The computer-implemented method of claim 5 , wherein the characteristics of the database query are selected from the group comprising: (i) attributes of a GROUP BY clause, (ii) attributes of a WHERE clause, (ii) attributes of a JOIN clause, (iv) attributes of a HAVING clause, and (v) attributes of the grouped aggregation function. 8. A computer readable storage medium containing a program which, when executed, performs an operation to optimize database queries having grouped aggregation functions, the operation comprising: determining characteristics of a database query having a grouped aggregation function; generating, by operation of one or more computer processors when executing the program, a predicted cardinality statistic based on a probability function and the determined characteristics and without executing the database query, wherein the predicted cardinality statistic predicts a probable cardinality of results of the grouped aggregation function; selecting, from a plurality of candidate query plans and based at least on the predicted cardinality statistic, a first candidate query plan to use in executing the database query, thereby optimizing the database query based on the predicted cardinality statistic; and executing the selected query plan in order to generate a set of query results, wherein the set of query results is returned to a requesting entity. 9. The computer readable storage medium of claim 8 , wherein the probability function is configured to determine a probability that multiple independent events will occur. 10. The computer readable storage medium of claim 8 , wherein the probability function is configured to determine a probability that multiple dependent events will occur. 11. The computer readable storage medium of claim 8 , wherein the characteristics of the database query are selected from the group comprising (i) a number of rows returned by the database query if executed without grouping, (ii) a number of groups returned by the database query if executed with grouping, and (iii) an average number of rows per group. 12. The computer readable storage medium of claim 8 , wherein the database query is composed in the Structured Query Language (SQL) query language. 13. The computer readable storage medium of claim 12 , wherein the grouped aggregation function included in the database query is selected from the COUNT, AVG, SUM, MIN, MAX, VARIANCE, and STANDARD_DEVIATION aggregation functions. 14. The computer readable storage medium of claim 12 , wherein the characteristics of the database query are selected from the group comprising: (i) attributes of a GROUP BY clause, (ii) attributes of a WHERE clause, (ii) attributes of a JOIN clause, (iv) attributes of a HAVING clause, and (v) attributes of the grouped aggregation function. 15. A system to optimize database queries having grouped aggregation functions, the system comprising: a database; a processor; and a memory containing a program, wherein the program is configured to: determine characteristics of a database query having a grouped aggregation function; generate a predicted cardinality statistic based on a probability function and the determined characteristics and without executing the database query, wherein the predicted cardinality statistic predicts a probable cardinality of results of the grouped aggregation function; select, from a plurality of candidate query plans and based at least on the predicted cardinality statistic, a first candidate query plan to use in executing the database query, thereby optimizing the database query based on the predicted cardinality statistic; and execute the selected query plan in order to generate a set of query results, wherein the set of query results is returned to a requesting entity. 16. The system of claim 15 , wherein the probability function is configured to determine a probability that multiple independent events will occur. 17. The system of claim 15 , wherein the probability function is configured to determine a probability that multiple dependent events will occur. 18. The system of claim 15 , wherein the characteristics of the database query are selected from the group comprising: (i) a number of rows returned by the database query if executed without grouping, (ii) a number of groups returned by the database query if executed with grouping, and (iii) an average number of rows per group. 19. The system of claim 15 , wherein the database query is composed in the Structured Query Language (SQL) query language. 20. The system of claim 19 , wherein the grouped aggregation function included in the database query is selected from the COUNT, AVG, SUM, MIN, MAX, VARIANCE, and STANDARD_DEVIATION aggregation functions. 21. The system of claim 19 , wherein the characteristics of the database query are selected from the group comprising: (i) attributes of a GROUP BY clause, (ii) attributes of a WHERE clause, (ii) attributes of a JOIN clause, (iv) attributes of a HAVING clause, and (v) attributes of the grouped aggregation function. 22. The computer-implemented method of claim 1 , wherein the predicted cardinality statistic causes selection of the first candidate query plan, wherein without the predicted cardinality statistic, a second candidate query plan, different from the first candidate query plan, is selected. 23. The computer-implemented method of claim 22 , wherein in candidate query plan selection, the predicted

Assignees

Inventors

Classifications

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9244974B2 cover?
Embodiments of the invention provide techniques for generating predicted cardinality statistics for grouped aggregation functions included in database queries. In general, characteristics of a database query are determined, and are then supplied to a probability function configured to generate a predicted cardinality statistic. The generated statistic represents a prediction of the probable car…
Who is the assignee on this patent?
Muras Brian Robert, Smith Mark Steven, IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/2453. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Jan 26 2016 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).