Cardinality estimation for database query planning

US10534775B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-10534775-B2
Application numberUS-201715603319-A
CountryUS
Kind codeB2
Filing dateMay 23, 2017
Priority dateMay 23, 2017
Publication dateJan 14, 2020
Grant dateJan 14, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system for cardinality estimation for database query planning is provided. In some implementations, the system performs operations comprising selecting a subset of data from a set of data on which a database query is to be executed, the set of data including a first quantity of tuples and the subset of data including a second quantity of tuples. The operations can further comprise determining, based on evaluating one or more predicates on the subset, a third quantity of tuples in the subset which satisfy the one or more predicates. The operations can further comprise determining, based on the first quantity, the second quantity, and the third quantity, a range within the subset that comprises estimated cardinalities of the one or more predicates within a predetermined error threshold range. Related systems, methods, and articles of manufacture are also described.

First claim

Opening claim text (preview).

What is claimed is: 1. A system comprising: at least one processor; and at least one memory storing instructions which, when executed by the at least one processor, cause operations comprising: selecting, from a set of data on which a database query is to be executed, a subset of data, wherein the set of data includes a first quantity of tuples, and wherein the subset of data includes a second quantity of tuples; determining, based on evaluating one or more predicates on the subset, a third quantity of tuples in the subset that satisfy the one or more predicates; and determining, based on the first quantity, the second quantity, and the third quantity, a range within the subset, the range comprising estimated cardinalities of the one or more predicates within a predetermined error threshold range, wherein the determining the range further comprises: calculating, based on the first quantity, the second quantity, and the third quantity, a first estimate of a minimum quantity of tuples in the set of data that satisfy the one or more predicates; and calculating, based on the first quantity, the second quantity, and the third quantity, a second estimate of a maximum quantity of tuples in the set that satisfy the one or more predicates. 2. The system of claim 1 , wherein the operations further comprise: calculating, within the range, one or more cardinality estimates for the one or more predicates; and generating, based on the one or more cardinality estimates, a query plan for execution of the database query. 3. The system of claim 1 , wherein the minimum quantity of tuples is a representative sample of the set of data; and wherein the maximum quantity of tuples is the representative sample of the set of data. 4. The system of claim 1 , wherein the calculating the first estimate of the minimum quantity comprises: calculating a probability based on ( n - l m - k ) ⁢ ( l m ) ( n m ) , wherein n represents the first quantity, wherein m represents the second quantity, wherein k represents the third quantity, and wherein l represents a variable; and setting the minimum value to equal the variable when the probability is greater than or equal to a probability threshold. 5. The system of claim 1 , wherein the first estimate of the minimum quantity and the second estimate of the maximum quantity are calculated based on a fixed-point calculation. 6. The system of claim 1 , wherein the operations further comprise: generating, based on the database query, one or more algebra representations comprising the set of data, wherein the database query comprises the one or more predicates. 7. The system of claim 1 , wherein the subset is selected randomly from the set of data. 8. A method comprising: selecting, from a set of data on which a database query is to be executed, a subset of data, wherein the set of data includes a first quantity of tuples, and wherein the subset of data includes a second quantity of tuples; determining, based on evaluating one or more predicates on the subset, a third quantity of tuples in the subset that satisfy the one or more predicates; and determining, based on the first quantity, the second quantity, and the third quantity, a range within the subset, the range comprising estimated cardinalities of the one or more predicates within a predetermined error threshold range, wherein the determining the range further comprises: calculating, based on the first quantity, the second quantity, and the third quantity, a first estimate of a minimum quantity of tuples in the set of data that satisfy the one or more predicates; and calculating, based on the first quantity, the second quantity, and the third quantity, a second estimate of a maximum quantity of tuples in the set that satisfy the one or more predicates. 9. The method of claim 8 , further comprising: calculating, within the range, one or more cardinality estimates for the one or more predicates; and generating, based on the one or more cardinality estimates, a query plan for execution of the database query. 10. The method of claim 8 , wherein the minimum quantity of tuples is a representative sample of the set of data; and wherein the maximum quantity of tuples is the representative sample of the set of data. 11. The method of claim 8 , wherein the calculating the first estimate of the minimum quantity comprises: calculating a probability based on ( n - l m - k ) ⁢ ( l m ) ( n m ) , wherein n represents the first quantity, wherein m represents the second quantity, wherein k represents the third quantity, and wherein l represents a variable; and setting the mi

Assignees

Inventors

Classifications

  • Selectivity estimation or determination · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10534775B2 cover?
A system for cardinality estimation for database query planning is provided. In some implementations, the system performs operations comprising selecting a subset of data from a set of data on which a database query is to be executed, the set of data including a first quantity of tuples and the subset of data including a second quantity of tuples. The operations can further comprise determining…
Who is the assignee on this patent?
Sap Se
What technology area does this patent fall under?
Primary CPC classification G06F16/24545. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Jan 14 2020 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).