Computing domain cardinality estimates for optimizing database query execution

US11514048B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11514048-B2
Application numberUS-202017088439-A
CountryUS
Kind codeB2
Filing dateNov 3, 2020
Priority dateDec 28, 2018
Publication dateNov 29, 2022
Grant dateNov 29, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method implements optimization of database queries by computing domain cardinality estimates. A client sends a database query to a server. The method parses the query to identify data columns. For each of the data columns, the method computes a lower bound and an upper bound of distinct data values using a pre-computed table size. The method also computes a patch factor by applying a pre-computed function to a ratio between a number of distinct data values that appear exactly once in a data sample and a number of distinct data values in the sample. Based on the patch factor, the lower bound, and the upper bound, the method computes an estimate of distinct values for each of the data columns. The method subsequently generates an execution plan for the query according to the computed estimates, executes the execution plan, and returns a result set to the client.

First claim

Opening claim text (preview).

What is claimed is: 1. A method for retrieving data from a database, comprising: at a computer system having one or more processors and memory storing one or more programs configured for execution by the one or more processors: receiving a database query from a client; parsing the query to identify a plurality of data columns specified in the query; for each of the plurality of data columns: computing a respective patch factor p for the respective data column by applying a pre-computed function f to a respective ratio r between a number of distinct data values that appear exactly once in a respective data sample for the respective data column and a number of distinct data values in the respective data sample; and computing a respective estimate e of distinct data values for the respective data column according to the respective patch factor p; generating an execution plan for the query according to the estimates; executing the execution plan to retrieve a result set, from the database, corresponding to the query; and returning the result set to the client. 2. The method of claim 1 , further comprising: calculating the pre-computed function f by fitting a curve to plotted data points of table domain sizes versus ratio values for a respective pre-determined set of tables and a respective pre-determined set of data samples corresponding to the respective data column. 3. The method of claim 2 , wherein the pre-computed function f is computed in an offline mode, before receiving the database query from the client. 4. The method of claim 3 , wherein the pre-computed function f is computed in the offline mode using a machine learning technique by training a neural network using various table sizes corresponding to the respective data column. 5. The method of claim 1 , further comprising: computing a respective pre-computed table size t according to a size of the respective data sample and a number of rows in the respective data sample that have been deleted. 6. The method of claim 1 , further comprising: pre-selecting the respective data sample for the respective data column in an offline mode, before receiving the database query from the client. 7. The method of claim 1 , further comprising: updating the respective data sample when a respective table corresponding to the respective data column changes by at least 10%. 8. The method of claim 1 , wherein computing the respective estimate e of distinct data values for the respective data column comprises calculating e based on the following formula: e:=p*u +(1 −p )* l wherein: p is the respective patch factor for the respective data column; l is a respective lower bound of distinct data values for the respective data column; and u is a respective upper bound of distinct data values for the respective data column. 9. A system for retrieving data from a database, comprising: a display; one or more processors; memory; and one or more programs stored in the memory and configured for execution by the one or more processors, the one or more programs comprising instructions for: receiving a database query from a client; parsing the query to identify a plurality of data columns specified in the query; for each of the plurality of data columns: computing a respective patch factor p for the respective data column by applying a pre-computed function f to a respective ratio r between a number of distinct data values that appear exactly once in a respective data sample for the respective data column and a number of distinct data values in the respective data sample; and computing a respective estimate e of distinct data values for the respective data column according to the respective patch factor p; generating an execution plan for the query according to the estimates; executing the execution plan to retrieve a result set, from the database, corresponding to the query; and returning the result set to the client. 10. The system of claim 9 , wherein the one or more programs further comprise instructions for: calculating the pre-computed function f by fitting a curve to plotted data points of table domain sizes versus ratio values for a respective pre-determined set of tables and a respective pre-determined set of data samples corresponding to the respective data column. 11. The system of claim 10 , wherein the pre-computed function f is computed in an offline mode, before receiving the database query from the client. 12. The system of claim 11 , wherein the pre-computed function f is computed in the offline mode using a machine learning technique by training a neural network using various table sizes corresponding to the respective data column. 13. The system of claim 9 , wherein the one or more programs further comprise instructions for: computing a respective pre-computed table size t according to a size of the respective data sample and a number of rows in the respective data sample that have been deleted. 14. The system of claim 9 , wherein the one or more programs further comprise instructions for: pre-selecting the respective data sample for the respective data column in an offline mode, before receiving the database query from the client. 15. The system of claim 9 , wherein computing the respective estimate e of distinct data values for the respective data column comprises calculating e based on the following formula: e:=p*u +(1− p )*l wherein: p is the respective patch factor for the respective data column; l is a respective lower bound of distinct data values for the respective data column; and u is a respective upper bound of distinct data values for the respective data column. 16. A non-transitory computer readable storage medium storing one or more programs configured for execution by a computer system having a display, one or more processors and memory, the one or more programs comprising instructions for: receiving a database query from a client; parsing the query to identify a plurality of data columns specified in the query; for each of the plurality of data columns: computing a respective patch factor p for the respective data column by applying a pre-computed function f to a respective ratio r between a number of distinct data values that appear exactly once in a respective data sample for the respective data column and a number of distinct data values in the respective data sample; and computing a respective estimate e of distinct data values for the respective data column according to the respective patch factor p; generating an execution plan for the query according to the estimates; executing the execution plan to retrieve a result set, from the database, corresponding to the query; and returning the result set to the client. 17. The computer readable storage medium of claim 16 , wherein the one or more programs further comprise instructions for: calculating the pre-computed function f by fitting a curve to plotted data points of table domain sizes versus ratio values for a respective pre-determined set of tables and a respective pre-determined set of data samples corresponding to the respective data column. 18. The computer readable storage medium of claim 17 , wherein the pre-computed function f is computed in an offline mode, before receiving the database query from the client. 19. The computer readable storage medium of claim 18 , wherein the pre-computed function f is computed in the offline mode using a machine learning technique by training a neural network using various table sizes corr

Assignees

Inventors

Classifications

  • Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title

  • Learning methods · CPC title

  • Aggregation; Duplicate elimination · CPC title

  • Selectivity estimation or determination · CPC title

  • Supervised learning · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11514048B2 cover?
A method implements optimization of database queries by computing domain cardinality estimates. A client sends a database query to a server. The method parses the query to identify data columns. For each of the data columns, the method computes a lower bound and an upper bound of distinct data values using a pre-computed table size. The method also computes a patch factor by applying a pre-comp…
Who is the assignee on this patent?
Tableau Software Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/24556. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Nov 29 2022 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).