Workload-based sampling

US11455307B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11455307-B2
Application number  US-202016797106-A
Country             US
Kind code           B2
Filing date         Feb 21, 2020
Priority date       Feb 21, 2020
Publication date    Sep 27, 2022
Grant date          Sep 27, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system includes determination of a plurality of queries of a workload, determination of a data source comprising a plurality of data rows, and determination of a sample data source based on a cardinality of each of the plurality of queries with respect to the data source and an estimated cardinality of each of the plurality of queries with respect to the data source, wherein the estimated cardinality of a query with respect to the data source is determined based on the sample data source.
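The abstract distinguishes a query's cardinality on the full data source from its estimated cardinality derived from the sample data source. A minimal sketch of that distinction follows. The page does not say how a sample count becomes an estimate, so the linear scale-up used here is an assumption, and the names `true_cardinality` and `estimated_cardinality` are hypothetical.

```python
from typing import Callable, Dict, Sequence

Row = Dict[str, int]
Predicate = Callable[[Row], bool]


def true_cardinality(rows: Sequence[Row], predicate: Predicate) -> int:
    """Count the rows of the full data source selected by the predicate."""
    return sum(1 for row in rows if predicate(row))


def estimated_cardinality(sample: Sequence[Row], predicate: Predicate,
                          total_rows: int) -> float:
    """Estimate the cardinality on the full data source from a sample.

    Assumption (not stated in the source): the match count in the sample
    is scaled up linearly by total_rows / len(sample).
    """
    if not sample:
        return 0.0
    matches = sum(1 for row in sample if predicate(row))
    return matches * total_rows / len(sample)


# Illustrative data: 100 rows with a single column "a" cycling through 0..9.
data = [{"a": i % 10} for i in range(100)]
sample = data[::5]  # every fifth row, 20 rows in total


def pred(row: Row) -> bool:
    return row["a"] < 5


exact = true_cardinality(data, pred)
estimate = estimated_cardinality(sample, pred, len(data))
```

In this toy case the systematic sample happens to reproduce the exact cardinality. A uniform sample gives no such guarantee for selective predicates, and that gap is what a workload-aware choice of sample rows tries to narrow.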

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: a memory storing processor-executable program code; and a processing unit to execute the processor-executable program code in order to cause the system to: determine a plurality of queries of a workload; determine a data source comprising a plurality of data rows; and determine a sample data source based on a cardinality of each of the plurality of queries with respect to the data source and an estimated cardinality of each of the plurality of queries with respect to the data source by: for each of the plurality of queries, determining a decrease in a cardinality estimation error associated with addition of each of candidate rows of the data source to the sample data source; and selecting a candidate row to add to the sample data source based on the determined decreases, wherein the estimated cardinality of a query with respect to the data source is determined based on the sample data source.

2. A system according to claim 1, the processing unit to execute the processor-executable program code in order to cause the system to: receive a runtime query on the data source; determine an estimated cardinality of the runtime query with respect to the data source based on the sample data source; and determine a query execution plan for the runtime query based on the estimated cardinality of the runtime query with respect to the data source.

3. A system according to claim 1, wherein each of the plurality of queries is associated with one or more predicates, and wherein the candidate rows associated each one of the plurality of queries are rows of the data source selected by the one or more predicates of the query.

4. A system according to claim 3, wherein determination, for one of the plurality of queries, of a decrease in a cardinality estimation error associated with addition of a candidate row comprises: determination of a true cardinality of the query with respect to the data source by execution of the query on the data source; determination of a current estimated cardinality of the query with respect to the data source by execution of the query on the sample data source not including the candidate row; determination of a current cardinality estimation error based on the true cardinality and the current estimated cardinality; determination of a new estimated cardinality of the query with respect to the data source by execution of the query on the sample data source including the candidate row; determination of a new cardinality estimation error based on the true cardinality and the new estimated cardinality; and determination of the decrease in the cardinality estimation error based on the current cardinality estimation error and the new cardinality estimation error.

5. A system according to claim 1, wherein determination, for one of the plurality of queries, of a decrease in a cardinality estimation error associated with addition of a candidate row comprises: determination of a true cardinality of the query with respect to the data source by execution of the query on the data source; determination of a current estimated cardinality of the query with respect to the data source by execution of the query on the sample data source not including the candidate row; determination of a current cardinality estimation error based on the true cardinality and the current estimated cardinality; determination of a new estimated cardinality of the query with respect to the data source by execution of the query on the sample data source including the candidate row; determination of a new cardinality estimation error based on the true cardinality and the new estimated cardinality; and determination of the decrease in the cardinality estimation error based on the current cardinality estimation error and the new cardinality estimation error.

6. A computer-implemented method, comprising: determining a plurality of queries; determining a data source comprising a plurality of data rows; and determining a sample data source comprising a plurality of the plurality of data rows based on a cardinality of each of the plurality of queries with respect to the data source and an estimated cardinality of each of the plurality of queries with respect to the data source by: for each of the plurality of queries, determining a decrease in a cardinality estimation error associated with addition of each of candidate rows of the data source to the sample data source; and selecting a candidate row to add to the sample data source based on the determined decreases, wherein the estimated cardinality of a query with respect to the data source is determined based on data rows of the sample data source.

7. A method according to claim 6, further comprising: receiving a runtime query on the data source; determining an estimated cardinality of the runtime query with respect to the data source based on the sample data source; and determining a query execution plan for the runtime query based on the estimated cardinality of the runtime query with respect to the data source.

8. A method according to claim 6, wherein each of the plurality of queries is associated with one or more predicates, and wherein the candidate rows associated each one of the plurality of queries are rows of the data source selected by the one or more predicates of the query.

9. A method according to claim 8, wherein determining, for one of the plurality of queries, of a decrease in a cardinality estimation error associated with addition of a candidate row comprises: determining a true cardinality of the query with respect to the data source by execution of the query on the data source; determining a current estimated cardinality of the query with respect to the data source by execution of the query on the sample data source not including the candidate row; determining a current cardinality estimation error based on the true cardinality and the current estimated cardinality; determining a new estimated cardinality of the query with respect to the data source by execution of the query on the sample data source including the candidate row; determining a new cardinality estimation error based on the true cardinality and the new estimated cardinality; and determining the decrease in the cardinality estimation error based on the current cardinality estimation error and the new cardinality estimation error.

10. A method according to claim 6, wherein determining, for one of the plurality of queries, of a decrease in a cardinality estimation error associated with addition of a candidate row comprises: determining a true cardinality of the query with respect to the data source by execution of the query on the data source; determining a current estimated cardinality of the query with respect to the data source by execution of the query on the sample data source not including the candidate row; determining a current cardinality estimation error based on the true cardinality and the current estimated cardinality; determining a new estimated cardinality of the query with respect to the data source by execution of the query on the sample data source including the candidate row; determining a new cardinality estimation error based on the true cardinality and the new estimated cardinality; and determining the decrease in the cardinality estimation error based on the current cardinality estimation error and the new cardinality estimation error.

11. A non-transitory computer-readable medium storing program code executable by a processing unit to: determine a plurality of queries of a workload; and determine a sample data source based on a cardinality of each of the pluralit
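The row-selection loop recited in claims 1, 3 and 4 can be sketched as a greedy procedure. This is an illustration under stated assumptions, not the patented implementation: the absolute-error metric, the linear scale-up of sample counts, summing the per-query decreases to rank candidates, pooling the candidates of all queries, and the fixed `budget` stopping rule are all choices made here, and `build_sample` is a hypothetical name.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, int]
Predicate = Callable[[Row], bool]


def count_matches(rows: Sequence[Row], predicate: Predicate) -> int:
    return sum(1 for row in rows if predicate(row))


def estimate(sample: Sequence[Row], predicate: Predicate, total_rows: int) -> float:
    # Assumption: sample matches are scaled linearly to the full row count.
    if not sample:
        return 0.0
    return count_matches(sample, predicate) * total_rows / len(sample)


def build_sample(data: Sequence[Row], queries: Sequence[Predicate],
                 budget: int) -> List[Row]:
    """Greedily add the candidate row with the largest summed error decrease."""
    n = len(data)
    # True cardinality: execute each query on the full data source.
    true_cards = [count_matches(data, q) for q in queries]
    sample: List[Row] = []
    remaining = list(range(n))
    while remaining and len(sample) < budget:
        best_idx, best_decrease = None, float("-inf")
        for idx in remaining:
            row = data[idx]
            # Candidate rows are those selected by some query's predicate.
            if not any(q(row) for q in queries):
                continue
            decrease = 0.0
            for q, true_card in zip(queries, true_cards):
                # Error without the candidate row versus error with it.
                current_error = abs(true_card - estimate(sample, q, n))
                new_error = abs(true_card - estimate(sample + [row], q, n))
                decrease += current_error - new_error
            if decrease > best_decrease:  # ties keep the earliest row
                best_idx, best_decrease = idx, decrease
        if best_idx is None:
            break  # no row is selected by any query
        sample.append(data[best_idx])
        remaining.remove(best_idx)
    return sample


# Illustrative workload: 8 rows and two overlapping range queries.
data = [{"x": i} for i in range(8)]
queries = [lambda r: r["x"] < 6, lambda r: r["x"] >= 2]
sample = build_sample(data, queries, budget=4)
picked = [row["x"] for row in sample]
estimates = [estimate(sample, q, len(data)) for q in queries]
```

With this toy workload the four selected rows make both estimates equal the true cardinality of 6. The loop re-executes every query for every candidate in every round, so a practical version would need incremental counts or candidate pruning.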

Assignees

Inventors

Classifications

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • Run-time optimisation · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11455307B2 cover?
A system includes determination of a plurality of queries of a workload, determination of a data source comprising a plurality of data rows, and determination of a sample data source based on a cardinality of each of the plurality of queries with respect to the data source and an estimated cardinality of each of the plurality of queries with respect to the data source, wherein the estimated car…
Who is the assignee on this patent?
SAP SE
What technology area does this patent fall under?
Primary CPC classification G06F16/24549. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 27, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).