Efficiently constructing regression models for selectivity estimation
US-2021406744-A1 · Dec 30, 2021 · US
US12437522B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12437522-B2 |
| Application number | US-202117566996-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 31, 2021 |
| Priority date | Dec 31, 2021 |
| Publication date | Oct 7, 2025 |
| Grant date | Oct 7, 2025 |
Abstract
A method of updating a trained cardinality estimation model includes receiving a cardinality estimation model with cardinality labels and detecting a drift in underlying data or predicates of the cardinality estimation model. The type of the detected drift is determined and new test queries that mimic test queries for the detected drift are synthesized. A portion of the synthesized test queries is selected to reduce annotation cost and used to update the cardinality estimation model.
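The abstract does not say how the "portion" of synthesized queries is chosen to reduce annotation cost. One common way to spend a fixed labeling budget is greedy farthest-first (k-center) selection over query embeddings: repeatedly label the candidate that is least covered by what is already labeled. The sketch below uses that technique purely as an illustrative stand-in, not as the patent's method. The function name `farthest_first_selection`, the plain `list[float]` embeddings, and Euclidean distance are all assumptions.

```python
from __future__ import annotations

import math


def farthest_first_selection(
    labeled: list[list[float]],
    candidates: list[list[float]],
    budget: int,
) -> list[int]:
    """Hypothetical helper: pick up to `budget` candidate indices to annotate,
    each time taking the candidate farthest from everything already labeled
    or already selected (greedy k-center). Not the patent's algorithm."""
    # Distance from each candidate to its nearest already-labeled embedding.
    nearest = [
        min((math.dist(c, lab) for lab in labeled), default=math.inf)
        for c in candidates
    ]
    chosen: list[int] = []
    for _ in range(min(budget, len(candidates))):
        # The least-covered remaining candidate is the most informative to label.
        best = max(
            (i for i in range(len(candidates)) if i not in chosen),
            key=lambda i: nearest[i],
        )
        chosen.append(best)
        # The newly chosen query now "covers" its neighbours too.
        for i, c in enumerate(candidates):
            nearest[i] = min(nearest[i], math.dist(c, candidates[best]))
    return chosen


labeled = [[0.0, 0.0]]
candidates = [[0.1, 0.0], [5.0, 0.0], [0.0, 4.0], [5.1, 0.0]]
print(farthest_first_selection(labeled, candidates, budget=2))  # [3, 2]
```

With a budget of two, the sketch labels the far point `[5.1, 0.0]` and then `[0.0, 4.0]`, skipping `[5.0, 0.0]` because it is a near-duplicate of a query already chosen; that redundancy avoidance is what keeps annotation cost down.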
Claims
What is claimed is:

1. A method of updating a trained cardinality estimation model implemented in a computing system, the method comprising: receiving a cardinality estimation model with training predicates and cardinality labels; detecting a drift in underlying data or predicates of the cardinality estimation model; determining a type of the detected drift; based on the type of the detected drift, synthesizing new test queries that mimic test queries for the detected drift; selecting a portion of the new or synthesized test queries to annotate with cardinality labels so as to reduce annotation cost; and updating the cardinality estimation model with newer predicates and cardinality labels.
2. The method of claim 1, wherein the detecting is performed periodically.
3. The method of claim 1, wherein the detecting is performed when an evaluation error of the cardinality estimation model on the test queries exceeds a threshold beyond the error observed during training.
4. The method of claim 1, wherein the determining the type of the detected drift comprises counting a fraction of rows that are new or have changed since the cardinality estimation model was last trained and measuring a change in ground truth cardinality for one or more canary predicates.
5. The method of claim 1, wherein the determining the type of the detected drift comprises determining that the number of new queries available is below the number of annotated queries necessary to train the cardinality estimation model or when an insufficient number of queries have ground truth labels.
6. The method of claim 1, further comprising: injecting newly arrived predicates into a query pool; computing and using embeddings for the query predicates; updating a generator and discriminator if synthetic queries are needed; and updating the embeddings.
7. The method of claim 1, further comprising determining a plurality of types of the drifts.
8. The method of claim 6, further comprising using learned embeddings of query predicates to decouple adaptation components from featurizations used by the cardinality estimation model.
9. The method of claim 6, further comprising synthesizing new query predicates using predicate embeddings in the query pool.
10. The method of claim 9, further comprising receiving a predicate embedding as input and predicting whether a given predicate resembles a training, test, or generated workload.
11. A computing system, comprising: one or more processors; and a computer-readable storage medium having computer-executable instructions stored thereupon which, when executed by the processor, cause the computing system to perform operations comprising: detecting a drift in underlying data or predicates of a cardinality estimation model; determining a type of the detected drift; based on the type of the detected drift, synthesizing new test queries that mimic test queries for the detected drift; selecting a portion of the new or synthesized test queries to annotate with cardinality labels so as to reduce annotation cost; and outputting newer predicates and cardinality labels for updating the cardinality estimation model.
12. The computing system of claim 11, wherein the determining the type of the detected drift comprises counting a fraction of rows that are new or have changed since the cardinality estimation model was last trained, and measuring a change in ground truth cardinality for one or more canary predicates.
13. The computing system of claim 11, wherein the determining the type of the drift comprises determining that the number of new queries available is below the number of annotated queries necessary to train the cardinality estimation model or when an insufficient number of queries have ground truth labels.
14. A computer-readable storage medium having computer-executable instructions stored thereupon which, when executed by one or more processors of a computing device, cause the computing device to perform operations comprising: detecting a drift in underlying data or predicates of a cardinality estimation model; determining a type of the detected drift; based on the type of the detected drift, synthesizing new test queries that mimic test queries for the detected drift; selecting a portion of the new or synthesized test queries to annotate with cardinality labels so as to reduce annotation cost; and outputting newer predicates and cardinality labels for updating the cardinality estimation model.
15. The computer-readable storage medium of claim 14, further comprising computer-executable instructions stored thereupon which, when executed by one or more processors of a computing device, cause the computing device to perform operations comprising: injecting newly arrived predicates into a query pool; computing embeddings for the newly arrived predicates; updating a generator and discriminator if synthetic queries are needed; and updating the embeddings.
16. The computer-readable storage medium of claim 15, further comprising computer-executable instructions stored thereupon which, when executed by one or more processors of a computing device, cause the computing device to perform operations comprising: using learned embeddings of query predicates to decouple components from featurizations used by the cardinality estimation model.
17. The computer-readable storage medium of claim 15, further comprising computer-executable instructions stored thereupon which, when executed by one or more processors of a computing device, cause the computing device to perform operations comprising: synthesizing new query predicates using predicate embeddings in the query pool.
18. The computer-readable storage medium of claim 15, further comprising computer-executable instructions stored thereupon which, when executed by one or more processors of a computing device, cause the computing device to perform operations comprising: receiving a predicate embedding as input and predicting whether a given predicate resembles a training, test, or generated workload.
19. The computer-readable storage medium of claim 14, wherein the detecting is performed periodically.
20. The computer-readable storage medium of claim 14, wherein the detecting is performed when an evaluation error of the cardinality estimation model on the test queries exceeds a threshold beyond the error observed during training.
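Claims 3 to 5 and 7 name concrete signals for when to trigger adaptation and how to classify the drift. The sketch below expresses them in plain Python under stated assumptions: the names `drift_detected`, `canary_change`, `drift_types`, and `DriftType`, the two type labels, the default thresholds, and the rule that either the changed-row fraction or the canary cardinality shift alone is enough to flag data drift are all hypothetical, since the claims do not say how the signals are combined or thresholded. It returns a set because claim 7 allows several drift types at once.

```python
from __future__ import annotations

from enum import Enum


class DriftType(Enum):
    # Hypothetical labels; the claims only say a "type" of drift is determined.
    DATA = "data"                                   # claim 4 signals
    INSUFFICIENT_QUERIES = "insufficient_queries"   # claim 5 signals


def drift_detected(train_error: float, test_error: float, margin: float) -> bool:
    """Claim 3 style trigger: test-query error exceeds training error by more than `margin`."""
    return test_error - train_error > margin


def canary_change(old: dict[str, int], new: dict[str, int]) -> float:
    """Largest relative change in ground-truth cardinality across canary predicates."""
    return max((abs(new[p] - old[p]) / max(old[p], 1) for p in old), default=0.0)


def drift_types(
    changed_rows: int,
    total_rows: int,
    canary_old: dict[str, int],
    canary_new: dict[str, int],
    new_queries: int,
    labeled_queries: int,
    required_queries: int,
    row_frac_threshold: float = 0.05,  # hypothetical threshold
    canary_threshold: float = 0.10,    # hypothetical threshold
) -> set[DriftType]:
    types: set[DriftType] = set()
    # Claim 4: fraction of new/changed rows plus a shift in canary cardinalities.
    # Assumption: either signal on its own marks data drift.
    row_frac = changed_rows / max(total_rows, 1)
    if row_frac > row_frac_threshold or canary_change(canary_old, canary_new) > canary_threshold:
        types.add(DriftType.DATA)
    # Claim 5: too few new queries, or too few of them carry ground-truth labels.
    if new_queries < required_queries or labeled_queries < required_queries:
        types.add(DriftType.INSUFFICIENT_QUERIES)
    return types


print(drift_detected(train_error=1.8, test_error=4.2, margin=1.0))  # True
found = drift_types(12_000, 100_000, {"age<30": 400}, {"age<30": 410}, 50, 20, 200)
print(sorted(t.name for t in found))  # ['DATA', 'INSUFFICIENT_QUERIES']
```

In the example, 12% of rows changed, which flags data drift even though the single canary predicate moved only 2.5%, and only 50 new queries are available against an assumed requirement of 200, so the synthesis path of claim 1 would also be needed.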
Combinations of networks · CPC title
Organisation of the process, e.g. bagging or boosting · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title
Adversarial learning · CPC title
Generative networks · CPC title