Adapting learned cardinality estimators to data and workload drifts

US12437522B2 · US · B2

Patent metadata
Publication number: US-12437522-B2
Application number: US-202117566996-A
Country: US
Kind code: B2
Filing date: Dec 31, 2021
Priority date: Dec 31, 2021
Publication date: Oct 7, 2025
Grant date: Oct 7, 2025

Abstract

A method of updating a trained cardinality estimation model includes receiving a cardinality estimation model with cardinality labels and detecting a drift in underlying data or predicates of the cardinality estimation model. The type of the detected drift is determined and new test queries that mimic test queries for the detected drift are synthesized. A portion of the synthesized test queries is selected to reduce annotation cost and used to update the cardinality estimation model.
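The abstract describes a five-step loop: detect drift, classify it, synthesize queries, select a subset to annotate, and update the model. The Python sketch below mirrors only that control flow, under stated assumptions. The detector, drift classifier, query synthesizer, and annotator are passed in as plain functions. `score` is a hypothetical ranking function (for example, an estimate of model uncertainty) used to choose which candidates to label within a fixed `budget`. `DriftType`, `Model`, `adapt`, `score`, and `budget` are illustrative names, not taken from the patent, which does not state a selection criterion in the text shown here.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: a predicate as a SQL fragment string.
Predicate = str


class DriftType(Enum):
    NONE = auto()
    DATA = auto()      # rows in the underlying tables changed
    WORKLOAD = auto()  # incoming predicates differ from the training predicates


@dataclass
class Model:
    """Stand-in for a trained learned cardinality estimator (hypothetical)."""
    training_set: List[Tuple[Predicate, int]] = field(default_factory=list)

    def update(self, labeled: Sequence[Tuple[Predicate, int]]) -> None:
        # Placeholder for retraining or fine-tuning on newly labeled predicates.
        self.training_set.extend(labeled)


def adapt(
    model: Model,
    detect: Callable[[Model], bool],
    classify: Callable[[Model], DriftType],
    synthesize: Callable[[DriftType], List[Predicate]],
    score: Callable[[Predicate], float],
    annotate: Callable[[Predicate], int],
    budget: int,
) -> DriftType:
    """One pass of the detect -> classify -> synthesize -> select -> update loop."""
    if not detect(model):
        return DriftType.NONE
    drift = classify(model)
    if drift is DriftType.NONE:
        return drift
    candidates = synthesize(drift)
    # Annotating means executing each query to obtain its true cardinality,
    # which is expensive, so only the `budget` highest-scoring candidates are labeled.
    chosen = sorted(candidates, key=score, reverse=True)[:budget]
    model.update([(p, annotate(p)) for p in chosen])
    return drift
```

Treating each stage as a pluggable function is a design choice for this sketch; a real system would replace `Model.update` with actual retraining and `annotate` with query execution against the database.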

Claims

What is claimed is:

1. A method of updating a trained cardinality estimation model implement in a computing system, the method comprising: receiving a cardinality estimation model with training predicates and cardinality labels; detecting a drift in underlying data or predicates of the cardinality estimation model; determining a type of the detected drift; based on the type of the detected drift, synthesizing new test queries that mimic test queries for the detected drift; selecting a portion of the new or synthesized test queries to annotate with cardinality labels so as to reduce annotation cost; and updating the cardinality estimation model with newer predicates and cardinality labels.

2. The method of claim 1, wherein the detecting is performed periodically.

3. The method of claim 1, wherein the detecting is performed when an evaluation error of the cardinality estimation model on the test queries exceeds a threshold beyond the error observed during training.

4. The method of claim 1, wherein the determining the type of the detected drift comprises counting a fraction of rows that are new or have changed since the cardinality estimation model was last trained and measuring a change in ground truth cardinality for one or more canary predicates.

5. The method of claim 1, wherein the determining the type of the detected drift comprises determining that the number of new queries available is below the number of annotated queries necessary to train the cardinality estimation model or when an insufficient number of queries have ground truth labels.

6. The method of claim 1, further comprising: injecting newly arrived predicates into a query pool; computing and using embeddings for the query predicates; updating a generator and discriminator if synthetic queries are needed; and updating the embeddings.

7. The method of claim 1, further comprising determining a plurality of types of the drifts.

8. The method of claim 6, further comprising using learned embeddings of query predicates to decouple adaptation components from featurizations used by the cardinality estimation model.

9. The method of claim 6, further comprising synthesizing new query predicates using predicate embeddings in the query pool.

10. The method of claim 9, further comprising receiving a predicate embedding as input and predicting whether a given predicate resembles a training, test, or generated workload.

11. A computing system, comprising: one or more processors; and a computer-readable storage medium having computer-executable instructions stored thereupon which, when executed by the processor, cause the computing system to perform operations comprising: detecting a drift in underlying data or predicates of a cardinality estimation model; determining a type of the detected drift; based on the type of the detected drift, synthesizing new test queries that mimic test queries for the detected drift; selecting a portion of the new or synthesized test queries to annotate with cardinality labels so as to reduce annotation cost; and outputting newer predicates and cardinality labels for updating the cardinality estimation model.

12. The computing system of claim 11, wherein the determining the type of the detected drift comprises counting a fraction of rows that are new or have changed since the cardinality estimation model was last trained, and measuring a change in ground truth cardinality for one or more canary predicates.

13. The computing system of claim 11, wherein the determining the type of the drift comprises determining that the number of new queries available is below the number of annotated queries necessary to train the cardinality estimation model or when an insufficient number of queries have ground truth labels.

14. A computer-readable storage medium having computer-executable instructions stored thereupon which, when executed by one or more processors of a computing device, cause the computing device to perform operations comprising: detecting a drift in underlying data or predicates of a cardinality estimation model; determining a type of the detected drift; based on the type of the detected drift, synthesizing new test queries that mimic test queries for the detected drift; selecting a portion of the new or synthesized test queries to annotate with cardinality labels so as to reduce annotation cost; and outputting newer predicates and cardinality labels for updating the cardinality estimation model.

15. The computer-readable storage medium of claim 14, further comprising computer-executable instructions stored thereupon which, when executed by one or more processors of a computing device, cause the computing device to perform operations comprising: injecting newly arrived predicates into a query pool; computing embeddings for the newly arrived predicates; updating a generator and discriminator if synthetic queries are needed; and updating the embeddings.

16. The computer-readable storage medium of claim 15, further comprising computer-executable instructions stored thereupon which, when executed by one or more processors of a computing device, cause the computing device to perform operations comprising: using learned embeddings of query predicates to decouple components from featurizations used by the cardinality estimation model.

17. The computer-readable storage medium of claim 15, further comprising computer-executable instructions stored thereupon which, when executed by one or more processors of a computing device, cause the computing device to perform operations comprising: synthesizing new query predicates using predicate embeddings in the query pool.

18. The computer-readable storage medium of claim 15, further comprising computer-executable instructions stored thereupon which, when executed by one or more processors of a computing device, cause the computing device to perform operations comprising: receiving a predicate embedding as input and predicting whether a given predicate resembles a training, test, or generated workload.

19. The computer-readable storage medium of claim 14, wherein the detecting is performed periodically.

20. The computer-readable storage medium of claim 14, wherein the detecting is performed when an evaluation error of the cardinality estimation model on the test queries exceeds a threshold beyond the error observed during training.
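Claims 3, 4, 5, and 7 name concrete signals for triggering and classifying a drift. The Python sketch below implements those checks under stated assumptions. It uses q-error (the larger of estimate/actual and actual/estimate) as the evaluation error, which the claims do not specify. The thresholds `row_frac_limit` and `canary_ratio_limit`, the function names, and the labels `"data"` and `"insufficient_queries"` are all hypothetical. Returning a set allows several drift types to be reported at once, as claim 7 contemplates.

```python
from typing import Dict, Sequence, Set


def q_error(estimate: float, actual: float) -> float:
    """Ratio-based error common in cardinality estimation; values below 1 are clamped."""
    estimate, actual = max(estimate, 1.0), max(actual, 1.0)
    return max(estimate / actual, actual / estimate)


def drift_detected(test_errors: Sequence[float], train_error: float, threshold: float) -> bool:
    """Claim 3: mean test error exceeds the training error by more than `threshold`."""
    if not test_errors:
        return False
    mean_error = sum(test_errors) / len(test_errors)
    return mean_error - train_error > threshold


def drift_types(
    changed_rows: int,
    total_rows: int,
    canary_before: Dict[str, float],
    canary_after: Dict[str, float],
    new_queries: int,
    labeled_queries: int,
    required_queries: int,
    row_frac_limit: float = 0.1,      # hypothetical threshold
    canary_ratio_limit: float = 2.0,  # hypothetical threshold
) -> Set[str]:
    """Return every drift type indicated by the signals in claims 4 and 5."""
    types: Set[str] = set()
    # Claim 4, first signal: fraction of rows new or changed since last training.
    if total_rows > 0 and changed_rows / total_rows > row_frac_limit:
        types.add("data")
    # Claim 4, second signal: shift in the true cardinality of canary predicates.
    # Assumes canary_after has an entry for every key in canary_before.
    if any(q_error(canary_after[p], before) > canary_ratio_limit
           for p, before in canary_before.items()):
        types.add("data")
    # Claim 5: too few new queries, or too few with ground-truth labels.
    if new_queries < required_queries or labeled_queries < required_queries:
        types.add("insufficient_queries")
    return types
```

The canary check stays cheap only while the canary set is small, since each canary requires executing one query to obtain its current true cardinality.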

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

  • Combinations of networks

  • Organisation of the process, e.g. bagging or boosting

  • Auto-encoder networks; Encoder-decoder networks

  • Adversarial learning

  • Generative networks

Frequently asked questions

What does patent US12437522B2 cover?
A method of updating a trained cardinality estimation model includes receiving a cardinality estimation model with cardinality labels and detecting a drift in underlying data or predicates of the cardinality estimation model. The type of the detected drift is determined and new test queries that mimic test queries for the detected drift are synthesized. A portion of the synthesized test queries…
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
The primary CPC classification is G06V10/776, which falls under CPC section G (Physics).
When was this patent published?
The B2 publication was published on October 7, 2025. Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).