Systems and methods for synthetic database query generation

US11513869B2 · US · B2

Patent metadata
Publication number: US-11513869-B2
Application number: US-201916298463-A
Country: US
Kind code: B2
Filing date: Mar 11, 2019
Priority date: Jul 6, 2018
Publication date: Nov 29, 2022
Grant date: Nov 29, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system for returning synthetic database query results. The system may include a memory unit for storing instructions, and a processor configured to execute the instructions to perform operations comprising: receiving a query input by a user at a user interface; determining, based on natural language processing, a type of the query input; determining, based on the received query input and a database language interpreter, an output data format; returning, based on a generation model and the output data format, a result of the query input; providing, to a plurality of training models and based on the determined query type, the query input and the result; and training the training models, based on the query input and the result.
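The flow in the abstract (determine the query type, determine an output format, then hand the query and its result to the models that match) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the patent does not disclose this code, the keyword lookup is a toy stand-in for the natural language processing step, and `TrainingModel`, `determine_query_type`, `determine_output_format`, and `route` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class TrainingModel:
    """Hypothetical model record: handles one query type and one output format."""
    name: str
    query_type: str
    output_format: str
    examples: list = field(default_factory=list)

    def train(self, query: str, result: object) -> None:
        # Stand-in for real training: only records the (query, result) pair.
        self.examples.append((query, result))


KNOWN_TYPES = {"SELECT", "INSERT", "UPDATE", "DELETE"}


def determine_query_type(query: str) -> str:
    """Toy replacement for NLP-based typing: inspect the leading keyword."""
    tokens = query.strip().split()
    keyword = tokens[0].upper() if tokens else ""
    return keyword if keyword in KNOWN_TYPES else "UNKNOWN"


def determine_output_format(query: str) -> str:
    """Toy replacement for a database language interpreter (assumed mapping)."""
    qtype = determine_query_type(query)
    if qtype == "SELECT":
        return "table"
    if qtype == "UNKNOWN":
        return "unknown"
    return "row_count"


def route(query: str, models: list) -> "TrainingModel | None":
    """Pick the first model matching both the query type and the output format."""
    qtype = determine_query_type(query)
    fmt = determine_output_format(query)
    for model in models:
        if model.query_type == qtype and model.output_format == fmt:
            return model
    return None


models = [
    TrainingModel("select_model", "SELECT", "table"),
    TrainingModel("update_model", "UPDATE", "row_count"),
]
chosen = route("select * from accounts", models)
if chosen is not None:
    chosen.train("select * from accounts", [("synthetic", "row")])
```

A lookup keyed on both type and format keeps the routing rule explicit; a real system would presumably replace the keyword check with a trained classifier.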

First claim

Opening claim text (preview).

What is claimed is:

1. A system for training models for outputting synthetic database query results, the system comprising: at least one memory unit for storing instructions; and at least one processor configured to execute the instructions to perform operations comprising: receiving a first query input entered by a user at a user interface; determining a type of the first query input; generating a synthetic dataset using a dataset generator comprising a trained generative adversarial network, the synthetic dataset: differing by at least a predetermined amount from a reference dataset according to a similarity metric; and comprising synthetic data portions generated by: determining a class of sensitive data portions in the first query input; selecting a subclass of sensitive data portions within the class based on a distribution model; generating synthetic data portions using a subclass-specific model trained to generate synthetic values for the selected subclass and not for other subclasses within the class; and replacing the sensitive data portions with the synthetic data portions; based on the determined first query input type, providing the first query input and the synthetic dataset to a plurality of training models; training the plurality of training models based on the first query input and the synthetic dataset; receiving a second query input; determining a type of the second query input; and routing the second query input to a selected training model of the plurality of training models based on the determined second query input type and an output format of the selected training model.

2. The system of claim 1, wherein the operations further comprise generating an expected database result from the first query input.

3. The system of claim 1, wherein the operations further comprise evaluating, by a model optimizer, performance criteria of the plurality of training models.

4. The system of claim 3, wherein the performance criteria includes the similarity metric.

5. The system of claim 1, wherein the operations further comprise: determining, based on the first query input and a database language interpreter, an output data format; returning, based on a generation model and the output data format, a synthetic result of the first query input; extracting information from the generation model; and displaying the extracted information on the user interface.

6. The system of claim 1, wherein the reference dataset includes data represented in at least one of a structured data format, a semi-structured data format, or an unstructured data format.

7. The system of claim 1, wherein the first query input comprises customer financial information.

8. The system of claim 7, wherein determining the type of the second query input includes using a database language interpreter to interpret a language associated with the second query input.

9. The system of claim 7, wherein the operations further comprise predicting the subclass of sensitive data portions.

10. The system of claim 9, wherein the subclass of sensitive data portions is predicted based on at least one additional subclass.

11. A non-transitory computer-readable medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform operations, the operations comprising: receiving a first query input entered by a user at a user interface; determining a type of the first query input; generating a synthetic dataset using a dataset generator comprising a trained generative adversarial network, the synthetic dataset: differing by at least a predetermined amount from a reference dataset according to a similarity metric; and comprising synthetic data portions generated by: determining a class of sensitive data portions in the first query input; selecting a subclass of sensitive data portions within the class based on a distribution model: generating synthetic data portions to replace the sensitive data portions using a subclass-specific model trained to generate synthetic values for the selected subclass and not for other subclasses within the class; based on the determined first query input type, providing the first query input and the synthetic dataset to a plurality of training models; training the plurality of training models based on the first query input and the synthetic dataset; receiving a second query input; determining a type of the second query input; and routing the second query input to a selected training model of the plurality of training models based on the determined second query input type and an output format of the selected training model.

12. A computer-implemented method for training models for outputting synthetic database query results, the method comprising: receiving a first query input entered by a user at a user interface; determining a type of the first query input; generating a synthetic dataset using a dataset generator comprising a trained generative adversarial network, the synthetic dataset: differing by at least a predetermined amount from a reference dataset according to a similarity metric; and comprising synthetic data portions generated by: determining a class of sensitive data portions in the first query input; and selecting a subclass of sensitive data portions within the class based on a distribution model; generating synthetic data portions to replace the sensitive data portions using a subclass-specific model trained to generate synthetic values for the selected subclass and not for other subclasses within the class; based on the determined first query input type, providing the first query input and the synthetic dataset to a plurality of training models; training the plurality of training models based on the first query input and the synthetic dataset; receiving a second query input; determining a type of the second query input; and routing the second query input to a selected training model of the plurality of training models based on the determined second query input type and an output format of the selected training model.

13. The computer-implemented method of claim 12, wherein the method further comprises generating an expected database result from the first query input.

14. The computer-implemented method of claim 12, wherein the method further comprises evaluating, by a model optimizer, performance criteria of the plurality of training models.

15. The computer-implemented method of claim 14, wherein the performance criteria includes the similarity metric.

16. The computer-implemented method of claim 12, wherein the method further comprises: determining, based on the first query input and a database language interpreter, an output data format; returning, based on a generation model and the output data format, a synthetic result of the first query input; extracting information from the generation model; and displaying the extracted information on the user interface.

17. The computer-implemented method of claim 12, wherein the reference dataset includes data represented in at least one of a structured data format or a semi-structured data format.

18. The computer-implemented method of claim 2, wherein the first query input comprises customer financial information.

19. The computer-implemented method of claim 12, wherein determining the type of the second query input includes using a database language interpreter to interpret a language associated with the second query input.

20. The comput
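The synthetic-data steps recited in the independent claims (find a class of sensitive data, pick a subclass from a distribution model, generate a value with a subclass-specific model, replace the sensitive portion, then check that the synthetic dataset differs enough from the reference) can be sketched in Python as follows. This is an illustration under loud assumptions, not the patented implementation: simple random digit generators stand in for the trained generative adversarial network, and the `card_number` class, its two subclasses, their weights, the regular expression, and the position-wise `dissimilarity` metric are all hypothetical.

```python
import random
import re


def gen_16_digit(rng: random.Random) -> str:
    """Hypothetical subclass-specific generator: 16-digit numbers only."""
    return "".join(rng.choice("0123456789") for _ in range(16))


def gen_15_digit(rng: random.Random) -> str:
    """Hypothetical subclass-specific generator: 15-digit numbers only."""
    return "".join(rng.choice("0123456789") for _ in range(15))


# Assumed class of sensitive data, its subclasses, and a toy distribution model.
CLASS_PATTERNS = {"card_number": re.compile(r"\b\d{15,16}\b")}
SUBCLASS_GENERATORS = {"card_number": {"16_digit": gen_16_digit, "15_digit": gen_15_digit}}
SUBCLASS_WEIGHTS = {"card_number": {"16_digit": 0.8, "15_digit": 0.2}}


def replace_sensitive(text: str, rng: random.Random) -> str:
    """Replace each sensitive portion with a value from a sampled subclass."""
    for cls, pattern in CLASS_PATTERNS.items():
        weights = SUBCLASS_WEIGHTS[cls]
        generators = SUBCLASS_GENERATORS[cls]

        def substitute(match, weights=weights, generators=generators):
            names = list(weights)
            # Select a subclass according to the distribution model.
            subclass = rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
            # Only the generator for the selected subclass is used.
            return generators[subclass](rng)

        text = pattern.sub(substitute, text)
    return text


def dissimilarity(synthetic: list, reference: list) -> float:
    """Toy metric: fraction of positions whose values differ."""
    if len(synthetic) != len(reference):
        raise ValueError("datasets must have the same length")
    if not reference:
        return 0.0
    return sum(s != r for s, r in zip(synthetic, reference)) / len(reference)


def differs_enough(synthetic: list, reference: list, min_difference: float) -> bool:
    """Check the 'differs by at least a predetermined amount' condition."""
    return dissimilarity(synthetic, reference) >= min_difference


rng = random.Random(0)  # seeded so the sketch is repeatable
masked = replace_sensitive("SELECT * FROM cards WHERE number = 4111111111111111", rng)
```

Keeping one generator per subclass mirrors the claim language that a subclass-specific model produces values "for the selected subclass and not for other subclasses within the class"; the threshold check is kept separate so a different similarity metric could be swapped in.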

Assignees

Capital One Services, LLC

Inventors

Classifications

  • G06F9/541 (Primary)

    via adapters, e.g. between incompatible applications · CPC title

  • by anonymising data, e.g. decorrelating personal data from the owner's identification · CPC title

  • Texturing; Colouring; Generation of textures or colours (retouching, inpainting or scratch removal G06T5/77) · CPC title

  • based on specific statistical tests · CPC title

  • Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11513869B2 cover?
A system for returning synthetic database query results. The system may include a memory unit for storing instructions, and a processor configured to execute the instructions to perform operations comprising: receiving a query input by a user at a user interface; determining, based on natural language processing, a type of the query input; determining, based on the received query input and a da…
Who is the assignee on this patent?
Capital One Services, LLC
What technology area does this patent fall under?
Primary CPC classification G06F9/541. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 29, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).