Systems and methods for synthetic data generation

US11615208B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11615208-B2
Application number	US-201816151407-A
Country	US
Kind code	B2
Filing date	Oct 4, 2018
Priority date	Jul 6, 2018
Publication date	Mar 28, 2023
Grant date	Mar 28, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A cloud computing system can be configured to generate data models. A model optimizer of the cloud computing system can provision computing resources of the cloud computing system with a data model. A dataset generator of the cloud computing system can generate a synthetic dataset for training the data model. The computing resources can train the data model using the synthetic dataset. The model optimizer can store the data model and metadata of the data model in a model storage. The cloud computing system can receive production data from a data source by a production instance of the cloud computing system using a common file system. The production data can be processed using the data model by the production instance. The computing resources, the dataset generator, and the model optimizer can be hosted by separate virtual computing instances of the cloud computing system.
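The flow in the abstract (generate a synthetic dataset, train a data model on it, store the model with its metadata, then process production data with the stored model) can be sketched in a few lines. This is only an illustrative toy under stated assumptions, not the patented implementation: `ModelStorage`, `generate_synthetic_dataset`, `train`, and `process` are hypothetical names, the "data model" is just a learned mean, and everything runs in one process rather than on separate virtual computing instances.

```python
import random
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class ModelStorage:
    """Hypothetical stand-in for the model storage: name -> (model, metadata)."""
    entries: dict = field(default_factory=dict)

    def store(self, name, model, metadata):
        self.entries[name] = (model, metadata)

    def load(self, name):
        return self.entries[name]


def generate_synthetic_dataset(n, seed=0):
    """Dataset generator stand-in: draws n values from a fixed Gaussian (assumption)."""
    rng = random.Random(seed)
    return [rng.gauss(10.0, 2.0) for _ in range(n)]


def train(dataset):
    """Training stand-in: the toy 'data model' is just the dataset mean."""
    return {"center": mean(dataset)}


def process(model, production_data):
    """Production stand-in: report each value's offset from the learned center."""
    return [x - model["center"] for x in production_data]


storage = ModelStorage()
synthetic = generate_synthetic_dataset(1000)
model = train(synthetic)

# Store the model together with metadata describing how it was produced.
storage.store("demo", model, {"training_rows": len(synthetic)})

# A production step later loads the stored model and applies it to new data.
loaded_model, metadata = storage.load("demo")
offsets = process(loaded_model, [9.0, 11.0])
```

Keeping the metadata next to the model mirrors the abstract's statement that both are stored together; what the metadata contains here is a placeholder.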

First claim

Opening claim text (preview).

What is claimed is:

1. A cloud computing system for generating data models, comprising: at least one processor; and at least one non-transitory memory storing instructions that, when executed by the at least one processor cause the cloud computing system to perform operations comprising: provisioning, by a model optimizer, computing resources with a data model; retrieving, by the model optimizer, a reference dataset; normalizing, by a dataset generator, the reference dataset, the normalizing comprising: identifying categorical data within the reference dataset, and converting categorical data to numerical values; receiving, by the dataset generator, a similarity criterion, the similarity criterion including a predetermined difference in value between the normalized reference dataset and an output dataset of the data model; generating, by the dataset generator, a synthetic dataset for training the data model; training, by the computing resources, the data model using the synthetic dataset, the training comprising: generating an output dataset using the data model, generating, based on a comparison of the output dataset and the normalized reference dataset, a similarity metric of the data model, generating a prediction metric of the data model, evaluating the similarity metric against the similarity criterion, evaluating the prediction metric against a prediction criterion, and updating the data model based on the evaluations of the similarity metric and prediction metric, the updating comprising: penalizing generation of synthetic data by adding a penalty term to a loss function to penalize a calculated loss if a dissimilarity between the synthetic data and the normalized reference dataset goes below a certain threshold; repeating the training until the similarity criterion is met by the similarity metric and the prediction criterion is met by the prediction metric; in response to the similarity criterion being met by the similarity metric and the prediction criterion being met by the prediction metric, storing, by the model optimizer in a model storage, the data model and metadata of the data model, wherein the metadata of the data model comprises at least the similarity metric and the prediction metric; receiving production data from a data source by a production instance; and processing the production data using the data model.

2. The cloud computing system of claim 1, wherein the data model is provisioned in response to a model generation request received by the model optimizer from an interface, wherein the generation request comprises at least data describing a type of the data model to be generated.

3. The cloud computing system of claim 1, wherein the operations further comprise: extracting, by the model optimizer, from the metadata of the data model, the similarity metric and the prediction metric, evaluating, by the model optimizer, the similarity metric of the data model; evaluating, by the model optimizer, the prediction metric of the data model; and determining, by a model curator, that the data model satisfies governance criteria.

4. The cloud computing system of claim 1, wherein the similarity metric comprises at least one of a statistical correlation score, data similarity score, or data quality score, and the prediction metric includes at least one of a prediction accuracy verification, a prediction accuracy cross validation, a regression verification, a regression cross validation, or a principal component analysis.

5. The cloud computing system of claim 1, wherein generating the synthetic dataset for training the data model comprises: retrieving a synthetic dataset model from the model storage; retrieving a training dataset from a database; and generating the synthetic dataset using the synthetic dataset model and the training dataset.

6. The cloud computing system of claim 5, wherein generating the synthetic dataset using the synthetic dataset model and the training dataset comprises: identifying a sensitive portion of the training dataset using a recurrent neural network, wherein the sensitive portion comprises personal information.

7. The cloud computing system of claim 6, wherein the operations further comprise: receiving a data sequence, wherein the data sequence comprises at least one of an account number, a social security number, a name, or an address; receiving a context sequence, wherein the context sequence comprises snippets of data drawn from a text database; generating a training sequence by inserting the data sequence into the context sequence; generating a label sequence indicating a position of the inserted data sequence in the training sequence, wherein the label sequence comprises at least two characters identifying different types of data; and training the recurrent neural network using the training sequence and the label sequence.

8. The cloud computing system of claim 7, wherein: the training sequence includes inserted data sequences; and the label sequence indicates at least one of differing classes among the inserted data sequences and differing subclasses among the inserted data sequences.

9. The cloud computing system of claim 8, wherein training the recurrent neural network using the training sequence and the label sequence comprises: estimating a label by applying a subset of the training sequence to the recurrent neural network; comparing the estimated label to an actual label in the label sequence, the actual label corresponding to the subset; and updating the recurrent neural network according to a loss function based on a result of the comparison.

10. The cloud computing system of claim 9, wherein the actual label corresponds to an element of the subset occupying the same position in the training sequence as the actual label occupies in the label sequence.

11. A method for generating data models, comprising: receiving, by a model optimizer from an interface, a data model generation request, wherein the generation request comprises at least data describing a type of the data model to be generated; provisioning, by the model optimizer, computing resources with a data model; retrieving, by the model optimizer, a reference dataset; normalizing, by a dataset generator, the reference dataset, the normalizing comprising: identifying categorical data within the reference dataset, and converting categorical data to numerical values; receiving, by the dataset generator, a similarity criterion, the similarity criterion including a predetermined difference in value between the normalized reference dataset and an output dataset of the data model; generating, by the dataset generator, a synthetic dataset for training the data model; training, by the computing resources, the data model using the synthetic dataset the training comprising: generating an output dataset using the data model; generating, based on a comparison of the output dataset and the normalized reference dataset, a similarity metric of the data model, generating a prediction metric of the data model, evaluating the similarity metric against the similarity criterion, evaluating the prediction metric against a prediction criterion, and updating the data model based on the evaluations of the similarity metric and prediction metric, the updating comprising: penalizing generation of synthetic data by adding a penalty term to a loss function to penalize a calculated loss if a dissimilarity between the synthetic data and the normalized reference dataset goes below a certain threshold; repeating the training until the similarity criterion is met by the similarity metric and the prediction criterion is met by…
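Two steps recited in claims 1 and 11 lend themselves to a small illustration: converting categorical data to numerical values, and adding a penalty term to the loss when the synthetic data becomes too similar to the reference data (dissimilarity below a threshold). The sketch below is a hedged example, not the claimed implementation. The integer encoding, the nearest-neighbor Euclidean dissimilarity, the hinge-shaped penalty, and the names `normalize_categorical`, `dissimilarity`, and `penalized_loss` are all assumptions chosen for illustration; the claims do not specify any of them.

```python
import math


def normalize_categorical(rows, column):
    """Replace each distinct category in `column` with a numeric code.

    Codes are assigned in order of first appearance (an assumed encoding).
    Returns the converted rows and the category -> code mapping.
    """
    codes = {}
    converted = []
    for row in rows:
        value = row[column]
        if value not in codes:
            codes[value] = len(codes)
        new_row = dict(row)
        new_row[column] = float(codes[value])
        converted.append(new_row)
    return converted, codes


def dissimilarity(synthetic, reference):
    """Mean Euclidean distance from each synthetic row to its nearest reference row."""
    total = 0.0
    for s in synthetic:
        total += min(math.dist(s, r) for r in reference)
    return total / len(synthetic)


def penalized_loss(base_loss, synthetic, reference, threshold, weight=1.0):
    """Add a penalty only when dissimilarity falls below `threshold`.

    The penalty grows linearly as the synthetic rows move closer to the
    reference rows, discouraging near-copies of the reference data.
    """
    d = dissimilarity(synthetic, reference)
    penalty = weight * max(0.0, threshold - d)
    return base_loss + penalty


rows = [
    {"color": "red", "x": 1.0},
    {"color": "blue", "x": 2.0},
    {"color": "red", "x": 3.0},
]
normalized, codes = normalize_categorical(rows, "color")

reference = [(0.0, 0.0), (1.0, 1.0)]
loss_for_copy = penalized_loss(1.0, [(0.0, 0.0)], reference, threshold=0.5)
loss_for_distinct = penalized_loss(1.0, [(3.0, 4.0)], reference, threshold=0.5)
```

In this toy setup a synthetic row that exactly duplicates a reference row has zero dissimilarity and so receives the full penalty, while a row far from every reference row is left at the base loss.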

Assignees

Capital One Services, LLC

Inventors

Classifications

G06T11/10
Texturing; Colouring; Generation of textures or colours (retouching, inpainting or scratch removal G06T5/77) · CPC title
G06N3/0455
Auto-encoder networks; Encoder-decoder networks · CPC title
G06N3/0985
Hyperparameter optimisation; Meta-learning; Learning-to-learn · CPC title
G06N3/09
Supervised learning · CPC title
G06N3/094
Adversarial learning · CPC title

Patent family

Related publications grouped by family.

View patent family 67543579

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11615208B2 cover?
A cloud computing system can be configured to generate data models. A model optimizer of the cloud computing system can provision computing resources of the cloud computing system with a data model. A dataset generator of the cloud computing system can generate a synthetic dataset for training the data model. The computing resources can train the data model using the synthetic dataset. The mode…

Who is the assignee on this patent?
Capital One Services, LLC

What technology area does this patent fall under?
Primary CPC classification G06F9/541. Mapped technology areas include Physics.

When was this patent published?
Publication date Mar 28, 2023 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).