Distributed, multi-model, self-learning platform for machine learning
US-2016132787-A1 · May 12, 2016 · US
US11615208B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11615208-B2 |
| Application number | US-201816151407-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 4, 2018 |
| Priority date | Jul 6, 2018 |
| Publication date | Mar 28, 2023 |
| Grant date | Mar 28, 2023 |
Official abstract text for this publication.
A cloud computing system can be configured to generate data models. A model optimizer of the cloud computing system can provision computing resources of the cloud computing system with a data model. A dataset generator of the cloud computing system can generate a synthetic dataset for training the data model. The computing resources can train the data model using the synthetic dataset. The model optimizer can store the data model and metadata of the data model in a model storage. The cloud computing system can receive production data from a data source by a production instance of the cloud computing system using a common file system. The production data can be processed using the data model by the production instance. The computing resources, the dataset generator, and the model optimizer can be hosted by separate virtual computing instances of the cloud computing system.
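The flow in the abstract (provision a model, generate a synthetic dataset, train, store the model with its metadata, then process production data) can be pictured with a small toy sketch. Everything here is an assumption made for illustration: the `ModelStorage` class, the linear model, the `y = 2x + 1` data generator, and the function names are hypothetical and are not taken from the patent, which does not prescribe any of them.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ModelStorage:
    """Hypothetical in-memory stand-in for the 'model storage'."""
    records: dict = field(default_factory=dict)

    def store(self, name, model, metadata):
        # Keep the model and its metadata together, as the abstract describes.
        self.records[name] = {"model": model, "metadata": metadata}


def generate_synthetic_dataset(n_rows, seed=0):
    """Toy 'dataset generator': samples y = 2x + 1 with small noise (assumed)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n_rows)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.01)) for x in xs]


def train(model, dataset, epochs=200, lr=0.1):
    """Toy 'computing resources': fit slope and intercept by gradient descent."""
    w, b = model["w"], model["b"]
    n = len(dataset)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in dataset) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in dataset) / n
        w -= lr * grad_w
        b -= lr * grad_b
    mse = sum((w * x + b - y) ** 2 for x, y in dataset) / n
    return {"w": w, "b": b}, {"mse": mse, "rows": n}


def process_production_data(model, inputs):
    """Toy 'production instance': apply the trained model to new inputs."""
    return [model["w"] * x + model["b"] for x in inputs]


def run_pipeline(storage):
    model = {"w": 0.0, "b": 0.0}             # provision a data model
    data = generate_synthetic_dataset(200)   # generate synthetic training data
    trained, metadata = train(model, data)   # train on the synthetic data
    storage.store("linear-demo", trained, metadata)  # store model + metadata
    return trained, metadata
```

In the system described, these roles run on separate virtual computing instances; the sketch collapses them into one process only to show the order of the steps.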
Opening claim text (preview).
What is claimed is:
1. A cloud computing system for generating data models, comprising: at least one processor; and at least one non-transitory memory storing instructions that, when executed by the at least one processor cause the cloud computing system to perform operations comprising: provisioning, by a model optimizer, computing resources with a data model; retrieving, by the model optimizer, a reference dataset; normalizing, by a dataset generator, the reference dataset, the normalizing comprising: identifying categorical data within the reference dataset, and converting categorical data to numerical values; receiving, by the dataset generator, a similarity criterion, the similarity criterion including a predetermined difference in value between the normalized reference dataset and an output dataset of the data model; generating, by the dataset generator, a synthetic dataset for training the data model; training, by the computing resources, the data model using the synthetic dataset, the training comprising: generating an output dataset using the data model, generating, based on a comparison of the output dataset and the normalized reference dataset, a similarity metric of the data model, generating a prediction metric of the data model, evaluating the similarity metric against the similarity criterion, evaluating the prediction metric against a prediction criterion, and updating the data model based on the evaluations of the similarity metric and prediction metric, the updating comprising: penalizing generation of synthetic data by adding a penalty term to a loss function to penalize a calculated loss if a dissimilarity between the synthetic data and the normalized reference dataset goes below a certain threshold; repeating the training until the similarity criterion is met by the similarity metric and the prediction criterion is met by the prediction metric; in response to the similarity criterion being met by the similarity metric and the prediction criterion being met by the prediction metric, storing, by the model optimizer in a model storage, the data model and metadata of the data model, wherein the metadata of the data model comprises at least the similarity metric and the prediction metric; receiving production data from a data source by a production instance; and processing the production data using the data model.
2. The cloud computing system of claim 1, wherein the data model is provisioned in response to a model generation request received by the model optimizer from an interface, wherein the generation request comprises at least data describing a type of the data model to be generated.
3. The cloud computing system of claim 1, wherein the operations further comprise: extracting, by the model optimizer, from the metadata of the data model, the similarity metric and the prediction metric, evaluating, by the model optimizer, the similarity metric of the data model; evaluating, by the model optimizer, the prediction metric of the data model; and determining, by a model curator, that the data model satisfies governance criteria.
4. The cloud computing system of claim 1, wherein the similarity metric comprises at least one of a statistical correlation score, data similarity score, or data quality score, and the prediction metric includes at least one of a prediction accuracy verification, a prediction accuracy cross validation, a regression verification, a regression cross validation, or a principal component analysis.
5. The cloud computing system of claim 1, wherein generating the synthetic dataset for training the data model comprises: retrieving a synthetic dataset model from the model storage; retrieving a training dataset from a database; and generating the synthetic dataset using the synthetic dataset model and the training dataset.
6. The cloud computing system of claim 5, wherein generating the synthetic dataset using the synthetic dataset model and the training dataset comprises: identifying a sensitive portion of the training dataset using a recurrent neural network, wherein the sensitive portion comprises personal information.
7. The cloud computing system of claim 6, wherein the operations further comprise: receiving a data sequence, wherein the data sequence comprises at least one of an account number, a social security number, a name, or an address; receiving a context sequence, wherein the context sequence comprises snippets of data drawn from a text database; generating a training sequence by inserting the data sequence into the context sequence; generating a label sequence indicating a position of the inserted data sequence in the training sequence, wherein the label sequence comprises at least two characters identifying different types of data; and training the recurrent neural network using the training sequence and the label sequence.
8. The cloud computing system of claim 7, wherein: the training sequence includes inserted data sequences; and the label sequence indicates at least one of differing classes among the inserted data sequences and differing subclasses among the inserted data sequences.
9. The cloud computing system of claim 8, wherein training the recurrent neural network using the training sequence and the label sequence comprises: estimating a label by applying a subset of the training sequence to the recurrent neural network; comparing the estimated label to an actual label in the label sequence, the actual label corresponding to the subset; and updating the recurrent neural network according to a loss function based on a result of the comparison.
10. The cloud computing system of claim 9, wherein the actual label corresponds to an element of the subset occupying the same position in the training sequence as the actual label occupies in the label sequence.
11. A method for generating data models, comprising: receiving, by a model optimizer from an interface, a data model generation request, wherein the generation request comprises at least data describing a type of the data model to be generated; provisioning, by the model optimizer, computing resources with a data model; retrieving, by the model optimizer, a reference dataset; normalizing, by a dataset generator, the reference dataset, the normalizing comprising: identifying categorical data within the reference dataset, and converting categorical data to numerical values; receiving, by the dataset generator, a similarity criterion, the similarity criterion including a predetermined difference in value between the normalized reference dataset and an output dataset of the data model; generating, by the dataset generator, a synthetic dataset for training the data model; training, by the computing resources, the data model using the synthetic dataset the training comprising: generating an output dataset using the data model; generating, based on a comparison of the output dataset and the normalized reference dataset, a similarity metric of the data model, generating a prediction metric of the data model, evaluating the similarity metric against the similarity criterion, evaluating the prediction metric against a prediction criterion, and updating the data model based on the evaluations of the similarity metric and prediction metric, the updating comprising: penalizing generation of synthetic data by adding a penalty term to a loss function to penalize a calculated loss if a dissimilarity between the synthetic data and the normalized reference dataset goes below a certain threshold; repeating the training until the similarity criterion is met by the similarity metric and the prediction criterion is met by
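
Two steps recited in claims 1 and 11, converting categorical data to numerical values and adding a penalty term to the loss when synthetic data becomes too similar to the normalized reference dataset, can be illustrated with a minimal sketch. The claims do not specify an encoding, a dissimilarity measure, or the form of the penalty, so the choices below are assumptions: integer codes for categories, mean nearest-neighbor Euclidean distance as the dissimilarity, and a linear hinge penalty. All function names and the `weight` parameter are hypothetical.

```python
import math


def identify_categorical_columns(rows):
    """Treat any column holding a string value as categorical (assumed rule)."""
    return {i for row in rows for i, v in enumerate(row) if isinstance(v, str)}


def normalize(rows):
    """Replace categorical values with integer codes; cast the rest to float."""
    categorical = identify_categorical_columns(rows)
    codebooks = {col: {} for col in categorical}
    normalized = []
    for row in rows:
        encoded = []
        for col, value in enumerate(row):
            if col in categorical:
                book = codebooks[col]
                # Assign the next unused code the first time a value is seen.
                encoded.append(float(book.setdefault(value, len(book))))
            else:
                encoded.append(float(value))
        normalized.append(encoded)
    return normalized, codebooks


def dissimilarity(synthetic, reference):
    """Mean distance from each synthetic row to its nearest reference row
    (one possible measure; the claims do not name one)."""
    nearest = [min(math.dist(s, r) for r in reference) for s in synthetic]
    return sum(nearest) / len(nearest)


def penalized_loss(base_loss, synthetic, reference, threshold, weight=1.0):
    """Add a penalty only when dissimilarity falls below the threshold."""
    d = dissimilarity(synthetic, reference)
    penalty = weight * (threshold - d) if d < threshold else 0.0
    return base_loss + penalty


reference, codebooks = normalize([["red", 1.0], ["blue", 2.0], ["red", 3.0]])
copied = [[0.0, 1.0]]    # identical to a reference row, so it is penalized
distinct = [[5.0, 5.0]]  # far from every reference row, so no penalty applies
loss_copied = penalized_loss(0.2, copied, reference, threshold=0.5)
loss_distinct = penalized_loss(0.2, distinct, reference, threshold=0.5)
```

Under these assumptions the penalty grows as synthetic rows approach exact copies of reference rows, which is one way to read the claimed goal of discouraging a generator from reproducing its reference data.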
Texturing; Colouring; Generation of textures or colours (retouching, inpainting or scratch removal G06T5/77) · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title
Hyperparameter optimisation; Meta-learning; Learning-to-learn · CPC title
Supervised learning · CPC title
Adversarial learning · CPC title