Processing dynamic data within an adaptive oracle-trained learning system using dynamic data set distribution optimization

US11210604B1 · US · B1

Patent metadata
Field: Value
Publication number: US-11210604-B1
Application number: US-201414578210-A
Country: US
Kind code: B1
Filing date: Dec 19, 2014
Priority date: Dec 23, 2013
Publication date: Dec 28, 2021
Grant date: Dec 28, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In general, embodiments of the present invention provide systems, methods and computer readable media for an adaptive oracle-trained learning framework for automatically building and maintaining models that are developed using machine learning algorithms. In embodiments, the framework leverages at least one oracle (e.g., a crowd) for automatic generation of high-quality training data to use in deriving a model. Once a model is trained, the framework monitors the performance of the model and, in embodiments, leverages active learning and the oracle to generate feedback about the changing data for modifying training data sets while maintaining data quality to enable incremental adaptation of the model.
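The feedback loop described in the abstract can be illustrated with a minimal sketch. It assumes a simple confidence threshold decides which instances go to the oracle; the names `ModelOutput`, `route_instances`, `predict`, `oracle`, and `threshold` are hypothetical and do not come from the patent, which does not specify this selection rule.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ModelOutput:
    judgment: str      # the model's predicted label
    confidence: float  # certainty of the judgment, in [0, 1]


def route_instances(
    instances: List[float],
    predict: Callable[[float], ModelOutput],
    oracle: Callable[[float], str],
    threshold: float = 0.7,  # hypothetical cutoff, not from the patent
) -> Tuple[List[Tuple[float, str]], List[Tuple[float, str]]]:
    """Split instances into accepted model judgments and oracle-labeled examples."""
    accepted, new_training = [], []
    for x in instances:
        out = predict(x)
        if out.confidence < threshold:
            # Uncertain instance: ask the oracle (e.g., a crowd) for a label
            # and keep the pair as candidate training data.
            new_training.append((x, oracle(x)))
        else:
            accepted.append((x, out.judgment))
    return accepted, new_training


# Toy stand-ins (assumptions for illustration only).
def toy_predict(x: float) -> ModelOutput:
    return ModelOutput("positive" if x >= 0 else "negative", min(1.0, abs(x)))


def toy_oracle(x: float) -> str:
    return "positive" if x >= 0 else "negative"


accepted, new_training = route_instances([2.0, 0.1, -3.0, -0.2], toy_predict, toy_oracle)
```

In this sketch, only the two instances near zero are sent to the oracle; the returned pairs could then be appended to the training set before retraining.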

First claim

Opening claim text (preview).

The invention claimed is:

1. A system comprising at least one repository and at least one server comprising at least one processor and at least one memory comprising computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the system to implement an adaptive learning framework for automatically building and maintaining a predictive model for processing dynamic data, wherein the adaptive learning framework is configured to: generate, using the predictive model, model output based on processing an input data instance received by the adaptive learning framework, wherein the model output comprises a judgment and a confidence value representing certainty of the judgment, and wherein the predictive model is generated using machine learning and based upon a training data set; determine feature data indicative of a set of features associated with the training data set; determine, based on the feature data associated with the training data set, a training distribution representative of a goal for the predictive model; evaluate the input data instance and the model output based in part on a configuration parameter; and store, in a data reservoir, data instances that have been processed by the predictive model, wherein the data reservoir comprises a pool of possible training data for the training data set, and wherein the data reservoir is further configured to be discretized into a set of data bins maintained by a data set optimizer; and wherein the data set optimizer is configured to perform operations comprising: determining, based on the evaluating, using a supervised machine learning model, of the input data instance and the model output, whether to update the data reservoir using the input data instance; and in an instance in which the data reservoir is to be updated, adding the input data instance to a selected data bin of the set of data bins based on a label associated with a data source for the input data instance and to facilitate training of the predictive model based on the training data set, each data bin of the set of data bins storing data representing a unique data source associated with a unique label, the selected data bin selected based on the label associated with the data source of the input data instance satisfying evaluation criteria associated with the selected data bin; in response to a determination that the training distribution representative of the goal for the predictive model comprises goal criteria associated with at least one label that corresponds to at least the label, select at least a portion of the training data from at least the selected data bin of the set of data bins that comprises the input data instance; and train the predictive model based on the training data.

2. The system of claim 1, wherein the data reservoir is discretized into the set of data bins based on a desired overall statistical distribution of data in the data reservoir.

3. The system of claim 2, wherein the set of data bins store data used to assess calibration of the predictive model.

4. The system of claim 2, wherein the set of data bins store data used to optimize feature modeling by the adaptive learning framework.

5. The system of claim 1, wherein the unique label for the unique data source represents a unique location.

6. The system of claim 1, wherein the input data instance and the model output are evaluated based on matching at least one of the judgment and the confidence value to attributes of data that are respectively stored within each data bin.

7. The system of claim 1, wherein determining, based on the evaluating, using the supervised machine learning model, of the input data instance and the model output, whether to update the data reservoir using the input data instance further comprises: determining, by an input data evaluator associated with the data bin, whether attributes of the input data instance satisfy evaluation criteria associated with the data bin; and updating the data bin using the input data instance in an instance in which the attributes of the input data instance satisfy the evaluation criteria.

8. The system of claim 7, wherein the evaluation criteria include at least one of a size capacity for the data bin and a range of evaluation values.

9. The system of claim 1, wherein the training distribution is associated with a distribution of labels.

10. An apparatus comprising at least one processor and at least one memory storing instructions that, when executed on the at least one processor, cause the processor to implement an adaptive learning framework for automatically building and maintaining a predictive model for processing dynamic data, wherein the adaptive learning framework is configured to: generate, using the predictive model, model output based on processing an input data instance received by the adaptive learning framework, wherein the model output comprises a judgment and a confidence value representing certainty of the judgment, and wherein the predictive model is generated using machine learning and based upon a training data set; evaluate the input data instance and the model output based in part on a configuration parameter; determine feature data indicative of a set of features associated with the training data set; determine, based on the feature data associated with the training data set, a training distribution representative of a goal for the predictive model; store, in a data reservoir, data instances that have been processed by the predictive model, wherein the data reservoir comprises a pool of possible training data for the training data set, and wherein the data reservoir is further configured to be discretized into a set of data bins maintained by a data set optimizer; and wherein the data set optimizer is configured to perform operations comprising: determining, based on the evaluating, using a supervised machine learning model, of the input data instance and the model output, whether to update the data reservoir using the input data instance; and in an instance in which the data reservoir is to be updated, adding the input data instance to a selected data bin of the set of data bins based on a label associated with a data source for the input data instance and to facilitate training of the predictive model based on the training data set, each data bin of the set of data bins storing data representing a unique data source associated with a unique label, the selected data bin selected based on the label associated with the data source of the input data instance satisfying evaluation criteria associated with the selected data bin; in response to a determination that the training distribution representative of the goal for the predictive model comprises goal criteria associated with at least one label that corresponds to at least the label, select at least a portion of the training data from at least the selected data bin of the set of data bins that comprises the input data instance; and train the predictive model based on the training data.

11. The apparatus of claim 10, wherein the data reservoir is discretized into the set of data bins based on a desired overall statistical distribution of data in the data reservoir.

12. The apparatus of claim 11, wherein the set of data bins store data used to assess calibration of the predictive model.

13. The apparatus of claim 11, wherein the set of data bins store data used to optimize feature modeling by the adaptive learning framework.

14. The apparatus of claim 10, wherein the unique label for the unique data source represents a uniq
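The reservoir-and-bins arrangement recited in the claims can be pictured with a rough sketch. This is an illustration under stated assumptions, not the patented implementation: the claims call for a supervised machine learning model to decide whether the reservoir is updated, whereas the sketch swaps in a plain rule (bin capacity plus a confidence range, loosely following claim 8). The class names, method names, location labels, and the use of rounding to turn label fractions into counts are all hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DataBin:
    label: str                             # unique label of the bin's data source
    capacity: int                          # size capacity (evaluation criterion)
    confidence_range: Tuple[float, float]  # accepted range of evaluation values
    items: List[dict] = field(default_factory=list)

    def accepts(self, confidence: float) -> bool:
        low, high = self.confidence_range
        return low <= confidence <= high and len(self.items) < self.capacity


class DataReservoir:
    """Pool of possible training data, discretized into one bin per label."""

    def __init__(self, bins: List[DataBin]) -> None:
        self.bins: Dict[str, DataBin] = {b.label: b for b in bins}

    def maybe_add(self, instance: dict, source_label: str, confidence: float) -> bool:
        # Rule-based stand-in for the supervised model named in the claims.
        data_bin = self.bins.get(source_label)
        if data_bin is None or not data_bin.accepts(confidence):
            return False
        data_bin.items.append(instance)
        return True

    def sample_training_set(
        self, distribution: Dict[str, float], total: int, seed: int = 0
    ) -> List[dict]:
        # Draw from each bin in proportion to the target label distribution,
        # capped by how many instances the bin actually holds.
        rng = random.Random(seed)
        selected: List[dict] = []
        for label, fraction in distribution.items():
            data_bin = self.bins.get(label)
            if data_bin is None:
                continue
            k = min(len(data_bin.items), round(total * fraction))
            selected.extend(rng.sample(data_bin.items, k))
        return selected


# Hypothetical location labels, in the spirit of claim 5.
reservoir = DataReservoir([
    DataBin("chicago", capacity=2, confidence_range=(0.0, 0.6)),
    DataBin("boston", capacity=5, confidence_range=(0.0, 1.0)),
])
reservoir.maybe_add({"id": 1}, "chicago", confidence=0.3)
reservoir.maybe_add({"id": 2}, "boston", confidence=0.8)
training_data = reservoir.sample_training_set({"chicago": 0.5, "boston": 0.5}, total=2)
```

Keeping one bin per label makes it straightforward to assemble a training set that matches a target label distribution even when incoming data is skewed toward a few sources.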

Assignees

Groupon Inc

Inventors

Classifications

  • G06N20/00 · Primary

    Machine learning · CPC title

  • Inference or reasoning models · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11210604B1 cover?
In general, embodiments of the present invention provide systems, methods and computer readable media for an adaptive oracle-trained learning framework for automatically building and maintaining models that are developed using machine learning algorithms. In embodiments, the framework leverages at least one oracle (e.g., a crowd) for automatic generation of high-quality training data to use in …
Who is the assignee on this patent?
Groupon Inc
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 28, 2021 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).