Processing dynamic data within an adaptive oracle-trained learning system using dynamic data set distribution optimization

US2022180250A1 · US · A1

Patent metadata
Publication number: US-2022180250-A1
Application number: US-202117528514-A
Country: US
Kind code: A1
Filing date: Nov 17, 2021
Priority date: Dec 23, 2013
Publication date: Jun 9, 2022
Grant date: (none listed)

Abstract

Official abstract text for this publication.

In general, embodiments of the present invention provide systems, methods and computer readable media for an adaptive oracle-trained learning framework for automatically building and maintaining models that are developed using machine learning algorithms. In embodiments, the framework leverages at least one oracle (e.g., a crowd) for automatic generation of high-quality training data to use in deriving a model. Once a model is trained, the framework monitors the performance of the model and, in embodiments, leverages active learning and the oracle to generate feedback about the changing data for modifying training data sets while maintaining data quality to enable incremental adaptation of the model.
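The feedback step in the abstract can be illustrated with a small sketch. It assumes uncertainty sampling as the active learning strategy, which the abstract does not specify; the function name `query_oracle`, the `threshold` parameter, and the toy `predict` and `oracle` callables are all hypothetical and do not come from the patent.

```python
def query_oracle(instances, predict, oracle, threshold=0.7):
    """Collect oracle labels for instances the model is unsure about.

    predict(x) returns (label, confidence); oracle(x) returns a trusted label.
    Both callables and the threshold are assumptions made for illustration.
    """
    new_training_data = []
    for x in instances:
        _, confidence = predict(x)
        if confidence < threshold:
            # Low confidence: ask the oracle (e.g., a crowd) for the label.
            new_training_data.append((x, oracle(x)))
    return new_training_data


# Toy model: the sign gives the label, the magnitude gives the confidence.
def predict(x):
    return ("pos" if x >= 0 else "neg", min(1.0, abs(x)))


# Toy oracle that always knows the correct label.
def oracle(x):
    return "pos" if x > 0 else "neg"


feedback = query_oracle([-2.0, -0.1, 0.3, 1.5], predict, oracle)
print(feedback)
```

Only the two instances near zero fall below the threshold, so only those are sent to the oracle and returned as candidate training data.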

First claim

Opening claim text (preview).

1-20. (canceled)

21. A system, comprising one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to: determine feature data indicative of a set of features associated with a training data set for a predictive model; determine, based on the feature data associated with the training data set, a training distribution representative of a goal for the predictive model; apply an input data instance to the predictive model to determine a label for the input data instance; add the input data instance to a data bin of a data reservoir based on the label, wherein the data reservoir comprises candidate training data for the training data set; in response to a determination that the training distribution representative of the goal for the predictive model comprises goal criteria associated with at least one label that corresponds to at least the label, select at least a portion of the candidate training data from at least the data bin of the data reservoir that comprises the input data instance; train the predictive model based on at least the portion of the candidate training data to generate an updated predictive model; and classify data collected from one or more network sources based on the updated predictive model.

22. The system of claim 21, wherein the one or more storage devices store instructions that are operable, when executed by the one or more computers, to further cause the one or more computers to: determine a confidence value associated with the label for the input data instance; and add the input data instance to the data bin of the data reservoir in response to the confidence value satisfying a defined confidence value threshold.

23. The system of claim 21, wherein the one or more storage devices store instructions that are operable, when executed by the one or more computers, to further cause the one or more computers to: configure a set of data bins of the data reservoir based on the training distribution representative of the goal for the predictive model.

24. The system of claim 21, wherein the one or more storage devices store instructions that are operable, when executed by the one or more computers, to further cause the one or more computers to: configure a set of data bins of the data reservoir based on configuration data related to size capacity for respective data bins.

25. The system of claim 24, wherein the one or more storage devices store instructions that are operable, when executed by the one or more computers, to further cause the one or more computers to: add the input data instance to a data bin of a data reservoir in response to the data bin satisfying size capacity criterion.

26. The system of claim 24, wherein the one or more storage devices store instructions that are operable, when executed by the one or more computers, to further cause the one or more computers to: replace a particular input data instance stored in the data bin of the data reservoir with the input data instance in response to the data bin not satisfying size capacity criterion.

27. The system of claim 21, wherein the one or more storage devices store instructions that are operable, when executed by the one or more computers, to further cause the one or more computers to: allocate respective labels for respective data bins of the data reservoir; and compare the label for the input data instance to the respective labels for the respective data bins of the data reservoir.

28. A computer-implemented method, comprising: determining, by a computing device comprising a processor, feature data indicative of a set of features associated with a training data set for a predictive model; determining, by the computing device and based on the feature data associated with the training data set, a training distribution representative of a goal for the predictive model; applying, by the computing device, an input data instance to the predictive model to determine a label for the input data instance; adding, by the computing device, the input data instance to a data bin of a data reservoir based on the label, wherein the data reservoir comprises candidate training data for the training data set; in response to a determination that the training distribution representative of the goal for the predictive model comprises goal criteria associated with at least one label that corresponds to at least the label, selecting, by the computing device, at least a portion of the candidate training data from at least the data bin of the data reservoir that comprises the input data instance; training, by the computing device, the predictive model based on at least the portion of the candidate training data to generate an updated predictive model; and classifying, by the computing device, data collected from one or more network sources based on the updated predictive model.

29. The computer-implemented method of claim 28, further comprising: determining, by the computing device, a confidence value associated with the label for the input data instance; and adding, by the computing device, the input data instance to the data bin of the data reservoir in response to the confidence value satisfying a defined confidence value threshold.

30. The computer-implemented method of claim 28, further comprising: configuring, by the computing device, a set of data bins of the data reservoir based on the training distribution representative of the goal for the predictive model.

31. The computer-implemented method of claim 28, further comprising: configuring, by the computing device, a set of data bins of the data reservoir based on configuration data related to size capacity for respective data bins.

32. The computer-implemented method of claim 31, further comprising: adding, by the computing device, the input data instance to a data bin of a data reservoir in response to the data bin satisfying size capacity criterion.

33. The computer-implemented method of claim 31, further comprising: replacing, by the computing device, a particular input data instance stored in the data bin of the data reservoir with the input data instance in response to the data bin not satisfying size capacity criterion.

34. The computer-implemented method of claim 28, further comprising: allocating, by the computing device, respective labels for respective data bins of the data reservoir; and comparing, by the computing device, the label for the input data instance to the respective labels for the respective data bins of the data reservoir.

35. A computer program product, stored on a computer readable medium, comprising instructions that when executed by one or more computers cause the one or more computers to: determine feature data indicative of a set of features associated with a training data set for a predictive model; determine, based on the feature data associated with the training data set, a training distribution representative of a goal for the predictive model; apply an input data instance to the predictive model to determine a label for the input data instance; add the input data instance to a data bin of a data reservoir based on the label, wherein the data reservoir comprises candidate training data for the training data set; in response to a determination that the training distribution representative of the goal for the predictive model comprises goal criteria associated with at least one label that corresponds to at least the label, select at least
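
The data reservoir recited in claims 21 to 27 can be pictured with a minimal sketch. This is one possible reading of the claim language, not the patented implementation: the class name `DataReservoir`, the default capacity and confidence threshold, the random replacement policy, and the representation of the training distribution as a per-label count (`goal`) are all assumptions made for illustration.

```python
import random
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DataReservoir:
    """Candidate training data grouped into one bin per label (hypothetical)."""
    capacity: int = 3             # assumed size capacity per bin (claim 24)
    min_confidence: float = 0.8   # assumed confidence threshold (claim 22)
    rng: random.Random = field(default_factory=lambda: random.Random(0))
    bins: dict = field(default_factory=lambda: defaultdict(list))

    def add(self, instance, label, confidence):
        """Store an instance in the bin that matches its predicted label."""
        if confidence < self.min_confidence:
            return False                  # label not confident enough to keep
        bin_ = self.bins[label]
        if len(bin_) < self.capacity:
            bin_.append(instance)         # bin has room (claim 25)
        else:
            # Bin is full: overwrite a stored instance (claim 26).
            # Picking the victim at random is an assumption.
            bin_[self.rng.randrange(self.capacity)] = instance
        return True

    def select(self, goal):
        """Take up to goal[label] instances from each bin named in the goal."""
        selected = []
        for label, wanted in goal.items():
            selected.extend((x, label) for x in self.bins.get(label, [])[:wanted])
        return selected


reservoir = DataReservoir(capacity=2, min_confidence=0.8)
predictions = [("a", "spam", 0.95), ("b", "ham", 0.90), ("c", "spam", 0.50),
               ("d", "spam", 0.85), ("e", "spam", 0.99)]
for instance, label, confidence in predictions:
    reservoir.add(instance, label, confidence)

# Assumed goal: the training distribution asks for two "spam" examples.
batch = reservoir.select({"spam": 2})
print(batch)
```

In this run, instance "c" is rejected for low confidence, and instance "e" replaces one of the two stored "spam" instances because the bin is already at capacity. The selected batch would then be passed to whatever retraining routine produces the updated predictive model.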

Assignees

Groupon Inc

Inventors

Classifications

  • G06N20/00 · Primary

    Machine learning · CPC title

  • Inference or reasoning models · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2022180250A1 cover?
In general, embodiments of the present invention provide systems, methods and computer readable media for an adaptive oracle-trained learning framework for automatically building and maintaining models that are developed using machine learning algorithms. In embodiments, the framework leverages at least one oracle (e.g., a crowd) for automatic generation of high-quality training data to use in …
Who is the assignee on this patent?
Groupon Inc
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 9, 2022 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).