Adaptive sampling of training data for machine learning models based on PAC-bayes analysis of risk bounds

US11200511B1 · US · B1

Patent metadata
Field               Value
Publication number  US-11200511-B1
Application number  US-201715817068-A
Country             US
Kind code           B1
Filing date         Nov 17, 2017
Priority date       Nov 17, 2017
Publication date    Dec 14, 2021
Grant date          Dec 14, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

At a machine learning service, an indication of a training data set for a model is obtained. One or more training iterations of the model are conducted using an adaptive input sampling strategy. In a particular iteration, index values for a set of training observations are selected based on a set of sampling weights, parameters of the model are updated based on results using training observations identified by the index values, and sampling weights are modified. A result obtained from a trained version of the machine learning model is provided.
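The three per-iteration steps in the abstract (select index values by sampling weight, update model parameters, modify the sampling weights) can be sketched as follows. This is a minimal illustration, not the patent's method: the function name `train_adaptive`, the per-record `scores` list, the exponential mapping from score to weight, and the clipped squared-error utility are all assumptions made for this example. The page does not give the actual weight-update formula; only the roles of the amplitude and decay parameters follow the claim language.

```python
import math
import random


def train_adaptive(data, iterations=500, batch_size=4, lr=0.1,
                   amplitude=0.5, decay=0.9, seed=0):
    """Fit y ~ w*x + b with SGD, drawing each mini-batch by sampling weight.

    All names and the weight-update rule are illustrative assumptions.
    Returns the fitted (w, b) and the final selection probabilities.
    """
    rng = random.Random(seed)
    n = len(data)
    w, b = 0.0, 0.0
    # Accumulated per-record scores; sampling weight = exp(score).
    scores = [0.0] * n
    for _ in range(iterations):
        weights = [math.exp(s) for s in scores]
        # Step 1: select index values according to the sampling weights.
        idx = rng.choices(range(n), weights=weights, k=batch_size)
        # Step 2: update model parameters from results on the selected records.
        grad_w = grad_b = 0.0
        losses = {}
        for i in idx:
            x, y = data[i]
            err = (w * x + b) - y
            grad_w += err * x
            grad_b += err
            losses[i] = err * err
        w -= lr * grad_w / batch_size
        b -= lr * grad_b / batch_size
        # Step 3: modify the sampling weights. Decay shrinks the effect of
        # past modifications; amplitude scales the new adjustment.
        scores = [decay * s for s in scores]
        for i, loss in losses.items():
            utility = min(loss, 1.0)  # hypothetical utility: clipped squared error
            scores[i] += amplitude * utility
    total = sum(math.exp(s) for s in scores)
    probs = [math.exp(s) / total for s in scores]
    return (w, b), probs
```

For example, `train_adaptive([(i / 10 - 1, 2 * (i / 10 - 1) + 1) for i in range(21)])` fits a noise-free line and returns one selection probability per record. The sketch makes no correction for the bias that non-uniform sampling introduces into the gradient estimate; a real implementation would need to address that.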

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising: one or more computing devices of a machine learning service; wherein the one or more computing devices are configured to: obtain an indication of a training data set to be used for a machine learning model, wherein the training data set comprises a plurality of observation records, and wherein a learning algorithm of the machine learning model meets one or more stability criteria associated with determining risk bounds for the learning algorithm using PAC-Bayesian analysis; determine, based at least in part on an analysis of at least a portion of the training data set, that adaptive sampling is to be used to select observation records in one or more training iterations of the machine learning model; implement the one or more training iterations, wherein a particular training iteration of the one or more training iterations comprises: selecting one or more index values based at least in part on a set of sampling weights assigned to the plurality of observation records, wherein the selected one or more index values respectively indicate one or more observation records selected out of the plurality of observation records of the training data set to train the machine learning model in the particular training iteration; updating a set of parameters of the machine learning model using the learning algorithm, based at least in part on a result, obtained using the machine learning model, with respect to the one or more observation records selected out of the plurality of observation records indicated respectively by the one or more selected index values; and modifying the set of sampling weights assigned to the plurality of observation records based at least in part on a utility function, an amplitude parameter, and a decay parameter, wherein modification of the set of sampling weights updates the selection of one or more index values in a next training iteration such that a probability of selection for at least some of the observation records is changed for the next training iteration, wherein the utility function is related to an objective of the learning algorithm, wherein the amplitude parameter controls aggressiveness of the modification of the set of sampling weights, and wherein the decay parameter decreases an effect of prior modification of the set of sampling weights in a past training iteration; and provide a result obtained from a trained version of the machine learning model with respect to a particular observation record, wherein the particular observation record is not part of the training data set.

2. The system as recited in claim 1, wherein the learning algorithm comprises a stochastic gradient descent algorithm.

3. The system as recited in claim 1, wherein the one or more computing devices are configured to: determine that a request to train the machine learning model has been received via a programmatic interface of a machine learning service of a provider network.

4. The system as recited in claim 1, wherein selecting the one or more index values comprises traversing a tree data structure, wherein individual ones of leaf nodes of the tree data structure correspond to respective index values.

5. The system as recited in claim 1, wherein a result of the utility function is based at least in part on one or more of: (a) a training iteration count, (b) a current set of parameters of the machine learning model or (c) the one or more observation records indicated respectively by the one or more selected index values.

6. A method, comprising: performing, by one or more computing devices: obtaining an indication of a training data set to be used for a machine learning model, wherein the training data set comprises a plurality of observation records, and wherein a learning algorithm of the machine learning model meets one or more stability criteria; implementing one or more training iterations of the machine learning model using an adaptive input sampling strategy, wherein a particular training iteration of the one or more training iterations comprises: selecting one or more index values based at least in part on a set of sampling weights assigned to the plurality of observation records, wherein an individual index value of the one or more index values indicates a particular observation record selected out of the training data set to train the machine learning model in the particular training iteration; updating a set of parameters of the machine learning model using the learning algorithm, based at least in part on a result, obtained using the machine learning model, with respect to one or more observation records indicated respectively by the one or more selected index values; and modifying the set of sampling weights assigned to the plurality of observation records based at least in part on a utility function, an amplitude parameter, and a decay parameter, wherein modification of the set of sampling weights updates the selection of one or more index values in a next training iteration such that a probability of selection for at least some of the observation records is changed for the next training iteration, wherein the utility function is related to an objective of the learning algorithm, wherein the amplitude parameter controls aggressiveness of the modification of the set of sampling weights, and wherein the decay parameter decreases an effect of prior modification of the set of sampling weights in a past training iteration; and providing a result obtained from a trained version of the machine learning model with respect to a particular observation record, wherein the particular observation record is not part of the training data set.

7. The method as recited in claim 6, wherein the learning algorithm comprises a stochastic gradient descent algorithm.

8. The method as recited in claim 6, further comprising performing, by the one or more computing devices: determining that adaptive sampling is to be used for the one or more iterations based at least in part on an analysis of at least a portion of the training data set.

9. The method as recited in claim 8, wherein the analysis of at least the portion of the training data set comprises one or more of: (a) determining that a distribution of one or more attributes of the observation records meets a skew criterion, or (b) determining that a size of the training data set exceeds a threshold.

10. The method as recited in claim 6, wherein a result of the utility function is based at least in part on one or more of: (a) a training iteration count, (b) a current set of parameters of the machine learning model or (c) the one or more observation records indicated respectively by the one or more selected index values.

11. The method as recited in claim 6, further comprising performing, by the one or more computing devices: determining that a request to train the machine learning model has been received via a programmatic interface of a machine learning service of a provider network.

12. The method as recited in claim 6, further comprising performing, by the one or more computing devices: determining that a request to utilize adaptive sampling for the machine learning model has been received via a programmatic interface of a machine learning service of a provider network.

13. The method as recited in claim 6, wherein selecting the one or more index values comprises traversing a tree data structure, wherein individual ones of leaf nodes of the tree data structure correspond to respective index values.

14. The method as recited in claim 13, wherein a particular training iteration of the one or more training iterations co
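Claims 4 and 13 recite selecting index values by traversing a tree whose leaf nodes correspond to index values. The claim preview does not say which tree is used. One common structure that fits the description is a sum tree, in which each leaf stores a sampling weight and each internal node stores the sum of its children, so that both sampling and single-weight updates take logarithmic time. The sketch below is an assumption along those lines; the class name `SumTree` and its methods are hypothetical and not taken from the patent. It expects a non-empty list of non-negative weights.

```python
import random


class SumTree:
    """Array-backed binary tree; leaf i holds the sampling weight of index i."""

    def __init__(self, weights):
        self.n = len(weights)
        # Round the leaf count up to a power of two so the tree is complete.
        self.size = 1
        while self.size < self.n:
            self.size *= 2
        self.tree = [0.0] * (2 * self.size)
        for i, wt in enumerate(weights):
            self.tree[self.size + i] = float(wt)
        # Fill internal nodes bottom-up with the sums of their children.
        for node in range(self.size - 1, 0, -1):
            self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]

    def total(self):
        """Sum of all sampling weights (stored at the root)."""
        return self.tree[1]

    def update(self, index, weight):
        """Set one leaf weight and refresh the sums on the path to the root."""
        node = self.size + index
        self.tree[node] = float(weight)
        node //= 2
        while node >= 1:
            self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]
            node //= 2

    def find(self, u):
        """Return the index whose cumulative-weight interval contains u."""
        node = 1
        while node < self.size:
            left = 2 * node
            if u < self.tree[left]:
                node = left
            else:
                u -= self.tree[left]
                node = left + 1
        # Clamp in case rounding pushes u into a zero-weight padding leaf.
        return min(node - self.size, self.n - 1)

    def sample(self, rng=random):
        """Draw one index with probability proportional to its weight."""
        return self.find(rng.random() * self.total())
```

With weights `[1.0, 2.0, 3.0, 4.0]`, the cumulative intervals are [0, 1), [1, 3), [3, 6) and [6, 10), so `find(2.9)` lands on index 1 and `find(3.0)` on index 2. After a weight-modification step changes one record's weight, `update` touches only the nodes on that leaf's path to the root rather than rebuilding a full cumulative distribution.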

Assignees

Amazon Tech Inc

Inventors

Classifications

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • Combinations of networks · CPC title

  • Supervised learning · CPC title

  • G06N20/00 · Primary

    Machine learning · CPC title

  • Learning methods · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11200511B1 cover?
At a machine learning service, an indication of a training data set for a model is obtained. One or more training iterations of the model are conducted using an adaptive input sampling strategy. In a particular iteration, index values for a set of training observations are selected based on a set of sampling weights, parameters of the model are updated based on results using training observatio…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 14, 2021 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).