Multi-stage training of machine learning models

US2023259769A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2023259769-A1
Application number  US-202318168027-A
Country             US
Kind code           A1
Filing date         Feb 13, 2023
Priority date       Feb 16, 2022
Publication date    Aug 17, 2023
Grant date          (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a machine learning model to perform a machine learning task. In one aspect, a method includes: obtaining a set of training examples; obtaining, for each training example, a respective metadata label that characterizes the training example; and training the machine learning model over a sequence of training stages, including, at each training stage: identifying a selection criterion corresponding to the current training stage that defines a criterion for selecting training examples based on the metadata labels of the training examples; selecting a proper subset of the set of training examples as training data for the current training stage in accordance with the selection criterion for the current training stage; and updating the machine learning model by training the machine learning model on the training data for the current training stage.
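
The staged loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the `Example` and `LinearModel` classes, the `train_on` and `multi_stage_train` helpers, the use of a year as the metadata label, the toy linear data, and the three selection criteria are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Example:
    x: float    # training input
    y: float    # target output
    label: int  # metadata label; here a hypothetical year acting as a timestamp


@dataclass
class LinearModel:
    w: float = 0.0
    b: float = 0.0

    def predict(self, x: float) -> float:
        return self.w * x + self.b


def train_on(model: LinearModel, data: Sequence[Example],
             lr: float = 0.05, epochs: int = 300) -> LinearModel:
    """Update the model in place with plain per-example gradient steps."""
    for _ in range(epochs):
        for ex in data:
            err = model.predict(ex.x) - ex.y
            model.w -= lr * err * ex.x
            model.b -= lr * err
    return model


def multi_stage_train(
    model: LinearModel,
    examples: Sequence[Example],
    criteria: Sequence[Callable[[int], bool]],
) -> Tuple[LinearModel, List[int]]:
    """Train over a sequence of stages, one selection criterion per stage."""
    stage_sizes = []
    for criterion in criteria:
        # Select this stage's training data from the metadata labels alone.
        stage_data = [ex for ex in examples if criterion(ex.label)]
        # Require a non-empty selection that is strictly smaller than the full set.
        if not 0 < len(stage_data) < len(examples):
            raise ValueError("stage data must be a non-empty proper subset")
        # The updated model is carried into the next stage.
        model = train_on(model, stage_data)
        stage_sizes.append(len(stage_data))
    return model, stage_sizes


# Toy data on the line y = 2x + 1, with labels cycling through 2019, 2020, 2021.
examples = [Example(x=float(i), y=2.0 * i + 1.0, label=2019 + i % 3)
            for i in range(6)]

# Assumed criteria: each stage admits more recent labels than the one before.
criteria = [
    lambda year: year <= 2019,
    lambda year: year <= 2020,
    lambda year: year >= 2020,
]

model, stage_sizes = multi_stage_train(LinearModel(), examples, criteria)
print(stage_sizes, round(model.w, 3), round(model.b, 3))
```

Each criterion here looks only at the metadata label, never at the input or target, which is the distinction the abstract draws between selecting training data and training on it.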

First claim

Opening claim text (preview).

What is claimed is:

1. A method performed by one or more computers for training a machine learning model to perform a machine learning task, the method comprising: obtaining a set of training examples; obtaining, for each training example, a respective metadata label that characterizes the training example; and training the machine learning model over a sequence of training stages, comprising, at each training stage before a last training stage in the sequence of training stages: identifying a selection criterion corresponding to the current training stage that defines a criterion for selecting training examples based on the metadata labels of the training examples, selecting a proper subset of the set of training examples as training data for the current training stage in accordance with the selection criterion for the current training stage, updating the machine learning model by training the machine learning model on the training data for the current training stage, and providing the updated machine learning model for further training at a next training stage in the sequence of training stages.

2. The method of claim 1, wherein for each training example, the metadata label for the training example defines a timestamp corresponding to the training example.

3. The method of claim 1, wherein for each training example, the metadata label for the training example defines a geographic feature corresponding to the training example.

4. The method of claim 1, wherein for each training stage in the sequence of training stages: the selection criterion corresponding to the training stage specifies a set of allowable metadata labels for the training stage; and each training example is eligible for selection at the training stage only if the metadata label of the training example is included in the set of allowable metadata labels for the training stage.

5. The method of claim 4, wherein for each training stage after a first training stage in the sequence of training stages: a maximum metadata label in the set of allowable metadata labels for the training stage is greater than a maximum metadata label in the set of allowable metadata labels for a preceding training stage.

6. The method of claim 4, wherein for one or more training stages in the sequence of training stages: the selection criterion corresponding to the training stage specifies a respective selection weight for each metadata label in the set of allowable metadata labels; and selecting a proper subset of the set of training examples as training data for the current training stage comprises: determining a probability distribution, over training examples having metadata labels included in the set of allowable metadata labels for the training stage, using the selection weights for the allowable metadata labels; and sampling a plurality of training examples having metadata labels included in the set of allowable metadata labels in accordance with the probability distribution.

7. The method of claim 6, wherein for one or more training stages in the sequence of training stages: the set of allowable metadata labels for the training stage comprises a plurality of metadata labels; and the selection criterion corresponding to the training stage specifies a higher selection weight for a maximum metadata label in the set of allowable metadata labels than for a minimum metadata label in the set of allowable metadata labels.

8. The method of claim 1, wherein for one or more training stages in the sequence of training stages, updating the machine learning model by training the machine learning model on the training data for the current training stage comprises: determining, for each training example in the training data for the current training stage, an error in a prediction generated by the machine learning model for the training example; updating the machine learning model using the errors in the predictions generated by the machine learning model for the training examples in the training data for the current training stage.

9. The method of claim 8, wherein the machine learning model is an ensemble model that comprises a plurality of base models, and wherein updating the machine learning model using the errors in the predictions generated by the machine learning model for training examples in the training data for the current training stage comprises: determining a prediction target for each training example in the training data for the current training stage based on the error in the prediction generated by the machine learning model for the training example; generating one or more new base models that are each trained to generate the prediction targets for the training examples in the training data for the current training stage; and adding the new base models to the ensemble model.

10. The method of claim 9, wherein the new base models are decision trees.

11. The method of claim 8, wherein updating the machine learning model using the errors in the predictions generated by the machine learning model for the training examples in the training data for the current training stage comprises: determining a respective weight factor for each training example in the training data for the current training stage based on the error in the prediction generated by the machine learning model for the training example; training the machine learning model on the training data for the current training stage using the weight factors for the training examples, wherein the weight factor for a training example controls an impact of the training example on the training of the machine learning model.

12. The method of claim 11, wherein the machine learning model comprises a neural network, and wherein training the machine learning model on the training data for the current training stage using the weight factors for the training examples comprises, for each training example: generating a prediction for the training example using the neural network; determining gradients, with respect to neural network parameters of the neural network, of an objective function that depends on the prediction for the training example; scaling the gradients using the weight factor for the training example; and updating the neural network parameters of the neural network using the scaled gradients.

13. The method of claim 1, wherein training the machine learning model at the last stage in the sequence of training stages comprises: identifying a selection criterion corresponding to the last training stage that defines a criterion for selecting training examples based on the metadata labels of the training examples; selecting a proper subset of the set of training examples as training data for the last training stage in accordance with the selection criterion for the current training stage; updating the machine learning model by training the machine learning model on the training data for the last training stage; and providing the updated machine learning model for use in performing the machine learning task.

14. The method of claim 1, wherein each training example in the set of training examples comprises: (i) a training input to the machine learning model, and (ii) a target output to be generated by the machine learning model by processing the training input.

15. The method of claim 14, wherein the machine learning model performs a fire prediction task, wherein for each training example: (i) the training input characterizes a geographic region, and (ii) the target output defines, for each of one or more spatial locations in the
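
The weighted selection described in claims 6 and 7 can be sketched as below. This is an illustrative reading under stated assumptions rather than the claimed method itself: the helper names `sampling_distribution` and `sample_stage_data`, the year labels, the weight values, and the choice to sample with replacement are all hypothetical.

```python
import random
from typing import Dict, List, Sequence


def sampling_distribution(labels: Sequence[int],
                          selection_weights: Dict[int, float]) -> List[float]:
    """Return one probability per example from per-label selection weights.

    Labels missing from `selection_weights` are treated as not allowable
    for this stage and receive probability zero.
    """
    raw = [selection_weights.get(label, 0.0) for label in labels]
    total = sum(raw)
    if total <= 0:
        raise ValueError("no example carries an allowable metadata label")
    return [r / total for r in raw]


def sample_stage_data(labels: Sequence[int],
                      selection_weights: Dict[int, float],
                      k: int,
                      rng: random.Random) -> List[int]:
    """Draw k example indices according to the distribution.

    Sampling with replacement is an assumption of this sketch.
    """
    probs = sampling_distribution(labels, selection_weights)
    return rng.choices(range(len(labels)), weights=probs, k=k)


# Hypothetical metadata labels, one per training example.
labels = [2019, 2019, 2020, 2020, 2021, 2021]

# Allowable labels for this stage, with the most recent label weighted higher
# than the oldest allowable one, in the spirit of claim 7.
weights = {2020: 1.0, 2021: 3.0}

probs = sampling_distribution(labels, weights)
picked = sample_stage_data(labels, weights, k=1000, rng=random.Random(0))
print(probs)
```

Examples labeled 2019 fall outside the allowable set, so they are never drawn at this stage, while 2021 examples are drawn about three times as often as 2020 examples.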

Assignees

X Dev Llc

Inventors

Classifications

  • G06N3/08 (Primary)

    Learning methods · CPC title

  • G06N20/20 (Primary)

    Ensemble learning · CPC title

  • Backpropagation, e.g. using gradient descent · CPC title

  • Combinations of networks · CPC title

  • using kernel methods, e.g. support vector machines [SVM] · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2023259769A1 cover?
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a machine learning model to perform a machine learning task. In one aspect, a method includes: obtaining a set of training examples; obtaining, for each training example, a respective metadata label that characterizes the training example; and training the machine learning model over …
Who is the assignee on this patent?
X Dev Llc
What technology area does this patent fall under?
Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 17, 2023 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).