Training machine learning models using task selection policies to increase learning progress

US10936949B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10936949-B2
Application number: US-201916508042-A
Country: US
Kind code: B2
Filing date: Jul 10, 2019
Priority date: Feb 24, 2017
Publication date: Mar 2, 2021
Grant date: Mar 2, 2021

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a machine learning model. In one aspect, a method includes receiving training data for training the machine learning model on a plurality of tasks, where each task includes multiple batches of training data. A task is selected in accordance with a current task selection policy. A batch of training data is selected from the selected task. The machine learning model is trained on the selected batch of training data to determine updated values of the model parameters. A learning progress measure that represents a progress of the training of the machine learning model as a result of training the machine learning model on the selected batch of training data is determined. The current task selection policy is updated using the learning progress measure.
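
The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the Exp3-style bandit (`Exp3TaskPolicy`), the one-parameter linear model, the toy tasks, and all names and hyperparameters are hypothetical choices made for the example. The learning progress measure used here is the decrease in loss on the selected batch.

```python
import math
import random


class Exp3TaskPolicy:
    """Exp3-style bandit over tasks (an assumed choice of policy for illustration)."""

    def __init__(self, num_tasks, step_size=0.1, explore=0.1):
        self.weights = [0.0] * num_tasks
        self.step_size = step_size
        self.explore = explore

    def probabilities(self):
        # Softmax over weights, mixed with a uniform exploration term.
        m = max(self.weights)
        exps = [math.exp(w - m) for w in self.weights]
        total = sum(exps)
        k = len(self.weights)
        return [(1 - self.explore) * e / total + self.explore / k for e in exps]

    def select(self, rng):
        probs = self.probabilities()
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]

    def update(self, task, payoff):
        # Importance-weighted payoff: only the selected task's weight changes.
        p = self.probabilities()[task]
        self.weights[task] += self.step_size * payoff / p


def batch_loss(w, batch):
    """Mean squared error of the toy model y = w * x on one batch."""
    return sum((w * x - y) ** 2 for x, y in batch) / len(batch)


def sgd_step(w, batch, lr=0.1):
    """One gradient step on the batch; returns the updated parameter."""
    grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
    return w - lr * grad


def train(tasks, num_iterations=200, seed=0):
    rng = random.Random(seed)
    policy = Exp3TaskPolicy(len(tasks))
    w = 0.0  # initial value of the single model parameter
    for _ in range(num_iterations):
        task = policy.select(rng)               # select a task
        batch = rng.choice(tasks[task])         # select a batch from that task
        loss_before = batch_loss(w, batch)
        w = sgd_step(w, batch)                  # train on the batch
        loss_after = batch_loss(w, batch)
        progress = loss_before - loss_after     # learning progress measure
        payoff = max(-1.0, min(1.0, progress))  # clip to [-1, 1] (assumed scaling)
        policy.update(task, payoff)             # update the task selection policy
    return w, policy


# Toy data: task 0 can be learned (y = 2x); task 1 yields no progress (all zeros).
data_rng = random.Random(1)
informative = [[(x, 2.0 * x) for x in (data_rng.uniform(-1, 1) for _ in range(8))]
               for _ in range(10)]
uninformative = [[(0.0, 0.0)] * 8 for _ in range(10)]
w, policy = train([informative, uninformative])
```

In this sketch, batches from the second task never change the loss, so their payoff is always zero and the policy gradually shifts probability toward the first task while it still produces progress.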

First claim

Opening claim text (preview).

What is claimed is:

1. A method of training a machine learning model having a plurality of model parameters to determine trained values of the model parameters from initial values of the model parameters, wherein values of the model parameters are defined by a posterior distribution over possible values of the model parameters, the method comprising: receiving training data for training the machine learning model on a plurality of tasks, wherein each task comprises a respective plurality of batches of training data; and training the machine learning model on the training data, wherein during the training, posterior distribution parameters that parameterize the posterior distribution are optimized such that the trained values of the model parameters are defined by trained values of the posterior distribution parameters, wherein the training comprises, at each of a plurality of training iterations: selecting a task from the plurality of tasks in accordance with a current task selection policy; selecting a batch of training data from the plurality of batches of training data for the selected task; training the machine learning model on the selected batch of training data to determine updated values of the model parameters from current values of the model parameters, comprising training the machine learning model on the selected batch of training data to determine adjusted values of the posterior distribution parameters from current values of the posterior distribution parameters; determining a learning progress measure that represents a progress of the training of the machine learning model as a result of training the machine learning model on the selected batch of training data; and updating the current task selection policy based on the learning progress measure, comprising: determining a payoff achieved at the training iteration from the learning progress measure; and updating the current task selection policy using the payoff to encourage selection of tasks that maximize a cumulative measure of payoffs achieved over the plurality of training iterations.

2. The method of claim 1, wherein training the machine learning model on the selected batch comprises training the machine learning model to decrease a loss on the selected batch as measured by a loss function, and wherein the learning progress measure is based on a decrease in the loss as a result of training the machine learning model on the selected batch of training data.

3. The method of claim 2, wherein determining the learning progress measure comprises: determining a first loss on the selected batch in accordance with the current values of the model parameters; and determining a second loss on the selected batch in accordance with the updated values of the model parameters, and wherein the learning progress measure comprises a difference between the first loss and the second loss.

4. The method of claim 1, wherein determining the learning progress measure comprises: sampling a new batch from the plurality of batches in the selected task; determining a first loss on the new batch in accordance with the current values of the model parameters; and determining a second loss on the new batch in accordance with the updated values of the model parameters, and wherein the learning progress measure comprises a difference between the first loss and the second loss.

5. The method of claim 1, wherein one of the tasks is identified as a target task that includes training inputs that are most similar to inputs to be processed by the machine learning model after the training of the machine learning model on the training data, wherein determining the learning progress measure comprises: sampling a new batch from the plurality of batches in the target task; determining a first loss on the new batch in accordance with the current values of the model parameters; and determining a second loss on the new batch in accordance with the updated values of the model parameters, and wherein the learning progress measure comprises a difference between the first loss and the second loss.

6. The method of claim 1, wherein determining the learning progress measure comprises: sampling a task randomly from the plurality of tasks; sampling a new batch from the plurality of batches in the sampled task; determining a first loss on the new batch in accordance with the current values of the model parameters; and determining a second loss on the new batch in accordance with the updated values of the model parameters, and wherein the learning progress measure comprises a difference between the first loss and the second loss.

7. The method of claim 1, wherein the learning progress measure comprises a norm of a gradient vector of gradients of the loss function with respect to the model parameters generated by training the machine learning model on the selected batch.

8. The method of claim 1, wherein the learning progress measure is based on an increase in model complexity of the machine learning model as a result of training the machine learning model on the selected batch of training data.

9. The method of claim 1, wherein determining the learning progress measure comprises: determining a first Kullback-Leibler (KL) divergence between (i) the posterior distribution as defined by the updated values of the posterior distribution parameters and (ii) a prior distribution over possible values for the model parameters; and determining a second KL divergence between (i) the posterior distribution as defined by the current values of the posterior distribution parameters and (ii) a prior distribution over possible values for the model parameters, and wherein the learning progress measure comprises a difference between the first KL divergence and the second KL divergence.

10. The method of claim 9, wherein the prior distribution is defined by prior distribution parameters, wherein training the machine learning model on the selected batch of training data comprises determining adjusted values of the prior distribution parameters from current values of the prior distribution parameters, wherein the first KL divergence is a KL divergence between (i) the posterior distribution as defined by the updated values of the posterior distribution parameters and (ii) the prior distribution as defined by the updated values of the prior distribution parameters, and wherein the second KL divergence is a KL divergence between (i) the posterior distribution as defined by the current values of the posterior distribution parameters and (ii) the prior distribution as defined by the current values of the prior distribution parameters.

11. The method of claim 1, wherein training the machine learning model on the selected batch of training data comprises determining adjusted values of prior distribution parameters from current values of prior distribution parameters, wherein the prior distribution parameters parametrize a prior distribution over possible values for the model parameters, and wherein the learning progress measure is based on a) a gradient with respect to the posterior distribution parameters and the prior distribution parameters of a KL divergence between (i) the posterior distribution as defined by the current values of the posterior distribution parameters and (ii) the prior distribution as defined by the current values of the prior distribution parameters and b) a gradient with respect to the posterior distribution parameters of the expectation of a loss on the selected batch as measured by a loss function.

12. The method of claim 1, wherein the learning progress measure is based on a difference between a first norm of a vector of
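
The KL-based measure recited in claim 9 can be illustrated with a small sketch. The assumptions here go beyond the claim text: the posterior is taken to be a diagonal Gaussian, the prior is a fixed standard normal (claim 10 instead covers a prior whose parameters are also trained), and the function names are hypothetical. Under those assumptions the KL divergence has the closed form 0.5 * (sigma^2 + mu^2 - 1) - ln(sigma) per dimension.

```python
import math


def kl_to_standard_normal(means, stddevs):
    """KL( N(mean, stddev^2) || N(0, 1) ), summed over independent dimensions."""
    return sum(0.5 * (s * s + m * m - 1.0) - math.log(s)
               for m, s in zip(means, stddevs))


def complexity_gain(current, updated):
    """First KL (updated posterior vs. prior) minus second KL (current posterior vs. prior).

    `current` and `updated` are (means, stddevs) pairs of posterior parameters
    before and after training on the selected batch.
    """
    first_kl = kl_to_standard_normal(*updated)
    second_kl = kl_to_standard_normal(*current)
    return first_kl - second_kl


# Example: the posterior starts equal to the prior, then one mean moves to 1.0.
current_params = ([0.0, 0.0], [1.0, 1.0])
updated_params = ([1.0, 0.0], [1.0, 1.0])
gain = complexity_gain(current_params, updated_params)
```

A positive value means the posterior moved further from the prior during the update, which this family of measures treats as an increase in model complexity.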

Assignees

Deepmind Tech Ltd

Inventors

Classifications

  • G06N3/08 · Primary

    Learning methods · CPC title

  • Hyperparameter optimisation; Meta-learning; Learning-to-learn · CPC title

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • G06N3/09 · Primary

    Supervised learning · CPC title

  • Recurrent networks, e.g. Hopfield networks · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10936949B2 cover?
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a machine learning model. In one aspect, a method includes receiving training data for training the machine learning model on a plurality of tasks, where each task includes multiple batches of training data. A task is selected in accordance with a current task selection policy. A batc…
Who is the assignee on this patent?
Deepmind Tech Ltd
What technology area does this patent fall under?
Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 2, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).