What technology area does this patent fall under?

Primary CPC classification G06N3/08. Mapped technology areas include Physics.

When was this patent published?

Publication date: Mar 2, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Training machine learning models using task selection policies to increase learning progress

US10936949B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10936949-B2
Application number	US-201916508042-A
Country	US
Kind code	B2
Filing date	Jul 10, 2019
Priority date	Feb 24, 2017
Publication date	Mar 2, 2021
Grant date	Mar 2, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: what the patent document calls the invention.
Abstract: a short plain-language summary of the technical disclosure.
Assignees and inventors: who owns or filed the patent and who is credited as inventor.
Key dates: filing, priority, publication, and grant dates set the timeline.
First independent claim: the legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: technology tags used to group this patent with similar filings.
Citations and related patents: prior-art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a machine learning model. In one aspect, a method includes receiving training data for training the machine learning model on a plurality of tasks, where each task includes multiple batches of training data. A task is selected in accordance with a current task selection policy. A batch of training data is selected from the selected task. The machine learning model is trained on the selected batch of training data to determine updated values of the model parameters. A learning progress measure that represents a progress of the training of the machine learning model as a result of training the machine learning model on the selected batch of training data is determined. The current task selection policy is updated using the learning progress measure.
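The abstract describes a loop: select a task with the current policy, select a batch from that task, train on it, measure learning progress, and update the policy. The sketch below instantiates that loop under stated assumptions. The policy is an Exp3 adversarial bandit (the abstract does not name a bandit algorithm), the model is a single scalar weight fit by SGD on toy regression tasks, learning progress is the loss decrease on the selected batch (one of the variants in the dependent claims), and the payoff is that progress clipped to [-1, 1]. All names (`Exp3TaskSelector`, `make_task`, `mse`, `sgd_step`, `train`) are hypothetical and not taken from the patent.

```python
import math
import random


class Exp3TaskSelector:
    """Hypothetical task-selection policy: an Exp3 bandit over tasks.

    Exp3 is an assumed choice; the abstract only says the policy is
    updated from a learning progress measure.
    """

    def __init__(self, num_tasks, gamma=0.1, seed=0):
        self.num_tasks = num_tasks
        self.gamma = gamma  # exploration rate (assumed value)
        self.log_weights = [0.0] * num_tasks
        self.rng = random.Random(seed)

    def probabilities(self):
        # Softmax over log-weights, mixed with uniform exploration.
        m = max(self.log_weights)
        exps = [math.exp(w - m) for w in self.log_weights]
        total = sum(exps)
        return [(1 - self.gamma) * e / total + self.gamma / self.num_tasks
                for e in exps]

    def select(self):
        probs = self.probabilities()
        task = self.rng.choices(range(self.num_tasks), weights=probs)[0]
        return task, probs[task]

    def update(self, task, prob, payoff):
        # Importance-weighted payoff estimate, as in standard Exp3.
        estimate = payoff / prob
        self.log_weights[task] += self.gamma * estimate / self.num_tasks


def make_task(slope, noise, rng, num_batches=20, batch_size=8):
    """Hypothetical toy task: batches of (x, y) with y = slope * x + noise."""
    batches = []
    for _ in range(num_batches):
        xs = [rng.uniform(-1, 1) for _ in range(batch_size)]
        batches.append([(x, slope * x + rng.gauss(0, noise)) for x in xs])
    return batches


def mse(w, batch):
    return sum((w * x - y) ** 2 for x, y in batch) / len(batch)


def sgd_step(w, batch, lr=0.5):
    grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
    return w - lr * grad


def train(tasks, iterations=200, seed=0):
    rng = random.Random(seed)
    selector = Exp3TaskSelector(len(tasks), seed=seed)
    w = 0.0  # initial value of the single model parameter
    for _ in range(iterations):
        task, prob = selector.select()            # select a task
        batch = rng.choice(tasks[task])           # select a batch from it
        loss_before = mse(w, batch)
        w = sgd_step(w, batch)                    # train on the batch
        progress = loss_before - mse(w, batch)    # learning progress measure
        payoff = max(-1.0, min(1.0, progress))    # assumed payoff rescaling
        selector.update(task, prob, payoff)       # update the policy
    return w, selector


if __name__ == "__main__":
    data_rng = random.Random(1)
    tasks = [make_task(2.0, 0.0, data_rng), make_task(-1.0, 1.0, data_rng)]
    w, selector = train(tasks)
    print(w, selector.probabilities())
```

Mixing in uniform exploration keeps every task's selection probability at least `gamma / num_tasks`, so a task whose progress is currently low is still revisited occasionally.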

First claim

Opening claim text (preview).

What is claimed is:

1. A method of training a machine learning model having a plurality of model parameters to determine trained values of the model parameters from initial values of the model parameters, wherein values of the model parameters are defined by a posterior distribution over possible values of the model parameters, the method comprising: receiving training data for training the machine learning model on a plurality of tasks, wherein each task comprises a respective plurality of batches of training data; and training the machine learning model on the training data, wherein during the training, posterior distribution parameters that parameterize the posterior distribution are optimized such that the trained values of the model parameters are defined by trained values of the posterior distribution parameters, wherein the training comprises, at each of a plurality of training iterations: selecting a task from the plurality of tasks in accordance with a current task selection policy; selecting a batch of training data from the plurality of batches of training data for the selected task; training the machine learning model on the selected batch of training data to determine updated values of the model parameters from current values of the model parameters, comprising training the machine learning model on the selected batch of training data to determine adjusted values of the posterior distribution parameters from current values of the posterior distribution parameters; determining a learning progress measure that represents a progress of the training of the machine learning model as a result of training the machine learning model on the selected batch of training data; and updating the current task selection policy based on the learning progress measure, comprising: determining a payoff achieved at the training iteration from the learning progress measure; and updating the current task selection policy using the payoff to encourage selection of tasks that maximize a cumulative measure of payoffs achieved over the plurality of training iterations.

2. The method of claim 1, wherein training the machine learning model on the selected batch comprises training the machine learning model to decrease a loss on the selected batch as measured by a loss function, and wherein the learning progress measure is based on a decrease in the loss as a result of training the machine learning model on the selected batch of training data.

3. The method of claim 2, wherein determining the learning progress measure comprises: determining a first loss on the selected batch in accordance with the current values of the model parameters; and determining a second loss on the selected batch in accordance with the updated values of the model parameters, and wherein the learning progress measure comprises a difference between the first loss and the second loss.

4. The method of claim 1, wherein determining the learning progress measure comprises: sampling a new batch from the plurality of batches in the selected task; determining a first loss on the new batch in accordance with the current values of the model parameters; and determining a second loss on the new batch in accordance with the updated values of the model parameters, and wherein the learning progress measure comprises a difference between the first loss and the second loss.

5. The method of claim 1, wherein one of the tasks is identified as a target task that includes training inputs that are most similar to inputs to be processed by the machine learning model after the training of the machine learning model on the training data, wherein determining the learning progress measure comprises: sampling a new batch from the plurality of batches in the target task; determining a first loss on the new batch in accordance with the current values of the model parameters; and determining a second loss on the new batch in accordance with the updated values of the model parameters, and wherein the learning progress measure comprises a difference between the first loss and the second loss.

6. The method of claim 1, wherein determining the learning progress measure comprises: sampling a task randomly from the plurality of tasks; sampling a new batch from the plurality of batches in the sampled task; determining a first loss on the new batch in accordance with the current values of the model parameters; and determining a second loss on the new batch in accordance with the updated values of the model parameters, and wherein the learning progress measure comprises a difference between the first loss and the second loss.

7. The method of claim 1, wherein the learning progress measure comprises a norm of a gradient vector of gradients of the loss function with respect to the model parameters generated by training the machine learning model on the selected batch.

8. The method of claim 1, wherein the learning progress measure is based on an increase in model complexity of the machine learning model as a result of training the machine learning model on the selected batch of training data.

9. The method of claim 1, wherein determining the learning progress measure comprises: determining a first Kullback-Leibler (KL) divergence between (i) the posterior distribution as defined by the updated values of the posterior distribution parameters and (ii) a prior distribution over possible values for the model parameters; and determining a second KL divergence between (i) the posterior distribution as defined by the current values of the posterior distribution parameters and (ii) a prior distribution over possible values for the model parameters, and wherein the learning progress measure comprises a difference between the first KL divergence and the second KL divergence.

10. The method of claim 9, wherein the prior distribution is defined by prior distribution parameters, wherein training the machine learning model on the selected batch of training data comprises determining adjusted values of the prior distribution parameters from current values of the prior distribution parameters, wherein the first KL divergence is a KL divergence between (i) the posterior distribution as defined by the updated values of the posterior distribution parameters and (ii) the prior distribution as defined by the updated values of the prior distribution parameters, and wherein the second KL divergence is a KL divergence between (i) the posterior distribution as defined by the current values of the posterior distribution parameters and (ii) the prior distribution as defined by the current values of the prior distribution parameters.

11. The method of claim 1, wherein training the machine learning model on the selected batch of training data comprises determining adjusted values of prior distribution parameters from current values of prior distribution parameters, wherein the prior distribution parameters parametrize a prior distribution over possible values for the model parameters, and wherein the learning progress measure is based on a) a gradient with respect to the posterior distribution parameters and the prior distribution parameters of a KL divergence between (i) the posterior distribution as defined by the current values of the posterior distribution parameters and (ii) the prior distribution as defined by the current values of the prior distribution parameters and b) a gradient with respect to the posterior distribution parameters of the expectation of a loss on the selected batch as measured by a loss function.

12. The method of claim 1, wherein the learning progress measure is based on a difference between a first norm of a vector of
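Dependent claims 3 to 9 name several concrete learning progress measures. The minimal pure-Python sketch below covers three of them: a loss difference before and after the update (claims 3 to 6), a gradient norm (claim 7), and a difference of KL divergences to the prior (claim 9). It assumes parameters are plain lists of floats and, for the KL measure, that posterior and prior are diagonal Gaussians given as (means, standard deviations) pairs; the claims do not fix the distribution family. All function names are hypothetical.

```python
import math


def loss_difference(loss_fn, params_before, params_after, batch):
    """Claims 3-6 style: loss under current params minus loss under updated params.

    The claims differ only in which batch is evaluated here.
    """
    return loss_fn(params_before, batch) - loss_fn(params_after, batch)


def gradient_norm(grads):
    """Claim 7 style: L2 norm of the gradient vector from the training step."""
    return math.sqrt(sum(g * g for g in grads))


def kl_diag_gaussian(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) for diagonal Gaussians, summed over parameters."""
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        total += (math.log(sp / sq)
                  + (sq ** 2 + (mq - mp) ** 2) / (2 * sp ** 2)
                  - 0.5)
    return total


def complexity_gain(post_before, post_after, prior):
    """Claim 9 style: KL(updated posterior || prior) - KL(current posterior || prior).

    Each argument is an assumed (means, std_devs) pair of lists.
    """
    return (kl_diag_gaussian(*post_after, *prior)
            - kl_diag_gaussian(*post_before, *prior))


if __name__ == "__main__":
    prior = ([0.0, 0.0], [1.0, 1.0])
    after = ([0.5, -0.5], [0.8, 0.8])
    print(complexity_gain(prior, after, prior))
    print(gradient_norm([3.0, 4.0]))  # 5.0
```

Passing the training batch itself to `loss_difference` gives the claim 3 variant; passing a freshly sampled batch from the selected task, from a target task, or from a randomly sampled task gives the claim 4, 5, and 6 variants respectively.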

Assignees

Deepmind Tech Ltd


Classifications

G06N3/08 (primary): Learning methods
G06N3/0985: Hyperparameter optimisation; Meta-learning; Learning-to-learn
G06N3/0442: characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
G06N3/09 (primary): Supervised learning
G06N3/044: Recurrent networks, e.g. Hopfield networks

Patent family

Related publications grouped by family.

Family ID: 61244625


Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?: Deepmind Tech Ltd