Deep reinforcement learning for robotic manipulation
US-2019232488-A1 · Aug 1, 2019 · US
US10936949B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10936949-B2 |
| Application number | US-201916508042-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 10, 2019 |
| Priority date | Feb 24, 2017 |
| Publication date | Mar 2, 2021 |
| Grant date | Mar 2, 2021 |
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a machine learning model. In one aspect, a method includes receiving training data for training the machine learning model on a plurality of tasks, where each task includes multiple batches of training data. A task is selected in accordance with a current task selection policy. A batch of training data is selected from the selected task. The machine learning model is trained on the selected batch of training data to determine updated values of the model parameters. A learning progress measure that represents a progress of the training of the machine learning model as a result of training the machine learning model on the selected batch of training data is determined. The current task selection policy is updated using the learning progress measure.
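The abstract describes a loop: select a task with the current policy, select a batch from that task, train on it, measure learning progress, and update the policy. The following is a minimal, hypothetical Python rendering of that loop under stated assumptions. The model is a single scalar weight trained by SGD on toy regression tasks. Learning progress is the loss decrease on the selected batch, as in claim 3. Payoffs are clipped into [0, 1]. A plain Exp3 bandit stands in for the task selection policy. The names `make_tasks`, `batch_loss`, `sgd_step`, `Exp3Policy`, and `train_with_curriculum` are illustrative and do not come from the patent. The document also names no bandit algorithm, so Exp3 is swapped in here as one common way to favor tasks that maximize cumulative payoff. The sketch uses a point estimate rather than the posterior distribution recited in claim 1.

```python
import math
import random

def make_tasks(input_scales, slope=2.0, batches_per_task=20, batch_size=8, seed=0):
    """Hypothetical toy tasks: y = slope * x with x ~ Uniform(-s, s) for each scale s."""
    rng = random.Random(seed)
    tasks = []
    for s in input_scales:
        batches = []
        for _ in range(batches_per_task):
            xs = [rng.uniform(-s, s) for _ in range(batch_size)]
            batches.append([(x, slope * x) for x in xs])
        tasks.append(batches)
    return tasks

def batch_loss(theta, batch):
    """Mean squared error of the one-parameter model theta * x on a batch."""
    return sum((theta * x - y) ** 2 for x, y in batch) / len(batch)

def sgd_step(theta, batch, lr):
    """One gradient-descent step on batch_loss."""
    grad = sum(2.0 * (theta * x - y) * x for x, y in batch) / len(batch)
    return theta - lr * grad

class Exp3Policy:
    """Exp3 bandit over tasks: an assumed stand-in for the task selection policy."""

    def __init__(self, num_tasks, gamma=0.2, rng=None):
        self.k = num_tasks
        self.gamma = gamma  # exploration rate: every task keeps probability >= gamma / k
        self.log_weights = [0.0] * num_tasks
        self.rng = rng if rng is not None else random.Random(0)

    def probs(self):
        m = max(self.log_weights)  # subtract the max for numerical stability
        w = [math.exp(lw - m) for lw in self.log_weights]
        total = sum(w)
        return [(1 - self.gamma) * wi / total + self.gamma / self.k for wi in w]

    def select(self):
        return self.rng.choices(range(self.k), weights=self.probs())[0]

    def update(self, task, payoff):
        # Importance-weighted Exp3 update; payoff must already lie in [0, 1].
        p = self.probs()[task]
        self.log_weights[task] += self.gamma * (payoff / p) / self.k

def train_with_curriculum(tasks, iterations=300, lr=0.05, seed=0):
    rng = random.Random(seed)
    policy = Exp3Policy(len(tasks), rng=rng)
    theta = 0.0  # initial value of the single model parameter
    counts = [0] * len(tasks)
    for _ in range(iterations):
        task = policy.select()                 # select a task per the current policy
        batch = rng.choice(tasks[task])        # select a batch from that task
        loss_before = batch_loss(theta, batch)
        theta = sgd_step(theta, batch, lr)     # train on the selected batch
        loss_after = batch_loss(theta, batch)
        progress = loss_before - loss_after    # learning progress: decrease in loss
        payoff = max(0.0, min(1.0, progress))  # assumed clipping of the payoff into [0, 1]
        policy.update(task, payoff)            # update the task selection policy
        counts[task] += 1
    return theta, policy, counts

if __name__ == "__main__":
    tasks = make_tasks([0.1, 1.0, 3.0])
    theta, policy, counts = train_with_curriculum(tasks)
    print(theta, policy.probs(), counts)
```

All three toy tasks share the same slope and differ only in input range. The wide-range task therefore tends to produce the largest early loss decreases, so the policy tends to favor it until the parameter converges and every payoff shrinks toward zero.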
Claims (opening text, preview)
What is claimed is:

1. A method of training a machine learning model having a plurality of model parameters to determine trained values of the model parameters from initial values of the model parameters, wherein values of the model parameters are defined by a posterior distribution over possible values of the model parameters, the method comprising: receiving training data for training the machine learning model on a plurality of tasks, wherein each task comprises a respective plurality of batches of training data; and training the machine learning model on the training data, wherein during the training, posterior distribution parameters that parameterize the posterior distribution are optimized such that the trained values of the model parameters are defined by trained values of the posterior distribution parameters, wherein the training comprises, at each of a plurality of training iterations: selecting a task from the plurality of tasks in accordance with a current task selection policy; selecting a batch of training data from the plurality of batches of training data for the selected task; training the machine learning model on the selected batch of training data to determine updated values of the model parameters from current values of the model parameters, comprising training the machine learning model on the selected batch of training data to determine adjusted values of the posterior distribution parameters from current values of the posterior distribution parameters; determining a learning progress measure that represents a progress of the training of the machine learning model as a result of training the machine learning model on the selected batch of training data; and updating the current task selection policy based on the learning progress measure, comprising: determining a payoff achieved at the training iteration from the learning progress measure; and updating the current task selection policy using the payoff to encourage selection of tasks that maximize a cumulative measure of payoffs achieved over the plurality of training iterations.

2. The method of claim 1, wherein training the machine learning model on the selected batch comprises training the machine learning model to decrease a loss on the selected batch as measured by a loss function, and wherein the learning progress measure is based on a decrease in the loss as a result of training the machine learning model on the selected batch of training data.

3. The method of claim 2, wherein determining the learning progress measure comprises: determining a first loss on the selected batch in accordance with the current values of the model parameters; and determining a second loss on the selected batch in accordance with the updated values of the model parameters, and wherein the learning progress measure comprises a difference between the first loss and the second loss.

4. The method of claim 1, wherein determining the learning progress measure comprises: sampling a new batch from the plurality of batches in the selected task; determining a first loss on the new batch in accordance with the current values of the model parameters; and determining a second loss on the new batch in accordance with the updated values of the model parameters, and wherein the learning progress measure comprises a difference between the first loss and the second loss.

5. The method of claim 1, wherein one of the tasks is identified as a target task that includes training inputs that are most similar to inputs to be processed by the machine learning model after the training of the machine learning model on the training data, wherein determining the learning progress measure comprises: sampling a new batch from the plurality of batches in the target task; determining a first loss on the new batch in accordance with the current values of the model parameters; and determining a second loss on the new batch in accordance with the updated values of the model parameters, and wherein the learning progress measure comprises a difference between the first loss and the second loss.

6. The method of claim 1, wherein determining the learning progress measure comprises: sampling a task randomly from the plurality of tasks; sampling a new batch from the plurality of batches in the sampled task; determining a first loss on the new batch in accordance with the current values of the model parameters; and determining a second loss on the new batch in accordance with the updated values of the model parameters, and wherein the learning progress measure comprises a difference between the first loss and the second loss.

7. The method of claim 1, wherein the learning progress measure comprises a norm of a gradient vector of gradients of the loss function with respect to the model parameters generated by training the machine learning model on the selected batch.

8. The method of claim 1, wherein the learning progress measure is based on an increase in model complexity of the machine learning model as a result of training the machine learning model on the selected batch of training data.

9. The method of claim 1, wherein determining the learning progress measure comprises: determining a first Kullback-Leibler (KL) divergence between (i) the posterior distribution as defined by the updated values of the posterior distribution parameters and (ii) a prior distribution over possible values for the model parameters; and determining a second KL divergence between (i) the posterior distribution as defined by the current values of the posterior distribution parameters and (ii) a prior distribution over possible values for the model parameters, and wherein the learning progress measure comprises a difference between the first KL divergence and the second KL divergence.

10. The method of claim 9, wherein the prior distribution is defined by prior distribution parameters, wherein training the machine learning model on the selected batch of training data comprises determining adjusted values of the prior distribution parameters from current values of the prior distribution parameters, wherein the first KL divergence is a KL divergence between (i) the posterior distribution as defined by the updated values of the posterior distribution parameters and (ii) the prior distribution as defined by the updated values of the prior distribution parameters, and wherein the second KL divergence is a KL divergence between (i) the posterior distribution as defined by the current values of the posterior distribution parameters and (ii) the prior distribution as defined by the current values of the prior distribution parameters.

11. The method of claim 1, wherein training the machine learning model on the selected batch of training data comprises determining adjusted values of prior distribution parameters from current values of prior distribution parameters, wherein the prior distribution parameters parametrize a prior distribution over possible values for the model parameters, and wherein the learning progress measure is based on a) a gradient with respect to the posterior distribution parameters and the prior distribution parameters of a KL divergence between (i) the posterior distribution as defined by the current values of the posterior distribution parameters and (ii) the prior distribution as defined by the current values of the prior distribution parameters and b) a gradient with respect to the posterior distribution parameters of the expectation of a loss on the selected batch as measured by a loss function.

12. The method of claim 1, wherein the learning progress measure is based on a difference between a first norm of a vector of
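Claims 3 through 10 recite alternative learning progress measures. The following is a minimal sketch of three of them: a loss decrease (claims 3 to 6), a gradient norm (claim 7), and a KL-divergence difference, or "complexity gain" (claims 9 and 10). It assumes diagonal Gaussian posterior and prior distributions so the KL divergence has a closed form. The claims do not fix a distribution family. The function names `loss_decrease`, `gradient_norm`, `kl_diag_gaussians`, and `complexity_gain` are hypothetical.

```python
import math

def loss_decrease(loss_fn, params_before, params_after, batch):
    """Claims 3-6 style: loss on a batch under current params minus loss under updated params."""
    return loss_fn(params_before, batch) - loss_fn(params_after, batch)

def gradient_norm(grad):
    """Claim 7 style: Euclidean norm of the gradient vector produced by the training step."""
    return math.sqrt(sum(g * g for g in grad))

def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) for diagonal Gaussians, summed over independent parameters."""
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2.0 * sp ** 2) - 0.5
    return total

def complexity_gain(post_before, post_after, prior_before, prior_after=None):
    """KL(updated posterior || prior) minus KL(current posterior || prior).

    Each distribution is a (means, std_devs) pair. With prior_after=None the prior
    is held fixed (claim 9 style); passing prior_after models claim 10, where the
    prior parameters are also adjusted during training.
    """
    if prior_after is None:
        prior_after = prior_before
    return (kl_diag_gaussians(*post_after, *prior_after)
            - kl_diag_gaussians(*post_before, *prior_before))
```

For example, with a standard normal prior over one parameter, moving the posterior mean from 0 to 1 at unit variance gives `complexity_gain(([0.0], [1.0]), ([1.0], [1.0]), ([0.0], [1.0]))`. This is the closed-form KL of 0.5 nats, because the starting posterior equals the prior.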
CPC titles:
Learning methods
Hyperparameter optimisation; Meta-learning; Learning-to-learn
Supervised learning
Recurrent networks, e.g. Hopfield networks
Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]