Training machine learning models by determining update rules using recurrent neural networks

US11615310B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11615310-B2
Application number  US-201716302592-A
Country             US
Kind code           B2
Filing date         May 19, 2017
Priority date       May 20, 2016
Publication date    Mar 28, 2023
Grant date          Mar 28, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on computer storage media for training machine learning models. One method includes obtaining a machine learning model, wherein the machine learning model comprises one or more model parameters, and the machine learning model is trained using gradient descent techniques to optimize an objective function; determining an update rule for the model parameters using a recurrent neural network (RNN); and applying a determined update rule for a final time step in a sequence of multiple time steps to the model parameters.
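As a rough illustration of the loop the abstract describes, the Python sketch below lets a small recurrent cell produce the per-parameter update at each time step in place of a hand-designed rule. The toy quadratic objective, the single-unit tanh cell, the function names, and the hand-picked values in `phi` are all assumptions made for illustration; the patent text on this page does not specify them, and in this sketch the RNN parameters are held fixed rather than trained.

```python
import math

def objective(theta):
    # Hypothetical toy objective: f(theta) = sum of squares.
    return sum(x * x for x in theta)

def gradient(theta):
    # Gradient of the toy objective with respect to each parameter.
    return [2.0 * x for x in theta]

def rnn_step(grad_i, h_i, phi):
    """One step of a tiny recurrent cell applied to a single coordinate.

    phi = (w_in, w_rec, w_out) is shared by every coordinate; h_i is the
    hidden state kept separately for this coordinate.
    """
    w_in, w_rec, w_out = phi
    h_next = math.tanh(w_in * grad_i + w_rec * h_i)
    g_i = w_out * h_next  # update proposed for this coordinate
    return g_i, h_next

def run_learned_optimizer(theta, phi, num_steps):
    hidden = [0.0] * len(theta)  # one hidden state per model parameter
    losses = [objective(theta)]
    for _ in range(num_steps):
        grads = gradient(theta)
        new_theta = []
        for i, (theta_i, grad_i) in enumerate(zip(theta, grads)):
            g_i, hidden[i] = rnn_step(grad_i, hidden[i], phi)
            new_theta.append(theta_i + g_i)  # theta_{t+1} = theta_t + g_t
        theta = new_theta
        losses.append(objective(theta))
    return theta, losses

# Hand-picked (untrained) RNN parameters, assumed for illustration only.
phi = (0.5, 0.1, -0.1)
final_theta, losses = run_learned_optimizer([1.0, -2.0, 0.5], phi, num_steps=50)
print(losses[0], losses[-1])
```

Because the same `phi` is applied to every coordinate and each coordinate keeps its own hidden state, reordering the parameters only reorders the result.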

First claim

Opening claim text (preview).

What is claimed is:

1. A method implemented by one or more computers, comprising:

  obtaining a machine learning model, wherein (i) the machine learning model has a plurality of model parameters, and (ii) the machine learning model is trained using gradient descent techniques to optimize an objective function; and

  for each time step in a plurality of time steps:

    determining an update rule for the plurality of model parameters for the time step using a recurrent neural network (RNN) having a plurality of RNN parameters, wherein the RNN is different from the machine learning model and the RNN parameters are different from the plurality of model parameters, wherein the RNN is configured to operate coordinate-wise with respect to the plurality of model parameters, wherein operating coordinate-wise with respect to the plurality of model parameters comprises operating the RNN independently on each of the plurality of model parameters of the machine learning model, and wherein the determining comprises:

      for each particular model parameter of the plurality of model parameters, processing, using the RNN and in accordance with values of the RNN parameters for the time step, a parameter-specific input that is specific for the particular model parameter of the plurality of model parameters for the time step that comprises a gradient of the objective function with respect to the particular model parameter for the time step to generate a respective RNN output for the particular model parameter for the time step that specifies the update rule for the particular model parameter of the plurality of model parameters for the time step, wherein the RNN shares one or more of the plurality of RNN parameters across the plurality of model parameters and maintains a separate hidden state for each particular model parameter of the plurality of model parameters;

    applying the update rule for the time step generated by the RNN to values of the plurality of model parameters for the time step to update the values of the model parameters; and

    training the RNN on an RNN objective function that depends on respective values of the plurality of model parameters that have been at the time step and at each of one or more preceding time steps in the plurality of time steps, comprising determining an update to the values of the RNN parameters at the time step that minimizes the RNN objective function for the time step using gradient descent techniques.

2. The method of claim 1, wherein applying the update rule for a final time step in the plurality of time steps to the plurality of model parameters generates trained values of the plurality of model parameters.

3. The method of claim 1, wherein the machine learning model comprises a neural network.

4. The method of claim 1, wherein the determined update rule for the plurality of model parameters that minimizes the objective function is given by $\theta_{t+1} = \theta_t + g_t(\nabla f(\theta_t), \phi)$, wherein $\theta_t$ represents values of the plurality of model parameters at time $t$, $\nabla f(\theta_t)$ represents the gradient of objective function $f$, $\phi$ represents RNN parameters and $g_t$ represents the RNN output for a time step $t$.

5. The method of claim 1, wherein the RNN implements separate activations for each model parameter of the plurality of model parameters.

6. The method of claim 1, wherein the RNN is a long short-term memory (LSTM) neural network.

7. The method of claim 6, wherein the LSTM neural network comprises two LSTM layers.

8. The method of claim 6, wherein the LSTM neural network shares one or more of the plurality of RNN parameters across different coordinates of the objective function.

9. The method of claim 6, wherein a subset of cells in each of one or more LSTM layers of the LSTM neural network comprise global average units, wherein a global average unit is a unit whose update includes averaging activations of the global average units globally at each time step across different coordinates of the objective function.

10. The method of claim 1, wherein the RNN is invariant to an order of the plurality of model parameters.

11. The method of claim 1, further comprising providing a previous hidden state of the RNN as input to the RNN at each time step.

12. The method of claim 1, wherein, at each time step, the update rule for the time step depends on the hidden state of the RNN for the time step.

13. The method of claim 1, wherein the RNN objective function is given by $\mathcal{L}(\phi) = \mathbb{E}_f\left[\sum_{t=1}^{T} w_t f(\theta_t)\right]$ where $\theta_{t+1} = \theta_t + g_t$, $\begin{bmatrix} g_t \\ h_{t+1} \end{bmatrix} = m(\nabla_t, h_t,$ …
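To make the RNN objective in the claims more concrete, the following Python sketch evaluates a weighted sum of objective values along an unrolled trajectory, averages it over a few sample objectives, and adjusts the shared RNN parameters by gradient descent on that quantity. Several things here are assumptions for illustration and are not taken from the patent text: the quadratic task family in `TASKS`, the single-unit tanh cell standing in for the RNN, weights w_t = 1, the use of central finite differences in place of backpropagated gradients, and training over whole unrolled trajectories rather than at each time step.

```python
import math

def unrolled_loss(phi, scales, theta0, num_steps):
    """Sum over t of w_t * f(theta_t), with w_t = 1, for one objective f.

    f(theta) = sum_i scales[i] * theta_i ** 2 is a hypothetical toy objective.
    """
    w_in, w_rec, w_out = phi
    theta = list(theta0)
    hidden = [0.0] * len(theta)  # separate hidden state per coordinate
    total = 0.0
    for _ in range(num_steps):
        for i, a in enumerate(scales):
            grad = 2.0 * a * theta[i]  # d f / d theta_i
            hidden[i] = math.tanh(w_in * grad + w_rec * hidden[i])
            theta[i] += w_out * hidden[i]  # theta_{t+1} = theta_t + g_t
        total += sum(a * x * x for a, x in zip(scales, theta))
    return total

def meta_loss(phi, tasks, theta0, num_steps):
    # The expectation over objectives f is approximated by a plain average.
    losses = [unrolled_loss(phi, s, theta0, num_steps) for s in tasks]
    return sum(losses) / len(losses)

def numerical_grad(phi, tasks, theta0, num_steps, eps=1e-5):
    # Central finite differences; a stand-in for backpropagation through time.
    grad = []
    for k in range(len(phi)):
        up, down = list(phi), list(phi)
        up[k] += eps
        down[k] -= eps
        diff = (meta_loss(up, tasks, theta0, num_steps)
                - meta_loss(down, tasks, theta0, num_steps))
        grad.append(diff / (2.0 * eps))
    return grad

def train_optimizer(phi, tasks, theta0, num_steps, meta_steps, lr):
    phi = list(phi)
    for _ in range(meta_steps):
        g = numerical_grad(phi, tasks, theta0, num_steps)
        phi = [p - lr * gk for p, gk in zip(phi, g)]  # descent step on phi
    return phi

TASKS = [[1.0, 0.5], [0.3, 2.0]]  # per-coordinate curvatures (assumed)
THETA0 = [1.0, -1.0]
phi_init = [0.5, 0.0, -0.05]

phi_trained = train_optimizer(phi_init, TASKS, THETA0,
                              num_steps=5, meta_steps=20, lr=0.001)
loss_before = meta_loss(phi_init, TASKS, THETA0, num_steps=5)
loss_after = meta_loss(phi_trained, TASKS, THETA0, num_steps=5)
print(loss_before, loss_after)
```

Summing the objective over every step of the trajectory, rather than scoring only the final step, rewards parameter values that make progress early as well as late.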

Assignees

Deepmind Tech Ltd

Inventors

Classifications

  • Hyperparameter optimisation; Meta-learning; Learning-to-learn · CPC title

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • G06N3/084 (Primary)

    Backpropagation, e.g. using gradient descent · CPC title

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • Combinations of networks · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11615310B2 cover?
Methods, systems, and apparatus, including computer programs encoded on computer storage media for training machine learning models. One method includes obtaining a machine learning model, wherein the machine learning model comprises one or more model parameters, and the machine learning model is trained using gradient descent techniques to optimize an objective function; determining an update …
Who is the assignee on this patent?
Deepmind Tech Ltd
What technology area does this patent fall under?
Primary CPC classification G06N3/084. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 28, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).