Memory-efficient backpropagation through time

US11256990B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11256990-B2
Application numberUS-201716303101-A
CountryUS
Kind codeB2
Filing dateMay 19, 2017
Priority dateMay 20, 2016
Publication dateFeb 22, 2022
Grant dateFeb 22, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a recurrent neural network on training sequences using backpropagation through time. In one aspect, a method includes receiving a training sequence including a respective input at each of a number of time steps; obtaining data defining an amount of memory allocated to storing forward propagation information for use during backpropagation; determining, from the number of time steps in the training sequence and from the amount of memory allocated to storing the forward propagation information, a training policy for processing the training sequence, wherein the training policy defines when to store forward propagation information during forward propagation of the training sequence; and training the recurrent neural network on the training sequence in accordance with the training policy.

First claim

Opening claim text (preview).

What is claimed is: 1. A method performed by one or more computers for training a recurrent neural network on a plurality of training sequences using backpropagation through time, the method comprising: receiving a training sequence comprising a respective input at each of a number of time steps; determining a training policy for processing the training sequence, wherein the training policy defines when to store forward propagation information during forward propagation of the training sequence, the training policy being such as to balance a trade-off between caching of forward propagation information and re-computation of forward propagation information during backpropagation; wherein determining the training policy comprises: (i) obtaining data defining an amount of memory allocated to storing forward propagation information for use during backpropagation; (ii) determining forward propagation information in the form of candidates for storage, wherein the candidate forward propagation information can include hidden states for time steps of the recurrent neural network and/or internal states for time steps of the recurrent neural network; (iii) identifying one or more strategies for storing forward propagation information from memory, wherein each strategy includes a sequence of candidates for storage that will be stored if following the strategy; (iv) determining a computational cost for each strategy; wherein the computational cost Q 1 (t, m, y) of a strategy that includes saving a hidden state of a time step after choosing not to save hidden states for y time steps is given by: Q 1 ( t,m,y )= y+C ( y,m )+ C ( t−y,m− 1) where t is a number of time steps over which backpropagation is performed, m is a number of available memory units, C(y, m) is the lowest possible computational cost of backpropagating over a sequence of y time steps given the amount of allocated memory equal to m; and C(t−y, m−1) is the lowest possible computational cost of backpropagating over a sequence of t−y time steps given the amount of allocated memory equal to m−1, and the computational cost Q 2 (t, m, y) of a strategy that includes saving internal states of a time step after choosing not to save internal states for y time steps is given by: Q 2 ( t,m,y )= y+C ( y− 1, m )+ C ( t−y,m −α) where t is a number of time steps over which backpropagation is performed, m is a number of available memory units, α is a ratio of size of internal states for time steps to the size of hidden states for time steps, C(y−1, m) is the lowest possible computational cost of backpropagating over a sequence of y−1 time steps given the amount of allocated memory equal to m, and C(t−y, m−α) is the lowest possible computational cost of backpropagating over a sequence of t−y time steps given the amount of allocated memory equal to m−α; (v) selecting the strategy having the lowest computational cost; (vi) determining the position of next storage and type of forward propagation from the selected strategy; (vii) dividing the sequence of time steps in the training sequence to obtain a first subsequence and second subsequence, the first subsequence including time steps before the determined position of next storage, the second subsequence including time steps after the determined position of next storage; and recursively repeating steps (i)-(vii) for each obtained subsequence to select a position of next forward propagation information to save; the method further comprising: training the recurrent neural network on the training sequence in accordance with the training policy by: forward-propagating the inputs in the training sequence from a first time step in the sequence to a last time step in the sequence; during the forward propagating, storing forward propagation information in accordance with the training policy; and backpropagating gradients from the last time step in the sequence to the first time step in the sequence, comprising: determining, for each time step, whether additional forward propagation information is necessary to backpropagate the gradient for the time step and, if so, regenerating the additional forward propagation information using the stored forward propagation information. 2. The method of claim 1 , wherein the forward propagation information includes hidden states. 3. The method of claim 1 , wherein the forward propagation information includes internal states. 4. The method of claim 1 , wherein the forward propagation information includes both hidden states and internal states. 5. The method of claim 4 , wherein the training policy defines, for each time step, whether to store a hidden state, an internal state, or neither for the time step. 6. The method of claim 1 , wherein: determining the policy comprises: for each time slot, determining a computational cost of storing an associated piece of forward propagation information based on the number of time steps and the amount of memory allocated to storing the forward propagation information; adding to the training policy a set of pieces of forward propagation information that are determined to have the lowest computational cost; and training the recurrent neural network on the training sequence in accordance with the training policy comprises storing the set of pieces of forward propagation information that have been added to the training policy. 7. The method of claim 1 , further comprising: determining a computational cost of training the recurrent neural network on the training sequence in accordance with the policy. 8. The method of claim 7 , further comprising: providing data identifying the computational cost for presentation to a user. 9. One or more non-transitory computer storage media encoded with instructions that, when executed by one or more computers, cause the one or more computers to perform operations for training a recurrent neural network on a plurality of training sequences using backpropagation through time, the operations comprising: receiving a training sequence comprising a respective input at each of a number of time steps; determining a training policy for processing the training sequence, wherein the training policy defines when to store forward propagation information during forward propagation of the training sequence, the training policy being such as to balance a trade-off between caching of forward propagation information and re-computation of forward propagation information during backpropagation; wherein determining the training policy comprises: (i) obtaining data defining an amount of memory allocated to storing forward propagation information for use during backpropagation; (ii) determining forward propagation information in the form of candidates for storage, wherein the candidate forward propagation information can include hidden states for time steps of the recurrent neural network and/or internal states for time steps of the recurrent neural network; (iii) identifying one or more strategies for storing forward propagation information from memory, wherein each strategy includes a sequence of candidates for storage that will be stored if following the strategy; (iv) determining a computational cost for each strategy; wherein the computational cost Q 1 (t, m, y) of a strategy that includes saving a hidden state of a time step after choosing not to save hidden states for y time steps is given by: Q 1 ( t,m,y )= y+C ( y,m )+ C ( t−y,m− 1) where t is a number of time steps over which backpropagation is performed, m is a number of available memory units, C(y, m) is the lowest possible computational cost of backpropagating over a sequence of y time steps given the amount of allocated memory equal to m; and C(t−y, m−1) is the lowe

Assignees

Inventors

Classifications

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • Supervised learning · CPC title

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • G06N3/084Primary

    Backpropagation, e.g. using gradient descent · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11256990B2 cover?
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a recurrent neural network on training sequences using backpropagation through time. In one aspect, a method includes receiving a training sequence including a respective input at each of a number of time steps; obtaining data defining an amount of memory allocated to storing forward …
Who is the assignee on this patent?
Deepmind Tech Ltd
What technology area does this patent fall under?
Primary CPC classification G06N3/084. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Feb 22 2022 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).