Training recurrent neural networks to generate sequences

US11954594B1 · US · B1

Patent metadata
Publication number: US-11954594-B1
Application number: US-202117315695-A
Country: US
Kind code: B1
Filing date: May 10, 2021
Priority date: Jun 5, 2015
Publication date: Apr 9, 2024
Grant date: Apr 9, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

This document generally describes a neural network training system, including one or more computers, that trains a recurrent neural network (RNN) to receive an input, e.g., an input sequence, and to generate a sequence of outputs from the input sequence. In some implementations, training can include, for each position after an initial position in a training target sequence, selecting a preceding output of the RNN to provide as input to the RNN at the position, including determining whether to select as the preceding output (i) a true output in a preceding position in the output order or (ii) a value derived from an output of the RNN for the preceding position in an output order generated in accordance with current values of the parameters of the recurrent neural network.
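
The per-position selection described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `toy_model`, `choose_inputs`, `start_token`, and the parameter `epsilon` (the probability of feeding back the model's own prediction) are hypothetical names, and the stand-in model returns fixed scores rather than learned ones.

```python
import random

def toy_model(prev_token, vocab_size=4):
    """Hypothetical stand-in for one RNN step: returns one score per token."""
    # Scores peak at (prev_token + 1) mod vocab_size; a real model would be learned.
    return [1.0 if t == (prev_token + 1) % vocab_size else 0.0
            for t in range(vocab_size)]

def choose_inputs(target, epsilon, rng, start_token=0):
    """For each position after the initial one, pick the input fed to the model.

    With probability epsilon the input is derived from the model's own
    preceding scores (here, the highest-scoring token); otherwise it is the
    true preceding output taken from the target sequence.
    """
    inputs = []
    prev_scores = toy_model(start_token)  # scores at the initial position
    for pos in range(1, len(target)):
        predicted = max(range(len(prev_scores)), key=prev_scores.__getitem__)
        true_prev = target[pos - 1]
        chosen = predicted if rng.random() < epsilon else true_prev
        inputs.append(chosen)
        prev_scores = toy_model(chosen)  # scores for the current position
    return inputs

rng = random.Random(0)
teacher_forced = choose_inputs([3, 1, 2, 0], epsilon=0.0, rng=rng)
self_fed = choose_inputs([3, 1, 2, 0], epsilon=1.0, rng=rng)
```

With `epsilon=0.0` every input is the true preceding output; with `epsilon=1.0` every input comes from the model's own preceding scores. Intermediate values mix the two per position.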

First claim

Opening claim text (preview).

What is claimed is:

1. A method for training a sequence generation model, wherein: for each particular position after an initial position in an output order of a target sequence, the sequence generation model is configured (i) to receive an input that is based on a set of preceding output scores that the sequence generation model generated for a prediction at a preceding position in the output order and (ii) to generate a current set of output scores for the particular position in the output order, and the current set of output scores for the particular position in the output order comprises a respective score for each of a plurality of possible predictions for the particular position, and the method comprising: obtaining a plurality of training data pairs for the sequence generation model, each training data pair comprising a training input and a training target sequence for the training input, each training target sequence comprising a respective plurality of true outputs arranged according to an output order; and training the sequence generation model on the training data pairs, comprising, for each training data pair and for each particular position after an initial position in the output order of the training target sequence of the training data pair, selecting an input to provide to the sequence generation model for generating an output at the particular position in the output order of the training target sequence, wherein the input is selected from a group comprising (i) a non-predicted input that is based on the true output from a preceding position in the output order of the training target sequence of the training data pair, and (ii) a predicted input that is based on a set of preceding output scores that the sequence generation model generated at a preceding model.

2. The method of claim 1, wherein training the sequence generation model on the training data pairs comprises, for each training data pair and for each particular position after the initial position in the output order of the training target sequence of the training data pair: determining an error between the output generated by the sequence generation model at the particular position in the output order and the true output indicated by the training data pair for the particular position in the output order; and using the error to adjust values of the trainable parameters of the sequence generation model.

3. The method of claim 1, wherein the predicted input from a preceding position in the output order is a prediction from the preceding position in the output order that scored highest among all possible predictions for the preceding position.

4. The method of claim 1, wherein selecting the input to provide to the sequence generation model for generating the output at the particular position in the output order of the training target sequence comprises evaluating a stochastic function, wherein the stochastic function assigns a probability of 1−ε to the option of selecting the non-predicted input from the preceding position in the output order as the input to the sequence generation model at the particular position in the output order, and wherein the stochastic function assigns a probability of ε to the option of selecting the predicted input from the preceding position in the output order as the input to the sequence generation model at the particular position in the output order.

5. The method of claim 4, comprising increasing the value of ε as training of the sequence generation model progresses, such that relatively lower values of ε are applied earlier in the training of the sequence generation model and relatively higher values of ε are applied later in the training of the sequence generation model.

6. The method of claim 5, wherein increasing the value of ε comprises increasing the value of ε using linear decay.

7. The method of claim 5, wherein increasing the value of ε comprises increasing the value of ε using exponential decay or inverse sigmoid decay.

8. A system comprising one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations for training a sequence generation model, wherein: for each particular position after an initial position in an output order of a target sequence, the sequence generation model is configured (i) to receive an input that is based on a set of preceding output scores that the sequence generation model generated for a prediction at a preceding position in the output order and (ii) to generate a current set of output scores for the particular position in the output order, and the current set of output scores for the particular position in the output order comprises a respective score for each of a plurality of possible predictions for the particular position, and the operations comprising: obtaining a plurality of training data pairs for the sequence generation model, each training data pair comprising a training input and a training target sequence for the training input, each training target sequence comprising a respective plurality of true outputs arranged according to an output order; and training the sequence generation model on the training data pairs, comprising, for each training data pair and for each particular position after an initial position in the output order of the training target sequence of the training data pair, selecting an input to provide to the sequence generation model for generating an output at the particular position in the output order of the training target sequence, wherein the input is selected from a group comprising (i) a non-predicted input that is based on the true output from a preceding position in the output order of the training target sequence of the training data pair, and (ii) a predicted input that is based on a set of preceding output scores that the sequence generation model generated at a preceding model.

9. The system of claim 8, wherein training the sequence generation model on the training data pairs comprises, for each training data pair and for each particular position after the initial position in the output order of the training target sequence of the training data pair: determining an error between the output generated by the sequence generation model at the particular position in the output order and the true output indicated by the training data pair for the particular position in the output order; and using the error to adjust values of the trainable parameters of the sequence generation model.

10. The system of claim 8, wherein the predicted input from a preceding position in the output order is a prediction from the preceding position in the output order that scored highest among all possible predictions for the preceding position.

11. The system of claim 8, wherein selecting the input to provide to the sequence generation model for generating the output at the particular position in the output order of the training target sequence comprises evaluating a stochastic function, wherein the stochastic function assigns a probability of 1−ε to the option of selecting the non-predicted input from the preceding position in the output order as the input to the sequence generation model at the particular position in the output order, and wherein the stochastic function assigns a probability of ε to the option of selecting the predicted input from the preceding position in the output order as the input to the sequence generation model at the particular position in the output order.

12. The system of claim 11, wherein the operations comprise increasing the value of ε as
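
The schedules named in claims 5 to 7 can be illustrated with a short sketch. The claim text does not give formulas, so the curve shapes below are assumptions borrowed from forms commonly used for scheduled sampling; the function names and the constants `k`, `total_steps`, and `max_eps` are hypothetical. In each case the probability of using the true output (1−ε) decays, so ε itself rises as training progresses.

```python
import math

def epsilon_linear(step, total_steps, max_eps=1.0):
    """1 - epsilon decays linearly with the step, so epsilon rises linearly."""
    keep_true = max(1.0 - max_eps, 1.0 - step / total_steps)
    return 1.0 - keep_true

def epsilon_exponential(step, k=0.99):
    """1 - epsilon decays as k**step, for an assumed constant 0 < k < 1."""
    keep_true = k ** step
    return 1.0 - keep_true

def epsilon_inverse_sigmoid(step, k=100.0):
    """1 - epsilon decays as k / (k + exp(step / k)), for an assumed k >= 1."""
    exponent = min(step / k, 700.0)  # clamp to avoid float overflow in exp
    keep_true = k / (k + math.exp(exponent))
    return 1.0 - keep_true

# epsilon at a few training steps under each assumed schedule
steps = [0, 100, 500, 1000]
linear = [epsilon_linear(s, total_steps=1000) for s in steps]
exponential = [epsilon_exponential(s) for s in steps]
inverse_sigmoid = [epsilon_inverse_sigmoid(s) for s in steps]
```

All three start near zero, which corresponds to feeding mostly true outputs early in training, and approach their ceiling later, which corresponds to feeding mostly model predictions.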

Assignees

Google LLC

Inventors

Classifications

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]

  • G06N3/08 (Primary): Learning methods

  • Supervised learning

  • Auto-encoder networks; Encoder-decoder networks

  • Recurrent networks, e.g. Hopfield networks

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?
Publication date: April 9, 2024 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).