Cold fusing sequence-to-sequence models with language models

US11620986B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11620986-B2
Application number  US-202017061455-A
Country             US
Kind code           B2
Filing date         Oct 1, 2020
Priority date       May 19, 2017
Publication date    Apr 4, 2023
Grant date          Apr 4, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Described herein are systems and methods for generating natural language sentences with Sequence-to-sequence (Seq2Seq) models with attention. The Seq2Seq models may be implemented in applications, such as machine translation, image captioning, and speech recognition. Performance has further been improved by leveraging unlabeled data, often in the form of a language model. Disclosed herein are “Cold Fusion” architecture embodiments that leverage a pre-trained language model during training. The Seq2Seq models with Cold Fusion embodiments are able to better utilize language information, enjoying faster convergence, better generalization, and almost complete transfer to a new domain while using less labeled training data.

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for training a sequential-to-sequential (Seq2Seq) model, the method comprising: given a set of training input sequences, for each input sequence from the set of training input sequences: generating a language model (LM) output using a pretrained language model, which has been pretrained with one or more sets of training data to obtain the pretrained language model; generating a state output of the Seq2Seq model using the Seq2Seq model, in which the Seq2Seq model has not been pretrained; combining a gated LM output, which is generated using the LM output from the pretrained language model and a gate, with the state output obtained from the Seq2Seq model to form a fused state; and using outputs obtained using the fused states for the input sequences from the set of training input sequences to train the Seq2Seq model.

2. The computer-implemented method of claim 1 wherein at least some of the training data of at least one of the one or more sets of training data are unlabeled training data.

3. The computer-implemented method of claim 1 further comprising the step of: generating the language model output using a neural network that takes as input a logit output of the language model and outputs the language model output.

4. The computer-implemented method of claim 3 wherein neural network comprises an affine layer and activation.

5. The computer-implemented method of claim 1 wherein the step of using outputs obtained using the fused states for the input sequences from the set of training input sequences to train the Seq2Seq model comprises: for a fused state, inputting the fused state into a neural network that takes as input the fused state and outputs a neural network output.

6. The computer-implemented method of claim 5 wherein the neural network output of the neural network is fed into a softmax to generate the output used in training the Seq2Seq model.

7. A computer-implemented method comprising: receiving, at an encoder of a sequence-to-sequence (Seq2Seq) model, an input sequence; generating, by the encoder, an intermediate representation of the input sequence; receiving, with at least one recurrent layer within a decoder of the Seq2Seq model, the intermediate representation; generating, by the least one recurrent layer, a state of the Seq2Seq model based at least on the intermediate representation; combining the state from the Seq2Seq model with a gated LM state, which is generated using a gate and a LM state obtained using an output from a language model into a fused state; and generating an output using the fused state; wherein the Seq2Seq was trained without pretraining the Seq2Seq model but the language model was pretrained.

8. The computer-implemented method of claim 7 wherein the Seq2Seq model was trained by performing the steps comprising: receiving, at the encoder of the Seq2Seq model, which has not been pretrained, an input sequence; generating, by the encoder, an intermediate representation of the input sequence; receiving, with at least one recurrent layer within a decoder of the Seq2Seq model, the intermediate representation; generating, by the least one recurrent layer, a state of the Seq2Seq model based at least on the intermediate representation; combining a hidden state from the Seq2Seq model with a gated LM state, which is generated using a gate and a LM state obtained using an output from a language model that has been pretrained, into a fused state; generating a training output using the fused state; and using the training output in training the Seq2Seq model.

9. The computer-implemented method of claim 8 further comprises fine-tuning the Seq2Seq model with new data in a different domain.

10. The computer-implemented method of claim 7 wherein the output of the language model is the LM state, which represents an output state of the language model.

11. The computer-implemented method of claim 7 wherein the LM state is an output of a neural network that receives as input a logit output of the language model and outputs the LM state.

12. The computer-implemented method of claim 7 wherein the gated LM state and the hidden state from the Seq2Seq model are concatenated in generating the fused state.

13. The computer-implemented method of claim 7 wherein the step of generating a training output using the fused state comprises: using a neural network that receives the fused state as an input.

14. The computer-implemented method of claim 13 wherein the neural network comprises an affine layer prior to a softmax, the affine layer integrated with rectified linear unit (ReLU) activation.

15. A computer-implemented method comprising: generating an intermediate state for an input using a sequence-to-sequence (Seq2Seq) model that was trained by performing steps comprising: generating a training intermediate state for an input training sequence using the Seq2Seq model; obtaining a training fused state based at least on the training intermediate state of the Seq2Seq model and a training gated state, in which the training gated state is generated using a gate and a training language model output from a pretrained language model for the input training sequence; generating a training output using the training fused state; and using a training output in training the Seq2Seq model; obtaining a fused state based at least on the intermediate state of the Seq2Seq model and a gated state, in which the gated state is generated using a gate and a language model output from a language model for the input; and generating an output using the fused state.

16. The computer-implemented of claim 15 wherein the step of generating a training output using the fused state comprises: using a neural network that operates on the training fused state in generating the training output.

17. The computer-implemented of claim 16 wherein the neural network comprises an affine layer integrated with an activation and a softmax.

18. The computer-implemented of claim 15 wherein the output from the pretrained language model for the input sequence is a probability projection of the pretrained language model.

19. The computer-implemented of claim 15 wherein the language model is different from the pretrained language model.

20. The computer-implemented of claim 15 wherein the output is a natural language phrase or sentence.
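The fusion step recited in claims 1, 3 to 6, 12 and 14 can be illustrated with a minimal sketch of a single decoder time step. It is not the patented implementation. The claims do not say how the gate is computed, so the sigmoid of an affine map over the concatenated decoder state and LM state used here is an assumption, as are all function names, parameter names, dimensions and the random weights.

```python
import math
import random


def affine(W, b, x):
    """Return W @ x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(v):
    return [max(0.0, a) for a in v]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(a - m) for a in v]
    z = sum(e)
    return [a / z for a in e]


def rand_layer(rng, n_out, n_in):
    """Hypothetical helper: random weights and zero bias for one affine layer."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def cold_fusion_step(state, lm_logits, params):
    """One decoder step: fuse a Seq2Seq state with a gated LM state."""
    # LM logits -> LM state through an affine layer plus activation (claims 3, 4).
    h_lm = relu(affine(*params["lm_proj"], lm_logits))
    # ASSUMPTION: gate = sigmoid(affine([state; h_lm])); the claims only say "a gate".
    gate = sigmoid(affine(*params["gate"], state + h_lm))
    gated_lm = [g * h for g, h in zip(gate, h_lm)]
    # Fused state: decoder state concatenated with the gated LM state (claim 12).
    fused = state + gated_lm
    # Affine layer with ReLU, followed by a softmax (claims 5, 6, 14).
    probs = softmax(relu(affine(*params["out"], fused)))
    return probs, fused


rng = random.Random(0)
state_dim, lm_dim, vocab = 4, 3, 5  # illustrative sizes only
params = {
    "lm_proj": rand_layer(rng, lm_dim, vocab),
    "gate": rand_layer(rng, lm_dim, state_dim + lm_dim),
    "out": rand_layer(rng, vocab, state_dim + lm_dim),
}
state = [rng.uniform(-1, 1) for _ in range(state_dim)]      # stand-in decoder state
lm_logits = [rng.uniform(-1, 1) for _ in range(vocab)]      # stand-in LM logits
probs, fused = cold_fusion_step(state, lm_logits, params)
print(len(fused), round(sum(probs), 6))
```

In training as described in claim 1, the language model would stay pretrained while the Seq2Seq model starts untrained, and a loss on outputs such as `probs` would be used to update the Seq2Seq model. The sketch covers only the forward computation and omits the loss and parameter updates.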

Assignees

Baidu USA LLC

Inventors

Classifications

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]

  • Auto-encoder networks; Encoder-decoder networks

  • Learning methods

  • Weakly supervised learning, e.g. semi-supervised or self-supervised learning

  • Transfer learning

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11620986B2 cover?
Described herein are systems and methods for generating natural language sentences with Sequence-to-sequence (Seq2Seq) models with attention. The Seq2Seq models may be implemented in applications, such as machine translation, image captioning, and speech recognition. Performance has further been improved by leveraging unlabeled data, often in the form of a language model. Disclosed herein are …
Who is the assignee on this patent?
Baidu USA LLC
What technology area does this patent fall under?
Primary CPC classification G10L15/063. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 4, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).