What technology area does this patent fall under?

Primary CPC classification G10L15/063. Mapped technology areas include Physics.

When was this patent published?

Publication date Apr 4, 2023 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Cold fusing sequence-to-sequence models with language models

US11620986B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11620986-B2
Application number	US-202017061455-A
Country	US
Kind code	B2
Filing date	Oct 1, 2020
Priority date	May 19, 2017
Publication date	Apr 4, 2023
Grant date	Apr 4, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Described herein are systems and methods for generating natural language sentences with Sequence-to-sequence (Seq2Seq) models with attention. The Seq2Seq models may be implemented in applications, such as machine translation, image captioning, and speech recognition. Performance has further been improved by leveraging unlabeled data, often in the form of a language model. Disclosed herein are “Cold Fusion” architecture embodiments that leverage a pre-trained language model during training. The Seq2Seq models with Cold Fusion embodiments are able to better utilize language information enjoying faster convergence, better generalization, and almost complete transfer to a new domain while using less labeled training data.

First claim

Opening claim text (preview).

What is claimed is:
1. A computer-implemented method for training a sequential-to-sequential (Seq2Seq) model, the method comprising: given a set of training input sequences, for each input sequence from the set of training input sequences: generating a language model (LM) output using a pretrained language model, which has been pretrained with one or more sets of training data to obtain the pretrained language model; generating a state output of the Seq2Seq model using the Seq2Seq model, in which the Seq2Seq model has not been pretrained; combining a gated LM output, which is generated using the LM output from the pretrained language model and a gate, with the state output obtained from the Seq2Seq model to form a fused state; and using outputs obtained using the fused states for the input sequences from the set of training input sequences to train the Seq2Seq model.
2. The computer-implemented method of claim 1 wherein at least some of the training data of at least one of the one or more sets of training data are unlabeled training data.
3. The computer-implemented method of claim 1 further comprising the step of: generating the language model output using a neural network that takes as input a logit output of the language model and outputs the language model output.
4. The computer-implemented method of claim 3 wherein neural network comprises an affine layer and activation.
5. The computer-implemented method of claim 1 wherein the step of using outputs obtained using the fused states for the input sequences from the set of training input sequences to train the Seq2Seq model comprises: for a fused state, inputting the fused state into a neural network that takes as input the fused state and outputs a neural network output.
6. The computer-implemented method of claim 5 wherein the neural network output of the neural network is fed into a softmax to generate the output used in training the Seq2Seq model.
7. A computer-implemented method comprising: receiving, at an encoder of a sequence-to-sequence (Seq2Seq) model, an input sequence; generating, by the encoder, an intermediate representation of the input sequence; receiving, with at least one recurrent layer within a decoder of the Seq2Seq model, the intermediate representation; generating, by the least one recurrent layer, a state of the Seq2Seq model based at least on the intermediate representation; combining the state from the Seq2Seq model with a gated LM state, which is generated using a gate and a LM state obtained using an output from a language model into a fused state; and generating an output using the fused state; wherein the Seq2Seq was trained without pretraining the Seq2Seq model but the language model was pretrained.
8. The computer-implemented method of claim 7 wherein the Seq2Seq model was trained by performing the steps comprising: receiving, at the encoder of the Seq2Seq model, which has not been pretrained, an input sequence; generating, by the encoder, an intermediate representation of the input sequence; receiving, with at least one recurrent layer within a decoder of the Seq2Seq model, the intermediate representation; generating, by the least one recurrent layer, a state of the Seq2Seq model based at least on the intermediate representation; combining a hidden state from the Seq2Seq model with a gated LM state, which is generated using a gate and a LM state obtained using an output from a language model that has been pretrained, into a fused state; generating a training output using the fused state; and using the training output in training the Seq2Seq model.
9. The computer-implemented method of claim 8 further comprises fine-tuning the Seq2Seq model with new data in a different domain.
10. The computer-implemented method of claim 7 wherein the output of the language model is the LM state, which represents an output state of the language model.
11. The computer-implemented method of claim 7 wherein the LM state is an output of a neural network that receives as input a logit output of the language model and outputs the LM state.
12. The computer-implemented method of claim 7 wherein the gated LM state and the hidden state from the Seq2Seq model are concatenated in generating the fused state.
13. The computer-implemented method of claim 7 wherein the step of generating a training output using the fused state comprises: using a neural network that receives the fused state as an input.
14. The computer-implemented method of claim 13 wherein the neural network comprises an affine layer prior to a softmax, the affine layer integrated with rectified linear unit (ReLU) activation.
15. A computer-implemented method comprising: generating an intermediate state for an input using a sequence-to-sequence (Seq2Seq) model that was trained by performing steps comprising: generating a training intermediate state for an input training sequence using the Seq2Seq model; obtaining a training fused state based at least on the training intermediate state of the Seq2Seq model and a training gated state, in which the training gated state is generated using a gate and a training language model output from a pretrained language model for the input training sequence; generating a training output using the training fused state; and using a training output in training the Seq2Seq model; obtaining a fused state based at least on the intermediate state of the Seq2Seq model and a gated state, in which the gated state is generated using a gate and a language model output from a language model for the input; and generating an output using the fused state.
16. The computer-implemented of claim 15 wherein the step of generating a training output using the fused state comprises: using a neural network that operates on the training fused state in generating the training output.
17. The computer-implemented of claim 16 wherein the neural network comprises an affine layer integrated with an activation and a softmax.
18. The computer-implemented of claim 15 wherein the output from the pretrained language model for the input sequence is a probability projection of the pretrained language model.
19. The computer-implemented of claim 15 wherein the language model is different from the pretrained language model.
20. The computer-implemented of claim 15 wherein the output is a natural language phrase or sentence.
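The fusion step recited in the claims (an LM state derived from language model logits, a gate applied to that state, concatenation with the Seq2Seq decoder state, then an affine layer with ReLU feeding a softmax) can be illustrated with a minimal sketch. This is an illustration only, not the patented implementation. The claim text shown on this page does not say how the gate is computed; the sigmoid over an affine function of the concatenated states used below is an assumption. All function names, parameter names, dimensions, and the random weights are hypothetical.

```python
import math
import random


def affine(weights, bias, x):
    """Return W @ x + b, with W given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(x):
    return [1.0 / (1.0 + math.exp(-v)) for v in x]


def softmax(x):
    peak = max(x)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(rng, n_out, n_in):
    """Hypothetical stand-in for learned weights: small random W, zero bias."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def cold_fusion_step(decoder_state, lm_logits, params):
    """One decoding step: fuse a Seq2Seq decoder state with a gated LM state."""
    # LM state: affine layer plus activation over the LM's logit output
    # (compare claims 3, 4 and 11).
    lm_state = relu(affine(*params["lm_proj"], lm_logits))
    # Gate: ASSUMPTION, computed here from both states with a sigmoid.
    gate = sigmoid(affine(*params["gate"], decoder_state + lm_state))
    gated_lm = [g * h for g, h in zip(gate, lm_state)]
    # Fused state: concatenation of decoder state and gated LM state (claim 12).
    fused = decoder_state + gated_lm
    # Output network: affine layer with ReLU, fed into a softmax (claims 6, 14).
    probs = softmax(relu(affine(*params["output"], fused)))
    return fused, probs


rng = random.Random(0)
state_dim, lm_dim, vocab = 4, 3, 5  # toy sizes chosen for illustration
params = {
    "lm_proj": random_layer(rng, lm_dim, vocab),
    "gate": random_layer(rng, lm_dim, state_dim + lm_dim),
    "output": random_layer(rng, vocab, state_dim + lm_dim),
}
decoder_state = [rng.uniform(-1, 1) for _ in range(state_dim)]
lm_logits = [rng.uniform(-1, 1) for _ in range(vocab)]
fused, probs = cold_fusion_step(decoder_state, lm_logits, params)
print(len(fused), round(sum(probs), 6))
```

In training as described in claim 1, the probabilities produced from the fused state would feed a loss used to update the Seq2Seq model, while the language model supplying `lm_logits` has already been pretrained. The sketch omits the encoder, attention, and the training loop.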

Assignees

Baidu USA LLC

Inventors

Classifications

G06N3/0442
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
G06N3/0455
Auto-encoder networks; Encoder-decoder networks · CPC title
G06N3/08
Learning methods · CPC title
G06N3/0895
Weakly supervised learning, e.g. semi-supervised or self-supervised learning · CPC title
G06N3/096
Transfer learning · CPC title

Patent family

Related publications grouped by family.

View patent family 64272033

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11620986B2 cover?: Described herein are systems and methods for generating natural language sentences with Sequence-to-sequence (Seq2Seq) models with attention. The Seq2Seq models may be implemented in applications, such as machine translation, image captioning, and speech recognition. Performance has further been improved by leveraging unlabeled data, often in the form of a language model. Disclosed herein are …
Who is the assignee on this patent?: Baidu USA LLC