Cold fusing sequence-to-sequence models with language models

US10867595B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10867595-B2
Application number  US-201815913875-A
Country             US
Kind code           B2
Filing date         Mar 6, 2018
Priority date       May 19, 2017
Publication date    Dec 15, 2020
Grant date          Dec 15, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Described herein are systems and methods for generating natural language sentences with Sequence-to-sequence (Seq2Seq) models with attention. The Seq2Seq models may be implemented in applications, such as machine translation, image captioning, and speech recognition. Performance has further been improved by leveraging unlabeled data, often in the form of language models. Disclosed herein are “Cold Fusion” architecture embodiments that leverage a pre-trained language model during training. The Seq2Seq models with Cold Fusion embodiments are able to better utilize language information enjoying faster convergence, better generalization, and almost complete transfer to a new domain while using less labeled training data.

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for training a sequential-to-sequential (Seq2Seq) model, the method comprising: pre-training a language model (LM) with a set of training data to obtain a pre-trained language model; obtaining a hidden state of the Seq2Seq model based on an input sequence, in which the Seq2Seq model has not been pre-trained; combining a gated LM hidden state, which is generated using a LM hidden state obtained using an output of the pre-trained language model and a gate, with the hidden state obtained from the Seq2Seq model to form a combined hidden state; and using an output, which is obtained using a deep neural network (DNN) that takes as an input the combined hidden state, to train the Seq2Seq model.

2. The computer-implemented method of claim 1 wherein the set of training data are unlabeled training data.

3. The computer-implemented method of claim 1 wherein the language model was trained in at least one of a source domain and a target domain of the Seq2Seq model.

4. The computer-implemented method of claim 1 wherein the gate comprises the LM hidden state obtained using the output from the pre-trained language model and the hidden state from the Seq2Seq model as input.

5. The computer-implemented method of claim 4 wherein the LM hidden state obtained using the output from the pre-trained language model is an output of a deep neural network (DNN), which takes as input a logit output of the pre-trained language model and outputs the LM hidden state.

6. The computer-implemented method of claim 1 wherein the deep neural network (DNN) comprises an affine layer and activation.

7. The computer-implemented method of claim 6 wherein the output of the DNN is fed into a softmax to generate the output used for the Seq2Seq model training.

8. A computer-implemented method for training a sequential-to-sequential (Seq2Seq) model that has not been pretrained with a language model (LM) that has been pretrained, the method comprising: receiving, at an encoder of the Seq2Seq model, which has not been pretrained, an input sequence; generating, by the encoder, an intermediate representation of the input sequence; receiving, with at least one recurrent layer within a decoder of the Seq2Seq model, the intermediate representation; generating, by the least one recurrent layer, a hidden state of the Seq2Seq model based at least on the intermediate representation; combining the hidden state from the Seq2Seq model with a gated LM hidden state, which is generated using a gate and a LM hidden state obtained using an output from the language model that has been pretrained, into a combined hidden state; and generating an output using a deep neural network (DNN) that takes as an input the combined hidden state.

9. The computer-implemented method of claim 8 wherein the at least one recurrent layer within the decoder of the Seq2Seq model is gated recurrent unit (GRU) layer.

10. The computer-implemented method of claim 8 further comprises fine-tuning the Seq2Seq model with new data in a different domain.

11. The computer-implemented method of claim 8 wherein the encoder comprises one or more recurrent layers to generate the intermediate representation.

12. The computer-implemented method of claim 8 wherein the output of the language model is the LM hidden state, which represents an output hidden state of the language model.

13. The computer-implemented method of claim 11 wherein the encoder further comprises at least one max pooling layer coupled between the one or more recurrent layers.

14. The computer-implemented method of claim 8 wherein the gate uses the hidden state from the Seq2Seq model and the LM hidden state obtained using the output from the language model as inputs.

15. The computer-implemented method of claim 8 wherein the LM hidden state is an output of a deep neural network (DNN) that receives as input a logit output of the language model and outputs the LM hidden state.

16. The computer-implemented method of claim 8 wherein the gated LM hidden state and the hidden state from the Seq2Seq model are concatenated to generate the combined hidden state.

17. The computer-implemented method of claim 8 wherein the deep neural network (DNN) comprises an affine layer and activation.

18. The computer-implemented method of claim 15 wherein the DNN further comprises an affine layer prior to a softmax, the affine layer integrated with rectified linear unit (ReLU) activation.

19. A computer-implemented method for training a sequential-to-sequential (Seq2Seq) model, the method comprising: inputting an input sequence into the Seq2Seq model, which has not been pretrained; generating a hidden state of the Seq2Seq model; obtaining a combined hidden state based at least on the generated hidden state of the Seq2Seq model and a gated probability projection, which is generated using a gate and an output of a language model that has been pretrained; and using an output, which is obtained using a deep neural network (DNN) that takes as an input the combined hidden state, to train the Seq2Seq model.

20. The computer-implemented method of claim 19 wherein the probability projection comprises projecting a token distribution onto a common embedding space.
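Illustrative sketch (not part of the patent text). The fusion step recited in the claims above can be pictured as a single decoder time step: project the language model's logits to an LM hidden state, gate it using both hidden states, concatenate the gated result with the Seq2Seq hidden state, and pass the combined state through an affine-plus-ReLU layer and a softmax. The NumPy code below is a minimal reading of that sequence under stated assumptions, not the patented implementation. The function names (cold_fusion_step, init_params), the parameter names, the layer sizes, the random toy inputs, and the final linear projection to vocabulary size are all hypothetical choices made for the example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    z = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return z / z.sum()


def affine_relu(x, W, b):
    """Affine layer followed by ReLU activation (cf. claims 6, 17, 18)."""
    return np.maximum(0.0, W @ x + b)


def cold_fusion_step(s_t, lm_logits, p):
    """One decoder step of the fusion layer (hypothetical helper)."""
    # LM hidden state: a small DNN projects the LM's logit output (claims 5, 15).
    h_lm = affine_relu(lm_logits, p["W_lm"], p["b_lm"])
    # The gate sees the Seq2Seq hidden state and the LM hidden state (claims 4, 14).
    g_t = sigmoid(p["W_g"] @ np.concatenate([s_t, h_lm]) + p["b_g"])
    # Gated LM hidden state, concatenated with the Seq2Seq state (claim 16).
    s_cf = np.concatenate([s_t, g_t * h_lm])
    # DNN over the combined hidden state (claims 1, 6).
    r_cf = affine_relu(s_cf, p["W_r"], p["b_r"])
    # Assumption of this sketch: a final linear projection to vocabulary size
    # before the softmax (claim 7 only says the DNN output feeds a softmax).
    return softmax(p["W_out"] @ r_cf + p["b_out"]), g_t


def init_params(rng, d_s, d_lm, d_r, vocab):
    """Random toy parameters; a real model would learn these in training."""
    shapes = {
        "W_lm": (d_lm, vocab), "b_lm": (d_lm,),
        "W_g": (d_lm, d_s + d_lm), "b_g": (d_lm,),
        "W_r": (d_r, d_s + d_lm), "b_r": (d_r,),
        "W_out": (vocab, d_r), "b_out": (vocab,),
    }
    return {name: 0.1 * rng.standard_normal(shape) for name, shape in shapes.items()}


rng = np.random.default_rng(0)
params = init_params(rng, d_s=8, d_lm=6, d_r=10, vocab=5)
s_t = rng.standard_normal(8)        # stand-in for the decoder hidden state
lm_logits = rng.standard_normal(5)  # stand-in for the pre-trained LM's logits
probs, gate = cold_fusion_step(s_t, lm_logits, params)
print(probs.sum(), gate.shape)
```

In a training loop, probs would typically be compared with the target token through a cross-entropy loss, which is one plausible way to "use the output to train the Seq2Seq model"; the sketch stops at the forward pass. The gate here has one value per LM hidden unit, which is a design choice of the example; a scalar gate would also fit the claim wording.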

Assignees

Baidu USA LLC

Inventors

Classifications

  • characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling · CPC title

  • Activation functions · CPC title

  • Combinations of networks · CPC title

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10867595B2 cover?
Described herein are systems and methods for generating natural language sentences with Sequence-to-sequence (Seq2Seq) models with attention. The Seq2Seq models may be implemented in applications, such as machine translation, image captioning, and speech recognition. Performance has further been improved by leveraging unlabeled data, often in the form of language models. Disclosed herein are …
Who is the assignee on this patent?
Baidu USA LLC
What technology area does this patent fall under?
Primary CPC classification G06F18/2155. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 15, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).