Cold fusing sequence-to-sequence models with language models

US2018336884A1 · US · A1

Patent metadata
Publication number: US-2018336884-A1
Application number: US-201815913875-A
Country: US
Kind code: A1
Filing date: Mar 6, 2018
Priority date: May 19, 2017
Publication date: Nov 22, 2018
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Described herein are systems and methods for generating natural language sentences with Sequence-to-sequence (Seq2Seq) models with attention. The Seq2Seq models may be implemented in applications, such as machine translation, image captioning, and speech recognition. Performance has further been improved by leveraging unlabeled data, often in the form of a language model. Disclosed herein are “Cold Fusion” architecture embodiments that leverage a pre-trained language model during training. The Seq2Seq models with Cold Fusion embodiments are able to better utilize language information, enjoying faster convergence, better generalization, and almost complete transfer to a new domain while using less labeled training data.
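Claims 4, 5, and 14 to 18 in the claim text on this page describe how the pre-trained LM is fused into the decoder: a gate computed from both hidden states, one gate value per LM hidden unit, element-wise scaling of the LM state by the gate, concatenation with the Seq2Seq state, and an affine layer with ReLU before a softmax. The NumPy sketch below illustrates one decoding step of that combination under stated assumptions: the sigmoid gate, the parameter names and shapes, and the final output projection `W_o` are illustrative choices not specified in the claims, and the training loop (in which the pre-trained LM would supply fixed hidden states) is omitted.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, assumed here as the gate nonlinearity."""
    return 1.0 / (1.0 + np.exp(-x))


def cold_fusion_step(s_t, h_lm, W_g, b_g, W_r, b_r, W_o, b_o):
    """One decoding step of a Cold-Fusion-style layer (illustrative sketch).

    s_t  : Seq2Seq decoder hidden state, shape (d_s,)
    h_lm : hidden state of the pre-trained LM, shape (d_lm,)
    W_g  : gate weights, shape (d_lm, d_s + d_lm); b_g: shape (d_lm,)
    W_r  : affine layer before the softmax, shape (d_r, d_s + d_lm); b_r: (d_r,)
    W_o  : hypothetical projection to vocabulary logits, shape (V, d_r); b_o: (V,)
    """
    # Gate computed from both hidden states, one value per LM hidden unit.
    g = sigmoid(W_g @ np.concatenate([s_t, h_lm]) + b_g)
    # Scale the LM state element-wise by the gate, then concatenate it with
    # the Seq2Seq state to form the combined hidden state.
    s_cf = np.concatenate([s_t, g * h_lm])
    # Affine layer with ReLU activation, followed by logits over the vocabulary.
    r = np.maximum(0.0, W_r @ s_cf + b_r)
    logits = W_o @ r + b_o
    # Numerically stable softmax gives the next-token distribution.
    z = np.exp(logits - logits.max())
    return s_cf, z / z.sum()


# Usage with random toy parameters (all sizes are arbitrary assumptions).
rng = np.random.default_rng(0)
d_s, d_lm, d_r, V = 4, 3, 5, 6
s_t, h_lm = rng.normal(size=d_s), rng.normal(size=d_lm)
params = dict(
    W_g=rng.normal(size=(d_lm, d_s + d_lm)), b_g=np.zeros(d_lm),
    W_r=rng.normal(size=(d_r, d_s + d_lm)), b_r=np.zeros(d_r),
    W_o=rng.normal(size=(V, d_r)), b_o=np.zeros(V),
)
s_cf, p = cold_fusion_step(s_t, h_lm, **params)
print(s_cf.shape, p.shape, round(p.sum(), 6))
```

Because the gate is per unit, the decoder can down-weight individual LM features rather than the whole LM state; in this sketch, a strongly negative gate bias removes the LM contribution entirely and leaves only the Seq2Seq state.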

First claim

Full claim text (claims 1 to 20); claims 1, 8, and 19 are independent.

What is claimed is:

1. A computer-implemented method for training a sequential-to-sequential (Seq2Seq) model, the method comprising: pre-training a language model (LM) with a set of training data; obtaining a hidden state of the Seq2Seq model based on an input sequence; combining a LM hidden state obtained from the pre-trained language model with the obtained hidden state from the Seq2Seq model into a combined hidden state; and using output obtained from the combined hidden state to train the Seq2Seq model.

2. The computer-implemented method of claim 1 wherein the set of training data are unlabeled training data.

3. The computer-implemented method of claim 1 wherein the language model was trained in at least one of a source domain and a target domain of the Seq2Seq model.

4. The computer-implemented method of claim 1 wherein combining the LM hidden state from the pre-trained language model with the hidden state from the Seq2Seq model comprises a gated computation using both the hidden state from the language model and the hidden state from the Seq2Seq model as input.

5. The computer-implemented method of claim 1 wherein combining the LM hidden state from the pre-trained language model with the hidden state from the Seq2Seq model comprises using a different gate value for each hidden node of the pre-trained language model's state.

6. The computer-implemented method of claim 1 further comprises using a deep neural network (DNN) to generate a logit input based on the output obtained from the combined hidden state.

7. The computer-implemented method of claim 6 wherein the logit input is fed into a softmax to generate a distribution of probability for the Seq2Seq model training.

8. A computer-implemented method for training a sequential to sequential (Seq2Seq) model with a language model (LM), the method comprising: receiving, at an encoder of the Seq2Seq model, an input sequence in a source domain; generating, by the encoder, an intermediate representation of the input sequence; receiving, with at least one recurrent layer within a decoder of the Seq2Seq model, the intermediate representation; generating, by the least one recurrent layer, a hidden state of the Seq2Seq model based at least on the intermediate representation; combining the generated hidden state with a LM hidden state from the language model into a combined hidden state; and generating, by the decoder, a logit output based on the combined hidden state in a target domain.

9. The computer-implemented method of claim 8 wherein the at least one recurrent layer within the decoder of the Seq2Seq model is gated recurrent unit (GRU) layer.

10. The computer-implemented method of claim 8 further comprises fine-tuning the Seq2Seq model with new data in a domain different from the source domain and the target domain.

11. The computer-implemented method of claim 8 wherein the encoder comprises one or more recurrent layers to generate the intermediate representation.

12. The computer-implemented method of claim 11 wherein the one or more recurrent layers are bi-directional long short term memory (LSTM) layers.

13. The computer-implemented method of claim 11 wherein the encoder further comprises at least one max pooling layer coupled between the one or more recurrent layers.

14. The computer-implemented method of claim 8 wherein combining the generated hidden state with the hidden state from the language model comprises a gated computation using both the hidden state from the language model and the hidden state from the Seq2Seq model as input.

15. The computer-implemented method of claim 14 wherein an output from the gated computation is combined with hidden state from the language model using an element-wise multiplication for a multiplication result.

16. The computer-implemented method of claim 15 wherein the multiplication result and the hidden state of the Seq2Seq model are concatenated to generate the combined hidden state.

17. The computer-implemented method of claim 8 wherein the logit output based on the combined hidden state is generated by a deep neural network (DNN) within the decoder.

18. The computer-implemented method of claim 8 wherein the DNN further comprises an affine layer prior to a softmax, the affine layer integrated with rectified linear unit (ReLU) activation.

19. A computer-implemented method for training a sequential to sequential (Seq2Seq) model, the method comprising: receiving an input sequence to the Seq2Seq model; generating a hidden state of the Seq2Seq model; obtaining a combined hidden state based at least on the generated hidden state of the Seq2Seq model and a probability projection across a plurality of language models; and using output from the combined hidden state to train the Seq2Seq model.

20. The computer-implemented method of claim 19 wherein the probability projection comprises projecting a token distribution onto a common embedding space.
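Claims 19 and 20 base the combined state on a probability projection, in which a language model's token distribution is projected onto a common embedding space. One plausible reading, assumed here, is a probability-weighted average of token embeddings: any LM over the same vocabulary then yields a vector of the same size, regardless of its internal hidden size. In this sketch the shared embedding matrix `E`, the two toy LMs, and the helper `project_lm_distribution` are hypothetical, and the projected vector is simply concatenated with the decoder state rather than gated.

```python
import numpy as np


def project_lm_distribution(p_lm, E):
    """Project an LM's next-token distribution onto a shared embedding space.

    p_lm : probabilities over a shared vocabulary, shape (V,)
    E    : hypothetical token-embedding matrix shared by all LMs, shape (V, d)
    Returns the probability-weighted average embedding, shape (d,).
    """
    return p_lm @ E


def softmax(x):
    """Numerically stable softmax over a 1-D array of logits."""
    z = np.exp(x - x.max())
    return z / z.sum()


rng = np.random.default_rng(1)
V, d, d_s = 6, 4, 5
E = rng.normal(size=(V, d))  # shared embedding space (assumed)

# Two toy "language models" with different hidden sizes; each maps its own
# hidden state to vocabulary logits through its own output layer.
lm_a = dict(h=rng.normal(size=8), W=rng.normal(size=(V, 8)))
lm_b = dict(h=rng.normal(size=32), W=rng.normal(size=(V, 32)))

s_t = rng.normal(size=d_s)  # Seq2Seq decoder hidden state
for lm in (lm_a, lm_b):
    p_lm = softmax(lm["W"] @ lm["h"])
    h_proj = project_lm_distribution(p_lm, E)
    combined = np.concatenate([s_t, h_proj])
    print(combined.shape)  # (9,) for both LMs
```

A one-hot distribution projects to exactly one row of `E`, and a uniform distribution to the mean embedding. Either way the fusion input width stays `d_s + d`, which, under this reading, is what allows one language model to be swapped for another without resizing the fusion layer.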

Assignees

Baidu USA LLC

Classifications (CPC titles)

  • Combinations of networks

  • Activation functions

  • Recurrent networks, e.g. Hopfield networks

  • characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling

  • Backpropagation, e.g. using gradient descent

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2018336884A1 cover?
Described herein are systems and methods for generating natural language sentences with Sequence-to-sequence (Seq2Seq) models with attention. The Seq2Seq models may be implemented in applications, such as machine translation, image captioning, and speech recognition. Performance has further been improved by leveraging unlabeled data, often in the form of a language model. Disclosed herein are …
Who is the assignee on this patent?
Baidu USA LLC
What technology area does this patent fall under?
Primary CPC classification G06F18/2155. Mapped technology areas include Physics.
When was this patent published?
Published on November 22, 2018 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).