Implementing a classification model for recognition processing
US-10529318-B2 · Jan 7, 2020 · US
US2018336884A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2018336884-A1 |
| Application number | US-201815913875-A |
| Country | US |
| Kind code | A1 |
| Filing date | Mar 6, 2018 |
| Priority date | May 19, 2017 |
| Publication date | Nov 22, 2018 |
| Grant date | — |
Official abstract text for this publication.
Described herein are systems and methods for generating natural language sentences with Sequence-to-sequence (Seq2Seq) models with attention. The Seq2Seq models may be implemented in applications, such as machine translation, image captioning, and speech recognition. Performance has further been improved by leveraging unlabeled data, often in the form of a language model. Disclosed herein are “Cold Fusion” architecture embodiments that leverage a pre-trained language model during training. The Seq2Seq models with Cold Fusion embodiments are able to better utilize language information, enjoying faster convergence, better generalization, and almost complete transfer to a new domain while using less labeled training data.
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method for training a sequential-to-sequential (Seq2Seq) model, the method comprising: pre-training a language model (LM) with a set of training data; obtaining a hidden state of the Seq2Seq model based on an input sequence; combining a LM hidden state obtained from the pre-trained language model with the obtained hidden state from the Seq2Seq model into a combined hidden state; and using output obtained from the combined hidden state to train the Seq2Seq model.
2. The computer-implemented method of claim 1 wherein the set of training data are unlabeled training data.
3. The computer-implemented method of claim 1 wherein the language model was trained in at least one of a source domain and a target domain of the Seq2Seq model.
4. The computer-implemented method of claim 1 wherein combining the LM hidden state from the pre-trained language model with the hidden state from the Seq2Seq model comprises a gated computation using both the hidden state from the language model and the hidden state from the Seq2Seq model as input.
5. The computer-implemented method of claim 1 wherein combining the LM hidden state from the pre-trained language model with the hidden state from the Seq2Seq model comprises using a different gate value for each hidden node of the pre-trained language model's state.
6. The computer-implemented method of claim 1 further comprises using a deep neural network (DNN) to generate a logit input based on the output obtained from the combined hidden state.
7. The computer-implemented method of claim 6 wherein the logit input is fed into a softmax to generate a distribution of probability for the Seq2Seq model training.
8. A computer-implemented method for training a sequential to sequential (Seq2Seq) model with a language model (LM), the method comprising: receiving, at an encoder of the Seq2Seq model, an input sequence in a source domain; generating, by the encoder, an intermediate representation of the input sequence; receiving, with at least one recurrent layer within a decoder of the Seq2Seq model, the intermediate representation; generating, by the least one recurrent layer, a hidden state of the Seq2Seq model based at least on the intermediate representation; combining the generated hidden state with a LM hidden state from the language model into a combined hidden state; and generating, by the decoder, a logit output based on the combined hidden state in a target domain.
9. The computer-implemented method of claim 8 wherein the at least one recurrent layer within the decoder of the Seq2Seq model is gated recurrent unit (GRU) layer.
10. The computer-implemented method of claim 8 further comprises fine-tuning the Seq2Seq model with new data in a domain different from the source domain and the target domain.
11. The computer-implemented method of claim 8 wherein the encoder comprises one or more recurrent layers to generate the intermediate representation.
12. The computer-implemented method of claim 11 wherein the one or more recurrent layers are bi-directional long short term memory (LSTM) layers.
13. The computer-implemented method of claim 11 wherein the encoder further comprises at least one max pooling layer coupled between the one or more recurrent layers.
14. The computer-implemented method of claim 8 wherein combining the generated hidden state with the hidden state from the language model comprises a gated computation using both the hidden state from the language model and the hidden state from the Seq2Seq model as input.
15. The computer-implemented method of claim 14 wherein an output from the gated computation is combined with hidden state from the language model using an element-wise multiplication for a multiplication result.
16. The computer-implemented method of claim 15 wherein the multiplication result and the hidden state of the Seq2Seq model are concatenated to generate the combined hidden state.
17. The computer-implemented method of claim 8 wherein the logit output based on the combined hidden state is generated by a deep neural network (DNN) within the decoder.
18. The computer-implemented method of claim 8 wherein the DNN further comprises an affine layer prior to a softmax, the affine layer integrated with rectified linear unit (ReLU) activation.
19. A computer-implemented method for training a sequential to sequential (Seq2Seq) model, the method comprising: receiving an input sequence to the Seq2Seq model; generating a hidden state of the Seq2Seq model; obtaining a combined hidden state based at least on the generated hidden state of the Seq2Seq model and a probability projection across a plurality of language models; and using output from the combined hidden state to train the Seq2Seq model.
20. The computer-implemented method of claim 19 wherein the probability projection comprises projecting a token distribution onto a common embedding space.
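The gated combination recited in claims 14 to 18 can be illustrated with a minimal sketch for a single decoder step. This is not the patent's implementation: the sigmoid gate, the layer sizes, the random weights, and every name below (`fuse`, `gate_w`, `dnn_w`, `out_w`, and so on) are assumptions chosen for illustration, and the two-layer DNN (a ReLU affine layer followed by an output affine layer) is one possible reading of claim 18. Only the Python standard library is used.

```python
import math
import random

def affine(weights, bias, x):
    """Return weights @ x + bias for a row-major list-of-lists matrix."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def sigmoid(x):
    return [1.0 / (1.0 + math.exp(-v)) for v in x]

def relu(x):
    return [max(0.0, v) for v in x]

def softmax(x):
    m = max(x)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(s_t, h_lm, gate_w, gate_b):
    """Gated combination of the decoder state s_t and the LM state h_lm."""
    both = s_t + h_lm                               # both states feed the gate (claim 14)
    gate = sigmoid(affine(gate_w, gate_b, both))    # one gate value per LM hidden node (claim 5); sigmoid is an assumption
    gated_lm = [g * h for g, h in zip(gate, h_lm)]  # element-wise multiplication (claim 15)
    return s_t + gated_lm                           # concatenation (claim 16)

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

# Hypothetical sizes and random values standing in for trained parameters and states.
rng = random.Random(0)
d_s, d_lm, d_hidden, vocab = 4, 3, 5, 6
s_t = [rng.uniform(-1, 1) for _ in range(d_s)]     # Seq2Seq decoder hidden state
h_lm = [rng.uniform(-1, 1) for _ in range(d_lm)]   # hidden state from the pre-trained LM

gate_w, gate_b = random_matrix(d_lm, d_s + d_lm, rng), [0.0] * d_lm
dnn_w, dnn_b = random_matrix(d_hidden, d_s + d_lm, rng), [0.0] * d_hidden
out_w, out_b = random_matrix(vocab, d_hidden, rng), [0.0] * vocab

combined = fuse(s_t, h_lm, gate_w, gate_b)
# DNN: affine layer with ReLU, then an output affine layer producing logits (claims 17 and 18).
logits = affine(out_w, out_b, relu(affine(dnn_w, dnn_b, combined)))
probs = softmax(logits)  # token distribution used for training (claim 7)
```

In this sketch the decoder state passes through unchanged in the first `d_s` entries of `combined`, while each entry of the LM state is scaled by its own gate value between 0 and 1. In actual training the weights would be learned jointly with the Seq2Seq model while the language model is pre-trained, as claim 1 describes.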
Combinations of networks · CPC title
Activation functions · CPC title
Recurrent networks, e.g. Hopfield networks · CPC title
characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling · CPC title
Backpropagation, e.g. using gradient descent · CPC title