Neural network combined image and text evaluator and classifier
US-2018096219-A1 · Apr 5, 2018 · US
US11615249B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11615249-B2 |
| Application number | US-202016996726-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 18, 2020 |
| Priority date | Feb 9, 2018 |
| Publication date | Mar 28, 2023 |
| Grant date | Mar 28, 2023 |
Abstract
Approaches for multitask learning as question answering include an input layer for encoding a context and a question, a self-attention based transformer including an encoder and a decoder, a first bi-directional long-term short-term memory (biLSTM) for further encoding an output of the encoder, a long-term short-term memory (LSTM) for generating a context-adjusted hidden state from the output of the decoder and a hidden state, an attention network for generating first attention weights based on an output of the first biLSTM and an output of the LSTM, a vocabulary layer for generating a distribution over a vocabulary, a context layer for generating a distribution over the context, and a switch for generating a weighting between the distributions over the vocabulary and the context, generating a composite distribution based on the weighting, and selecting a word of an answer using the composite distribution.
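The final steps of the abstract are a switch that weights a vocabulary distribution against a distribution over context words, mixes them into a composite distribution, and picks an answer word. They can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation. It assumes that the switch emits a single logit, squashed by a sigmoid into a weight `gamma` on the vocabulary distribution. It also assumes that the context distribution is formed by crediting each context position's attention weight to the word at that position, with the attention weights summing to 1. The names `composite_distribution` and `select_word` and the toy inputs are hypothetical.

```python
import math
from typing import Dict, List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores: List[float]) -> List[float]:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def composite_distribution(
    vocab: List[str],
    vocab_logits: List[float],
    context_tokens: List[str],
    context_attention: List[float],
    switch_logit: float,
) -> Dict[str, float]:
    """Hypothetical sketch: mix a vocabulary distribution with a distribution over context words.

    Assumption: the switch outputs one logit whose sigmoid, gamma, weights the
    vocabulary distribution, and (1 - gamma) weights the context distribution.
    """
    gamma = sigmoid(switch_logit)
    p_vocab = dict(zip(vocab, softmax(vocab_logits)))
    # Context distribution: credit each position's attention weight to the word there
    # (a word appearing twice in the context accumulates both weights).
    p_context: Dict[str, float] = {}
    for token, weight in zip(context_tokens, context_attention):
        p_context[token] = p_context.get(token, 0.0) + weight
    words = set(p_vocab) | set(p_context)
    return {
        w: gamma * p_vocab.get(w, 0.0) + (1.0 - gamma) * p_context.get(w, 0.0)
        for w in words
    }


def select_word(distribution: Dict[str, float]) -> str:
    # Greedy choice: the highest-probability word in the composite distribution.
    return max(distribution, key=distribution.get)


dist = composite_distribution(
    vocab=["yes", "no", "unknown"],
    vocab_logits=[2.0, 1.0, 0.0],
    context_tokens=["the", "meeting", "is", "tuesday"],
    context_attention=[0.1, 0.1, 0.1, 0.7],
    switch_logit=-2.0,
)
print(select_word(dist))  # tuesday
```

With a negative switch logit, most of the weight goes to the context distribution, so the heavily attended context word is copied into the answer. Raising `switch_logit` to 5.0 shifts the weight toward the vocabulary distribution, and `select_word` then returns `yes`. Repeating this choice once per output step is one way to read the iterative word selection described in the claims.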
Claims (preview)
What is claimed is:

1. A system for natural language processing, the system comprising: one or more processors; and a memory storing computer-executable instructions, which when executed by the one or more processors, cause the system to perform operations comprising: receiving, at an input layer, a natural language input of a question; performing a first encoding of context-based words and question-based words from the question into a context-based representation and a question-based representation; performing, using a bi-directional long-term short-term memory (biLSTM), a second encoding of the context-based representation and the question-based representation; generating, using a long-term short-term memory (LSTM), a context-adjusted hidden state based at least in part from the context-based representation and the question-based representation; generating, by an attention network, a set of attention weights based on an output of the biLSTM and an output of the LSTM; generating, by a vocabulary layer, a first distribution over a plurality of words in a vocabulary based on the set of attention weights; generating, by a context layer, a second distribution over the context-based words based on the set of attention weights; and selecting a set of words for an answer to the question based on the first distribution and the second distribution.

2. The system of claim 1, wherein the operations further comprise: generating, using a switch, a weighting between the first distribution over the plurality of words from the vocabulary and the second distribution over the context-based words.

3. The system of claim 2, wherein the operations further comprise: generating, using the switch, a composite distribution based on the weighting; and selecting, using the switch, a word for inclusion in the answer using the composite distribution.

4. The system of claim 1, wherein the input layer comprises one or more of a linear layer, a second biLSTM, a coattention layer, and a third biLSTM.

5. The system of claim 1, wherein the operations further comprise: generating, via a coattention layer, an affinity matrix based on the context-based representation and the question-based representation; generating second attention weights based on the affinity matrix; and generating weighted sums of the context-based representation and the question-based representation using the second attention weights.

6. The system of claim 1, wherein the vocabulary layer comprises: a tanh layer for generating a hidden state based on the set of attention weights, the second encoding, and the context-adjusted hidden state; and a softmax layer for generating the first distribution over a plurality of words in a vocabulary.

7. The system of claim 6, wherein a decoder, the LSTM, the attention network, the vocabulary layer, the context layer, and a switch iteratively select each word for the answer.

8. The system of claim 6, wherein the first encoding and the second encoding are implemented at a transformer that comprises a plurality of transformer layers, each of the plurality of transformer layers comprising an encoder portion having a first multi-head self-attention network and a decoder portion having a second multi-head self-attention network and a third multi-head attention network.

9. The system of claim 1, wherein the system is trained using a hybrid training strategy where the system is first trained against a plurality of task types using a sequential training strategy and is then trained against the plurality of task types using a joint training strategy.

10. The system of claim 9, wherein each of the plurality of task types is a language translation task type, a classification task type, or a question answering task type.

11. A method for natural language processing, the method comprising: receiving, at an input layer, a natural language input of a question; performing a first encoding of context-based words and question-based words from the question into a context-based representation and a question-based representation; performing, using a bi-directional long-term short-term memory (biLSTM), a second encoding of the context-based representation and the question-based representation; generating, using a long-term short-term memory (LSTM), a context-adjusted hidden state based at least in part from the context-based representation and the question-based representation; generating, by an attention network, a set of attention weights based on an output of the biLSTM and an output of the LSTM; generating, by a vocabulary layer, a first distribution over a plurality of words in a vocabulary based on the set of attention weights; generating, by a context layer, a second distribution over the context-based words based on the set of attention weights; and selecting a set of words for an answer to the question based on the first distribution and the second distribution.

12. The method of claim 11, further comprising: generating, using a switch, a weighting between the first distribution over the plurality of words from the vocabulary and the second distribution over the context-based words.

13. The method of claim 12, further comprising: generating, using the switch, a composite distribution based on the weighting; and selecting, using the switch, a word for inclusion in the answer using the composite distribution.

14. The method of claim 11, further comprising: generating, via a coattention layer, an affinity matrix based on the context-based representation and the question-based representation; generating second attention weights based on the affinity matrix; and generating weighted sums of the context-based representation and the question-based representation using the second attention weights.

15. The method of claim 11, wherein the vocabulary layer comprises: a tanh layer for generating a hidden state based on the set of attention weights, the second encoding, and the context-adjusted hidden state; and a softmax layer for generating the first distribution over a plurality of words in a vocabulary.

16. The method of claim 11, further comprising: encoding and decoding, using a self-attention-based transformer, an output of the input layer.

17. The method of claim 16, wherein the self-attention-based transformer comprises a plurality of transformer layers, each of the plurality of transformer layers comprising an encoder portion having a first multi-head self-attention network and a decoder portion having a second multi-head self-attention network and a third multi-head attention network.

18. A non-transitory processor-readable medium storing processor-executable instructions for natural language processing, the instructions being executable by a processor to perform operations comprising: receiving, at an input layer, a natural language input of a question; performing a first encoding of context-based words and question-based words from the question into a context-based representation and a question-based representation; performing, using a bi-directional long-term short-term memory (biLSTM), a second encoding of the context-based representation and the question-based representation; generating, using a long-term short-term memory (LSTM), a context-adjusted hidden state based at least in part from the context-based representation and the question-based representation; generating, by an attention network, a set of attention weights based on an output of the biLSTM and an output of the LSTM; generatin…
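Claims 5 and 14 recite a coattention layer that builds an affinity matrix from the context and question representations, turns it into attention weights, and forms weighted sums of both representations. A minimal Python sketch of that three-step pattern follows. It assumes a plain dot-product affinity, row-wise softmax normalization in each direction, and no learned parameters. The helper names (`coattention`, `matmul`, `softmax_rows`) are hypothetical, and any refinements the actual system may apply are omitted.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # (r x k) @ (k x c) -> (r x c)
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(m: Matrix) -> Matrix:
    # Normalize each row into a probability distribution (max-shifted for stability).
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def coattention(context: Matrix, question: Matrix) -> Tuple[Matrix, Matrix, Matrix]:
    """Hypothetical coattention sketch (dot-product affinity, no learned weights).

    context:  n x d matrix, one row per context word.
    question: m x d matrix, one row per question word.
    Returns (affinity, context_summary, question_summary).
    """
    # Affinity matrix (n x m): dot product of every context/question vector pair.
    affinity = matmul(context, transpose(question))
    # Attention weights in both directions, derived from the same affinity matrix.
    ctx_over_q = softmax_rows(affinity)             # n x m
    q_over_ctx = softmax_rows(transpose(affinity))  # m x n
    # Weighted sums: each context word gets a summary of the question, and vice versa.
    context_summary = matmul(ctx_over_q, question)  # n x d
    question_summary = matmul(q_over_ctx, context)  # m x d
    return affinity, context_summary, question_summary


context = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
question = [[1.0, 0.0], [0.0, 1.0]]
affinity, ctx_sum, q_sum = coattention(context, question)
print(affinity)  # [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
```

The third context vector is equally similar to both question vectors, so its attention row is uniform and its question summary is the average `[0.5, 0.5]`. Normalizing the same affinity matrix along both axes is what lets one matrix drive attention from context to question and from question to context.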
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title
Learning methods · CPC title
Supervised learning · CPC title
Parsing for meaning understanding · CPC title