Joint Many-Task Neural Network Model for Multiple Natural Language Processing (NLP) Tasks
US-2018121787-A1 · May 3, 2018 · US
US11600194B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11600194-B2 |
| Application number | US-201816006691-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 12, 2018 |
| Priority date | May 18, 2018 |
| Publication date | Mar 7, 2023 |
| Grant date | Mar 7, 2023 |
Official abstract text for this publication.
Approaches for natural language processing include a multi-layer encoder for encoding words from a context and words from a question in parallel, a multi-layer decoder for decoding the encoded context and the encoded question, a pointer generator for generating distributions over the words from the context, the words from the question, and words in a vocabulary based on an output from the decoder, and a switch. The switch generates a weighting of the distributions over the words from the context, the words from the question, and the words in the vocabulary, generates a composite distribution based on the weighting of the distribution over the first words from the context, the distribution over the second words from the question, and the distribution over the words in the vocabulary, and selects words for inclusion in an answer using the composite distribution.
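The switch step described in the abstract can be illustrated with a minimal sketch. It assumes the three distributions are already available as plain lists of probabilities and that the switch weights sum to one. The function names `composite_distribution` and `select_word`, the toy words, and all numeric values are hypothetical and are not taken from the patent; greedy selection of the most probable word is likewise only one possible way to "select words" from the composite distribution.

```python
from collections import defaultdict


def composite_distribution(context, p_context, question, p_question,
                           vocab, p_vocab, weights):
    """Mix three word distributions into one distribution over word strings.

    weights is (w_context, w_question, w_vocab); assumed to sum to 1.
    """
    w_context, w_question, w_vocab = weights
    if abs(w_context + w_question + w_vocab - 1.0) > 1e-9:
        raise ValueError("switch weights must sum to 1")

    mixed = defaultdict(float)
    sources = (
        (context, p_context, w_context),
        (question, p_question, w_question),
        (vocab, p_vocab, w_vocab),
    )
    for words, probs, weight in sources:
        if len(words) != len(probs):
            raise ValueError("each word needs exactly one probability")
        # A word that occurs in several sources accumulates mass from each.
        for word, prob in zip(words, probs):
            mixed[word] += weight * prob
    return dict(mixed)


def select_word(mixed):
    """Greedy choice (an assumption): pick the most probable word."""
    return max(mixed, key=mixed.get)


# Toy inputs, purely illustrative.
context = ["the", "cat", "sat"]
p_context = [0.1, 0.7, 0.2]
question = ["who", "sat"]
p_question = [0.5, 0.5]
vocab = ["yes", "no", "cat"]
p_vocab = [0.3, 0.3, 0.4]

mixed = composite_distribution(context, p_context, question, p_question,
                               vocab, p_vocab, weights=(0.5, 0.2, 0.3))
print(select_word(mixed))  # prints "cat"
```

Because each input distribution sums to one and the weights sum to one, the mixed result is itself a valid probability distribution. In the toy example, "cat" receives mass from both the context and the vocabulary, which is why it is selected.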
Opening claim text (preview).
What is claimed is:
1. A method for natural language processing, the method comprising: encoding first words from a context and second words from a question, wherein the question is separate from but related to the context, the encodings performed in parallel using a multi-layer encoder; decoding the encoded context and the encoded question using a multi-layer decoder; generating, based on an output from the decoder, a first distribution over the first words from the context, a second distribution over the second words from the question, and a third distribution over third words in a generative vocabulary that comprises a source of words; generating a first weight of the first distribution, a second weight of the second distribution, and a third weight of the third distribution; generating a composite distribution based on the first weight, second weight, and third weight; selecting words for inclusion in an answer using the composite distribution; and training the multi-layer encoder and the multi-layer decoder against a subset of task types, wherein the subset of task types are selected according to an anti-curriculum strategy.
2. The method of claim 1, wherein the context and the question correspond to a natural language processing task type selected from question answering, machine translation, document summarization, database query generation, sentiment analysis, natural language inference, semantic role labeling, relation extraction, goal oriented dialogue, and pronoun resolution.
3. The method of claim 1, further comprising: determining a coattention between the first words in the context and the second words in the question.
4. The method of claim 1, further comprising: generating an attention across the context and an attention across the question in parallel; and generating final encodings of the context and the question in parallel based on the attention.
5. The method of claim 1, further comprising: encoding the words in the context and words in the question in parallel; projecting the encodings of the words in the context and the words in the question in parallel; and further encoding the projections of the encodings.
6. The method of claim 1, further comprising: encoding and embedding an intermediate version of the answer; generating an attention between the encoded and embedded intermediate version of the answer and a final encoding of the context; generating an intermediate decoder state from the generated attention; and generating context and question decoder states based on a final encoding of the context, a final encoding of the question, and the intermediate decoder state.
7. The method of claim 1, wherein the method further comprises training the multi-layer decoder and the multi-layer encoder against a full set of task types after the system is trained against the subset of task types.
8. A non-transitory machine-readable medium comprising executable code which when executed by one or more processors associated with a computing device are adapted to cause the one or more processors to perform a method comprising: encoding first words from a context and second words from a question, wherein the question is separate from but related to the context, the encodings performed in parallel using a multi-layer encoder; decoding the encoded context and the encoded question using a multi-layer decoder; generating, based on an output from the decoder, a first distribution over the first words from the context, a second distribution over the second words from the question, and a third distribution over third words in a generative vocabulary that comprises a source of words; generating a first weight of the first distribution, a second weight of the second distribution, and a third weight of the third distribution; generating a composite distribution based on the first weight, second weight, and third weight; and selecting words for inclusion in an answer using the composite distribution; and training the multi-layer encoder and the multi-layer decoder against a subset of task types, wherein the subset of task types are selected according to an anti-curriculum strategy.
9. The non-transitory machine-readable medium of claim 8, wherein the context and the question correspond to a natural language processing task type selected from question answering, machine translation, document summarization, database query generation, sentiment analysis, natural language inference, semantic role labeling, relation extraction, goal oriented dialogue, and pronoun resolution.
10. The non-transitory machine-readable medium of claim 8, wherein the method further comprises: determining a coattention between the first words in the context and the second words in the question.
11. The non-transitory machine-readable medium of claim 8, wherein the method further comprises: generating an attention across the context and an attention across the question in parallel; and generating final encodings of the context and the question in parallel based on the attention.
12. The non-transitory machine-readable medium of claim 8, wherein the method further comprises: encoding the words in the context and words in the question in parallel; projecting the encodings of the words in the context and the words in the question in parallel; and further encoding the projections of the encodings.
13. The non-transitory machine-readable medium of claim 8, wherein the method further comprises: encoding and embedding an intermediate version of the answer; generating an attention between the encoded and embedded intermediate version of the answer and a final encoding of the context; generating an intermediate decoder state from the generated attention; and generating context and question decoder states based on a final encoding of the context, a final encoding of the question, and the intermediate decoder state.
14. A system for natural language processing, the system comprising: a memory storing instructions; and a processor coupled with the memory and configured, when executing the instructions on the memory, to cause the system to: encode first words from a context and second words from a question, wherein the question is separate from but related to the context, the encodings performed in parallel; decode the encoded context and the encoded question; generate, based on the decoded context and the decoded question, a first distribution over the first words from the context, a second distribution over the second words from the question, and a third distribution over third words in a generative vocabulary that comprises a source of words; generate a first weight of the first distribution, a second weight of the second distribution, and a third weight of the third distribution; generate a composite distribution based on the first weight, second weight, and third weight; and select words for inclusion in an answer using the composite distribution, wherein the system is trained against a subset of task types, and wherein the subset of task types are selected according to an anti-curriculum strategy.
15. The system of claim 14, wherein the context and the question correspond to a natural language processing task type selected from question answering, machine translation, document summarization, database query generation, sentiment analysis, natural language inference, semantic role labeling, relation extraction, goal oriented dialogue, and pronoun resolution.
16. The system of claim 14, wherein the processor is further config
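The two-phase training recited in claims 1 and 7 (a subset of task types first, then the full set) can be sketched as a simple task schedule. This is only an illustration under stated assumptions: the claims do not say which tasks belong to the subset, how long each phase lasts, or how tasks are sampled within a phase. The function name `anti_curriculum_schedule`, the round-robin ordering, the step counts, and the choice of question answering as the initial subset are all hypothetical.

```python
from itertools import cycle, islice


def anti_curriculum_schedule(all_tasks, initial_subset, phase1_steps, phase2_steps):
    """Return the task type to train on at each step.

    Phase 1 visits only the initial subset; phase 2 visits every task type.
    Round-robin ordering inside each phase is an assumption of this sketch.
    """
    unknown = set(initial_subset) - set(all_tasks)
    if unknown:
        raise ValueError(f"subset contains unknown task types: {sorted(unknown)}")

    phase1 = list(islice(cycle(initial_subset), phase1_steps))
    phase2 = list(islice(cycle(all_tasks), phase2_steps))
    return phase1 + phase2


# Task names come from the claims; the subset choice is hypothetical.
all_tasks = ["question answering", "machine translation", "sentiment analysis"]
initial_subset = ["question answering"]

schedule = anti_curriculum_schedule(all_tasks, initial_subset,
                                    phase1_steps=2, phase2_steps=3)
for step, task in enumerate(schedule):
    print(step, task)
```

A real training loop would draw a batch for the scheduled task type at each step and update the shared encoder and decoder; the schedule only fixes which task types are visible in each phase.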
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title
Supervised learning · CPC title
Hyperparameter optimisation; Meta-learning; Learning-to-learn · CPC title
Transfer learning · CPC title