Multitask learning as question answering

US11600194B2 · US · B2

Patent metadata
Publication number: US-11600194-B2
Application number: US-201816006691-A
Country: US
Kind code: B2
Filing date: Jun 12, 2018
Priority date: May 18, 2018
Publication date: Mar 7, 2023
Grant date: Mar 7, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Approaches for natural language processing include a multi-layer encoder for encoding words from a context and words from a question in parallel, a multi-layer decoder for decoding the encoded context and the encoded question, a pointer generator for generating distributions over the words from the context, the words from the question, and words in a vocabulary based on an output from the decoder, and a switch. The switch generates a weighting of the distributions over the words from the context, the words from the question, and the words in the vocabulary, generates a composite distribution based on the weighting of the distribution over the first words from the context, the distribution over the second words from the question, and the distribution over the words in the vocabulary, and selects words for inclusion in an answer using the composite distribution.
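The switch described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the softmax used to turn switch scores into weights, the function names, and all example words and probabilities are assumptions made for illustration. It only shows how three distributions (over context words, question words, and a vocabulary) might be weighted, merged into one composite distribution, and used to pick an answer word.

```python
import math


def softmax(scores):
    """Turn raw scores into non-negative weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def composite_distribution(context_words, p_context,
                           question_words, p_question,
                           vocab_words, p_vocab,
                           switch_scores):
    """Mix three word distributions into one distribution keyed by word.

    switch_scores holds three hypothetical raw scores (context, question,
    vocabulary); a softmax is assumed here to convert them into weights.
    """
    w_context, w_question, w_vocab = softmax(switch_scores)
    mixed = {}
    for words, probs, weight in (
        (context_words, p_context, w_context),
        (question_words, p_question, w_question),
        (vocab_words, p_vocab, w_vocab),
    ):
        for word, prob in zip(words, probs):
            # A word can occur in more than one source; its mass accumulates.
            mixed[word] = mixed.get(word, 0.0) + weight * prob
    return mixed


def select_word(mixed):
    """Greedy selection: return the word with the highest composite mass."""
    return max(mixed, key=mixed.get)


# Made-up example: a sentiment question whose answer word sits in the question.
mixed = composite_distribution(
    ["the", "film", "was", "great"], [0.05, 0.05, 0.10, 0.80],
    ["is", "this", "positive", "or", "negative"], [0.05, 0.05, 0.60, 0.05, 0.25],
    ["yes", "no", "positive"], [0.30, 0.30, 0.40],
    switch_scores=[0.0, 2.0, 0.0],  # this step favours copying from the question
)
print(select_word(mixed))  # prints "positive"
```

Because each input distribution sums to 1 and the three weights sum to 1, the composite also sums to 1, so it can be treated as a single distribution over candidate answer words.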

First claim

Opening claim text (preview).

What is claimed is:

1. A method for natural language processing, the method comprising: encoding first words from a context and second words from a question, wherein the question is separate from but related to the context, the encodings performed in parallel using a multi-layer encoder; decoding the encoded context and the encoded question using a multi-layer decoder; generating, based on an output from the decoder, a first distribution over the first words from the context, a second distribution over the second words from the question, and a third distribution over third words in a generative vocabulary that comprises a source of words; generating a first weight of the first distribution, a second weight of the second distribution, and a third weight of the third distribution; generating a composite distribution based on the first weight, second weight, and third weight; selecting words for inclusion in an answer using the composite distribution; and training the multi-layer encoder and the multi-layer decoder against a subset of task types, wherein the subset of task types are selected according to an anti-curriculum strategy.

2. The method of claim 1, wherein the context and the question correspond to a natural language processing task type selected from question answering, machine translation, document summarization, database query generation, sentiment analysis, natural language inference, semantic role labeling, relation extraction, goal oriented dialogue, and pronoun resolution.

3. The method of claim 1, further comprising: determining a coattention between the first words in the context and the second words in the question.

4. The method of claim 1, further comprising: generating an attention across the context and an attention across the question in parallel; and generating final encodings of the context and the question in parallel based on the attention.

5. The method of claim 1, further comprising: encoding the words in the context and words in the question in parallel; projecting the encodings of the words in the context and the words in the question in parallel; and further encoding the projections of the encodings.

6. The method of claim 1, further comprising: encoding and embedding an intermediate version of the answer; generating an attention between the encoded and embedded intermediate version of the answer and a final encoding of the context; generating an intermediate decoder state from the generated attention; and generating context and question decoder states based on a final encoding of the context, a final encoding of the question, and the intermediate decoder state.

7. The method of claim 1, wherein the method further comprises training the multi-layer decoder and the multi-layer encoder against a full set of task types after the system is trained against the subset of task types.

8. A non-transitory machine-readable medium comprising executable code which when executed by one or more processors associated with a computing device are adapted to cause the one or more processors to perform a method comprising: encoding first words from a context and second words from a question, wherein the question is separate from but related to the context, the encodings performed in parallel using a multi-layer encoder; decoding the encoded context and the encoded question using a multi-layer decoder; generating, based on an output from the decoder, a first distribution over the first words from the context, a second distribution over the second words from the question, and a third distribution over third words in a generative vocabulary that comprises a source of words; generating a first weight of the first distribution, a second weight of the second distribution, and a third weight of the third distribution generating a composite distribution based on the first weight, second weight, and third weight; and selecting words for inclusion in an answer using the composite distribution; and training the multi-layer encoder and the multi-layer decoder against a subset of task types, wherein the subset of task types are selected according to an anti-curriculum strategy.

9. The non-transitory machine-readable medium of claim 8, wherein the context and the question correspond to a natural language processing task type selected from question answering, machine translation, document summarization, database query generation, sentiment analysis, natural language inference, semantic role labeling, relation extraction, goal oriented dialogue, and pronoun resolution.

10. The non-transitory machine-readable medium of claim 8, wherein the method further comprises: determining a coattention between the first words in the context and the second words in the question.

11. The non-transitory machine-readable medium of claim 8, wherein the method further comprises: generating an attention across the context and an attention across the question in parallel; and generating final encodings of the context and the question in parallel based on the attention.

12. The non-transitory machine-readable medium of claim 8, wherein the method further comprises: encoding the words in the context and words in the question in parallel; projecting the encodings of the words in the context and the words in the question in parallel; and further encoding the projections of the encodings.

13. The non-transitory machine-readable medium of claim 8, wherein the method further comprises: encoding and embedding an intermediate version of the answer; generating an attention between the encoded and embedded intermediate version of the answer and a final encoding of the context; generating an intermediate decoder state from the generated attention; and generating context and question decoder states based on a final encoding of the context, a final encoding of the question, and the intermediate decoder state.

14. A system for natural language processing, the system comprising: a memory storing instructions; and a processor coupled with the memory and configured, when executing the instructions on the memory, to cause the system to: encode first words from a context and second words from a question, wherein the question is separate from but related to the context, the encodings performed in parallel; decode the encoded context and the encoded question; generate, based on the decoded context and the decoded question, a first distribution over the first words from the context, a second distribution over the second words from the question, and a third distribution over third words in a generative vocabulary that comprises a source of words; generate a first weight of the first distribution, a second weight of the second distribution, and a third weight of the third distribution; generate a composite distribution based on the first weight, second weight, and third weight; and select words for inclusion in an answer using the composite distribution, wherein the system is trained against a subset of task types, and wherein the subset of task types are selected according to an anti-curriculum strategy.

15. The system of claim 14, wherein the context and the question correspond to a natural language processing task type selected from question answering, machine translation, document summarization, database query generation, sentiment analysis, natural language inference, semantic role labeling, relation extraction, goal oriented dialogue, and pronoun resolution.

16. The system of claim 14, wherein the processor is further config
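The two-phase training order in claims 1 and 7 (a subset of task types first, then the full set) can be sketched as a simple schedule. This is a hedged illustration only. The claim text shown here does not say how the subset is chosen beyond naming an anti-curriculum strategy, so the sketch assumes the common reading that harder tasks come first. The difficulty scores, the round-robin sampling, and the function name `anti_curriculum_schedule` are all hypothetical.

```python
from itertools import cycle, islice


def anti_curriculum_schedule(difficulty, subset_size, warmup_steps, total_steps):
    """Return the task type to draw a training batch from at each step.

    difficulty: hypothetical mapping of task type -> difficulty score.
    Phase 1 (warmup_steps): round-robin over the hardest `subset_size` tasks.
    Phase 2 (remaining steps): round-robin over the full set of tasks.
    """
    ranked = sorted(difficulty, key=difficulty.get, reverse=True)
    hard_subset = ranked[:subset_size]
    phase_one = list(islice(cycle(hard_subset), warmup_steps))
    phase_two = list(islice(cycle(ranked), total_steps - warmup_steps))
    return phase_one + phase_two


# Task names come from claim 2; the difficulty scores are made up.
difficulty = {
    "question answering": 0.9,
    "machine translation": 0.7,
    "sentiment analysis": 0.2,
}
schedule = anti_curriculum_schedule(difficulty, subset_size=1,
                                    warmup_steps=2, total_steps=5)
for step, task in enumerate(schedule):
    print(step, task)
```

With these made-up scores, the first two steps train only on question answering, and the remaining three steps cycle through all three task types.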

Assignees

Inventors

Classifications

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]

  • Auto-encoder networks; Encoder-decoder networks

  • Supervised learning

  • Hyperparameter optimisation; Meta-learning; Learning-to-learn

  • Transfer learning

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11600194B2 cover?
Approaches for natural language processing include a multi-layer encoder for encoding words from a context and words from a question in parallel, a multi-layer decoder for decoding the encoded context and the encoded question, a pointer generator for generating distributions over the words from the context, the words from the question, and words in a vocabulary based on an output from the decod…
Who is the assignee on this patent?
Salesforce Com Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/90332. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 7, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).