Multitask learning as question answering

US11615249B2 · US · B2

Patent metadata
Publication number: US-11615249-B2
Application number: US-202016996726-A
Country: US
Kind code: B2
Filing date: Aug 18, 2020
Priority date: Feb 9, 2018
Publication date: Mar 28, 2023
Grant date: Mar 28, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Approaches for multitask learning as question answering include an input layer for encoding a context and a question, a self-attention based transformer including an encoder and a decoder, a first bi-directional long-term short-term memory (biLSTM) for further encoding an output of the encoder, a long-term short-term memory (LSTM) for generating a context-adjusted hidden state from the output of the decoder and a hidden state, an attention network for generating first attention weights based on an output of the first biLSTM and an output of the LSTM, a vocabulary layer for generating a distribution over a vocabulary, a context layer for generating a distribution over the context, and a switch for generating a weighting between the distributions over the vocabulary and the context, generating a composite distribution based on the weighting, and selecting a word of an answer using the composite distribution.
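The last step of the abstract (a switch that weights the vocabulary distribution against the context distribution, forms a composite distribution, and selects a word) can be illustrated with a minimal sketch. This is not the patented implementation: the function names, the dictionary and list representations, and the scalar weight `gamma` are assumptions made for illustration, and the weight is passed in directly rather than computed by a trained network.

```python
def composite_distribution(vocab_probs, context_probs, context_words, gamma):
    """Mix a distribution over a fixed vocabulary with one over context positions.

    vocab_probs   -- dict mapping each vocabulary word to its probability
    context_probs -- list of probabilities, one per position in the context
    context_words -- list of the words found at those context positions
    gamma         -- assumed switch weight in [0, 1] placed on the vocabulary
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    # Vocabulary mass is scaled by gamma.
    mixed = {word: gamma * p for word, p in vocab_probs.items()}
    # Context mass is scaled by (1 - gamma) and added to the matching word,
    # so a word that appears in both sources accumulates probability.
    for word, p in zip(context_words, context_probs):
        mixed[word] = mixed.get(word, 0.0) + (1.0 - gamma) * p
    return mixed


def select_word(distribution):
    """Greedy selection: return the most probable word."""
    return max(distribution, key=distribution.get)


# Hypothetical example values, chosen only to exercise the functions.
vocab_probs = {"yes": 0.5, "no": 0.3, "paris": 0.2}
context_words = ["the", "capital", "is", "paris"]
context_probs = [0.1, 0.1, 0.1, 0.7]

mixed = composite_distribution(vocab_probs, context_probs, context_words, gamma=0.25)
print(select_word(mixed))
```

With a low `gamma`, most of the probability mass comes from the context, so the sketch can select words that appear in the context even when the vocabulary distribution ranks them low.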

First claim

Opening claim text (preview).

What is claimed is:

1. A system for natural language processing, the system comprising: one or more processors; and a memory storing computer-executable instructions, which when executed by the one or more processors, cause the system to perform operations comprising: receiving, at an input layer, a natural language input of a question; performing a first encoding of context-based words and question-based words from the question into a context-based representation and a question-based representation; performing, using a bi-directional long-term short-term memory (biLSTM), a second encoding of the context-based representation and the question-based representation; generating, using a long-term short-term memory (LSTM), a context-adjusted hidden state based at least in part from the context-based representation and the question-based representation; generating, by an attention network, a set of attention weights based on an output of the biLSTM and an output of the LSTM; generating, by a vocabulary layer, a first distribution over a plurality of words in a vocabulary based on the set of attention weights; generating, by a context layer, a second distribution over the context-based words based on the set of attention weights; and selecting a set of words for an answer to the question based on the first distribution and the second distribution.

2. The system of claim 1, wherein the operations further comprise: generating, using a switch, a weighting between the first distribution over the plurality of words from the vocabulary and the second distribution over the context-based words.

3. The system of claim 2, wherein the operations further comprise: generating, using the switch, a composite distribution based on the weighting; and selecting, using the switch, a word for inclusion in the answer using the composite distribution.

4. The system of claim 1, wherein the input layer comprises one or more of a linear layer, a second biLSTM, a coattention layer, and a third biLSTM.

5. The system of claim 1, wherein the operations further comprise: generating, via a coattention layer, an affinity matrix based on the context-based representation and the question-based representation; generating second attention weights based on the affinity matrix; and generating weighted sums of the context-based representation and the question-based representation using the second attention weights.

6. The system of claim 1, wherein the vocabulary layer comprises: a tanh layer for generating a hidden state based on the set of attention weights, the second encoding, and the context-adjusted hidden state; and a softmax layer for generating the first distribution over a plurality of words in a vocabulary.

7. The system of claim 6, wherein a decoder, the LSTM, the attention network, the vocabulary layer, the context layer, and a switch iteratively select each word for the answer.

8. The system of claim 6, wherein the first encoding and the second encoding are implemented at a transformer that comprises a plurality of transformer layers, each of the plurality of transformer layers comprising an encoder portion having a first multi-head self-attention network and a decoder portion having a second multi-head self-attention network and a third multi-head attention network.

9. The system of claim 1, wherein the system is trained using a hybrid training strategy where the system is first trained against a plurality of task types using a sequential training strategy and is then trained against the plurality of task types using a joint training strategy.

10. The system of claim 9, wherein each of the plurality of task types is a language translation task type, a classification task type, or a question answering task type.

11. A method for natural language processing, the method comprising: receiving, at an input layer, a natural language input of a question; performing a first encoding of context-based words and question-based words from the question into a context-based representation and a question-based representation; performing, using a bi-directional long-term short-term memory (biLSTM), a second encoding of the context-based representation and the question-based representation; generating, using a long-term short-term memory (LSTM), a context-adjusted hidden state based at least in part from the context-based representation and the question-based representation; generating, by an attention network, a set of attention weights based on a first an output of the biLSTM and an output of the LSTM; generating, by a vocabulary layer, a first distribution over a plurality of words in a vocabulary based on the set of attention weights; generating, by a context layer, a second distribution over the context-based words based on the set of attention weights; and selecting a set of words for an answer to the question based on the first distribution and the second distribution.

12. The method of claim 11, further comprising: generating, using a switch, a weighting between the first distribution over the plurality of words from the vocabulary and the second distribution over the context-based words.

13. The method of claim 12, further comprising: generating, using the switch, a composite distribution based on the weighting; and selecting, using the switch, a word for inclusion in the answer using the composite distribution.

14. The method of claim 11, further comprising: generating, via a coattention layer, an affinity matrix based on the context-based representation and the question-based representation; generating second attention weights based on the affinity matrix; and generating weighted sums of the context-based representation and the question-based representation using the second attention weights.

15. The method of claim 11, wherein the vocabulary layer comprises: a tanh layer for generating a hidden state based on the set of attention weights, the second encoding, and the context-adjusted hidden state; and a softmax layer for generating the first distribution over a plurality of words in a vocabulary.

16. The method of claim 11, further comprising: encoding and decoding, using a self-attention-based transformer, an output of the input layer.

17. The method of claim 16, wherein the self-attention-based transformer comprises a plurality of transformer layers, each of the plurality of transformer layers comprising an encoder portion having a first multi-head self-attention network and a decoder portion having a second multi-head self-attention network and a third multi-head attention network.

18. A non-transitory processor-readable medium storing processor-executable instructions for natural language processing, the instructions being executable by a processor to perform operations comprising: receiving, at an input layer, a natural language input of a question; performing a first encoding of context-based words and question-based words from the question into a context-based representation and a question-based representation; performing, using a bi-directional long-term short-term memory (biLSTM), a second encoding of the context-based representation and the question-based representation; generating, using a long-term short-term memory (LSTM), a context-adjusted hidden state based at least in part from the context-based representation and the question-based representation; generating, by an attention network, a set of attention weights based on an output of the biLSTM and an output of the LSTM; generatin
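Claims 5 and 14 recite a coattention step: an affinity matrix between the context and question representations, attention weights derived from it, and weighted sums of both representations. The sketch below shows one way such a step could look. The claims do not say how the affinity is computed or normalized, so the dot-product affinity, the softmax normalization, and every name here are assumptions for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def weighted_sum(weights, vectors):
    """Combine vectors component-wise using the given weights."""
    dim = len(vectors[0])
    return [sum(w * vec[k] for w, vec in zip(weights, vectors)) for k in range(dim)]


def coattention(context, question):
    """Return (affinity, question_summaries, context_summaries).

    context  -- list of vectors, one per context word
    question -- list of vectors, one per question word
    """
    # Affinity matrix: one row per context word, one column per question word
    # (dot product is an assumed choice of similarity).
    affinity = [[dot(c, q) for q in question] for c in context]
    # Attention weights over question words, for each context word (rows).
    ctx_to_q = [softmax(row) for row in affinity]
    # Attention weights over context words, for each question word (columns).
    q_to_ctx = [
        softmax([affinity[i][j] for i in range(len(context))])
        for j in range(len(question))
    ]
    # Weighted sums of the question and context representations.
    question_summaries = [weighted_sum(w, question) for w in ctx_to_q]
    context_summaries = [weighted_sum(w, context) for w in q_to_ctx]
    return affinity, question_summaries, context_summaries


# Hypothetical 2-dimensional representations, used only to run the sketch.
context = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
question = [[1.0, 0.0], [0.0, 1.0]]
affinity, question_summaries, context_summaries = coattention(context, question)
print(affinity)
```

Each context word receives a summary of the question and each question word receives a summary of the context, which is the "weighted sums" element of the claim under the assumptions stated above.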

Assignees

Salesforce Com Inc

Inventors

Classifications

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • Auto-encoder networks; Encoder-decoder networks · CPC title

  • G06N3/08 (Primary)

    Learning methods · CPC title

  • Supervised learning · CPC title

  • Parsing for meaning understanding · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11615249B2 cover?
Approaches for multitask learning as question answering include an input layer for encoding a context and a question, a self-attention based transformer including an encoder and a decoder, a first bi-directional long-term short-term memory (biLSTM) for further encoding an output of the encoder, a long-term short-term memory (LSTM) for generating a context-adjusted hidden state from the output o…
Who is the assignee on this patent?
Salesforce Com Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 28, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).