Multitask learning as question answering

US11615249B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11615249-B2
Application number	US-202016996726-A
Country	US
Kind code	B2
Filing date	Aug 18, 2020
Priority date	Feb 9, 2018
Publication date	Mar 28, 2023
Grant date	Mar 28, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Approaches for multitask learning as question answering include an input layer for encoding a context and a question, a self-attention based transformer including an encoder and a decoder, a first bi-directional long-term short-term memory (biLSTM) for further encoding an output of the encoder, a long-term short-term memory (LSTM) for generating a context-adjusted hidden state from the output of the decoder and a hidden state, an attention network for generating first attention weights based on an output of the first biLSTM and an output of the LSTM, a vocabulary layer for generating a distribution over a vocabulary, a context layer for generating a distribution over the context, and a switch for generating a weighting between the distributions over the vocabulary and the context, generating a composite distribution based on the weighting, and selecting a word of an answer using the composite distribution.
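The switch described at the end of the abstract can be illustrated with a small Python sketch. It assumes the weighting is a single scalar (called gamma here) applied as a convex combination of the two distributions, and that the word is picked greedily; the abstract does not state either detail, and the function names and toy numbers are hypothetical.

```python
from collections import defaultdict


def composite_distribution(vocab_probs, context_tokens, context_probs, gamma):
    """Mix a distribution over the vocabulary with one over context positions.

    vocab_probs:    dict mapping word -> probability (sums to 1)
    context_tokens: words of the context, in order
    context_probs:  one probability per context position (sums to 1)
    gamma:          assumed scalar weight in [0, 1] on the vocabulary side
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    mixed = defaultdict(float)
    # Vocabulary side: each word keeps gamma times its probability.
    for word, p in vocab_probs.items():
        mixed[word] += gamma * p
    # Context side: probability mass of each position is added to its word,
    # so a word that appears in both sources accumulates mass from both.
    for token, p in zip(context_tokens, context_probs):
        mixed[token] += (1.0 - gamma) * p
    return dict(mixed)


def select_word(mixed):
    """Greedy choice (an assumption): take the most probable word."""
    return max(mixed, key=mixed.get)


# Toy inputs, invented for illustration only.
vocab_probs = {"yes": 0.5, "no": 0.25, "paris": 0.25}
context_tokens = ["the", "capital", "is", "paris"]
context_probs = [0.125, 0.125, 0.25, 0.5]

mixed = composite_distribution(vocab_probs, context_tokens, context_probs, gamma=0.5)
chosen = select_word(mixed)
```

With these toy inputs, "paris" receives mass from both the vocabulary and the context, which is why a mixture of this kind can favor words copied from the context.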

First claim

Opening claim text (preview).

What is claimed is:
1. A system for natural language processing, the system comprising: one or more processors; and a memory storing computer-executable instructions, which when executed by the one or more processors, cause the system to perform operations comprising: receiving, at an input layer, a natural language input of a question; performing a first encoding of context-based words and question-based words from the question into a context-based representation and a question-based representation; performing, using a bi-directional long-term short-term memory (biLSTM), a second encoding of the context-based representation and the question-based representation; generating, using a long-term short-term memory (LSTM), a context-adjusted hidden state based at least in part from the context-based representation and the question-based representation; generating, by an attention network, a set of attention weights based on an output of the biLSTM and an output of the LSTM; generating, by a vocabulary layer, a first distribution over a plurality of words in a vocabulary based on the set of attention weights; generating, by a context layer, a second distribution over the context-based words based on the set of attention weights; and selecting a set of words for an answer to the question based on the first distribution and the second distribution.
2. The system of claim 1, wherein the operations further comprise: generating, using a switch, a weighting between the first distribution over the plurality of words from the vocabulary and the second distribution over the context-based words.
3. The system of claim 2, wherein the operations further comprise: generating, using the switch, a composite distribution based on the weighting; and selecting, using the switch, a word for inclusion in the answer using the composite distribution.
4. The system of claim 1, wherein the input layer comprises one or more of a linear layer, a second biLSTM, a coattention layer, and a third biLSTM.
5. The system of claim 1, wherein the operations further comprise: generating, via a coattention layer, an affinity matrix based on the context-based representation and the question-based representation; generating second attention weights based on the affinity matrix; and generating weighted sums of the context-based representation and the question-based representation using the second attention weights.
6. The system of claim 1, wherein the vocabulary layer comprises: a tan h layer for generating a hidden state based on the set of attention weights, the second encoding, and the context-adjusted hidden state; and a softmax layer for generating the first distribution over a plurality of words in a vocabulary.
7. The system of claim 6, wherein a decoder, the LSTM, the attention network, the vocabulary layer, the context layer, and a switch iteratively select each word for the answer.
8. The system of claim 6, wherein the first encoding and the second encoding are implemented at a transformer that comprises a plurality of transformer layers, each of the plurality of transformer layers comprising an encoder portion having a first multi-head self-attention network and a decoder portion having a second multi-head self-attention network and a third multi-head attention network.
9. The system of claim 1, wherein the system is trained using a hybrid training strategy where the system is first trained against a plurality of task types using a sequential training strategy and is then trained against the plurality of task types using a joint training strategy.
10. The system of claim 9, wherein each of the plurality of task types is a language translation task type, a classification task type, or a question answering task type.
11. A method for natural language processing, the method comprising: receiving, at an input layer, a natural language input of a question; performing a first encoding of context-based words and question-based words from the question into a context-based representation and a question-based representation; performing, using a bi-directional long-term short-term memory (biLSTM), a second encoding of the context-based representation and the question-based representation; generating, using a long-term short-term memory (LSTM), a context-adjusted hidden state based at least in part from the context-based representation and the question-based representation; generating, by an attention network, a set of attention weights based on a first an output of the biLSTM and an output of the LSTM; generating, by a vocabulary layer, a first distribution over a plurality of words in a vocabulary based on the set of attention weights; generating, by a context layer, a second distribution over the context-based words based on the set of attention weights; and selecting a set of words for an answer to the question based on the first distribution and the second distribution.
12. The method of claim 11, further comprising: generating, using a switch, a weighting between the first distribution over the plurality of words from the vocabulary and the second distribution over the context-based words.
13. The method of claim 12, further comprising: generating, using the switch, a composite distribution based on the weighting; and selecting, using the switch, a word for inclusion in the answer using the composite distribution.
14. The method of claim 11, further comprising: generating, via a coattention layer, an affinity matrix based on the context-based representation and the question-based representation; generating second attention weights based on the affinity matrix; and generating weighted sums of the context-based representation and the question-based representation using the second attention weights.
15. The method of claim 11, wherein the vocabulary layer comprises: a tan h layer for generating a hidden state based on the set of attention weights, the second encoding, and the context-adjusted hidden state; and a softmax layer for generating the first distribution over a plurality of words in a vocabulary.
16. The method of claim 11, further comprising: encoding and decoding, using a self-attention-based transformer, an output of the input layer.
17. The method of claim 16, wherein the self-attention-based transformer comprises a plurality of transformer layers, each of the plurality of transformer layers comprising an encoder portion having a first multi-head self-attention network and a decoder portion having a second multi-head self-attention network and a third multi-head attention network.
18. A non-transitory processor-readable medium storing processor-executable instructions for natural language processing, the instructions being executable by a processor to perform operations comprising: receiving, at an input layer, a natural language input of a question; performing a first encoding of context-based words and question-based words from the question into a context-based representation and a question-based representation; performing, using a bi-directional long-term short-term memory (biLSTM), a second encoding of the context-based representation and the question-based representation; generating, using a long-term short-term memory (LSTM), a context-adjusted hidden state based at least in part from the context-based representation and the question-based representation; generating, by an attention network, a set of attention weights based on an output of the biLSTM and an output of the LSTM; generatin
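The coattention step recited in claims 5 and 14 (affinity matrix, attention weights, weighted sums) can be sketched in plain Python as follows. The claims do not say how the affinity matrix or the weights are computed; this sketch assumes dot-product affinities and softmax normalization, and every name in it is hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def coattention(context, question):
    """context: list of vectors, one per context word; question: likewise.

    Returns the affinity matrix plus two sets of weighted sums.
    """
    dim = len(context[0])
    # Affinity matrix (assumed dot product): rows are context positions,
    # columns are question positions.
    affinity = [[sum(c * q for c, q in zip(cv, qv)) for qv in question]
                for cv in context]
    # Attention weights: each context position over the question (rows),
    # and each question position over the context (columns).
    ctx_over_question = [softmax(row) for row in affinity]
    question_over_ctx = [softmax(list(col)) for col in zip(*affinity)]
    # Weighted sums of question vectors, one summary per context position.
    question_summaries = [
        [sum(w * qv[k] for w, qv in zip(weights, question)) for k in range(dim)]
        for weights in ctx_over_question
    ]
    # Weighted sums of context vectors, one summary per question position.
    context_summaries = [
        [sum(w * cv[k] for w, cv in zip(weights, context)) for k in range(dim)]
        for weights in question_over_ctx
    ]
    return affinity, question_summaries, context_summaries


# Toy two-dimensional representations, invented for illustration only.
context = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
question = [[1.0, 0.0], [0.0, 1.0]]
affinity, question_summaries, context_summaries = coattention(context, question)
```

The third context vector is equally similar to both question vectors, so its summary is their plain average; the other positions lean toward whichever question vector they match.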

Assignees

Salesforce Com Inc

Inventors

Classifications

G06N3/0442
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
G06N3/0455
Auto-encoder networks; Encoder-decoder networks · CPC title
G06N3/08 · Primary
Learning methods · CPC title
G06N3/09
Supervised learning · CPC title
G10L15/1822
Parsing for meaning understanding · CPC title

Patent family

Related publications grouped by family.

View patent family 67540543

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11615249B2 cover? Approaches for multitask learning as question answering include an input layer for encoding a context and a question, a self-attention based transformer including an encoder and a decoder, a first bi-directional long-term short-term memory (biLSTM) for further encoding an output of the encoder, a long-term short-term memory (LSTM) for generating a context-adjusted hidden state from the output o…
Who is the assignee on this patent? Salesforce Com Inc
What technology area does this patent fall under? Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published? Publication date Mar 28, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb? We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).