Multitask learning as question answering

US11501076B2 · US · B2

Patent metadata
Publication number: US-11501076-B2
Application number: US-201815974075-A
Country: US
Kind code: B2
Filing date: May 8, 2018
Priority date: Feb 9, 2018
Publication date: Nov 15, 2022
Grant date: Nov 15, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Approaches for multitask learning as question answering include a method for training that includes receiving a plurality of training samples including training samples from a plurality of task types, presenting the training samples to a neural model to generate an answer, determining an error between the generated answer and the natural language ground truth answer for each training sample presented, and adjusting parameters of the neural model based on the error. Each of the training samples includes a natural language context, question, and ground truth answer. An order in which the training samples are presented to the neural model includes initially selecting the training samples according to a first training strategy and switching to selecting the training samples according to a second training strategy. In some embodiments the first training strategy is a sequential training strategy and the second training strategy is a joint training strategy.
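
A minimal sketch of the presentation order described in the abstract follows, assuming each task's samples are held in an in-memory list of (context, question, answer) tuples. The names `sequential_order`, `joint_order`, and `two_phase_order`, the pass counts, and the round-robin interleaving are illustrative assumptions rather than the patented implementation; the neural model, the error computation, and the parameter update are omitted.

```python
from itertools import zip_longest
from typing import Dict, Iterator, List, Tuple

Sample = Tuple[str, str, str]  # (context, question, ground-truth answer)
Item = Tuple[str, Sample]      # (task type, sample)


def sequential_order(tasks: Dict[str, List[Sample]]) -> Iterator[Item]:
    """Sequential strategy: every sample of one task type before the next type."""
    for name, samples in tasks.items():
        for sample in samples:
            yield name, sample


def joint_order(tasks: Dict[str, List[Sample]]) -> Iterator[Item]:
    """Joint strategy: round-robin across task types, one sample at a time."""
    streams = [[(name, s) for s in samples] for name, samples in tasks.items()]
    for row in zip_longest(*streams):
        for item in row:
            if item is not None:  # shorter tasks run out and are padded with None
                yield item


def two_phase_order(tasks: Dict[str, List[Sample]], seq_passes: int = 1,
                    joint_passes: int = 1) -> Iterator[Item]:
    """Hypothetical schedule: `seq_passes` sequential passes, then joint passes."""
    for _ in range(seq_passes):
        yield from sequential_order(tasks)
    for _ in range(joint_passes):
        yield from joint_order(tasks)


if __name__ == "__main__":
    tasks = {
        "translate": [("c1", "q1", "a1"), ("c2", "q2", "a2")],
        "classify": [("c3", "q3", "a3")],
    }
    print([name for name, _ in two_phase_order(tasks)])
```

With two translation samples and one classification sample, one sequential pass followed by one joint pass yields the task order translate, translate, classify, translate, classify, translate. Running several sequential passes before switching loosely corresponds to the reselection and fixed-count switching recited in dependent claims 4 and 8.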

First claim

Opening claim text (preview).

What is claimed is:

1. A method for training a question answering system, the method comprising: receiving a plurality of training samples, each of the training samples including a natural language context, a natural language question, and a natural language ground truth answer, wherein the plurality of training samples contains different training samples each corresponding to training the question answering system for a different task type from a plurality of task types; presenting the plurality of training samples to a neural model to generate an answer; determining an error between the generated answer and the natural language ground truth answer for each training sample presented; and adjusting parameters of the neural model based on the error; wherein the plurality of training samples are presented to the neural model by: initially selecting a first set of training samples from the plurality of training samples according to a first training strategy that first selects samples corresponding to a first task type and subsequently selects samples corresponding to a second task type resulting in a first ordering of the first set of training samples that covers each of the plurality of task types; and switching to selecting a second set of training samples from the plurality of training samples according to a second training strategy that mixes samples corresponding to the different task types, resulting in a second ordering of the second set of training samples that covers each of the plurality of task types.

2. The method of claim 1, wherein each of the plurality of task types is a language translation task type, a classification task type, or a question answering task type.

3. The method of claim 1, wherein the first training strategy is a sequential training strategy where each of the first set of training samples for the first task type are selected before selecting training samples of the second task type.

4. The method of claim 3, wherein the sequential training strategy includes reselecting training samples for the first task type after selecting training samples for each of the plurality of task types.

5. The method of claim 1, wherein the second training strategy is a joint training strategy where each of the second set of training samples are selected so that consecutively selected training samples are selected from different ones of the plurality of task types.

6. The method of claim 1, wherein the second training strategy is a joint training strategy where each of the second set of training samples are selected so that consecutively selected small groups of training samples are selected from different ones of the plurality of task types.

7. The method of claim 1, wherein the first training strategy is a modified sequential training strategy where the first set of training samples are selected according to a sequential training strategy with periodic intervals where the training samples are selected according to a joint training strategy.

8. The method of claim 1, further comprising switching to selecting the second set of training samples using the second training strategy after each of the first set of training samples for each of the plurality of task types is presented to the neural model a predetermined number of times.

9. The method of claim 1, further comprising switching to selecting the second set of training samples using the second training strategy based on monitoring of performance metrics associated with each of the plurality of task types.

10. The method of claim 1, wherein the neural model comprises: an input layer for encoding first words from the context and second words from the question; a self-attention based transformer comprising an encoder and a decoder; a bi-directional long-term short-term memory (biLSTM) for further encoding an output of the encoder; a long-term short-term memory (LSTM) for generating a context-adjusted hidden state from an output of the decoder and a hidden state; an attention network for generating attention weights based on an output of the encoder and the context-adjusted hidden state; a vocabulary layer for generating a distribution over third words in a vocabulary based on the attention weights; a context layer for generating a distribution over the first words from the context based on the attention weights; and a switch for: generating a weighting between the distribution over the third words from the vocabulary and the distribution over the first words from the context; generating a composite distribution based on the weighting of the distribution over the third words from the vocabulary and the distribution over the first words from the context; and selecting a word for inclusion in an answer using the composite distribution.

11. A non-transitory machine-readable medium comprising a plurality of machine-readable instructions which when executed by one or more processors associated with a computing device are adapted to cause the one or more processors to perform a method comprising: receiving a plurality of training samples, each of the training samples including a natural language context, a natural language question, and a natural language ground truth answer, the training samples including training samples from a plurality of task types; presenting the training samples to a neural model to generate an answer; determining an error between the generated answer and the natural language ground truth answer for each training sample presented; and adjusting parameters of the neural model based on the error; wherein the plurality of training samples are presented to the neural model by: initially selecting a first set of training samples from the plurality of training samples according to a first training strategy that first selects samples corresponding to a first task type and subsequently selects samples corresponding to a second task type resulting in a first ordering of the first set of training samples that covers each of the plurality of task types; and switching to selecting a second set of training samples from the plurality of training samples according to a second training strategy that mixes samples corresponding to the different task types, resulting in a second ordering of the second set of training samples that covers each of the plurality of task types.

12. The non-transitory machine-readable medium of claim 11, wherein the first training strategy is a sequential training strategy where each of the first set of training samples for the first task type are selected before selecting training samples of the second task type.

13. The non-transitory machine-readable medium of claim 11, wherein the second training strategy is a joint training strategy where each of the second set of training samples are selected so that consecutively selected training samples are selected from different ones of the plurality of task types.

14. The non-transitory machine-readable medium of claim 11, wherein the second training strategy is a joint training strategy where each of the second set of training samples are selected so that consecutively selected small groups of training samples are selected from different ones of the plurality of task types.

15. The non-transitory machine-readable medium of claim 11, further comprising switching to selecting the second set of training samples using the second training strategy after each of the training samples for each of the plurality of task types is presented to the neural model a predetermined number of times.

16. A system for deep learning, the system compris
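
Two of the dependent-claim variants can be sketched in the same style, again under stated assumptions. `joint_groups` interleaves small fixed-size groups by task type (cf. claim 6), and `should_switch` is one hypothetical reading of switching "based on monitoring of performance metrics" (cf. claim 9), implemented here as a plateau test on per-task validation scores. Both function names, the group size, the `patience` window, and the plateau rule are assumptions; the claims do not specify them.

```python
from typing import Dict, Iterator, List, Tuple

Sample = Tuple[str, str, str]  # (context, question, ground-truth answer)
Item = Tuple[str, Sample]      # (task type, sample)


def joint_groups(tasks: Dict[str, List[Sample]], group_size: int) -> Iterator[List[Item]]:
    """Yield groups of up to `group_size` samples, cycling over task types so that
    consecutive groups come from different tasks while more than one task has data."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    cursors = {name: 0 for name in tasks}
    while any(cursors[name] < len(samples) for name, samples in tasks.items()):
        for name, samples in tasks.items():
            start = cursors[name]
            if start >= len(samples):
                continue  # this task is exhausted; skip it in the rotation
            chunk = samples[start:start + group_size]
            cursors[name] = start + len(chunk)
            yield [(name, s) for s in chunk]


def should_switch(history: Dict[str, List[float]], patience: int = 2,
                  min_delta: float = 0.0) -> bool:
    """Hypothetical plateau rule: True once no task's metric in the last `patience`
    evaluations beats that task's earlier best by more than `min_delta`."""
    if not history:
        return False
    for scores in history.values():
        if len(scores) <= patience:
            return False  # not enough evaluations yet to judge a plateau
        best_before = max(scores[:-patience])
        if max(scores[-patience:]) > best_before + min_delta:
            return False  # this task is still improving
    return True
```

Once only one task type has samples left, consecutive groups necessarily come from that task, so the "different task types" property holds only while at least two tasks remain.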
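
The switch in claim 10 combines a distribution over vocabulary words with a distribution over context words. The sketch below shows only that final mixing and word-selection step in plain Python, assuming the vocabulary scores arrive as logits, the context attention weights already sum to 1, the vocabulary list has no duplicates, and the switch weighting is a sigmoid of a single scalar logit. The encoder, biLSTM, LSTM, and attention network that would produce these inputs are not modeled, and the function names are hypothetical.

```python
import math
from typing import Dict, List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def composite_distribution(vocab: List[str], vocab_logits: List[float],
                           context_words: List[str], attention: List[float],
                           switch_logit: float) -> Dict[str, float]:
    """Weight the vocabulary distribution by gamma and the context (copy)
    distribution by 1 - gamma, summing mass for words that appear in both."""
    gamma = 1.0 / (1.0 + math.exp(-switch_logit))  # switch weighting in (0, 1)
    probs = {w: gamma * p for w, p in zip(vocab, softmax(vocab_logits))}
    for word, weight in zip(context_words, attention):
        # Repeated context words accumulate their attention mass.
        probs[word] = probs.get(word, 0.0) + (1.0 - gamma) * weight
    return probs


def select_word(probs: Dict[str, float]) -> str:
    """Greedy choice: the word with the highest composite probability."""
    return max(probs, key=probs.get)
```

With uniform vocabulary logits over `the` and `cat`, a switch logit of 0 (weight 0.5), and attention of 0.8 on `Paris` and 0.2 on `the`, the composite probabilities are 0.4 for `Paris`, 0.35 for `the`, and 0.25 for `cat`. `Paris` is selected even though it is absent from the vocabulary list, which is the point of mixing in a distribution over context words.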

Assignees

Salesforce Com Inc

Inventors

Classifications

  • Combinations of networks · CPC title

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • Auto-encoder networks; Encoder-decoder networks · CPC title

  • G06N3/08 (primary)

    Learning methods · CPC title

Frequently asked questions

What does patent US11501076B2 cover?
Approaches for multitask learning as question answering include a method for training that includes receiving a plurality of training samples including training samples from a plurality of task types, presenting the training samples to a neural model to generate an answer, determining an error between the generated answer and the natural language ground truth answer for each training sample pre…
Who is the assignee on this patent?
Salesforce Com Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?
Published November 15, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).