US11501076B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11501076-B2 |
| Application number | US-201815974075-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 8, 2018 |
| Priority date | Feb 9, 2018 |
| Publication date | Nov 15, 2022 |
| Grant date | Nov 15, 2022 |
Abstract
Approaches for multitask learning as question answering include a method for training that includes receiving a plurality of training samples including training samples from a plurality of task types, presenting the training samples to a neural model to generate an answer, determining an error between the generated answer and the natural language ground truth answer for each training sample presented, and adjusting parameters of the neural model based on the error. Each of the training samples includes a natural language context, question, and ground truth answer. An order in which the training samples are presented to the neural model includes initially selecting the training samples according to a first training strategy and switching to selecting the training samples according to a second training strategy. In some embodiments the first training strategy is a sequential training strategy and the second training strategy is a joint training strategy.
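The presentation order described in the abstract (a sequential strategy first, then a joint strategy) can be sketched as a small sample scheduler. This is a minimal illustration under stated assumptions, not the patented implementation. The function names, the `(task_type, context, question, answer)` tuple layout, the round-robin interleaving used for the joint phase, the `group_size` parameter, and switching after a fixed number of sequential passes are all hypothetical choices. The forward pass, error computation, and parameter update are left out.

```python
from itertools import zip_longest
from typing import Dict, Iterator, List, Tuple

# Hypothetical sample layout: (task_type, context, question, ground_truth_answer).
Sample = Tuple[str, str, str, str]


def sequential_order(by_task: Dict[str, List[Sample]]) -> List[Sample]:
    """First strategy: every sample of one task type before the next task type."""
    order: List[Sample] = []
    for samples in by_task.values():  # dicts keep insertion order
        order.extend(samples)
    return order


def joint_order(by_task: Dict[str, List[Sample]], group_size: int = 1) -> List[Sample]:
    """Second strategy (assumed round-robin): take `group_size` samples from each
    task type in turn, so consecutive samples (or small groups) come from
    different task types while more than one type still has samples left."""
    chunked = [
        [samples[i:i + group_size] for i in range(0, len(samples), group_size)]
        for samples in by_task.values()
    ]
    order: List[Sample] = []
    for round_chunks in zip_longest(*chunked, fillvalue=[]):
        for chunk in round_chunks:
            order.extend(chunk)
    return order


def curriculum_order(
    by_task: Dict[str, List[Sample]], sequential_passes: int, joint_passes: int
) -> Iterator[Sample]:
    """Sequential ordering for a fixed number of passes, then switch to joint ordering."""
    for _ in range(sequential_passes):
        yield from sequential_order(by_task)
    for _ in range(joint_passes):
        yield from joint_order(by_task)


data = {
    "translation": [("translation", "c1", "q1", "a1"), ("translation", "c2", "q2", "a2")],
    "classification": [("classification", "c3", "q3", "a3")],
}
print([s[0] for s in curriculum_order(data, sequential_passes=1, joint_passes=1)])
# ['translation', 'translation', 'classification', 'translation', 'classification', 'translation']
```

Round-robin is only one way to "mix" task types; a random shuffle across task types would also satisfy the abstract's description, but interleaving makes the alternation between task types easy to see in the output.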
Claims (preview)
What is claimed is:

1. A method for training a question answering system, the method comprising: receiving a plurality of training samples, each of the training samples including a natural language context, a natural language question, and a natural language ground truth answer, wherein the plurality of training samples contains different training samples each corresponding to training the question answering system for a different task type from a plurality of task types; presenting the plurality of training samples to a neural model to generate an answer; determining an error between the generated answer and the natural language ground truth answer for each training sample presented; and adjusting parameters of the neural model based on the error; wherein the plurality of training samples are presented to the neural model by: initially selecting a first set of training samples from the plurality of training samples according to a first training strategy that first selects samples corresponding to a first task type and subsequently selects samples corresponding to a second task type resulting in a first ordering of the first set of training samples that covers each of the plurality of task types; and switching to selecting a second set of training samples from the plurality of training samples according to a second training strategy that mixes samples corresponding to the different task types, resulting in a second ordering of the second set of training samples that covers each of the plurality of task types.

2. The method of claim 1, wherein each of the plurality of task types is a language translation task type, a classification task type, or a question answering task type.

3. The method of claim 1, wherein the first training strategy is a sequential training strategy where each of the first set of training samples for the first task type are selected before selecting training samples of the second task type.

4. The method of claim 3, wherein the sequential training strategy includes reselecting training samples for the first task type after selecting training samples for each of the plurality of task types.

5. The method of claim 1, wherein the second training strategy is a joint training strategy where each of the second set of training samples are selected so that consecutively selected training samples are selected from different ones of the plurality of task types.

6. The method of claim 1, wherein the second training strategy is a joint training strategy where each of the second set of training samples are selected so that consecutively selected small groups of training samples are selected from different ones of the plurality of task types.

7. The method of claim 1, wherein the first training strategy is a modified sequential training strategy where the first set of training samples are selected according to a sequential training strategy with periodic intervals where the training samples are selected according to a joint training strategy.

8. The method of claim 1, further comprising switching to selecting the second set of training samples using the second training strategy after each of the first set of training samples for each of the plurality of task types is presented to the neural model a predetermined number of times.

9. The method of claim 1, further comprising switching to selecting the second set of training samples using the second training strategy based on monitoring of performance metrics associated with each of the plurality of task types.

10. The method of claim 1, wherein the neural model comprises: an input layer for encoding first words from the context and second words from the question; a self-attention based transformer comprising an encoder and a decoder; a bi-directional long-term short-term memory (biLSTM) for further encoding an output of the encoder; a long-term short-term memory (LSTM) for generating a context-adjusted hidden state from an output of the decoder and a hidden state; an attention network for generating attention weights based on an output of the encoder and the context-adjusted hidden state; a vocabulary layer for generating a distribution over third words in a vocabulary based on the attention weights; a context layer for generating a distribution over the first words from the context based on the attention weights; and a switch for: generating a weighting between the distribution over the third words from the vocabulary and the distribution over the first words from the context; generating a composite distribution based on the weighting of the distribution over the third words from the vocabulary and the distribution over the first words from the context; and selecting a word for inclusion in an answer using the composite distribution.

11. A non-transitory machine-readable medium comprising a plurality of machine-readable instructions which when executed by one or more processors associated with a computing device are adapted to cause the one or more processors to perform a method comprising: receiving a plurality of training samples, each of the training samples including a natural language context, a natural language question, and a natural language ground truth answer, the training samples including training samples from a plurality of task types; presenting the training samples to a neural model to generate an answer; determining an error between the generated answer and the natural language ground truth answer for each training sample presented; and adjusting parameters of the neural model based on the error; wherein the plurality of training samples are presented to the neural model by: initially selecting a first set of training samples from the plurality of training samples according to a first training strategy that first selects samples corresponding to a first task type and subsequently selects samples corresponding to a second task type resulting in a first ordering of the first set of training samples that covers each of the plurality of task types; and switching to selecting a second set of training samples from the plurality of training samples according to a second training strategy that mixes samples corresponding to the different task types, resulting in a second ordering of the second set of training samples that covers each of the plurality of task types.

12. The non-transitory machine-readable medium of claim 11, wherein the first training strategy is a sequential training strategy where each of the first set of training samples for the first task type are selected before selecting training samples of the second task type.

13. The non-transitory machine-readable medium of claim 11, wherein the second training strategy is a joint training strategy where each of the second set of training samples are selected so that consecutively selected training samples are selected from different ones of the plurality of task types.

14. The non-transitory machine-readable medium of claim 11, wherein the second training strategy is a joint training strategy where each of the second set of training samples are selected so that consecutively selected small groups of training samples are selected from different ones of the plurality of task types.

15. The non-transitory machine-readable medium of claim 11, further comprising switching to selecting the second set of training samples using the second training strategy after each of the training samples for each of the plurality of task types is presented to the neural model a predetermined number of times.

16. A system for deep learning, the system compris
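The switch in claim 10 blends a distribution over vocabulary words with a distribution over words taken from the context, in the spirit of a pointer-generator copy mechanism. The sketch below illustrates only that final step, under hypothetical assumptions the claim does not specify: the function names, a sigmoid gate on a single `switch_logit`, treating already-normalized attention weights directly as the context distribution, and greedy (argmax) word selection. The input layer, transformer, biLSTM, LSTM, and attention network are omitted.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over raw vocabulary scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def composite_distribution(
    vocab: List[str],
    vocab_scores: List[float],
    context_words: List[str],
    attention_weights: List[float],
    switch_logit: float,
) -> Dict[str, float]:
    """Blend a vocabulary distribution with a distribution over context words.

    Assumptions: the switch weighting is a sigmoid of one scalar logit, and the
    (already normalized) attention weights serve directly as the context distribution.
    """
    gamma = 1.0 / (1.0 + math.exp(-switch_logit))  # share given to the vocabulary
    composite: Dict[str, float] = {}
    for word, p in zip(vocab, softmax(vocab_scores)):
        composite[word] = composite.get(word, 0.0) + gamma * p
    # A context word that occurs at several positions accumulates each position's weight.
    for word, a in zip(context_words, attention_weights):
        composite[word] = composite.get(word, 0.0) + (1.0 - gamma) * a
    return composite


def select_word(composite: Dict[str, float]) -> str:
    """Greedy selection: the highest-probability word in the composite distribution."""
    return max(composite, key=composite.get)


dist = composite_distribution(
    vocab=["the", "cat", "sat"],
    vocab_scores=[0.0, 0.0, 0.0],
    context_words=["Paris", "is", "nice"],
    attention_weights=[0.8, 0.1, 0.1],
    switch_logit=0.0,
)
print(select_word(dist))  # Paris
```

With `switch_logit=0.0` the gate is 0.5, so each vocabulary word gets 0.5 × 1/3 ≈ 0.167 while "Paris" gets 0.5 × 0.8 = 0.4 and is selected. A word present in both the vocabulary and the context receives mass from both terms, which is why the sketch accumulates into one dictionary.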
CPC classification titles:
- Combinations of networks
- Recurrent networks, e.g. Hopfield networks
- characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
- Auto-encoder networks; Encoder-decoder networks
- Learning methods