System and method for pseudo-task augmentation in deep multitask learning

US12033079B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12033079-B2
Application number  US-201916270681-A
Country             US
Kind code           B2
Filing date         Feb 8, 2019
Priority date       Feb 8, 2018
Publication date    Jul 9, 2024
Grant date          Jul 9, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A multi-task (MTL) process is adapted to the single-task learning (STL) case, i.e., when only a single task is available for training. The process is formalized as pseudo-task augmentation (PTA), in which a single task has multiple distinct decoders projecting the output of the shared structure to task predictions. By training the shared structure to solve the same problem in multiple ways, PTA simulates the effect of training towards distinct but closely-related tasks drawn from the same universe. Training dynamics with multiple pseudo-tasks strictly subsumes training with just one, and a class of algorithms is introduced for controlling pseudo-tasks in practice.
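
The arrangement described in the abstract can be illustrated with a minimal sketch: one shared encoder feeds several decoders that all predict the same task, and the training signal is the loss averaged across them. Everything named here (`pta_forward`, `pta_loss`, the layer sizes, the tanh encoder, the random initialization) is a hypothetical choice made for illustration and is not taken from the patent.

```python
import math
import random

def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rows, cols, rng):
    """Small random weights; the range is an arbitrary illustrative choice."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def pta_forward(encoder, decoders, x):
    """Encode x once, then let every pseudo-task decoder predict from that encoding."""
    encoding = [math.tanh(a) for a in matvec(encoder, x)]
    return [softmax(matvec(dec, encoding)) for dec in decoders]

def pta_loss(predictions, label):
    """Mean cross-entropy across all decoders for one labelled example."""
    return sum(-math.log(p[label]) for p in predictions) / len(predictions)

rng = random.Random(0)
encoder = random_matrix(4, 3, rng)                       # shared structure: 3 inputs -> 4 features
decoders = [random_matrix(2, 4, rng) for _ in range(3)]  # 3 pseudo-tasks, 2 classes each
preds = pta_forward(encoder, decoders, [1.0, -0.5, 0.25])
loss = pta_loss(preds, label=1)
```

Because each decoder starts from different weights, the three predictions differ even though the input and label are identical, which is the sense in which the shared structure is asked to solve the same problem in multiple ways.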

First claim

Opening claim text (preview).

The invention claimed is:

1. A neural network-based joint model coupled to memory and running on one or more parallel processors, comprising:

  an encoder that processes an input and generates an encoding;

  a plurality of decoders that are grouped into sets of decoders in dependence upon corresponding classification tasks, wherein each of the plurality of decoders respectively receive the encoding as input from the encoder, thereby forming encoder-decoder pairs which operate independently of each other when performing the corresponding classification tasks T, wherein each of the plurality of decoders respectively process the received encoding including processing the encoding through their individual decoder layers and classification layers and produce classification scores for classes defined for the corresponding classification tasks; and

  a trainer that jointly trains the encoder-decoder pairs over at least one backpropagation iterations to perform the corresponding classification tasks wherein the trainer is further configured to comprise:

    a forward pass stage that processes training inputs through each single encoder and resulting encodings through each of the plurality of decoders, wherein the plurality of decoders includes decoders for performing different classification tasks, paired with a same single encoder to compute respective activations for each of the training inputs;

    a backward pass stage, that, over each of the at least one thousand backpropagation iterations; determines gradient data for the each of the plurality of decoders for each of the training inputs in dependence upon a loss function, wherein the loss function is cross entropy that uses either a maximum likelihood objective function, a policy gradient function, or both; averages the gradient data determined for the each of the plurality of decoders; and determines gradient data for the same single encoder by backpropagating the averaged gradient data for all of the plurality of decoders through the same single encoder;

    an update stage that modifies weights of each single encoder in dependence upon the gradient data determined for the single encoder from the plurality of decoders paired therewith; and

    a persistence stage that, upon convergence after a final backpropagation iteration, persists in the memory the modified weights of the encoder derived by the training to be applied to future classification tasks;

  wherein for each classification task T, a single model, including a jointly trained encoder-decoder pair, is selected from the jointly trained multiple encoder-decoder pairs in the joint model.

2. The neural network-based model of claim 1, wherein the encoder is paired with the plurality of decoders D including the plurality of decoders from a same set of decoders and the plurality of decoders from a different set of decoders, wherein each of the plurality of decoders in the same set of decoders performs a first classification task and each of the plurality of decoder in the different set of decoders performs a second classification task.

3. The neural network-based model of any of claim 2, further configured to use a combination of the modified weights of the encoder derived by the training and modified weights of each decoder within a set of decoders derived by the training to perform the corresponding classification tasks on inference inputs, and wherein the inference inputs are processed by the encoder to produce encodings, followed by one of the decoders processing the encodings to output classification scores for classes defined for the corresponding classification tasks.

4. The neural network-based model of claim 2, further configured to use a combination of the modified weights of the encoder derived by the training and modified weights of two or more of the decoders derived by the training to respectively perform two or more of the classification tasks on inference inputs, and wherein the inference inputs are processed by the encoder to produce encodings, followed by the two or more of the decoders respectively processing the encodings to output classification scores for classes defined for the two or more of the classification tasks.

5. The neural network-based model of claim 4, wherein the input, the training inputs, and the inference inputs are selected from a group consisting of image data, text data and genomic data.

6. The neural network-based model of claim 2, wherein each training input is annotated with a plurality of task-specific labels for the corresponding classification tasks.

7. The neural network-based model of any of claim 1, wherein a plurality of training inputs for the corresponding classification tasks are fed in parallel to the encoder as input in each forward pass iteration, and wherein each training input is annotated with a task-specific label for a corresponding classification task.

8. The neural network-based model of claim 1, wherein the encoder is a convolutional neural network (abbreviated CNN) with a plurality of convolution layers arranged in a sequence from lowest to highest.

9. The neural network-based model of claim 1, wherein the encoding is convolution data.

10. The neural network-based model of claim 1, wherein each decoder further comprises at least one decoder layer and at least one classification layer.

11. The neural network-based model of claim 10, wherein each of the numerous decoders is a fully-connected neural network (abbreviated FCNN) and the decoder layer is a fully-connected layer.

12. The neural network-based model of claim 10, wherein the at least one classification layer is a sigmoid classifier.

13. The neural network-based model of claim 10, wherein the at least one classification layer is a softmax classifier.

14. The neural network-based model of claim 1, wherein the encoder is a recurrent neural network (abbreviated RNN), including long short-term memory (LSTM) network or gated recurrent unit (GRU) network.

15. The neural network-based model of claim 1, wherein the encoding is hidden state data.

16. The neural network-based model of claim 1, wherein each decoder is a recurrent neural network (abbreviated RNN), including long short-term memory (LSTM) network or gated recurrent unit (GRU) network.

17. The neural network-based model of claim 1, wherein each decoder is a convolutional neural network (abbreviated CNN) with a plurality of convolution layers arranged in a sequence from lowest to highest.

18. The neural network-based model of claim 1, wherein the encoder is a fully-connected neural network (abbreviated FCNN) with at least one fully-connected layer.

19. The neural network-based model of claim 1, wherein at least some of the decoders are of a first neural network type, at least some of the decoders are of a second neural network type, and at least some of the decoders are of a third neural network type.

20. The neural network-based model of claim 1, wherein at least some of the decoders are convolutional neural networks (abbreviated CNNs) with a plurality of convolution layers arranged in a sequence from lowest to highest, at least some of the decoders are recurrent neural networks (abbreviated RNNs), including long short-term memory (LSTM) networks or gated recurrent unit (GRU) networks, and at least some of the decoders are fully-connected neural networks (abbreviated FCNNs).

21. The neural network-based model of claim 1, further configured to comprise an initializer that initializes the decoders with random weights
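
The forward pass, backward pass, and update stages recited in claim 1 can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the patented implementation: the encoder and decoders are single linear layers, the loss is softmax cross-entropy on one labelled example, each decoder is updated from its own unaveraged gradient, and the selection rule (lowest loss on a given example) is a stand-in for whatever criterion a real system would use. The names `train_step` and `select_decoder` are hypothetical.

```python
import math
import random

def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def train_step(encoder, decoders, x, label, lr=0.1):
    """One forward/backward/update pass; returns the mean loss before the update."""
    # Forward pass: one encoding, one prediction per decoder.
    encoding = matvec(encoder, x)
    probs = [softmax(matvec(dec, encoding)) for dec in decoders]
    loss = sum(-math.log(p[label]) for p in probs) / len(decoders)

    # Backward pass: the cross-entropy gradient w.r.t. the logits is (p - one_hot).
    enc_grad = [0.0] * len(encoding)
    for dec, p in zip(decoders, probs):
        delta = [pk - (1.0 if k == label else 0.0) for k, pk in enumerate(p)]
        # Average the per-decoder gradients flowing back into the shared encoding
        # (computed with this decoder's weights before they are changed below).
        for j in range(len(encoding)):
            enc_grad[j] += sum(delta[k] * dec[k][j] for k in range(len(delta))) / len(decoders)
        # Assumption: each decoder takes a step along its own gradient.
        for k in range(len(delta)):
            for j in range(len(encoding)):
                dec[k][j] -= lr * delta[k] * encoding[j]

    # Update stage: modify the encoder weights using the averaged gradient.
    for j in range(len(encoder)):
        for i in range(len(x)):
            encoder[j][i] -= lr * enc_grad[j] * x[i]
    return loss

def select_decoder(encoder, decoders, x, label):
    """Pick the index of the decoder with the lowest loss (illustrative criterion)."""
    encoding = matvec(encoder, x)
    losses = [-math.log(softmax(matvec(dec, encoding))[label]) for dec in decoders]
    return min(range(len(decoders)), key=losses.__getitem__)

rng = random.Random(0)
encoder = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(4)]
decoders = [[[rng.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(2)] for _ in range(3)]
x, label = [1.0, -0.5, 0.25], 1
losses = [train_step(encoder, decoders, x, label) for _ in range(50)]
best = select_decoder(encoder, decoders, x, label)
```

The sketch runs 50 iterations on a single example purely to keep it short; the claim itself recites at least one thousand backpropagation iterations and a persistence stage that stores the encoder weights after convergence, neither of which is modelled here.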

Assignees

Cognizant Tech Solutions U S Corporation

Inventors

Classifications

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Auto-encoder networks; Encoder-decoder networks · CPC title

  • Supervised learning · CPC title

  • Hyperparameter optimisation; Meta-learning; Learning-to-learn · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12033079B2 cover?
A multi-task (MTL) process is adapted to the single-task learning (STL) case, i.e., when only a single task is available for training. The process is formalized as pseudo-task augmentation (PTA), in which a single task has multiple distinct decoders projecting the output of the shared structure to task predictions. By training the shared structure to solve the same problem in multiple ways, PTA…
Who is the assignee on this patent?
Cognizant Tech Solutions U S Corporation
What technology area does this patent fall under?
Primary CPC classification G06N3/084. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 9, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).