Beyond shared hierarchies: deep multitask learning through soft layer ordering

US11250314B2 · US · B2

Patent metadata
Publication number: US-11250314-B2
Application number: US-201816172660-A
Country: US
Kind code: B2
Filing date: Oct 26, 2018
Priority date: Oct 27, 2017
Publication date: Feb 15, 2022
Grant date: Feb 15, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The technology disclosed identifies parallel ordering of shared layers as a common assumption underlying existing deep multitask learning (MTL) approaches. This assumption restricts the kinds of shared structure that can be learned between tasks. The technology disclosed demonstrates how direct approaches to removing this assumption can ease the integration of information across plentiful and diverse tasks. The technology disclosed introduces soft ordering as a method for learning how to apply layers in different ways at different depths for different tasks, while simultaneously learning the layers themselves. Soft ordering outperforms parallel ordering methods as well as single-task learning across a suite of domains. Results show that deep MTL can be improved while generating a compact set of multipurpose functional primitives, thus aligning more closely with our understanding of complex real-world processes.
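The soft-ordering idea in the abstract can be illustrated with a small sketch. This is not the patented implementation: it assumes toy dense-plus-tanh layers, a softmax over the shared layers at each depth, and a weighted sum as the mixing step. The names `make_layer`, `soft_ordered_encode`, and `scaler_logits` are hypothetical, and the mixing weights are fixed here rather than learned by backpropagation.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_layer(width, rng):
    """Toy shared layer (assumption): square dense weights followed by tanh."""
    w = [[rng.uniform(-0.5, 0.5) for _ in range(width)] for _ in range(width)]

    def layer(x):
        return [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in w]

    return layer


def soft_ordered_encode(x, layers, scaler_logits, task):
    """Apply every shared layer at every depth and mix the outputs.

    scaler_logits[task][depth][layer] plays the role of a task-, depth-,
    and layer-specific scaling tensor (hypothetical layout).
    """
    for depth_logits in scaler_logits[task]:
        weights = softmax(depth_logits)            # one weight per layer, sums to 1
        outputs = [layer(x) for layer in layers]   # every layer sees the same input
        x = [sum(w * out[k] for w, out in zip(weights, outputs))
             for k in range(len(x))]
    return x


rng = random.Random(0)
width, n_layers, n_depths, n_tasks = 4, 3, 3, 2
layers = [make_layer(width, rng) for _ in range(n_layers)]
x0 = [1.0, 0.5, -0.5, 0.0]

# Uniform logits: each depth is an equal blend of all shared layers.
uniform_logits = [[[0.0] * n_layers for _ in range(n_depths)]
                  for _ in range(n_tasks)]
encoding = soft_ordered_encode(x0, layers, uniform_logits, task=0)

# Near one-hot logits: layer d dominates at depth d, giving a fixed ordering.
hard_logits = [[[1000.0 if j == d else 0.0 for j in range(n_layers)]
                for d in range(n_depths)]]
hard_encoding = soft_ordered_encode(x0, layers, hard_logits, task=0)
sequential = layers[2](layers[1](layers[0](x0)))
```

With one-hot weights the sketch reduces to applying the layers in one fixed sequence, which is why a fixed layer order can be viewed as a special case of the soft mixture. Different tasks can hold different weights over the same shared layers.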

First claim

Opening claim text (preview).

What is claimed is:

1. A neural network-based system coupled to memory and running on one or more parallel processors, comprising:
    an encoder generator which generates an encoder by accessing a set of processing submodules defined for the neural network-based system, constructing clones of the set of processing submodules, and arranging the clones in the encoder in a clone sequence starting from a lowest depth and continuing to a highest depth, wherein the clones in the encoder are shared by a plurality of classification tasks;
    a feeder which feeds input data for a particular one of each of the plurality of classification tasks to each one of processing submodules in a first clone at the lowest depth in the clone sequence to produce an output encoding from each one of the processing submodules in the first clone;
    a scaler generator which generates a scaler for the first clone, wherein the scaler applies task-specific, depth-specific, and processing submodule-specific scaling values to respective output encodings of the processing submodules in the first clone to produce a scaled output encoding for each one of the processing submodules in the first clone;
    an accumulator that mixes respective scaled output encodings of the processing submodules in the first clone to produce an accumulated output encoding for the first clone;
    a forwarder that transmits the accumulated output encoding for the first clone as input to each one of processing submodules in a successive clone at a successive depth in the clone sequence;
    a controller that iteratively invokes the scaler generator, the accumulator, and the forwarder to, respectively, produce a scaled output encoding for each one of the processing submodules in the successive clone, produce an accumulated output encoding for the successive clone, and transmit the accumulated output encoding for the successive clone as input to each one of processing submodules in another successive clone at another successive depth in the clone sequence until an accumulated output encoding is produced for a final clone at the highest depth in the clone sequence;
    a decoder selector which selects, from among numerous decoders, a decoder that is specific to each of a particular one of the classification tasks and transmits the accumulated output encoding produced for the final clone as input to the selected decoder; and
    the selected decoder which processes the accumulated output encoding and produces classification scores for classes defined for each of the particular one of the classification tasks.

2. The neural network-based system of claim 1, wherein the scaler is a three-dimensional tensor that is learned using a gradient-update technique based on backpropagation.

3. The neural network-based system of claim 1, wherein the scaling values are scalar values that augment or diminish respective magnitudes of the output encodings.

4. The neural network-based system of claim 3, wherein the scalar values are softmax values that sum to unity.

5. The neural network-based system of claim 3, wherein the scalar values are sigmoid values between zero and unity.

6. The neural network-based system of claim 3, wherein the scalar values are continuous values normalized between a lowest value and a highest value.

7. The neural network-based system of claim 1, wherein the processing submodules in the set have at least one different global topology hyperparameter, global operational hyperparameter, local topology hyperparameter, and/or local operational hyperparameter.

8. The neural network-based system of claim 1, wherein the encoder is a convolutional neural network and the processing submodules are convolution layers interspersed with activation and/or normalization functions.

9. The neural network-based system of claim 1, wherein the encoder is a recurrent neural network and the processing submodules are recurrent layers interspersed with activation and/or normalization functions.

10. The neural network-based system of claim 1, wherein each decoder further comprises at least one decoder layer and at least one classification layer.

11. The neural network-based system of claim 10, wherein the decoder is a fully-connected neural network and the decoder layer is a fully-connected layer.

12. The neural network-based system of claim 10, wherein the classification layer is a sigmoid classifier.

13. The neural network-based system of claim 10, wherein the classification layer is a softmax classifier.

14. A neural network-implemented method of soft ordering, including:
    generating an encoder by accessing a set of processing submodules defined for a neural network-based model, constructing clones of the set of processing submodules, and arranging the clones in the encoder in a clone sequence starting from a lowest depth and continuing to a highest depth, wherein the clones in the encoder are shared by a plurality of classification tasks;
    feeding input data for a particular one of each of the plurality of classification tasks to each one of processing submodules in a first clone at the lowest depth in the clone sequence to produce an output encoding from each one of the processing submodules in the first clone;
    generating a scaler for the first clone, wherein the scaler applies task-specific, depth-specific, and processing submodule-specific scaling values to respective output encodings of the processing submodules in the first clone to produce a scaled output encoding for each one of the processing submodules in the first clone;
    mixing respective scaled output encodings of the processing submodules in the first clone to produce an accumulated output encoding for the first clone;
    transmitting the accumulated output encoding for the first clone as input to each one of processing submodules in a successive clone at a successive depth in the clone sequence;
    iterating the scaler generation, the mixing, and the transmitting to, respectively, produce a scaled output encoding for each one of the processing submodules in the successive clone, produce an accumulated output encoding for the successive clone, and transmit the accumulated output encoding for the successive clone as input to each one of processing submodules in another successive clone at another successive depth in the clone sequence until an accumulated output encoding is produced for a final clone at the highest depth in the clone sequence;
    selecting, from among numerous decoders, a decoder that is specific to each of a particular one of the classification tasks and transmitting the accumulated output encoding produced for the final clone as input to the selected decoder; and
    processing the accumulated output encoding through the selected decoder to produce classification scores for classes defined for each of the particular one of the classification tasks.

15. The neural network-implemented method of claim 14, wherein the scaler is a three dimensional tensor that is learned using a gradient-update technique based on backpropagation.

16. The neural network-implemented method of claim 14, wherein the scaling values are scalar values that augment or diminish respective magnitudes of the output encodings.

17. The neural network-implemented method of claim 16, wherein the scalar values are softmax values that sum to unity.

18. The neural network-implemented method of claim 16, wherein the scalar values are sigmoid values between zero and unity.

19. The neural network-implemented method of claim 16, wherein the scalar values are continuous values normalized between a
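Claims 3 through 6 (and their method counterparts) name three ways the scaling values may be constrained: softmax values that sum to unity, sigmoid values between zero and unity, and continuous values normalized between a lowest and a highest value. The sketch below shows one possible reading of each, followed by a scale-then-mix step. It is illustrative only: the function names are hypothetical, min-max rescaling is an assumed interpretation of "normalized", and element-wise summation is an assumed interpretation of how the accumulator "mixes" encodings.

```python
import math


def softmax_values(raw):
    """Scaling values that are positive and sum to one."""
    m = max(raw)
    exps = [math.exp(r - m) for r in raw]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid_values(raw):
    """Scaling values squashed independently into the interval (0, 1)."""
    out = []
    for r in raw:
        if r >= 0:
            out.append(1.0 / (1.0 + math.exp(-r)))
        else:
            e = math.exp(r)  # avoids overflow for very negative inputs
            out.append(e / (1.0 + e))
    return out


def minmax_values(raw, lowest=0.0, highest=1.0):
    """Scaling values rescaled linearly into [lowest, highest] (assumed reading)."""
    lo, hi = min(raw), max(raw)
    if hi == lo:
        # Degenerate case: all raw values equal, so return the midpoint.
        return [(lowest + highest) / 2.0 for _ in raw]
    return [lowest + (r - lo) * (highest - lowest) / (hi - lo) for r in raw]


def scale_and_accumulate(encodings, values):
    """Scale each submodule's output encoding, then sum element-wise (assumption)."""
    width = len(encodings[0])
    return [sum(v * enc[k] for v, enc in zip(values, encodings))
            for k in range(width)]


# Raw parameters for three submodules at one depth for one task (made-up numbers).
raw = [2.0, 0.0, -1.0]
# Output encodings of the three submodules (made-up numbers).
encodings = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

mixed_softmax = scale_and_accumulate(encodings, softmax_values(raw))
mixed_sigmoid = scale_and_accumulate(encodings, sigmoid_values(raw))
mixed_minmax = scale_and_accumulate(encodings, minmax_values(raw))
```

Under the softmax option the mix is a convex combination of the submodule outputs. Under the sigmoid option each submodule is gated independently, so the values need not sum to one.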

Assignees

Cognizant Tech Solutions U S Corporation

Inventors

Classifications

  • Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title

  • G06N3/048 (Primary)

    Activation functions · CPC title

  • Combinations of networks · CPC title

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • G06N3/084 (Primary)

    Backpropagation, e.g. using gradient descent · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Cognizant Tech Solutions U S Corporation
What technology area does this patent fall under?
Primary CPC classification G06N3/048. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 15, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).