Systems for multi-task joint training of neural networks using multi-label datasets

US12243292B2 · US · B2

Patent metadata
Publication number: US-12243292-B2
Application number: US-202217929449-A
Country: US
Kind code: B2
Filing date: Sep 2, 2022
Priority date: Sep 2, 2022
Publication date: Mar 4, 2025
Grant date: Mar 4, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Systems and methods for multi-task joint training of a neural network including an encoder module and a multi-headed attention mechanism are provided. In one aspect, the system includes a processor configured to receive input data including a first set of labels and a second set of labels. Using the encoder module, features are extracted from the input data. Using a multi-headed attention mechanism, training loss metrics are computed. A first training loss metric is computed using the extracted features and the first set of labels, and a second training loss metric is computed using the extracted features and the second set of labels. A first mask is applied to filter the first training loss metric, and a second mask is applied to filter the second training loss metric. A final training loss metric is computed based on the filtered first and second training loss metrics.
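The abstract does not say how a mask "filters" a loss or how the two filtered losses are combined. One plausible reading is sketched below in plain Python. Each mask is a per-sample 0/1 vector derived from that task's labels. Each task's loss is averaged only over samples whose mask is 1, and the two averages are summed with weights. The `None`-marks-a-missing-label convention, the masked-mean normalization, and the weights `w1`/`w2` are hypothetical choices made for illustration. They are not details taken from the patent.

```python
from typing import List, Optional, Sequence


def mask_from_labels(labels: Sequence[Optional[object]]) -> List[int]:
    # Hypothetical convention: None marks a missing annotation -> mask value 0.
    return [0 if y is None else 1 for y in labels]


def masked_mean(per_sample_loss: Sequence[float], mask: Sequence[int]) -> float:
    # Keep only samples whose mask is 1, so a garbage loss computed against a
    # missing label (even inf or nan) never reaches the average.
    kept = [loss for loss, m in zip(per_sample_loss, mask) if m]
    # If every sample is masked out, this task contributes zero for the batch.
    return sum(kept) / len(kept) if kept else 0.0


def final_loss(
    loss_1: Sequence[float],
    labels_1: Sequence[Optional[object]],
    loss_2: Sequence[float],
    labels_2: Sequence[Optional[object]],
    w1: float = 1.0,  # hypothetical task weight
    w2: float = 1.0,  # hypothetical task weight
) -> float:
    # Each mask is computed from its own label set, as the claims require.
    m1 = mask_from_labels(labels_1)
    m2 = mask_from_labels(labels_2)
    # Final loss: weighted sum of the two filtered (masked) task losses.
    return w1 * masked_mean(loss_1, m1) + w2 * masked_mean(loss_2, m2)


print(final_loss([0.5, 9.9, 1.5], ["happy", None, "sad"],
                 [2.0, 4.0, 100.0], [(0.1, 0.2), (0.3, 0.4), None]))
```

Here the middle sample has no task-1 label, so its task-1 loss (9.9) is ignored. The sample still contributes to the final loss through task 2. Similarly, the third sample's large task-2 loss is dropped because its task-2 label is missing.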

Claims

The invention claimed is:

1. A computer system for multi-task joint training of a neural network including an encoder module and a multi-headed attention mechanism, the computer system comprising: a processor coupled to a storage medium that stores instructions, which, upon execution by the processor, cause the processor to: receive input data including a first set of labels and a second set of labels; using the encoder module, extract features from the input data; using a first task head of the multi-headed attention mechanism, compute a first training loss metric using the extracted features and the first set of labels; using a second task head of the multi-headed attention mechanism, compute a second training loss metric using the extracted features and the second set of labels; apply a first mask to filter the first training loss metric, wherein the first mask is computed based on the first set of labels; apply a second mask to filter the second training loss metric, wherein the second mask is computed based on the second set of labels; and compute a final training loss metric based on the filtered first training loss metric and the filtered second training loss metric.

2. The computer system of claim 1, wherein the first task head includes a classification neural network model.

3. The computer system of claim 2, wherein the classification neural network model includes a facial expression capturing neural network model configured to compute, using the extracted features, a prediction indicating a facial expression that is one of a predetermined number of facial expression categories, wherein the first training loss metric is computed by comparing the first set of labels with the prediction indicating the facial expression.

4. The computer system of claim 1, wherein the second task head includes a regression neural network model.

5. The computer system of claim 4, wherein the regression neural network model includes a facial landmark regression model configured to compute, using the extracted features, a prediction indicating coordinates of a facial landmark, wherein the second training loss metric is computed by comparing the second set of labels with the prediction indicating the coordinates of the facial landmark.

6. The computer system of claim 1, wherein the first set of labels includes annotations for a portion of the input data and missing annotations for the remaining portion of the input data.

7. The computer system of claim 6, wherein the first mask includes a binary mask having zero values corresponding to the missing annotations of the first set of labels.

8. The computer system of claim 1, wherein the input data includes image data.

9. The computer system of claim 8, wherein the image data includes an image associated with: a first label from the first set of labels, the first label indicating a facial expression; and a second label from the second set of labels, the second label indicates a missing annotation.

10. The computer system of claim 1, wherein the encoder module includes one or more of a convolutional neural network, a recurrent neural network, a transformer, or a sub-network.

11. A method for performing an inference task using a neural network, the method comprising: providing the neural network; receive an image; and compute a result by processing the image using the neural network, wherein the neural network has been trained by: receiving input data including a first set of labels and a second set of labels; using an encoder module, extracting features from the input data; using a first task head of a multi-headed attention mechanism, computing a first training loss metric using the extracted features and the first set of labels; using a second task head of the multi-headed attention mechanism, computing a second training loss metric using the extracted features and the second set of labels; applying a first mask to filter the first training loss metric, wherein the first mask is computed based on the first set of labels; applying a second mask to filter the second training loss metric, wherein the second mask is computed based on the second set of labels; computing a final training loss metric based on the filtered first training loss metric and the filtered second training loss metric; and updating the neural network based on the final training loss metric.

12. The method of claim 11, wherein the first task head includes a classification neural network model.

13. The method of claim 12, wherein the classification neural network model includes a facial expression capturing neural network model configured to compute, using the extracted features, a prediction indicating a facial expression that is one of a predetermined number of facial expression categories, wherein the first training loss metric is computed by comparing the first set of labels with the prediction indicating the facial expression.

14. The method of claim 11, wherein the second task head includes a regression neural network model.

15. The method of claim 14, wherein the regression neural network model includes a facial landmark regression model configured to compute, using the extracted features, a prediction indicating coordinates of a facial landmark, wherein the second training loss metric is computed by comparing the second set of labels with the prediction indicating the coordinates of the facial landmark.

16. The method of claim 11, wherein the first set of labels includes annotations for a portion of the input data and missing annotations for the remaining portion of the input data.

17. The method of claim 16, wherein the first mask includes a binary mask having zero values corresponding to the missing annotations of the first set of labels.

18. The method of claim 11, wherein the input data includes image data.

19. The method of claim 18, wherein the image data includes an image associated with: a first image label from the first set of labels, the first image label indicating a facial expression; and a second image label from the second set of labels, the second image label indicates a missing annotation.

20. A computer system for multi-task joint training of a neural network including an encoder module and a multi-headed attention mechanism, the computer system comprising: a processor coupled to a storage medium that stores instructions, which, upon execution by the processor, cause the processor to: receive image data including a set of facial expression labels and a set of facial landmark labels; using the encoder module, extract features from the image data; using a facial expression classification task head of the multi-headed attention mechanism, compute a first training loss metric using the extracted features and the set of facial expression labels; using a facial landmark regression task head of the multi-headed attention mechanism, compute a second training loss metric using the extracted features and the set of facial landmark labels; apply a first mask to filter the first training loss metric, wherein the first mask is computed based on the set of facial expression labels; apply a second mask to filter the second training loss metric, wherein the second mask is computed based on the set of facial landmark labels; and compute a final training loss metric based on the filtered first training loss metric and the filtered second training loss metric.
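Claims 3, 5, 7, 9, and 20 make the two task heads concrete. The first is a facial expression classifier over a predetermined set of categories. The second is a facial landmark coordinate regressor. Each is trained only where its own labels exist. The sketch below starts from the heads' outputs (class logits and landmark coordinates) and does not model the encoder or the attention heads themselves. The claims say only that each loss is computed "by comparing" labels with predictions. The choice of cross-entropy for the expression head and mean squared error for the landmark head is therefore an assumption, as are the `None` marker for a missing annotation and the hypothetical weight `lm_weight`. The example batch mirrors claim 9: one image has an expression label but no landmark label, and the other has the reverse.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    # Assumed classification loss: numerically stable log-softmax cross-entropy.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target]


def landmark_mse(pred: Sequence[Point], target: Sequence[Point]) -> float:
    # Assumed regression loss: mean squared error over every x and y coordinate.
    sq = [(px - tx) ** 2 + (py - ty) ** 2 for (px, py), (tx, ty) in zip(pred, target)]
    return sum(sq) / (2 * len(sq))


def binary_mask(labels: Sequence[Optional[object]]) -> List[int]:
    # Zero where the annotation is missing (None), one otherwise (cf. claim 7).
    return [0 if y is None else 1 for y in labels]


def joint_loss(
    expr_logits: Sequence[Sequence[float]],
    expr_labels: Sequence[Optional[int]],
    lm_preds: Sequence[Sequence[Point]],
    lm_labels: Sequence[Optional[Sequence[Point]]],
    lm_weight: float = 1.0,  # hypothetical task weight
) -> float:
    # Each mask is computed from its own label set (claim 20).
    expr_mask = binary_mask(expr_labels)
    lm_mask = binary_mask(lm_labels)
    # Per-sample losses are computed only where the mask is 1, so missing
    # labels never enter either comparison.
    expr_losses = [cross_entropy(z, y)
                   for z, y, m in zip(expr_logits, expr_labels, expr_mask) if m]
    lm_losses = [landmark_mse(p, t)
                 for p, t, m in zip(lm_preds, lm_labels, lm_mask) if m]
    expr_term = sum(expr_losses) / len(expr_losses) if expr_losses else 0.0
    lm_term = sum(lm_losses) / len(lm_losses) if lm_losses else 0.0
    # Final training loss combines the two filtered task losses.
    return expr_term + lm_weight * lm_term


# Image 0: expression class 0, landmarks missing. Image 1: the reverse.
loss = joint_loss(
    expr_logits=[[0.0, 0.0], [3.0, -1.0]],
    expr_labels=[0, None],
    lm_preds=[[(0.5, 0.5)], [(0.0, 0.0)]],
    lm_labels=[None, [(1.0, 1.0)]],
)
print(loss)  # log(2) from the expression head plus 1.0 from the landmark head
```

With this design, a dataset can mix images that carry only expression labels with images that carry only landmark labels. Both kinds of image still update the shared encoder through whichever head has a label.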

Assignees

Lemon Inc

Classifications

  • Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN] · CPC title

  • Supervised learning · CPC title

  • G06N3/0455 (primary)

    Auto-encoder networks; Encoder-decoder networks · CPC title

  • Active pattern-learning, e.g. online learning of image or video features · CPC title

  • using regression, e.g. by projecting features on hyperplanes · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12243292B2 cover?
Systems and methods for multi-task joint training of a neural network including an encoder module and a multi-headed attention mechanism are provided. In one aspect, the system includes a processor configured to receive input data including a first set of labels and a second set of labels. Using the encoder module, features are extracted from the input data. Using a multi-headed attention mecha…
Who is the assignee on this patent?
Lemon Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/0455. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 4, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).