What technology area does this patent fall under?

Primary CPC classification: G06N3/0455 (Auto-encoder networks; Encoder-decoder networks). Mapped technology areas include Physics.

When was this patent published?

Publication date: Mar 4, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page: citations found in our corpus, or other publications that share the same primary CPC classification.

Systems for multi-task joint training of neural networks using multi-label datasets

US12243292B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12243292-B2
Application number	US-202217929449-A
Country	US
Kind code	B2
Filing date	Sep 2, 2022
Priority date	Sep 2, 2022
Publication date	Mar 4, 2025
Grant date	Mar 4, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for multi-task joint training of a neural network including an encoder module and a multi-headed attention mechanism are provided. In one aspect, the system includes a processor configured to receive input data including a first set of labels and a second set of labels. Using the encoder module, features are extracted from the input data. Using a multi-headed attention mechanism, training loss metrics are computed. A first training loss metric is computed using the extracted features and the first set of labels, and a second training loss metric is computed using the extracted features and the second set of labels. A first mask is applied to filter the first training loss metric, and a second mask is applied to filter the second training loss metric. A final training loss metric is computed based on the filtered first and second training loss metrics.
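The loss-masking step in the abstract can be illustrated with a minimal pure-Python sketch. It assumes that a missing annotation is encoded as `None`, that each filtered loss is the mean over annotated samples, and that the final metric is a weighted sum with hypothetical weights `w1` and `w2`. The abstract says only that the final metric is computed "based on" the two filtered metrics, so these choices are illustrative rather than the patented implementation, and all function names are hypothetical.

```python
from typing import List, Optional, Sequence


def binary_mask(labels: Sequence[Optional[object]]) -> List[int]:
    """Return 1 for annotated samples and 0 for missing annotations.

    Assumption (not from the patent): a missing annotation is encoded as None.
    """
    return [0 if label is None else 1 for label in labels]


def filter_loss(per_sample_loss: Sequence[float], mask: Sequence[int]) -> float:
    """Zero out masked samples, then average over the annotated ones."""
    if len(per_sample_loss) != len(mask):
        raise ValueError("loss and mask must have the same length")
    annotated = sum(mask)
    if annotated == 0:
        return 0.0  # no labels for this task in the batch, so it contributes nothing
    return sum(loss * m for loss, m in zip(per_sample_loss, mask)) / annotated


def final_training_loss(
    loss_1: Sequence[float],
    labels_1: Sequence[Optional[object]],
    loss_2: Sequence[float],
    labels_2: Sequence[Optional[object]],
    w1: float = 1.0,  # hypothetical task weight (assumption)
    w2: float = 1.0,  # hypothetical task weight (assumption)
) -> float:
    """Combine the two filtered task losses; the weighted sum is an assumption."""
    mask_1 = binary_mask(labels_1)  # first mask, computed from the first label set
    mask_2 = binary_mask(labels_2)  # second mask, computed from the second label set
    return w1 * filter_loss(loss_1, mask_1) + w2 * filter_loss(loss_2, mask_2)


# Usage: sample 2 lacks a task-1 label; samples 1 and 3 lack task-2 labels.
loss = final_training_loss(
    [0.5, 9.9, 1.5], [2, None, 0],
    [4.0, 2.0, 7.0], [None, (0.1, 0.2), None],
)
print(loss)  # 3.0
```

Because the mask zeroes out samples that lack a label for a given task, a single image can contribute to one task head without producing a spurious loss for the other.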

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer system for multi-task joint training of a neural network including an encoder module and a multi-headed attention mechanism, the computer system comprising: a processor coupled to a storage medium that stores instructions, which, upon execution by the processor, cause the processor to: receive input data including a first set of labels and a second set of labels; using the encoder module, extract features from the input data; using a first task head of the multi-headed attention mechanism, compute a first training loss metric using the extracted features and the first set of labels; using a second task head of the multi-headed attention mechanism, compute a second training loss metric using the extracted features and the second set of labels; apply a first mask to filter the first training loss metric, wherein the first mask is computed based on the first set of labels; apply a second mask to filter the second training loss metric, wherein the second mask is computed based on the second set of labels; and compute a final training loss metric based on the filtered first training loss metric and the filtered second training loss metric.

2. The computer system of claim 1, wherein the first task head includes a classification neural network model.

3. The computer system of claim 2, wherein the classification neural network model includes a facial expression capturing neural network model configured to compute, using the extracted features, a prediction indicating a facial expression that is one of a predetermined number of facial expression categories, wherein the first training loss metric is computed by comparing the first set of labels with the prediction indicating the facial expression.

4. The computer system of claim 1, wherein the second task head includes a regression neural network model.

5. The computer system of claim 4, wherein the regression neural network model includes a facial landmark regression model configured to compute, using the extracted features, a prediction indicating coordinates of a facial landmark, wherein the second training loss metric is computed by comparing the second set of labels with the prediction indicating the coordinates of the facial landmark.

6. The computer system of claim 1, wherein the first set of labels includes annotations for a portion of the input data and missing annotations for the remaining portion of the input data.

7. The computer system of claim 6, wherein the first mask includes a binary mask having zero values corresponding to the missing annotations of the first set of labels.

8. The computer system of claim 1, wherein the input data includes image data.

9. The computer system of claim 8, wherein the image data includes an image associated with: a first label from the first set of labels, the first label indicating a facial expression; and a second label from the second set of labels, the second label indicates a missing annotation.

10. The computer system of claim 1, wherein the encoder module includes one or more of a convolutional neural network, a recurrent neural network, a transformer, or a sub-network.

11. A method for performing an inference task using a neural network, the method comprising: providing the neural network; receive an image; and compute a result by processing the image using the neural network, wherein the neural network has been trained by: receiving input data including a first set of labels and a second set of labels; using an encoder module, extracting features from the input data; using a first task head of a multi-headed attention mechanism, computing a first training loss metric using the extracted features and the first set of labels; using a second task head of the multi-headed attention mechanism, computing a second training loss metric using the extracted features and the second set of labels; applying a first mask to filter the first training loss metric, wherein the first mask is computed based on the first set of labels; applying a second mask to filter the second training loss metric, wherein the second mask is computed based on the second set of labels; computing a final training loss metric based on the filtered first training loss metric and the filtered second training loss metric; and updating the neural network based on the final training loss metric.

12. The method of claim 11, wherein the first task head includes a classification neural network model.

13. The method of claim 12, wherein the classification neural network model includes a facial expression capturing neural network model configured to compute, using the extracted features, a prediction indicating a facial expression that is one of a predetermined number of facial expression categories, wherein the first training loss metric is computed by comparing the first set of labels with the prediction indicating the facial expression.

14. The method of claim 11, wherein the second task head includes a regression neural network model.

15. The method of claim 14, wherein the regression neural network model includes a facial landmark regression model configured to compute, using the extracted features, a prediction indicating coordinates of a facial landmark, wherein the second training loss metric is computed by comparing the second set of labels with the prediction indicating the coordinates of the facial landmark.

16. The method of claim 11, wherein the first set of labels includes annotations for a portion of the input data and missing annotations for the remaining portion of the input data.

17. The method of claim 16, wherein the first mask includes a binary mask having zero values corresponding to the missing annotations of the first set of labels.

18. The method of claim 11, wherein the input data includes image data.

19. The method of claim 18, wherein the image data includes an image associated with: a first image label from the first set of labels, the first image label indicating a facial expression; and a second image label from the second set of labels, the second image label indicates a missing annotation.

20. A computer system for multi-task joint training of a neural network including an encoder module and a multi-headed attention mechanism, the computer system comprising: a processor coupled to a storage medium that stores instructions, which, upon execution by the processor, cause the processor to: receive image data including a set of facial expression labels and a set of facial landmark labels; using the encoder module, extract features from the image data; using a facial expression classification task head of the multi-headed attention mechanism, compute a first training loss metric using the extracted features and the set of facial expression labels; using a facial landmark regression task head of the multi-headed attention mechanism, compute a second training loss metric using the extracted features and the set of facial landmark labels; apply a first mask to filter the first training loss metric, wherein the first mask is computed based on the set of facial expression labels; apply a second mask to filter the second training loss metric, wherein the second mask is computed based on the set of facial landmark labels; and compute a final training loss metric based on the filtered first training loss metric and the filtered second training loss metric.
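Claims 3, 5, 7, and 20 name concrete task heads: facial expression classification over a predetermined number of categories, and facial landmark coordinate regression, each filtered by a binary mask with zeros for missing annotations. The following sketch assumes softmax cross-entropy for the expression head and mean squared error for the landmark head; the claims say only that each loss compares labels with predictions, so both loss choices, the unweighted sum, and all function names are hypothetical. The encoder and attention-based task heads are not modeled; their outputs are passed in as plain lists.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # one (x, y) landmark coordinate


def expression_ce(logits: Sequence[float], category: int) -> float:
    """Softmax cross-entropy over a fixed number of expression categories (assumed loss)."""
    top = max(logits)  # subtract the max before exponentiating for numerical stability
    log_norm = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_norm - logits[category]


def landmark_mse(pred: Sequence[Point], target: Sequence[Point]) -> float:
    """Mean squared error over every predicted landmark coordinate (assumed loss)."""
    sq = sum((px - tx) ** 2 + (py - ty) ** 2 for (px, py), (tx, ty) in zip(pred, target))
    return sq / (2 * len(target))  # two coordinates per landmark


def masked_mean(values: List[float], mask: List[int]) -> float:
    """Average only the entries whose mask value is 1."""
    kept = sum(mask)
    return sum(v * m for v, m in zip(values, mask)) / kept if kept else 0.0


def joint_face_loss(
    expr_logits: Sequence[Sequence[float]],
    expr_labels: Sequence[Optional[int]],
    lm_preds: Sequence[Sequence[Point]],
    lm_labels: Sequence[Optional[Sequence[Point]]],
) -> float:
    # Binary masks computed from each label set: 0 marks a missing annotation (None).
    expr_mask = [0 if y is None else 1 for y in expr_labels]
    lm_mask = [0 if y is None else 1 for y in lm_labels]
    # Per-sample losses; unlabeled samples get a 0.0 placeholder that the mask removes.
    expr_loss = [expression_ce(z, y) if y is not None else 0.0
                 for z, y in zip(expr_logits, expr_labels)]
    lm_loss = [landmark_mse(p, y) if y is not None else 0.0
               for p, y in zip(lm_preds, lm_labels)]
    # Final metric: unweighted sum of the two filtered losses (an assumption).
    return masked_mean(expr_loss, expr_mask) + masked_mean(lm_loss, lm_mask)


# Usage: sample A has only an expression label, sample B only a landmark label.
total = joint_face_loss(
    expr_logits=[[0.0, 0.0, 0.0], [2.0, 0.0, -1.0]],
    expr_labels=[0, None],
    lm_preds=[[(0.5, 0.5)], [(1.0, 1.0)]],
    lm_labels=[None, [(0.0, 0.0)]],
)
print(round(total, 4))  # 2.0986, i.e. log(3) + 1.0
```

This mirrors the situation in claims 9 and 19, where one image carries a facial expression label while its label in the other set is a missing annotation.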

Assignees

Lemon Inc

Inventors

Classifications

CPC code	CPC title
G06V10/454	Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
G06N3/09	Supervised learning
G06N3/0455 (primary)	Auto-encoder networks; Encoder-decoder networks
G06V10/778	Active pattern-learning, e.g. online learning of image or video features
G06V10/766	using regression, e.g. by projecting features on hyperplanes

Patent family

Related publications grouped by family.

Patent family ID: 90043903
