Systems and Methods for Privacy-Preserving Optics
US-2024289990-A1 · Aug 29, 2024 · US
US12243292B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12243292-B2 |
| Application number | US-202217929449-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 2, 2022 |
| Priority date | Sep 2, 2022 |
| Publication date | Mar 4, 2025 |
| Grant date | Mar 4, 2025 |
Abstract
Systems and methods for multi-task joint training of a neural network including an encoder module and a multi-headed attention mechanism are provided. In one aspect, the system includes a processor configured to receive input data including a first set of labels and a second set of labels. Using the encoder module, features are extracted from the input data. Using a multi-headed attention mechanism, training loss metrics are computed. A first training loss metric is computed using the extracted features and the first set of labels, and a second training loss metric is computed using the extracted features and the second set of labels. A first mask is applied to filter the first training loss metric, and a second mask is applied to filter the second training loss metric. A final training loss metric is computed based on the filtered first and second training loss metrics.
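The masked loss combination described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patent's implementation. A missing annotation is represented as `None`. The per-sample loss values for each task head are assumed to be computed already. Each filtered loss is normalized by the number of annotated samples. The final loss is taken as a weighted sum; the weights `weight_a` and `weight_b` are hypothetical, since the abstract only says the final metric is computed "based on" the two filtered metrics. All function names are illustrative.

```python
from typing import Optional, Sequence


def binary_mask(labels: Sequence[Optional[object]]) -> list[int]:
    """Mask derived from the labels themselves: 1 = annotated, 0 = missing.

    Assumption: a missing annotation is encoded as None.
    """
    return [0 if y is None else 1 for y in labels]


def masked_mean(per_sample_loss: Sequence[float], mask: Sequence[int]) -> float:
    """Filter a task's per-sample losses with its mask and average the survivors.

    Masked-out entries are skipped rather than multiplied by zero, so a
    placeholder NaN for an unlabeled sample cannot leak into the result.
    Returns 0.0 when no sample in the batch is annotated for this task.
    """
    kept = [loss for loss, m in zip(per_sample_loss, mask) if m]
    return sum(kept) / len(kept) if kept else 0.0


def final_loss(loss_a: Sequence[float], labels_a: Sequence[Optional[object]],
               loss_b: Sequence[float], labels_b: Sequence[Optional[object]],
               weight_a: float = 1.0, weight_b: float = 1.0) -> float:
    """Combine the two filtered task losses (hypothetical weighted sum)."""
    filtered_a = masked_mean(loss_a, binary_mask(labels_a))
    filtered_b = masked_mean(loss_b, binary_mask(labels_b))
    return weight_a * filtered_a + weight_b * filtered_b
```

For example, `final_loss([0.5, 9.9, 1.5], [2, None, 0], [4.0, 2.0, 7.7], [(0.1, 0.2), (0.3, 0.4), None])` returns `4.0`: the 9.9 and 7.7 entries belong to unannotated samples and are dropped by the masks, leaving masked means of 1.0 and 3.0. Dividing by the number of annotated samples rather than the batch size is one design choice among several; it keeps a sparsely labeled task from being down-weighted merely because few samples carry its labels.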
Claims
The invention claimed is:

1. A computer system for multi-task joint training of a neural network including an encoder module and a multi-headed attention mechanism, the computer system comprising: a processor coupled to a storage medium that stores instructions, which, upon execution by the processor, cause the processor to: receive input data including a first set of labels and a second set of labels; using the encoder module, extract features from the input data; using a first task head of the multi-headed attention mechanism, compute a first training loss metric using the extracted features and the first set of labels; using a second task head of the multi-headed attention mechanism, compute a second training loss metric using the extracted features and the second set of labels; apply a first mask to filter the first training loss metric, wherein the first mask is computed based on the first set of labels; apply a second mask to filter the second training loss metric, wherein the second mask is computed based on the second set of labels; and compute a final training loss metric based on the filtered first training loss metric and the filtered second training loss metric.

2. The computer system of claim 1, wherein the first task head includes a classification neural network model.

3. The computer system of claim 2, wherein the classification neural network model includes a facial expression capturing neural network model configured to compute, using the extracted features, a prediction indicating a facial expression that is one of a predetermined number of facial expression categories, wherein the first training loss metric is computed by comparing the first set of labels with the prediction indicating the facial expression.

4. The computer system of claim 1, wherein the second task head includes a regression neural network model.

5. The computer system of claim 4, wherein the regression neural network model includes a facial landmark regression model configured to compute, using the extracted features, a prediction indicating coordinates of a facial landmark, wherein the second training loss metric is computed by comparing the second set of labels with the prediction indicating the coordinates of the facial landmark.

6. The computer system of claim 1, wherein the first set of labels includes annotations for a portion of the input data and missing annotations for the remaining portion of the input data.

7. The computer system of claim 6, wherein the first mask includes a binary mask having zero values corresponding to the missing annotations of the first set of labels.

8. The computer system of claim 1, wherein the input data includes image data.

9. The computer system of claim 8, wherein the image data includes an image associated with: a first label from the first set of labels, the first label indicating a facial expression; and a second label from the second set of labels, the second label indicates a missing annotation.

10. The computer system of claim 1, wherein the encoder module includes one or more of a convolutional neural network, a recurrent neural network, a transformer, or a sub-network.

11. A method for performing an inference task using a neural network, the method comprising: providing the neural network; receive an image; and compute a result by processing the image using the neural network, wherein the neural network has been trained by: receiving input data including a first set of labels and a second set of labels; using an encoder module, extracting features from the input data; using a first task head of a multi-headed attention mechanism, computing a first training loss metric using the extracted features and the first set of labels; using a second task head of the multi-headed attention mechanism, computing a second training loss metric using the extracted features and the second set of labels; applying a first mask to filter the first training loss metric, wherein the first mask is computed based on the first set of labels; applying a second mask to filter the second training loss metric, wherein the second mask is computed based on the second set of labels; computing a final training loss metric based on the filtered first training loss metric and the filtered second training loss metric; and updating the neural network based on the final training loss metric.

12. The method of claim 11, wherein the first task head includes a classification neural network model.

13. The method of claim 12, wherein the classification neural network model includes a facial expression capturing neural network model configured to compute, using the extracted features, a prediction indicating a facial expression that is one of a predetermined number of facial expression categories, wherein the first training loss metric is computed by comparing the first set of labels with the prediction indicating the facial expression.

14. The method of claim 11, wherein the second task head includes a regression neural network model.

15. The method of claim 14, wherein the regression neural network model includes a facial landmark regression model configured to compute, using the extracted features, a prediction indicating coordinates of a facial landmark, wherein the second training loss metric is computed by comparing the second set of labels with the prediction indicating the coordinates of the facial landmark.

16. The method of claim 11, wherein the first set of labels includes annotations for a portion of the input data and missing annotations for the remaining portion of the input data.

17. The method of claim 16, wherein the first mask includes a binary mask having zero values corresponding to the missing annotations of the first set of labels.

18. The method of claim 11, wherein the input data includes image data.

19. The method of claim 18, wherein the image data includes an image associated with: a first image label from the first set of labels, the first image label indicating a facial expression; and a second image label from the second set of labels, the second image label indicates a missing annotation.

20. A computer system for multi-task joint training of a neural network including an encoder module and a multi-headed attention mechanism, the computer system comprising: a processor coupled to a storage medium that stores instructions, which, upon execution by the processor, cause the processor to: receive image data including a set of facial expression labels and a set of facial landmark labels; using the encoder module, extract features from the image data; using a facial expression classification task head of the multi-headed attention mechanism, compute a first training loss metric using the extracted features and the set of facial expression labels; using a facial landmark regression task head of the multi-headed attention mechanism, compute a second training loss metric using the extracted features and the set of facial landmark labels; apply a first mask to filter the first training loss metric, wherein the first mask is computed based on the set of facial expression labels; apply a second mask to filter the second training loss metric, wherein the second mask is computed based on the set of facial landmark labels; and compute a final training loss metric based on the filtered first training loss metric and the filtered second training loss metric.
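Claims 3, 5, 7, and 20 narrow the two heads to facial expression classification and facial landmark regression, each filtered by a binary mask with zeros for missing annotations. The sketch below instantiates that arrangement under assumptions the claims do not specify: softmax cross-entropy as the "comparing" step for the expression head, mean squared point distance for the landmark head, `None` for a missing annotation, and a plain sum of the two masked means as the final metric. The encoder and attention heads are omitted; `expr_logits` and `lmk_preds` stand in for hypothetical head outputs, and every function name is illustrative.

```python
import math
from typing import Optional, Sequence

Point = tuple[float, float]  # (x, y) landmark coordinate


def softmax_cross_entropy(logits: Sequence[float], target: int) -> float:
    """Assumed expression loss: logits span a predetermined number of categories."""
    m = max(logits)  # subtract the max before exp() for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def landmark_squared_error(pred: Sequence[Point], target: Sequence[Point]) -> float:
    """Assumed landmark loss: mean squared distance between predicted and labeled points."""
    total = sum((px - tx) ** 2 + (py - ty) ** 2
                for (px, py), (tx, ty) in zip(pred, target))
    return total / len(target)


def masked_task_loss(per_sample: Sequence[float], mask: Sequence[int]) -> float:
    """Keep only entries whose mask is 1 and average them (0.0 if none remain)."""
    kept = [loss for loss, m in zip(per_sample, mask) if m]
    return sum(kept) / len(kept) if kept else 0.0


def joint_loss(expr_logits: Sequence[Sequence[float]],
               expr_labels: Sequence[Optional[int]],
               lmk_preds: Sequence[Sequence[Point]],
               lmk_labels: Sequence[Optional[Sequence[Point]]]) -> float:
    # Binary masks computed from each label set: 0 where the annotation is missing.
    expr_mask = [0 if y is None else 1 for y in expr_labels]
    lmk_mask = [0 if t is None else 1 for t in lmk_labels]
    # Per-sample losses; unlabeled samples get a 0.0 placeholder that the mask discards.
    expr_loss = [softmax_cross_entropy(z, y) if y is not None else 0.0
                 for z, y in zip(expr_logits, expr_labels)]
    lmk_loss = [landmark_squared_error(p, t) if t is not None else 0.0
                for p, t in zip(lmk_preds, lmk_labels)]
    # Final metric: assumed unweighted sum of the two filtered losses.
    return masked_task_loss(expr_loss, expr_mask) + masked_task_loss(lmk_loss, lmk_mask)
```

For instance, `joint_loss([[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [0, None], [[(0.0, 0.0)], [(1.0, 1.0)]], [None, [(1.0, 0.0)]])` scores the first image only on its expression label (as in claim 9, its landmark annotation is missing) and the second image only on its landmark, giving 1.0 + ln(1 + 2e^-2), about 1.2395. The label checks use `is None` rather than truthiness so that expression category 0 is still treated as a valid annotation.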
CPC classification titles:
- Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
- Supervised learning
- Auto-encoder networks; Encoder-decoder networks
- Active pattern-learning, e.g. online learning of image or video features
- Using regression, e.g. by projecting features on hyperplanes