System and method for a unified architecture multi-task deep learning machine for object recognition

US11645869B2 · US · B2

Patent metadata

Publication number: US-11645869-B2
Application number: US-202016808357-A
Country: US
Kind code: B2
Filing date: Mar 3, 2020
Priority date: May 28, 2016
Publication date: May 9, 2023
Grant date: May 9, 2023

Abstract

Official abstract text for this publication.

A system to recognize objects in an image includes an object detection network that outputs a first hierarchical-calculated feature for a detected object. A face alignment regression network determines a regression loss for alignment parameters based on the first hierarchical-calculated feature. A detection box regression network determines a regression loss for detected boxes based on the first hierarchical-calculated feature. The object detection network further includes a weighted loss generator to generate a weighted loss for the first hierarchical-calculated feature, the regression loss for the alignment parameters and the regression loss of the detected boxes. A backpropagator backpropagates the generated weighted loss. A grouping network forms, based on the first hierarchical-calculated feature, the regression loss for the alignment parameters and the bounding box regression loss, at least one of a box grouping, an alignment parameter grouping, and a non-maximum suppression of the alignment parameters and the detected boxes.
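The abstract names non-maximum suppression of the detected boxes as one possible grouping step but does not spell out a procedure. The following is a minimal sketch of the generic greedy, IoU-based variant, not the patent's own method. The box format `(x1, y1, x2, y2)`, the 0.5 overlap threshold, and the function names `iou` and `nms` are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes by descending score and keep a box only
    if it does not overlap any already-kept box above the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections: two heavily overlapping boxes and one distinct box.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)
```

In this example the second box overlaps the first with an IoU of roughly 0.82, so only the indices of the first and third boxes survive. A system that also carries alignment parameters per box could suppress them together by indexing the parameter list with the returned indices.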

First claim

Opening claim text (preview).

What is claimed is:

1. A system to detect objects in an image, the system comprising: an alignment network that is configured to be executed by at least one processor, wherein the alignment network receives as an input a feature for a detected object in an image, the alignment network to determine an alignment regression loss associated with alignment parameters based on the feature for the detected object; a classification network that is configured to be executed by the at least one processor, wherein the classification network receives as an input the feature for the detected object, the classification network to determine a classification regression loss for a classification score for the feature of the detected object; and a loss generator that is configured to be executed by the at least one processor, wherein the loss generator receives an output from the alignment network of the alignment regression loss and an output from the classification network of the classification regression loss and generates a multi-task loss function based on the alignment regression loss and the classification regression loss.

2. The system of claim 1, further comprising a detection box network that is configured to be executed by the at least one processor, wherein the detection box network receives as an input the feature for the detected object, the detection box network to determine a bounding-box regression loss for a bounding box for the feature for the detected object, and wherein the loss generator receives an output from the detection box network and generates the multi-task loss function further based on the bounding-box regression loss.

3. The system of claim 2, wherein the alignment parameters comprise one or more of a keypoint, a tilt angle and an affine transformation parameter for the feature of the detected object, and wherein the bounding box comprises a region of interest in the image, and wherein the keypoint may represent an eye, a nose, a left mouth corner or a right mouth corner.

4. The system of claim 2, further comprising: a backpropagator to backpropagate the alignment regression loss, the classification regression loss and the bounding-box regression loss of the multi-task loss function; and a grouping network to form, based on the alignment regression loss, the classification regression loss and the bounding-box regression loss, at least one of a box grouping, an alignment parameter grouping, and a non-maximum suppression of the alignment parameters and the bounding box.

5. The system of claim 4, wherein the loss generator further adjusts the alignment regression loss based on the non-maximum suppression of the alignment parameters and the bounding box.

6. The system of claim 4, wherein the loss generator further adjusts the bounding-box regression loss based on the box grouping.

7. The system of claim 6, further comprising: an aligner to apply the alignment parameter grouping to generate and output aligned regions of the image; a second convolution neural network to generate a second feature from the aligned output regions of the image; a second classification network to generate a subsequent level classification score of the detected object on a subsequent level classification hierarchy based on the second feature from the aligned output regions of the image; and a verification network to be trained using the subsequent level classification score of the detected object, wherein two or more of the alignment network, the classification network, the detection box network, the aligner, the second convolution neural network, the second classification network, and the verification network simultaneously provide one or more of joint training and joint sharing of calculations dynamically.

8. The system of claim 7, wherein the system further comprises an object detection network to detect the object in the image, the object detection network comprising a convolutional neural network to output the feature for the detected object.

9. The system of claim 8, wherein the convolutional neural network further outputs a corresponding feature for each detected object in a plurality of input images, wherein the plurality of input images are each scaled to a different size from the input image.

10. The system of claim 2, wherein the multi-task loss function comprises:

$$\ell\left(\{\pi_i^{(l)}\}_i, \{b_i^{(l)}\}_i\right) = \sum_i \ell_{cls}\left(\pi_i^{(l)}, u_i^{(l)}\right) + \lambda\, \ell_{loc}\left(b_{i,u}^{(l)}, v_i^{(l)}\right) + \gamma\, \ell_{reg}\left(\hat{\theta}_i^{(l)}, \theta^{(l)}\right)$$

in which $i$ denotes an index of an anchor in a mini-batch of training data, $\pi_i^{(l)}$ and $b_i^{(l)}$ respectively denote a corresponding probability mass function over level-$l$ classes and their bounding box coordinates, $\theta^{(l)}$ represents alignment and affine transformation parameters at level-$l$ that are estimated by $\hat{\theta}^{(l)}$, $\lambda$ and $\gamma$ represent hyper parameters, $u_i^{(l)}$ denotes a ground truth class at level $l$, $v_i^{(l)}$ denotes bounding box coordinates of a corresponding truth, $\ell_{cls}$ is the classification regression loss which is a classification loss that is provided by a softmax loss given by $\ell_{cls}(p, x) = -\log p_x$ for a probability mass function $p$ of $x$, $\ell_{loc}$ is the bounding-box regression loss and is a function of true box coordinates and box coordinates predicted for a true class, and $\ell_{reg}$ is the alignment regression loss that represents an error between learned and true parameters of alignment parameters.

11. A system to detect objects in an image, the system comprising: an alignment network that is configured to be executed by at least one processor, wherein the alignment network receives as an input a feature for a detected object in an image, the alignment network to determine an alignment regression loss associated with alignment parameters based on the feature for the detected object; a detection box network that is configured to be executed by the at least one processor, wherein the detection box network receives as an input the feature for the detected object, the detection box network to determine a bounding-box regression loss for a bounding box for the feature for the detected object; and a loss generator that is configured to be executed by the at least one processor, wherein the loss generator receives an output from the alignment network of the alignment regression loss and an output from the detection box network of the bounding-box regression loss and generates a multi-task loss function based on the alignment regression loss and the bounding-box regression loss.

12. The system of claim 11, further comprising a classification network that is configured to be executed by the at least one processor, wherein the classification network receives as an input the feature for the detected object, the classification network to determine a classification regression loss for a classification score for the feature of the detected object, and wherein the loss generator receives an output from the classification network and generates the multi-task loss function further based on the classification regression loss.

13. The system of claim 12, wherein the alignment parameters comprise one or more of a keypoint, a tilt angle and an affine transformation parameter for the feature of the detected object, and wherein the bounding box comprises a region of interest in the image, and wherein the keypoint may represent an eye, a nose, a left mouth corner or a right mouth corner.

14. The system of claim 12, further comprising: a backpropagator to backpropagate the alignment regression loss, the classification regression loss and the bo
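To make the structure of the multi-task loss in claim 10 concrete, here is a small numeric sketch. Only the classification term, `-log p_x`, is defined explicitly in the claim. The claim does not give formulas for the bounding-box and alignment terms, so this sketch assumes a smooth-L1 loss for boxes and a squared error for alignment parameters. It also assumes all three terms are summed over the anchors, which is one reading of the formula. The function names, the dictionary keys, and the sample values are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cls_loss(p: Sequence[float], x: int) -> float:
    """Softmax loss as stated in claim 10: -log p_x for true class x."""
    return -math.log(p[x])


def smooth_l1(pred: Sequence[float], target: Sequence[float]) -> float:
    """ASSUMED form of the bounding-box loss (not specified in the claim)."""
    total = 0.0
    for a, b in zip(pred, target):
        d = abs(a - b)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


def squared_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """ASSUMED form of the alignment loss (not specified in the claim)."""
    return sum((a - b) ** 2 for a, b in zip(pred, target))


def multi_task_loss(anchors: List[Dict], lam: float = 1.0,
                    gamma: float = 1.0) -> float:
    """Sum over anchors of: classification + lam * box + gamma * alignment."""
    total = 0.0
    for a in anchors:
        total += cls_loss(a["pmf"], a["label"])
        total += lam * smooth_l1(a["box_pred"], a["box_true"])
        total += gamma * squared_error(a["theta_pred"], a["theta_true"])
    return total


# Hypothetical single-anchor mini-batch with two classes.
anchors = [{
    "pmf": [0.5, 0.5], "label": 1,
    "box_pred": [0.0, 0.0, 10.0, 10.5], "box_true": [0.0, 0.0, 10.0, 10.0],
    "theta_pred": [1.0], "theta_true": [0.0],
}]
loss = multi_task_loss(anchors, lam=2.0, gamma=3.0)
print(loss)
```

The hyper parameters `lam` and `gamma` play the role of λ and γ in the claim: raising either one makes the corresponding regression error count for more relative to the classification term.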

Assignees

Samsung Electronics Co Ltd

Classifications

  • Detection; Localisation; Normalisation

  • Artificial neural networks [ANN]

  • G06V10/82 (Primary): using neural networks

  • Determining position or orientation of objects or cameras (camera calibration G06T7/80)

  • G06V40/168 (Primary): Feature extraction; Face representation

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11645869B2 cover?
A system to recognize objects in an image includes an object detection network that outputs a first hierarchical-calculated feature for a detected object. A face alignment regression network determines a regression loss for alignment parameters based on the first hierarchical-calculated feature. A detection box regression network determines a regression loss for detected boxes based on the first hie…
Who is the assignee on this patent?
Samsung Electronics Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06V10/82. Mapped technology areas include Physics.
When was this patent published?
Publication date May 9, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).