Cascaded convolutional neural network
US-2019042892-A1 · Feb 7, 2019 · US
US11645869B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11645869-B2 |
| Application number | US-202016808357-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 3, 2020 |
| Priority date | May 28, 2016 |
| Publication date | May 9, 2023 |
| Grant date | May 9, 2023 |
Official abstract text for this publication.
A system to recognize objects in an image includes an object detection network that outputs a first hierarchical-calculated feature for a detected object. A face alignment regression network determines a regression loss for alignment parameters based on the first hierarchical-calculated feature. A detection box regression network determines a regression loss for detected boxes based on the first hierarchical-calculated feature. The object detection network further includes a weighted loss generator to generate a weighted loss for the first hierarchical-calculated feature, the regression loss for the alignment parameters and the regression loss of the detected boxes. A backpropagator backpropagates the generated weighted loss. A grouping network forms, based on the first hierarchical-calculated feature, the regression loss for the alignment parameters and the bounding box regression loss, at least one of a box grouping, an alignment parameter grouping, and a non-maximum suppression of the alignment parameters and the detected boxes.
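The abstract names non-maximum suppression of detected boxes but does not say which variant is used. As a rough illustration only, the sketch below shows the common greedy, IoU-based form. The `(x1, y1, x2, y2)` box format, the 0.5 threshold, and the names `iou` and `greedy_nms` are assumptions made for this example and are not taken from the patent.

```python
from typing import List, Sequence, Tuple

# Assumed box format for this sketch: (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float],
               iou_threshold: float = 0.5) -> List[int]:
    """Return indices of kept boxes, ordered from highest to lowest score.

    A box is kept only if it does not overlap any already-kept box by more
    than iou_threshold (an assumed value, not specified in the source).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections: the first two overlap heavily, the third is separate.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
print(kept)
```

Because each detection in the described system also carries alignment parameters, the returned indices could be used to select the matching alignment parameters alongside the surviving boxes.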
Opening claim text (preview).
What is claimed is:

1. A system to detect objects in an image, the system comprising: an alignment network that is configured to be executed by at least one processor, wherein the alignment network receives as an input a feature for a detected object in an image, the alignment network to determine an alignment regression loss associated with alignment parameters based on the feature for the detected object; a classification network that is configured to be executed by the at least one processor, wherein the classification network receives as an input the feature for the detected object, the classification network to determine a classification regression loss for a classification score for the feature of the detected object; and a loss generator that is configured to be executed by the at least one processor, wherein the loss generator receives an output from the alignment network of the alignment regression loss and an output from the classification network of the classification regression loss and generates a multi-task loss function based on the alignment regression loss and the classification regression loss.

2. The system of claim 1, further comprising a detection box network that is configured to be executed by the at least one processor, wherein the detection box network receives as an input the feature for the detected object, the detection box network to determine a bounding-box regression loss for a bounding box for the feature for the detected object, and wherein the loss generator receives an output from the detection box network and generates the multi-task loss function further based on the bounding-box regression loss.

3. The system of claim 2, wherein the alignment parameters comprise one or more of a keypoint, a tilt angle and an affine transformation parameter for the feature of the detected object, and wherein the bounding box comprises a region of interest in the image, and wherein the keypoint may represent an eye, a nose, a left mouth corner or a right mouth corner.

4. The system of claim 2, further comprising: a backpropagator to backpropagate the alignment regression loss, the classification regression loss and the bounding-box regression loss of the multi-task loss function; and a grouping network to form, based on the alignment regression loss, the classification regression loss and the bounding-box regression loss, at least one of a box grouping, an alignment parameter grouping, and a non-maximum suppression of the alignment parameters and the bounding box.

5. The system of claim 4, wherein the loss generator further adjusts the alignment regression loss based on the non-maximum suppression of the alignment parameters and the bounding box.

6. The system of claim 4, wherein the loss generator further adjusts the bounding-box regression loss based on the box grouping.

7. The system of claim 6, further comprising: an aligner to apply the alignment parameter grouping to generate and output aligned regions of the image; a second convolution neural network to generate a second feature from the aligned output regions of the image; a second classification network to generate a subsequent level classification score of the detected object on a subsequent level classification hierarchy based on the second feature from the aligned output regions of the image; and a verification network to be trained using the subsequent level classification score of the detected object, wherein two or more of the alignment network, the classification network, the detection box network, the aligner, the second convolution neural network, the second classification network, and the verification network simultaneously provide one or more of joint training and joint sharing of calculations dynamically.

8. The system of claim 7, wherein the system further comprises an object detection network to detect the object in the image, the object detection network comprising a convolutional neural network to output the feature for the detected object.

9. The system of claim 8, wherein the convolutional neural network further outputs a corresponding feature for each detected object in a plurality of input images, wherein the plurality of input images are each scaled to a different size from the input image.

10. The system of claim 2, wherein the multi-task loss function comprises:

$$\ell\left(\{\pi_i^{(l)}\}_i, \{b_i^{(l)}\}_i\right) = \sum_i \ell_{\mathrm{cls}}\left(\pi_i^{(l)}, u_i^{(l)}\right) + \lambda\,\ell_{\mathrm{loc}}\left(b_{i,u}^{(l)}, v_i^{(l)}\right) + \gamma\,\ell_{\mathrm{reg}}\left(\hat{\theta}_i^{(l)}, \theta^{(l)}\right)$$

in which $i$ denotes an index of an anchor in a mini-batch of training data, $\pi_i^{(l)}$ and $b_i^{(l)}$ respectively denote a corresponding probability mass function over level-$l$ classes and their bounding box coordinates, $\theta^{(l)}$ represents alignment and affine transformation parameters at level $l$ that are estimated by $\hat{\theta}^{(l)}$, $\lambda$ and $\gamma$ represent hyper parameters, $u_i^{(l)}$ denotes a ground truth class at level $l$, $v_i^{(l)}$ denotes bounding box coordinates of a corresponding truth, $\ell_{\mathrm{cls}}$ is the classification regression loss which is a classification loss that is provided by a softmax loss given by $\ell_{\mathrm{cls}}(p, x) = -\log p_x$ for a probability mass function $p$ of $x$, $\ell_{\mathrm{loc}}$ is the bounding-box regression loss and is a function of true box coordinates and box coordinates predicted for a true class, and $\ell_{\mathrm{reg}}$ is the alignment regression loss that represents an error between learned and true parameters of alignment parameters.

11. A system to detect objects in an image, the system comprising: an alignment network that is configured to be executed by at least one processor, wherein the alignment network receives as an input a feature for a detected object in an image, the alignment network to determine an alignment regression loss associated with alignment parameters based on the feature for the detected object; a detection box network that is configured to be executed by the at least one processor, wherein the detection box network receives as an input the feature for the detected object, the detection box network to determine a bounding-box regression loss for a bounding box for the feature for the detected object; and a loss generator that is configured to be executed by the at least one processor, wherein the loss generator receives an output from the alignment network of the alignment regression loss and an output from the detection box network of the bounding-box regression loss and generates a multi-task loss function based on the alignment regression loss and the bounding-box regression loss.

12. The system of claim 11, further comprising a classification network that is configured to be executed by the at least one processor, wherein the classification network receives as an input the feature for the detected object, the classification network to determine a classification regression loss for a classification score for the feature of the detected object, and wherein the loss generator receives an output from the classification network and generates the multi-task loss function further based on the classification regression loss.

13. The system of claim 12, wherein the alignment parameters comprise one or more of a keypoint, a tilt angle and an affine transformation parameter for the feature of the detected object, and wherein the bounding box comprises a region of interest in the image, and wherein the keypoint may represent an eye, a nose, a left mouth corner or a right mouth corner.

14. The system of claim 12, further comprising: a backpropagator to backpropagate the alignment regression loss, the classification regression loss and the bo
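
The multi-task loss recited in claim 10 combines a softmax classification term with a bounding-box term weighted by λ and an alignment term weighted by γ. The sketch below is one possible reading of that formula, not the patented implementation. Claim 10 fixes only the classification term as −log p_x; the smooth-L1 box loss, the squared-error alignment loss, summing all three terms over every anchor, supplying a ground-truth alignment vector per anchor, and all function names are assumptions made for illustration.

```python
import math
from typing import Sequence


def cls_loss(p: Sequence[float], x: int) -> float:
    """Softmax loss -log p_x for a probability mass function p and true class x."""
    return -math.log(p[x])


def smooth_l1(pred: Sequence[float], true: Sequence[float]) -> float:
    """ASSUMED form of the bounding-box loss; claim 10 does not fix one."""
    total = 0.0
    for a, b in zip(pred, true):
        d = abs(a - b)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


def squared_error(pred: Sequence[float], true: Sequence[float]) -> float:
    """ASSUMED form of the alignment loss (error between learned and true parameters)."""
    return sum((a - b) ** 2 for a, b in zip(pred, true))


def multi_task_loss(probs, labels, boxes_pred, boxes_true,
                    theta_pred, theta_true, lam=1.0, gamma=1.0) -> float:
    """Sum over anchors of cls + lam * loc + gamma * reg.

    ASSUMPTION: the sum over anchors covers all three terms.
    """
    total = 0.0
    for p, u, b, v, th_hat, th in zip(probs, labels, boxes_pred,
                                      boxes_true, theta_pred, theta_true):
        total += (cls_loss(p, u)
                  + lam * smooth_l1(b, v)
                  + gamma * squared_error(th_hat, th))
    return total


# Hypothetical single-anchor mini-batch.
loss = multi_task_loss(
    probs=[[0.2, 0.8]],             # predicted class probabilities
    labels=[1],                     # ground-truth class index
    boxes_pred=[(0.0, 0.0, 10.0, 10.0)],
    boxes_true=[(0.0, 0.0, 10.0, 12.0)],
    theta_pred=[[0.5]],             # e.g. a predicted tilt angle
    theta_true=[[0.0]],
    lam=1.0,
    gamma=2.0,
)
print(loss)
```

In this example the three contributions are −log 0.8 for classification, 1.5 for the box term (one coordinate off by 2), and 2 × 0.25 for the alignment term. Raising `lam` or `gamma` shifts the emphasis toward localisation or alignment, which is the role the claim assigns to the two hyper parameters.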
Detection; Localisation; Normalisation · CPC title
Artificial neural networks [ANN] · CPC title
using neural networks · CPC title
Determining position or orientation of objects or cameras (camera calibration G06T7/80) · CPC title
Feature extraction; Face representation · CPC title