Deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition
US-2018211099-A1 · Jul 26, 2018 · US
US10354362B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10354362-B2 |
| Application number | US-201715698887-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 8, 2017 |
| Priority date | Sep 8, 2016 |
| Publication date | Jul 16, 2019 |
| Grant date | Jul 16, 2019 |
Abstract
Methods of detecting an object in an image using a convolutional neural network based architecture that processes multiple feature maps of differing scales from differing convolution layers within a convolutional network to create a regional-proposal bounding box. The bounding box is projected back to the feature maps of the individual convolution layers to obtain a set of regions of interest. These regions of interest are then processed to ultimately create a confidence score representing the confidence that the object detected in the bounding box is the desired object. These processes allow the method to use deep features encoded in both the global and the local representations of object regions, so that it can deal robustly with the challenges of object detection. Software for executing the disclosed methods within an object-detection system is also disclosed.
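The abstract says the proposal box is "projected back" to each convolution layer's feature map but does not say how. A common approach, borrowed here as an assumption from Fast R-CNN-style region-of-interest pooling, is to divide the image coordinates by each layer's cumulative stride. The sketch below assumes that mapping, hypothetical strides of 4, 8 and 16 pixels per feature-map cell, and the objectness gate described in claim 1; every function and type name in it is hypothetical.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in input-image pixels
Cells = Tuple[int, int, int, int]        # (col0, row0, col1, row1), end-exclusive cell indices


def project_box(box: Box, stride: int) -> Cells:
    """Project an image-space box onto a feature map whose cells each cover
    stride x stride input pixels. The start is floored and the end ceiled so
    the region of interest never drops part of the box."""
    x1, y1, x2, y2 = box
    return (
        math.floor(x1 / stride),
        math.floor(y1 / stride),
        math.ceil(x2 / stride),
        math.ceil(y2 / stride),
    )


def project_to_layers(box: Box, strides: List[int]) -> List[Cells]:
    """Return one region of interest per convolution layer's feature map."""
    return [project_box(box, s) for s in strides]


def rois_for_proposal(box: Box, objectness: float, threshold: float,
                      strides: List[int]) -> List[Cells]:
    """Project a proposal only if its objectness score exceeds the threshold."""
    if objectness <= threshold:
        return []
    return project_to_layers(box, strides)


# Hypothetical strides for three convolution layers of increasing depth.
print(rois_for_proposal((32.0, 48.0, 100.0, 160.0), objectness=0.9,
                        threshold=0.5, strides=[4, 8, 16]))
# [(8, 12, 25, 40), (4, 6, 13, 20), (2, 3, 7, 10)]
```

The same box covers fewer, coarser cells on deeper maps, which is how one proposal yields a local (shallow-layer) and a global (deep-layer) region of interest. A real implementation would also clip each region to the feature-map bounds, which depend on the network's padding.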
Claims (preview)
What is claimed is:
1. A method of processing an image to detect the presence of one or more objects of a desired classification in the image, the method being performed in an object-detection system and comprising: receiving the image and storing it in computer memory; sequentially convolving the image in a series of at least two convolution layers to create a corresponding series of feature maps of differing scales; pooling at least one of the feature maps to create a corresponding at least one pooled feature map; normalizing, relative to one another, the at least one pooled feature map and each of the feature maps not pooled to create a series of normalized feature maps; concatenating the series of normalized feature maps together with one another to create a concatenated feature map; dimensionally reducing the concatenated feature map to create a dimensionally reduced feature map; processing the dimensionally reduced feature map in a first set of fully connected layers to create a proposal comprising a bounding box corresponding to a suspected object of the desired classification in the image and an objectness score for the suspected object, wherein the first set of fully connected layers has been trained on the desired classification; if the objectness score exceeds a predetermined threshold, then projecting the bounding box back to each of the at least two feature maps to identify a region of interest in each of the at least two feature maps; pooling each of the regions of interest to create a corresponding pooled region of interest; normalizing, relative to one another, the pooled regions of interest to create a set of normalized regions of interest; concatenating the normalized regions of interest with one another to create a concatenated region of interest; dimensionally reducing the concatenated region of interest to create a dimensionally reduced region of interest; processing the dimensionally reduced region of interest in a second set of fully connected layers to generate a confidence score for the region of interest, wherein the second set of fully connected layers is trained on the desired classification; and storing the bounding box and the confidence score in the computer memory in association with the image.
2. The method according to claim 1, wherein the normalizing of the at least one pooled feature map and each of the feature maps not pooled is performed using an L2 normalization.
3. The method according to claim 1, wherein the processing of the dimensionally reduced region of interest to generate a determined confidence score includes using a softmax function.
4. The method according to claim 1, wherein the desired classification is a human face.
5. The method according to claim 1, further comprising annotating the image to include a visual depiction of the bounding box and the confidence score.
6. The method according to claim 1, wherein the pooling of at least one of the feature maps includes using a max pooling algorithm.
7. The method according to claim 1, wherein the pooling of at least one of the feature maps includes pooling at least two of the feature maps.
8. The method according to claim 1, wherein the normalization of the pooled regions of interest is performed using an L2 normalization.
9. The method according to claim 1, wherein dimensionally reducing the concatenated region of interest includes using a 1×1 convolution.
10. The method according to claim 1, further comprising displaying to a user, on an electronic display, the image, a visual depiction of the bounding box overlaid on the image, and the confidence score displayed in association with the bounding box.
11. A computer-readable storage medium containing computer-executable instructions that, when executed by a computing system, perform a method of processing an image to detect the presence of one or more objects of a desired classification in the image, the method being performed in an object-detection system and comprising: receiving the image and storing it in computer memory; sequentially convolving the image in a series of at least two convolution layers to create a corresponding series of feature maps of differing scales; pooling at least one of the feature maps to create a corresponding at least one pooled feature map; normalizing, relative to one another, the at least one pooled feature map and each of the feature maps not pooled to create a series of normalized feature maps; concatenating the series of normalized feature maps together with one another to create a concatenated feature map; dimensionally reducing the concatenated feature map to create a dimensionally reduced feature map; processing the dimensionally reduced feature map in a first set of fully connected layers to create a proposal comprising a bounding box corresponding to a suspected object of the desired classification in the image and an objectness score for the suspected object, wherein the first set of fully connected layers has been trained on the desired classification; if the objectness score exceeds a predetermined threshold, then projecting the bounding box back to each of the at least two feature maps to identify a region of interest in each of the at least two feature maps; pooling each of the regions of interest to create a corresponding pooled region of interest; normalizing, relative to one another, the pooled regions of interest to create a set of normalized regions of interest; concatenating the normalized regions of interest with one another to create a concatenated region of interest; dimensionally reducing the concatenated region of interest to create a dimensionally reduced region of interest; processing the dimensionally reduced region of interest in a second set of fully connected layers to generate a confidence score for the region of interest, wherein the second set of fully connected layers is trained on the desired classification; and storing the bounding box and the confidence score in the computer memory in association with the image.
12. The computer-readable storage medium according to claim 11, wherein the normalizing of the at least one pooled feature map and each of the feature maps not pooled is performed using an L2 normalization.
13. The computer-readable storage medium according to claim 11, wherein the processing of the dimensionally reduced region of interest to generate a determined confidence score includes using a softmax function.
14. The computer-readable storage medium according to claim 11, wherein the desired classification is a human face.
15. The computer-readable storage medium according to claim 11, further comprising annotating the image to include a visual depiction of the bounding box and the confidence score.
16. The computer-readable storage medium according to claim 11, wherein the pooling of at least one of the feature maps includes using a max pooling algorithm.
17. The computer-readable storage medium according to claim 11, wherein the pooling of at least one of the feature maps includes pooling at least two of the feature maps.
18. The computer-readable storage medium according to claim 11, wherein the normalization of the pooled regions of interest is performed using an L2 normalization.
19. The computer-readable storage medium according to claim 11, wherein dimensionally reducing the concatenated region of interest includes using a 1×1 convolution.
20. The computer-readable storage medium according to claim 11, further comprising displa
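Claims 2, 3, 8 and 9 name an L2 normalization, a softmax function and a 1×1 convolution without spelling them out. The pure-Python sketch below shows one plausible reading, assuming ParseNet-style normalization (each spatial position's channel vector is scaled to unit length; ParseNet's learned per-channel scale is omitted) and channel-first maps that have already been pooled to a common spatial size. All function names are hypothetical, and the two logits passed to `softmax` stand in for the output of the trained second set of fully connected layers, which is not shown.

```python
import math
from typing import List

# Channel-first layout: fmap[c][p] is channel c at flattened spatial position p.
# Every map passed to concat_channels must have the same number of positions.
FeatureMap = List[List[float]]


def l2_normalize(fmap: FeatureMap, eps: float = 1e-12) -> FeatureMap:
    """Scale the channel vector at every spatial position to unit L2 norm."""
    channels, positions = len(fmap), len(fmap[0])
    out = [[0.0] * positions for _ in range(channels)]
    for p in range(positions):
        norm = math.sqrt(sum(fmap[c][p] ** 2 for c in range(channels)))
        norm = max(norm, eps)  # avoid dividing by zero at all-zero positions
        for c in range(channels):
            out[c][p] = fmap[c][p] / norm
    return out


def concat_channels(maps: List[FeatureMap]) -> FeatureMap:
    """Stack the channels of several equally sized maps into one map."""
    return [channel for fmap in maps for channel in fmap]


def conv1x1(fmap: FeatureMap, weights: List[List[float]],
            bias: List[float]) -> FeatureMap:
    """1x1 convolution: out[k][p] = sum_c weights[k][c] * fmap[c][p] + bias[k].
    Using fewer output rows than input channels reduces the dimension."""
    positions = len(fmap[0])
    return [
        [sum(w_kc * fmap[c][p] for c, w_kc in enumerate(w_k)) + b_k
         for p in range(positions)]
        for w_k, b_k in zip(weights, bias)
    ]


def softmax(logits: List[float]) -> List[float]:
    """Turn class logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


roi_a = [[3.0, 0.0], [4.0, 2.0]]     # 2 channels x 2 positions from a shallow layer
roi_b = [[30.0, 0.0], [40.0, 20.0]]  # same pattern with 10x larger activations
fused = concat_channels([l2_normalize(roi_a), l2_normalize(roi_b)])
reduced = conv1x1(fused, weights=[[0.5, 0.5, 0.5, 0.5]], bias=[0.0])  # 4 -> 1 channel
print([round(v, 6) for v in reduced[0]])  # [1.4, 1.0]
print(softmax([2.0, 0.0]))  # hypothetical object/background logits
```

Because `roi_b` is `roi_a` scaled by 10, both normalize to the same values. That is the point of normalizing the pooled maps relative to one another before concatenation: without it, the layer with the largest activations would dominate the fused features.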
CPC classification titles:
- using neural networks
- using classification, e.g. of video objects
- involving optimisations, e.g. using regularisation techniques
- Generating training patterns; Bootstrap methods, e.g. bagging or boosting