Methods and software for detecting objects in images using a multiscale fast region-based convolutional neural network

US10354362B2 · US · B2

Patent metadata
Publication number: US-10354362-B2
Application number: US-201715698887-A
Country: US
Kind code: B2
Filing date: Sep 8, 2017
Priority date: Sep 8, 2016
Publication date: Jul 16, 2019
Grant date: Jul 16, 2019

Abstract

Official abstract text for this publication.

Methods of detecting an object in an image using a convolutional neural network based architecture that processes multiple feature maps of differing scales from differing convolution layers within a convolutional network to create a regional-proposal bounding box. The bounding box is projected back to the feature maps of the individual convolution layers to obtain a set of regions of interest. These regions of interest are then processed to ultimately create a confidence score representing the confidence that the object detected in the bounding box is the desired object. These processes allow the method to utilize deep features encoded in both the global and the local representation for object regions, allowing the method to robustly deal with challenges in the problem of robust object detection. Software for executing the disclosed methods within an object-detection system is also disclosed.
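The feature-fusion stage described in the abstract (pool, normalize, concatenate, reduce) can be illustrated with a minimal sketch in plain Python. Everything here is an assumption made for illustration: the function names, the [channel][row][column] nested-list layout, the 2x2 pooling window, the per-position L2 normalization across channels, and the toy values and weights are hypothetical and are not taken from the patent.

```python
import math

def max_pool2x2(fmap):
    """2x2 max pooling with stride 2 on a [C][H][W] nested list."""
    return [
        [
            [max(ch[y][x], ch[y][x + 1], ch[y + 1][x], ch[y + 1][x + 1])
             for x in range(0, len(ch[0]) - 1, 2)]
            for y in range(0, len(ch) - 1, 2)
        ]
        for ch in fmap
    ]

def l2_normalize(fmap, eps=1e-12):
    """Scale the channel vector at every spatial position to unit L2 norm
    (assumed normalization axis; the source only says 'L2 normalization')."""
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    out = [[[0.0] * w for _ in range(h)] for _ in range(c)]
    for y in range(h):
        for x in range(w):
            norm = math.sqrt(sum(fmap[k][y][x] ** 2 for k in range(c))) + eps
            for k in range(c):
                out[k][y][x] = fmap[k][y][x] / norm
    return out

def concatenate(fmaps):
    """Stack feature maps of equal spatial size along the channel axis."""
    return [ch for fmap in fmaps for ch in fmap]

def conv1x1(fmap, weights):
    """1x1 convolution: each output channel is a weighted sum of input channels."""
    h, w = len(fmap[0]), len(fmap[0][0])
    return [
        [[sum(wk * fmap[k][y][x] for k, wk in enumerate(row)) for x in range(w)]
         for y in range(h)]
        for row in weights
    ]

# Hypothetical toy inputs: a shallow 2-channel 4x4 map and a deeper 2-channel 2x2 map.
shallow = [
    [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0],
     [9.0, 10.0, 11.0, 12.0], [13.0, 14.0, 15.0, 16.0]],
    [[0.0, 0.0, 0.0, 0.0], [0.0, 8.0, 0.0, 6.0],
     [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 12.0]],
]
deep = [[[3.0, 0.0], [0.0, 3.0]],
        [[4.0, 1.0], [2.0, 4.0]]]

pooled = max_pool2x2(shallow)                      # 2 x 2 x 2, same grid as `deep`
fused = concatenate([l2_normalize(pooled), l2_normalize(deep)])  # 4 x 2 x 2
# Illustrative fixed weights; in a trained network these would be learned.
reduced = conv1x1(fused, [[0.5, 0.0, 0.5, 0.0],
                          [0.0, 0.5, 0.0, 0.5]])   # 2 x 2 x 2
```

Normalizing before concatenation keeps a layer with large activations from dominating the fused map, which is one plausible reading of why the feature maps are normalized relative to one another.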

First claim

Opening claim text (preview).

What is claimed is:

1. A method of processing an image to detect the presence of one or more objects of a desired classification in the image, the method being performed in an object-detection system and comprising:
    receiving the image and storing it in computer memory;
    sequentially convolving the image in a series of at least two convolution layers to create a corresponding series of feature maps of differing scales;
    pooling at least one of the feature maps to create a corresponding at least one pooled feature map;
    normalizing, relative to one another, the at least one pooled feature map and each of the feature maps not pooled to create a series of normalized feature maps;
    concatenating the series of normalized feature maps together with one another to create a concatenated feature map;
    dimensionally reducing the concatenated feature map to create a dimensionally reduced feature map;
    processing the dimensionally reduced feature map in a first set of fully connected layers to create a proposal comprising a bounding box corresponding to a suspected object of the desired classification in the image and an objectness score for the suspected object, wherein the first set of fully connected layers has been trained on the desired classification;
    if the objectness score exceeds a predetermined threshold, then projecting the bounding box back to each of the at least two feature maps to identify a region of interest in each of the at least two feature maps;
    pooling each of the regions of interest to create a corresponding pooled region of interest;
    normalizing, relative one another, the pooled regions of interest to create a set of normalized regions of interest;
    concatenating the normalized regions of interest with one another to create a concatenated region of interest;
    dimensionally reducing the concatenated region of interest to create a dimensionally reduced region of interest;
    processing the dimensionally reduced region of interest in a second set of fully connected layers to generate a confidence score for the region of interest, wherein the second set of fully connected layers is trained on the desired classification; and
    storing the bounding box and the confidence score in the computer memory in association with the image.

2. The method according to claim 1, wherein the normalizing of the at least one pooled feature map and each of the feature maps not pooled is performed using an L2 normalization.

3. The method according to claim 1, wherein the processing of the dimensionally reduced region of interest to generate a determined confidence score includes using a softmax function.

4. The method according to claim 1, wherein the desired classification is a human face.

5. The method according to claim 1, further comprising the annotating the image to include a visual depiction of the bounding box and the confidence score.

6. The method according to claim 1, wherein the pooling of at least one of the feature maps includes using a max pooling algorithm.

7. The method according to claim 1, wherein the pooling of at least one of the feature maps includes pooling at least two of the feature maps.

8. The method according to claim 1, wherein the normalization of the pooled regions of interest is performed using an L2 normalization.

9. The method according to claim 1, wherein dimensionally reducing the concatenated region of interest includes using a 1×1 convolution.

10. The method according to claim 1, further comprising displaying to a user on an electronic display, the image, a visual depiction of the bounding box overlaid on the image, and a the confidence score displayed in association with the bounding box.

11. A computer-readable storage medium containing computer-executable instructions that, when executed by a computing system, performs a method of processing an image to detect the presence of one or more objects of a desired classification in the image, the method being performed in an object-detection system and comprising:
    receiving the image and storing it in computer memory;
    sequentially convolving the image in a series of at least two convolution layers to create a corresponding series of feature maps of differing scales;
    pooling at least one of the feature maps to create a corresponding at least one pooled feature map;
    normalizing, relative to one another, the at least one pooled feature map and each of the feature maps not pooled to create a series of normalized feature maps;
    concatenating the series of normalized feature maps together with one another to create a concatenated feature map;
    dimensionally reducing the concatenated feature map to create a dimensionally reduced feature map;
    processing the dimensionally reduced feature map in a first set of fully connected layers to create a proposal comprising a bounding box corresponding to a suspected object of the desired classification in the image and an objectness score for the suspected object, wherein the first set of fully connected layers has been trained on the desired classification;
    if the objectness score exceeds a predetermined threshold, then projecting the bounding box back to each of the at least two feature maps to identify a region of interest in each of the at least two feature maps;
    pooling each of the regions of interest to create a corresponding pooled region of interest;
    normalizing, relative one another, the pooled regions of interest to create a set of normalized regions of interest;
    concatenating the normalized regions of interest with one another to create a concatenated region of interest;
    dimensionally reducing the concatenated region of interest to create a dimensionally reduced region of interest;
    processing the dimensionally reduced region of interest in a second set of fully connected layers to generate a confidence score for the region of interest, wherein the second set of fully connected layers is trained on the desired classification; and
    storing the bounding box and the confidence score in the computer memory in association with the image.

12. The computer-readable storage medium according to claim 11, wherein the normalizing of the at least one pooled feature map and each of the feature maps not pooled is performed using an L2 normalization.

13. The computer-readable storage medium according to claim 11, wherein the processing of the dimensionally reduced region of interest to generate a determined confidence score includes using a softmax function.

14. The computer-readable storage medium according to claim 11, wherein the desired classification is a human face.

15. The computer-readable storage medium according to claim 11, further comprising the annotating the image to include a visual depiction of the bounding box and the confidence score.

16. The computer-readable storage medium according to claim 11, wherein the pooling of at least one of the feature maps includes using a max pooling algorithm.

17. The computer-readable storage medium according to claim 11, wherein the pooling of at least one of the feature maps includes pooling at least two of the feature maps.

18. The computer-readable storage medium according to claim 11, wherein the normalization of the pooled regions of interest is performed using an L2 normalization.

19. The computer-readable storage medium according to claim 11, wherein dimensionally reducing the concatenated region of interest includes using a 1×1 convolution.

20. The computer-readable storage medium according to claim 11, further comprising displa
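The second half of the independent claims (projecting the bounding box back to each feature map, pooling each region of interest, and scoring) can be sketched as follows. This is only an illustration under stated assumptions: the strides of 2 and 4, the 2x2 pooled output size, the function names, and the two logits standing in for the output of the second set of fully connected layers are all hypothetical, and the normalization, concatenation, and dimensionality-reduction steps between pooling and scoring are omitted for brevity.

```python
import math

def project_box(box, stride, fmap_h, fmap_w):
    """Map an image-space box (x1, y1, x2, y2) to feature-map cell indices,
    clipped so the region covers at least one cell (x2, y2 exclusive)."""
    x1, y1, x2, y2 = box
    fx1 = min(max(math.floor(x1 / stride), 0), fmap_w - 1)
    fy1 = min(max(math.floor(y1 / stride), 0), fmap_h - 1)
    fx2 = min(max(math.ceil(x2 / stride), fx1 + 1), fmap_w)
    fy2 = min(max(math.ceil(y2 / stride), fy1 + 1), fmap_h)
    return fx1, fy1, fx2, fy2

def roi_max_pool(channel, roi, out_size):
    """Max-pool the region of one 2D channel into an out_size x out_size grid."""
    x1, y1, x2, y2 = roi
    h, w = y2 - y1, x2 - x1
    pooled = []
    for i in range(out_size):
        ys = y1 + (i * h) // out_size
        ye = max(ys + 1, y1 + ((i + 1) * h + out_size - 1) // out_size)
        row = []
        for j in range(out_size):
            xs = x1 + (j * w) // out_size
            xe = max(xs + 1, x1 + ((j + 1) * w + out_size - 1) // out_size)
            row.append(max(channel[y][x]
                           for y in range(ys, ye) for x in range(xs, xe)))
        pooled.append(row)
    return pooled

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 16x16 image with two single-channel feature maps:
# map_a has stride 2 (8x8) and map_b has stride 4 (4x4).
map_a = [[float(y * 8 + x) for x in range(8)] for y in range(8)]
map_b = [[float(y * 4 + x) for x in range(4)] for y in range(4)]
box = (4, 4, 12, 12)  # proposal assumed to have passed the objectness threshold

roi_a = project_box(box, stride=2, fmap_h=8, fmap_w=8)
roi_b = project_box(box, stride=4, fmap_h=4, fmap_w=4)
pooled_a = roi_max_pool(map_a, roi_a, out_size=2)
pooled_b = roi_max_pool(map_b, roi_b, out_size=2)

# Stand-in logits (object, background) in place of the fully connected output.
confidence = softmax([2.0, 0.0])[0]
```

Pooling every region of interest to the same fixed grid is what lets regions taken from feature maps of different scales be concatenated afterwards.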

Assignees

Univ Carnegie Mellon

Inventors

Classifications

  • G06T3/4046 (Primary): using neural networks

  • G06V10/82 (Primary): using neural networks

  • using classification, e.g. of video objects

  • involving optimisations, e.g. using regularisation techniques

  • Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10354362B2 cover?
Methods of detecting an object in an image using a convolutional neural network based architecture that processes multiple feature maps of differing scales from differing convolution layers within a convolutional network to create a regional-proposal bounding box. The bounding box is projected back to the feature maps of the individual convolution layers to obtain a set of regions of interest. …
Who is the assignee on this patent?
Univ Carnegie Mellon
What technology area does this patent fall under?
Primary CPC classification G06T3/4046. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 16, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).