Object detection using cascaded convolutional neural networks
US-9418319-B2 · Aug 16, 2016 · US
US9858496B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9858496-B2 |
| Application number | US-201615001417-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 20, 2016 |
| Priority date | Jan 20, 2016 |
| Publication date | Jan 2, 2018 |
| Grant date | Jan 2, 2018 |
Official abstract text for this publication.
Systems, methods, and computer-readable media for providing fast and accurate object detection and classification in images are described herein. In some examples, a computing device can receive an input image. The computing device can process the image, and generate a convolutional feature map. In some configurations, the convolutional feature map can be processed through a Region Proposal Network (RPN) to generate proposals for candidate objects in the image. In various examples, the computing device can process the convolutional feature map with the proposals through a Fast Region-Based Convolutional Neural Network (FRCN) proposal classifier to determine a class of each object in the image and a confidence score associated therewith. The computing device can then provide a requestor with an output including the object classification and/or confidence score.
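The data flow in the abstract (image, shared convolutional feature map, region proposals, per-proposal class and confidence score) can be sketched roughly as follows. This is only an illustrative skeleton, not the patented implementation: `Detection`, `detect`, and the three `toy_*` stand-ins are hypothetical names introduced here, and the real stages would be trained neural networks rather than plain Python callables.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

# Hypothetical box layout for this sketch: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class Detection:
    box: Box
    category: str
    confidence: float


def detect(
    image: Any,
    backbone: Callable[[Any], Any],
    propose: Callable[[Any], List[Box]],
    classify: Callable[[Any, Box], Tuple[str, float]],
) -> List[Detection]:
    """Run the three stages named in the abstract, in order."""
    # Stage 1: generate one convolutional feature map for the whole image.
    feature_map = backbone(image)
    # Stage 2: the proposal stage (the RPN in the abstract) reads that map.
    proposals = propose(feature_map)
    # Stage 3: the proposal classifier reuses the same feature map.
    detections = []
    for box in proposals:
        category, confidence = classify(feature_map, box)
        detections.append(Detection(box, category, confidence))
    return detections


# Toy stand-ins (assumptions for illustration only) so the sketch runs.
def toy_backbone(image: Any) -> Any:
    return image


def toy_propose(feature_map: Any) -> List[Box]:
    return [(0.0, 0.0, 2.0, 2.0)]


def toy_classify(feature_map: Any, box: Box) -> Tuple[str, float]:
    return ("object", 0.9)


result = detect([[0] * 4 for _ in range(4)], toy_backbone, toy_propose, toy_classify)
```

Passing the same `feature_map` to both later stages mirrors the abstract's point that the proposals and the classification are both computed from one convolutional feature map.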
Opening claim text (preview).
What is claimed is:

1. A method comprising: receiving an input image; generating a convolutional feature map; identifying, by a first type of neural network, a candidate object in the input image; determining, by a second type of neural network, a category of the candidate object; and assigning a confidence score to the category of the candidate object, wherein the first type of neural network comprises a translation invariant component configured to: classify an anchor based on overlap with a ground-truth box; and predict a shift and a scale of the anchor.

2. A method as claim 1 recites, wherein the convolutional feature map is generated by a Zeiler and Fergus model or a Simonyan and Zisserman model deep convolutional neural network.

3. A method as claim 1 recites, further comprising training the convolutional feature map, the first type of neural network, and the second type of neural network using at least one of: stochastic gradient descent; or back-propagation.

4. A method comprising: receiving an input image; generating a convolutional feature map; identifying, by a first type of neural network, a candidate object in the input image, wherein the identifying the candidate object in the input image comprises: generating one or more anchors at a point of the input image; determining an overlap of individual ones of the one or more anchors to a ground-truth box; assigning a label to each anchor of the one or more anchors based at least in part on the overlap; assigning a score to the label based at least in part on the overlap; and identifying the candidate object at the point based at least in part on the score; determining, by a second type of neural network, a category of the candidate object, wherein the first type of neural network and the second type of neural network share at least one algorithm; and assigning a confidence score to the category of the candidate object.

5. A method as claim 4 recites, wherein the identifying the candidate object in the input image further comprises: identifying an anchor corresponding to a highest score, the highest score corresponding to a percentage of the overlap; shifting the anchor corresponding to the highest score to better define the candidate object; and scaling the anchor corresponding to the highest score to better define the candidate object.

6. A method as claim 4 recites, wherein the generating the one or more anchors at the point of the input image comprises generating a set of anchor boxes; the set of anchor boxes having three scales and three aspect ratios.

7. A method as claim 4 recites, wherein the label is positive when the overlap exceeds a threshold level.

8. A system comprising: a processor; and a computer-readable medium including instructions for an object detection and classification network, for execution by the processor, the object detection and classification network comprising: an initial processing module configured to input an image and generate a convolutional feature map; an object proposal module configured to generate a proposal corresponding to a candidate object in the image, and further comprising a translation invariant component configured to: classify an anchor based on overlap with a ground-truth box; and predict a shift and a scale of the anchor; and a proposal classifier module configured to assign a category associated with the candidate object, wherein the object proposal module and the proposal classifier module share at least one convolutional layer.

9. A system as claim 8 recites, wherein the proposal classifier module is further configured to assign a confidence score to the classification.

10. A system as claim 8 recites, wherein the object proposal module is further configured to: generate one or more anchors at a point of the image; determine an overlap of each anchor of the one or more anchors to a ground-truth box; assign a label to each anchor of the one or more anchors based at least in part on the overlap; assign a score to the label based at least in part on the overlap; select an anchor with a highest score; and generate the proposal based at least in part on the highest score.

11. A system comprising: a processor; and a computer-readable medium including instructions for an object detection and classification network, for execution by the processor, the object detection and classification network comprising: an initial processing module configured to input an image and generate a convolutional feature map; an object proposal module configured to generate a proposal corresponding to a candidate object in the image, wherein the object proposal module is further configured to: identify an anchor corresponding to a highest score, the highest score corresponding to a percentage of the overlap; shift the anchor corresponding to the highest score to better define the candidate object; or scale the anchor corresponding to the highest score to better define the candidate object; and a proposal classifier module configured to assign a category associated with the candidate object, wherein the object proposal module and the proposal classifier module share at least one convolutional layer.

12. A system comprising: a processor; a computer-readable medium including instructions for an object detection and classification network, for execution by the processor, the object detection and classification network comprising: an initial processing module configured to input an image and generate a convolutional feature map; an object proposal module configured to generate a proposal corresponding to a candidate object in the image; and a proposal classifier module configured to assign a category associated with the candidate object, wherein the object proposal module and the proposal classifier module share at least one convolutional layer; and a machine learning module configured to: train one or more parameters of the initial processing module and the object proposal module to generate one or more proposals on a training image; and train one or more parameters of the proposal classifier module to assign a category to each of the one or more proposals on the training image.

13. A system as claim 12 recites, wherein the machine learning module is further configured to train the one or more parameters of the initial processing module, the object proposal module, and the proposal classifier module using one or more of: stochastic gradient descent; or back-propagation.

14. A non-transitory computer readable storage medium having instructions stored thereon, the instructions when executed by a computing device cause the computing device to: receive an input image; generate a convolutional feature map; generate one or more anchors at a point of the input image; determine an overlap of individual ones of the one or more anchors to a ground-truth box; assign a label to each anchor of the one or more anchors based at least in part on the overlap; assign a score to the label based at least in part on the overlap; identify, by a neural network, a candidate object in the input image, the candidate object at the point based at least in part on the score; determine, by a proposal classifier sharing an algorithm with the neural network, a category of the candidate object; and assign, by the proposal classifier, a confidence score to the category of the candidate object.

15. A non-transitory computer readable storage medium as claim 14 recites, wherein the neural network is a region proposal network and the proposal classifier is a co
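The anchor steps recited in the claims (a set of anchor boxes with three scales and three aspect ratios at a point, an overlap with a ground-truth box, a positive label above a threshold, and a shift and scale of the best anchor) might be sketched as below. Several details are assumptions not stated in the claim text above: overlap is measured as intersection-over-union, the scales `128/256/512`, the ratios `0.5/1/2`, and the threshold `0.7` are placeholder values, and the shift/scale parameterisation in `shift_and_scale` is one common choice rather than the claimed one. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def make_anchors(
    cx: float,
    cy: float,
    scales: Sequence[float] = (128.0, 256.0, 512.0),  # assumed values
    ratios: Sequence[float] = (0.5, 1.0, 2.0),        # assumed height/width
) -> List[Box]:
    """Return len(scales) * len(ratios) anchor boxes centred on (cx, cy)."""
    anchors = []
    for s in scales:
        for r in ratios:
            # Keep the area at s*s while setting height/width to r.
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union, used here as the 'overlap' (assumption)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_anchors(
    anchors: Sequence[Box], gt: Box, threshold: float = 0.7  # assumed threshold
) -> List[Tuple[float, bool]]:
    """Pair each anchor's overlap score with a positive/negative label."""
    scores = [iou(a, gt) for a in anchors]
    return [(score, score > threshold) for score in scores]


def shift_and_scale(anchor: Box, dx: float, dy: float, dw: float, dh: float) -> Box:
    """Apply a predicted shift (dx, dy) and log-scale (dw, dh) to an anchor.

    Assumed parameterisation: the centre moves by a fraction of the anchor
    size, and the width and height are multiplied by exp(dw) and exp(dh).
    """
    w, h = anchor[2] - anchor[0], anchor[3] - anchor[1]
    cx = anchor[0] + w / 2 + dx * w
    cy = anchor[1] + h / 2 + dy * h
    w2, h2 = w * math.exp(dw), h * math.exp(dh)
    return (cx - w2 / 2, cy - h2 / 2, cx + w2 / 2, cy + h2 / 2)


anchors = make_anchors(300.0, 300.0)
ground_truth = (172.0, 172.0, 428.0, 428.0)  # toy ground-truth box
labels = label_anchors(anchors, ground_truth)
# Pick the anchor with the highest overlap score, then refine it.
best_index = max(range(len(labels)), key=lambda i: labels[i][0])
refined = shift_and_scale(anchors[best_index], 0.0, 0.0, 0.0, 0.0)
```

With zero shift and zero log-scale the refined box equals the selected anchor; in a trained network those four values would come from the regression output for that anchor.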
Backpropagation, e.g. using gradient descent · CPC title
Classification techniques · CPC title
Combinations of networks · CPC title
Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN] · CPC title
Determination of region of interest [ROI] or a volume of interest [VOI] · CPC title