Generic object detection in images

US2016104058A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2016104058-A1
Application number: US-201514617909-A
Country: US
Kind code: A1
Filing date: Feb 9, 2015
Priority date: Oct 9, 2014
Publication date: Apr 14, 2016
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Neural networks for object detection in images are used with a spatial pyramid pooling (SPP) layer. Using the SPP network structure, a fixed-length representation is generated regardless of image size and scale. The feature maps are computed from the entire image once, and the features are pooled in arbitrary regions (sub-images) to generate fixed-length representations for training the detectors. Thus, repeated computation of the convolutional features is avoided while accuracy is enhanced.
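
The fixed-length property described in the abstract can be illustrated with a minimal pure-Python sketch. It assumes max pooling and the 1×1, 2×2, 3×3, 6×6 pyramid named in claim 15 (50 bins per filter). The function names, the floor/ceil rounding of bin boundaries, and the nested-list data layout are illustrative assumptions and are not taken from the patent text.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return (a + b - 1) // b


def spp_max_pool(feature_maps, levels=(1, 2, 3, 6)):
    """Pool C feature maps (each an H x W list of lists) into C * sum(n*n) values.

    Max pooling and the floor/ceil bin boundaries are assumptions of this sketch.
    """
    vector = []
    for n in levels:                      # one pyramid level = n x n spatial bins
        for channel in feature_maps:      # one channel = responses of one filter
            h, w = len(channel), len(channel[0])
            for i in range(n):
                r0, r1 = (i * h) // n, ceil_div((i + 1) * h, n)
                for j in range(n):
                    c0, c1 = (j * w) // n, ceil_div((j + 1) * w, n)
                    vector.append(max(channel[r][c]
                                      for r in range(r0, r1)
                                      for c in range(c0, c1)))
    return vector


def ramp_maps(channels, h, w):
    """Hypothetical test input: deterministic values that differ per position."""
    return [[[k * 1000 + r * w + c for c in range(w)] for r in range(h)]
            for k in range(channels)]


square = spp_max_pool(ramp_maps(4, 13, 13))
wide = spp_max_pool(ramp_maps(4, 10, 17))
print(len(square), len(wide))  # 200 200, i.e. 4 filters * 50 bins for both sizes
```

Because the number of bins is fixed by the pyramid rather than by the input size, a 13×13 map and a 10×17 map both produce a vector of the same length, which is what allows a fully-connected layer to follow.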

First claim

Opening claim text (preview).

What is claimed is:

1. A method to perform object detection in an image, the method comprising: receiving an input image; generating feature maps by one or more filters on a convolutional layer of a neural network processing the input image; spatially pooling responses of each filter at a spatial pyramid pooling (SPP) layer; providing outputs of the SPP layer to a fully-connected layer as fixed dimensional vectors; and training a classifier to detect one or more objects in the input image based on the fixed dimensional vectors received at the fully-connected layer.

2. The method of claim 1, wherein generating the feature maps comprises: employing sliding filters at one or more convolutional layers configured to accept inputs of arbitrary sizes and provide outputs that approximate an aspect ratio of the inputs.

3. The method of claim 1, wherein spatially pooling the responses of each filter comprises: pooling the responses of each filter in a plurality of spatial bins; and generating a multi-dimensional output vector, wherein a number of dimensions of the output vector is based on a number of the plurality of spatial bins multiplied by a number of filters in a last convolutional layer.

4. The method of claim 1, further comprising: applying the SPP on each candidate window of the feature maps to pool a fixed-length representation of each candidate window.

5. The method of claim 1, further comprising: resizing the image following feature extraction.

6. The method of claim 1, wherein the outputs of the SPP layer are representations of each window such that the classifier is trained for each category of the representations.

7. The method of claim 1, wherein training the classifier comprises: employing ground-truth windows to generate positive samples; and identifying negative samples based on an overlap with a positive window below a first predefined threshold.

8. The method of claim 7, further comprising: removing a negative sample that overlaps with another negative sample above a second predefined threshold.

9. The method of claim 1, further comprising: in a test mode, scoring candidate windows through the classifier.

10. The method of claim 9, further comprising: employing non-maximum suppression with a predefined threshold on the scored candidate windows.

11. A computing device to perform object detection in an image, the computing device comprising: an input module configured to receive an input image through one or more of a wired or wireless communication; a memory configured to store instructions; and a processor coupled to the memory and the input module, the processor executing an image processing application, wherein the image processing application is configured to: generate feature maps by employing one or more sliding filters on a convolutional layer of a neural network processing the input image; spatially pool responses of each filter in a plurality of spatial bins at a spatial pyramid pooling (SPP) layer; provide outputs of the SPP layer to a fully-connected layer as fixed dimensional vectors; and train a classifier to detect one or more objects in the input image based on the fixed dimensional vectors received at the fully-connected layer.

12. The computing device of claim 11, wherein the one or more sliding filters are activated by semantic content.

13. The computing device of claim 11, wherein the feature maps are generated once from the entire input image at one or more scales.

14. The computing device of claim 11, wherein the image processing application is further configured to: resize the image; generate the feature maps for each scale; and combine features for each scale by pooling the features channel-by-channel.

15. The computing device of claim 11, wherein the SPP layer comprises a 4-level spatial pyramid of 1×1, 2×2, 3×3, and 6×6 configuration that yields a total of 50 spatial bins.

16. The computing device of claim 11, wherein the image processing application is further configured to: fine-tune the fully-connected layer by initializing weights of the fully-connected layer, performing a first training using a first learning rate and performing a second training using a refined second learning rate.

17. The computing device of claim 11, wherein the image processing application is further configured to: post-process prediction windows using bounding-box regression, wherein features used for regression are pooled features from the convolution layer.

18. A computer-readable memory device with instructions stored thereon to perform object detection in an image, the instructions comprising: receiving an input image; generating feature maps by one or more filters on a convolutional layer of a first neural network processing the input image; extracting window-wise features from regions of deep convolutional feature maps; performing a selective search to generate a predefined number of candidate windows per image; spatially pooling responses of candidate windows at a spatial pyramid pooling (SPP) layer; providing outputs of the SPP layer to a fully-connected layer as fixed dimensional vectors; and training a classifier to detect one or more objects in the input image based on the fixed dimensional vectors received at the fully-connected layer.

19. The computer-readable memory device of claim 18, wherein the instructions further comprise: resizing the input image such that min(w; h) = s, where w is a width of the image, h is a height of the image, and s represents a predefined scale for the image.

20. The computer-readable memory device of claim 19, wherein the instructions further comprise: pre-training the first neural network and a second neural network with different random initializations; scoring candidate windows on a test image through the first neural network and the second neural network; and performing non-maximum suppression on a union of two sets of candidate windows with their respective scores; and selecting a window with higher score from the first neural network or the second neural network for the detection of the object.
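
The overlap-based steps in claims 7, 8, and 10 can be sketched as follows. This is a minimal illustration that assumes overlap is measured as intersection over union (IoU) and that boxes are (x1, y1, x2, y2) tuples. The claims only say "predefined threshold", so the values 0.3 and 0.7 below are placeholders, and all function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def select_negatives(candidates, ground_truth, first_thr=0.3, second_thr=0.7):
    """Claims 7-8 sketch: keep windows that overlap every positive window
    below first_thr, then drop any that overlap an already kept negative
    above second_thr. Threshold values are assumed, not from the claims."""
    kept = []
    for box in candidates:
        if any(iou(box, gt) >= first_thr for gt in ground_truth):
            continue  # too close to a positive window to serve as a negative
        if any(iou(box, other) > second_thr for other in kept):
            continue  # near-duplicate of a negative that was already kept
        kept.append(box)
    return kept


def non_max_suppression(boxes, scores, thr=0.3):
    """Claim 10 sketch: greedy NMS, returning indices of the kept windows."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # keep a window only if it does not overlap a higher-scoring kept one
        if all(iou(boxes[i], boxes[k]) <= thr for k in keep):
            keep.append(i)
    return keep


negatives = select_negatives(
    candidates=[(1, 1, 10, 10), (20, 20, 30, 30), (20, 20, 30, 29), (50, 50, 60, 60)],
    ground_truth=[(0, 0, 10, 10)],
)
kept = non_max_suppression(
    boxes=[(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)],
    scores=[0.9, 0.8, 0.7],
)
print(negatives, kept)
```

For the two-network variant in claim 20, the same `non_max_suppression` could be run on the concatenated windows and scores from both networks, since greedy NMS already prefers the higher-scoring window among overlapping ones.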

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

  • G06V10/454 · Primary

    Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN] · CPC title

  • Determining representative reference patterns, e.g. by averaging or distorting; Generating dictionaries · CPC title

  • Physics · mapped topic

  • G06K9/66 · Primary

    Physics · mapped topic

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016104058A1 cover?
Neural networks for object detection in images are used with a spatial pyramid pooling (SPP) layer. Using the SPP network structure, a fixed-length representation is generated regardless of image size and scale. The feature maps are computed from the entire image once, and the features are pooled in arbitrary regions (sub-images) to generate fixed-length representations for training the detecto…
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06V10/454. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 14, 2016 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).