Spatial pyramid pooling networks for image processing

US9542621B2 · US · B2

Patent metadata
Publication number: US-9542621-B2
Application number: US-201514617936-A
Country: US
Kind code: B2
Filing date: Feb 10, 2015
Priority date: Oct 9, 2014
Publication date: Jan 10, 2017
Grant date: Jan 10, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Spatial pyramid pooling (SPP) layers are combined with convolutional layers and partition an input image into divisions from finer to coarser levels, and aggregate local features in the divisions. A fixed-length output may be generated by the SPP layer(s) regardless of the input size. The multi-level spatial bins used by the SPP layer(s) may provide robustness to object deformations. An SPP layer based system may pool features extracted at variable scales due to the flexibility of input scales making it possible to generate a full-image representation for testing. Moreover, SPP networks may enable feeding of images with varying sizes or scales during training, which may increase scale-invariance and reduce the risk of over-fitting.
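The fixed-length property described in the abstract can be illustrated with a minimal sketch. This is not the patent's implementation: the function name `spp_max_pool`, the pyramid levels `(4, 2, 1)`, the bin-boundary rounding, and the output ordering are all assumptions chosen for illustration. Max pooling is used because claim 10 names it as one option.

```python
def spp_max_pool(feature_maps, levels=(4, 2, 1)):
    """Pool k feature maps into one vector of length k * M.

    feature_maps: list of k maps, each a non-empty h x w nested list.
    levels: bins per side at each pyramid level (hypothetical default),
            so M = sum(n * n for n in levels).
    """
    output = []
    for fmap in feature_maps:
        h, w = len(fmap), len(fmap[0])
        for n in levels:
            for i in range(n):
                # Row range of bin i: floor at the start, ceiling at the end,
                # so every bin is non-empty and the bins cover the whole map.
                r0 = (i * h) // n
                r1 = ((i + 1) * h + n - 1) // n
                for j in range(n):
                    c0 = (j * w) // n
                    c1 = ((j + 1) * w + n - 1) // n
                    # Max pooling over the bin (one value per bin per filter).
                    output.append(max(fmap[r][c]
                                      for r in range(r0, r1)
                                      for c in range(c0, c1)))
    return output


# Two inputs of different spatial size, each with k = 2 feature maps.
small = [[[float(r * 5 + c) for c in range(5)] for r in range(5)] for _ in range(2)]
large = [[[float(r * 13 + c) for c in range(13)] for r in range(9)] for _ in range(2)]
print(len(spp_max_pool(small)), len(spp_max_pool(large)))  # both 2 * (16 + 4 + 1)
```

Because the number of bins depends only on `levels`, not on the map size, both calls return vectors of the same length, which is what lets a fully-connected layer follow the pooling step.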

First claim

Opening claim text (preview).

What is claimed is:

1. A method to perform image processing, the method comprising: receiving an input image; generating feature maps by one or more filters on one or more convolutional layers of a neural network; spatially pooling responses of each filter of a top convolutional layer at a spatial pyramid pooling (SPP) network following the top convolutional layer, wherein the SPP network comprises one or more layers; and providing outputs of a top SPP network layer to a fully-connected layer as fixed dimensional vectors.

2. The method of claim 1, further comprising: employing an output of the fully-connected layer for one or more of: training a classifier, scene reconstruction, event detection, video tracking, object recognition, image indexing, and motion estimation.

3. The method of claim 1, wherein spatially pooling responses of each filter of the top convolutional layer at the SPP network comprises: pooling responses of each filter in a plurality of spatial bins of the SPP network.

4. The method of claim 3, wherein providing outputs of the top SPP network layer to the fully-connected layer comprises: providing the outputs of the top SPP network layer as kM-dimensional vectors, where M denotes a number of the spatial bins in the SPP network and k denotes a number of filters at the top convolutional layer.

5. The method of claim 1, further comprising: resizing the input image to fit a window size of the SPP network.

6. The method of claim 1, further comprising: training the neural network using back-propagation.

7. The method of claim 1, further comprising: pre-computing a number of spatial bins of the SPP network based on a size of the input image.

8. The method of claim 7, further comprising: for an image size of a×a and an SPP network layer that includes n×n bins, implementing the SPP network layer as a sliding window pooling layer, where a window size is defined by win=⌈a/n⌉ and a stride is defined by str=⌊a/n⌋ with ⌈·⌉ and ⌊·⌋ denoting ceiling and floor operations.

9. The method of claim 1, further comprising: concatenating outputs of the SPP network layers at the fully-connected layer.

10. The method of claim 1, wherein spatially pooling responses of each filter of the top convolutional layer at the SPP network comprises: employing maximum pooling on responses of the filters of the top convolutional layer.

11. A computing device to perform image processing, the computing device comprising: an input module configured to receive an input image through one or more of a wired or wireless communication; a memory configured to store instructions; and a processor coupled to the memory and the input module, the processor executing an image processing application, wherein the image processing application is configured to: receive an input image; generate feature maps by one or more filters on one or more convolutional layers of a neural network; spatially pool responses of each filter of a top convolutional layer in a plurality of spatial bins at a spatial pyramid pooling (SPP) network following the top convolutional layer, wherein the SPP network comprises one or more layers; and provide outputs of a top SPP network layer to a fully-connected layer as fixed dimensional vectors.

12. The computing device of claim 11, wherein the feature maps are generated once from the entire input image at one or more scales.

13. The computing device of claim 11, wherein the image processing application is further configured to: employ two or more fixed-size neural networks with respective SPP networks to process images of two or more sizes.

14. The computing device of claim 13, wherein the outputs of top SPP network layers of the two or more fixed-size neural networks are configured to have a same fixed length.

15. The computing device of claim 13, wherein the image processing application is further configured to: train a first full epoch on a first one of the two or more fixed-size neural networks; and train a second full epoch on a second one of the two or more fixed-size neural networks.

16. The computing device of claim 15, wherein the image processing application is further configured to: copy weights of the first one of the two or more fixed-size neural networks to the second one of the two or more fixed-size neural networks prior to training the second epoch on the second one of the two or more fixed-size neural networks.

17. The computing device of claim 15, wherein the image processing application is further configured to: perform the training on different neural network in an iterative manner.

18. A computer-readable memory device with instructions stored thereon to perform image processing, the instructions comprising: receiving an input image; generating feature maps by one or more filters on one or more convolutional layers of a neural network; spatially pooling responses of each filter of a top convolutional layer in a plurality of spatial bins of a spatial pyramid pooling (SPP) network following the top convolutional layer, wherein the SPP network comprises one or more layers; providing outputs of a top SPP network layer to a fully-connected layer as fixed dimensional vectors; and training a classifier to tag the input image based on the fixed dimensional vectors received at the fully-connected layer.

19. The computer-readable memory device of claim 18, wherein the instructions further comprise: resizing the input image such that min(w, h) = s, where w is a width of the image, h is a height of the image, and s represents a predefined scale for the image.

20. The computer-readable memory device of claim 18, wherein the instructions further comprise: training different full epochs on different fixed-size neural networks by copying weights of a first fixed-size neural network to subsequent fixed-size neural networks in an iterative manner.
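The arithmetic in claims 4, 8, and 19 can be checked with a short sketch. The function names below are hypothetical, and the example values (a 13×13 input, 256 filters, pyramid levels of 3×3, 2×2, and 1×1 bins, a scale of 224) are illustrative assumptions that do not appear on this page. Claim 19 only requires min(w, h) = s; preserving the aspect ratio when resizing is an additional assumption made here.

```python
def sliding_window_params(a, n):
    """Claim 8: window size ceil(a/n) and stride floor(a/n) for an a x a
    input pooled into n x n bins (integer arithmetic, no floats)."""
    win = -(-a // n)   # ceiling division
    stride = a // n    # floor division
    return win, stride


def spp_output_length(k, bins_per_side):
    """Claim 4: output dimension k * M, where M is the total number of
    spatial bins. Summing n * n over pyramid levels is an assumption
    about how M is counted when several levels are concatenated."""
    m = sum(n * n for n in bins_per_side)
    return k * m


def resize_to_scale(w, h, s):
    """Claim 19: new (width, height) with min(width, height) == s.
    Keeping the aspect ratio is an assumption, not part of the claim."""
    if w <= h:
        return s, round(h * s / w)
    return round(w * s / h), s


for n in (3, 2, 1):
    print(n, sliding_window_params(13, n))   # (5, 4), (7, 6), (13, 13)
print(spp_output_length(256, (3, 2, 1)))     # 256 * 14 = 3584
print(resize_to_scale(640, 480, 224))        # (299, 224)
```

With a window of 5 and a stride of 4 on a side of length 13, the window fits at offsets 0, 4, and 8, which gives the three bins per side that the 3×3 level requires.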

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

  • Determining representative reference patterns, e.g. by averaging or distorting; Generating dictionaries · CPC title

  • G06V10/454 · Primary

    Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN] · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Supervised learning · CPC title

  • Architecture, e.g. interconnection topology · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9542621B2 cover?
Spatial pyramid pooling (SPP) layers are combined with convolutional layers and partition an input image into divisions from finer to coarser levels, and aggregate local features in the divisions. A fixed-length output may be generated by the SPP layer(s) regardless of the input size. The multi-level spatial bins used by the SPP layer(s) may provide robustness to object deformations. An SPP lay…
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06V10/454. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 10, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).