Similarity-based detection of prominent objects using deep CNN pooling layers as features

US9767381B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9767381-B2
Application number: US-201514861386-A
Country: US
Kind code: B2
Filing date: Sep 22, 2015
Priority date: Sep 22, 2015
Publication date: Sep 19, 2017
Grant date: Sep 19, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system and method provide object localization in a query image based on a global representation of the image generated with a model derived from a convolutional neural network. Representations of annotated images and a query image are each generated based on activations output by a layer of the model which precedes the fully-connected layers of the neural network. A similarity is computed between the query image representation and each of the annotated image representations to identify a subset of the annotated images having the highest computed similarity. Object location information from at least one of the subset of annotated images is transferred to the query image and information is output, based on the transferred object location information.
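The retrieval-and-transfer flow summarized in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the patented implementation: the feature vectors are taken as already extracted from a pre-fully-connected layer of some CNN, cosine similarity is used as the similarity measure, and the transfer step simply copies the box of the nearest neighbor. The function names (`l2_normalize`, `retrieve_top_k`, `localize`) and the toy data are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    x = np.asarray(x, dtype=float)
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def retrieve_top_k(query_vec, annotated_vecs, k):
    """Return indices of the k most similar annotated images, best first."""
    sims = l2_normalize(annotated_vecs) @ l2_normalize(query_vec)
    order = np.argsort(-sims)[:k]
    return order, sims

def localize(query_vec, annotated_vecs, annotated_boxes, k=3):
    """Transfer a bounding box to the query from its retrieved neighbors.

    Assumption: the simplest possible transfer, copying the box of the
    single most similar annotated image.
    """
    top, _ = retrieve_top_k(query_vec, annotated_vecs, k)
    return np.asarray(annotated_boxes)[top[0]], top

# Toy stand-ins for CNN activations (one row per annotated image)
annotated_vecs = np.array([[1.0, 0.0, 0.0],
                           [0.0, 1.0, 0.0],
                           [0.9, 0.1, 0.0]])
# Boxes as (x_min, y_min, x_max, y_max), one per annotated image
annotated_boxes = np.array([[10, 20, 110, 60],
                            [50, 50, 90, 90],
                            [12, 22, 108, 58]])
query_vec = np.array([0.8, 0.2, 0.0])

box, top = localize(query_vec, annotated_vecs, annotated_boxes, k=2)
print(top, box)
```

In this toy setup the third annotated image is the closest match, so its box is the one transferred to the query. A real system would replace the hand-written vectors with activations from the chosen network layer.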

First claim

Opening claim text (preview).

What is claimed is:

1. A method for object localization in a query image, comprising: with a processor: for each of a set of annotated images, with a model derived from part of a pre-trained neural network, the model including a sequence of convolutional layers, generating an annotated image representation based on activations output by a layer of the model, the annotated images each being annotated with object location information; for a query image, generating a query image representation based on activations output by the layer of the model; identifying a subset of similar annotated images from the set of annotated images, the identifying comprising computing a similarity between the query image representation and each of the annotated image representations; transferring object location information from at least one of the subset of similar annotated images to the query image; and outputting information based on the transferred object location information.

2. The method of claim 1, wherein the layer is preceded by at least three convolutional layers in the model.

3. The method of claim 1, wherein the layer of the model is a max-pooling layer.

4. The method of claim 3, wherein the max-pooling layer is the max-pooling layer immediately before a first fully-connected layer of the trained neural network.

5. The method of claim 1, wherein the similarity between the query image representation and each of the annotated image representations is computed as a cosine similarity between the representations.

6. The method of claim 1, where the similarity between the query image representation and each of the annotated image representations is computed is computed in a feature space by projecting the image representation and each of the annotated image representations with a learned metric.

7. The method of claim 6, further comprising learning the metric using a set of annotated training images to optimize a loss function.

8. The method of claim 6, further comprising jointly learning the metric and adapting the convolutional weights of the model by backpropagation.

9. The method of claim 8, comprising introducing a layer to the model after the last convolutional layer of the model, the layer serving as the metric.

10. The method of claim 1, wherein the output information comprises at least one of a bounding box for localizing an object in the query image and information extracted from the bounding box.

11. The method of claim 10, wherein the information extracted includes text extracted from a region of the image bounded by the bounding box.

12. The method of claim 11, wherein the information extracted comprises a license plate number.

13. The method of claim 1, wherein the providing of the annotated image representation comprises inputting the pixels of the annotated image into the model and outputting the activations of the layer of the model as a vector.

14. The method of claim 1, wherein the model comprises a sequence of convolutional layers which each receive as input output activations of a prior convolutional layer or the image in the case of a first convolutional layer.

15. The method of claim 1, further comprising ranking the annotated images based on the computed similarity and wherein the transferring object location information from at least one of the subset of annotated images to the query image comprises computing a weighted sum of bounding box annotations of the annotated images in the subset, the weights being based on the rankings of the annotated images in the subset.

16. A computer program product comprising a non-transitory recording medium storing instructions, which when executed on a computer, causes the computer to perform the method of claim 1.

17. A system comprising memory which stores instructions for performing the method of claim 1 and a processor in communication with the memory for executing the instructions.

18. A system for object localization in a query image, comprising: memory which stores a model derived from a trained neural network, the model comprising a sequence of convolutional layers which receive as input output activations of a prior convolutional layer or an image in the case of a first convolutional layer; a representation generator which generates a representation of a query image based on the activations of a selected one of the convolutional layers for the query image and an annotated image representation for each of a set of annotated images based on activations output by the selected layer of the model for the annotated image, the annotated images each being annotated with object location information; a retrieval component which retrieves a subset of similar images from the set of annotated images, based on a similarity between respective representations; a segmentation component which transfers the object location information from at least one of the subset of annotated images to the query image; an output component which outputs information based on the transferred object location information; and a processor which implements the representation generator, retrieval component, segmentation component, and output component.

19. The system of claim 18, further comprising a metric learning component which jointly learns a metric for projecting the image representations into a new feature space and adapts weights of the convolutional layers of the model by backpropagation.

20. A method for object localization in a query image, comprising: with a processor: adding a new layer to a pre-trained neural network to generate a model comprising a sequence of convolutional layers which each act on the output of a respective previous layer, the new layer being positioned after the last of the convolutional layers; using a set of annotated training images, updating a matrix of weights of the new layer and weights of the convolutional layers of the model by backpropagation to optimize a loss function; for each of a set of annotated images, generating an annotated image representation based on activations output by the new layer of the model, the annotated images each being annotated with object location information; for a query image, generating a query image representation based on activations output by the new layer of the model; identifying a subset of the annotated images comprising computing a similarity between the query image representation and each of the annotated image representations; transferring object location information from at least one of the subset of annotated images to the query image; and outputting information based on the transferred object location information.
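Two of the dependent claims lend themselves to a short numeric illustration: projecting representations with a learned metric before comparing them (claim 6) and combining neighbor boxes as a rank-weighted sum (claim 15). The sketch below is an assumption-laden example rather than the claimed method itself. The matrix `W` stands in for a metric that would normally be learned, the dot product in the projected space is one possible similarity, and the geometric decay in `rank_weights` is a hypothetical weighting scheme, since the claim only states that the weights are based on the rankings.

```python
import numpy as np

def project(vectors, W):
    """Project row vectors into the metric space (x -> W x for each row x)."""
    return np.asarray(vectors, dtype=float) @ np.asarray(W, dtype=float).T

def rank_weights(k, decay=0.5):
    """Hypothetical rank-based weights: geometric decay, normalized to sum to 1."""
    w = decay ** np.arange(k)
    return w / w.sum()

def transfer_box(ranked_boxes, weights):
    """Weighted sum of boxes (x_min, y_min, x_max, y_max), best-ranked first."""
    return weights @ np.asarray(ranked_boxes, dtype=float)

def localize_with_metric(query_vec, annotated_vecs, boxes, W, k):
    """Rank annotated images in the projected space, then blend their boxes."""
    q = project(query_vec, W)
    a = project(annotated_vecs, W)
    sims = a @ q                      # assumed similarity: dot product after projection
    order = np.argsort(-sims)[:k]     # indices of the k best matches, best first
    blended = transfer_box(np.asarray(boxes)[order], rank_weights(len(order)))
    return blended, order

# Stand-in "learned" metric that keeps the first two feature dimensions
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
annotated_vecs = np.array([[1.0, 0.0, 5.0],
                           [0.0, 1.0, 0.0],
                           [0.5, 0.5, 0.0]])
boxes = np.array([[0, 0, 30, 30],
                  [90, 90, 120, 120],
                  [30, 30, 60, 60]])
query_vec = np.array([1.0, 0.0, 0.0])

blended_box, order = localize_with_metric(query_vec, annotated_vecs, boxes, W, k=2)
print(order, blended_box)
```

With `k=2` and a decay of 0.5, the best match contributes two thirds of the final box and the runner-up one third. Other monotone weightings (for example, reciprocal rank) would fit the same claim language equally well.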

Assignees

Xerox Corp

Inventors

Classifications

  • G06V10/82 (Primary)

    using neural networks · CPC title

  • Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods · CPC title

  • Matching criteria, e.g. proximity measures · CPC title

  • Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

  • Combinations of networks · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9767381B2 cover?
A system and method provide object localization in a query image based on a global representation of the image generated with a model derived from a convolutional neural network. Representations of annotated images and a query image are each generated based on activations output by a layer of the model which precedes the fully-connected layers of the neural network. A similarity is computed bet…
Who is the assignee on this patent?
Xerox Corp
What technology area does this patent fall under?
Primary CPC classification G06V10/82. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 19, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).