Image retrieval with deep local feature descriptors and attention-based keypoint descriptors

US10402448B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10402448-B2
Application number: US-201715635387-A
Country: US
Kind code: B2
Filing date: Jun 28, 2017
Priority date: Jun 28, 2017
Publication date: Sep 3, 2019
Grant date: Sep 3, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods of the present disclosure can use machine-learned image descriptor models for image retrieval applications and other applications. A trained image descriptor model can be used to analyze a plurality of database images to create a large-scale index of keypoint descriptors associated with the database images. An image retrieval application can provide a query image as input to the trained image descriptor model, resulting in receipt of a set of keypoint descriptors associated with the query image. Keypoint descriptors associated with the query image can be analyzed relative to the index to determine matching descriptors (e.g., by implementing a nearest neighbor search). Matching descriptors can then be geometrically verified and used to identify one or more matching images from the plurality of database images to retrieve and provide as output (e.g., by providing for display) within the image retrieval application.
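
The matching step described in the abstract (nearest neighbor search of query descriptors against an index, then picking database images from the matches) can be illustrated with a minimal Python/NumPy sketch. This is not the patented implementation: the function names `match_descriptors` and `rank_images`, the brute-force search, the `max_dist` threshold, and the vote-counting rule are all assumptions made for illustration, and geometric verification is omitted.

```python
import numpy as np


def match_descriptors(query_desc, index_desc, index_image_ids, max_dist=0.8):
    """Brute-force nearest neighbor search (hypothetical helper).

    For each query descriptor, find the closest indexed descriptor and
    count one vote for the database image that descriptor came from.
    """
    # Pairwise Euclidean distances, shape (num_query, num_index).
    diff = query_desc[:, None, :] - index_desc[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    nearest = dist.argmin(axis=1)
    nearest_dist = dist[np.arange(len(query_desc)), nearest]

    votes = {}
    for idx, d in zip(nearest, nearest_dist):
        # Assumed rule: ignore matches farther than max_dist.
        if d <= max_dist:
            image_id = int(index_image_ids[idx])
            votes[image_id] = votes.get(image_id, 0) + 1
    return votes


def rank_images(votes):
    """Order candidate images by descending vote count (ties by image id)."""
    return sorted(votes, key=lambda image_id: (-votes[image_id], image_id))


# Toy index: four 2-D descriptors belonging to database images 0, 0, 1, 2.
index_desc = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
index_image_ids = np.array([0, 0, 1, 2])
query_desc = np.array([[0.1, 0.0], [0.9, 0.1], [0.0, 1.1]])

votes = match_descriptors(query_desc, index_desc, index_image_ids)
ranking = rank_images(votes)
print(votes, ranking)
```

In a large-scale index, the brute-force distance matrix would normally be replaced by an approximate nearest neighbor structure, and the candidate images would still need the geometric verification step the abstract mentions.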

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method of image retrieval, comprising: receiving, by a computing system comprising one or more computing devices, a query image; determining, by the computing system, a plurality of local feature descriptors from the query image; determining, by the computing system, an attention score for each local feature descriptor; determining, by the computing system, a set of keypoint descriptors for the query image based at least in part on the attention scores, the set of keypoint descriptors corresponding to a subset of the local feature descriptors; reducing, by the computing system, a spatial dimensionality of the set of keypoint descriptors for the query image; and retrieving, by the computing system, one or more images corresponding to the query image, based at least in part on the set of keypoint descriptors for the query image.

2. The computer-implemented method of image retrieval of claim 1, wherein the set of keypoint descriptors comprises a predetermined number of local feature descriptors having the highest attention scores for the query image.

3. The computer-implemented method of image retrieval of claim 1, further comprising: receiving, by the computing system, a plurality of database images; determining, by the computing system, a plurality of local feature descriptors for each database image; determining, by the computing system, an attention score for the local feature descriptors associated with each database image; and determining, by the computing system, a set of keypoint descriptors for each database image based at least in part on the attention scores, the set of keypoint descriptors corresponding to a subset of the local feature descriptors for that database image; and wherein retrieving, by the computing system, one or more images corresponding to the query image comprises retrieving, by the computing system, one or more images from the plurality of database images based at least in part on the set of keypoint descriptors for the query image and the set of keypoint descriptors for each database image.

4. The computer-implemented method of image retrieval of claim 3, further comprising determining a set of matching features by comparing the keypoint descriptors associated with the query image with the keypoint descriptors associated with the plurality of database images, and wherein the set of matching features is used to retrieve the one or more matching images from the plurality of database images.

5. The computer-implemented method of image retrieval of claim 4, wherein determining a set of matching features comprises implementing a nearest neighbor search among keypoint descriptors associated with the query image and keypoint descriptors associated with the plurality of database images.

6. The computer-implemented method of image retrieval of claim 4, further comprising performing, by the computing system, geometric verification to evaluate the set of matching features across the query image and the one or more matching images.

7. The computer-implemented method of image retrieval of claim 1, further comprising: constructing, by the computing system, an image pyramid based at least in part on the query image, the image pyramid comprising a plurality of image levels; and inputting each of the plurality of image levels into the machine-learned image descriptor model, independently.

8. One or more tangible, non-transitory computer-readable media storing computer-readable instructions that when executed by one or more processors cause the one or more processors to perform operations, the operations comprising: obtaining data descriptive of a machine-learned image descriptor model, wherein the machine-learned image descriptor model has been trained to receive one or more input images and, in response to receipt of the one or more input images, determine one or more local feature descriptors in the one or more input images, determine an attention score for each of the one or more local feature descriptors, and provide a set of keypoint descriptors based at least in part on the attention score for each of the one or more local feature descriptors, each keypoint descriptor describing a selected local feature determined from the one or more input images such that the set of keypoint descriptors corresponds to a subset of the local feature descriptors; obtaining a query image; inputting the query image into the machine-learned image descriptor model; receiving, as an output of the machine-learned image descriptor model, a set of keypoint descriptors, each keypoint descriptor describing a selected local feature determined from the query image and selected based on a respective attention score generated for the selected local feature by the machine-learned image descriptor model; and providing the set of keypoint descriptors as to an image processing application.

9. The one or more tangible, non-transitory computer-readable media of claim 8, wherein the machine-learned image descriptor model has been trained based on a set of training data that includes a first portion of training data corresponding to a plurality of training images and a second portion of training data corresponding to image-level labels associated with the plurality of training images.

10. The one or more tangible, non-transitory computer-readable media of claim 9, wherein the image-level labels included within the second portion of training data comprise one or more of a visual feature label and a geographic position label.

11. The one or more tangible, non-transitory computer-readable media of claim 10, wherein one or more of the plurality of training images do not contain a visual feature.

12. The one or more tangible, non-transitory computer-readable media of claim 8, wherein the machine-learned image descriptor model comprises a convolutional neural network.

13. The one or more tangible, non-transitory computer-readable media of claim 8, wherein the machine-learned image descriptor model has been trained based on a first training process to learn determination of the one or more local feature descriptors and a second training process to learn determination of the attention score for each of the one or more local feature descriptors given the determined local feature descriptors.

14. The one or more tangible, non-transitory computer-readable media of claim 13, wherein the machine-learned image descriptor model has been trained based on a set of training data that includes a plurality of training images, and wherein the plurality of training images are randomly rescaled during the second training process.

15. The one or more tangible, non-transitory computer-readable media of claim 8, wherein the machine-learned image descriptor model comprises a plurality of shared layers that are used at least in part for both determining the one or more local feature descriptors and for determining an attention score for each of the one or more local feature descriptors.

16. The one or more tangible, non-transitory computer-readable media of claim 8, the operations further comprising: obtaining a plurality of database images; inputting the plurality of database images into the machine-learned image descriptor model; receiving, as an output of the machine-learned image descriptor model, a set of keypoint descriptors, each keypoint descriptor describing a selected local feature identified from the plurality of database images; determining a set of matching features by comparing the keypoint descriptors associated with the query image with the keypoint descri
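
Claims 1 and 2 describe keeping only a subset of local feature descriptors, chosen by attention score, as the keypoint descriptors. A minimal Python/NumPy sketch of that selection step follows. It assumes the descriptors and scores have already been produced by some model; the function name `select_keypoint_descriptors`, the parameter `k`, and the tie-breaking behavior are hypothetical and not taken from the patent.

```python
import numpy as np


def select_keypoint_descriptors(local_desc, attention_scores, k):
    """Keep the k local descriptors with the highest attention scores.

    local_desc: array of shape (num_features, dim)
    attention_scores: array of shape (num_features,)
    Returns the selected descriptors and their original row indices.
    """
    k = min(k, len(attention_scores))
    # Sorting the negated scores gives descending order; the stable sort
    # keeps tied scores in their original input order (an assumed choice).
    order = np.argsort(-attention_scores, kind="stable")[:k]
    return local_desc[order], order


# Toy input: four 2-D local descriptors with one attention score each.
local_desc = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]])
attention_scores = np.array([0.1, 0.9, 0.4, 0.7])

keypoints, kept = select_keypoint_descriptors(local_desc, attention_scores, k=2)
print(kept)       # row indices of the selected descriptors
print(keypoints)  # the selected descriptors themselves
```

Selecting a fixed number of descriptors, as in claim 2, bounds the index size per image; a score threshold would be an alternative selection rule, but it is not what claim 2 recites.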

Assignees

Inventors

Classifications

  • G06V10/462 · Primary

    Salient features, e.g. scale invariant feature transforms [SIFT] · CPC title

  • using shape and object relationship · CPC title

  • Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN] · CPC title

  • Matching configurations of points or features · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10402448B2 cover?
Systems and methods of the present disclosure can use machine-learned image descriptor models for image retrieval applications and other applications. A trained image descriptor model can be used to analyze a plurality of database images to create a large-scale index of keypoint descriptors associated with the database images. An image retrieval application can provide a query image as input to…
Who is the assignee on this patent?
Google Inc, Google LLC
What technology area does this patent fall under?
Primary CPC classification G06V10/462. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 3, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).