Searching for Images by Video

US2016358036A1 · US · A1

Patent metadata
Publication number: US-2016358036-A1
Application number: US-201615240838-A
Country: US
Kind code: A1
Filing date: Aug 18, 2016
Priority date: May 18, 2011
Publication date: Dec 8, 2016
Grant date: none listed

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques describe submitting a video clip as a query by a user. A process retrieves images and information associated with the images in response to the query. The process decomposes the video clip into a sequence of frames to extract the features in a frame and to quantize the extracted features into descriptive words. The process further tracks the extracted features as points in the frame, a first set of points to correspond to a second set of points in consecutive frames to construct a sequence of points. Then the process identifies the points that satisfy criteria of being stable points and being centrally located in the frame to represent the video clip as a bag of descriptive words for searching for images and information related to the video clip.
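
The representation step in the abstract can be sketched in Python. Treat this as a minimal sketch under stated assumptions, not the patented implementation:

- Feature descriptors are assumed to be already extracted and tracked across consecutive frames (for example, SIFT features followed by a point tracker).
- The visual vocabulary is assumed to be a given list of centroids, and a feature's "descriptive word" is its nearest centroid.
- "Stable" is taken to mean a track lasting at least `min_track_len` frames.
- "Centrally located" is taken to mean the track's mean position falls in the middle half of the frame.
- `min_center_tracks` stands in for the unspecified "threshold number" of claims 25, 26, 33 and 34. It is counted per track here, which is also an assumption.

All names (`TrackedPoint`, `quantize`, `is_central`, `bag_of_words`) and all default values are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from math import dist


@dataclass
class TrackedPoint:
    """One observation of a tracked feature: its position and descriptor in one frame."""
    x: float
    y: float
    descriptor: tuple


def quantize(descriptor, vocabulary):
    """Return the index of the nearest vocabulary centroid (the "descriptive word")."""
    return min(range(len(vocabulary)), key=lambda i: dist(descriptor, vocabulary[i]))


def is_central(track, width, height, margin=0.25):
    """True if the track's mean position lies inside the central region of the frame.

    margin=0.25 keeps the middle half of each dimension (an assumed value).
    """
    mean_x = sum(p.x for p in track) / len(track)
    mean_y = sum(p.y for p in track) / len(track)
    return (margin * width <= mean_x <= (1 - margin) * width
            and margin * height <= mean_y <= (1 - margin) * height)


def bag_of_words(tracks, vocabulary, width, height,
                 min_track_len=5, min_center_tracks=3):
    """Represent a clip as a Counter of descriptive-word ids.

    tracks: list of non-empty point sequences, one per feature followed
    across consecutive frames.
    """
    # Stable points: keep features tracked for at least min_track_len frames.
    stable = [t for t in tracks if len(t) >= min_track_len]
    # Centrally located: keep stable tracks whose mean position is near the center.
    central = [t for t in stable if is_central(t, width, height)]
    # Below the (assumed) threshold, filter the central points out entirely.
    if len(central) < min_center_tracks:
        return Counter()
    # Otherwise quantize every observation on a central track into a word.
    bag = Counter()
    for track in central:
        for point in track:
            bag[quantize(point.descriptor, vocabulary)] += 1
    return bag
```

Consider a two-word vocabulary and five tracks in a 100 by 100 frame. Three central tracks of length 5 or more contribute word counts. A two-frame track is dropped as unstable, and a track near the frame corner is dropped as non-central. If `min_center_tracks` is raised above the number of central tracks, the function returns an empty bag. This mirrors the "filtering out" branch of the claims.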

First claim

Opening claim text (preview).

1.-20. (canceled)

21. A system comprising: one or more processors; memory communicatively coupled to the one or more processors; an image-retrieval model module, stored in the memory and communicatively coupled to the one or more processors, configured to: generate a representation of a video clip comprising an object of interest; and retrieve images from a database based at least in part on the representation of the video clip; and a vector space model module, stored in the memory and communicatively coupled to the one or more processors, configured to: analyze the images from the database and the video clip to determine one or more similarities between the images and the video clip; and calculate a similarity score for each image of the images based on the one or more similarities between the images from the database and the video clip to identify candidate search images from the images.

22. The system of claim 21, wherein the representation of the video clip comprises one or more descriptive words and each image of the images is associated with at least one descriptive word of the one or more descriptive words.

23. The system of claim 21, wherein the similarity scores are based at least in part on the representation of the video clip and element-wise multiplication of vectors, each vector of the vectors representing an inverted image frequency of a descriptive word used to describe the images and the video clip.

24. The system of claim 21, further comprising an image-retrieval application module, stored in the memory and communicatively coupled to the one or more processors, configured to: receive the video clip submitted as a query; and extract features from one or more frames of the video clip.

25. The system of claim 24, the image-retrieval application module further configured to: construct points of the features in consecutive frames of the one or more frames; identify center points that are located in a center of a frame of the consecutive frames; determine that an amount of the center points is less than a threshold number; and in response to determining that the amount of the center points is less than the threshold number, filtering out the center points.

26. The system of claim 24, the image-retrieval application module further configured to: construct points of the features in consecutive frames of the one or more frames; identify center points that are located in a center of a frame of the consecutive frames; determine that an amount of the center points is greater than a threshold number; and in response to determining that the amount of the center points is greater than the threshold number, creating the representation of the object of interest based at least in part on the center points.

27. The system of claim 26, the image-retrieval application module further configured to construct the vector space module based at least in part on the representation of the object of interest.

28. A method comprising: generating, by an image-retrieval model, a representation of a first frame of a video clip comprising an object of interest; retrieving, by the image-retrieval model, images from a database based at least in part on the representation of the first frame of the video clip; comparing, by a vector space model, the images from the database and the representation of the first frame to identify similarities between the images and the representation of the first frame; calculating, by the vector space model, a first similarity score for each image of the images based on the similarities between the images from the database and the representation of the first frame to identify candidate search images from the images.

29. The method of claim 28, further comprising: ranking, by an image-retrieval application module, the candidate search images from the images based at least in part on the respective first similarity scores.

30. The method of claim 29, further comprising: generating, by the image-retrieval model, a representation of a second frame of the video clip comprising the object of interest; comparing, by the vector space model, the candidate search images and the representation of the second frame to identify second similarities between the candidate search images and the representation of the second frame; calculating, by the vector space model, a second similarity score for each of the candidate search images based on the second similarities; and re-ranking, by the image-retrieval application module, at least one of the candidate search images based at least in part on the second similarity scores.

31. The method of claim 28, wherein: the representation of the first frame of the video clip comprises one or more descriptive words; retrieving the images from the database is based at least in part on the one or more descriptive words; and further comprising: calculating gradients of functions of the candidate search images and the first frame of the video clip; combining the first similarity scores with an average of the gradients of the functions; and ranking the candidate search images based at least in part on the combining the first similarity scores with the average of the gradients of the functions.

32. The method of claim 28, further comprising: receiving, by an image-retrieval application module, the video clip submitted as a query; and extracting features from consecutive frames of the video clip.

33. The method of claim 32, further comprising: constructing points representing the features in the consecutive frames; identifying center points of the points that are located in a center of a frame of the consecutive frames; determining that an amount of the center points is less than a threshold number; and in response to determining that the amount of the center points is less than the threshold number, filtering out the center points.

34. The method of claim 32, further comprising: constructing points representing the features in the consecutive frames; identifying center points of the points that are located in a center of a frame of the consecutive frames; determining that an amount of the center points is greater than a threshold number; and in response to determining that the amount of the center points is greater than the threshold number, creating the representation of the object of interest based at least in part on the center points.

35. The method of claim 28, wherein the first similarity scores are based at least in part on the representation of the first frame of the video clip and element-wise multiplication of vectors, each vector of the vectors representing an inverted image frequency of a descriptive word used to describe the images and the video clip.

36. One or more computer-readable storage media storing instructions that, when executed by a processor, perform acts comprising: generating, by an image-retrieval model, a representation of a first frame of a video clip comprising an object of interest; retrieving, by the image-retrieval model, images from a database based at least in part on the representation of the first frame of the video clip; comparing, by a vector space model, the images from the database and the representation of the first frame to identify similarities between the images and the representation of the first frame; calculating, by the vector space model, a first similarity score for each image of the images based on the similarities between the images from the database and the representation of th…
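
Claims 21 to 23, 28 to 30 and 35 describe a vector space model with the following steps:

- It scores database images against the clip's descriptive words.
- It weights each word by an inverted image frequency, applied through element-wise multiplication of vectors.
- It ranks the resulting candidates and re-ranks them with a second frame.

The sketch below illustrates this under a common tf-idf reading. It assumes IIF(w) = log(N / n_w), where N is the number of database images and n_w is the number of images containing word w. It also assumes cosine similarity as the score. The claim text shown here does not fix either formula, and the function names (`inverse_image_frequency`, `weight`, `cosine`, `rank`, `rerank`) are hypothetical.

```python
import math
from collections import Counter


def inverse_image_frequency(database):
    """IIF(w) = log(N / n_w) over a database mapping image id -> Counter of word ids."""
    n_images = len(database)
    image_freq = Counter()
    for bag in database.values():
        image_freq.update(set(bag))  # count each word at most once per image
    return {word: math.log(n_images / n) for word, n in image_freq.items()}


def weight(bag, iif):
    """Element-wise product of word counts and IIF weights (unknown words weigh 0)."""
    return {word: count * iif.get(word, 0.0) for word, count in bag.items()}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(value * b.get(word, 0.0) for word, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query_bag, image_bags, iif):
    """Score each image against the query; return (image_id, score) pairs, best first."""
    query_vec = weight(query_bag, iif)
    scores = [(image_id, cosine(query_vec, weight(bag, iif)))
              for image_id, bag in image_bags.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


def rerank(candidates, second_query_bag, database, iif):
    """Re-score only the current candidates against a second frame's words."""
    subset = {image_id: database[image_id] for image_id, _ in candidates}
    return rank(second_query_bag, subset, iif)
```

In use, `rank` is called with the first frame's bag of words to get candidate images. `rerank` then re-scores the top candidates with a second frame's words while keeping IIF weights computed over the full database. A word that appears in every image gets weight log(1) = 0, so common words contribute nothing to the score.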

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

  • G06V10/462 (primary)

    Salient features, e.g. scale invariant feature transforms [SIFT] · CPC title

  • by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation · CPC title

  • involving foreground-background segmentation · CPC title

  • using low-level visual features of the video content · CPC title

  • using original textual content or text extracted from visual content or transcript of audio data · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016358036A1 cover?
Techniques describe submitting a video clip as a query by a user. A process retrieves images and information associated with the images in response to the query. The process decomposes the video clip into a sequence of frames to extract the features in a frame and to quantize the extracted features into descriptive words. The process further tracks the extracted features as points in the frame,…
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G06V10/462. Mapped technology areas include Physics.
When was this patent published?
Published December 8, 2016 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).