Who is the assignee on this patent?

Yang Linjun, Hua Xian-Sheng, Cai Yang, and 1 more

What technology area does this patent fall under?

Primary CPC classification G06V10/462. Mapped technology areas include Physics.

When was this patent published?

Publication date Sep 13, 2016 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Searching for images by video

US9443011B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9443011-B2
Application number	US-201113110708-A
Country	US
Kind code	B2
Filing date	May 18, 2011
Priority date	May 18, 2011
Publication date	Sep 13, 2016
Grant date	Sep 13, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques describe submitting a video clip as a query by a user. A process retrieves images and information associated with the images in response to the query. The process decomposes the video clip into a sequence of frames to extract the features in a frame and to quantize the extracted features into descriptive words. The process further tracks the extracted features as points in the frame, a first set of points to correspond to a second set of points in consecutive frames to construct a sequence of points. Then the process identifies the points that satisfy criteria of being stable points and being centrally located in the frame to represent the video clip as a bag of descriptive words for searching for images and information related to the video clip.
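The selection step in the abstract (keep points that are both stable and centrally located, then count their descriptive words) can be sketched roughly as below. This is a minimal illustration, not the patent's method: the `TrackedPoint` type, the `centrality` normalization, and the `min_track` and `min_centrality` thresholds are all hypothetical, and feature extraction, quantization, and tracking are assumed to have already happened.

```python
from collections import Counter
from dataclasses import dataclass
from math import hypot


@dataclass
class TrackedPoint:
    word_id: int       # index of the quantized descriptive word (assumed given)
    x: float           # position of the point in the frame, in pixels
    y: float
    track_length: int  # number of consecutive frames the point was tracked in


def centrality(p: TrackedPoint, width: float, height: float) -> float:
    """Hypothetical score: 1.0 at the frame center, 0.0 at a corner."""
    cx, cy = width / 2, height / 2
    return 1.0 - hypot(p.x - cx, p.y - cy) / hypot(cx, cy)


def bag_of_words(points, width, height, min_track=3, min_centrality=0.5):
    """Count the words of points that pass both (assumed) thresholds."""
    bag = Counter()
    for p in points:
        stable = p.track_length >= min_track
        central = centrality(p, width, height) >= min_centrality
        if stable and central:
            bag[p.word_id] += 1
    return bag


points = [
    TrackedPoint(word_id=7, x=320, y=240, track_length=5),  # stable, central
    TrackedPoint(word_id=7, x=300, y=250, track_length=4),  # stable, central
    TrackedPoint(word_id=2, x=5, y=5, track_length=10),     # stable, near a corner
    TrackedPoint(word_id=9, x=320, y=240, track_length=1),  # central, not stable
]
bag = bag_of_words(points, width=640, height=480)
```

Only the two points for word 7 survive both filters here; the corner point and the short-lived point are dropped as likely background or noise.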

First claim

Opening claim text (preview).

What is claimed is:

1. A method implemented at least partially by a processor, the method comprising: receiving a video clip to be decomposed into a frame sequence; extracting scale-invariant feature transform (SIFT) features in a frame to quantize the SIFT features into descriptive words; tracking SIFT points of the extracted SIFT features in the frame, a first set of SIFT points corresponding to a second set of SIFT points in consecutive frames to construct a sequence of points; calculating center-awareness values for each of the SIFT points based at least in part on the location of each of the SIFT points in the frame relative to a center of the frame; computing values for the SIFT points based on a number of points in the sequence of points and the center-awareness values of the SIFT points of the frame; identifying a subset of SIFT points based on the number of points in a tracking sequence and the center-awareness values of the SIFT points; using the subset of SIFT points, representing the video clip as a bag of descriptive words; retrieving candidate images from a database based at least in part on the bag of descriptive words; computing a similarity score of the candidate images versus images in the video clip; calculating gradients of functions of the candidate images and the images in the video clip; combining the similarity scores with an average of the gradients being calculated; and ranking the candidate images based at least in part on the combining.

2. The method of claim 1, wherein the tracking the SIFT points further comprises: tracking the first set of SIFT points in a previous frame; aligning tracked positions in a subsequent frame to one of the SIFT points in the first set; identifying the SIFT points from the second set that are less than one pixel away from the tracked positions as candidates; and selecting from the candidates in the second set that are similar to a corresponding SIFT point in the first set, as a tracked SIFT point.

3. The method of claim 1, further comprising identifying the SIFT points located in a center of the frame by: calculating a first distance of a SIFT point to a center of the frame; calculating a numerator by multiplying the first distance by a sum of the sequence of points; calculating a second distance from an origin of the frame to the center of the frame; calculating a denominator by multiplying the second distance by a number of points in the sequence of points; and dividing the numerator by the denominator.

4. The method of claim 1, further comprising: aggregating an occurrence of each descriptive word in the frame per number of frames; and creating a histogram based at least in part on the aggregating for mining synonyms of the descriptive words.

5. The method of claim 1, further comprising aggregating an occurrence of each descriptive word in the frames to evaluate the video clip in different frames with variations of scales, viewpoints, and lighting.

6. The method of claim 1, further comprising: constructing an affinity matrix to identify a count of each descriptive word in the frame, and the number of points in the tracking sequence of points; and generating a contextual histogram from the video clip as a histogram based on the descriptive words identified for the SIFT points in the frames to address a synonymous relationship with the descriptive words.

7. The method of claim 1, further comprising: extracting the SIFT features from images in a database; building a codebook by correlating the SIFT features from the images with descriptive words; and accessing the codebook for quantizing the SIFT features into descriptive words.

8. One or more computer-readable storage media encoded with instructions that, when executed by a processor, perform acts comprising: receiving a query of a video clip with an object of interest, the video clip to be decomposed into a frame sequence; extracting scale-invariant feature transform (SIFT) features from the object of interest in a frame to quantize the SIFT features into descriptive words; identifying SIFT points of the extracted SIFT features based on the SIFT points being stable in adjacent frames and being centrally located in the frame; creating a representation of the video clip; retrieving images and information associated with the images in response to the query based at least on the representation of the video clip; computing similarity scores of the images versus the representation of the video clip; calculating gradients of functions of the images and the representation of the video clip; combining the similarity scores with an average of the gradients being calculated; and ranking the images based at least in part on the combining.

9. The computer-readable storage media of claim 8, wherein identifying the qualified SIFT points comprises: constructing a sequence of the SIFT points in consecutive frames; determining the SIFT points that are located in a center of the frame; and identifying the qualified SIFT points based on a number of points in the sequence of the SIFT points and the SIFT points located in the center of the frame to filter out noisy SIFT points.

10. The computer-readable storage media of claim 8, further comprising aggregating an occurrence of each descriptive word in the frame to evaluate the video clip in different frames with variations of scales, viewpoints, and lighting.

11. The computer-readable storage media of claim 8, further comprising: identifying a count of each descriptive word in the frame; constructing a sequence of the SIFT points in consecutive frames to determine a number of points in the sequence; and quantizing the SIFT points into the descriptive words and using synonyms to enrich the representation of the video clip.

12. The computer-readable storage media of claim 8, wherein the retrieving further comprises: extracting SIFT features for an image in a database to describe the SIFT features by descriptive words; mapping the SIFT features to the descriptive words in the database; and indexing the descriptive words being generated based on an inverted file structure to locate the images and the information associated with the images.

13. The computer-readable storage media of claim 8, further comprising presenting search results of the images and the information associated with the images in a ranked list.

14. A system comprising: a memory; a processor coupled to the memory; an image-retrieval application module operated by the processor and configured to: receive a video clip submitted as a query; extract features from a frame of the video clip; track points of the extracted features to construct a sequence of points of the features in consecutive frames; and determine the points that are located in a center of a frame; and an image-retrieval model module operated by the processor and configured to: construct a representation of the video clip with an object of interest submitted as the query; and retrieve images from a database in response to the representation of the video clip; the image-retrieval application module further configured to: compute a similarity score between the images from the database and the representation of the video clip; calculate gradients of functions of the images from the database and the representation of the video clip; combine the similarly scores with an average of the gradients being calculated; and rank the images from the database based at least in part on the combining.

15. The system of claim 14, wherein the image-retrieval ap
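Claim 12 refers to indexing descriptive words with an inverted file structure so that images can be located from a query's words. A minimal sketch of that general idea follows, assuming each database image has already been reduced to a list of word ids. The function names and the vote-counting score are hypothetical stand-ins; in particular, the ranking here is a plain shared-word count, not the similarity-plus-gradient combination recited in the claims.

```python
from collections import Counter, defaultdict


def build_inverted_index(image_words):
    """Map each descriptive word id to the set of image ids containing it."""
    index = defaultdict(set)
    for image_id, words in image_words.items():
        for word in words:
            index[word].add(image_id)
    return index


def retrieve(index, query_bag):
    """Return image ids ordered by how many distinct query words they share.

    Ties are broken by image id so the ordering is deterministic.
    """
    votes = Counter()
    for word in query_bag:
        for image_id in index.get(word, ()):
            votes[image_id] += 1
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [image_id for image_id, _ in ranked]


# Hypothetical database: image id -> word ids found in that image.
image_words = {"a": [1, 2, 3], "b": [2, 3], "c": [9]}
index = build_inverted_index(image_words)

# Query bag of words built from a video clip (word id -> count).
results = retrieve(index, {1: 1, 2: 1, 3: 4})
```

The point of the inverted structure is that only images sharing at least one word with the query are ever touched, so image "c" is never scored in this example.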

Classifications

G06V10/462 · Primary
Salient features, e.g. scale invariant feature transforms [SIFT] · CPC title
G06F18/2113
by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation · CPC title
G06F16/7847
using low-level visual features of the video content · CPC title
G06F16/7844
using original textual content or text extracted from visual content or transcript of audio data · CPC title
G06T7/194
involving foreground-background segmentation · CPC title

Patent family

Related publications grouped by family.

View patent family 47174943
