Relevance-based image selection

US10614124B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10614124-B2
Application number  US-201514687116-A
Country             US
Kind code           B2
Filing date         Apr 15, 2015
Priority date       Aug 24, 2009
Publication date    Apr 7, 2020
Grant date          Apr 7, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system, computer readable storage medium, and computer-implemented method presents video search results responsive to a user keyword query. The video hosting system uses a machine learning process to learn a feature-keyword model associating features of media content from a labeled training dataset with keywords descriptive of their content. The system uses the learned model to provide video search results relevant to a keyword query based on features found in the videos. Furthermore, the system determines and presents one or more thumbnail images representative of the video using the learned model.
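The abstract names two uses of the learned model: ranking videos for a keyword query and choosing a thumbnail that reflects that query. A minimal sketch of both follows, under assumptions that are not stated in the abstract. It assumes the model is a feature-keyword matrix `W` (rows are features, columns are keywords, as in claim 7) and that a frame's score for a keyword is entry k of the product of its feature vector with `W`. Summing scores over query terms and treating a video as relevant as its best frame are illustrative choices. All function names are hypothetical.

```python
from typing import Dict, List, Sequence

Matrix = Sequence[Sequence[float]]  # rows: features, columns: keywords


def keyword_scores(features: Sequence[float], W: Matrix,
                   keywords: Sequence[str]) -> Dict[str, float]:
    """Score one frame against every keyword: entry k of the product x^T W."""
    return {
        kw: sum(x * row[k] for x, row in zip(features, W))
        for k, kw in enumerate(keywords)
    }


def query_score(scores: Dict[str, float], query_terms: Sequence[str]) -> float:
    """Assumed aggregation: sum the scores of query terms the model knows;
    terms outside the keyword vocabulary contribute nothing."""
    return sum(scores.get(term, 0.0) for term in query_terms)


def frame_scores(frames: List[Sequence[float]], W: Matrix,
                 keywords: Sequence[str], query_terms: Sequence[str]) -> List[float]:
    """Query score for each frame of one video."""
    return [query_score(keyword_scores(f, W, keywords), query_terms) for f in frames]


def select_thumbnail(frames: List[Sequence[float]], W: Matrix,
                     keywords: Sequence[str], query_terms: Sequence[str]) -> int:
    """Index of the frame whose keyword scores best match the query."""
    scores = frame_scores(frames, W, keywords, query_terms)
    return max(range(len(scores)), key=scores.__getitem__)


def rank_videos(videos: Dict[str, List[Sequence[float]]], W: Matrix,
                keywords: Sequence[str], query_terms: Sequence[str]) -> List[str]:
    """Assumed ranking rule: a video is as relevant as its best-scoring frame."""
    best = {vid: max(frame_scores(frames, W, keywords, query_terms))
            for vid, frames in videos.items()}
    return sorted(best, key=best.__getitem__, reverse=True)


if __name__ == "__main__":
    keywords = ["dog", "beach"]
    W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 3 features x 2 keywords
    frames = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    print(select_thumbnail(frames, W, keywords, ["dog"]))    # 0
    print(select_thumbnail(frames, W, keywords, ["beach"]))  # 1
```

In the toy run, the same video yields a different thumbnail for "dog" than for "beach". This mirrors the behavior that claim 2 describes for a second, different query.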

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method of generating search results in response to video queries, comprising: accessing, by a computing system comprising one or more processors, labeled training data including a set of training videos and textual data descriptive of visual content for each of two or more frames of each training video; extracting, by the computing system, a plurality of keywords from the labeled training data: generating, by the computing system, a plurality of features characterizing visual content of the set of training videos; training, by the computing system, a machine-learned model to learn correlations between the plurality of keywords and the plurality of features characterizing the visual content of the set of training videos; generating, by the computing system, a video annotation index based at least in part on the machine-learned model; extracting, by the computing system, one or more query terms from a search query associated with a client device; determining, by the computing system, a result set satisfying the search query, the result set including at least one video that matches the search query based at least in part on the one or more query terms; accessing, by the computing system, the video annotation index including an electronic mapping between two or more frames of the at least one video and two or more keywords of the plurality of keywords, the electronic mapping including for each of the two or more frames two or more keyword scores based at least in part on a machine-learned relationship between one or more features of each of the two or more frames and the two or more keywords; selecting, by the computing system, one or more frames of the at least one video based at least in part on a keyword score of the one or more frames that corresponds to the one or more query terms; and transmitting, by the computing system to the client device, one or more responses to the search query including the one or more frames of the at least one video and a link to the at least one video.

2. The computer-implemented method of claim 1, wherein the search query is a first search query, the one or more query terms are one or more first query terms, the result set is first result set, the method further comprising: receiving, at the computing system, a second search query associated with a client device; extracting, by the computing system, one or more second query terms from the second search query; determining, by the computing system, a second result set satisfying the second search query, the second result set including the at least one video, the at least one video matching the second search query based at least in part on the one or more second query terms; accessing, by the computing system, the video annotation index; selecting, by the computing system, one or more second frames of the at least one video based at least in part on a keyword score of the one or more second frames that corresponds to the one or more second query terms, the one or more second frames are different than the one or more first frames; and transmitting, by the computing system to the client device, one or more responses to the second search query including the one or more second frames of the at least one video and the link to the at least one video.

3. The computer-implemented method of claim 1, further comprising: generating at least one feature vector for each of the two or more frames; providing, to the machine-learned model for each of the two or more frames, the at least one feature vector; receiving, as an output of the machine-learned model for each of the two or more frames, the two or more keyword scores indicative of a relationship between said each of the two or more frames and the two or more keywords.

4. The computer-implemented method of claim 3, wherein receiving, as an output of the machine-learned model for each of the two or more frames, the two or more keyword scores comprises: receiving a vector of keyword scores indicative of a likelihood that said each of the two or more frames is relevant to the two or more keywords.

5. The computer-implemented method of claim 1, wherein generating the plurality of features characterizing the visual content of the set of training vides comprises: segmenting at least one frame of each training video into a plurality of patches; generating a plurality of feature vectors, each feature vector corresponding to one of the plurality of patches; and applying a clustering algorithm to determine a plurality of representative feature vectors in the set of training videos.

6. The computer-implemented method of claim 5, wherein generating the video annotation index comprises: storing a mapping between the plurality of keywords and the plurality of representative feature vectors.

7. The computer-implemented method of claim 6, wherein storing the mapping comprises: generating a feature-keyword matrix, wherein entries in a first dimension of the feature-keyword matrix each correspond to a different one of the plurality of representative feature vectors, and where entries in a second dimension of the feature-keyword matrix each correspond to a different one of the plurality of keywords.

8. The computer-implemented method of claim 7, wherein generating the feature-keyword matrix comprises: initializing the feature-keyword matrix by populating the entries with initial weights; selecting a positive training media item associated with a first keyword and a negative training media item not associated with a second keyword; extracting features for the positive and negative training media items to obtain a positive feature vector and a negative feature vector; applying a transformation to the positive feature vector using the feature-keyword matrix to obtain a first keyword score for the positive training media item; applying a transformation to the negative feature vector using the feature-keyword matrix to obtain a second keyword score for the negative training media item; determining if the keyword score for the positive media training item is at least a threshold value higher than the keyword score for the negative training media item; and responsive to the keyword score for the positive media training item not being at least a threshold value higher than the keyword score for the negative training media item, adjusting the weights in the feature-keyword matrix.

9. The computer-implemented method of claim 1, wherein: for a first keyword, a first of the two or more frames of the at least one video includes a first keyword score; and for the first keyword, a second of the two or more frames of the at least one video includes a second keyword score that is different than the first keyword score.

10. A computing system having one or more processors configured to perform operations comprising: accessing labeled training data including a set of training videos and textual data descriptive of visual content for each of two or more frames of each training video; extracting a plurality of keywords from the labeled training data; generating a plurality of features characterizing visual content of the set of training videos; training a machine-learned model to learn correlations between the plurality of keywords and the plurality of features characterizing the visual content of the set of training videos; generating a video annotation index based at least in part on the machine-learned model; extracting one or more query terms from a search query associated with a client device; determining a result set satisfying the search query, the result set including at least one video that matches the search query
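Claims 5 through 8 describe the training pipeline concretely enough to sketch: patch features from frames, clustering into representative feature vectors, a feature-keyword matrix with one row per representative vector and one column per keyword, and a threshold test on positive/negative pairs. The toy version below rests on several assumptions that the claims do not specify. Raw pixel patches stand in for real patch descriptors. Plain k-means with an evenly spaced deterministic initialization stands in for "a clustering algorithm". Each item is represented as a histogram of nearest representative vectors. The weight adjustment (add `lr * (x_pos - x_neg)` to the keyword's column) is a perceptron-style choice. Claim 8's positive and negative items are read as referring to the same keyword. All function names are hypothetical.

```python
import random
from typing import List, Sequence, Set, Tuple

Vector = List[float]


def patches(frame: List[List[float]], size: int) -> List[Vector]:
    """Claim 5 (simplified): cut a 2-D grayscale frame into non-overlapping
    size x size patches and flatten each patch into a raw feature vector."""
    rows, cols = len(frame), len(frame[0])
    return [
        [frame[r + i][c + j] for i in range(size) for j in range(size)]
        for r in range(0, rows - size + 1, size)
        for c in range(0, cols - size + 1, size)
    ]


def dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Sequence[float], centroids: List[Vector]) -> int:
    """Index of the centroid closest to p (first one wins on ties)."""
    return min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))


def kmeans(points: List[Vector], k: int, iters: int = 10) -> List[Vector]:
    """Plain k-means; the centroids play the role of the claimed
    representative feature vectors. Evenly spaced initialization is a
    simplification that keeps the sketch deterministic."""
    if len(points) < k:
        raise ValueError("need at least k points")
    step = len(points) // k
    centroids = [list(points[i * step]) for i in range(k)]
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for i, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster empties out
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def bag_of_features(frame_patches: List[Vector], centroids: List[Vector]) -> Vector:
    """Assumed item representation: how many patches fall nearest to each
    representative vector (one matrix row per representative vector)."""
    hist = [0.0] * len(centroids)
    for p in frame_patches:
        hist[nearest(p, centroids)] += 1.0
    return hist


def score(x: Sequence[float], W: List[Vector], k: int) -> float:
    """Keyword score: x transformed by column k of the feature-keyword matrix."""
    return sum(xi * row[k] for xi, row in zip(x, W))


def train(W: List[Vector], examples: List[Tuple[Vector, Set[str]]],
          keywords: List[str], threshold: float = 1.0, lr: float = 0.1,
          steps: int = 200, seed: int = 0) -> List[Vector]:
    """Claim 8 as a pairwise loop: sample a keyword, a positive item labeled
    with it, and a negative item not labeled with it. If the positive does not
    outscore the negative by at least `threshold`, move the keyword's column
    toward x_pos - x_neg (the update rule itself is an assumption)."""
    rng = random.Random(seed)
    for _ in range(steps):
        k = rng.randrange(len(keywords))
        pos = [x for x, labels in examples if keywords[k] in labels]
        neg = [x for x, labels in examples if keywords[k] not in labels]
        if not pos or not neg:
            continue
        x_pos, x_neg = rng.choice(pos), rng.choice(neg)
        if score(x_pos, W, k) - score(x_neg, W, k) < threshold:
            for i in range(len(W)):
                W[i][k] += lr * (x_pos[i] - x_neg[i])
    return W
```

Because each pair only triggers an update while it violates the threshold, training stops changing a keyword's column once its positives clearly outscore its negatives. This matches the "responsive to ... not being at least a threshold value higher" condition in claim 8.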

Assignees

Google LLC

Classifications (CPC titles)

  • Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually

  • using metadata automatically derived from the content

  • using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings

  • Presentation of query results

  • a collection of video files or sequences


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10614124B2 cover?
A system, computer readable storage medium, and computer-implemented method presents video search results responsive to a user keyword query. The video hosting system uses a machine learning process to learn a feature-keyword model associating features of media content from a labeled training dataset with keywords descriptive of their content. The system uses the learned model to provide video …
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/7867. Mapped technology areas include Physics.
When was this patent published?
The B2 publication date is Apr 7, 2020. Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).