Image match based video search
US-9881084-B1 · Jan 30, 2018 · US
US12373490B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12373490-B2 |
| Application number | US-202318321225-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 22, 2023 |
| Priority date | Aug 24, 2009 |
| Publication date | Jul 29, 2025 |
| Grant date | Jul 29, 2025 |
Official abstract text for this publication.
A system, computer readable storage medium, and computer-implemented method presents video search results responsive to a user keyword query. The video hosting system uses a machine learning process to learn a feature-keyword model associating features of media content from a labeled training dataset with keywords descriptive of their content. The system uses the learned model to provide video search results relevant to a keyword query based on features found in the videos. Furthermore, the system determines and presents one or more thumbnail images representative of the video using the learned model.
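As a rough illustration of how a learned feature-keyword model could score frames and pick a representative thumbnail, the sketch below uses a plain linear scorer. The abstract does not specify the model form, so the weight-vector representation, the function names `keyword_score` and `select_thumbnail`, and the toy data are all assumptions made for illustration only.

```python
from typing import Dict, List, Sequence

# Hypothetical learned model: one weight vector per keyword. A linear scorer
# is an assumption standing in for whatever model is actually trained.
FeatureKeywordModel = Dict[str, Sequence[float]]


def keyword_score(model: FeatureKeywordModel, keyword: str,
                  features: Sequence[float]) -> float:
    """Score how strongly one frame's feature vector matches a keyword."""
    weights = model[keyword]
    return sum(w * f for w, f in zip(weights, features))


def select_thumbnail(model: FeatureKeywordModel, keyword: str,
                     frame_features: List[Sequence[float]]) -> int:
    """Return the index of the frame with the highest score for the keyword."""
    scores = [keyword_score(model, keyword, f) for f in frame_features]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy data (illustrative only): two keywords, three sampled frames.
toy_model = {"dog": [1.0, 0.0], "beach": [0.0, 1.0]}
toy_frames = [[0.2, 0.9], [0.8, 0.1], [0.5, 0.5]]
```

With this toy data, `select_thumbnail(toy_model, "dog", toy_frames)` picks the second frame and the `"beach"` query picks the first, so the thumbnail shown for the same video can differ by query.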
Opening claim text (preview).
The invention claimed is:
1. A computing system, the system comprising: one or more processors; one or more non-transitory computer readable media that collectively store instructions that, when executed by the one or more processors, cause the computing system to perform operations, the operations comprising: generating a searchable video index based at least in part on a machine-learned model, wherein the searchable video index maps frames of video to one or more keywords according to the machine-learned model, wherein generating the searchable video index comprises: generating at least one feature vector for each of two or more frames of each of a plurality of videos associated with the searchable video index; and processing data associated with the two or more frames of each of the plurality of videos with the machine-learned model, wherein processing the data includes inputting the at least one feature vector for each of the two or more frames; storing a mapping between two or more frames of each of a plurality of videos and the one or more keyword representations; playing a selected video using a web-based video player; monitoring a current frame of video during playback of the selected video; determining one or more keywords are associated with the current frame based on a video annotation index, wherein the video annotation index comprises the searchable video index that comprises the mapping between the two or more frames of each of the plurality of videos and one or more keyword representations, wherein the mapping is generated based at least in part on the machine-learned model trained to learn correlations between visual content of individual video frames and keyword representations; determining a media content item of a media content database is associated with the one or more keywords; and providing the media content item for display during playback of the current frame.
2. The system of claim 1, wherein the one or more features extracted from the plurality of videos is one or more feature vectors.
3. The system of claim 2, further comprising: extracting a plurality of keyword representations from a labeled training dataset; wherein training the machine-learned model includes training the machine-learned model to learn correlations between the one or more feature vectors and the one or more keyword representations based at least in part of the plurality of keyword representations extracted from the labeled training data.
4. The system of claim 1, wherein the media content item comprises one or more images.
5. A computer-implemented method for presenting a set of related videos, the method comprising: playing, by a computing system comprising one or more processors, a selected video using a web-based video player; extracting, by the computing system, metadata associated with the selected video, the metadata including one or more keywords descriptive of the selected video; accessing, by the computing system, a searchable video index using the one or more keywords to determine one or more related videos, wherein the searchable video index comprises a searchable video index that comprises a mapping between two or more frames of each of a plurality of videos and one or more keyword representations, wherein the mapping is generated based at least in part on a machine-learned model trained to learn correlations between visual content of individual video frames and keyword representations, wherein accessing, by the computing system, the searchable video index using the one or more keywords to determine one or more related videos comprises: determining a particular frame of one or more related videos having a high keyword association score with the one or more keywords; determining scene boundaries of a scene relevant to the one or more keywords, the scene of the one or more related videos including the frame having the high keyword association score; and selecting the scene as a portion of the one or more related videos to provide; and providing, by the computing system, the one or more related videos for display, each related video represented by a thumbnail image representative of its content.
6. The method of claim 5, wherein the one or more related videos are determined by: generating a search query comprising the one or more keywords; and searching the searchable video index with the search query.
7. The method of claim 5, wherein the searchable video index was generated by: sampling frames of each of the plurality of videos in a video database; computing a first feature vector for a first sampled frame of a sampled video representative of content of the first sampled frame; applying the machine-learned model to the first feature vector to generate a keyword association score between the first sampled frame and the selected keyword; and storing the keyword association score in association with the first sampled frame in the searchable video index.
8. The method of claim 5, wherein the thumbnail image is generated by: receiving a particular related video; selecting a frame from the particular related video as representative of content of the video using a video annotation index that stores keyword association scores between frames of the plurality of videos and keywords associated with the frames of the plurality of videos; and providing the selected frame as the thumbnail image for the particular related video.
9. The method of claim 5, wherein the thumbnail image is selected based on the one or more keywords.
10. The method of claim 5, wherein accessing, by the computing system, the searchable video index using the one or more keywords to determine one or more related videos further comprises: ranking the one or more related videos among the plurality of videos in a result set based on the keyword association scores between frames of the one or more related videos in the result set and the one or more keywords.
11. One or more non-transitory computer-readable media that collectively store instructions that, when executed by one or more computing devices, cause the one or more computing devices to perform operations, the operations comprising: playing a selected video using a web-based video player; monitoring a current frame of video during playback of the selected video; accessing a video annotation index using the current frame of video to determine one or more keywords associated with the current frame, wherein the video annotation index comprises a mapping between two or more frames of each of a plurality of videos and one or more keyword representations, wherein the mapping is generated based at least in part on a machine-learned model trained to learn correlations between visual content of individual video frames and keyword representations, wherein the video annotation index was generated by: receiving a labeled training dataset comprising a set of media items together with one or more training keywords descriptive of content of the media items; extracting features characterizing the content of the media items; training the machine-learned model to learn correlations between the extracted features of the media items and the training keywords descriptive of the content; and generating the video annotation index mapping frames of videos in a video database to keywords based on features of the videos in the video database and the machine-learned model; accessing an advertising database using the one or more keywords to select an advertisement associated with the one or more keywords; and providing the advertisement for display during playback of the current frame.
12. The one or more non-transitory
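The index-building steps of claim 7, the scene selection of claim 5, and the ranking of claim 10 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the names `build_index`, `select_scene`, and `rank_videos` are hypothetical, scene boundaries are approximated by a score threshold around the peak frame (the claims do not say how boundaries are found), and videos are ranked by their best frame score (one possible reading of claim 10).

```python
from typing import Callable, Dict, List, Tuple

# index[video_id][keyword] -> keyword association score per sampled frame
Index = Dict[str, Dict[str, List[float]]]
Scorer = Callable[[List[float]], float]


def build_index(videos: Dict[str, List[List[float]]],
                scorers: Dict[str, Scorer]) -> Index:
    """Apply each keyword's scorer to every sampled frame's feature vector."""
    return {
        vid: {kw: [score(fv) for fv in frames] for kw, score in scorers.items()}
        for vid, frames in videos.items()
    }


def select_scene(scores: List[float], threshold: float) -> Tuple[int, int]:
    """Return (start, end) frame indices of the scene around the peak frame.

    Assumption: the scene extends outward from the highest-scoring frame
    while neighbouring frames stay at or above `threshold`.
    Requires a non-empty score list.
    """
    peak = max(range(len(scores)), key=scores.__getitem__)
    start = end = peak
    while start > 0 and scores[start - 1] >= threshold:
        start -= 1
    while end < len(scores) - 1 and scores[end + 1] >= threshold:
        end += 1
    return start, end


def rank_videos(index: Index, keyword: str) -> List[str]:
    """Order videos by their best frame score for the keyword (assumption)."""
    return sorted(index, key=lambda vid: max(index[vid][keyword]), reverse=True)


# Toy data (illustrative only): one-dimensional feature vectors.
toy_videos = {"a": [[0.1], [0.9], [0.8], [0.2]], "b": [[0.3], [0.4]]}
toy_scorers = {"dog": lambda fv: fv[0]}
toy_index = build_index(toy_videos, toy_scorers)
```

On the toy data, video `a` ranks ahead of `b` for `"dog"`, and `select_scene(toy_index["a"]["dog"], 0.5)` returns the span covering the second and third sampled frames.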
using original textual content or text extracted from visual content or transcript of audio data · CPC title
using metadata automatically derived from the content · CPC title
a collection of video files or sequences · CPC title
Presentation of query results · CPC title
Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually · CPC title