Selecting and presenting representative frames for video previews
US-10867183-B2 · Dec 15, 2020 · US
US12014542B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12014542-B2 |
| Application number | US-202017120525-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 14, 2020 |
| Priority date | Sep 8, 2014 |
| Publication date | Jun 18, 2024 |
| Grant date | Jun 18, 2024 |
Official abstract text for this publication.
A computer-implemented method for selecting representative frames for videos is provided. The method includes receiving a video and identifying a set of features for each of the frames of the video. The features include frame-based features and semantic features. The semantic features identify likelihoods of semantic concepts being present as content in the frames of the video. A set of video segments for the video is subsequently generated. Each video segment includes a chronological subset of frames from the video, and each frame is associated with at least one of the semantic features. The method generates a score for each frame of the subset of frames for each video segment based at least on the semantic features, and selects a representative frame for each video segment based on the scores of the frames in the video segment. The representative frame represents and summarizes the video segment.
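The per-segment selection described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the scoring rule, so this example assumes a hypothetical one: a frame's score is the sum of its semantic likelihoods at or above a threshold. The names `frame_score` and `select_representatives`, the threshold value, and the sample data are all illustrative assumptions, not taken from the patent.

```python
from typing import Dict, List, Tuple

# A frame is modeled as a mapping: semantic concept -> likelihood in [0, 1].
# In the abstract these likelihoods are the "semantic features"; how they
# are produced is outside this sketch.
Frame = Dict[str, float]


def frame_score(frame: Frame, threshold: float = 0.5) -> float:
    """Assumed scoring rule: sum the likelihoods at or above the threshold."""
    return sum(p for p in frame.values() if p >= threshold)


def select_representatives(
    frames: List[Frame],
    segments: List[Tuple[int, int]],
    threshold: float = 0.5,
) -> List[int]:
    """Return the index of the highest-scoring frame in each segment.

    Each segment is a half-open range (start, end) of frame indices,
    i.e. a chronological subset of the video's frames.
    """
    representatives = []
    for start, end in segments:
        # max() keeps the earliest frame when scores tie.
        best = max(range(start, end), key=lambda i: frame_score(frames[i], threshold))
        representatives.append(best)
    return representatives


# Hypothetical data: five frames split into two segments.
frames = [
    {"dog": 0.2, "beach": 0.6},
    {"dog": 0.9, "beach": 0.7},
    {"dog": 0.1, "beach": 0.3},
    {"car": 0.8},
    {"car": 0.4},
]
segments = [(0, 3), (3, 5)]
representatives = select_representatives(frames, segments)
```

With this sample data, frame 1 is chosen for the first segment and frame 3 for the second, because they carry the strongest above-threshold concept likelihoods in their respective segments.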
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method for selecting representative frames for videos, the method comprising: receiving a search query from a user of a user device; identifying a plurality of semantic features for one or more frames of a video by determining, using a plurality of semantic classifiers, a likelihood of a semantic concept being depicted in a frame of the video and assigning a label corresponding to the semantic concept to the frame of the video based on the likelihood of the semantic concept being depicted in the frame of the video; selecting a plurality of representative frames of the video, wherein each representative frame is selected based on determining that the assigned label is relevant to the received search query; and causing a player interface to be presented, wherein the video is presented within the player interface, wherein the player interface includes a progress bar that indicates a length of the video, wherein a marker is presented on the progress bar that indicates a time in the video at which a representative frame of the plurality of representative frames of the video occurs, wherein the representative frame is selected from among the plurality of representative frames based on relevance to the received search query, and wherein interaction with the marker causes the representative frame to be presented adjacent to the marker within the progress bar along with a textual description of the semantic concept being depicted in the representative frame.

2. The method of claim 1, wherein each representative frame of the plurality of representative frames of the video is selected further based on the assigned label corresponding to a user interest and wherein an interest-based storyboard is generated that combines at least a portion of the plurality of representative frames.

3. The method of claim 2, further comprising determining the user interest based on prior videos viewed by the user.

4. The method of claim 2, further comprising determining the user interest based on user activity on sites that are different than a video hosting service associated with the video.

5. The method of claim 1, wherein each representative frame of the plurality of representative frames of the video is selected based on the assigned label corresponding to a search query entered by a user.

6. The method of claim 1, further comprising: generating a plurality of video segments for the video, wherein each video segment includes a chronological subset of frames from the video, and wherein each frame is associated with at least one of the semantic features; and generating, for each video segment in the plurality of video segments, a score for each frame of the subset of frames of the video segment based at least on the semantic features, wherein each representative frame for each video segment in the plurality of video segments is selected based on the scores for the frames in the video segment, and wherein the representative frame represents and summarizes the video segment.

7. The method of claim 6, wherein the score comprises a semantic score that is generated by: identifying a plurality of semantic concepts for the video segment containing the frame by comparing each semantic feature generated for the chronological subset of frames included in the video segment to a threshold, each semantic concept of the plurality of semantic concepts having the corresponding semantic feature greater than the threshold; for each semantic concept of the plurality of semantic concepts, determining a frame-level score for each frame of the chronological subset of frames in the video segment by determining an amount the semantic concept being present in the frame compared to a reference value; and determining the semantic score for the frame by aggregating the frame-level scores of the frames in the segment.

8. The method of claim 6, wherein generating the score for the frame comprises combining semantic concepts and corresponding likelihood in the frame.

9. The method of claim 6, wherein generating the score for the frame comprises combining a semantic score and an aesthetic score by calculating the semantic score based on the semantic features, calculating the aesthetic score using a set of quality measures, and combining the semantic score and the aesthetic score.

10. The method of claim 1, further comprising generating a segment table for the video, wherein the segment table stores the representative frames of the video and a plurality of semantic concepts associated with each of the representative frames.

11. A computer-implemented system for selecting representative frames for videos, the system comprising: a hardware processor that is configured to: receive a search query from a user of a user device; identify a plurality of semantic features for one or more frames of a video by determining, using a plurality of semantic classifiers, a likelihood of a semantic concept being depicted in a frame of the video and assigning a label corresponding to the semantic concept to the frame of the video based on the likelihood of the semantic concept being depicted in the frame of the video; select a plurality of representative frames of the video, wherein each representative frame is selected based on determining that the assigned label is relevant to the received search query; and cause a player interface to be presented, wherein the video is presented within the player interface, wherein the player interface includes a progress bar that indicates a length of the video, wherein a marker is presented on the progress bar that indicates a time in the video at which a representative frame of the plurality of representative frames of the video occurs, wherein the representative frame is selected from among the plurality of representative frames based on relevance to the received search query, and wherein interaction with the marker causes the representative frame to be presented adjacent to the marker within the progress bar along with a textual description of the semantic concept being depicted in the representative frame.

12. The system of claim 11, wherein each representative frame of the plurality of representative frames of the video is selected further based on the assigned label corresponding to a user interest and wherein an interest-based storyboard is generated that combines at least a portion of the plurality of representative frames.

13. The system of claim 12, wherein the hardware processor is further configured to determine the user interest based on prior videos viewed by the user.

14. The system of claim 12, wherein the hardware processor is further configured to determine the user interest based on user activity on sites that are different than a video hosting service associated with the video.

15. The system of claim 11, wherein each representative frame of the plurality of representative frames of the video is selected based on the assigned label corresponding to a search query entered by a user.

16. The system of claim 11, wherein the hardware processor is further configured to: generate a plurality of video segments for the video, wherein each video segment includes a chronological subset of frames from the video, and wherein each frame is associated with at least one of the semantic features; and generate, for each video segment in the plurality of video segments, a score for each frame of the subset of frames of the video segment based at least on the semantic features, wherein each representative frame for each video segment in the plurality of video segments is s
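
The label assignment and query-relevant marker placement recited in claims 1 and 11 might be sketched as follows. This is only an illustration under stated assumptions: classifier outputs are given as precomputed likelihoods, a label is the single most likely concept at or above a hypothetical minimum likelihood, relevance is a simple case-insensitive word match against the query, and a marker's position is the frame index divided by the frame count (which assumes a constant frame rate). The names `Marker`, `assign_label`, and `markers_for_query` are invented for this sketch and do not appear in the patent.

```python
from typing import Dict, List, NamedTuple, Optional


class Marker(NamedTuple):
    frame_index: int   # which frame the marker points at
    position: float    # fraction of the progress bar, 0.0 to 1.0
    label: str         # textual description shown with the frame


def assign_label(likelihoods: Dict[str, float], min_likelihood: float = 0.5) -> Optional[str]:
    """Assumed rule: label a frame with its most likely concept, if likely enough."""
    if not likelihoods:
        return None
    concept, p = max(likelihoods.items(), key=lambda kv: kv[1])
    return concept if p >= min_likelihood else None


def markers_for_query(
    query: str,
    frame_likelihoods: List[Dict[str, float]],
    min_likelihood: float = 0.5,
) -> List[Marker]:
    """Build progress-bar markers for frames whose label matches a query word."""
    terms = set(query.lower().split())
    total = len(frame_likelihoods)
    markers = []
    for i, likelihoods in enumerate(frame_likelihoods):
        label = assign_label(likelihoods, min_likelihood)
        # Assumed relevance test: the label equals one of the query words.
        if label is not None and label.lower() in terms:
            markers.append(Marker(i, i / total, label))
    return markers


# Hypothetical classifier outputs for a four-frame video.
frame_likelihoods = [
    {"dog": 0.9},
    {"cat": 0.8},
    {"dog": 0.3},
    {"dog": 0.7, "beach": 0.6},
]
markers = markers_for_query("Dog videos", frame_likelihoods)
```

Here frames 0 and 3 receive markers for the query "Dog videos"; frame 2 is skipped because its "dog" likelihood falls below the assumed minimum. A real player interface would map `position` to pixel coordinates on the progress bar and show the frame and `label` on interaction with the marker.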
Recognition assisted with metadata · CPC title
Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes · CPC title
Detecting features for summarising video content · CPC title
Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames · CPC title
using audio features · CPC title