Identifying representative frames in video content
US-2021390315-A1 · Dec 16, 2021 · US
US12010405B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12010405-B2 |
| Application number | US-202117457506-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 3, 2021 |
| Priority date | Dec 3, 2021 |
| Publication date | Jun 11, 2024 |
| Grant date | Jun 11, 2024 |
Official abstract text for this publication.
A computer-implemented method includes receiving a viewer request for playing a video summary of a video, wherein the viewer request includes a length of the video summary, generating the video summary of the viewer-requested length comprising a set of frames selected from the video based on audience reviews of the video, and playing a video stream of the video summary.
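The final step of the abstract, turning selected frames into a summary of the viewer-requested length, could be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `select_summary_frames`, the fixed frame rate `fps`, and the chronological reordering for playback are all hypothetical choices that the source does not specify.

```python
from typing import List


def select_summary_frames(ranked_frames: List[int],
                          requested_seconds: float,
                          fps: float = 24.0) -> List[int]:
    """Pick frame indices for a summary of the requested length.

    ranked_frames: candidate frame indices, best-ranked first.
    requested_seconds: summary length taken from the viewer request.
    fps: assumed constant frame rate (an assumption of this sketch).
    """
    if requested_seconds <= 0 or fps <= 0:
        return []

    # Number of frames that fit into the requested duration.
    budget = int(requested_seconds * fps)

    chosen: List[int] = []
    seen = set()
    for index in ranked_frames:
        if len(chosen) >= budget:
            break
        if index in seen:
            continue  # skip duplicate candidates
        seen.add(index)
        chosen.append(index)

    # Assumption: play the chosen frames in their original video order.
    return sorted(chosen)
```

With a requested length of one second at three frames per second, the three best-ranked frames are kept and returned in video order. If fewer candidates exist than the budget allows, the sketch simply returns a shorter summary.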
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method comprising:
receiving, by one or more processing units, a viewer request for playing a video summary of a video, wherein the viewer request includes a length of the video summary; and
in response to receiving the viewer request:
obtaining, by one or more processing units, audience reviews on the video;
extracting images from the audience reviews, the images being derived from the video;
extracting, by one or more processing units, image features of the images to generate image feature vectors, each of the image feature vectors corresponding to a respective one of the images;
grouping, by one or more processing units, the image feature vectors into clusters, the clusters being ranked based on their sizes and each having respective center point image feature vectors;
designating, by one or more processing units, one or more images corresponding to respective center point image feature vectors of one or more top-ranked clusters of the clusters as the one or more representative images;
determining, by one or more processing units, weights of importance of the one or more representative images based on the sizes of the one or more top-ranked clusters;
extracting, by one or more processing units, textual messages from the audience reviews;
extracting, by one or more processing units, text features of the textual messages to generate text feature vectors, each of the text feature vectors corresponding to a respective one of the textual messages;
grouping, by one or more processing units, the text feature vectors into clusters each having respective center point text feature vectors that represent different topics, the topics each having respective weights of interest measured by sizes of respective clusters of the text feature vectors;
associating, by one or more processing units, each of the one or more images with one of the topics to which the representative image is most relevant;
adjusting, by one or more processing units, the weights of importance of the one or more representative images by using the weights of interest of respective associated topics;
identifying, by one or more processing units, candidate frames of the video that are similar to the one or more representative images, wherein the candidate frames are ranked in order of their confidence levels of similarity to the one or more representative images and weights of importance of the one or more representative images;
selecting, by one or more processing units, the set of frames based on the ranked candidate frames;
generating, by one or more processing units, the video summary of the viewer-requested length comprising the set of frames; and
playing, by one or more processing units, a video stream of the video summary.

2. The method of claim 1, further comprising:
generating, by one or more processing units, the video stream in response to receiving the viewer request;
wherein the set of frames selected is based solely on audience reviews of the video.

3. The method of claim 1, wherein the identifying candidate frames of the video comprises:
comparing, by one or more processing units, frames of the video with the one or more representative images to calculate their confidence levels of similarity to respective representative images; and
determining, by one or more processing units, the frames that have confidence levels greater than a threshold as the candidate frames, wherein the candidate frames are divided into tiers each associated with respective representative images, the tiers being ranked in order of the weights of importance of their associated representative images, and wherein the candidate frames are further ranked in order of their confidence levels of similarity.

4. The method of claim 3, wherein the identifying candidate frames of the video comprises:
determining, by one or more processing units, frames that are adjacent to the candidate frames by an amount of time, wherein the frames have confidence levels lesser than the threshold; and
including the frames as candidate frames.

5. A system comprising:
one or more processors;
a memory coupled to the one or more processors; and
a set of computer program instructions stored in the memory and executed by the one or more processors to implement a method comprising:
receiving a viewer request for playing a video summary of a video, wherein the viewer request includes a length of the video summary; and
in response to receiving the viewer request:
obtaining the audience reviews on the video;
extracting images from the audience reviews, the images being derived from the video;
extracting image features of the images to generate image feature vectors, each of the image feature vectors corresponding to a respective one of the images;
grouping the image feature vectors into clusters, the clusters being ranked based on their sizes and each having respective center point image feature vectors;
designating one or more images corresponding to respective center point image feature vectors of one or more top-ranked clusters of the clusters as the one or more representative images;
determining weights of importance of the one or more representative images based on the sizes of the one or more top-ranked clusters;
extracting textual messages from the audience reviews;
extracting text features of the textual messages to generate text feature vectors, each of the text feature vectors corresponding to a respective one of the textual messages;
grouping the text feature vectors into clusters each having respective center point text feature vectors that represent different topics, the topics each having respective weights of interest measured by sizes of respective clusters of the text feature vectors;
associating each of the one or more images with one of the topics to which the representative image is most relevant;
adjusting the weights of importance of the one or more representative images by using the weights of interest of respective associated topics;
identifying candidate frames of the video that are similar to the one or more representative images, wherein the candidate frames are ranked in order of their confidence levels of similarity to the one or more representative images and weights of importance of the one or more representative images;
selecting the set of frames based on the ranked candidate frames;
generating the video summary of the viewer-requested length comprising the set of frames; and
playing a video stream of the video summary.

6. The system of claim 5, the method further comprising:
generating, by one or more processing units, the video stream in response to receiving the viewer request;
wherein the set of frames selected is based solely on audience reviews of the video.

7. The system of claim 5, wherein the identifying candidate frames of the video comprises:
comparing frames of the video with the one or more representative images to calculate their confidence levels of similarity to respective representative images; and
determining the frames that have confidence levels greater than a threshold as the candidate frames, wherein the candidate frames are divided into tiers each associated with respective representative images, the tiers being ranked in order of the weights of importance of their associated representative images, and wherein the candidate frames are further ranked in order of their confidence levels of similarity.

8. The system of claim 7, wherein the identifying candidate frames of the video comprises:
determining, by one or more processing units, frames that are adjacent to the candidate frames by an amount of time, wherein the frames have confidence levels lesser than the threshold; and
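The weighting and ranking steps recited in claims 1, 3, and 4 can be illustrated with a small sketch. It assumes clustering and similarity scoring have already been done elsewhere and only shows how cluster sizes might become weights and how candidate frames might be tiered. All function names, the normalisation of sizes to fractions, the multiplicative topic adjustment, and the frame-count `reach` used for adjacency are assumptions of this sketch; the claims do not fix any of these details.

```python
from typing import Dict, List, Tuple


def importance_weights(cluster_sizes: Dict[str, int]) -> Dict[str, float]:
    """Weight each representative image by its image-cluster size.

    Assumption: sizes are normalised so the weights sum to 1.
    """
    total = sum(cluster_sizes.values())
    if total == 0:
        return {image: 0.0 for image in cluster_sizes}
    return {image: size / total for image, size in cluster_sizes.items()}


def adjust_by_topic(weights: Dict[str, float],
                    image_topic: Dict[str, str],
                    topic_sizes: Dict[str, int]) -> Dict[str, float]:
    """Scale each image weight by the interest in its associated topic.

    Assumption: interest is the topic's share of all clustered messages,
    and the adjustment is a simple product.
    """
    total = sum(topic_sizes.values())
    return {
        image: weight * (topic_sizes[image_topic[image]] / total)
        for image, weight in weights.items()
    }


def rank_candidates(similarity: Dict[str, Dict[int, float]],
                    weights: Dict[str, float],
                    threshold: float) -> List[Tuple[str, int, float]]:
    """Build tiers per representative image, ordered by image weight.

    similarity maps image -> {frame index: confidence of similarity}.
    Frames above the threshold form that image's tier; within a tier,
    frames are ordered by confidence, highest first.
    """
    ranked: List[Tuple[str, int, float]] = []
    for image in sorted(weights, key=weights.get, reverse=True):
        tier = [(frame, conf)
                for frame, conf in similarity.get(image, {}).items()
                if conf > threshold]
        tier.sort(key=lambda pair: pair[1], reverse=True)
        ranked.extend((image, frame, conf) for frame, conf in tier)
    return ranked


def add_adjacent(frames: List[int], reach: int, frame_count: int) -> List[int]:
    """Include frames within `reach` frames of each candidate (claim 4 idea).

    Assumption: the "amount of time" is expressed as a frame count.
    """
    keep = set()
    for frame in frames:
        keep.update(range(max(0, frame - reach),
                          min(frame_count, frame + reach + 1)))
    return sorted(keep)
```

In this sketch an image from a large image cluster can still drop in rank if its associated topic attracts few textual messages, which is one plausible reading of adjusting importance by topic interest.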
Detecting features for summarising video content · CPC title
Proximity, similarity or dissimilarity measures · CPC title
Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items (segmenting video sequences G06V20/49) · CPC title
Electronic editing of digitised analogue information signals, e.g. audio or video signals · CPC title
involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams (arrangements characterised by components specially adapted for monitoring, identification or recognition of audio in broadcast systems H04H60/58) · CPC title