Generating video summary

US12010405B2 · US · B2

Patent metadata
Publication number: US-12010405-B2
Application number: US-202117457506-A
Country: US
Kind code: B2
Filing date: Dec 3, 2021
Priority date: Dec 3, 2021
Publication date: Jun 11, 2024
Grant date: Jun 11, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method includes receiving a viewer request for playing a video summary of a video, wherein the viewer request includes a length of the video summary, generating the video summary of the viewer-requested length comprising a set of frames selected from the video based on audience reviews of the video, and playing a video stream of the video summary.
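As a rough illustration of how a viewer-requested length could bound the summary, the sketch below keeps the best-ranked candidate frames that fit a frame budget. The `RankedFrame` type, the `select_frames` function, the `fps` parameter, and the choice to replay frames in chronological order are all assumptions made for this example; the abstract does not specify any of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RankedFrame:
    """A candidate frame; both field names are hypothetical."""
    index: int  # position of the frame in the source video
    rank: int   # 0 is the best-ranked candidate


def select_frames(candidates, requested_seconds, fps):
    """Pick the best-ranked frames that fit the requested summary length."""
    # Number of frames the summary may hold (assumes a fixed playback rate).
    budget = int(requested_seconds * fps)
    best = sorted(candidates, key=lambda f: f.rank)[:budget]
    # Assumption: the chosen frames are replayed in their original order.
    return sorted(f.index for f in best)


candidates = [
    RankedFrame(index=i, rank=r)
    for i, r in [(10, 3), (250, 0), (40, 1), (900, 4), (41, 2)]
]
summary = select_frames(candidates, requested_seconds=1.5, fps=2)
print(summary)  # [40, 41, 250]
```

A 1.5-second request at 2 frames per second leaves room for three frames, so the two lowest-ranked candidates are dropped.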

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: receiving, by one or more processing units, a viewer request for playing a video summary of a video, wherein the viewer request includes a length of the video summary; and in response to receiving the viewer request: obtaining, by one or more processing units, audience reviews on the video; extracting images from the audience reviews, the images being derived from the video; extracting, by one or more processing units, image features of the images to generate image feature vectors, each of the image feature vectors corresponding to a respective one of the images; grouping, by one or more processing units, the image feature vectors into clusters, the clusters being ranked based on their sizes and each having respective center point image feature vectors; designating, by one or more processing units, one or more images corresponding to respective center point image feature vectors of one or more top-ranked clusters of the clusters as the one or more representative images; determining, by one or more processing units, weights of importance of the one or more representative images based on the sizes of the one or more top-ranked clusters; extracting, by one or more processing units, textual messages from the audience reviews; extracting, by one or more processing units, text features of the textual messages to generate text feature vectors, each of the text feature vectors corresponding to a respective one of the textual messages; grouping, by one or more processing units, the text feature vectors into clusters each having respective center point text feature vectors that represent different topics, the topics each having respective weights of interest measured by sizes of respective clusters of the text feature vectors; associating, by one or more processing units, each of the one or more images with one of the topics to which the representative image is most relevant; adjusting, by one or more processing units, the weights of importance of the one or more representative images by using the weights of interest of respective associated topics; identifying, by one or more processing units, candidate frames of the video that are similar to the one or more representative images, wherein the candidate frames are ranked in order of their confidence levels of similarity to the one or more representative images and weights of importance of the one or more representative images; selecting, by one or more processing units, the set of frames based on the ranked candidate frames; generating, by one or more processing units, the video summary of the viewer-requested length comprising the set of frames; and playing, by one or more processing units, a video stream of the video summary.

2. The method of claim 1, further comprising: generating, by one or more processing units, the video stream in response to receiving the viewer request; wherein the set of frames selected is based solely on audience reviews of the video.

3. The method of claim 1, wherein the identifying candidate frames of the video comprises: comparing, by one or more processing units, frames of the video with the one or more representative images to calculate their confidence levels of similarity to respective representative images; and determining, by one or more processing units, the frames that have confidence levels greater than a threshold as the candidate frames, wherein the candidate frames are divided into tiers each associated with respective representative images, the tiers being ranked in order of the weights of importance of their associated representative images, and wherein the candidate frames are further ranked in order of their confidence levels of similarity.

4. The method of claim 3, wherein the identifying candidate frames of the video comprises: determining, by one or more processing units, frames that are adjacent to the candidate frames by an amount of time, wherein the frames have confidence levels lesser than the threshold; and including the frames as candidate frames.

5. A system comprising: one or more processors; a memory coupled to the one or more processors; and a set of computer program instructions stored in the memory and executed by the one or more processors to implement a method comprising: receiving a viewer request for playing a video summary of a video, wherein the viewer request includes a length of the video summary; and in response to receiving the viewer request: obtaining the audience reviews on the video; extracting images from the audience reviews, the images being derived from the video; extracting image features of the images to generate image feature vectors, each of the image feature vectors corresponding to a respective one of the images; grouping the image feature vectors into clusters, the clusters being ranked based on their sizes and each having respective center point image feature vectors; designating one or more images corresponding to respective center point image feature vectors of one or more top-ranked clusters of the clusters as the one or more representative images; determining weights of importance of the one or more representative images based on the sizes of the one or more top-ranked clusters; extracting textual messages from the audience reviews; extracting text features of the textual messages to generate text feature vectors, each of the text feature vectors corresponding to a respective one of the textual messages; grouping the text feature vectors into clusters each having respective center point text feature vectors that represent different topics, the topics each having respective weights of interest measured by sizes of respective clusters of the text feature vectors; associating each of the one or more images with one of the topics to which the representative image is most relevant; adjusting the weights of importance of the one or more representative images by using the weights of interest of respective associated topics; identifying candidate frames of the video that are similar to the one or more representative images, wherein the candidate frames are ranked in order of their confidence levels of similarity to the one or more representative images and weights of importance of the one or more representative images; selecting the set of frames based on the ranked candidate frames; generating the video summary of the viewer-requested length comprising the set of frames; and playing a video stream of the video summary.

6. The system of claim 5, the method further comprising: generating, by one or more processing units, the video stream in response to receiving the viewer request; wherein the set of frames selected is based solely on audience reviews of the video.

7. The system of claim 5, wherein the identifying candidate frames of the video comprises: comparing frames of the video with the one or more representative images to calculate their confidence levels of similarity to respective representative images; and determining the frames that have confidence levels greater than a threshold as the candidate frames, wherein the candidate frames are divided into tiers each associated with respective representative images, the tiers being ranked in order of the weights of importance of their associated representative images, and wherein the candidate frames are further ranked in order of their confidence levels of similarity.

8. The system of claim 7, wherein the identifying candidate frames of the video comprises: determining, by one or more processing units, frames that are adjacent to the candidate frames by an amount of time, wherein the frames have confidence levels lesser than the threshold; and
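The weighting and ranking steps recited in claims 1 and 3 can be sketched roughly as follows. This is only an illustration under stated assumptions: clustering is taken as already done (the inputs are cluster labels), cosine similarity stands in for the "confidence level of similarity", and the topic adjustment is modeled as a simple multiplication. The claims do not prescribe any of these choices, and all function and variable names are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity, an assumed stand-in for the confidence level."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def representative_weights(image_labels, image_topic, topic_labels, top_k):
    """Weight the top_k image clusters by size, then scale by topic interest."""
    image_sizes = Counter(image_labels)
    topic_sizes = Counter(topic_labels)
    weights = {}
    for cluster, size in image_sizes.most_common(top_k):
        importance = size / len(image_labels)  # share of review images
        interest = topic_sizes[image_topic[cluster]] / len(topic_labels)
        weights[cluster] = importance * interest  # assumed adjustment rule
    return weights


def rank_candidates(frames, representatives, weights, threshold):
    """Return (frame_index, cluster, confidence), tiered by weight then confidence."""
    candidates = []
    for idx, vec in enumerate(frames):
        for cluster, rep in representatives.items():
            conf = cosine(vec, rep)
            if conf > threshold:  # keep only frames above the threshold
                candidates.append((idx, cluster, conf))
    # Tiers follow the representative's weight; within a tier, higher confidence first.
    candidates.sort(key=lambda c: (-weights[c[1]], -c[2]))
    return candidates


# Toy data: six review images in clusters a, b, c; four review texts in two topics.
image_labels = ["a", "a", "a", "b", "b", "c"]
topic_labels = ["plot", "music", "music", "music"]
image_topic = {"a": "plot", "b": "music"}  # topic each representative relates to
weights = representative_weights(image_labels, image_topic, topic_labels, top_k=2)

representatives = {"a": [1.0, 0.0], "b": [0.0, 1.0]}  # center-point feature vectors
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.1, 1.0]]
ranked = rank_candidates(frames, representatives, weights, threshold=0.9)
print([idx for idx, _, _ in ranked])  # [1, 3, 0]
```

In this toy data cluster "a" is larger than cluster "b", yet the "music" topic draws more review text than "plot", so after adjustment the frames matching "b" form the top tier.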

Assignees

IBM

Inventors

Classifications

  • Detecting features for summarising video content

  • Proximity, similarity or dissimilarity measures

  • Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items (segmenting video sequences G06V20/49)

  • Electronic editing of digitised analogue information signals, e.g. audio or video signals

  • involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams (arrangements characterised by components specially adapted for monitoring, identification or recognition of audio in broadcast systems H04H60/58)

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12010405B2 cover?
A computer-implemented method includes receiving a viewer request for playing a video summary of a video, wherein the viewer request includes a length of the video summary, generating the video summary of the viewer-requested length comprising a set of frames selected from the video based on audience reviews of the video, and playing a video stream of the video summary.
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification H04N21/8549. Mapped technology areas include Electricity.
When was this patent published?
Publication date Jun 11, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).