Generating video segments based on video metadata
US-11120490-B1 · Sep 14, 2021 · US
US11893794B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11893794-B2 |
| Application number | US-202217805080-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 2, 2022 |
| Priority date | Sep 10, 2020 |
| Publication date | Feb 6, 2024 |
| Grant date | Feb 6, 2024 |
Official abstract text for this publication.
Embodiments are directed to segmentation and hierarchical clustering of video. In an example implementation, a video is ingested to generate a multi-level hierarchical segmentation of the video. In some embodiments, the finest level identifies a smallest interaction unit of the video—semantically defined video segments of unequal duration called clip atoms. Clip atom boundaries are detected in various ways. For example, speech boundaries are detected from audio of the video, and scene boundaries are detected from video frames of the video. The detected boundaries are used to define the clip atoms, which are hierarchically clustered to form a multi-level hierarchical representation of the video. In some cases, the hierarchical segmentation identifies a static, pre-computed, hierarchical set of video segments, where each level of the hierarchical segmentation identifies a complete set (i.e., covering the entire range of the video) of disjoint (i.e., non-overlapping) video segments with a corresponding level of granularity.
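The pipeline described in the abstract (detect boundaries, cut the timeline into clip atoms, then cluster the atoms into coarser levels) can be illustrated with a minimal sketch. The abstract does not state the clustering criterion, so the merge rule below (join the adjacent pair with the shortest combined duration) is an assumption made only for illustration, and the function names `clip_atoms`, `coarsen`, `build_hierarchy`, and `is_complete_and_disjoint` are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def clip_atoms(duration: float, speech: List[float], scene: List[float]) -> List[Segment]:
    """Cut the timeline [0, duration] at every detected speech or scene boundary."""
    cuts = sorted({t for t in speech + scene if 0.0 < t < duration})
    points = [0.0] + cuts + [duration]
    return list(zip(points[:-1], points[1:]))


def coarsen(segments: List[Segment]) -> List[Segment]:
    """Merge the adjacent pair with the smallest combined duration.

    Assumed heuristic; the abstract does not specify the merge criterion.
    """
    i = min(range(len(segments) - 1), key=lambda k: segments[k + 1][1] - segments[k][0])
    merged = (segments[i][0], segments[i + 1][1])
    return segments[:i] + [merged] + segments[i + 2:]


def build_hierarchy(atoms: List[Segment]) -> List[List[Segment]]:
    """Return levels ordered from finest (clip atoms) to coarsest (one segment)."""
    levels = [atoms]
    while len(levels[-1]) > 1:
        levels.append(coarsen(levels[-1]))
    return levels


def is_complete_and_disjoint(level: List[Segment], duration: float) -> bool:
    """Check that a level covers the whole video with non-overlapping segments."""
    if level[0][0] != 0.0 or level[-1][1] != duration:
        return False
    return all(a[1] == b[0] for a, b in zip(level, level[1:]))


if __name__ == "__main__":
    atoms = clip_atoms(10.0, speech=[2.0, 5.0], scene=[5.0, 8.0])
    for depth, level in enumerate(build_hierarchy(atoms)):
        print(depth, level, is_complete_and_disjoint(level, 10.0))
```

Because each coarser level is produced only by merging neighbours of the level below it, every level stays complete and disjoint by construction, which matches the property the abstract describes for the pre-computed hierarchy.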
Opening claim text (preview).
What is claimed is:
1. A method comprising: extracting, from a software usage log associated with a video, event boundaries of log events associated with screen capturing, screencasting, or livestreaming the video; generating a representation of a hierarchical segmentation of a video timeline of the video using the event boundaries extracted from the software usage log; and providing at least one level of the hierarchical segmentation of the video timeline for presentation.
2. The method of claim 1, the software usage log generated by creative software during the screen capturing or the screencasting of the video.
3. The method of claim 1, wherein the video is a tutorial for creative software.
4. The method of claim 1, the software usage log generated by a video game during screencasting of the video game.
5. The method of claim 1, wherein the event boundaries are extracted from a log of visual events detected from video frames of the video.
6. The method of claim 1, wherein the software usage log represents interactions between one or more users viewing the video and the video.
7. The method of claim 1, wherein the event boundaries are extracted from a chat stream of chat messages associated with the livestream of the video.
8. The method of claim 1, wherein generating the representation of the hierarchical segmentation of the video timeline comprises forming a level of the hierarchical segmentation by computing an optimal segmentation of the video timeline using a cost function that quantifies cut cost for a candidate segmentation based on different types of cut costs for different types of boundaries in the candidate segmentation.
9. One or more non-transitory computer-readable storage media containing instructions which, when executed by one or more processors, cause the one or more processors to perform operations comprising: extracting, from a software usage log associated with a video, event boundaries of log events associated with screen capturing, screencasting, or livestreaming the video; generating a representation of a hierarchical segmentation of a video timeline of the video using the event boundaries extracted from the software usage log; and providing at least one level of the hierarchical segmentation of the video timeline for presentation.
10. The one or more non-transitory computer-readable storage media of claim 9, the software usage log generated by creative software during the screen capturing or the screencasting of the video.
11. The one or more non-transitory computer-readable storage media of claim 9, wherein the video is a tutorial for creative software.
12. The one or more non-transitory computer-readable storage media of claim 9, the software usage log generated by a video game during the screencasting of the video game.
13. The one or more non-transitory computer-readable storage media of claim 9, wherein the event boundaries are extracted from a log of visual events detected from video frames of the video.
14. The one or more non-transitory computer-readable storage media of claim 9, wherein the software usage log represents interactions between one or more users viewing the video and the video.
15. The one or more non-transitory computer-readable storage media of claim 9, wherein the event boundaries are extracted from a chat stream of chat messages associated with the livestream of the video.
16. The one or more non-transitory computer-readable storage media of claim 9, wherein generating the representation of the hierarchical segmentation of the video timeline comprises forming a level of the hierarchical segmentation by computing an optimal segmentation of the video timeline using a cost function that quantifies cut cost for a candidate segmentation based on different types of cut costs for different types of boundaries in the candidate segmentation.
17. A computing system, comprising: one or more processors; and one or more non-transitory computer-readable storage media containing instructions which, when executed by the one or more processors, cause the one or more processors to perform operations comprising: extracting, from a software usage log associated with a video, event boundaries of log events associated with screen capturing, screencasting, or livestreaming the video; generating a representation of a hierarchical segmentation of a video timeline of the video using the event boundaries extracted from the software usage log; and providing at least one level of the hierarchical segmentation of the video timeline for presentation.
18. The computing system of claim 17, the software usage log generated by creative software during the screen capturing or the screencasting of the video.
19. The computing system of claim 17, wherein the event boundaries are extracted from a log of visual events detected from video frames of the video.
20. The computing system of claim 17, wherein the software usage log represents interactions between one or more users viewing the video and the video.
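The claims combine two steps that a short sketch may make easier to picture: extracting typed event boundaries from a software usage log, and forming one level of the segmentation by minimizing a cost function in which different boundary types carry different cut costs. Everything specific below is an assumption for illustration only: the log format (dicts with `t` and `event` keys), the event names, the `CUT_COST` values, the squared length-deviation penalty, and the dynamic-programming search are not taken from the claims, which do not define the cost function's form.

```python
from typing import Dict, List, Tuple

# Hypothetical cut costs per boundary type: a lower cost marks a more natural cut point.
CUT_COST: Dict[str, float] = {"tool_change": 0.2, "save": 0.5, "other": 3.0}

Boundary = Tuple[float, str]  # (timestamp in seconds, boundary type)


def extract_boundaries(log: List[dict]) -> List[Boundary]:
    """Turn log events into (timestamp, type) pairs sorted by time.

    Event names without a configured cost are mapped to the "other" type.
    """
    return sorted(
        (float(e["t"]), e["event"] if e["event"] in CUT_COST else "other") for e in log
    )


def optimal_segmentation(duration: float, boundaries: List[Boundary], target_len: float) -> List[float]:
    """Pick the cut times that minimize total cost via dynamic programming.

    Assumed cost: sum of per-type cut costs for the chosen boundaries, plus a
    squared relative deviation of each segment length from target_len.
    """
    inner = [(t, kind) for t, kind in boundaries if 0.0 < t < duration]
    times = [0.0] + [t for t, _ in inner] + [duration]
    costs = [0.0] + [CUT_COST[kind] for _, kind in inner] + [0.0]
    n = len(times)
    best = [float("inf")] * n  # best[j]: minimal cost of segmenting [0, times[j]]
    prev = [-1] * n            # prev[j]: index of the cut preceding times[j]
    best[0] = 0.0
    for j in range(1, n):
        for i in range(j):
            length_penalty = ((times[j] - times[i] - target_len) / target_len) ** 2
            candidate = best[i] + costs[j] + length_penalty
            if candidate < best[j]:
                best[j], prev[j] = candidate, i
    # Walk back from the end of the video to recover the chosen interior cuts.
    cuts = []
    j = prev[n - 1]
    while j > 0:
        cuts.append(times[j])
        j = prev[j]
    return sorted(cuts)


if __name__ == "__main__":
    log = [
        {"t": 10, "event": "tool_change"},
        {"t": 12, "event": "mouse_move"},
        {"t": 20, "event": "save"},
    ]
    print(optimal_segmentation(30.0, extract_boundaries(log), target_len=10.0))
```

Under these assumed costs, raising `target_len` yields fewer, longer segments, so running the same search with several target lengths is one plausible way to obtain multiple levels of granularity.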
Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes · CPC title
Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram · CPC title
Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items (segmenting video sequences G06V20/49) · CPC title
Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames · CPC title
Detection of presence or absence of voice signals (switching of direction of transmission by voice frequency in two-way loud-speaking telephone systems H04M9/10) · CPC title