Video segmentation techniques

US9805270B2 · US · B2

Patent metadata
Publication number: US-9805270-B2
Application number: US-201615255978-A
Country: US
Kind code: B2
Filing date: Sep 2, 2016
Priority date: Dec 19, 2014
Publication date: Oct 31, 2017
Grant date: Oct 31, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A video segmentation system can be utilized to automate segmentation of digital video content. Features corresponding to visual, audio, and/or textual content of the video can be extracted from frames of the video. The extracted features of adjacent frames are compared according to a similarity measure to determine boundaries of a first set of shots or video segments distinguished by abrupt transitions. The first set of shots is analyzed according to certain heuristics to recognize a second set of shots distinguished by gradual transitions. Key frames can be extracted from the first and second set of shots, and the key frames can be used by the video segmentation system to group the first and second set of shots by scene. Additional processing can be performed to associate metadata, such as names of actors or titles of songs, with the detected scenes.
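
The adjacent-frame comparison described in the abstract (claims 2, 15, and 17 name cosine similarity against a threshold) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names, the 0.8 threshold, and the toy two-dimensional feature vectors are all assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero feature as dissimilar to everything
    return dot / (norm_a * norm_b)


def detect_cuts(features, threshold=0.8):
    """Return frame indices where a new shot starts (abrupt transition).

    `threshold` is a hypothetical value; the source does not state one.
    """
    return [
        i
        for i in range(1, len(features))
        if cosine_similarity(features[i - 1], features[i]) < threshold
    ]


def shots_from_cuts(num_frames, cuts):
    """Turn cut indices into half-open (start, end) frame ranges."""
    starts = [0] + list(cuts)
    ends = list(cuts) + [num_frames]
    return [(s, e) for s, e in zip(starts, ends) if e > s]


# Toy per-frame features: two frames of one shot, then two of another.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
cuts = detect_cuts(frames)
shots = shots_from_cuts(len(frames), cuts)
```

In practice the per-frame feature would be something richer, such as the histograms mentioned in claim 4, but the thresholding step has the same shape.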

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, comprising: determining a feature for a frame of a plurality of frames of a video; analyzing a similarity between the feature and at least one feature associated with adjacent frames to the frame to determine a first shot of the video; determining that the first shot meets a time threshold; determining that a similarity metric between a first frame of the first shot and a second frame of the first shot meets a dissimilarity threshold; determining that a similarity matrix of at least a subset of frames of the first shot corresponds to a dissolve pattern, the subset of frames corresponding to at least one second shot of the video; generating a graph of the video, the graph comprising nodes corresponding to the first shot and the at least one second shot; and determining a grouping of the first shot and the at least one second shot by performing one or more cuts of the graphs.

2. The computer-implemented method of claim 1, wherein analyzing similarity between the respective features for adjacent frames further includes: determining respective cosine similarity between the respective features for the adjacent frames; and comparing the respective cosine similarity between the respective features for the adjacent frames to a similarity threshold.

3. The computer-implemented method of claim 1, wherein determining that the similarity matrix of at least the subset of frames of the first shot corresponds to the dissolve pattern further includes: generating the dissolve pattern; sliding the dissolve pattern along a diagonal of the similarity matrix; and matching the dissolve pattern to at least one portion of the diagonal.

4. The computer-implemented method of claim 1, wherein determining the respective features for each frame further includes: determining a first histogram for the frame; determining a first plurality of histograms for first portions of the frame; and determining a second plurality of histograms for second portions of the frame.

5. The computer-implemented method of claim 1, wherein determining the grouping of the first shot and the at least one second shot includes: obtaining one or more respective key frames for the first shot and the at least one second shot, wherein the nodes of the graph correspond to the respective key frames.

6. The computer-implemented method of claim 5, wherein edges of the graph correspond to a respective cost between the nodes, and wherein the respective cost is based on a function of time and visual similarity.

7. The computer-implemented method of claim 1, further comprising: obtaining one of an audio feature corresponding to the video or a text feature corresponding to the video, wherein the grouping is further based at least in part on one of the audio feature or video feature.

8. The computer-implemented method of claim 1, further comprising: detecting a face in the grouping; determining an identity of the face; and associating the identity with the grouping.

9. The computer-implemented method of claim 1, further comprising: detecting one of textual data corresponding to the grouping or music in the grouping, the music associated with a title; and associating one of the textual data or the title with the grouping.

10. The computer-implemented method of claim 1, further comprising: analyzing visual content of the first shot; and classifying the first shot as one of a dissolve shot, a blank shot, a card credit, a rolling credit, an action shot, or a static shot.

11. A non-transitory computer-readable storage medium comprising instructions that, upon being executed by a processor of a computing device, cause the computing device to: determine a feature for a frame of a plurality of frames of a video; analyze a similarity between the feature and at least one feature associated with adjacent frames to the frame to determine a first shot of the video; determine that the first shot meets a time threshold; determine that a similarity metric between a first frame of the first shot and a second frame of the first shot meets a dissimilarity threshold; determine that a similarity matrix of at least a subset of frames of the first shot corresponds to a dissolve pattern, the subset of frames corresponding to at least one second shot of the video; generate a graph of the video, the graph comprising nodes corresponding to the first shot and the at least one second shot; and determine a grouping of the first shot and the at least one second shot by performing one or more cuts of the graphs.

12. The non-transitory computer-readable storage medium of claim 11, wherein the instructions, upon being executed, further cause the computing device to: associate metadata with the grouping; and enable a user to navigate to the grouping based on the metadata.

13. The non-transitory computer-readable storage medium of claim 12, wherein the metadata corresponds to at least one of an identity of an actor appearing in the at least one grouping, title of music playing in the at least one grouping, a representation of an object in the at least one grouping, a location corresponding to the at least one grouping, or textual data corresponding to the at least one grouping.

14. The non-transitory computer-readable storage medium of claim 11, wherein the instructions, upon being executed to determine that the similarity matrix of at least the subset of frames of the first shot corresponds to the dissolve pattern, further cause the computing device to: generate the dissolve pattern; slide the dissolve pattern along a diagonal of the similarity matrix; and match the dissolve pattern to at least one portion of the diagonal.

15. The non-transitory computer-readable storage medium of claim 11, wherein the instructions, upon being executed to analyze similarity between the respective features for adjacent frames, further cause the computing device to: determine respective cosine similarity between the respective features for the adjacent frames; and compare the respective cosine similarity between the respective features for the adjacent frames to a similarity threshold.

16. A computing device, comprising: a processor; memory including instructions that, upon being executed by the processor, cause the computing device to: determine a feature for a frame of a plurality of frames of a video; analyze a similarity between the feature and at least one feature associated with adjacent frames to the frame to determine a first shot of the video; determine that the first shot meets a time threshold; determine that a similarity metric between a first frame of the first shot and a second frame of the first shot meets a dissimilarity threshold; determine that a similarity matrix of at least a subset of frames of the first shot corresponds to a dissolve pattern, the subset of frames corresponding to at least one second shot of the video; generate a graph of the video, the graph comprising nodes corresponding to the first shot and the at least one second shot; and determine a grouping of the first shot and the at least one second shot by performing one or more cuts of the graphs.

17. The computing device of claim 16, wherein the instructions, upon being executed include causing the computing device to: determine respective cosine similarity between the respective features for the adjacent frames; and compare the respective cosine similarity between the respective features for the adjacent frames to a similarity threshold.

18. The
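
Claims 3 and 14 describe generating a dissolve pattern, sliding it along the diagonal of a frame-to-frame similarity matrix, and matching it to a portion of that diagonal. The sketch below shows one way such a match could look. The linear-ramp pattern, the mean-absolute-error score, and the 0.05 tolerance are assumptions chosen for illustration; the claim text does not specify the pattern's shape or the matching criterion.

```python
def make_dissolve_pattern(n):
    """Idealized n x n pattern: similarity falls off linearly with frame distance.

    Assumption: during a dissolve, frames blend steadily from one shot into
    the next, so similarity between frames i and j shrinks as |i - j| grows.
    """
    if n < 2:
        raise ValueError("pattern needs at least 2 frames")
    return [[1.0 - abs(i - j) / (n - 1) for j in range(n)] for i in range(n)]


def match_dissolves(sim, pattern, tolerance=0.05):
    """Slide `pattern` along the diagonal of `sim`; return matching start frames.

    A window matches when its mean absolute difference from the pattern is
    at most `tolerance` (a hypothetical criterion).
    """
    n = len(pattern)
    hits = []
    for start in range(len(sim) - n + 1):
        error = sum(
            abs(sim[start + i][start + j] - pattern[i][j])
            for i in range(n)
            for j in range(n)
        ) / (n * n)
        if error <= tolerance:
            hits.append(start)
    return hits


# Synthetic shot: scene A (blend 0.0), a 4-frame dissolve, then scene B (1.0).
blend = [0.0, 0.0, 0.0, 0.0, 1 / 3, 2 / 3, 1.0, 1.0, 1.0]
sim = [[1.0 - abs(p - q) for q in blend] for p in blend]

hits = match_dissolves(sim, make_dissolve_pattern(4))
```

Here the only window that fits the ramp is the one starting at frame 3, where the blend runs from 0 to 1 over four frames; windows lying inside a single scene are uniformly similar and do not match.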

Assignees

Amazon Tech Inc

Inventors

Classifications

  • based on graphs, e.g. graph cuts or spectral clustering · CPC title

  • G06V20/49 · Primary

    Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes · CPC title

  • based on graph theory, e.g. minimum spanning trees [MST] or graph cuts · CPC title

  • Video; Image sequence · CPC title

  • Physics · mapped topic
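
The graph-based classifications above correspond to claims 1, 5, and 6: shots (via their key frames) become graph nodes, edge costs depend on time and visual similarity, and scenes are found by cutting the graph. The following sketch is a simplified illustration under stated assumptions. The weight function (cosine similarity multiplied by an exponential time decay with a hypothetical `tau`), the dictionary layout of a shot, and the restriction to a single cut between temporally adjacent shots are all choices made for this example and are not taken from the patent.

```python
import math


def cosine(a, b):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def edge_weight(shot_a, shot_b, tau=30.0):
    """Assumed affinity: visual similarity damped by the time gap in seconds."""
    gap = abs(shot_a["time"] - shot_b["time"])
    return cosine(shot_a["feature"], shot_b["feature"]) * math.exp(-gap / tau)


def best_scene_boundary(shots, tau=30.0):
    """Find the split index k minimizing a normalized-cut cost.

    Only contiguous splits (shots[:k] vs shots[k:]) are considered, which is
    a simplification of a general graph cut.
    """
    n = len(shots)
    if n < 2:
        raise ValueError("need at least two shots")
    w = [
        [edge_weight(shots[i], shots[j], tau) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    degree = [sum(row) for row in w]
    best_k, best_cost = None, math.inf
    for k in range(1, n):
        cut = sum(w[i][j] for i in range(k) for j in range(k, n))
        assoc_a, assoc_b = sum(degree[:k]), sum(degree[k:])
        if assoc_a == 0.0 or assoc_b == 0.0:
            continue  # a side with no connections cannot be scored
        cost = cut / assoc_a + cut / assoc_b
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost


# Six shots, 5 seconds apart: three look alike, then the look changes.
shots = [{"time": 5.0 * i, "feature": [1.0, 0.0]} for i in range(3)]
shots += [{"time": 5.0 * i, "feature": [0.0, 1.0]} for i in range(3, 6)]
boundary, cost = best_scene_boundary(shots)
```

Applying the same split recursively to each side would yield more than two groups; a full implementation would also need a stopping rule, which this sketch leaves out.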

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9805270B2 cover?
A video segmentation system can be utilized to automate segmentation of digital video content. Features corresponding to visual, audio, and/or textual content of the video can be extracted from frames of the video. The extracted features of adjacent frames are compared according to a similarity measure to determine boundaries of a first set of shots or video segments distinguished by abrupt tra…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06V20/49. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 31, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).