System and Method For Processing A Video Stream To Extract Highlights
US-2016012296-A1 · Jan 14, 2016 · US
US9607224B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9607224-B2 |
| Application number | US-201514712071-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 14, 2015 |
| Priority date | May 14, 2015 |
| Publication date | Mar 28, 2017 |
| Grant date | Mar 28, 2017 |
Official abstract text for this publication.
A solution is provided for temporally segmenting a video based on analysis of entities identified in the video frames of the video. The video is decoded into multiple video frames and multiple video frames are selected for annotation. The annotation process identifies entities present in a sample video frame and each identified entity has a timestamp and confidence score indicating the likelihood that the entity is accurately identified. For each identified entity, a time series comprising timestamps and corresponding confidence scores is generated and smoothed to reduce annotation noise. One or more segments containing an entity over the length of the video are obtained by detecting boundaries of the segments in the time series of the entity. From the individual temporal segmentation for each identified entity in the video, an overall temporal segmentation for the video is generated, where the overall temporal segmentation reflects the semantics of the video.
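The first and last steps of the abstract (building a per-entity time series, then combining per-entity segments into one segmentation) can be illustrated with a minimal Python sketch. Everything named here is hypothetical: the `Annotation` record, `build_time_series`, and `overall_segmentation` do not appear in the patent, and ordering segments by start time is only one assumed way to combine them, since the abstract does not say how the overall segmentation is formed.

```python
from collections import defaultdict
from typing import NamedTuple


class Annotation(NamedTuple):
    """One entity detection in one sampled frame (hypothetical record layout)."""
    timestamp: float   # position of the sampled frame in the video, in seconds
    entity: str        # descriptive label, e.g. "dog"
    confidence: float  # likelihood that the entity is accurately identified


def build_time_series(annotations):
    """Group annotations by entity into time-sorted (timestamp, confidence) lists."""
    series = defaultdict(list)
    for a in annotations:
        series[a.entity].append((a.timestamp, a.confidence))
    return {entity: sorted(points) for entity, points in series.items()}


def overall_segmentation(entity_segments):
    """Flatten per-entity (start, end) segments into one list ordered by start time.

    Assumption: the combination rule is not given in the abstract; sorting by
    start time is used here purely for illustration.
    """
    merged = [
        (start, end, entity)
        for entity, segments in entity_segments.items()
        for start, end in segments
    ]
    return sorted(merged)
```

Calling `build_time_series` on the annotations of all sampled frames yields one series per entity; each series would then be smoothed and cut into segments before `overall_segmentation` is applied.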
Opening claim text (preview).
What is claimed is:
1. A method for temporally segmenting a video, the method comprising: selecting sample video frames from a plurality of decoded video frames of the video; training an annotation model on a corpus of training images with a neural network model; annotating each of the selected sample video frames with the trained annotation model, wherein annotating a selected sample video frame comprises: applying the trained annotation model to each selected sample video frame; identifying one or more entities present in the selected sample video frame based on the application of the trained annotation model, an identified entity of the video representing an object of interest in the selected sample video frame; representing each identified entity by a set of annotation parameters; segmenting the selected sample video frames into a plurality of segments for each entity of the video based on the annotation of the selected sample video frames, a segment for an entity of the video representing a semantically meaningful spatial-temporal region of the video; and generating an overall temporal segmentation of the video based on the plurality of segments of each entity of the video.
2. The method of claim 1, wherein the set of annotation parameters for an entity in the selected sample video frame includes a descriptive label describing the semantics of the entity, a portion of the selected sample video frame containing the entity and a confidence score indicating likelihood that the entity is accurately identified.
3. The method of claim 1, wherein segmenting the selected sample video frames into a plurality of segments for each entity of the video based on the annotation of the selected sample video frames comprises: for each entity of the video: generating a time series for the entity, the time series comprising a plurality of timestamps of the selected sample video frames containing the entity and corresponding confidence scores of the entity; applying a smoothing function to the generated time series of the entity; and identifying boundaries for each segment containing the entity based on the confidence scores of the smoothed time series of the entity.
4. The method of claim 3, wherein applying the smoothing function to the generated time series of the entity comprises: applying a moving window to the time series of the entity, the moving window being defined by a size and a step, and the moving window selecting a plurality of confidences scores of timestamps that are within the moving window; and computing an average confidence score of the confidence scores selected by the moving window.
5. The method of claim 3, wherein identifying boundaries for each segment containing the entity comprises: selecting an onset threshold value for the segment, the onset threshold value indicating the start of the segment; selecting an offset threshold value for the segment, the offset threshold value indicating the end of the segment; comparing the confidence scores of the smoothed time series of the entity with the onset threshold value and the offset threshold value; and identifying the boundaries of the segment based on the comparison of the confidence scores of the smoothed time series of the entity.
6. A non-transitory computer readable storage medium storing executable computer program instructions for temporally segmenting a video, the computer program instructions comprising instructions that when executed cause a computer processor to: select sample video frames from a plurality of decoded video frames of the video; train an annotation model on a corpus of training images with a neural network model; annotate each of the sample video frame with the trained annotation model, wherein to annotate a selected sample video frame comprises: apply the trained annotation model to each selected sample video frame; identify one or more entities present in the selected sample video frame based on the application of the trained annotation model, an identified entity of the video representing an object of interest in the selected sample video frame; represent each identified entity by a set of annotation parameters; segment the selected sample video frames into a plurality of segments for each entity of the video based on the annotation of the selected sample video frames, a segment for an entity of the video representing a semantically meaningful spatial-temporal region of the video; and generate an overall temporal segmentation of the video based on the plurality of segments of each entity of the video.
7. The computer readable medium of claim 6, wherein the set of annotation parameters for an entity in the selected sample video frame includes a descriptive label describing the semantics of the entity, a portion of the selected sample video frame containing the entity and a confidence score indicating likelihood that the entity is accurately identified.
8. The computer readable medium of claim 1, wherein the computer program instructions for segmenting the selected sample video frames into a plurality of segments for each entity of the video based on the annotation of the selected sample video frames comprise instructions that when executed cause the computer processor to: for each entity of the video: generate a time series for the entity, the time series comprising a plurality of timestamps of the selected sample video frames containing the entity and corresponding confidence scores of the entity; apply a smoothing function to the generated time series of the entity; and identify boundaries for each segment containing the entity based on the confidence scores of the smoothed time series of the entity.
9. The computer readable medium of claim 8, wherein the computer program instructions for applying the smoothing function to the generated time series of the entity comprise instructions that when executed cause the computer processor to: apply a moving window to the time series of the entity, the moving window being defined by a size and a step, and the moving window selecting a plurality of confidences scores of timestamps that are within the moving window; and compute an average confidence score of the confidence scores selected by the moving window.
10. The computer readable medium of claim 9, wherein the computer program instructions for identifying boundaries for each segment containing the entity comprise instructions that when executed cause the computer processor to: select an onset threshold value for the segment, the onset threshold value indicating the start of the segment; select an offset threshold value for the segment, the offset threshold value indicating the end of the segment; compare the confidence scores of the smoothed time series of the entity with the onset threshold value and the offset threshold value; and identify the boundaries of the segment based on the comparison of the confidence scores of the smoothed time series of the entity.
11. A computer system for temporally segmenting a video, the system comprising: a computer processor to perform steps, comprising: selecting sample video frames from a plurality of decoded video frames of the video; training an annotation model on a corpus of training images with a neural network model; annotating each of the sample video frame with the trained annotation model, wherein annotating a selected sample video frame comprises: applying the trained annotation model to each selected sample video frame; identifying one or more entities present in the selected sample video frame based on the application of the trained annotation model, an identified entity of the video representing an
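The moving-window smoothing of claim 4 and the onset/offset comparison of claim 5 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the function names `smooth` and `find_segments` are hypothetical, each window is labeled by its start time, and the rule "open a segment when the score reaches the onset threshold, close it when the score drops below the offset threshold" is one plausible reading of "comparing the confidence scores ... with the onset threshold value and the offset threshold value".

```python
def smooth(series, size, step):
    """Average confidence scores inside a moving window.

    series: time-sorted list of (timestamp, confidence) for one entity.
    size, step: window length and stride, in the same units as the timestamps.
    Returns (window_start, mean_confidence) for every window holding a sample.
    """
    if not series:
        return []
    smoothed = []
    t = series[0][0]
    last = series[-1][0]
    while t <= last:
        scores = [c for ts, c in series if t <= ts < t + size]
        if scores:
            smoothed.append((t, sum(scores) / len(scores)))
        t += step
    return smoothed


def find_segments(smoothed, onset, offset):
    """Detect (start, end) segments from a smoothed time series.

    Assumed rule: a segment starts when the score is >= onset and ends
    at the first later point whose score is < offset.
    """
    segments = []
    start = None
    for t, score in smoothed:
        if start is None and score >= onset:
            start = t
        elif start is not None and score < offset:
            segments.append((start, t))
            start = None
    if start is not None:
        # The entity is still present when the series ends.
        segments.append((start, smoothed[-1][0]))
    return segments
```

Choosing an offset threshold lower than the onset threshold gives hysteresis, so a brief dip in confidence inside a segment does not split it in two.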
Television signal processing therefor · CPC title
using classification, e.g. of video objects · CPC title
Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes · CPC title
based on the proximity to a decision surface, e.g. support vector machines · CPC title
based on distances to training or reference patterns · CPC title