Entity based temporal segmentation of video streams

US9607224B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9607224-B2
Application number: US-201514712071-A
Country: US
Kind code: B2
Filing date: May 14, 2015
Priority date: May 14, 2015
Publication date: Mar 28, 2017
Grant date: Mar 28, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A solution is provided for temporally segmenting a video based on analysis of entities identified in the video frames of the video. The video is decoded into multiple video frames and multiple video frames are selected for annotation. The annotation process identifies entities present in a sample video frame and each identified entity has a timestamp and confidence score indicating the likelihood that the entity is accurately identified. For each identified entity, a time series comprising of timestamps and corresponding confidence scores is generated and smoothed to reduce annotation noise. One or more segments containing an entity over the length of the video are obtained by detecting boundaries of the segments in the time series of the entity. From the individual temporal segmentation for each identified entity in the video, an overall temporal segmentation for the video is generated, where the overall temporal segmentation reflects the semantics of the video.
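
The time-series step described in the abstract can be sketched as follows. This is a minimal illustration, not the patent's implementation: the `Annotation` record, the `build_time_series` helper, the use of seconds for timestamps, and the choice to assign a confidence of 0.0 to sampled frames where an entity was not detected are all assumptions made for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class Annotation(NamedTuple):
    """One entity detected in one sampled frame (hypothetical record layout)."""
    timestamp: float   # position of the sampled frame, assumed to be in seconds
    entity: str        # descriptive label, e.g. "dog"
    confidence: float  # likelihood that the entity is correctly identified


def build_time_series(annotations, sample_timestamps):
    """Group annotations by entity into {entity: [(timestamp, confidence), ...]}.

    Assumption: a sampled frame with no detection of an entity contributes
    a confidence of 0.0 for that entity, so every series has the same length.
    """
    by_entity = defaultdict(dict)
    for a in annotations:
        # If an entity is reported twice for one frame, keep the higher score.
        previous = by_entity[a.entity].get(a.timestamp, 0.0)
        by_entity[a.entity][a.timestamp] = max(previous, a.confidence)

    ordered = sorted(sample_timestamps)
    return {
        entity: [(t, scores.get(t, 0.0)) for t in ordered]
        for entity, scores in by_entity.items()
    }


annotations = [
    Annotation(0.0, "dog", 0.9),
    Annotation(1.0, "dog", 0.8),
    Annotation(1.0, "cat", 0.4),
]
series = build_time_series(annotations, [0.0, 1.0, 2.0])
```

Each entity ends up with one series over the sampled frames, which is the input that the smoothing and boundary-detection steps operate on.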

First claim

Opening claim text (preview).

What is claimed is:

1. A method for temporally segmenting a video, the method comprising: selecting sample video frames from a plurality of decoded video frames of the video; training an annotation model on a corpus of training images with a neural network model; annotating each of the selected sample video frames with the trained annotation model, wherein annotating a selected sample video frame comprises: applying the trained annotation model to each selected sample video frame; identifying one or more entities present in the selected sample video frame based on the application of the trained annotation model, an identified entity of the video representing an object of interest in the selected sample video frame; representing each identified entity by a set of annotation parameters; segmenting the selected sample video frames into a plurality of segments for each entity of the video based on the annotation of the selected sample video frames, a segment for an entity of the video representing a semantically meaningful spatial-temporal region of the video; and generating an overall temporal segmentation of the video based on the plurality of segments of each entity of the video.

2. The method of claim 1, wherein the set of annotation parameters for an entity in the selected sample video frame includes a descriptive label describing the semantics of the entity, a portion of the selected sample video frame containing the entity and a confidence score indicating likelihood that the entity is accurately identified.

3. The method of claim 1, wherein segmenting the selected sample video frames into a plurality of segments for each entity of the video based on the annotation of the selected sample video frames comprises: for each entity of the video: generating a time series for the entity, the time series comprising a plurality of timestamps of the selected sample video frames containing the entity and corresponding confidence scores of the entity; applying a smoothing function to the generated time series of the entity; and identifying boundaries for each segment containing the entity based on the confidence scores of the smoothed time series of the entity.

4. The method of claim 3, wherein applying the smoothing function to the generated time series of the entity comprises: applying a moving window to the time series of the entity, the moving window being defined by a size and a step, and the moving window selecting a plurality of confidences scores of timestamps that are within the moving window; and computing an average confidence score of the confidence scores selected by the moving window.

5. The method of claim 3, wherein identifying boundaries for each segment containing the entity comprises: selecting an onset threshold value for the segment, the onset threshold value indicating the start of the segment; selecting an offset threshold value for the segment, the offset threshold value indicating the end of the segment; comparing the confidence scores of the smoothed time series of the entity with the onset threshold value and the offset threshold value; and identifying the boundaries of the segment based on the comparison of the confidence scores of the smoothed time series of the entity.

6. A non-transitory computer readable storage medium storing executable computer program instructions for temporally segmenting a video, the computer program instructions comprising instructions that when executed cause a computer processor to: select sample video frames from a plurality of decoded video frames of the video; train an annotation model on a corpus of training images with a neural network model; annotate each of the sample video frame with the trained annotation model, wherein to annotate a selected sample video frame comprises: apply the trained annotation model to each selected sample video frame; identify one or more entities present in the selected sample video frame based on the application of the trained annotation model, an identified entity of the video representing an object of interest in the selected sample video frame; represent each identified entity by a set of annotation parameters; segment the selected sample video frames into a plurality of segments for each entity of the video based on the annotation of the selected sample video frames, a segment for an entity of the video representing a semantically meaningful spatial-temporal region of the video; and generate an overall temporal segmentation of the video based on the plurality of segments of each entity of the video.

7. The computer readable medium of claim 6, wherein the set of annotation parameters for an entity in the selected sample video frame includes a descriptive label describing the semantics of the entity, a portion of the selected sample video frame containing the entity and a confidence score indicating likelihood that the entity is accurately identified.

8. The computer readable medium of claim 1, wherein the computer program instructions for segmenting the selected sample video frames into a plurality of segments for each entity of the video based on the annotation of the selected sample video frames comprise instructions that when executed cause the computer processor to: for each entity of the video: generate a time series for the entity, the time series comprising a plurality of timestamps of the selected sample video frames containing the entity and corresponding confidence scores of the entity; apply a smoothing function to the generated time series of the entity; and identify boundaries for each segment containing the entity based on the confidence scores of the smoothed time series of the entity.

9. The computer readable medium of claim 8, wherein the computer program instructions for applying the smoothing function to the generated time series of the entity comprise instructions that when executed cause the computer processor to: apply a moving window to the time series of the entity, the moving window being defined by a size and a step, and the moving window selecting a plurality of confidences scores of timestamps that are within the moving window; and compute an average confidence score of the confidence scores selected by the moving window.

10. The computer readable medium of claim 9, wherein the computer program instructions for identifying boundaries for each segment containing the entity comprise instructions that when executed cause the computer processor to: select an onset threshold value for the segment, the onset threshold value indicating the start of the segment; select an offset threshold value for the segment, the offset threshold value indicating the end of the segment; compare the confidence scores of the smoothed time series of the entity with the onset threshold value and the offset threshold value; and identify the boundaries of the segment based on the comparison of the confidence scores of the smoothed time series of the entity.

11. A computer system for temporally segmenting a video, the system comprising: a computer processor to perform steps, comprising: selecting sample video frames from a plurality of decoded video frames of the video; training an annotation model on a corpus of training images with a neural network model; annotating each of the sample video frame with the trained annotation model, wherein annotating a selected sample video frame comprises: applying the trained annotation model to each selected sample video frame; identifying one or more entities present in the selected sample video frame based on the application of the trained annotation model, an identified entity of the video representing an
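
The smoothing and boundary-detection steps recited in claims 4 and 5 can be sketched as follows. The claim text only states that a moving window has a size and a step, that confidence scores inside the window are averaged, and that smoothed scores are compared against an onset and an offset threshold. Everything else here is an assumption made for illustration: the function names `smooth` and `detect_segments`, labelling each window by its start time, and the rule that a segment opens when the score reaches the onset threshold and closes when it falls below the offset threshold.

```python
def smooth(series, size, step):
    """Moving-window average over a list of (timestamp, confidence) pairs.

    `series` must be sorted by timestamp; `size` and `step` use the same
    units as the timestamps. Each output point is labelled with the window
    start time (an assumption; the claim does not say how windows are labelled).
    """
    if not series:
        return []
    first, last = series[0][0], series[-1][0]
    smoothed = []
    k = 0
    while True:
        start = first + k * step  # multiply rather than accumulate to limit drift
        if start > last:
            break
        window = [c for t, c in series if start <= t < start + size]
        if window:
            smoothed.append((start, sum(window) / len(window)))
        k += 1
    return smoothed


def detect_segments(smoothed, onset, offset):
    """Return (start, end) pairs using two thresholds.

    Assumed rule: a segment opens at the first point whose score is at least
    `onset` and closes at the first later point whose score is below `offset`.
    """
    segments = []
    seg_start = None
    last_t = None
    for t, score in smoothed:
        if seg_start is None:
            if score >= onset:
                seg_start = t
        elif score < offset:
            segments.append((seg_start, t))
            seg_start = None
        last_t = t
    if seg_start is not None:
        # The entity is still present at the end of the series.
        segments.append((seg_start, last_t))
    return segments


raw = [(0, 0.0), (1, 0.0), (2, 1.0), (3, 1.0), (4, 1.0), (5, 0.0), (6, 0.0)]
smoothed = smooth(raw, size=2, step=1)
segments = detect_segments(smoothed, onset=0.6, offset=0.4)
```

Using a lower offset threshold than onset threshold keeps a segment open through brief dips in confidence, which is one plausible reason for having two thresholds rather than one.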

Assignees

Google Inc

Inventors

Classifications

  • H04N5/91 (Primary)

    Television signal processing therefor · CPC title

  • using classification, e.g. of video objects · CPC title

  • G06V20/49 (Primary)

    Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes · CPC title

  • based on the proximity to a decision surface, e.g. support vector machines · CPC title

  • based on distances to training or reference patterns · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9607224B2 cover?
It covers temporally segmenting a video based on the entities identified in its frames. Sample frames are annotated with entities, each carrying a timestamp and a confidence score; a smoothed confidence time series is built for each entity and split into segments at detected boundaries; and the per-entity segments are combined into an overall temporal segmentation of the video.
Who is the assignee on this patent?
Google Inc
What technology area does this patent fall under?
Primary CPC classification H04N5/91. Mapped technology areas include Electricity.
When was this patent published?
Publication date: Mar 28, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).