Voice-Based Video Tagging

US2016055885A1 · US · A1

Patent metadata
Publication number: US-2016055885-A1
Application number: US-201414530245-A
Country: US
Kind code: A1
Filing date: Oct 31, 2014
Priority date: Jul 23, 2014
Publication date: Feb 25, 2016
Grant date: (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Video and corresponding metadata is accessed. Events of interest within the video are identified based on the corresponding metadata, and best scenes are identified based on the identified events of interest. A video summary can be generated including one or more of the identified best scenes. The video summary can be generated using a video summary template with slots corresponding to video clips selected from among sets of candidate video clips. Best scenes can also be identified by receiving an indication of an event of interest within video from a user during the capture of the video. Metadata patterns representing activities identified within video clips can be identified within other videos, which can subsequently be associated with the identified activities.
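The abstract describes filling a video summary template whose slots correspond to clips chosen from sets of candidate clips, but it does not say how a clip is chosen. The following is a minimal sketch, assuming each candidate already carries a numeric interest score derived from metadata and each slot has a maximum length. The names Clip, Slot, and fill_template and the greedy highest-score-first rule are hypothetical illustrations, not the publication's method.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Clip:
    start: float  # seconds from the start of the video
    end: float
    score: float  # hypothetical interest score derived from metadata

    @property
    def duration(self) -> float:
        return self.end - self.start


@dataclass(frozen=True)
class Slot:
    max_duration: float  # hypothetical per-slot length limit set by the template


def overlaps(a: Clip, b: Clip) -> bool:
    """True if the two clips share any span of time."""
    return a.start < b.end and b.start < a.end


def fill_template(slots: List[Slot], candidates: List[List[Clip]]) -> List[Optional[Clip]]:
    """Greedily fill each slot with the best-scoring candidate clip that fits.

    candidates[i] is the candidate set for slots[i]. A clip is eligible if it
    is no longer than its slot and does not overlap a clip already chosen.
    """
    chosen: List[Optional[Clip]] = []
    for slot, pool in zip(slots, candidates):
        picked = None
        for clip in sorted(pool, key=lambda c: c.score, reverse=True):
            if clip.duration <= slot.max_duration and not any(
                c is not None and overlaps(clip, c) for c in chosen
            ):
                picked = clip
                break
        chosen.append(picked)  # None marks a slot that no candidate could fill
    return chosen


slots = [Slot(5.0), Slot(3.0)]
candidates = [
    [Clip(10.0, 14.0, 0.9), Clip(30.0, 36.0, 0.95)],
    [Clip(12.0, 14.0, 0.99), Clip(40.0, 42.0, 0.5)],
]
summary = fill_template(slots, candidates)
```

Here the first slot takes the 4-second clip because the higher-scoring 6-second clip is too long, and the second slot skips its top candidate because it overlaps the first pick. A greedy pass is simple and deterministic, but it can leave a slot empty when all of its candidates collide with earlier picks; a global assignment would avoid that at higher cost.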

First claim

Opening claim text (preview).

What is claimed is:

1. A method for identifying events of interest in captured video, the method comprising: storing an audio pattern at a camera corresponding to a spoken command associated with flagging events of interest within captured video; capturing video with a camera; capturing an audio signal from a user while capturing the video; identifying the stored audio pattern within the captured audio signal; and in response to identifying the stored audio pattern within the captured audio signal, storing an indication of an event of interest in metadata associated with the captured video.

2. The method of claim 1, further comprising providing the captured video and the associated metadata to a post-processing system configured to identify a video clip associated with the event of interest within the captured video based on the metadata, the video clip comprising an amount of video occurring before and after a video portion corresponding to the event of interest.

3. The method of claim 2, wherein the post-processing system is further configured to receive a request to generate a video summary, and to generate a video summary in response to the request, the video summary comprising a plurality of video clips including the identified video clip associated with the event of interest.

4. The method of claim 1, wherein the event of interest is an activity type, wherein the spoken command identifies the activity type, and wherein storing an indication of the event of interest in metadata comprises storing an indication of the activity type in the metadata.

5. The method of claim 1, wherein storing the audio pattern corresponding to a spoken command comprises: receiving, from the user, an input configuring the camera into a training mode to learn a spoken command; capturing an audio signal from a user corresponding to the spoken command; identifying an audio pattern within the audio signal; and storing the audio pattern.

6. The method of claim 5, wherein capturing an audio signal from a user corresponding to the spoken command comprises receiving a plurality of audio signals corresponding to a threshold number of instances of recitation of the spoken command by the user, and wherein identifying an audio pattern comprises identifying an audio pattern within the plurality of audio signals.

7. The method of claim 1, wherein the stored audio pattern is specific to commands spoken by the user.

8. A system for identifying events of interest in captured video, the system comprising: a non-transitory computer-readable storage medium storing instructions configured to, when executed: store an audio pattern at a camera corresponding to a spoken command associated with flagging events of interest within captured video; capture video with a camera; capture an audio signal from a user while capturing the video; identify the stored audio pattern within the captured audio signal; and in response to identifying the stored audio pattern within the captured audio signal, store an indication of an event of interest in metadata associated with the captured video; and a processor configured to execute the instructions.

9. The system of claim 8, the instructions further configured to provide the captured video and the associated metadata to a post-processing system configured to identify a video clip associated with the event of interest within the captured video based on the metadata, the video clip comprising an amount of video occurring before and after a video portion corresponding to the event of interest.

10. The system of claim 9, wherein the post-processing system is further configured to receive a request to generate a video summary, and to generate a video summary in response to the request, the video summary comprising a plurality of video clips including the identified video clip associated with the event of interest.

11. The system of claim 8, wherein the event of interest is an activity type, wherein the spoken command identifies the activity type, and wherein storing an indication of the event of interest in metadata comprises storing an indication of the activity type in the metadata.

12. The system of claim 8, wherein storing the audio pattern corresponding to a spoken command comprises: receiving, from the user, an input configuring the camera into a training mode to learn a spoken command; capturing an audio signal from a user corresponding to the spoken command; identifying an audio pattern within the audio signal; and storing the audio pattern.

13. The system of claim 12, wherein capturing an audio signal from a user corresponding to the spoken command comprises receiving a plurality of audio signals corresponding to a threshold number of instances of recitation of the spoken command by the user, and wherein identifying an audio pattern comprises identifying an audio pattern within the plurality of audio signals.

14. The system of claim 8, wherein the stored audio pattern is specific to commands spoken by the user.

15. A non-transitory computer-readable storage medium storing instructions for identifying events of interest in captured video, the instructions for: storing an audio pattern at a camera corresponding to a spoken command associated with flagging events of interest within captured video; capturing video with a camera; capturing an audio signal from a user while capturing the video; identifying the stored audio pattern within the captured audio signal; and in response to identifying the stored audio pattern within the captured audio signal, storing an indication of an event of interest in metadata associated with the captured video.

16. The computer-readable storage medium of claim 15, the instructions further for providing the captured video and the associated metadata to a post-processing system configured to identify a video clip associated with the event of interest within the captured video based on the metadata, the video clip comprising an amount of video occurring before and after a video portion corresponding to the event of interest.

17. The computer-readable storage medium of claim 16, wherein the post-processing system is further configured to receive a request to generate a video summary, and to generate a video summary in response to the request, the video summary comprising a plurality of video clips including the identified video clip associated with the event of interest.

18. The computer-readable storage medium of claim 15, wherein the event of interest is an activity type, wherein the spoken command identifies the activity type, and wherein storing an indication of the event of interest in metadata comprises storing an indication of the activity type in the metadata.

19. The computer-readable storage medium of claim 15, wherein storing the audio pattern corresponding to a spoken command comprises: receiving, from the user, an input configuring the camera into a training mode to learn a spoken command; capturing an audio signal from a user corresponding to the spoken command; identifying an audio pattern within the audio signal; and storing the audio pattern.

20. The computer-readable storage medium of claim 19, wherein capturing an audio signal from a user corresponding to the spoken command comprises receiving a plurality of audio signals corresponding to a threshold number of instances of recitation of the spoken command by the user, and wherein identifying an audio pattern comprise…
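Claims 1, 2, and 4 through 6 describe training a command pattern from repeated recitations by the user, spotting that pattern in audio recorded during capture, tagging the video's metadata, and later cutting a clip that includes video before and after the event. The claims do not name a matching technique, so this sketch substitutes normalized cross-correlation on raw samples; a practical keyword spotter would more likely compare spectral features. MIN_RECITATIONS, the 0.9 match threshold, the 5-second pre- and post-roll, the "events_of_interest" metadata key, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

MIN_RECITATIONS = 3  # hypothetical "threshold number of instances" (claim 6)


def train_pattern(recitations: List[Sequence[float]]) -> List[float]:
    """Claims 5-6: average several recordings of the user's command into one pattern."""
    if len(recitations) < MIN_RECITATIONS:
        raise ValueError(f"need at least {MIN_RECITATIONS} recitations")
    n = min(len(r) for r in recitations)  # truncate every take to the shortest one
    return [sum(r[i] for r in recitations) / len(recitations) for i in range(n)]


def _ncc(a: Sequence[float], b: Sequence[float]) -> float:
    """Normalized cross-correlation of two equal-length windows (range -1 to 1)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den else 0.0  # a flat window (e.g. silence) never matches


def find_pattern(audio: Sequence[float], pattern: Sequence[float],
                 sample_rate: float, threshold: float = 0.9) -> List[float]:
    """Claim 1: return the times (seconds) where the stored pattern appears."""
    m = len(pattern)
    hits: List[float] = []
    i = 0
    while i + m <= len(audio):
        if _ncc(audio[i:i + m], pattern) >= threshold:
            hits.append(i / sample_rate)
            i += m  # jump past this utterance so one command is flagged once
        else:
            i += 1
    return hits


def tag_events(metadata: Dict[str, list], audio: Sequence[float],
               patterns: Dict[str, Sequence[float]], sample_rate: float) -> Dict[str, list]:
    """Claims 1 and 4: store each detected command, labeled with its activity type."""
    events = metadata.setdefault("events_of_interest", [])
    for activity, pattern in patterns.items():
        for t in find_pattern(audio, pattern, sample_rate):
            events.append({"time": t, "activity": activity})
    events.sort(key=lambda e: e["time"])
    return metadata


def clip_window(event_t: float, video_len: float,
                pre: float = 5.0, post: float = 5.0) -> Tuple[float, float]:
    """Claim 2: video before and after the event, clamped to the video's bounds."""
    return max(0.0, event_t - pre), min(video_len, event_t + post)
```

For example, training on three takes of a toy command and scanning a 4-second recording sampled at a toy rate of 10 Hz, containing one louder utterance starting at 2.0 s, yields a single tagged event at 2.0 s, and clip_window(2.0, 4.0) clamps the window to (0.0, 4.0). Because the pattern is averaged from the user's own recitations, it is naturally specific to that speaker, in the spirit of claim 7.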

Assignees

GoPro Inc

Inventors

Not listed on this page.

Classifications

  • Creating reference templates; Clustering · CPC title

  • the information being derived from movement of the record carrier, e.g. using tachometer · CPC title

  • Television signal processing therefor · CPC title

  • between a recording apparatus and a television camera · CPC title

  • Electronic editing of digitised analogue information signals, e.g. audio or video signals · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016055885A1 cover?
Video and corresponding metadata is accessed. Events of interest within the video are identified based on the corresponding metadata, and best scenes are identified based on the identified events of interest. A video summary can be generated including one or more of the identified best scenes. The video summary can be generated using a video summary template with slots corresponding to video cl…
Who is the assignee on this patent?
GoPro Inc
What technology area does this patent fall under?
Primary CPC classification G11B27/10. Mapped technology areas include Physics.
When was this patent published?
Published Feb 25, 2016 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).