Voice-based video tagging

US10339975B2 · US · B2

Patent metadata
Publication number: US-10339975-B2
Application number: US-201715626931-A
Country: US
Kind code: B2
Filing date: Jun 19, 2017
Priority date: Jul 23, 2014
Publication date: Jul 2, 2019
Grant date: Jul 2, 2019

Abstract

Official abstract text for this publication.

Video and corresponding metadata is accessed. Events of interest within the video are identified based on the corresponding metadata, and best scenes are identified based on the identified events of interest. A video summary can be generated including one or more of the identified best scenes. The video summary can be generated using a video summary template with slots corresponding to video clips selected from among sets of candidate video clips. Best scenes can also be identified by receiving an indication of an event of interest within video from a user during the capture of the video. Metadata patterns representing activities identified within video clips can be identified within other videos, which can subsequently be associated with the identified activities.
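
The abstract's idea of a summary template whose slots are filled from sets of candidate clips can be illustrated with a short Python sketch. It is a hypothetical illustration, not the patent's method: `Clip`, `Slot`, and `fill_template` are invented names, each slot is assumed to carry only a maximum duration, each clip is assumed to carry a precomputed "best scene" score, and the greedy rule (highest-scoring unused candidate that fits) is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Clip:
    """A candidate video clip (hypothetical structure)."""
    clip_id: str
    start_s: float  # start time within the source video, in seconds
    end_s: float    # end time within the source video, in seconds
    score: float    # assumed "best scene" score derived from event-of-interest metadata

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


@dataclass(frozen=True)
class Slot:
    """One slot of a summary template, limited to a maximum clip length (assumed)."""
    max_duration_s: float


def fill_template(slots: List[Slot], candidates: List[List[Clip]]) -> List[Optional[Clip]]:
    """Pick one clip per slot from that slot's own candidate set.

    Greedy rule (an assumption, not taken from the patent): choose the
    highest-scoring candidate that fits the slot and was not already used
    by an earlier slot; leave the slot empty (None) if nothing fits.
    """
    if len(slots) != len(candidates):
        raise ValueError("need exactly one candidate set per slot")
    used = set()
    chosen: List[Optional[Clip]] = []
    for slot, pool in zip(slots, candidates):
        fitting = [c for c in pool
                   if c.duration_s <= slot.max_duration_s and c.clip_id not in used]
        best = max(fitting, key=lambda c: c.score, default=None)
        if best is not None:
            used.add(best.clip_id)
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    a = Clip("a", 10.0, 14.0, score=0.9)
    b = Clip("b", 30.0, 38.0, score=0.95)
    c = Clip("c", 50.0, 53.0, score=0.6)
    template = [Slot(5.0), Slot(10.0), Slot(5.0)]
    picks = fill_template(template, [[a, b], [a, b], [a, c]])
    print([p.clip_id if p else None for p in picks])  # ['a', 'b', 'c']
```

Tracking used clip IDs keeps the same scene from appearing twice in one summary; a real system might instead optimize over all slots jointly.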

First claim

Opening claim text (preview).

What is claimed is:

1. A method for identifying an event of interest in a video, the method performed by a camera including one or more processors, the method comprising: accessing, by the camera, a captured speech pattern, the captured speech pattern captured from a user at a moment during capture of the video; matching, by the camera, the captured speech pattern to a given stored speech pattern of multiple stored speech patterns, the multiple stored speech patterns corresponding to a command for identifying the event of interest within the video, individual ones of the multiple stored speech patterns stored based on a number of times the individual ones of the multiple stored speech patterns are captured by the camera from a user while the camera is operating in a training mode, wherein the individual ones of the multiple stored speech patterns correspond to an identification of the event of interest as occurring before, during, or after the moment; and in response to matching the captured speech pattern to the given stored speech pattern, storing, by the camera, event of interest information associated with the video, the event of interest information identifying an event moment during the capture of the video at which the event of interest occurs, the event moment being determined to occur before, during, or after the moment based on the matching of the captured speech pattern to the given stored speech pattern.

2. The method of claim 1, further comprising: identifying a portion of the video as a video clip associated with the event of interest based on the event of interest information, the video clip comprising a first amount of the video occurring before the event moment and a second amount of the video occurring after the event moment, the first amount and the second [amount being determined based on the matching of the captured speech pattern to the given stored speech pattern; and] storing clip information indicating the association of the video clip with the event of interest and the portion of the video included in the video clip.

3. The method of claim 2, further comprising: receiving a request to generate a video summary; and generating the video summary in response to the request, the video summary comprising the video clip associated with the event of interest.

4. The method of claim 1, wherein the event of interest corresponds to an activity type, wherein the given stored speech pattern identifies the activity type, and wherein storing the event of interest information comprises storing an indication of the activity type in metadata associated with the video.

5. The method of claim 1, wherein the given stored speech pattern is specific to the user.

6. The method of claim 1, wherein the given stored speech pattern includes a speech pattern of a spoken word.

7. The method of claim 1, wherein the given stored speech pattern includes a speech pattern of a spoken phrase.

8. A system for identifying an event of interest in a video, the system comprising: one or more processors configured by instructions to: access a captured speech pattern, the captured speech pattern captured from a user at a moment during capture of the video; match the captured speech pattern to a given stored speech pattern of multiple stored speech patterns, the multiple stored speech patterns corresponding to a command for identifying the event of interest within the video, individual ones of the multiple stored speech patterns stored based on a number of times the individual ones of the multiple stored speech patterns are captured by the camera from a user while the camera is operating in a training mode, wherein the individual ones of the multiple stored speech patterns correspond to an identification of the event of interest as occurring before, during, or after the moment; and in response a match of the captured speech pattern to the given stored speech pattern, store event of interest information associated with the video, the event of interest information identifying an event moment during the capture of the video at which the event of interest occurs, the event moment being determined to occur before, during, or after the moment based on the matching of the captured speech pattern to the given stored speech pattern.

9. The system of claim 8, wherein the one or more processors are further configured to: identify a portion of the video as a video clip associated with the event of interest based on the event of interest information, the video clip comprising a first amount of the video occurring before the event moment and a second amount of the video occurring after the event moment, the first amount and the second amount being determined based on the matching of the captured speech pattern to the given stored speech pattern; and store clip information indicating the association of the video clip with the event of interest and the portion of the video included in the video clip.

10. The system of claim 9, wherein the one or more processors are further configured to: receive a request to generate a video summary; and generate the video summary in response to the request, the video summary comprising the video clip associated with the event of interest.

11. The system of claim 8, wherein the event of interest corresponds to an activity type, wherein the given stored speech pattern identifies the activity type, and wherein the event of interest information is stored such that an indication of the activity type is stored in metadata associated with the video.

12. The system of claim 8, wherein the given stored speech pattern is specific to the user.

13. The system of claim 8, wherein the given stored speech pattern includes a speech pattern of a spoken word.

14. The system of claim 8, wherein the given stored speech pattern includes a speech pattern of a spoken phrase.

15. A non-transitory computer-readable storage medium storing instructions for identifying an event of interest in a video, the instructions, when executed, causing one or more processors to: access a captured speech pattern, the captured speech pattern captured from a user at a moment during capture of the video; match the captured speech pattern to a given stored speech pattern of multiple stored speech patterns, the multiple stored speech patterns corresponding to a command for identifying the event of interest within the video, individual ones of the multiple stored speech patterns stored based on a number of times the individual ones of the multiple stored speech patterns are captured by the camera from a user while the camera is operating in a training mode, wherein the individual ones of the multiple stored speech patterns correspond to an identification of the event of interest as occurring before, during, or after the moment; and in response a match of the captured speech pattern to the given stored speech pattern, store event of interest information associated with the video, the event of interest information identifying an event moment during the capture of the video at which the event of interest occurs, the event moment being determined to occur before, during, or after the moment based on the matching of the captured speech pattern to the given stored speech pattern.

16. The computer-readable storage medium of claim 15, wherein the instructions, when executed, further cause the one or more processors to: identify a portion of the video as a video clip associated with the event of interest based on the event of interest information, the video clip comprising a first amount of the video occurring before the event moment and a second amount of the video occurring after the event moment, the first amount and the second amount bei…
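
The flow of claims 1 and 2 (train spoken commands, match one during capture, resolve the event moment as before, during, or after the spoken moment, then cut a clip around it) can be sketched in Python. This is a minimal illustration under stated assumptions, not GoPro's implementation: speech patterns are reduced to normalized text phrases (a real camera would compare acoustic features), the repetition threshold and every value in the hypothetical `TIMING` table are invented, and `VoiceTagger`, `Relation`, and `EventOfInterest` are illustrative names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Tuple


class Relation(Enum):
    """When the event occurs relative to the moment the user spoke."""
    BEFORE = "before"
    DURING = "during"
    AFTER = "after"


# Hypothetical timing table, in seconds: (offset of the event moment from the
# spoken moment, amount of video kept before it, amount kept after it).
TIMING: Dict[Relation, Tuple[float, float, float]] = {
    Relation.BEFORE: (-5.0, 6.0, 2.0),
    Relation.DURING: (0.0, 3.0, 3.0),
    Relation.AFTER: (5.0, 2.0, 6.0),
}


@dataclass(frozen=True)
class EventOfInterest:
    event_moment_s: float
    clip_start_s: float
    clip_end_s: float


class VoiceTagger:
    def __init__(self, required_repetitions: int = 3) -> None:
        self.required_repetitions = required_repetitions  # assumed threshold
        self._training_counts: Dict[Tuple[str, Relation], int] = {}
        self.stored: Dict[str, Relation] = {}
        self.events: List[EventOfInterest] = []  # stand-in for the video's metadata

    @staticmethod
    def _normalize(phrase: str) -> str:
        # Simplification: case- and whitespace-insensitive text comparison.
        return " ".join(phrase.lower().split())

    def train(self, phrase: str, relation: Relation) -> bool:
        """Training mode: store a phrase only after it is captured enough times."""
        key = (self._normalize(phrase), relation)
        self._training_counts[key] = self._training_counts.get(key, 0) + 1
        if self._training_counts[key] >= self.required_repetitions:
            self.stored[key[0]] = relation
            return True
        return False

    def tag(self, phrase: str, spoken_at_s: float,
            video_len_s: float) -> Optional[EventOfInterest]:
        """Match a phrase heard during capture and record the event, if any."""
        relation = self.stored.get(self._normalize(phrase))
        if relation is None:
            return None  # no stored pattern matched, so nothing is recorded
        offset, keep_before, keep_after = TIMING[relation]
        moment = min(max(spoken_at_s + offset, 0.0), video_len_s)
        event = EventOfInterest(
            event_moment_s=moment,
            clip_start_s=max(moment - keep_before, 0.0),
            clip_end_s=min(moment + keep_after, video_len_s),
        )
        self.events.append(event)
        return event


if __name__ == "__main__":
    tagger = VoiceTagger()
    for _ in range(3):
        tagger.train("That was sick", Relation.BEFORE)
    print(tagger.tag("that was  SICK", spoken_at_s=42.0, video_len_s=120.0))
    print(tagger.tag("watch this", spoken_at_s=10.0, video_len_s=120.0))
```

Here a phrase said at 42 s that refers back in time yields an event moment of 37 s and a clip from 31 s to 39 s, while an untrained phrase returns `None`. Keying the clip window on the matched relation mirrors claim 2's statement that both amounts are determined by the match.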

Assignees

Gopro Inc

Inventors

Classifications

  • involving the multiplexing of an additional signal and the colour video signal · CPC title

  • by using information signals recorded by the same method as the main recording {(G11B27/22 takes precedence)} · CPC title

  • for retrieval · CPC title

  • Television signal processing therefor · CPC title

  • G11B27/10 (primary)

    Indexing; Addressing; Timing or synchronising; Measuring tape travel · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10339975B2 cover?
Video and corresponding metadata is accessed. Events of interest within the video are identified based on the corresponding metadata, and best scenes are identified based on the identified events of interest. A video summary can be generated including one or more of the identified best scenes. The video summary can be generated using a video summary template with slots corresponding to video cl…
Who is the assignee on this patent?
Gopro Inc
What technology area does this patent fall under?
Primary CPC classification G11B27/10. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 2, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).