Classification, search and retrieval of complex video events

US10198509B2 · US · B2

Patent metadata
Publication number: US-10198509-B2
Application number: US-201615005795-A
Country: US
Kind code: B2
Filing date: Jan 25, 2016
Priority date: Apr 23, 2012
Publication date: Feb 5, 2019
Grant date: Feb 5, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A complex video event classification, search and retrieval system can generate a semantic representation of a video or of segments within the video, based on one or more complex events that are depicted in the video, without the need for manual tagging. The system can use the semantic representations to, among other things, provide enhanced video search and retrieval capabilities.
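As an illustration of how semantic representations of video segments could support search and retrieval, the following is a minimal sketch and not the patented implementation. It assumes each segment has already been labeled with semantic elements; the names `build_index` and `search`, the tuple layout, and the sample labels are all hypothetical.

```python
from collections import defaultdict


def build_index(segments):
    """Build an inverted index from semantic element to video segments.

    `segments` is an iterable of (video_id, start_s, end_s, elements)
    tuples. This layout is an assumption made for the example.
    """
    index = defaultdict(list)
    for video_id, start_s, end_s, elements in segments:
        for element in elements:
            index[element.lower()].append((video_id, start_s, end_s))
    return index


def search(index, query_terms):
    """Return the segments labeled with every query term, in sorted order."""
    matches = [set(index.get(term.lower(), [])) for term in query_terms]
    if not matches:
        return []
    return sorted(set.intersection(*matches))


# Hypothetical sample data: two videos, three labeled segments.
segments = [
    ("v1", 0, 10, ["cake", "singing"]),
    ("v1", 10, 20, ["dancing"]),
    ("v2", 5, 15, ["cake"]),
]
index = build_index(segments)
results = search(index, ["cake", "singing"])
```

Because the index stores segment boundaries, a query can return the matching portion of a video rather than the whole file.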

First claim

Opening claim text (preview).

The invention claimed is:

1. A video classification system embodied in one or more non-transitory computer readable media and comprising instructions accessible by a computing system to: execute a machine learning-based process to extract a plurality of lower-level features from a video and associate a set of semantic elements with the video based on the extracted lower-level features, wherein each of the semantic elements is descriptive of at least two of a scene, an action, and an object depicted in the video; infer a higher level complex event based on the set of semantic elements associated with the lower-level features; recognize the higher level complex event by referencing a video event model comprising data relating each of a plurality of complex event types to an associated set of semantic elements; and classify the video based on a mathematically-determined strength of association between the higher level complex event and one or more of the semantic elements.

2. The video classification system of claim 1, accessible by a computing system to determine a relative evidentiary value associated with each of the set of semantic elements evidencing the lower level complex event.

3. The video classification system of claim 1, wherein the set of semantic elements relates to text included in the video, and the video classification system is accessible by a computing system to include a description of the text-related semantic element in a classification of the video.

4. The video classification system of claim 1, wherein the set of semantic elements further comprises at least one text translation of a portion of audio of the video and is associated in time with the set of semantic elements.

5. The video classification system of claim 1, wherein the set of semantic elements further comprises at least one metatag selected from the group consisting of geolocation data, gyroscopic information, time, accelerometer information, and combinations thereof.

6. The video classification system of claim 1, accessible by a computing system to associate an interactive hyperlink with ones of the set of semantic elements, wherein the hyperlink may be user-activated to locate a portion of the video corresponding to ones of the set of semantic elements.

7. The video classification system of claim 1, accessible by a computing system to associate an interactive hyperlink with ones of the set of semantic elements, wherein the hyperlink may be user-activated to expand or contract the ones of the set of semantic elements to a different level of detail.

8. The video classification system of claim 1, accessible by a computing system to determine a relative evidentiary value associated with each of the set of semantic elements evidencing the lower level complex event.

9. A method for classifying video embodied in one or more machine readable storage media accessible by a computing system, the method comprising: extracting a plurality of lower-level features from a video and associating a set of semantic elements with the video based on the extracted lower-level features, wherein each of the semantic elements is descriptive of at least two of a scene, an action, and an object depicted in the video; inferring a higher level complex event based on the set of semantic elements associated with the lower-level features; recognizing the higher level complex event by referencing a video event model comprising data relating each of a plurality of complex event types to an associated set of semantic elements; and classifying the video based on a mathematically-determined strength of association between the higher level complex event and one or more of the semantic elements.

10. The method of claim 9, wherein the set of semantic elements relates to text included in the video, and the video classification system is accessible by a computing system to include a description of the text-related semantic element in a classification of the video.

11. The method of claim 9, wherein the set of semantic elements further comprises at least one text translation of a portion of audio of the video and is associated in time with the set of semantic elements.

12. The method of claim 9, wherein the set of semantic elements further comprises at least one metatag selected from the group consisting of geolocation data, gyroscopic information, time, accelerometer information, and combinations thereof.

13. The method of claim 9, further comprising associating an interactive hyperlink with ones of the set of semantic elements, wherein the hyperlink may be user-activated to locate a portion of the video corresponding to ones of the set of semantic elements.

14. The method of claim 9, further comprising associating an interactive hyperlink with ones of the set of semantic elements, wherein the hyperlink may be user-activated to expand or contract the ones of the set of semantic elements to a different level of detail.

15. A video classification system embodied in one or more non-transitory computer readable media and accessible by a computing device to generate a description of a video, by: accessing a set of inputs associated with the video, comprising at least two of: (i) a text translation of an audio track of the video, (ii) text recognized in visual content of the video, and (iii) a tag associated with the video; extracting a plurality of low-level non-text features from the visual content of the video; generating a set of semantic elements associated with the video based on the set of inputs and the low-level non-text features extracted from the visual content of the video, each of the semantic elements descriptive of one or more of a scene, an action, an actor, and an object depicted in the video, wherein at least a portion of the set of semantic elements is, in combination, indicative of a lower level complex event; inferring a higher level complex event as being likely depicted in the video, based on evidence comprising a combination of lower level complex events; recognizing the higher level complex event by referencing a video event model comprising data relating each of a plurality of complex event types to an associated set of semantic elements; generating a human-intelligible classification of the video based on the higher level complex event; and associating the human-intelligible classification with the video.

16. The video classification system of claim 15, accessible by a computing device to omit from semantic elements of the human-intelligible classification that do not evidence the higher level complex event.

17. The video classification system of claim 15, accessible by a computing device to present the human-intelligible classification in response to a user-specified request.

18. The video classification assistant of claim 15, accessible by a computing device to determine a relative evidentiary value associated with each of the first set of semantic elements evidencing the higher level complex event and formulate the human-intelligible classification based on the relative evidentiary values.
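The classification step recited in claims 1 and 9 (relating complex event types to sets of semantic elements, then classifying by strength of association) can be sketched roughly as follows. This is an illustrative assumption, not the method disclosed in the patent: the event model contents, the weights, the normalized-sum scoring rule, and the names `EVENT_MODEL`, `score_events`, and `classify` are all hypothetical.

```python
# Hypothetical event model: each complex event type maps semantic elements
# to an illustrative weight standing in for "strength of association".
# All entries and numbers are invented for this example.
EVENT_MODEL = {
    "birthday party": {
        "cake": 0.9,
        "blowing candles": 0.8,
        "singing": 0.5,
        "indoor room": 0.3,
    },
    "wedding ceremony": {
        "bride": 0.9,
        "exchanging rings": 0.8,
        "church": 0.6,
        "singing": 0.2,
    },
}


def score_events(semantic_elements, model=EVENT_MODEL):
    """Score each event type as matched weight divided by total weight."""
    detected = set(semantic_elements)
    scores = {}
    for event, weights in model.items():
        matched = sum(w for element, w in weights.items() if element in detected)
        scores[event] = matched / sum(weights.values())
    return scores


def classify(semantic_elements, model=EVENT_MODEL, threshold=0.5):
    """Return the best-scoring event type, or None if below the threshold."""
    scores = score_events(semantic_elements, model)
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


# Semantic elements assumed to come from an upstream feature-extraction step.
elements = ["cake", "blowing candles", "singing"]
label = classify(elements)
```

The per-element weights also give a simple stand-in for the "relative evidentiary value" mentioned in the dependent claims: elements with higher weights contribute more to the final score.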

Assignees

Inventors

Classifications

  • Physics · mapped topic

  • G06F16/78 · Primary

    Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10198509B2 cover?
A complex video event classification, search and retrieval system can generate a semantic representation of a video or of segments within the video, based on one or more complex events that are depicted in the video, without the need for manual tagging. The system can use the semantic representations to, among other things, provide enhanced video search and retrieval capabilities.
Who is the assignee on this patent?
Stanford Res Inst Int
What technology area does this patent fall under?
Primary CPC classification G06F17/30823. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 5, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).