Method for online learning and recognition of visual behaviors
US-8948499-B1 · Feb 3, 2015 · US
US10198509B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10198509-B2 |
| Application number | US-201615005795-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 25, 2016 |
| Priority date | Apr 23, 2012 |
| Publication date | Feb 5, 2019 |
| Grant date | Feb 5, 2019 |
Official abstract text for this publication.
A complex video event classification, search and retrieval system can generate a semantic representation of a video or of segments within the video, based on one or more complex events that are depicted in the video, without the need for manual tagging. The system can use the semantic representations to, among other things, provide enhanced video search and retrieval capabilities.
Opening claim text (preview).
The invention claimed is:
1. A video classification system embodied in one or more non-transitory computer readable media and comprising instructions accessible by a computing system to: execute a machine learning-based process to extract a plurality of lower-level features from a video and associate a set of semantic elements with the video based on the extracted lower-level features, wherein each of the semantic elements is descriptive of at least two of a scene, an action, and an object depicted in the video; infer a higher level complex event based on the set of semantic elements associated with the lower-level features; recognize the higher level complex event by referencing a video event model comprising data relating each of a plurality of complex event types to an associated set of semantic elements; and classify the video based on a mathematically-determined strength of association between the higher level complex event and one or more of the semantic elements.
2. The video classification system of claim 1, accessible by a computing system to determine a relative evidentiary value associated with each of the set of semantic elements evidencing the lower level complex event.
3. The video classification system of claim 1, wherein the set of semantic elements relates to text included in the video, and the video classification system is accessible by a computing system to include a description of the text-related semantic element in a classification of the video.
4. The video classification system of claim 1, wherein the set of semantic elements further comprises at least one text translation of a portion of audio of the video and is associated in time with the set of semantic elements.
5. The video classification system of claim 1, wherein the set of semantic elements further comprises at least one metatag selected from the group consisting of geolocation data, gyroscopic information, time, accelerometer information, and combinations thereof.
6. The video classification system of claim 1, accessible by a computing system to associate an interactive hyperlink with ones of the set of semantic elements, wherein the hyperlink may be user-activated to locate a portion of the video corresponding to ones of the set of semantic elements.
7. The video classification system of claim 1, accessible by a computing system to associate an interactive hyperlink with ones of the set of semantic elements, wherein the hyperlink may be user-activated to expand or contract the ones of the set of semantic elements to a different level of detail.
8. The video classification system of claim 1, accessible by a computing system to determine a relative evidentiary value associated with each of the set of semantic elements evidencing the lower level complex event.
9. A method for classifying video embodied in one or more machine readable storage media accessible by a computing system, the method comprising: extracting a plurality of lower-level features from a video and associating a set of semantic elements with the video based on the extracted lower-level features, wherein each of the semantic elements is descriptive of at least two of a scene, an action, and an object depicted in the video; inferring a higher level complex event based on the set of semantic elements associated with the lower-level features; recognizing the higher level complex event by referencing a video event model comprising data relating each of a plurality of complex event types to an associated set of semantic elements; and classifying the video based on a mathematically-determined strength of association between the higher level complex event and one or more of the semantic elements.
10. The method of claim 9, wherein the set of semantic elements relates to text included in the video, and the video classification system is accessible by a computing system to include a description of the text-related semantic element in a classification of the video.
11. The method of claim 9, wherein the set of semantic elements further comprises at least one text translation of a portion of audio of the video and is associated in time with the set of semantic elements.
12. The method of claim 9, wherein the set of semantic elements further comprises at least one metatag selected from the group consisting of geolocation data, gyroscopic information, time, accelerometer information, and combinations thereof.
13. The method of claim 9, further comprising associating an interactive hyperlink with ones of the set of semantic elements, wherein the hyperlink may be user-activated to locate a portion of the video corresponding to ones of the set of semantic elements.
14. The method of claim 9, further comprising associating an interactive hyperlink with ones of the set of semantic elements, wherein the hyperlink may be user-activated to expand or contract the ones of the set of semantic elements to a different level of detail.
15. A video classification system embodied in one or more non-transitory computer readable media and accessible by a computing device to generate a description of a video, by: accessing a set of inputs associated with the video, comprising at least two of: (i) a text translation of an audio track of the video, (ii) text recognized in visual content of the video, and (iii) a tag associated with the video; extracting a plurality of low-level non-text features from the visual content of the video; generating a set of semantic elements associated with the video based on the set of inputs and the low-level non-text features extracted from the visual content of the video, each of the semantic elements descriptive of one or more of a scene, an action, an actor, and an object depicted in the video, wherein at least a portion of the set of semantic elements is, in combination, indicative of a lower level complex event; inferring a higher level complex event as being likely depicted in the video, based on evidence comprising a combination of lower level complex events; recognizing the higher level complex event by referencing a video event model comprising data relating each of a plurality of complex event types to an associated set of semantic elements; generating a human-intelligible classification of the video based on the higher level complex event; and associating the human-intelligible classification with the video.
16. The video classification system of claim 15, accessible by a computing device to omit from semantic elements of the human-intelligible classification that do not evidence the higher level complex event.
17. The video classification system of claim 15, accessible by a computing device to present the human-intelligible classification in response to a user-specified request.
18. The video classification assistant of claim 15, accessible by a computing device to determine a relative evidentiary value associated with each of the first set of semantic elements evidencing the higher level complex event and formulate the human-intelligible classification based on the relative evidentiary values.
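The classification step recited in claims 1 and 9 can be illustrated with a minimal sketch. The claims do not specify how the "mathematically-determined strength of association" is computed, so the weighted sum below is an assumption chosen for simplicity. The names `EVENT_MODEL`, `association_strength`, and `classify`, the example event types, and all numeric weights and confidences are hypothetical and do not come from the patent.

```python
# Hypothetical video event model: each complex event type is related to a set
# of semantic elements (scene, action, object). The weights are invented
# stand-ins for a "relative evidentiary value"; the patent gives no numbers.
EVENT_MODEL = {
    "birthday party": {
        "cake": 0.4,                 # object
        "blowing out candles": 0.4,  # action
        "indoor room": 0.2,          # scene
    },
    "skateboard trick": {
        "skateboard": 0.5,           # object
        "jumping": 0.3,              # action
        "street": 0.2,               # scene
    },
}


def association_strength(detected, event_elements):
    """Assumed scoring rule: weighted sum of detector confidences over the
    semantic elements that the model relates to one complex event type."""
    return sum(
        weight * detected.get(element, 0.0)
        for element, weight in event_elements.items()
    )


def classify(detected, model=EVENT_MODEL):
    """Score every complex event type and return the strongest one."""
    scores = {
        event: association_strength(detected, elements)
        for event, elements in model.items()
    }
    best_event = max(scores, key=scores.get)
    return best_event, scores


# Semantic elements assumed to have been produced upstream by feature
# extraction, mapped to made-up detector confidences in [0, 1].
detected_elements = {"cake": 0.9, "blowing out candles": 0.8, "street": 0.1}
best_event, scores = classify(detected_elements)
print(best_event, scores)
```

In this sketch the feature extraction and semantic-element detection are taken as given inputs; only the lookup against an event model and the final scoring are shown. A probabilistic or learned scoring function could be substituted for the weighted sum without changing the overall shape.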
Physics · mapped topic
Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually · CPC title