Dynamic hybrid models for multimodal analysis
US-2016071024-A1 · Mar 10, 2016 · US
US10445582B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10445582-B2 |
| Application number | US-201615385291-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 20, 2016 |
| Priority date | Dec 20, 2016 |
| Publication date | Oct 15, 2019 |
| Grant date | Oct 15, 2019 |
Official abstract text for this publication.
A method of determining a composite action from a video clip, using a conditional random field (CRF), the method includes determining a plurality of features from the video clip, each of the features having a corresponding temporal segment from the video clip. The method may continue by determining, for each of the temporal segments corresponding to one of the features, an initial estimate of an action unit label from a corresponding unary potential function, the corresponding unary potential function having as ordered input the plurality of features from a current temporal segment and at least one other of the temporal segments. The method may further include determining the composite action by jointly optimizing the initial estimate of the action unit labels.
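The first two steps of the abstract can be sketched in a few lines: build an ordered concatenation of the current segment's feature with features from neighbouring segments, then score each action unit label with a unary potential. This is only an illustrative sketch. The function names, the symmetric `window`, the zero-padding at clip boundaries, and the linear (log-linear) form of the potential are assumptions, not details taken from the patent.

```python
def concat_with_context(features, t, window=1):
    """Concatenate the feature of segment t with its neighbours, in temporal order.

    Segments that fall outside the clip are zero-padded (an assumption of this sketch).
    """
    dim = len(features[0])
    out = []
    for k in range(t - window, t + window + 1):
        if 0 <= k < len(features):
            out.extend(features[k])
        else:
            out.extend([0.0] * dim)
    return out


def unary_log_potentials(x, weights):
    """Hypothetical log-linear unary potential: one linear score per action unit label."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]


def initial_estimates(features, weights, window=1):
    """Return (best label index, per-label scores) for every temporal segment."""
    estimates = []
    for t in range(len(features)):
        scores = unary_log_potentials(concat_with_context(features, t, window), weights)
        best = max(range(len(scores)), key=scores.__getitem__)
        estimates.append((best, scores))
    return estimates


# Toy data: three segments with 2-dimensional features, two action unit labels.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Each weight row has 3 * 2 entries because window=1 concatenates three segments.
weights = [
    [0.0, 0.0, 1.0, 0.0, 0.0, 0.0],  # label 0 looks at the current segment's first component
    [0.0, 0.0, 0.0, 1.0, 0.0, 0.0],  # label 1 looks at the current segment's second component
]
estimates = initial_estimates(features, weights)
```

In this toy setup the weights only read the current segment, so the neighbouring features are present in the input but do not influence the score. A trained model would typically assign non-zero weights to the context positions as well.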
Opening claim text (preview).
The invention claimed is:
1. A method of determining a classification of a composite action including a plurality of action units in a video clip, the method comprising: extracting a plurality of features from the video clip; determining a corresponding feature in the plurality of features for each of temporal segments of the video clip; determining an initial estimate of an action unit for each of the temporal segments using a potential function for each segment modeling dependency between a concatenation of features and a classification of a corresponding action unit by inputting a feature from a current temporal segment and a feature from at least one of preceding temporal segments or subsequent temporal segments as the concatenation of features; aggregating the potential functions into a probability distribution; and determining the classification of the composite action using the probability distribution by jointly inferring the classification of the composite action and classifications of the action units of the temporal segments based on the initial estimate of each action unit for each of the temporal segments.
2. The method according to claim 1, wherein the plurality of features are semantic features.
3. The method according to claim 1, wherein the plurality of features are low level features.
4. The method according to claim 1, further comprising: classifying at least one contextual object in at least one other of the temporal segments preceding the current segment, the at least one contextual object being independent of any action units of interest in the at least one other of the temporal segments preceding the current segment; and determining an action unit of interest in the current segment of the video clip, the action unit of interest being performed with the classified at least one contextual object and the determination of the action unit of interest in the current segment being based on the classification of the at least one contextual object wherein the current segment and the other segment preceding the current segment are disjoint.
5. The method according to claim 1, wherein the probability distribution is a conditional random field (CRF) probability distribution.
6. The method according to claim 5, wherein the CRF has a tree structure.
7. The method according to claim 5, wherein the CRF is in log-linear form.
8. A non-transitory computer readable medium having a computer program recorded on the computer readable medium, the computer program being executable by a computer system to perform a method of determining a classification of a composite action including a plurality of action units in a video clip, the method comprising: extracting a plurality of features from the video clip; determining a corresponding feature in the plurality of features for each of temporal segments of the video clip; determining an initial estimate of an action unit for each of the temporal segments using a potential function for each segment modeling dependency between a concatenation of features and a classification of a corresponding action unit by inputting a feature from a current temporal segment and a feature from at least one of preceding temporal segments or subsequent temporal segments as the concatenation of features; aggregating the potential functions into a probability distribution; and determining the classification of the composite action using the probability distribution by jointly inferring the classification of the composite action and classifications of the action units of the temporal segments based on the initial estimate of each action unit for each of the temporal segments.
9. A computer system, comprising: a processor; a memory having a computer program recorded thereon, the memory being in communication with the processor; the processor executing the computer program to perform a method of determining a classification of a composite action including a plurality of action units in a video clip, the method comprising: extracting a plurality of features from the video clip; determining a corresponding feature in the plurality of features for each of temporal segments of the video clip; determining an initial estimate of an action unit for each of the temporal segments using a potential function for each segment modeling dependency between a concatenation of features and a classification of a corresponding action unit by inputting a feature from a current temporal segment and a feature from at least one of preceding temporal segments or subsequent temporal segments as the concatenation of features; aggregating the potential functions into a probability distribution; and determining the classification of the composite action using the probability distribution by jointly inferring the classification of the composite action and classifications of the action units of the temporal segments based on the initial estimate of each action unit for each of the temporal segments.
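The joint inference step in the claims, combined with the tree-structured, log-linear CRF of claims 6 and 7, can be illustrated with a small sketch. It assumes one particular tree: a single composite action node connected to every action unit node (a star). The claims do not fix this topology, and the names `joint_map`, `pairwise`, and `prior` are hypothetical. With a star, once the composite label is fixed each segment can be maximised independently, so exact MAP inference reduces to a loop over composite labels.

```python
def joint_map(unary, pairwise, prior):
    """Exact MAP inference for a star-shaped (tree) log-linear model.

    unary[t][a]    : log-potential of action unit a at segment t (the initial estimate)
    pairwise[c][a] : log-potential linking composite action c and action unit a
    prior[c]       : log-potential of composite action c
    Returns (composite label, list of action unit labels, total log-score).
    """
    best = None
    for c in range(len(prior)):
        total = prior[c]
        labels = []
        for scores in unary:
            # Given the root label c, each leaf is maximised on its own.
            joint = [s + pairwise[c][a] for a, s in enumerate(scores)]
            a_star = max(range(len(joint)), key=joint.__getitem__)
            labels.append(a_star)
            total += joint[a_star]
        if best is None or total > best[0]:
            best = (total, c, labels)
    return best[1], best[2], best[0]


# Toy example: three segments, two action unit labels, two composite actions.
unary = [[2.0, 0.0], [0.4, 0.5], [1.5, 0.0]]
pairwise = [[1.0, -1.0], [-1.0, 1.0]]  # composite c favours action unit c
prior = [0.0, 0.0]

composite, units, score = joint_map(unary, pairwise, prior)
```

Taken alone, the middle segment's unary scores slightly prefer label 1 (0.5 against 0.4). Joint inference assigns it label 0, because the other two segments strongly support composite action 0, which in turn favours action unit 0. This is the sense in which the initial estimates are refined rather than simply read off.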
Markov-related models; Markov random fields · CPC title
Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods · CPC title
using classification, e.g. of video objects · CPC title
Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items (segmenting video sequences G06V20/49) · CPC title
Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes · CPC title