System for fast, probabilistic skeletal tracking
US-8953844-B2 · Feb 10, 2015 · US
US9875445B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9875445-B2 |
| Application number | US-201514631124-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 25, 2015 |
| Priority date | Feb 25, 2014 |
| Publication date | Jan 23, 2018 |
| Grant date | Jan 23, 2018 |
Abstract
Technologies for analyzing temporal components of multimodal data to detect short-term multimodal events, determine relationships between short-term multimodal events, and recognize long-term multimodal events, using a deep learning architecture, are disclosed.
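The abstract describes a hierarchy: detect short-term multimodal events, relate them to one another, then recognize long-term events. The sketch below is a minimal, non-authoritative illustration of that data flow only. It assumes per-window confidence scores for each modality are already available; in the patent these would come from a deep learning architecture, but here they are given numbers. It also assumes a long-term event is defined as an ordered chain of short-term events, each starting at most `max_gap` seconds after the previous one ends. `ShortEvent`, `detect_short_events`, `recognize_long_events`, and the `speech`/`smile` labels are hypothetical names, not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShortEvent:
    """A short-term event detected in one modality (times in seconds)."""
    modality: str
    label: str
    start: float
    end: float


def detect_short_events(scores, modality, label, window=1.0, threshold=0.5):
    """Turn per-window confidence scores into short-term events.

    Consecutive windows whose score meets `threshold` are merged into a single
    event. The scores are a hypothetical stand-in for a learned model's output.
    """
    events = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i * window  # an event begins at this window
        elif score < threshold and start is not None:
            events.append(ShortEvent(modality, label, start, i * window))
            start = None  # the event ended with the previous window
    if start is not None:  # an event is still open at the end of the input
        events.append(ShortEvent(modality, label, start, len(scores) * window))
    return events


def recognize_long_events(short_events, pattern, max_gap, name):
    """Recognize a long-term event as an ordered chain of short-term events.

    Each label in `pattern` must start no earlier than the previous event ends
    and no more than `max_gap` seconds after it. Returns (name, start, end) tuples.
    """
    ordered = sorted(short_events, key=lambda e: e.start)
    found = []
    for first in ordered:
        if first.label != pattern[0]:
            continue
        chain = [first]
        for wanted in pattern[1:]:
            prev = chain[-1]
            nxt = next((e for e in ordered
                        if e.label == wanted
                        and prev.end <= e.start <= prev.end + max_gap), None)
            if nxt is None:
                break
            chain.append(nxt)
        if len(chain) == len(pattern):
            found.append((name, chain[0].start, chain[-1].end))
    return found


audio = detect_short_events([0.1, 0.9, 0.8, 0.2, 0.1], "audio", "speech")
video = detect_short_events([0.0, 0.1, 0.2, 0.7, 0.1], "video", "smile")
print(recognize_long_events(audio + video, ["speech", "smile"], 0.5, "speak_then_smile"))
# prints [('speak_then_smile', 1.0, 4.0)]
```

Here the audio "speech" event (1 s to 3 s) is followed immediately by a video "smile" event (3 s to 4 s), so the cross-modal chain is recognized as one longer event spanning 1 s to 4 s. Reversing the pattern finds nothing, because the temporal relationship between the short-term events is part of the definition.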
Claims
The invention claimed is:
1. A multimodal data analyzer comprising instructions embodied in one or more non-transitory machine accessible storage media, the multimodal data analyzer configured to cause a computing system comprising one or more computing devices to: access a set of time-varying instances of multimodal data having at least two different modalities, each instance of the multimodal data having a temporal component; algorithmically learn a feature representation of the temporal component of the multimodal data using a deep learning architecture and a fusion technique which enables a level of fusion of the multimodal data to be learned; and infer at least one of long-term feature representations or temporal dependencies between feature representations of the multimodal data.
2. The multimodal data analyzer of claim 1, configured to classify the set of multimodal data by applying a temporal discriminative model to the feature representations of the temporal component of the multimodal data having inferred temporal dependencies.
3. The multimodal data analyzer of claim 1, configured to, using the deep learning architecture, identify short-term temporal features in the multimodal data.
4. The multimodal data analyzer of claim 1, wherein the multimodal data comprises recorded speech and the multimodal data analyzer is configured to identify an intra-utterance dynamic feature of the recorded speech.
5. The multimodal data analyzer of claim 1, configured to, using the deep learning architecture, identify a long-term temporal feature in the multimodal data.
6. The multimodal data analyzer of claim 1, wherein the multimodal data comprises recorded speech and the multimodal data analyzer is configured to identify an inter-utterance dynamic feature in the recorded speech.
7. The multimodal data analyzer of claim 1, wherein the multimodal data comprises audio and video, and the multimodal data analyzer is configured to (i) identify short-term dynamic features in the audio and video data and (ii) infer a long-term dynamic feature based on a combination of temporally-spaced audio and video short-term dynamic features.
8. The multimodal data analyzer of claim 1, wherein the temporal deep learning architecture comprises a hybrid model having a generative component and a discriminative component, and wherein the multimodal data analyzer uses output of the generative component as input to the discriminative component.
9. The multimodal data analyzer of claim 1, wherein the multimodal data analyzer is configured to identify at least two different temporally-spaced events in the multimodal data and infer a correlation between the at least two different temporally-spaced multimodal events.
10. The multimodal data analyzer of claim 1, configured to algorithmically learn the feature representation of the temporal component of the multimodal data using an unsupervised machine learning technique.
11. The multimodal data analyzer of claim 1, configured to algorithmically infer missing data both within a modality and across modalities.
12. A method for classifying multimodal data, the multimodal data comprising data having at least two different modalities, the method comprising, with a computing system comprising one or more computing devices: accessing a set of time-varying instances of multimodal data, each instance of the multimodal data having a temporal component; and algorithmically classifying the set of time-varying instances of multimodal data using a discriminative temporal model, the discriminative temporal model trained using a feature representation generated by a deep temporal generative model based on the temporal component of the multimodal data and a fusion technique which enables a level of fusion of the multimodal data to be learned.
13. The method of claim 12, comprising identifying, within each modality of the multimodal data, a plurality of short-term features having different time scales.
14. The method of claim 13, comprising, for each modality within the multimodal data, inferring a long-term dynamic feature based on the short-term dynamic features identified within the modality.
15. The method of claim 13, comprising fusing short-term features across the different modalities of the multimodal data, and inferring a long-term dynamic feature based on the short-term features fused across the different modalities of the multimodal data.
16. A system embodied in one or more computer accessible storage media for algorithmically recognizing a multimodal event in data, the system comprising: a data access module to access a set of time-varying instances of multimodal data, each instance of the multimodal data having a temporal component; a classifier module to classify different instances in the set of time-varying instances of multimodal data as indicative of different short-term events, wherein the classifying includes the use of a discriminative temporal model trained using a feature representation generated by a deep temporal generative model based on the temporal component of the multimodal data and a fusion technique which enables a level of fusion of the multimodal data to be learned; and an event recognizer module to (i) recognize a longer-term multimodal event based on a plurality of multimodal short-term events identified by the classifier module and (ii) generate a semantic label for the recognized multimodal event.
17. The system of claim 16, wherein the event recognizer module is to use a discriminative temporal model to recognize the longer-term multimodal event.
18. The system of claim 17, wherein the system is to train the discriminative temporal model using a feature representation generated by the deep temporal generative model.
19. The system of claim 16, wherein the event recognizer module is to recognize the longer-term multimodal event by correlating a plurality of different short-term multimodal events having different time scales.
20. The system of claim 16, wherein the fusion technique comprises a neuro-inspired dynamic hybrid model, which enables the level of fusion of the multimodal data to be learned through the use of data driven learning.
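Claims 8, 12, 16, and 20 describe a hybrid in which a generative temporal model produces a feature representation that is then used to train a discriminative model, with the level of fusion across modalities learned from data. The claims do not specify the models, so the following is only a shallow, hypothetical sketch of that generative-to-discriminative data flow under stated assumptions: each modality is a sequence of discrete symbols, there are two classes, first-order Markov chains stand in for the "deep temporal generative model", per-modality log-likelihood ratios are the feature representation, and a logistic regression whose learned per-modality weights act as a simple late-fusion stand-in replaces both the discriminative temporal model and the "neuro-inspired dynamic hybrid model". All function names and the toy data are illustrative assumptions.

```python
import math


def fit_markov(sequences, n_states, alpha=1.0):
    """Generative stand-in: a first-order Markov chain over discrete symbols.

    Transition counts get add-alpha (Laplace) smoothing so unseen transitions
    keep a nonzero probability.
    """
    counts = [[alpha] * n_states for _ in range(n_states)]
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return [[c / sum(row) for c in row] for row in counts]


def log_likelihood(seq, trans):
    # Log-probability of the observed transitions (initial state ignored for brevity).
    return sum(math.log(trans[a][b]) for a, b in zip(seq, seq[1:]))


def fit_generative(samples, labels, n_states):
    """Fit one Markov chain per (modality, class) pair; classes are 0 and 1."""
    return {m: tuple(fit_markov([s[m] for s, y in zip(samples, labels) if y == c], n_states)
                     for c in (0, 1))
            for m in sorted(samples[0])}


def generative_features(sample, models):
    """Feature representation: one log-likelihood ratio (class 1 vs 0) per modality."""
    return [log_likelihood(sample[m], t1) - log_likelihood(sample[m], t0)
            for m, (t0, t1) in sorted(models.items())]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.1, epochs=500):
    """Discriminative stand-in: logistic regression trained by SGD.

    The learned per-feature weights decide how much each modality contributes,
    a simple late-fusion stand-in for a learned level of fusion.
    """
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            g = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - t  # dLoss/dz
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def predict(sample, models, w, b):
    x = generative_features(sample, models)
    return int(b + sum(wi * xi for wi, xi in zip(w, x)) >= 0)


# Hypothetical toy data: the audio rhythm separates the classes,
# while the video sequences are identical across the two classes.
train = [
    {"audio": [0, 1, 0, 1, 0, 1], "video": [0, 0, 1, 1, 0, 0]},
    {"audio": [1, 0, 1, 0, 1, 0], "video": [1, 1, 0, 0, 1, 1]},
    {"audio": [0, 0, 0, 0, 0, 0], "video": [0, 0, 1, 1, 0, 0]},
    {"audio": [1, 1, 1, 1, 1, 1], "video": [1, 1, 0, 0, 1, 1]},
]
labels = [1, 1, 0, 0]

models = fit_generative(train, labels, n_states=2)
w, b = train_logistic([generative_features(s, models) for s in train], labels)
print(w)  # the video weight stays 0.0; the audio weight is positive
```

Because the toy video sequences carry no class information, their log-likelihood-ratio feature is always zero and the learned video weight never moves, while the audio weight grows: the discriminative stage learns from data how much to rely on each modality. A real system would replace both stages with learned deep temporal models.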
Graphical models, e.g. Bayesian networks · CPC title
Probabilistic graphical models, e.g. probabilistic networks · CPC title
Physics · mapped topic