Dynamic hybrid models for multimodal analysis

US9875445B2 · US · B2

Patent metadata
Publication number: US-9875445-B2
Application number: US-201514631124-A
Country: US
Kind code: B2
Filing date: Feb 25, 2015
Priority date: Feb 25, 2014
Publication date: Jan 23, 2018
Grant date: Jan 23, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Technologies for analyzing temporal components of multimodal data to detect short-term multimodal events, determine relationships between short-term multimodal events, and recognize long-term multimodal events, using a deep learning architecture, are disclosed.
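The two-stage structure named in the abstract (short-term events first, then a long-term event inferred from how those events relate in time) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the frame layout (audio energy, motion magnitude), the thresholds, the event labels, and the hand-written rules, which stand in for the learned deep model the patent describes and are not the patented method.

```python
from typing import List, Tuple

# Hypothetical frame layout: (audio_energy, motion_magnitude), both in [0, 1].
Frame = Tuple[float, float]
ShortEvent = Tuple[int, str]            # (start frame index, label)
LongEvent = Tuple[int, int, str]        # (first start, second start, label)


def detect_short_term(frames: List[Frame], window: int = 3,
                      threshold: float = 0.5) -> List[ShortEvent]:
    """Label each non-overlapping window with a short-term multimodal event."""
    events = []
    for start in range(0, len(frames) - window + 1, window):
        chunk = frames[start:start + window]
        audio = sum(f[0] for f in chunk) / window
        motion = sum(f[1] for f in chunk) / window
        if audio > threshold and motion > threshold:
            label = "speech+gesture"
        elif audio > threshold:
            label = "speech"
        elif motion > threshold:
            label = "gesture"
        else:
            label = "idle"
        events.append((start, label))
    return events


def recognize_long_term(events: List[ShortEvent],
                        max_gap: int = 6) -> List[LongEvent]:
    """Report a long-term event when 'speech' is followed by 'gesture'
    no more than max_gap frames later (a hand-written temporal rule)."""
    found = []
    for i, (t1, label1) in enumerate(events):
        if label1 != "speech":
            continue
        for t2, label2 in events[i + 1:]:
            if t2 - t1 > max_gap:
                break
            if label2 == "gesture":
                found.append((t1, t2, "speak-then-gesture"))
                break
    return found


# Toy data: three frames of speech, three idle frames, three frames of motion.
frames = [(0.9, 0.1)] * 3 + [(0.1, 0.1)] * 3 + [(0.1, 0.8)] * 3
short_events = detect_short_term(frames)
long_events = recognize_long_term(short_events)
print(short_events)
print(long_events)
```

In the patent both stages are learned from data; the sketch only shows how temporally spaced short-term events can be combined into one longer-term event.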

First claim

Opening claim text (preview).

The invention claimed is:

1. A multimodal data analyzer comprising instructions embodied in one or more non-transitory machine accessible storage media, the multimodal data analyzer configured to cause a computing system comprising one or more computing devices to: access a set of time-varying instances of multimodal data having at least two different modalities, each instance of the multimodal data having a temporal component; algorithmically learn a feature representation of the temporal component of the multimodal data using a deep learning architecture and a fusion technique which enables a level of fusion of the multimodal data to be learned; and infer at least one of long-term feature representations or temporal dependencies between feature representations of the multimodal data.

2. The multimodal data analyzer of claim 1, configured to classify the set of multimodal data by applying a temporal discriminative model to the feature representations of the temporal component of the multimodal data having inferred temporal dependencies.

3. The multimodal data analyzer of claim 1, configured to, using the deep learning architecture, identify short-term temporal features in the multimodal data.

4. The multimodal data analyzer of claim 1, wherein the multimodal data comprises recorded speech and the multimodal data analyzer is configured to identify an intra-utterance dynamic feature of the recorded speech.

5. The multimodal data analyzer of claim 1, configured to, using the deep learning architecture, identify a long-term temporal feature in the multimodal data.

6. The multimodal data analyzer of claim 1, wherein the multimodal data comprises recorded speech and the multimodal data analyzer is configured to identify an inter-utterance dynamic feature in the recorded speech.

7. The multimodal data analyzer of claim 1, wherein the multimodal data comprises audio and video, and the multimodal data analyzer is configured to (i) identify short-term dynamic features in the audio and video data and (ii) infer a long-term dynamic feature based on a combination of temporally-spaced audio and video short-term dynamic features.

8. The multimodal data analyzer of claim 1, wherein the temporal deep learning architecture comprises a hybrid model having a generative component and a discriminative component, and wherein the multimodal data analyzer uses output of the generative component as input to the discriminative component.

9. The multimodal data analyzer of claim 1, wherein the multimodal data analyzer is configured to identify at least two different temporally-spaced events in the multimodal data and infer a correlation between the at least two different temporally-spaced multimodal events.

10. The multimodal data analyzer of claim 1, configured to algorithmically learn the feature representation of the temporal component of the multimodal data using an unsupervised machine learning technique.

11. The multimodal data analyzer of claim 1, configured to algorithmically infer missing data both within a modality and across modalities.

12. A method for classifying multimodal data, the multimodal data comprising data having at least two different modalities, the method comprising, with a computing system comprising one or more computing devices: accessing a set of time-varying instances of multimodal data, each instance of the multimodal data having a temporal component; and algorithmically classifying the set of time-varying instances of multimodal data using a discriminative temporal model, the discriminative temporal model trained using a feature representation generated by a deep temporal generative model based on the temporal component of the multimodal data and a fusion technique which enables a level of fusion of the multimodal data to be learned.

13. The method of claim 12, comprising identifying, within each modality of the multimodal data, a plurality of short-term features having different time scales.

14. The method of claim 13, comprising, for each modality within the multimodal data, inferring a long-term dynamic feature based on the short-term dynamic features identified within the modality.

15. The method of claim 13, comprising fusing short-term features across the different modalities of the multimodal data, and inferring a long-term dynamic feature based on the short-term features fused across the different modalities of the multimodal data.

16. A system embodied in one or more computer accessible storage media for algorithmically recognizing a multimodal event in data, the system comprising: a data access module to access a set of time-varying instances of multimodal data, each instance of the multimodal data having a temporal component; a classifier module to classify different instances in the set of time-varying instances of multimodal data as indicative of different short-term events, wherein the classifying includes the use of a discriminative temporal model trained using a feature representation generated by a deep temporal generative model based on the temporal component of the multimodal data and a fusion technique which enables a level of fusion of the multimodal data to be learned; and an event recognizer module to (i) recognize a longer-term multimodal event based on a plurality of multimodal short-term events identified by the classifier module and (ii) generate a semantic label for the recognized multimodal event.

17. The system of claim 16, wherein the event recognizer module is to use a discriminative temporal model to recognize the longer-term multimodal event.

18. The system of claim 17, wherein the system is to train the discriminative temporal model using a feature representation generated by the deep temporal generative model.

19. The system of claim 16, wherein the event recognizer module is to recognize the longer-term multimodal event by correlating a plurality of different short-term multimodal events having different time scales.

20. The system of claim 16, wherein the fusion technique comprises a neuro-inspired dynamic hybrid model, which enables the level of fusion of the multimodal data to be learned through the use of data driven learning.
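Claims 1, 12, 16 and 20 refer to a fusion technique in which the level of fusion is itself learned from data. One possible reading, sketched below under stated assumptions, is a single trainable gate that blends an early-fusion score (computed from the joint audio-video input) with a late-fusion score (per-modality scores averaged afterwards). The scoring functions, the scalar gate, the squared-error loss, and the toy data are all hypothetical and chosen for illustration; the claims do not specify this mechanism.

```python
import math
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (audio feature, video feature, target)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def early_score(a: float, v: float) -> float:
    # Early fusion (assumed form): one score from the joint input,
    # here the audio-video interaction term.
    return a * v


def late_score(a: float, v: float) -> float:
    # Late fusion (assumed form): score each modality alone, then average.
    return 0.5 * (a + v)


def fused_score(a: float, v: float, gate: float) -> float:
    # alpha near 1 means mostly early fusion, near 0 mostly late fusion.
    alpha = sigmoid(gate)
    return alpha * early_score(a, v) + (1.0 - alpha) * late_score(a, v)


def learn_gate(samples: List[Sample], steps: int = 500,
               lr: float = 0.5) -> float:
    """Fit the gate by gradient descent on mean squared error."""
    gate = 0.0  # start halfway between early and late fusion
    for _ in range(steps):
        grad = 0.0
        for a, v, y in samples:
            alpha = sigmoid(gate)
            e, l = early_score(a, v), late_score(a, v)
            pred = alpha * e + (1.0 - alpha) * l
            # d/d(gate) of (pred - y)^2, using d(alpha)/d(gate) = alpha(1 - alpha)
            grad += 2.0 * (pred - y) * (e - l) * alpha * (1.0 - alpha)
        gate -= lr * grad / len(samples)
    return gate


inputs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
# Target fires only when both modalities are active: favours early fusion.
and_samples = [(a, v, a * v) for a, v in inputs]
# Target is the per-modality average: favours late fusion.
avg_samples = [(a, v, 0.5 * (a + v)) for a, v in inputs]

alpha_and = sigmoid(learn_gate(and_samples))
alpha_avg = sigmoid(learn_gate(avg_samples))
print("learned fusion weight, joint target:", round(alpha_and, 3))
print("learned fusion weight, averaged target:", round(alpha_avg, 3))
```

With the joint target the learned weight moves toward early fusion, and with the averaged target it moves toward late fusion, which is the sense in which the data decides the fusion level in this sketch.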

Assignees

Stanford Res Inst Int

Inventors

Classifications

  • Graphical models, e.g. Bayesian networks · CPC title

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

  • Physics · mapped topic

  • G06N99/005 · Primary

    Physics · mapped topic

  • Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9875445B2 cover?
Technologies for analyzing temporal components of multimodal data to detect short-term multimodal events, determine relationships between short-term multimodal events, and recognize long-term multimodal events, using a deep learning architecture, are disclosed.
Who is the assignee on this patent?
Stanford Res Inst Int
What technology area does this patent fall under?
Primary CPC classification G06N99/005. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 23, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).