Technique for controlling virtual image generation system using emotional states of user
US-2018024626-A1 · Jan 25, 2018 · US
US12586409B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12586409-B2 |
| Application number | US-202318101856-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 26, 2023 |
| Priority date | Jan 26, 2022 |
| Publication date | Mar 24, 2026 |
| Grant date | Mar 24, 2026 |
Official abstract text for this publication.
A method for detecting an emotional state of a user includes obtaining a first data stream indicative of facial appearance and gaze direction of the user as the user is viewing a scene, determining, based on the first data stream, facial expression feature information indicative of emotional facial expression of the user, obtaining a second data stream indicative of visual content in a field of view of the user, determining, based on the second data stream, visual feature information indicative of visual content in the scene, determining emotional state information based on analyzing the facial expression feature information determined based on the first data stream and the visual feature information determined based on the second data stream, and performing an operation with respect to the emotional state information, wherein the emotional state information is indicative of the emotional state of the user.
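The order of operations in the abstract can be sketched as a small orchestration function. This is only an illustration of the data flow, not the patented implementation: the names `EmotionalState`, `detect_emotional_state`, `face_model`, `scene_model`, `joint_model`, and `operation` are hypothetical, and the models are passed in as plain callables because the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Vector = Sequence[float]


@dataclass
class EmotionalState:
    """Hypothetical container for the emotional state information."""
    label: str
    confidence: float


def detect_emotional_state(
    face_stream: object,   # first data stream: facial appearance and gaze direction
    scene_stream: object,  # second data stream: visual content in the field of view
    face_model: Callable[[object], Vector],
    scene_model: Callable[[object], Vector],
    joint_model: Callable[[Vector, Vector], EmotionalState],
    operation: Callable[[EmotionalState], None],
) -> EmotionalState:
    # Facial expression feature information from the first stream.
    face_feats = face_model(face_stream)
    # Visual feature information from the second stream.
    scene_feats = scene_model(scene_stream)
    # The emotional state is determined from both feature sets together.
    state = joint_model(face_feats, scene_feats)
    # "Performing an operation": here, whatever the caller supplies (e.g. storing).
    operation(state)
    return state


# Usage with toy stand-ins (assumptions, not trained models).
log = []
state = detect_emotional_state(
    face_stream=[0.2, 0.9],
    scene_stream=[0.7],
    face_model=lambda s: list(s),
    scene_model=lambda s: list(s),
    joint_model=lambda f, v: EmotionalState(
        "non-neutral" if sum(f) + sum(v) > 1.0 else "neutral", 1.0
    ),
    operation=log.append,  # store the result for subsequent use
)
```

Passing the models in as callables keeps the sketch neutral about how each feature extractor is built; only the sequence of steps is taken from the abstract.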
Opening claim text (preview).
What is claimed is:

1. A method for detecting an emotional state of a user, the method comprising: obtaining, by a processor, a first data stream indicative of facial appearance and gaze direction of the user as the user is viewing a scene; determining, by the processor based on the first data stream, a facial expression feature vector indicative of emotional facial expression of the user as the user is viewing the scene; obtaining, by the processor, a second data stream indicative of visual content in a field of view of the user as the user is viewing the scene; determining, by the processor based on the second data stream, a visual content feature vector indicative of visual content in the scene; fusing, by the processor, the facial expression feature vector with the visual content feature vector to generate a fused feature vector that includes fused features of both the emotional facial expression of the user and the visual content in the scene; analyzing, by the processor, the fused feature vector using a neural network trained to provide a scaling vector, the scaling vector including respective scalars that reflect respective degrees of importance of respective channels in the fused feature vector; determining, by the processor, emotional state information based on analyzing the fused feature vector scaled based on the scaling vector; and performing, by the processor, an operation with respect to the emotional state information, wherein the emotional state information is indicative of the emotional state of the user.

2. The method of claim 1, wherein performing the operation with respect to the emotional state information comprises performing one or more of i) inferring, by the processor, further information from the emotional state information, ii) causing, by the processor, one or both of the emotional state information and the further information inferred from the emotional state information to be provided to the user, or iii) storing, by the processor in a memory, one or both of the emotional state information and the further information inferred from the emotional state information for subsequent use.

3. The method of claim 1, wherein determining the emotional state information includes: determining, based on the second data stream, semantic information corresponding to the visual content in the scene, identifying, based on the visual content feature vector indicative of the visual content in the scene and the semantic information corresponding to the visual content in the scene, a visual attention region of interest in the scene, and generating a semantic representation summarizing the visual content in the visual attention region of interest in the scene, wherein the semantic representation indicates a cause for the emotional state of the user.

4. The method of claim 1, wherein: obtaining the first data stream comprises obtaining one or more images depicting an eye region of a face of the user, and determining the facial expression feature vector includes: extracting eye expression features and eye pupil information from the one or more images depicting the eye region of the face of the user, and generating an eye feature vector that includes the eye expression features concatenated with the eye pupil information.

5. The method of claim 4, further comprising: prior to obtaining the second data stream, detecting, by the processor based on the eye feature vector, a non-neutral emotional state of the user, and in response to detecting the non-neutral emotional state of the user, triggering, by the processor, capture of the second data stream to capture the visual content in the field of view of the user.

6. The method of claim 5, wherein detecting the non-neutral emotional state of the user comprises classifying the eye feature vector into one of a neutral emotional state of the user and the non-neutral emotional state of the user.

7. The method of claim 4, wherein determining the visual content feature vector based on the second data stream includes: identifying, based [on] the second data stream, a plurality of regions of interest in the scene, obtaining respective visual feature vectors corresponding to the plurality of regions of interest in the scene, and selecting a predetermined number of regions of interest that are closest to a gaze point of the user, wherein the gaze point of the user is determined based on the first data stream.

8. The method of claim 7, wherein: fusing the facial expression feature vector with the visual content feature vector includes generating a concatenated feature vector including the eye feature vector concatenated with the respective visual feature vectors corresponding to the predetermined number of regions of interest that are closest to the gaze point of the user, analyzing the fused feature vector includes determining, based [on] the concatenated feature vector, the scaling vector comprising importance scalars for respective features of the concatenated feature vector, and generating a weighted concatenated feature vector by channel-wise multiplication between the scaling vector and the concatenated feature vector, and determining the emotional state includes classifying the weighted concatenated feature vector into an emotional state classes among a plurality of predetermined emotional state classes.

9. The method of claim 8, wherein determining the emotional state information further includes: determining, based on the second data stream, respective semantic feature vectors corresponding to the regions of interest that are closest to the gaze point of the user, identifying, based on the respective visual feature vectors and the respective semantic feature vectors corresponding to the regions of interest that are closest to the gaze point of the user, a visual attention region of interest that evokes the emotional state of the user, and generating, based on a visual feature vector corresponding to the visual attention region of interest in the scene, a semantic representation summarizing the visual content in the visual attention region of interest in the scene, wherein the semantic representation indicates a cause for the emotional state of the user.

10. The method of claim 9, wherein determining the emotional state information further includes determining, based on the scaling vector, an influence score indicating a degree of emotional impact of the visual content on the user, and wherein generating the semantic representation comprises generating the semantic representation when the degree of emotional impact exceeds a predetermined threshold.

11. A method for detecting an emotional state of a user, the method comprising: obtaining, by a processor, a first data stream indicative of facial appearance and gaze direction of the user as the user is viewing a scene; determining, by the processor based on the first data stream, a facial expression feature vector indicative of emotional facial expression of the user as the user is viewing the scene; obtaining, by the processor, a second data stream indicative of visual content in a field of view of the user as the user is viewing the scene; determining, by the processor based on the second data stream, a visual content feature vector indicative of the visual content in the scene; fusing, by the processor, the facial expression feature vector with the visual content feature vector to generate a fused feature vector that includes fused features of both the emotional facial expression of the user and the visual content in the scene; analyzing, by the processor, the fused feature vector using a neural network trained to provide a sc
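The gaze-based region selection, concatenation, channel scaling, and classification steps recited in claims 1, 7, and 8 can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the claimed implementation: the function names, the vector sizes, the two-layer gating network with a sigmoid output, and the random untrained weights are all hypothetical choices made so the example runs. The claims only require a trained neural network that outputs one importance scalar per channel.

```python
import numpy as np


def select_rois_near_gaze(roi_centers, roi_features, gaze_point, k):
    """Keep the k regions of interest whose centers lie closest to the gaze point."""
    dists = np.linalg.norm(roi_centers - gaze_point, axis=1)
    nearest = np.argsort(dists)[:k]
    return nearest, roi_features[nearest]


def init_params(dim, hidden, n_classes, rng):
    """Random placeholder weights; a real system would use trained values."""
    return {
        "w1": rng.normal(scale=0.1, size=(hidden, dim)), "b1": np.zeros(hidden),
        "w2": rng.normal(scale=0.1, size=(dim, hidden)), "b2": np.zeros(dim),
        "wc": rng.normal(scale=0.1, size=(n_classes, dim)), "bc": np.zeros(n_classes),
    }


def classify_fused(eye_vec, roi_vecs, params):
    # Fuse by concatenating the eye feature vector with the selected ROI vectors.
    fused = np.concatenate([eye_vec, roi_vecs.ravel()])
    # Assumed gating network: ReLU hidden layer, then a sigmoid per channel.
    hidden = np.maximum(0.0, params["w1"] @ fused + params["b1"])
    scale = 1.0 / (1.0 + np.exp(-(params["w2"] @ hidden + params["b2"])))
    # Channel-wise multiplication between the scaling vector and the fused vector.
    weighted = scale * fused
    # Softmax over a linear classifier gives one probability per emotion class.
    logits = params["wc"] @ weighted + params["bc"]
    exp = np.exp(logits - logits.max())
    return scale, exp / exp.sum()


# Usage with synthetic data: 4 candidate regions, 3-dim visual features, 4-dim eye vector.
rng = np.random.default_rng(0)
roi_centers = np.array([[0.1, 0.1], [0.5, 0.5], [0.9, 0.2], [0.45, 0.6]])
roi_features = rng.normal(size=(4, 3))
eye_vec = rng.normal(size=4)
gaze_point = np.array([0.5, 0.55])

nearest, near_feats = select_rois_near_gaze(roi_centers, roi_features, gaze_point, k=2)
params = init_params(dim=4 + 2 * 3, hidden=5, n_classes=3, rng=rng)
scale, probs = classify_fused(eye_vec, near_feats, params)
```

The sigmoid keeps every importance scalar between 0 and 1, so the scaling step can only attenuate channels, never amplify them; that is a property of this sketch, not something the claims state.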
with means for monitoring data relating to the user, e.g. head-tracking, eye-tracking · CPC title
Head mounted · CPC title
Eyeglass type (eyeglass details G02C) · CPC title
Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items (segmenting video sequences G06V20/49) · CPC title
Eye tracking input arrangements (G06F3/015 takes precedence) · CPC title