Who is the assignee on this patent?

Univ Michigan Regents, Chang Yuhu, Zhao Yingying, and 2 more

What technology area does this patent fall under?

Primary CPC classification G02B27/017. Mapped technology areas include Physics.

When was this patent published?

Publication date Mar 24, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Detecting emotional state of a user based on facial appearance and visual perception information

US12586409B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12586409-B2
Application number	US-202318101856-A
Country	US
Kind code	B2
Filing date	Jan 26, 2023
Priority date	Jan 26, 2022
Publication date	Mar 24, 2026
Grant date	Mar 24, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for detecting an emotional state of a user includes obtaining a first data stream indicative of facial appearance and gaze direction of the user as the user is viewing a scene, determining, based on the first data stream, facial expression feature information indicative of emotional facial expression of the user, obtaining a second data stream indicative of visual content in a field of view of the user, determining, based on the second data stream, visual feature information indicative of visual content in the scene, determining emotional state information based on analyzing the facial expression feature information determined based on the first data stream and the visual feature information determined based on the second data stream, and performing an operation with respect to the emotional state information, wherein the emotional state information is indicative of the emotional state of the user.

First claim

Opening claim text (preview).

What is claimed is:

1. A method for detecting an emotional state of a user, the method comprising: obtaining, by a processor, a first data stream indicative of facial appearance and gaze direction of the user as the user is viewing a scene; determining, by the processor based on the first data stream, a facial expression feature vector indicative of emotional facial expression of the user as the user is viewing the scene; obtaining, by the processor, a second data stream indicative of visual content in a field of view of the user as the user is viewing the scene; determining, by the processor based on the second data stream, a visual content feature vector indicative of visual content in the scene; fusing, by the processor, the facial expression feature vector with the visual content feature vector to generate a fused feature vector that includes fused features of both the emotional facial expression of the user and the visual content in the scene; analyzing, by the processor, the fused feature vector using a neural network trained to provide a scaling vector, the scaling vector including respective scalars that reflect respective degrees of importance of respective channels in the fused feature vector; determining, by the processor, emotional state information based on analyzing the fused feature vector scaled based on the scaling vector; and performing, by the processor, an operation with respect to the emotional state information, wherein the emotional state information is indicative of the emotional state of the user.

2. The method of claim 1, wherein performing the operation with respect to the emotional state information comprises performing one or more of i) inferring, by the processor, further information from the emotional state information, ii) causing, by the processor, one or both of the emotional state information and the further information inferred from the emotional state information to be provided to the user, or iii) storing, by the processor in a memory, one or both of the emotional state information and the further information inferred from the emotional state information for subsequent use.

3. The method of claim 1, wherein determining the emotional state information includes: determining, based on the second data stream, semantic information corresponding to the visual content in the scene, identifying, based on the visual content feature vector indicative of the visual content in the scene and the semantic information corresponding to the visual content in the scene, a visual attention region of interest in the scene, and generating a semantic representation summarizing the visual content in the visual attention region of interest in the scene, wherein the semantic representation indicates a cause for the emotional state of the user.

4. The method of claim 1, wherein: obtaining the first data stream comprises obtaining one or more images depicting an eye region of a face of the user, and determining the facial expression feature vector includes: extracting eye expression features and eye pupil information from the one or more images depicting the eye region of the face of the user, and generating an eye feature vector that includes the eye expression features concatenated with the eye pupil information.

5. The method of claim 4, further comprising: prior to obtaining the second data stream, detecting, by the processor based on the eye feature vector, a non-neutral emotional state of the user, and in response to detecting the non-neutral emotional state of the user, triggering, by the processor, capture of the second data stream to capture the visual content in the field of view of the user.

6. The method of claim 5, wherein detecting the non-neutral emotional state of the user comprises classifying the eye feature vector into one of a neutral emotional state of the user and the non-neutral emotional state of the user.

7. The method of claim 4, wherein determining the visual content feature vector based on the second data stream includes: identifying, based the second data stream, a plurality of regions of interest in the scene, obtaining respective visual feature vectors corresponding to the plurality of regions of interest in the scene, and selecting a predetermined number of regions of interest that are closest to a gaze point of the user, wherein the gaze point of the user is determined based on the first data stream.

8. The method of claim 7, wherein: fusing the facial expression feature vector with the visual content feature vector includes generating a concatenated feature vector including the eye feature vector concatenated with the respective visual feature vectors corresponding to the predetermined number of regions of interest that are closest to the gaze point of the user, analyzing the fused feature vector includes determining, based the concatenated feature vector, the scaling vector comprising importance scalars for respective features of the concatenated feature vector, and generating a weighted concatenated feature vector by channel-wise multiplication between the scaling vector and the concatenated feature vector, and determining the emotional state includes classifying the weighted concatenated feature vector into an emotional state classes among a plurality of predetermined emotional state classes.

9. The method of claim 8, wherein determining the emotional state information further includes: determining, based on the second data stream, respective semantic feature vectors corresponding to the regions of interest that are closest to the gaze point of the user, identifying, based on the respective visual feature vectors and the respective semantic feature vectors corresponding to the regions of interest that are closest to the gaze point of the user, a visual attention region of interest that evokes the emotional state of the user, and generating, based on a visual feature vector corresponding to the visual attention region of interest in the scene, a semantic representation summarizing the visual content in the visual attention region of interest in the scene, wherein the semantic representation indicates a cause for the emotional state of the user.

10. The method of claim 9, wherein determining the emotional state information further includes determining, based on the scaling vector, an influence score indicating a degree of emotional impact of the visual content on the user, and wherein generating the semantic representation comprises generating the semantic representation when the degree of emotional impact exceeds a predetermined threshold.

11. A method for detecting an emotional state of a user, the method comprising: obtaining, by a processor, a first data stream indicative of facial appearance and gaze direction of the user as the user is viewing a scene; determining, by the processor based on the first data stream, a facial expression feature vector indicative of emotional facial expression of the user as the user is viewing the scene; obtaining, by the processor, a second data stream indicative of visual content in a field of view of the user as the user is viewing the scene; determining, by the processor based on the second data stream, a visual content feature vector indicative of the visual content in the scene; fusing, by the processor, the facial expression feature vector with the visual content feature vector to generate a fused feature vector that includes fused features of both the emotional facial expression of the user and the visual content in the scene; analyzing, by the processor, the fused feature vector using a neural network trained to provide a sc
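For readers who find the claim language dense, the data flow described in claims 7 and 8 (keep the regions of interest nearest the gaze point, concatenate them with the eye feature vector, compute per-channel importance scalars, multiply channel-wise, then classify) can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation. All function names, the toy feature values, the single linear layer with a sigmoid standing in for the trained neural network, the linear classifier, and the class labels are assumptions made for this example; the claims do not specify any of them.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Point = Tuple[float, float]


def select_nearest_rois(centers: Sequence[Point], features: Sequence[Vector],
                        gaze: Point, k: int) -> List[Vector]:
    """Keep the feature vectors of the k regions whose centers are closest to the gaze point."""
    order = sorted(range(len(centers)), key=lambda i: math.dist(centers[i], gaze))
    return [list(features[i]) for i in order[:k]]


def concatenate(eye: Vector, rois: Sequence[Vector]) -> Vector:
    """Build the concatenated (fused) vector: eye features followed by each selected region."""
    fused = list(eye)
    for vec in rois:
        fused.extend(vec)
    return fused


def scaling_vector(fused: Vector, weights: Sequence[Vector], biases: Vector) -> Vector:
    """ASSUMPTION: one linear layer plus sigmoid stands in for the trained network.

    Returns one importance scalar in (0, 1) per channel of the fused vector.
    """
    return [
        1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(row, fused)) + b)))
        for row, b in zip(weights, biases)
    ]


def apply_scaling(fused: Vector, scale: Vector) -> Vector:
    """Channel-wise multiplication between the scaling vector and the fused vector."""
    return [s * x for s, x in zip(scale, fused)]


def classify(weighted: Vector, class_weights: Sequence[Vector], labels: Sequence[str]) -> str:
    """ASSUMPTION: a plain linear scorer; returns the label with the highest score."""
    scores = [sum(w * x for w, x in zip(row, weighted)) for row in class_weights]
    return labels[scores.index(max(scores))]


# Toy inputs (hypothetical values, chosen only to exercise the functions).
eye_features = [0.3, 0.7]
roi_centers = [(10.0, 10.0), (200.0, 120.0), (90.0, 95.0)]
roi_features = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]
gaze_point = (100.0, 100.0)

nearest = select_nearest_rois(roi_centers, roi_features, gaze_point, k=2)
fused = concatenate(eye_features, nearest)

# Identity weights and zero biases: each scalar is simply sigmoid(channel value).
n = len(fused)
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
scale = scaling_vector(fused, identity, [0.0] * n)
weighted = apply_scaling(fused, scale)

labels = ["class_a", "class_b"]  # placeholder class names
class_weights = [[1.0, 1.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0, 1.0, 1.0]]
print(classify(weighted, class_weights, labels))
```

In a real system the weights would come from training rather than being fixed by hand, and the feature vectors would be produced by image models rather than typed in. The sketch is meant only to make the order of operations in the claims easier to follow.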

Assignees

Inventors

Classifications

G02B27/0093
with means for monitoring data relating to the user, e.g. head-tracking, eye-tracking · CPC title
G02B27/017 (Primary)
Head mounted · CPC title
G02B2027/0178
Eyeglass type (eyeglass details G02C) · CPC title
G06V20/41
Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items (segmenting video sequences G06V20/49) · CPC title
G06F3/013
Eye tracking input arrangements (G06F3/015 takes precedence) · CPC title

Patent family

Related publications grouped by family.

View patent family 87314321

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12586409B2 cover?: A method for detecting an emotional state of a user includes obtaining a first data stream indicative of facial appearance and gaze direction of the user as the user is viewing a scene, determining, based on the first data stream, facial expression feature information indicative of emotional facial expression of the user, obtaining a second data stream indicative of visual content in a field of…

Related patents

Technique for controlling virtual image generation system using emotional states of user

Emotional engagement detector

Systems and methods for generating media asset representations based on user emotional responses
