What technology area does this patent fall under?

Primary CPC classification G10L15/1815. Mapped technology areas include Physics.

When was this patent published?

Published Dec 26, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Utilizing machine learning models to generate automated empathetic conversations

US11854540B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11854540-B2
Application number	US-202117301489-A
Country	US
Kind code	B2
Filing date	Apr 5, 2021
Priority date	Jan 8, 2021
Publication date	Dec 26, 2023
Grant date	Dec 26, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A device may receive text data, audio data, and video data associated with a user, and may process the received data, with a first model, to determine a stress level of the user. The device may process the received data, with second models, to determine depression levels of the user, and may combine the depression levels to identify an overall depression level. The device may process the received data, with a third model, to determine a continuous affect prediction, and may process the received data, with a fourth model, to determine an emotion of the user. The device may process the received data, with a fifth model, to determine a response to the user, and may utilize a sixth model to determine a context for the response. The device may utilize seventh models to generate contextual conversation data, and may perform actions based on the contextual conversational data.
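The order of operations in the abstract can be sketched as a simple data flow. This is a minimal illustration only, not the patented implementation: the `UserInput` type, the `converse` function, and every callable passed to it are hypothetical placeholders, and the stub models in the usage example return fixed values.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass(frozen=True)
class UserInput:
    """Hypothetical container for the three modalities named in the abstract."""
    text: str
    audio: bytes  # assumed raw bytes; the abstract does not specify an encoding
    video: bytes


def converse(
    inp: UserInput,
    stress_model: Callable[[UserInput], float],
    depression_models: Sequence[Callable[[UserInput], float]],
    combine: Callable[[Sequence[float]], float],
    affect_model: Callable[[UserInput], Any],
    emotion_model: Callable[[UserInput], str],
    response_model: Callable[[UserInput], str],
    context_model: Callable[..., Any],
    dialog_manager: Callable[..., Any],
) -> Any:
    """Run the steps in the order the abstract lists them."""
    stress = stress_model(inp)                         # first model
    levels = [m(inp) for m in depression_models]       # second models
    depression = combine(levels)                       # overall depression level
    affect = affect_model(inp)                         # third model
    emotion = emotion_model(inp)                       # fourth model
    response = response_model(inp)                     # fifth model
    context = context_model(response, stress, depression, affect, emotion)  # sixth model
    return dialog_manager(inp, response, context)      # seventh models


# Usage with stub models that return fixed, made-up values.
demo = UserInput(text="I feel tired", audio=b"", video=b"")
result = converse(
    demo,
    stress_model=lambda i: 0.4,
    depression_models=[lambda i: 0.2, lambda i: 0.3, lambda i: 0.1],
    combine=lambda xs: sum(xs) / len(xs),
    affect_model=lambda i: (0.1, -0.2),
    emotion_model=lambda i: "sadness",
    response_model=lambda i: "That sounds hard.",
    context_model=lambda r, s, d, a, e: {"emotion": e, "stress": s},
    dialog_manager=lambda i, r, c: {"reply": r, "context": c},
)
```

Passing the models in as callables keeps the sketch independent of any particular model family; the claims name specific ones (for example a support vector machine for stress), which could be substituted for the stubs.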

First claim

Opening claim text (preview).

What is claimed is:
1. A method, comprising: receiving, by a device and from a user device, text data identifying text input by a user of the user device, audio data identifying audio associated with the user, and video data identifying a video associated with the user; processing, by the device, the text data, the audio data, and the video data, with a support vector machine model, to determine a stress level of the user; processing, by the device, the text data, the audio data, and the video data, with different regression models, to determine a first depression level of the user based on the text data, a second depression level of the user based on the audio data, and a third depression level of the user based on the video data; combining, by the device, the first depression level, the second depression level, and the third depression level to identify an overall depression level of the user; processing, by the device, the text data, the audio data, and the video data, with a deep learning convolutional neural network model, to determine a continuous affect prediction for the user; processing, by the device, the text data, the audio data, and the video data, with a classifier model, to determine an emotion of the user; processing, by the device, the text data, the audio data, and the video data, with a generative pretrained transformer language model, to determine a response to the user; utilizing, by the device, a plug and play language model to determine a context for the response, based on the response, the stress level, the overall depression level, the continuous affect prediction, and the emotion; utilizing, by the device, one or more dialog manager models to generate contextual conversation data, based on the text data, the audio data, the video data, the response, and the context; and performing, by the device, one or more actions based on the contextual conversational data.
2. The method of claim 1, wherein processing the text data, the audio data, and the video data, with the support vector machine model, to determine the stress level of the user comprises: determining a first stress level of the user based on the text input by the user, as provided in the text data; determining a second stress level of the user based on an intonation of a voice of the user, a rhythm of the voice, a pitch of the voice, an intensity of the voice, a loudness of the voice, and a jitter of the voice, as provided in the audio data; determining a third stress level of the user based on a head pose of the user, an eye gaze of the user, and an intensity of a facial muscle contraction of the user, as provided in the video data; and combining the first stress level, the second stress level, and the third stress level to determine the stress level of the user.
3. The method of claim 1, wherein processing the text data, the audio data, and the video data, with the different regression models, to determine the first depression level of the user based on the text data, the second depression level of the user based on the audio data, and the third depression level of the user based on the video data comprises: processing the text data, with a first regression model, to determine the first depression level of the user; processing the audio data, with a second regression model, to determine the second depression level of the user; and processing the video data, with a third regression model, to determine the third depression level of the user.
4. The method of claim 1, wherein combining the first depression level, the second depression level, and the third depression level to identify the overall depression level of the user comprises: assigning a first weight to the first depression level to generate a first weighted depression level; assigning a second weight to the second depression level to generate a second weighted depression level; assigning a third weight to the third depression level to generate a third weighted depression level; and aggregating the first weighted depression level, the second weighted depression level, and the third weighted depression level to identify the overall depression level of the user.
5. The method of claim 1, wherein the continuous affect prediction for the user includes an arousal prediction for the user and a valence prediction for the user.
6. The method of claim 1, wherein the deep learning convolutional neural network model includes a multi-modal sequence-to-sequence model.
7. The method of claim 1, wherein the classifier model includes a random forest classifier model, and wherein the emotion of the user includes one or more of happiness, sadness, anger, surprise, neutral, contempt, fear, or disgust.
8. A device, comprising: one or more memories; and one or more processors, communicatively coupled to the one or more memories, configured to: receive, from a user device, text data identifying text input by a user of the user device, audio data identifying audio associated with the user, and video data identifying a video associated with the user; process the text data, the audio data, and the video data, with a support vector machine model, to determine a stress level of the user; process the text data, the audio data, and the video data, with different regression models, to determine a first depression level of the user based on the text data, a second depression level of the user based on the audio data, and a third depression level of the user based on the video data; assign weights to the first depression level, the second depression level, and the third depression level to generate a first weighted depression level, a second weighted depression level, and a third weighted depression level; aggregate the first weighted depression level, the second weighted depression level, and the third weighted depression level to identify an overall depression level of the user; process the text data, the audio data, and the video data, with a deep learning convolutional neural network model, to determine a continuous affect prediction for the user; process the text data, the audio data, and the video data, with a classifier model, to determine an emotion of the user; process the text data, the audio data, and the video data, with a generative pretrained transformer language model, to determine a response to the user; utilize a plug and play language model to determine a context for the response, based on the response, the stress level, the overall depression level, the continuous affect prediction, and the emotion; utilize one or more dialog manager models to generate contextual conversation data, based on the text data, the audio data, the video data, the response, and the context; and perform one or more actions based on the contextual conversational data.
9. The device of claim 8, wherein the generative pretrained transformer language model includes a sentiment portion that is trained based on an emotion class and by applying a cross-entropy loss to the sentiment portion.
10. The device of claim 8, wherein the plug and play language model includes a language model and an attribute model, and wherein the one or more processors, when utilizing the plug and play language model to determine the context for the response, are configured to: process the response, the stress level, the overall depression level, the continuous affect prediction, and the emotion, with the attribute model, to determine attributes and gradients; perform a forward pass with the language model, of the plug and play language model, to compute a likelihood of the attribute; perform a backward pass with the language model, of the plug and play language model, to update internal lat
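The weighting and aggregation recited in claim 4 (and in the device of claim 8) can be illustrated with a short sketch. The claims do not say how the weights are chosen or how the weighted levels are aggregated, so the normalized weighted sum below is an assumption, and the function name `overall_depression_level` and the example numbers are hypothetical.

```python
from typing import Sequence


def overall_depression_level(levels: Sequence[float], weights: Sequence[float]) -> float:
    """Combine per-modality depression levels (text, audio, video) into one score.

    Assumption: aggregation is a weighted average; the claims only say
    "assigning" weights and "aggregating" the weighted levels.
    """
    if len(levels) != len(weights):
        raise ValueError("levels and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted depression level for each modality.
    weighted = [w * level for w, level in zip(weights, levels)]
    # Aggregate and normalize so the result stays on the input scale.
    return sum(weighted) / total_weight


# Made-up example: text, audio, and video levels, with video weighted double.
score = overall_depression_level([0.2, 0.4, 0.6], [1.0, 1.0, 2.0])
```

Normalizing by the total weight is a design choice that keeps the overall level comparable to the individual levels; an unnormalized sum would also satisfy the wording of the claim.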

Assignees

Accenture Global Solutions Ltd

Classifications

G06N3/09
Supervised learning · CPC title
G06N3/0455
Auto-encoder networks; Encoder-decoder networks · CPC title
G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title
G06N3/0442
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
G10L15/1815 (Primary)
Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning · CPC title

Patent family

Related publications grouped by family.

Patent family 82405243

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11854540B2 cover?: A device may receive text data, audio data, and video data associated with a user, and may process the received data, with a first model, to determine a stress level of the user. The device may process the received data, with second models, to determine depression levels of the user, and may combine the depression levels to identify an overall depression level. The device may process the receiv…
Who is the assignee on this patent?: Accenture Global Solutions Ltd