Real-time emotion recognition from audio signals

US2016019915A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2016019915-A1
Application number  US-201414336847-A
Country             US
Kind code           A1
Filing date         Jul 21, 2014
Priority date       Jul 21, 2014
Publication date    Jan 21, 2016
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, methods, and computer-readable storage media are provided for recognizing emotion in audio signals in real-time. An audio signal is detected and a rapid audio fingerprint is computed on a user's computing device. One or more features is extracted from the audio fingerprint and compared with features associated with defined emotions to determine relative degrees of similarity. Confidence scores are computed for the defined emotions based on the relative degrees of similarity and it is determined whether a confidence score for one or more particular emotions exceeds a threshold confidence score. If it is determined that a threshold confidence score for one or more particular emotions is exceeded, the particular emotion or emotions are associated with the audio signal. As desired, various action then may be initiated based upon the emotion/emotions associated with the audio signal.
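
The scoring steps in the abstract (compare features, derive confidence scores, apply a threshold) can be sketched roughly as follows. This is a minimal illustration, not the method disclosed in the patent: the abstract does not name a similarity measure, so inverse Euclidean distance is assumed here, and the emotion profiles, the feature values, the 0.4 threshold, and all function names are hypothetical.

```python
import math

# Hypothetical reference feature vectors for each defined emotion.
# The numbers are invented for illustration only.
EMOTION_PROFILES = {
    "anger": [0.9, 0.8, 0.3],
    "happiness": [0.7, 0.4, 0.8],
    "sadness": [0.2, 0.1, 0.3],
}


def similarity(features, reference):
    """Assumed similarity measure: 1 / (1 + Euclidean distance)."""
    distance = math.sqrt(sum((f - r) ** 2 for f, r in zip(features, reference)))
    return 1.0 / (1.0 + distance)


def confidence_scores(features, profiles=EMOTION_PROFILES):
    """Turn relative degrees of similarity into scores that sum to 1."""
    sims = {emotion: similarity(features, ref) for emotion, ref in profiles.items()}
    total = sum(sims.values())
    return {emotion: sim / total for emotion, sim in sims.items()}


def associate_emotions(features, threshold=0.4, profiles=EMOTION_PROFILES):
    """Return the emotions whose confidence score exceeds the threshold."""
    scores = confidence_scores(features, profiles)
    return [emotion for emotion, score in scores.items() if score > threshold]
```

With these made-up profiles, `associate_emotions([0.9, 0.8, 0.3])` returns `["anger"]`, while an ambiguous vector such as `[0.5, 0.4, 0.5]` returns an empty list because no score clears the threshold. That second case mirrors the abstract's wording: an emotion is associated with the audio signal only if its confidence score exceeds the threshold.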

First claim

Opening claim text (preview).

What is claimed is:

1. One or more computer-readable storage media storing computer-useable instructions that, when used by one or more computing devices, cause the one or more computing devices to perform a method for recognizing emotion in audio signals, the method comprising: detecting an audio signal; computing an audio fingerprint from the detected audio signal; computing confidence scores for one or more of a plurality of defined emotions based upon the audio fingerprint; and associating one or more emotions with the audio signal based upon the computed confidence scores.

2. The one or more computer-readable storage media of claim 1, wherein the method further comprises determining that the confidence score computed for one or more particular emotions of the plurality of defined emotions exceeds a confidence score threshold, and wherein associating one or more emotions with the audio signal based upon the computed confidence scores comprises associating the one or more particular emotions of the plurality of defined emotions with the audio signal.

3. The one or more computer-readable storage media of claim 2, wherein detecting an audio signal comprises detecting a first audio signal at a first time and a second audio signal at a second time, both the first and second audio signals being associated with a particular speaker, and wherein the confidence score threshold is determined based, at least in part, upon at least one change between the first and second audio signals.

4. The one or more computer-readable storage media of claim 1, wherein the method further comprises initiating an action based, at least in part, upon the one or more emotions associated with the audio signal.

5. The one or more computer-readable storage media of claim 4, wherein the method further comprises recognizing at least one word from the detected audio signal, and wherein initiating an action comprises initiating the action based upon the one or more emotions associated with the audio signal and the at least one word.

6. The one or more computer-readable storage media of claim 1, wherein the method further comprises: extracting at least one feature from the audio fingerprint; and comparing the extracted at least one feature with features associated with the plurality of defined emotions to determine relative degrees of similarity.

7. The one or more computer-readable storage media of claim 6, wherein computing confidence scores for one or more of the plurality of defined emotions comprises computing confidence scores for one or more of the plurality of defined emotions based upon the relative degrees of similarity.

8. The one or more computer-readable storage media of claim 6, wherein the features associated with the plurality of defined emotions are aggregated based upon a plurality of speakers from whom audio fingerprints have been extracted.

9. The one or more computer-readable storage media of claim 6, wherein the method further comprises identifying a speaker of the audio signal based upon the computed audio fingerprint.

10. The one or more computer-readable storage media of claim 9, wherein the features associated with the plurality of defined emotions are specific to the identified speaker.

11. The one or more computer-readable storage media of claim 1, wherein the method further comprises altering at least one of the confidence scores based upon an additional signal, and wherein the additional signal is a non-audio signal.

12. A method being performed by one or more computing devices including at least one processor, the method for recognizing emotion in audio signals, the method comprising: detecting an audio signal; computing an audio fingerprint from the detected audio signal; extracting at least one feature from the audio fingerprint; comparing the extracted at least one feature with features associated with a plurality of defined emotions to determine relative degrees of similarity; computing confidence scores for one or more of the plurality of defined emotions based upon the determined relative degrees of similarity; determining that the confidence score computed for one or more particular emotions of the plurality of defined emotions exceeds a confidence score threshold; and associating the one or more particular emotions of the plurality of defined emotions with the audio signal.

13. The method of claim 12, wherein the at least one feature extracted from the audio fingerprint and the features associated with the plurality of defined emotions include one or more of a frequency-time representation; variance of speech by amplitude, variance of speech by pacing of words, zero-crossing rate, fundamental estimation and its derivative, spectral distribution of the audio signal, ratio of voiced/unvoiced signal in speech, and prosody of speech.

14. The method of claim 12, wherein detecting an audio signal comprises detecting a first audio signal at a first time and a second audio signal at a second time, both the first and second audio signals being associated with a particular speaker, and wherein the confidence value threshold is determined based upon at least one change between the first and second audio signals.

15. The method of claim 12, further comprising initiating an action based, at least in part, upon the particular one or more emotions of the plurality of defined emotions associated with the audio signal.

16. The method of claim 15, further comprising recognizing at least one word from the detected audio signal, and wherein initiating an action comprises initiating the action based upon the particular one or more emotions of the plurality of defined emotions associated with the audio signal and the at least one word.

17. The method of claim 12, wherein the features associated with the plurality of defined emotions are aggregated based upon a plurality of speakers from whom audio fingerprints have been extracted.

18. The method of claim 12, further comprising identifying a speaker of the audio signal based upon the computed audio fingerprint, wherein the features associated with the plurality of defined emotions are specific to the identified speaker.

19. The method of claim 12, further comprising altering at least one of the confidence scores based upon an additional signal that is a non-audio signal.

20. A system comprising: a microphone that detects audio signals; an emotion identification engine having one or more processors and one or more computer-readable storage media; and a data store coupled with the emotion identification engine, wherein the emotion identification engine: receives a detected audio signal from the microphone; computes an audio fingerprint from the received audio signal; determines confidence scores for one or more of a plurality of defined emotions based upon the computed audio fingerprint and at least a portion of data stored in association with the data store; and associates one or more emotions with the audio signal based upon the computed confidence scores.
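
Two of the features named in claim 13, zero-crossing rate and variance of speech by amplitude, are simple enough to sketch directly. The code below is an illustration under stated assumptions rather than the patent's implementation: the claims do not define how either feature is calculated, so amplitude variance is read here as the population variance of absolute sample values, and the function names and the `synth_tone` test helper are hypothetical.

```python
import math


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


def amplitude_variance(samples):
    """Population variance of absolute sample values (assumed interpretation)."""
    if not samples:
        return 0.0
    magnitudes = [abs(s) for s in samples]
    mean = sum(magnitudes) / len(magnitudes)
    return sum((m - mean) ** 2 for m in magnitudes) / len(magnitudes)


def extract_features(samples):
    """Bundle the two sketched features into one dictionary."""
    return {
        "zero_crossing_rate": zero_crossing_rate(samples),
        "amplitude_variance": amplitude_variance(samples),
    }


def synth_tone(frequency_hz=100, sample_rate=8000, seconds=1.0):
    """Hypothetical helper: a pure sine tone standing in for detected audio."""
    count = int(sample_rate * seconds)
    return [math.sin(2 * math.pi * frequency_hz * n / sample_rate) for n in range(count)]
```

Calling `extract_features(synth_tone())` yields a zero-crossing rate close to 0.025, since a 100 Hz tone crosses zero about 200 times per second at 8000 samples per second. A real system would compute such features over short frames of speech rather than a whole synthetic tone.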

Assignees

Microsoft Corp

Inventors

Classifications

  • of the speaker; Human-factor methodology · CPC title

  • G10L25/63 · Primary

    for estimating an emotional state · CPC title

  • Speaker identification or verification techniques · CPC title

  • Detection of presence or absence of voice signals (switching of direction of transmission by voice frequency in two-way loud-speaking telephone systems H04M9/10) · CPC title

  • Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016019915A1 cover?
Systems, methods, and computer-readable storage media are provided for recognizing emotion in audio signals in real-time. An audio signal is detected and a rapid audio fingerprint is computed on a user's computing device. One or more features is extracted from the audio fingerprint and compared with features associated with defined emotions to determine relative degrees of similarity. Confidenc…
Who is the assignee on this patent?
Microsoft Corp
What technology area does this patent fall under?
Primary CPC classification G10L25/63. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 21, 2016 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).