System and method for using prosody for voice-enabled search

US10002608B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10002608-B2
Application number  US-88495910-A
Country             US
Kind code           B2
Filing date         Sep 17, 2010
Priority date       Sep 17, 2010
Publication date    Jun 19, 2018
Grant date          Jun 19, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed herein are systems, methods, and non-transitory computer-readable storage media for approximating relevant responses to a user query with voice-enabled search. A system practicing the method receives a word lattice generated by an automatic speech recognizer based on a user speech and a prosodic analysis of the user speech, generates a reweighted word lattice based on the word lattice and the prosodic analysis, approximates based on the reweighted word lattice one or more relevant responses to the query, and presents to a user the responses to the query. The prosodic analysis examines metalinguistic information of the user speech and can identify the most salient subject matter of the speech, assess how confident a speaker is in the content of his or her speech, and identify the attitude, mood, emotion, sentiment, etc. of the speaker. Other information not described in the content of the speech can also be used.
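The reweighting step in the abstract can be illustrated with a small sketch. It is not taken from the patent: the `Arc` type, the per-arc `stress` score, the `alpha` scale factor, and the example words are all hypothetical, and the lattice is assumed to be an acyclic graph whose state numbers increase along every arc. Costs are treated as negative log scores, so lower is better, and an arc judged prosodically prominent has its cost reduced.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Arc:
    src: int
    dst: int
    word: str
    cost: float    # recognizer cost (negative log score); lower is better
    stress: float  # hypothetical prosodic prominence in [0, 1]


def reweight(lattice: List[Arc], alpha: float = 1.0) -> List[Arc]:
    """Return a new lattice in which stressed arcs are cheaper.

    Assumption: prominence is rewarded linearly, scaled by alpha.
    """
    return [replace(a, cost=a.cost - alpha * a.stress) for a in lattice]


def best_path(lattice: List[Arc], start: int, final: int) -> Tuple[float, List[str]]:
    """Lowest-cost path through an acyclic lattice with src < dst on every arc."""
    best = {start: (0.0, [])}
    # Visiting arcs in order of source state guarantees that every path
    # into a state has been scored before its outgoing arcs are used.
    for arc in sorted(lattice, key=lambda a: a.src):
        if arc.src not in best:
            continue
        cost, words = best[arc.src]
        candidate = (cost + arc.cost, words + [arc.word])
        if arc.dst not in best or candidate[0] < best[arc.dst][0]:
            best[arc.dst] = candidate
    return best[final]


# Hypothetical two-word lattice with one acoustic confusion on the first word.
lattice = [
    Arc(0, 1, "italian", 1.2, 0.9),
    Arc(0, 1, "battalion", 1.0, 0.2),
    Arc(1, 2, "restaurants", 0.5, 0.3),
]

print(best_path(lattice, 0, 2)[1])            # recognizer scores only
print(best_path(reweight(lattice), 0, 2)[1])  # after prosodic reweighting
```

With these made-up numbers the recognizer alone prefers "battalion restaurants", while the reweighted lattice prefers "italian restaurants", because the stressed arc is discounted more heavily.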

First claim

Opening claim text (preview).

We claim:

1. A method comprising: receiving a word lattice generated by an automatic speech recognizer processing a query, wherein the word lattice is weighted according to the query; identifying a policy which allows use of a user emotional state in responding to a user who produced the query; performing, via a processor of the automatic speech recognizer, a prosodic analysis of the query, wherein the prosodic analysis identifies an audible gesture in the query and a rhythm of words spoken in the query; identifying, according to the prosodic analysis, the user emotional state; reweighting, via the processor, the word lattice according to the prosodic analysis, the user emotional state and one of a time of day, a time of year, and a behavioral history of the user, to yield a reweighted word lattice; determining, via the processor and according to the reweighted word lattice, a response to the query, the response addressing the audible gesture; and presenting to the user the response to the query.

2. The method of claim 1, wherein the prosodic analysis further examines metalinguistic information of the query according to one or more of a tonality, a volume, a stress, an intonation, and a speed of the query.

3. The method of claim 2, wherein the prosodic analysis identifies a most salient subject matter of the query.

4. The method of claim 2, wherein the prosodic analysis assesses a confidence of the user in content of the query.

5. The method of claim 2, wherein the prosodic analysis identifies a mood of the user.

6. The method of claim 1, wherein determining the response is performed by composing the reweighted word lattice with a search finite state transducer according to a plurality of pre-indexed documents.

7. The method of claim 6, further comprising generating a finite state transducer according to composing the reweighted word lattice.

8. The method of claim 1, wherein the audible gesture comprises one of a laugh, a sob, a cough, and a yawn.

9. A system comprising: a processor; and a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising: receiving a word lattice generated by an automatic speech recognizer a query, wherein the word lattice is weighted according to the query; identifying a policy which allows use of a user emotional state in responding to a user who produced the query; performing a prosodic analysis of the query, wherein the prosodic analysis identifies an audible gesture in the query and a rhythm of words spoken in the query; identifying, according to the prosodic analysis, the user emotional state; reweighting the word lattice according to the prosodic analysis, the user emotional state and one of a time of day, a time of year, and a behavioral history of the user to yield a reweighted word lattice; determining, according to the reweighted word lattice, a response to the query, the response addressing the audible gesture; and presenting to the user the response to the query.

10. The system of claim 9, wherein the prosodic analysis examines metalinguistic information of the query according to one or more of a tonality, a volume, a stress, an intonation, and a speed of the query.

11. The system of claim 10, wherein the prosodic analysis identifies a most salient subject matter of the query.

12. The system of claim 10, wherein the prosodic analysis assesses a confidence of the user in content of the query.

13. The system of claim 10, wherein the prosodic analysis identifies a mood of the user.

14. A computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising: receiving a word lattice generated by an automatic speech recognizer processing a query, wherein the word lattice is weighted according to the query; identifying a policy which allows use of a user emotional state in responding to a user who produced the query; performing a prosodic analysis of the query, wherein the prosodic analysis identifies an audible gesture in the query and a rhythm of words spoken in the query; identifying, according to the prosodic analysis, the user emotional state; reweighting the word lattice according to the prosodic analysis, the user emotional state and one of a time of day, a time of year, and a behavioral history of the user, to yield a reweighted word lattice; determining, according to the reweighted word lattice, a response to the query, the response addressing the audible gesture; and presenting to the user the response to the query.

15. The computer-readable storage device of claim 14, wherein determining the response is performed by composing the reweighted word lattice with a search finite state transducer according to a plurality of pre-indexed documents.

16. The computer-readable storage device of claim 15, having additional instructions stored which, when executed by the computing device, result in operations comprising generating a finite state transducer according to composing the reweighted word lattice.

17. The computer-readable storage device of claim 16, having additional instructions stored which, when executed by the computing device, result in operations comprising reranking, according to the finite state transducer, the word lattice generated by the automatic speech recognizer.
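The policy check, the context-dependent reweighting, and the search over pre-indexed documents recited in the claims can be sketched roughly as follows. This is an illustration under stated assumptions, not the claimed implementation: the function names, the boost table, the emotion label "tired", and the sample index are hypothetical, and a plain weighted inverted-index lookup stands in for composing the reweighted lattice with a search finite state transducer. Term weights here are relevance scores, so higher is better.

```python
from typing import Dict, List, Optional


def context_boosts(emotion: Optional[str], hour: int) -> Dict[str, float]:
    """Hypothetical boost table keyed on emotional state and time of day."""
    boosts: Dict[str, float] = {}
    if emotion == "tired":          # e.g. inferred from a yawn in the query
        boosts["delivery"] = 0.5
    if hour >= 21 or hour < 5:      # late-night queries
        boosts["open"] = 0.3
    return boosts


def reweight_terms(
    term_weights: Dict[str, float],
    emotion: Optional[str],
    hour: int,
    policy_allows_emotion: bool,
) -> Dict[str, float]:
    """Apply boosts; the emotional state is used only if the policy allows it."""
    effective_emotion = emotion if policy_allows_emotion else None
    boosts = context_boosts(effective_emotion, hour)
    return {t: w + boosts.get(t, 0.0) for t, w in term_weights.items()}


def rank(term_weights: Dict[str, float], index: Dict[str, List[str]]) -> List[str]:
    """Score documents by summed term weights (stand-in for FST composition)."""
    scores: Dict[str, float] = {}
    for term, weight in term_weights.items():
        for doc in index.get(term, []):
            scores[doc] = scores.get(doc, 0.0) + weight
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical term weights read off a word lattice, and a tiny inverted index.
term_weights = {"pizza": 1.0, "delivery": 0.4, "restaurant": 0.6}
index = {"pizza": ["d1", "d2"], "delivery": ["d1"], "restaurant": ["d2"]}

print(rank(term_weights, index))
print(rank(reweight_terms(term_weights, "tired", 12, True), index))
print(rank(reweight_terms(term_weights, "tired", 12, False), index))
```

In this toy setup the delivery-oriented document `d1` moves ahead of `d2` only when the policy permits use of the emotional state. With the policy disabled the ranking is the same as for the unmodified weights.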

Assignees

Inventors

Classifications

  • for retrieval · CPC title

  • using non-speech characteristics · CPC title

  • for estimating an emotional state · CPC title

  • of the speaker; Human-factor methodology · CPC title

  • using prosody or stress · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10002608B2 cover?
Disclosed herein are systems, methods, and non-transitory computer-readable storage media for approximating relevant responses to a user query with voice-enabled search. A system practicing the method receives a word lattice generated by an automatic speech recognizer based on a user speech and a prosodic analysis of the user speech, generates a reweighted word lattice based on the word lattice…
Who are the inventors on this patent?
Bangalore Srinivas, Feng Junlan, Johnston Michael, and 2 more
What technology area does this patent fall under?
Primary CPC classification G10L15/1807. Mapped technology areas include Physics.
When was this patent published?
Publication date Jun 19, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).