Contextual utterance resolution in multimodal systems

US11455982B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11455982-B2
Application number	US-201916241015-A
Country	US
Kind code	B2
Filing date	Jan 7, 2019
Priority date	Jan 7, 2019
Publication date	Sep 27, 2022
Grant date	Sep 27, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection. Read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system and method of responding to a vocal utterance may include capturing and converting the utterance to word(s) using a language processing method, such as natural language processing. The context of the utterance and of the system, which may include multimodal inputs, may be used to determine the meaning and intent of the words.

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: storing context data in a computer-readable medium, wherein the context data that are stored include data related to multimodal inputs, interactions with automotive systems, and automotive system status indicators, wherein said automotive system status indicators comprise automotive sensor readings; receiving a first utterance via a microphone, wherein said first utterance is a command given to a system that is selected from the group that consists of an infotainment system, a vehicular environment system, and a vehicle status-and-control system; and causing a computer-system having a multi-modal input to carry out a process for disambiguating the first utterance; wherein the process for disambiguating the first utterance comprises causing the computer-system having the multi-modal input to carry out the steps of applying the context data to the first utterance and determining a meaning of the first utterance based at least in part on the context data, wherein said context data comprises context factors, wherein applying context data to said first utterance is carried out after having determined that said first utterance remains ambiguous, wherein determining said meaning comprises, after having applied said context data, determining that said first utterance is no longer ambiguous, wherein said method further comprises responding to said first utterance, said method further comprising, after having caused said computer-system to carry out the process for disambiguating said first utterance, receiving a second utterance, determining that said second utterance is unambiguous, responding to said second utterance without application of said context data, receiving a third utterance, applying said context data to said third utterance, determining that said third utterance remains ambiguous, and requesting further information from a driver and/or a passenger in a vehicle that comprises said computer-system.

2. The method of claim 1, wherein applying said context data to said first utterance comprises applying a first context factor, determining that application of said first context factor fails to resolve said ambiguity, applying a second context factor, and determining that application of said second context factor has resolved said ambiguity.

3. The method of claim 1, wherein applying said context data to said first utterance comprises applying a recent antecedent, said recent antecedent being data indicative of a most recent interaction of an utterer of said first utterance with said computer-system, said recent interaction comprising an input on one mode of said multi-modal input.

4. The method of claim 3, wherein, prior to applying said context data, a time interval between said recent antecedent and said first utterance is determined to be less than a threshold period of time.

5. The method of claim 1, wherein the computer-readable medium in which the context data was stored is in an electronic vehicle-platform that comprises a head unit, a core, a vehicle interface, and a communications link, wherein the head unit is a component of an infotainment system, wherein the head unit is configured to execute an application suite, wherein the core comprises a processor, storage, software, and firmware to perform core functions of the system, and wherein the vehicle interface comprises sensors that provide data to the electronic vehicle-platform and that control mechanisms for operating vehicle components.

6. The method of claim 1, further comprising causing the computer-system to receive an input indicative of a gaze direction, wherein determining the meaning comprises determining the meaning based at least in part on said input.

7. The method of claim 1, further comprising causing the computer-system to receive an input selected from the group consisting of a stylus input, a haptic input, and a text input, wherein determining the meaning comprises determining the meaning based at least in part on the input.

8. The method of claim 1, further comprising causing the computer-system to respond to text input.

9. The method of claim 1, wherein disambiguating the first utterance comprises using embedded processing and using cloud processing and wherein embedded processing comprises using embedded elements that are contained within an electronic vehicle-platform.

10. The method of claim 1, further comprising using natural language processing to determine a word of the first utterance.

11. The method of claim 1, further comprising using a recent antecedent interaction with the computer-system as a context factor in applying context to the determination of the meaning of the first utterance.

12. The method of claim 1, further comprising using gaze data as a context factor in applying context to the determination of the meaning of the first utterance.

13. The method of claim 1, further comprising using current media-playing data as a context factor in applying context to the determination of the meaning of the first utterance.

14. The method of claim 1, further comprising using an associated system's status as a context factor in applying context to the determination of the meaning of the first utterance.

15. The method of claim 14, further comprising using a vehicle's status as a context factor in applying context to the determination of the meaning of the first utterance.

16. The method of claim 15, further comprising using a sensor reading as an indication of the vehicle's status.

17. The method of claim 1, wherein determining the meaning of the first utterance comprises using a speech analysis technique selected from the group consisting of voice activity detection and natural language understanding.

18. An apparatus comprising a multi-modal system for disambiguating utterances received via a microphone, said utterances comprising first, second, and third utterances, said second utterance having been received after said first utterance and said third utterance having been received after said second utterance, each of said utterances being a command given to a system that is selected from the group that consists of an infotainment system, a vehicular environment system, and a vehicle status-and-control system, wherein said multi-modal system is configured to store context data in a computer-readable medium, wherein said context data comprises context factors, data related to multimodal inputs received by said multi-modal system, data related to interactions with automotive systems, and data related to automotive system status indicators that comprise automotive sensor readings, wherein said multi-modal system is further configured to apply said context data for determining a meaning of each of said first, second, and third utterances, said context data being applied to said first utterance after having determined that said first utterance remains ambiguous, wherein determining a meaning of said first utterance comprises, after having applied said context data to said first utterance, determining that said first utterance is no longer ambiguous, wherein said multi-modal system is further configured to respond to said first utterance and, after having disambiguated said first utterance, to receive said second utterance, to determine that said second utterance is unambiguous, and to respond to said second utterance without application of said context data, wherein said multi-modal system is further configured to receive said third utterance, to apply said context data to said third utterance, to determine that said thi
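For orientation only, the control flow recited in claims 1, 2, and 4 can be sketched in Python. This is an illustrative reading, not part of the patent text and not the assignee's implementation. Every name here (`interpret`, `resolve`, `recent_antecedent`, `gaze_target`, `Response`, the context dictionary keys, and the 10-second threshold) is a hypothetical stand-in; the claims specify only "a threshold period of time" and do not define how meanings are represented.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical type: a context factor tries to resolve an ambiguous
# utterance from stored context data and returns None when it cannot.
ContextFactor = Callable[[str, dict], Optional[str]]

THRESHOLD_S = 10.0  # assumed value; the claims do not give a number


@dataclass
class Response:
    action: str              # "respond" or "ask_user"
    meaning: Optional[str]   # resolved meaning, if any
    used_context: bool       # whether context data was applied


def interpret(utterance: str) -> Optional[str]:
    """Toy stand-in for language understanding.

    Returns a meaning for commands that are unambiguous on their own,
    or None when the utterance is ambiguous.
    """
    known = {"turn on the radio": "radio.on"}
    return known.get(utterance.lower())


def recent_antecedent(utterance: str, context: dict) -> Optional[str]:
    """Claims 3-4 (sketch): use the most recent interaction, but only if
    the interval between it and the utterance is below the threshold."""
    last = context.get("last_interaction")
    if last is None:
        return None
    if context.get("now", 0.0) - last["time"] >= THRESHOLD_S:
        return None  # too old to count as a recent antecedent
    return f"{last['target']}: {utterance}"


def gaze_target(utterance: str, context: dict) -> Optional[str]:
    """Claim 12 (sketch): use what the speaker is looking at."""
    target = context.get("gaze_target")
    return None if target is None else f"{target}: {utterance}"


def resolve(utterance: str, context: dict,
            factors: Sequence[ContextFactor]) -> Response:
    # Unambiguous utterances are answered without applying context data.
    meaning = interpret(utterance)
    if meaning is not None:
        return Response("respond", meaning, used_context=False)
    # Ambiguous: apply context factors one at a time (claim 2).
    for factor in factors:
        meaning = factor(utterance, context)
        if meaning is not None:
            return Response("respond", meaning, used_context=True)
    # Still ambiguous after all context: ask the driver or passenger.
    return Response("ask_user", None, used_context=True)


FACTORS = [recent_antecedent, gaze_target]
```

In this sketch the three branches of `resolve` correspond to the second, first, and third utterances of claim 1 respectively: respond without context, respond after context resolves the ambiguity, and request further information when context does not.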

Assignees

Cerence Operating Co

Inventors

Classifications

G06F3/013
Eye tracking input arrangements (G06F3/015 takes precedence) · CPC title
G10L15/22
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
G06F3/167
Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title
B60R16/0373
Voice control (in general G10L) · CPC title
G10L15/1822 (Primary)
Parsing for meaning understanding · CPC title

Patent family

Related publications grouped by family.

View patent family 71404520

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11455982B2 cover?: A system and method of responding to a vocal utterance may include capturing and converting the utterance to word(s) using a language processing method, such as natural language processing. The context of the utterance and of the system, which may include multimodal inputs, may be used to determine the meaning and intent of the words.
Who is the assignee on this patent?: Cerence Operating Co
What technology area does this patent fall under?: Primary CPC classification G10L15/1822. Mapped technology areas include Physics.
When was this patent published?: Publication date Sep 27, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

System and Method for Assessing and Correcting Potential Underserved Content In Natural Language Understanding Applications

Eye Gaze for Spoken Language Understanding in Multi-Modal Conversational Interactions

Information processing device, information processing system, information processing method, and information processing program
