What technology area does this patent fall under?

Primary CPC classification G10L15/22. Mapped technology areas include Physics.

When was this patent published?

Publication date: Sep 29, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

User dedicated automatic speech recognition

US10789950B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10789950-B2
Application number	US-201815876545-A
Country	US
Kind code	B2
Filing date	Jan 22, 2018
Priority date	Mar 16, 2012
Publication date	Sep 29, 2020
Grant date	Sep 29, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A multi-mode voice controlled user interface is described. The user interface is adapted to conduct a speech dialog with one or more possible speakers and includes a broad listening mode which accepts speech inputs from the possible speakers without spatial filtering, and a selective listening mode which limits speech inputs to a specific speaker using spatial filtering. The user interface switches listening modes in response to one or more switching cues.
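The abstract does not say how the spatial filtering is carried out; the classification list on this page mentions microphone arrays and beamforming. As an illustration only, here is a minimal sketch of one common beamforming technique, delay-and-sum, which is not necessarily the method used in the patent. The function name `delay_and_sum`, the integer-sample delays, and the two-microphone toy signal are all assumptions made for this example.

```python
from typing import List, Sequence


def delay_and_sum(channels: Sequence[Sequence[float]],
                  delays: Sequence[int]) -> List[float]:
    """Average microphone channels after advancing each by its steering delay.

    delays[k] is the (hypothetical) number of samples by which sound from the
    target direction arrives late at microphone k. Advancing each channel by
    that amount lines the target up across microphones before averaging.
    """
    if len(channels) != len(delays):
        raise ValueError("need exactly one delay per channel")
    if any(d < 0 for d in delays):
        raise ValueError("delays must be non-negative sample counts")
    # Only keep output samples for which every shifted channel has data.
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[i + d] for ch, d in zip(channels, delays)) / len(channels)
        for i in range(n)
    ]


# Toy example: an alternating signal reaches mic1 one sample later than mic0.
target = [1.0, -1.0] * 4
mic0 = target + [0.0]
mic1 = [0.0] + target

steered = delay_and_sum([mic0, mic1], [0, 1])    # steered at the speaker
unsteered = delay_and_sum([mic0, mic1], [0, 0])  # steered somewhere else
```

With the correct delays the channels add coherently and the target signal is preserved; with the wrong delays the same signal largely cancels. That contrast is the sense in which a steered array can limit input to one speaker's direction.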

First claim

Opening claim text (preview).

What is claimed is:
1. A device for automatic speech recognition (ASR) comprising: a multi-mode voice-controlled user interface employing at least one hardware implemented computer processor, wherein the user interface is adapted to conduct a speech dialog with one or more possible speakers and includes: a broad listening mode which accepts speech inputs from the possible speakers without spatial filtering and has an associated limited broad mode recognition vocabulary; and a selective listening mode which limits speech inputs to a specific speaker using spatial filtering and has an associated selective mode recognition vocabulary that is larger than the limited broad mode recognition vocabulary, wherein the user interface is adapted to: switch from the broad listening mode to the selective listening mode in response to one or more switching cues, in the selective listening mode, engage the specific speaker in a dialog using the selective mode recognition vocabulary, and the user interface is adapted to remain in the selective listening mode so long as a location of the specific speaker is known.
2. A device according to claim 1, wherein the switching cues include one or more mode switching words from the speech inputs.
3. A device according to claim 1, wherein the switching cues include one or more dialog states in the speech dialog.
4. A device according to claim 1, wherein the switching cues include one or more visual cues from the possible speakers.
5. A device according to claim 1, wherein the selective listening mode uses acoustic speaker localization for the spatial filtering.
6. A device according to claim 1, wherein the selective listening mode uses image processing for the spatial filtering.
7. A device according to claim 1, wherein the user interface operates in the selective listening mode simultaneously in parallel for each of a plurality of selected speakers, so that each of the plurality of selected speakers has its own selective listening mode and dialog with the user interface.
8. A device according to claim 1, wherein the user interface is adapted to operate in both listening modes in parallel, whereby the user interface accepts speech inputs in the broad listening mode, and at the same time accepts speech inputs from at least one selected speaker in at least one selective listening mode.
9. The device according to claim 1, wherein the user interface is adapted to switch from the selective listening mode to the broad listening mode in response to either an end of the dialog or an activation word.
10. A computer program product encoded in a non-transitory computer-readable medium for operating an automatic speech recognition (ASR) system, the product comprising: program code executable to conduct a speech dialog with one or more possible speakers via a multi-mode voice-controlled user interface adapted to: accept speech inputs from the possible speakers in a broad listening mode without spatial filtering, the broad listening mode having an associated limited broad mode recognition vocabulary; and limit speech inputs to a specific speaker in a selective listening mode using spatial filtering, the selective listening mode having an associated selective mode recognition vocabulary that is larger than the limited broad mode recognition vocabulary, wherein the program code is executable to cause the user interface to: switch from the broad listening mode to the selective listening mode in response to one or more switching cues, in the selective listening mode, engage the specific speaker in a dialog using the selective mode recognition vocabulary, and the program code is executable to cause the user interface to remain in the selective listening mode so long as a location of the specific speaker is known.
11. The computer program product of claim 10, wherein the program code is executable to switch from the selective listening mode to the broad listening mode in response to either an end of the dialog or an activation word.
12. A method for automatic speech recognition (ASR) comprising: employing a multi-mode voice-controlled user interface having a computer processor to conduct a speech dialog with one or more possible speakers by: employing a broad listening mode which accepts speech inputs from the possible speakers without spatial filtering and has an associated limited broad mode recognition vocabulary; and employing a selective listening mode which limits speech inputs to a specific speaker using spatial filtering and has an associated selective mode recognition vocabulary that is larger than the limited broad mode recognition vocabulary, the user interface: switching from the broad listening mode to the selective listening mode in response to one or more switching cues, in the selective listening mode, engaging the specific speaker in a dialog using the selective mode recognition vocabulary, and remaining in the selective listening mode so long as a location of the specific speaker is known.
13. The method according to claim 12, wherein the switching cues include one or more mode switching words from the speech inputs.
14. The method according to claim 12, wherein the switching cues include one or more dialog states in the speech dialog.
15. The method according to claim 12, wherein the switching cues include one or more visual cues from the possible speakers.
16. The method according to claim 12, wherein the selective listening mode includes using acoustic speaker localization for the spatial filtering.
17. The method according to claim 12, wherein the selective listening mode includes using image processing for the spatial filtering.
18. The method according to claim 12, wherein the user interface operates in selective listening mode simultaneously in parallel for each of a plurality of selected speakers, so that each of the plurality of selected speakers has its own selective listening mode and dialog with the user interface.
19. The method according to claim 12, wherein the user interface operates in both listening modes in parallel, such that the user interface accepts speech inputs in the broad listening mode, and at the same time accepts speech inputs from at least one selected speaker in at least one selective listening mode.
20. The method according to claim 12, including the user interface switching from the selective listening mode to the broad listening mode in response to either an end of the dialog or an activation word.
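As a reading aid, the mode switching recited in claims 1 and 9 can be sketched as a small state machine. This is an illustrative sketch under stated assumptions, not the patented implementation: the class `ListeningInterface`, the phrases in both vocabularies, and the `location_known` flag are hypothetical. Falling back to the broad mode when the speaker's location is lost is also an assumption, since claim 1 only says the interface remains in the selective mode while the location is known. Only a spoken mode-switching word is modeled as a switching cue, and the "activation word" exit of claim 9 is left out.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Mode(Enum):
    BROAD = auto()      # any speaker accepted, no spatial filtering
    SELECTIVE = auto()  # only the selected speaker accepted


# Hypothetical phrases; the claims only require the selective-mode
# vocabulary to be larger than the limited broad-mode vocabulary.
ACTIVATION = "hello system"
END_OF_DIALOG = "goodbye"
BROAD_VOCAB = frozenset({ACTIVATION})
SELECTIVE_VOCAB = BROAD_VOCAB | {"play music", "set volume", END_OF_DIALOG}


@dataclass
class ListeningInterface:
    mode: Mode = Mode.BROAD
    speaker: Optional[str] = None  # speaker the selective mode is locked to

    def vocabulary(self) -> frozenset:
        return SELECTIVE_VOCAB if self.mode is Mode.SELECTIVE else BROAD_VOCAB

    def _to_broad(self) -> None:
        self.mode, self.speaker = Mode.BROAD, None

    def step(self, speaker_id: str, text: str,
             location_known: bool = True) -> Optional[str]:
        """Handle one utterance; return the accepted text, or None if rejected."""
        # Assumption: losing the selected speaker's location ends selective mode.
        if self.mode is Mode.SELECTIVE and not location_known:
            self._to_broad()
        # Stand-in for spatial filtering: ignore everyone but the selected speaker.
        if self.mode is Mode.SELECTIVE and speaker_id != self.speaker:
            return None
        if text not in self.vocabulary():
            return None
        if self.mode is Mode.BROAD and text == ACTIVATION:
            self.mode, self.speaker = Mode.SELECTIVE, speaker_id  # switching cue
        elif self.mode is Mode.SELECTIVE and text == END_OF_DIALOG:
            self._to_broad()                                      # end of dialog
        return text


ui = ListeningInterface()
ui.step("alice", "hello system")  # cue: lock onto alice, larger vocabulary
ui.step("bob", "play music")      # rejected: bob is not the selected speaker
ui.step("alice", "play music")    # accepted in the selective mode
```

Representing the speaker by an identifier keeps the sketch short; in a real system the equivalent decision would come from acoustic localization or image processing, as claims 5 and 6 describe.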

Assignees

Nuance Communications Inc

Inventors

Classifications

Code	CPC title
G10L15/22 (primary)	Procedures used during a speech recognition process, e.g. man-machine dialogue
G10L2021/02166	Microphone arrays; Beamforming
G10L17/00	Speaker identification or verification techniques
G10L15/28	Constructional details of speech recognition systems
G10L25/51	for comparison or discrimination

Patent family

Related publications grouped by family.

Patent family ID: 45888502

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10789950B2 cover?: A multi-mode voice controlled user interface is described. The user interface is adapted to conduct a speech dialog with one or more possible speakers and includes a broad listening mode which accepts speech inputs from the possible speakers without spatial filtering, and a selective listening mode which limits speech inputs to a specific speaker using spatial filtering. The user interface swit…
Who is the assignee on this patent?: Nuance Communications Inc