User dedicated automatic speech recognition

US10789950B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10789950-B2
Application number  US-201815876545-A
Country             US
Kind code           B2
Filing date         Jan 22, 2018
Priority date       Mar 16, 2012
Publication date    Sep 29, 2020
Grant date          Sep 29, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A multi-mode voice controlled user interface is described. The user interface is adapted to conduct a speech dialog with one or more possible speakers and includes a broad listening mode which accepts speech inputs from the possible speakers without spatial filtering, and a selective listening mode which limits speech inputs to a specific speaker using spatial filtering. The user interface switches listening modes in response to one or more switching cues.
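The difference between the two listening modes can be illustrated with a minimal sketch. It is not the patented implementation: spatial filtering is reduced here to a simple gate on an estimated direction of arrival, whereas a real system would filter the microphone signals themselves. The names `Listener`, `SpeechInput`, `angle_deg`, and `beam_width_deg`, and the 20-degree beam width, are assumptions made for illustration only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    BROAD = "broad"
    SELECTIVE = "selective"


@dataclass
class SpeechInput:
    text: str
    angle_deg: float  # hypothetical direction-of-arrival estimate for the speaker


@dataclass
class Listener:
    mode: Mode = Mode.BROAD
    target_angle_deg: Optional[float] = None
    beam_width_deg: float = 20.0  # assumed tolerance, not taken from the patent

    def on_switching_cue(self, speaker_angle_deg: float) -> None:
        """Enter selective mode, focused on the speaker who gave the cue."""
        self.mode = Mode.SELECTIVE
        self.target_angle_deg = speaker_angle_deg

    def accepts(self, speech: SpeechInput) -> bool:
        """Broad mode accepts every speaker; selective mode gates by direction."""
        if self.mode is Mode.BROAD:
            return True
        # Smallest absolute angular difference, wrapping around at 360 degrees.
        diff = abs((speech.angle_deg - self.target_angle_deg + 180.0) % 360.0 - 180.0)
        return diff <= self.beam_width_deg / 2.0
```

In this sketch a fresh `Listener()` accepts input from any angle. After `on_switching_cue(30.0)`, only inputs arriving within 10 degrees of 30 degrees are accepted.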

First claim

Opening claim text (preview).

What is claimed is:

1. A device for automatic speech recognition (ASR) comprising: a multi-mode voice-controlled user interface employing at least one hardware implemented computer processor, wherein the user interface is adapted to conduct a speech dialog with one or more possible speakers and includes: a broad listening mode which accepts speech inputs from the possible speakers without spatial filtering and has an associated limited broad mode recognition vocabulary; and a selective listening mode which limits speech inputs to a specific speaker using spatial filtering and has an associated selective mode recognition vocabulary that is larger than the limited broad mode recognition vocabulary, wherein the user interface is adapted to: switch from the broad listening mode to the selective listening mode in response to one or more switching cues, in the selective listening mode, engage the specific speaker in a dialog using the selective mode recognition vocabulary, and the user interface is adapted to remain in the selective listening mode so long as a location of the specific speaker is known.

2. A device according to claim 1, wherein the switching cues include one or more mode switching words from the speech inputs.

3. A device according to claim 1, wherein the switching cues include one or more dialog states in the speech dialog.

4. A device according to claim 1, wherein the switching cues include one or more visual cues from the possible speakers.

5. A device according to claim 1, wherein the selective listening mode uses acoustic speaker localization for the spatial filtering.

6. A device according to claim 1, wherein the selective listening mode uses image processing for the spatial filtering.

7. A device according to claim 1, wherein the user interface operates in the selective listening mode simultaneously in parallel for each of a plurality of selected speakers, so that each of the plurality of selected speakers has its own selective listening mode and dialog with the user interface.

8. A device according to claim 1, wherein the user interface is adapted to operate in both listening modes in parallel, whereby the user interface accepts speech inputs in the broad listening mode, and at the same time accepts speech inputs from at least one selected speaker in at least one selective listening mode.

9. The device according to claim 1, wherein the user interface is adapted to switch from the selective listening mode to the broad listening mode in response to either an end of the dialog or an activation word.

10. A computer program product encoded in a non-transitory computer-readable medium for operating an automatic speech recognition (ASR) system, the product comprising: program code executable to conduct a speech dialog with one or more possible speakers via a multi-mode voice-controlled user interface adapted to: accept speech inputs from the possible speakers in a broad listening mode without spatial filtering, the broad listening mode having an associated limited broad mode recognition vocabulary; and limit speech inputs to a specific speaker in a selective listening mode using spatial filtering, the selective listening mode having an associated selective mode recognition vocabulary that is larger than the limited broad mode recognition vocabulary, wherein the program code is executable to cause the user interface to: switch from the broad listening mode to the selective listening mode in response to one or more switching cues, in the selective listening mode, engage the specific speaker in a dialog using the selective mode recognition vocabulary, and the program code is executable to cause the user interface to remain in the selective listening mode so long as a location of the specific speaker is known.

11. The computer program product of claim 10, wherein the program code is executable to switch from the selective listening mode to the broad listening mode in response to either an end of the dialog or an activation word.

12. A method for automatic speech recognition (ASR) comprising: employing a multi-mode voice-controlled user interface having a computer processor to conduct a speech dialog with one or more possible speakers by: employing a broad listening mode which accepts speech inputs from the possible speakers without spatial filtering and has an associated limited broad mode recognition vocabulary; and employing a selective listening mode which limits speech inputs to a specific speaker using spatial filtering and has an associated selective mode recognition vocabulary that is larger than the limited broad mode recognition vocabulary, the user interface: switching from the broad listening mode to the selective listening mode in response to one or more switching cues, in the selective listening mode, engaging the specific speaker in a dialog using the selective mode recognition vocabulary, and remaining in the selective listening mode so long as a location of the specific speaker is known.

13. The method according to claim 12, wherein the switching cues include one or more mode switching words from the speech inputs.

14. The method according to claim 12, wherein the switching cues include one or more dialog states in the speech dialog.

15. The method according to claim 12, wherein the switching cues include one or more visual cues from the possible speakers.

16. The method according to claim 12, wherein the selective listening mode includes using acoustic speaker localization for the spatial filtering.

17. The method according to claim 12, wherein the selective listening mode includes using image processing for the spatial filtering.

18. The method according to claim 12, wherein the user interface operates in selective listening mode simultaneously in parallel for each of a plurality of selected speakers, so that each of the plurality of selected speakers has its own selective listening mode and dialog with the user interface.

19. The method according to claim 12, wherein the user interface operates in both listening modes in parallel, such that the user interface accepts speech inputs in the broad listening mode, and at the same time accepts speech inputs from at least one selected speaker in at least one selective listening mode.

20. The method according to claim 12, including the user interface switching from the selective listening mode to the broad listening mode in response to either an end of the dialog or an activation word.
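The transition rules recited in the independent claims and in claims 9, 11, and 20 can be sketched as a small pure function. This is a reading aid under stated assumptions, not the claimed implementation. The vocabularies, the activation word `"computer"`, and the names `next_mode`, `active_vocabulary`, and `recognize` are hypothetical. Falling back to broad mode when the speaker's location is lost is also an assumption: the claims only state that the interface remains in selective mode while the location is known.

```python
from enum import Enum
from typing import FrozenSet, List, Optional


class Mode(Enum):
    BROAD = "broad"
    SELECTIVE = "selective"


# Hypothetical vocabularies: the broad one is limited, the selective one is larger.
BROAD_VOCAB: FrozenSet[str] = frozenset({"computer"})
SELECTIVE_VOCAB: FrozenSet[str] = BROAD_VOCAB | frozenset(
    {"play", "music", "stop", "volume", "up", "down"}
)


def next_mode(
    mode: Mode,
    *,
    cue: bool = False,
    location: Optional[float] = None,
    dialog_ended: bool = False,
    activation_word_heard: bool = False,
) -> Mode:
    """Return the listening mode to use after observing the current events."""
    if mode is Mode.BROAD:
        # A switching cue from a speaker whose location is known starts a dialog.
        return Mode.SELECTIVE if cue and location is not None else Mode.BROAD
    # Selective mode: leave on end of dialog or activation word.
    if dialog_ended or activation_word_heard:
        return Mode.BROAD
    # Assumption: a lost location ends selective mode; otherwise remain in it.
    return Mode.SELECTIVE if location is not None else Mode.BROAD


def active_vocabulary(mode: Mode) -> FrozenSet[str]:
    """Pick the recognition vocabulary associated with the given mode."""
    return BROAD_VOCAB if mode is Mode.BROAD else SELECTIVE_VOCAB


def recognize(words: List[str], mode: Mode) -> List[str]:
    """Keep only the words that the mode's vocabulary can recognize."""
    vocab = active_vocabulary(mode)
    return [w for w in words if w in vocab]
```

Keeping the transition logic in one function with no side effects makes each claimed condition easy to check in isolation: for example, `next_mode(Mode.SELECTIVE, location=30.0)` stays in selective mode, while adding `dialog_ended=True` returns to broad mode.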

Assignees

Nuance Communications Inc

Inventors

Classifications

  • G10L15/22 · Primary

    Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • Microphone arrays; Beamforming · CPC title

  • Speaker identification or verification techniques · CPC title

  • Constructional details of speech recognition systems · CPC title

  • for comparison or discrimination · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10789950B2 cover?
A multi-mode voice controlled user interface is described. The user interface is adapted to conduct a speech dialog with one or more possible speakers and includes a broad listening mode which accepts speech inputs from the possible speakers without spatial filtering, and a selective listening mode which limits speech inputs to a specific speaker using spatial filtering. The user interface swit…
Who is the assignee on this patent?
Nuance Communications Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/22. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 29, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).