User Dedicated Automatic Speech Recognition
US-2015046157-A1 · Feb 12, 2015 · US
US10789950B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10789950-B2 |
| Application number | US-201815876545-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 22, 2018 |
| Priority date | Mar 16, 2012 |
| Publication date | Sep 29, 2020 |
| Grant date | Sep 29, 2020 |
Official abstract text for this publication.
A multi-mode voice controlled user interface is described. The user interface is adapted to conduct a speech dialog with one or more possible speakers and includes a broad listening mode which accepts speech inputs from the possible speakers without spatial filtering, and a selective listening mode which limits speech inputs to a specific speaker using spatial filtering. The user interface switches listening modes in response to one or more switching cues.
Opening claim text (preview).
What is claimed is:
1. A device for automatic speech recognition (ASR) comprising: a multi-mode voice-controlled user interface employing at least one hardware implemented computer processor, wherein the user interface is adapted to conduct a speech dialog with one or more possible speakers and includes: a broad listening mode which accepts speech inputs from the possible speakers without spatial filtering and has an associated limited broad mode recognition vocabulary; and a selective listening mode which limits speech inputs to a specific speaker using spatial filtering and has an associated selective mode recognition vocabulary that is larger than the limited broad mode recognition vocabulary, wherein the user interface is adapted to: switch from the broad listening mode to the selective listening mode in response to one or more switching cues, in the selective listening mode, engage the specific speaker in a dialog using the selective mode recognition vocabulary, and the user interface is adapted to remain in the selective listening mode so long as a location of the specific speaker is known.
2. A device according to claim 1, wherein the switching cues include one or more mode switching words from the speech inputs.
3. A device according to claim 1, wherein the switching cues include one or more dialog states in the speech dialog.
4. A device according to claim 1, wherein the switching cues include one or more visual cues from the possible speakers.
5. A device according to claim 1, wherein the selective listening mode uses acoustic speaker localization for the spatial filtering.
6. A device according to claim 1, wherein the selective listening mode uses image processing for the spatial filtering.
7. A device according to claim 1, wherein the user interface operates in the selective listening mode simultaneously in parallel for each of a plurality of selected speakers, so that each of the plurality of selected speakers has its own selective listening mode and dialog with the user interface.
8. A device according to claim 1, wherein the user interface is adapted to operate in both listening modes in parallel, whereby the user interface accepts speech inputs in the broad listening mode, and at the same time accepts speech inputs from at least one selected speaker in at least one selective listening mode.
9. The device according to claim 1, wherein the user interface is adapted to switch from the selective listening mode to the broad listening mode in response to either an end of the dialog or an activation word.
10. A computer program product encoded in a non-transitory computer-readable medium for operating an automatic speech recognition (ASR) system, the product comprising: program code executable to conduct a speech dialog with one or more possible speakers via a multi-mode voice-controlled user interface adapted to: accept speech inputs from the possible speakers in a broad listening mode without spatial filtering, the broad listening mode having an associated limited broad mode recognition vocabulary; and limit speech inputs to a specific speaker in a selective listening mode using spatial filtering, the selective listening mode having an associated selective mode recognition vocabulary that is larger than the limited broad mode recognition vocabulary, wherein the program code is executable to cause the user interface to: switch from the broad listening mode to the selective listening mode in response to one or more switching cues, in the selective listening mode, engage the specific speaker in a dialog using the selective mode recognition vocabulary, and the program code is executable to cause the user interface to remain in the selective listening mode so long as a location of the specific speaker is known.
11. The computer program product of claim 10, wherein the program code is executable to switch from the selective listening mode to the broad listening mode in response to either an end of the dialog or an activation word.
12. A method for automatic speech recognition (ASR) comprising: employing a multi-mode voice-controlled user interface having a computer processor to conduct a speech dialog with one or more possible speakers by: employing a broad listening mode which accepts speech inputs from the possible speakers without spatial filtering and has an associated limited broad mode recognition vocabulary; and employing a selective listening mode which limits speech inputs to a specific speaker using spatial filtering and has an associated selective mode recognition vocabulary that is larger than the limited broad mode recognition vocabulary, the user interface: switching from the broad listening mode to the selective listening mode in response to one or more switching cues, in the selective listening mode, engaging the specific speaker in a dialog using the selective mode recognition vocabulary, and remaining in the selective listening mode so long as a location of the specific speaker is known.
13. The method according to claim 12, wherein the switching cues include one or more mode switching words from the speech inputs.
14. The method according to claim 12, wherein the switching cues include one or more dialog states in the speech dialog.
15. The method according to claim 12, wherein the switching cues include one or more visual cues from the possible speakers.
16. The method according to claim 12, wherein the selective listening mode includes using acoustic speaker localization for the spatial filtering.
17. The method according to claim 12, wherein the selective listening mode includes using image processing for the spatial filtering.
18. The method according to claim 12, wherein the user interface operates in selective listening mode simultaneously in parallel for each of a plurality of selected speakers, so that each of the plurality of selected speakers has its own selective listening mode and dialog with the user interface.
19. The method according to claim 12, wherein the user interface operates in both listening modes in parallel, such that the user interface accepts speech inputs in the broad listening mode, and at the same time accepts speech inputs from at least one selected speaker in at least one selective listening mode.
20. The method according to claim 12, including the user interface switching from the selective listening mode to the broad listening mode in response to either an end of the dialog or an activation word.
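The mode-switching behavior recited in the independent claims, together with the return to broad listening in claims 9, 11, and 20, can be illustrated as a small state machine. The following is a minimal sketch for readers and not the patented implementation: the class and method names, the example vocabularies, the use of an azimuth angle as the speaker "location", and the 15-degree tolerance are all hypothetical. Falling back to the broad mode when the location is lost is also an assumption; the claims only state that the interface remains in the selective mode so long as the location is known.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Mode(Enum):
    BROAD = auto()      # accepts all speakers, no spatial filtering
    SELECTIVE = auto()  # accepts one speaker, spatially filtered


@dataclass
class VoiceInterface:
    # Hypothetical vocabularies: the claims only require that the
    # selective vocabulary be larger than the limited broad one.
    broad_vocabulary: frozenset = frozenset({"hello device"})
    selective_vocabulary: frozenset = frozenset(
        {"hello device", "play music", "volume up", "stop", "goodbye"}
    )
    mode: Mode = Mode.BROAD
    speaker_azimuth: Optional[float] = None  # assumed location: degrees

    def on_switching_cue(self, azimuth: float) -> None:
        """A switching cue (word, dialog state, visual cue) selects a speaker."""
        self.mode = Mode.SELECTIVE
        self.speaker_azimuth = azimuth

    def on_location_update(self, azimuth: Optional[float]) -> None:
        """Stay selective while the location is known (None means lost)."""
        if self.mode is not Mode.SELECTIVE:
            return
        if azimuth is None:
            self._to_broad()  # assumed fallback, not stated in the claims
        else:
            self.speaker_azimuth = azimuth

    def on_dialog_end_or_activation_word(self) -> None:
        """Return to broad listening (claims 9, 11, and 20)."""
        self._to_broad()

    def accepts(self, utterance: str, source_azimuth: float,
                tolerance: float = 15.0) -> bool:
        """Decide whether an utterance from a given direction is accepted."""
        if self.mode is Mode.BROAD:
            return utterance in self.broad_vocabulary
        # Stand-in for spatial filtering: compare directions with wrap-around.
        diff = abs((source_azimuth - self.speaker_azimuth + 180.0) % 360.0 - 180.0)
        return diff <= tolerance and utterance in self.selective_vocabulary

    def _to_broad(self) -> None:
        self.mode = Mode.BROAD
        self.speaker_azimuth = None


ui = VoiceInterface()
ui.on_switching_cue(azimuth=30.0)        # e.g. the activation word was heard
in_beam = ui.accepts("play music", 35.0)     # selected speaker, larger vocabulary
off_beam = ui.accepts("play music", 120.0)   # another speaker, filtered out
```

In a real system the angular comparison would be replaced by actual spatial filtering, for example driven by acoustic speaker localization or image processing as in claims 5 and 6.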
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
Microphone arrays; Beamforming · CPC title
Speaker identification or verification techniques · CPC title
Constructional details of speech recognition systems · CPC title
for comparison or discrimination · CPC title