Non-speech input to speech processing system
US-10692489-B1 · Jun 23, 2020 · US
US11538478B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11538478-B2 |
| Application number | US-202017114047-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 7, 2020 |
| Priority date | Dec 7, 2020 |
| Publication date | Dec 27, 2022 |
| Grant date | Dec 27, 2022 |
Official abstract text for this publication.
A speech-processing system may provide access to multiple virtual assistants via one or more voice-controlled devices. Each assistant may leverage language processing and language generation features of the speech-processing system, while handling different commands and/or providing access to different back-end applications. Different assistants may be available for use with a particular voice-controlled device based on time, location, the particular user, etc. The voice-controlled device may include components for facilitating user interaction with multiple assistants. For example, a multi-assistant component may facilitate enabling/disabling assistants, assigning gestures and/or wakewords, etc. The multi-assistant component may handle routing commands to a command processing subsystem corresponding to an assistant invoked by the command. The voice-controlled device may further include observer components, each configured to monitor the voice-controlled device for invocations of a particular assistant.
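As an illustration of the multi-assistant component and observer components described in the abstract, the following is a minimal Python sketch. It is not taken from the patent: the class names `MultiAssistantComponent` and `Observer`, the assistant names, and the trigger strings are all hypothetical, and the patent does not specify any particular data structure.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MultiAssistantComponent:
    """Hypothetical on-device registry of assistants and their triggers."""

    enabled: set[str] = field(default_factory=set)
    # Maps a trigger (gesture or wakeword) to the assistant it invokes.
    triggers: dict[str, str] = field(default_factory=dict)

    def enable(self, assistant: str) -> None:
        self.enabled.add(assistant)

    def disable(self, assistant: str) -> None:
        self.enabled.discard(assistant)
        # Drop any triggers that pointed at the disabled assistant.
        self.triggers = {t: a for t, a in self.triggers.items() if a != assistant}

    def assign(self, trigger: str, assistant: str) -> None:
        if assistant not in self.enabled:
            raise ValueError(f"{assistant} is not enabled")
        self.triggers[trigger] = assistant

    def route(self, trigger: str) -> Optional[str]:
        """Return the assistant invoked by a gesture or wakeword, if any."""
        return self.triggers.get(trigger)


@dataclass
class Observer:
    """Watches device events for invocations of one particular assistant."""

    assistant: str
    component: MultiAssistantComponent

    def saw_invocation(self, trigger: str) -> bool:
        return self.component.route(trigger) == self.assistant


# Assumed example configuration: two assistants, one wakeword, one gesture.
mac = MultiAssistantComponent()
mac.enable("assistant_a")
mac.enable("assistant_b")
mac.assign("wakeword:hello-a", "assistant_a")
mac.assign("gesture:double-tap", "assistant_b")

observer_b = Observer("assistant_b", mac)
print(mac.route("wakeword:hello-a"))
print(observer_b.saw_invocation("gesture:double-tap"))
```

In this sketch, disabling an assistant also removes its triggers, which is one plausible design choice; the abstract only states that the component facilitates enabling, disabling, and assignment.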
Opening claim text (preview).
The invention claimed is:
1. A method comprising: detecting, by a device, a first gesture, wherein the first gesture is a non-verbal movement detectable by the device; receiving, by a microphone of the device, first input audio representing a spoken utterance; determining, using data stored by the device, that the first gesture corresponds to a first command processing subsystem (CPS), wherein the data stored by the device indicates that the first gesture represents a request to invoke the first CPS and that a second gesture represents a request to invoke a second CPS; outputting, by the device, a first indication that the first CPS is processing the first input audio; in response to determining that the first gesture corresponds to the first CPS, sending, by the device to a speech-processing system, first data representing the first input audio and a second indication that the first data is to be processed by the first CPS, the speech-processing system capable of sending input data to the first CPS and the second CPS; receiving, from the speech-processing system, first response data; and outputting, by the device, first synthesized speech in a first speech style corresponding to the first CPS.
2. The method of claim 1, further comprising, prior to receiving the first input audio: receiving, by the device, a first request to enable the second CPS for processing commands received by the device; sending, to the speech-processing system, second data representing a change to a device-specific setting of the device to enable the device to process commands using the second CPS; receiving, from the speech-processing system, a third indication that the device-specific setting has been updated; detecting, by the device, the second gesture; receiving second input audio; determining, using the data stored by the device, that the second gesture corresponds to the second CPS; in response to determining that the second gesture corresponds to the second CPS sending, to the speech-processing system, third data representing the second input audio and a fourth indication that the third data is to be processed by the second CPS; receiving, from the speech-processing system, second response data; and outputting, by the device, second synthesized speech in a second speech style corresponding to the second CPS.
3. The method of claim 1, further comprising, prior to receiving the first input audio: receiving, by the device, a first request to assign the second gesture for invoking the second CPS, wherein the second gesture is different from the first gesture; sending, to the speech-processing system, second data representing a change to a device-specific setting of the device to associate the second gesture with the second CPS; receiving, from the speech-processing system, a third indication that the device-specific setting has been updated; configuring the data stored by the device to include an association between the second gesture and the second CPS; detecting, by the device, the second gesture; receiving second input audio; determining, using the data stored by the device, that the second gesture corresponds to the second CPS; in response to determining that the second gesture corresponds to the second CPS, sending, to the speech-processing system, third data representing the second input audio and a fourth indication that the third data is to be processed by the second CPS; receiving, from the speech-processing system, second response data; and outputting, by the device, second synthesized speech in a second speech style corresponding to the second CPS.
4. The method of claim 1, further comprising: detecting, by the device, the first gesture; receiving second input audio; determining, using the data stored by the device, that the first gesture corresponds to the first CPS; outputting a third indication that the first CPS is processing the second input audio; in response to determining that the first gesture corresponds to the first CPS, sending, to the speech-processing system, second data representing the second input audio and a fourth indication that the second data is to be processed by the first CPS; receiving, from the speech-processing system, a fifth indication that the second CPS is to process the second data; in response to receiving the fifth indication, outputting a sixth indication that the second CPS is processing the second input audio; receiving, from the speech-processing system, second response data; and outputting, by the device, second synthesized speech in a second speech style corresponding to the second CPS.
5. A method comprising: receiving, by a device, first input audio representing a spoken utterance; detecting, by the device, a first wake command; determining, using data stored by the device, that the first wake command corresponds to a first command processing subsystem (CPS), wherein the data stored by the device indicates that the first wake command represents a request to invoke the first CPS and that a second wake command represents a request to invoke a second CPS; in response to determining that the first wake command corresponds to the first CPS, sending, by the device to a speech-processing system, first data representing the first input audio and a first indication that the first data is to be processed by the first CPS, the speech-processing system capable of sending input data to at least the first CPS and the second CPS; receiving, from the speech-processing system, first response data; and performing, by the device, a first action based on the first response data.
6. The method of claim 5, further comprising: outputting, by the device and based on the first response data, synthesized speech in a speech style corresponding to the first CPS.
7. The method of claim 5, further comprising, prior to receiving the first input audio: receiving, by the device, a first request to enable the second CPS for processing commands received by the device; sending, to the speech-processing system, second data representing a change to a device-specific setting of the device to enable the device to process commands using the second CPS; receiving, from the speech-processing system, a second indication that the device-specific setting has been updated; receiving second input audio; detecting the second wake command; determining, using the data stored by the device, that the second wake command corresponds to the second CPS; in response to determining that the second wake command corresponds to the second CPS, sending, to the speech-processing system, third data representing the second input audio and a fourth indication that the third data is to be processed by the second CPS; receiving, from the speech-processing system, second response data; and performing, by the device, a second action based on the second response data wherein the second response data is based on the second CPS processing the second input audio.
8. The method of claim 5, further comprising, prior to receiving the first input audio: receiving, by the device, a first request to assign a first gesture for invoking the second CPS, wherein the first gesture is a non-verbal movement detectable by the device and corresponds to the second wake command; sending, to the speech-processing system, second data representing a change to a device-specific setting of the device to associate the first gesture with the second CPS; receiving, from the speech-processing system, a second indication that the device-specific setting has been updated; configuring the data stored by the device to include an association between the second wake command and the second CPS; detecting, by the device, the first gesture; rece
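The device-side flow recited in claims 1 and 4 (look up the CPS for a gesture, signal which CPS is processing, send the audio with a CPS indication, then speak the response in the matching style) might be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the claimed implementation: `StubSpeechSystem`, `Device`, `Response`, the CPS names, and the style names are hypothetical, and the stub collapses the separate handoff indication of claim 4 into a single field on the response.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Response:
    text: str
    handled_by: str  # CPS that actually produced the response


class StubSpeechSystem:
    """Stand-in for the remote speech-processing system (hypothetical)."""

    def __init__(self, handoffs: Optional[dict[str, str]] = None):
        # Maps a requested CPS to the CPS that should take over, if any.
        self.handoffs = handoffs or {}

    def process(self, audio: bytes, cps: str) -> Response:
        handler = self.handoffs.get(cps, cps)
        return Response(
            text=f"{handler} handled {len(audio)} bytes", handled_by=handler
        )


@dataclass
class Device:
    gesture_to_cps: dict[str, str]  # data stored by the device
    speech_style: dict[str, str]    # speech style per CPS
    system: StubSpeechSystem
    log: list[str] = field(default_factory=list)

    def handle(self, gesture: str, audio: bytes) -> str:
        # Determine which CPS the gesture invokes.
        cps = self.gesture_to_cps[gesture]
        # Output an indication (for example a light or tone) for that CPS.
        self.log.append(f"indicator: {cps} processing")
        # Send the audio together with the CPS indication.
        response = self.system.process(audio, cps)
        # If another CPS took over (as in claim 4), update the indication.
        if response.handled_by != cps:
            self.log.append(f"indicator: {response.handled_by} processing")
        # Speak in the style of the CPS that produced the response.
        style = self.speech_style[response.handled_by]
        return f"[{style}] {response.text}"


gestures = {"tap": "cps_1", "swipe": "cps_2"}
styles = {"cps_1": "style_1", "cps_2": "style_2"}

device = Device(gestures, styles, StubSpeechSystem())
print(device.handle("tap", b"\x00" * 4))

# A second device whose stub system hands cps_1 requests over to cps_2.
handoff_device = Device(gestures, styles, StubSpeechSystem({"cps_1": "cps_2"}))
print(handoff_device.handle("tap", b"abc"))
```

The `log` list stands in for whatever visual or audible indicator a real device would use; the claims leave the form of the indication open.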
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
Execution procedure of a spoken command · CPC title
Gesture based interaction, e.g. based on a set of recognized hand gestures (interaction based on gestures traced on a digitiser G06F3/04883) · CPC title
Distributed recognition, e.g. in client-server systems, for mobile phones or network applications · CPC title
Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title