Extending the period of voice recognition
US-2017169817-A1 · Jun 15, 2017 · US
| Field | Value |
|---|---|
| Publication number | US-10847149-B1 |
| Application number | US-201715694292-A |
| Country | US |
| Kind code | B1 |
| Filing date | Sep 1, 2017 |
| Priority date | Sep 1, 2017 |
| Publication date | Nov 24, 2020 |
| Grant date | Nov 24, 2020 |
Official abstract text for this publication.
Techniques for enabling a device to send to a speech processing server further input audio data following a completed utterance dialog to prevent the need for subsequent keywords to be spoken to invoke subsequent commands are described. A system receives input audio data corresponding to an utterance from a device upon the device detecting speech corresponding to a keyword. The system performs speech processing on the input audio data to determine a command. The system determines output data responsive to the command and sends same to the device, thus completing operations regarding the utterance. The system may also send an instruction to the device to: send to the system further input audio data corresponding to further input audio without the device first detecting a wake command.
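The device-side behavior in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical `Device` class, a made-up keyword ("computer"), and two made-up instruction names (`SEND_WITHOUT_KEYWORD`, `CEASE`); none of these names come from the patent, and text strings stand in for audio data.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical client that gates audio on a keyword unless instructed otherwise."""

    keyword: str = "computer"  # assumed wake keyword, for illustration only
    follow_up: bool = False    # True after the server asks for keyword-free audio

    def should_send(self, heard: str) -> bool:
        # Text stands in for audio; a real device would run a keyword
        # detector on audio frames rather than compare strings.
        return self.follow_up or heard.lower().startswith(self.keyword)

    def handle_instruction(self, instruction: str) -> None:
        # Instruction names are assumptions, not taken from the patent.
        if instruction == "SEND_WITHOUT_KEYWORD":
            self.follow_up = True
        elif instruction == "CEASE":
            self.follow_up = False


device = Device()
print(device.should_send("what about tomorrow"))  # keyword missing, not sent
device.handle_instruction("SEND_WITHOUT_KEYWORD")
print(device.should_send("what about tomorrow"))  # sent without the keyword
```

In this sketch the server, not the device, decides when keyword-free sending starts and stops, which matches the abstract's description of the system sending the instruction after the first utterance is fully handled.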
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method comprising: receiving, from a first device, first input audio data corresponding to a first utterance; performing, by at least one second device, speech processing on the first input audio data to determine a first command; determining, by the at least one second device, the first input audio data is sufficient to complete processing of the first command; determining, by the at least one second device, first output data responsive to the first command; sending, to the first device, the first output data; determining, by the at least one second device, that the first command corresponds to a command type that is likely to be followed by a new command within a time threshold; instructing, based at least in part on the command type of the first command, the first device to send further input audio data corresponding to further input audio without the first device determining a presence of a keyword in the further input audio data; receiving, from the first device, second input audio data; determining, by the at least one second device, the second input audio data corresponds to a second utterance intended for speech processing; performing, by the at least one second device, speech processing on the second input audio data to determine a second command; determining, by the at least one second device, second output data responsive to the second command; and sending, to the first device, the second output data.
2. The computer-implemented method of claim 1, further comprising: receiving, from the first device, third input audio data corresponding to a third utterance; performing speech processing on the third input audio data to determine a third command requesting playback of media content including audio data; determining third output data responsive to the third command; sending, to the first device, the third output data; and send an instruction to the first device to cease sending non-keyword triggered audio data.
3. The computer-implemented method of claim 1, further comprising: determining a profile associated with the first device is associated with an indicator indicating permission for the first device to send audio data without first detecting a wakeword; determining a second device associated with the first output data; receiving the first output data from the second device; and determining that no further output data is required from the second device to respond to the first command, wherein the instructing the first device occurs after determining that no further output data is required from the second device.
4. The computer-implemented method of claim 1, wherein determining, by the at least one second device, the second input audio data corresponds to a second utterance intended for speech processing comprises: processing the second input audio data to determine the second input audio data represents speech; and after determining that the second input audio data represents speech, determining, using a trained model and at least a portion of automatic speech recognition (ASR) result data corresponding to the second input audio data, that the second input audio data corresponds to speech intended for further processing.
5. A system comprising: at least one processor; and at least one memory including instructions that, when executed by the at least one processor, cause the system to: receive, from at least one first device, input audio data representing an utterance; perform, by at least one second device, speech processing on the input audio data to determine command data; determine, by the at least one second device, that the command data is sufficient input data to generate output data to respond to the utterance; send, to the at least one first device, output data responsive to the command data; determine, by the at least one second device, that the command data corresponds to a command type that is likely to be followed by a new command within a time threshold; and send, based at least in part on the command type, to the at least one first device, an instruction to send further input audio data corresponding to further input audio without the at least one first device detecting a wake command.
6. The system of claim 5, wherein the instructions, when executed by the at least one processor, further cause the system to: receive, from the at least one first device, second input audio data corresponding to a second utterance; perform speech processing on the second input audio data to determine a second command to cancel output of content; and determine, based on the second command, to instruct the at least one first device to cease sending non-wake command triggered audio data.
7. The system of claim 5, wherein the instructions, when executed by the at least one processor, further cause the system to: receive, from the at least one first device, second input audio data corresponding to a second utterance; perform speech recognition processing on the second input audio data to generate input text data and a speech recognition score; perform natural language processing on the input text data to determine a second command and a natural language score; determine at least one of the speech recognition score or the natural language score falls below a threshold speech processing score; and determine, based on at least one of the speech recognition score or the natural language score falling below a threshold speech processing score, to instruct the at least one first device to cease sending non-wake command triggered audio data.
8. The system of claim 5, wherein the instructions, when executed by the at least one processor, further cause the system to: receive, from the at least one first device, second input audio data corresponding to a second utterance; perform speech processing on the second input audio data to determine a second command requesting playback of media content including audio data; determine second output data responsive to the second command; send, to the at least one first device, the second output data; and instructing the at least one first device to cease sending non-wake command triggered audio data.
9. The system of claim 5, wherein the instructions, when executed by the at least one processor, further cause the system to: determine a profile includes a user preference indicating permission to send non-wake command triggered audio data; and send, based on the user preference and the input audio data corresponding to the command data, the instruction to the at least one first device.
10. The system of claim 5, wherein the instructions, when executed by the at least one processor, further cause the system to: determine at least one third device associated with the output data; receive the output data from the at least one third device; and determine that no further output data is required from the at least one third device to respond to the utterance, wherein sending the instruction to the at least one first device occurs after determining that no further output data is required from the at least one third device.
11. The system of claim 5, wherein the instructions, when executed by the at least one processor, further cause the system to: determine a number of times the at least one first device has been instructed to send non-wake command triggered audio, the number of times corresponding to consecutive input commands; determine the number of times fails to exceed a threshold number of times; and further based on the number of times failing to exceed the threshold number of times, instruct the at least one first device to send
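The conditions recited across claims 5 through 11 can be read together as a server-side decision about whether to keep the device sending audio without a wake command. The following is a rough sketch of one way to combine them, not the patented implementation. The command type names, the threshold values, the `Turn` record, and the instruction strings are all assumptions made for illustration, and because the claim 11 preview is cut off, treating an exceeded count as a reason to cease is also an assumption.

```python
from dataclasses import dataclass

# All values below are illustrative assumptions, not figures from the patent.
FOLLOW_UP_LIKELY = {"weather", "timer", "smart_home"}  # types likely to be followed by a new command
CEASE_TYPES = {"play_media", "cancel"}                 # media playback (claim 8), cancel (claim 6)
MIN_SCORE = 0.5                                        # threshold speech processing score (claim 7)
MAX_CONSECUTIVE = 3                                    # threshold number of times (claim 11)


@dataclass
class Turn:
    """Hypothetical summary of one processed utterance."""

    command_type: str
    asr_score: float             # speech recognition score
    nlu_score: float             # natural language score
    profile_allows: bool         # user preference permitting keyword-free audio (claim 9)
    consecutive_follow_ups: int  # times the device was already told to keep sending


def next_instruction(turn: Turn) -> str:
    """Return the instruction the server would send to the device after this turn."""
    if not turn.profile_allows:
        return "CEASE"
    # "At least one score falls below the threshold" is the same as min() < threshold.
    if min(turn.asr_score, turn.nlu_score) < MIN_SCORE:
        return "CEASE"
    if turn.command_type in CEASE_TYPES:
        return "CEASE"
    if turn.consecutive_follow_ups > MAX_CONSECUTIVE:
        return "CEASE"
    if turn.command_type in FOLLOW_UP_LIKELY:
        return "SEND_WITHOUT_KEYWORD"
    return "CEASE"


print(next_instruction(Turn("weather", 0.9, 0.8, True, 0)))     # SEND_WITHOUT_KEYWORD
print(next_instruction(Turn("play_media", 0.9, 0.9, True, 0)))  # CEASE
```

The checks run from cheapest and most restrictive (profile permission) to the command-type test, so any single disqualifying condition ends keyword-free sending. The claims do not specify an ordering, so this ordering is a design choice of the sketch.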
Distributed recognition, e.g. in client-server systems, for mobile phones or network applications · CPC title
Execution procedure of a spoken command · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
Parsing for meaning understanding · CPC title
Detection of presence or absence of voice signals (switching of direction of transmission by voice frequency in two-way loud-speaking telephone systems H04M9/10) · CPC title