Speech-based attention span for voice user interface

US10847149B1 · US · B1

Patent metadata
Publication number: US-10847149-B1
Application number: US-201715694292-A
Country: US
Kind code: B1
Filing date: Sep 1, 2017
Priority date: Sep 1, 2017
Publication date: Nov 24, 2020
Grant date: Nov 24, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for enabling a device to send to a speech processing server further input audio data following a completed utterance dialog to prevent the need for subsequent keywords to be spoken to invoke subsequent commands are described. A system receives input audio data corresponding to an utterance from a device upon the device detecting speech corresponding to a keyword. The system performs speech processing on the input audio data to determine a command. The system determines output data responsive to the command and sends same to the device, thus completing operations regarding the utterance. The system may also send an instruction to the device to send to the system further input audio data corresponding to further input audio without the device first detecting a wake command.
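The device-side behavior described in the abstract amounts to a gate on when captured audio is streamed to the speech processing system. The following is a minimal sketch, assuming hypothetical names (`DeviceGate`, `Instruction`, `should_stream`) and assuming wakeword detection happens elsewhere on the device and is passed in as a boolean; it illustrates the idea and is not the patentee's implementation.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Instruction(Enum):
    """Hypothetical server-to-device instructions (names are illustrative)."""
    OPEN_ATTENTION_SPAN = auto()   # send further audio without a wakeword
    CLOSE_ATTENTION_SPAN = auto()  # return to wakeword-gated streaming


@dataclass
class DeviceGate:
    """Decides whether captured audio is sent to the speech processing system."""
    wakeword_free: bool = False  # True while the server has lifted the wakeword requirement

    def on_instruction(self, instruction: Instruction) -> None:
        # Only a server instruction changes the mode; the device never lifts the
        # wakeword requirement on its own.
        self.wakeword_free = instruction is Instruction.OPEN_ATTENTION_SPAN

    def should_stream(self, wakeword_detected: bool) -> bool:
        # Normal mode: stream only after a wakeword is detected.
        # Attention-span mode: stream even without a wakeword.
        return wakeword_detected or self.wakeword_free
```

For example, a fresh `DeviceGate()` returns `False` from `should_stream(False)`; after `on_instruction(Instruction.OPEN_ATTENTION_SPAN)` the same call returns `True` until a close instruction arrives.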

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: receiving, from a first device, first input audio data corresponding to a first utterance; performing, by at least one second device, speech processing on the first input audio data to determine a first command; determining, by the at least one second device, the first input audio data is sufficient to complete processing of the first command; determining, by the at least one second device, first output data responsive to the first command; sending, to the first device, the first output data; determining, by the at least one second device, that the first command corresponds to a command type that is likely to be followed by a new command within a time threshold; instructing, based at least in part on the command type of the first command, the first device to send further input audio data corresponding to further input audio without the first device determining a presence of a keyword in the further input audio data; receiving, from the first device, second input audio data; determining, by the at least one second device, the second input audio data corresponds to a second utterance intended for speech processing; performing, by the at least one second device, speech processing on the second input audio data to determine a second command; determining, by the at least one second device, second output data responsive to the second command; and sending, to the first device, the second output data.

2. The computer-implemented method of claim 1, further comprising: receiving, from the first device, third input audio data corresponding to a third utterance; performing speech processing on the third input audio data to determine a third command requesting playback of media content including audio data; determining third output data responsive to the third command; sending, to the first device, the third output data; and send an instruction to the first device to cease sending non-keyword triggered audio data.

3. The computer-implemented method of claim 1, further comprising: determining a profile associated with the first device is associated with an indicator indicating permission for the first device to send audio data without first detecting a wakeword; determining a second device associated with the first output data; receiving the first output data from the second device; and determining that no further output data is required from the second device to respond to the first command, wherein the instructing the first device occurs after determining that no further output data is required from the second device.

4. The computer-implemented method of claim 1, wherein determining, by the at least one second device, the second input audio data corresponds to a second utterance intended for speech processing comprises: processing the second input audio data to determine the second input audio data represents speech; and after determining that the second input audio data represents speech, determining, using a trained model and at least a portion of automatic speech recognition (ASR) result data corresponding to the second input audio data, that the second input audio data corresponds to speech intended for further processing.

5. A system comprising: at least one processor; and at least one memory including instructions that, when executed by the at least one processor, cause the system to: receive, from at least one first device, input audio data representing an utterance; perform, by at least one second device, speech processing on the input audio data to determine command data; determine, by the at least one second device, that the command data is sufficient input data to generate output data to respond to the utterance; send, to the at least one first device, output data responsive to the command data; determine, by the at least one second device, that the command data corresponds to a command type that is likely to be followed by a new command within a time threshold; and send, based at least in part on the command type, to the at least one first device, an instruction to send further input audio data corresponding to further input audio without the at least one first device detecting a wake command.

6. The system of claim 5, wherein the instructions, when executed by the at least one processor, further cause the system to: receive, from the at least one first device, second input audio data corresponding to a second utterance; perform speech processing on the second input audio data to determine a second command to cancel output of content; and determine, based on the second command, to instruct the at least one first device to cease sending non-wake command triggered audio data.

7. The system of claim 5, wherein the instructions, when executed by the at least one processor, further cause the system to: receive, from the at least one first device, second input audio data corresponding to a second utterance; perform speech recognition processing on the second input audio data to generate input text data and a speech recognition score; perform natural language processing on the input text data to determine a second command and a natural language score; determine at least one of the speech recognition score or the natural language score falls below a threshold speech processing score; and determine, based on at least one of the speech recognition score or the natural language score falling below a threshold speech processing score, to instruct the at least one first device to cease sending non-wake command triggered audio data.

8. The system of claim 5, wherein the instructions, when executed by the at least one processor, further cause the system to: receive, from the at least one first device, second input audio data corresponding to a second utterance; perform speech processing on the second input audio data to determine a second command requesting playback of media content including audio data; determine second output data responsive to the second command; send, to the at least one first device, the second output data; and instructing the at least one first device to cease sending non-wake command triggered audio data.

9. The system of claim 5, wherein the instructions, when executed by the at least one processor, further cause the system to: determine a profile includes a user preference indicating permission to send non-wake command triggered audio data; and send, based on the user preference and the input audio data corresponding to the command data, the instruction to the at least one first device.

10. The system of claim 5, wherein the instructions, when executed by the at least one processor, further cause the system to: determine at least one third device associated with the output data; receive the output data from the at least one third device; and determine that no further output data is required from the at least one third device to respond to the utterance, wherein sending the instruction to the at least one first device occurs after determining that no further output data is required from the at least one third device.

11. The system of claim 5, wherein the instructions, when executed by the at least one processor, further cause the system to: determine a number of times the at least one first device has been instructed to send non-wake command triggered audio, the number of times corresponding to consecutive input commands; determine the number of times fails to exceed a threshold number of times; and further based on the number of times failing to exceed the threshold number of times, instruct the at least one first device to send …
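The server-side conditions recited across the claims (opening the wakeword-free window, checking that follow-up audio is really speech meant for the system, and closing the window again) can be gathered into a few decision functions. This is a hypothetical sketch: the command-type sets, the 0.5 score threshold, the limit of 3 consecutive windows, and all names (`TurnResult`, `open_attention_span`, `intended_for_system`, `close_attention_span`) are illustrative assumptions, not values or interfaces from the patent. The claims shown do not say how a "command type that is likely to be followed by a new command" is identified, so a fixed set stands in for that determination.

```python
from dataclasses import dataclass

# Assumed values for illustration only; the claims do not specify them.
FOLLOW_UP_PRONE_TYPES = {"weather", "timer", "question_answering"}
MEDIA_PLAYBACK_TYPES = {"play_music", "play_audiobook"}
CANCEL_TYPES = {"cancel", "stop"}
SCORE_THRESHOLD = 0.5
MAX_CONSECUTIVE_WINDOWS = 3


@dataclass
class TurnResult:
    """Outcome of processing one utterance (hypothetical structure)."""
    command_type: str
    asr_score: float        # speech recognition score (claim 7)
    nlu_score: float        # natural language score (claim 7)
    output_complete: bool   # no further output pending from another device (claims 3, 10)


def open_attention_span(turn: TurnResult, profile_allows: bool,
                        consecutive_windows: int) -> bool:
    """True if the device should be told to send audio without a wakeword."""
    return (
        profile_allows                                        # profile permission (claims 3, 9)
        and turn.output_complete                              # respond only after output is done (claims 3, 10)
        and turn.command_type in FOLLOW_UP_PRONE_TYPES        # likely follow-up command type (claims 1, 5)
        and consecutive_windows <= MAX_CONSECUTIVE_WINDOWS    # count fails to exceed threshold (claim 11)
    )


def intended_for_system(is_speech: bool, directedness_score: float) -> bool:
    """Two-stage check from claim 4: speech first, then a trained-model score.

    `directedness_score` stands in for the output of a trained model applied to
    ASR result data; that model is not described in the claims shown.
    """
    if not is_speech:  # e.g. voice activity detection found no speech
        return False
    return directedness_score >= SCORE_THRESHOLD


def close_attention_span(turn: TurnResult) -> bool:
    """True if the device should stop sending non-wakeword-triggered audio."""
    if turn.command_type in MEDIA_PLAYBACK_TYPES:  # media playback request (claims 2, 8)
        return True
    if turn.command_type in CANCEL_TYPES:          # cancel output of content (claim 6)
        return True
    # Either speech processing score falling below the threshold (claim 7).
    return turn.asr_score < SCORE_THRESHOLD or turn.nlu_score < SCORE_THRESHOLD
```

In this sketch a server would call `open_attention_span` after sending its response, run `intended_for_system` on any audio that arrives without a wakeword, and call `close_attention_span` on each follow-up turn. Keeping the open and close checks separate mirrors how the claims list them as independent conditions.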

Assignees

Amazon Tech Inc

Inventors

Inventor names are not shown on this page.

Classifications

  • Distributed recognition, e.g. in client-server systems, for mobile phones or network applications · CPC title

  • Execution procedure of a spoken command · CPC title

  • G10L15/22 (primary)

    Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • Parsing for meaning understanding · CPC title

  • Detection of presence or absence of voice signals (switching of direction of transmission by voice frequency in two-way loud-speaking telephone systems H04M9/10) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10847149B1 cover?
Techniques for enabling a device to send to a speech processing server further input audio data following a completed utterance dialog to prevent the need for subsequent keywords to be spoken to invoke subsequent commands are described. A system receives input audio data corresponding to an utterance from a device upon the device detecting speech corresponding to a keyword. The system performs …
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/22. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 24, 2020 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus, or other publications sharing the same primary CPC classification).