Contextual voice user interface

US10446147B1 · US · B1

Patent metadata
Field: Value
Publication number: US-10446147-B1
Application number: US-201715634780-A
Country: US
Kind code: B1
Filing date: Jun 27, 2017
Priority date: Jun 27, 2017
Publication date: Oct 15, 2019
Grant date: Oct 15, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for providing a contextual voice user interface that enables a user to query a speech processing system with respect to the decisions made to answer the user's command are described. The speech processing system may store speech processing pipeline data used to process a command. At some point after the system outputs content deemed responsive to the command, a user may speak an utterance corresponding to an inquiry with respect to the processing performed to respond to the command. For example, the user may state “why did you tell me that?” In response thereto, the speech processing system may determine the stored speech processing pipeline data used to respond to the command, and may generate output audio data that describes the data and computing decisions involved in determining the content deemed responsive to the command.
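The flow in the abstract can be illustrated with a minimal sketch: pipeline data is stored when a command is processed, then looked up and described when the user later asks "why did you tell me that?" Everything named here (`PipelineRecord`, `PipelineStore`, `explain`, the field names, and the wording of the explanation) is a hypothetical illustration, not the patent's implementation. The sketch returns text; in the described system a text-to-speech step would turn such text into output audio data.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PipelineRecord:
    """Pipeline data kept for one command (hypothetical schema)."""
    text: str                          # speech recognition result
    intent: str                        # natural language processing result
    entity: Optional[str] = None       # entity found in the text, if any
    application: Optional[str] = None  # application that supplied the content


class PipelineStore:
    """In-memory store keyed by a generated identifier (an assumption)."""

    def __init__(self) -> None:
        self._records: Dict[str, PipelineRecord] = {}
        self._latest: Optional[str] = None

    def save(self, record: PipelineRecord) -> str:
        # Generate an identifier and associate the pipeline data with it.
        identifier = uuid.uuid4().hex
        self._records[identifier] = record
        self._latest = identifier
        return identifier

    def latest(self) -> Optional[PipelineRecord]:
        # "Why did you tell me that?" is assumed to refer to the last command.
        if self._latest is None:
            return None
        return self._records[self._latest]


def explain(record: PipelineRecord) -> str:
    """Describe the stored data and decisions in plain language."""
    parts = [
        f'I heard "{record.text}"',
        f"understood the intent as {record.intent}",
    ]
    if record.entity:
        parts.append(f"recognized the entity {record.entity}")
    if record.application:
        parts.append(f"asked {record.application} for the answer")
    return ", ".join(parts) + "."
```

A usage example under the same assumptions: after `store.save(PipelineRecord("play adele", "PlayMusic", "Adele", "MusicApp"))`, calling `explain(store.latest())` yields a sentence that names the recognized text, the intent, the entity, and the application.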

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: during a first time period at one or more remote devices: receiving, from a device, first input audio data corresponding to a first utterance; generating an identifier; associating the identifier with the first utterance; performing speech recognition processing on the first input audio data to generate first text data; associating the first text data with the identifier; performing natural language processing on the first text data to determine a first intent corresponding to the first utterance; associating the first intent with the identifier; performing natural language processing on the first text data to determine at least a portion of the first text data that potentially corresponds to an entity; associating, with the identifier, the at least a portion of the first text data and an indication of the entity; determining an application associated with the first intent; associating application data representing the application with the identifier; sending, to a remote device associated with the application, a signal requesting content responsive to the first utterance; receiving, from the remote device, content data representing the content; and causing the device to emit the content data; and during a second time period subsequent to the first time period at the one or more remote devices: receiving, from the device, second input audio data corresponding to a second utterance; performing speech recognition processing on the second input audio data to generate second text data; performing natural language processing on the second text data to determine a second intent corresponding to the second utterance, the second intent being to determine an explanation for processing of the first utterance and to receive previous speech processing results corresponding to the first utterance; determining the identifier associated with the first utterance; determining, based on the identifier, at least one of the first text data, the first intent, the at least a portion of the first text data, the indication of the entity, or the application data; determining an output data format associated with the second intent; generating output data using the output data format, wherein the output data includes the first text data and at least one of the first intent, the indication of the entity, or the application data with at least a first portion of the output data format; and sending the output data to the device.

2. The computer-implemented method of claim 1, further comprising: receiving, from the device, third input audio data corresponding to a third utterance; performing speech recognition processing on the third input audio data to generate third text data; performing natural language processing on the third text data to determine a third intent corresponding to the third utterance, the third intent being to receive speech processing results corresponding to a fourth utterance; determining a second identifier associated with the fourth utterance; determining speech processing data associated with the second identifier; determining the output data format is associated with the third intent; generating second output data using the output data format, wherein the second output data includes at least a portion of the speech processing data; and sending the second output data to the device.

3. The computer-implemented method of claim 2, further comprising: determining the third text data includes an indication of a time period when the fourth utterance was spoken; determining, in a profile associated with the third utterance, the time period; and determining the second identifier is associated with the time period in the profile.

4. A system comprising: at least one processor; and at least one memory including instructions that, when executed by the at least one processor, cause the system to: perform natural language processing on input text data representative of input from a user device to determine an intent of a current user input; determine the intent is to receive an explanation of a previous output corresponding to a previous user input and receive previous speech processing results corresponding to the previous user input; determine an identifier associated with the previous user input; determine previous speech recognition results associated with the identifier; determine previous natural language processing results associated with the identifier; determine an output format associated with the intent; generate output data using the output format, wherein the output data includes a portion of the input text data and at least one of: at least a portion of the previous speech recognition results or at least a portion of the previous natural language processing results; and send the output data to a first device associated with the input text data.

5. The system of claim 4, wherein: the previous speech recognition results include text data output based on the previous user input, and the previous natural language processing results include at least one of a previous intent determined based on processing of the previous user input, or application data representing an application associated with the previous intent.

6. The system of claim 4, wherein the instructions, when executed by the at least one processor, further cause the system to: receive input audio data corresponding to the previous user input; generate the identifier; perform speech recognition processing on the input audio data to generate second input text data; associate the second input text data with the identifier; perform natural language processing on the second input text data to determine a previous intent; associate the previous intent with the identifier; determine an application associated with the previous intent; associate application data representing the application with the identifier; send, to a remote device associated with the application, a signal requesting content responsive to the previous user input; receive, from the remote device, content data representing the content; and cause the first device to emit the content data.

7. The system of claim 6, wherein the instructions, when executed by the at least one processor, further cause the system to: associate, after determining the application, the second input text data with the identifier and the previous intent with the identifier.

8. The system of claim 4, wherein the previous user input is a previously spoken utterance.

9. The system of claim 4, wherein the instructions, when executed by the at least one processor, further cause the system to: determine context data representing at least one of a geographic location of a user or a timestamp corresponding to when audio corresponding to the previous user input was received; and associate the context data with the identifier, wherein the output data further includes at least a portion of the context data.

10. The system of claim 4, wherein the instructions, when executed by the at least one processor, further cause the system to: determine a user corresponding to the input text data is an application developer; and determine, based on the user being an application developer, a second output format.

11. The system of claim 4, wherein the instructions, when executed by the at least one processor, further cause the system to: receive input audio data corresponding to a second utterance; perform speech processing on the input audio data to determine a second intent to determine a new content source for the previous user input; and associate the pr
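Two of the dependent claims lend themselves to a short sketch: claim 3 (finding the identifier of an earlier utterance from a time period recorded in a profile) and claim 10 (choosing a second output format when the user is an application developer). The names `ProfileEntry`, `find_identifier`, `render_output`, and both format strings are assumptions made for illustration; the claims do not specify a data layout, a tie-breaking rule, or the wording of any format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class ProfileEntry:
    """One utterance logged in a user profile (hypothetical layout)."""
    identifier: str
    received_at: datetime


def find_identifier(profile: List[ProfileEntry],
                    start: datetime,
                    end: datetime) -> Optional[str]:
    """Return the identifier of the latest utterance received in [start, end).

    Picking the latest match is an assumption; the claim only requires that
    the identifier be associated with the time period in the profile.
    """
    matches = [e for e in profile if start <= e.received_at < end]
    if not matches:
        return None
    return max(matches, key=lambda e: e.received_at).identifier


# Hypothetical output formats: one for end users, one for developers.
END_USER_FORMAT = 'You said "{text}", so I asked {application}.'
DEVELOPER_FORMAT = "text={text}; intent={intent}; application={application}"


def render_output(is_developer: bool, text: str, intent: str,
                  application: str) -> str:
    """Fill the format selected by the user's role with stored results."""
    fmt = DEVELOPER_FORMAT if is_developer else END_USER_FORMAT
    # str.format ignores keyword arguments that a template does not use.
    return fmt.format(text=text, intent=intent, application=application)
```

Keeping the formats as plain templates separates what was stored from how it is presented, so the same stored results can feed either audience.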

Assignees

Amazon Tech Inc

Inventors

Classifications

  • Speech to text systems (G10L15/08 takes precedence) · CPC title

  • Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title

  • G06F40/205 · Primary

    Parsing · CPC title

  • Semantic analysis · CPC title

  • G10L15/22 · Primary

    Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10446147B1 cover?
Techniques for providing a contextual voice user interface that enables a user to query a speech processing system with respect to the decisions made to answer the user's command are described. The speech processing system may store speech processing pipeline data used to process a command. At some point after the system outputs content deemed responsive to the command, a user may speak an utte…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06F40/205. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 15, 2019 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).