Context-based device arbitration

US10546583B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10546583-B2
Application number: US-201715691460-A
Country: US
Kind code: B2
Filing date: Aug 30, 2017
Priority date: Aug 30, 2017
Publication date: Jan 28, 2020
Grant date: Jan 28, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

This disclosure describes, in part, context-based device arbitration techniques to select a voice-enabled device from multiple voice-enabled devices to provide a response to a command included in a speech utterance of a user. In some examples, the context-driven arbitration techniques may include determining a ranked list of voice-enabled devices that are ranked based on audio signal metric values for audio signals generated by each voice-enabled device, and iteratively moving through the list to determine, based on device states of the voice-enabled devices, whether one of the voice-enabled devices can perform an action responsive to the command. If the voice-enabled devices that detected the speech utterance are unable to perform the action responsive to the command, all other voice-enabled devices associated with an account may be analyzed to determine whether one of the other voice-enabled devices can perform the action responsive to the command in the speech utterance.
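
The selection loop described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Device` class, the `supported_actions` field (a stand-in for "device state"), the use of signal-to-noise ratio as the ranking metric, and the `can_perform` and `arbitrate` helpers are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Device:
    device_id: str
    online: bool
    supported_actions: frozenset  # hypothetical stand-in for device state
    snr_db: Optional[float] = None  # None if the device did not detect the utterance


def can_perform(device: Device, action: str) -> bool:
    """Assumed capability check: the device is online and supports the action."""
    return device.online and action in device.supported_actions


def arbitrate(heard: Iterable[Device],
              account_devices: Iterable[Device],
              action: str) -> Optional[Device]:
    """Pick a device to respond, or None if no device can perform the action."""
    # Rank the devices that detected the utterance, best audio metric first.
    ranked = sorted(
        heard,
        key=lambda d: d.snr_db if d.snr_db is not None else float("-inf"),
        reverse=True,
    )
    # Move through the ranked list until one device can perform the action.
    for device in ranked:
        if can_perform(device, action):
            return device
    # Otherwise consider the remaining devices associated with the account.
    heard_ids = {d.device_id for d in ranked}
    for device in account_devices:
        if device.device_id not in heard_ids and can_perform(device, action):
            return device
    return None


# Illustrative devices (made-up values).
kitchen = Device("kitchen", True, frozenset({"play_music"}), snr_db=18.0)
bedroom = Device("bedroom", True, frozenset({"stop_alarm"}), snr_db=9.5)
tv = Device("tv", True, frozenset({"play_video"}))
account = [kitchen, bedroom, tv]
```

With these sample devices, a "stop_alarm" request heard by both the kitchen and bedroom devices goes to the bedroom device even though the kitchen device has the stronger signal, and a "play_video" request falls through to the TV, which did not hear the utterance at all.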

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: one or more processors; computer-readable media storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising: receiving, from a first voice-enabled device, first audio data representing a speech utterance; receiving, from the first voice-enabled device, a first audio signal metric value indicating a first signal-to-noise ratio associated with the first audio data; receiving, from a second voice-enabled device, second audio data representing the speech utterance; receiving, from the second voice-enabled device, a second audio signal metric value indicating a second signal-to-noise ratio associated with the second audio data; determining that the first signal-to-noise ratio is greater than the second signal-to-noise ratio; identifying device state data associated with the first voice-enabled device; generating, using automatic speech recognition (ASR) on at least one of the first audio data or the second audio data, text data corresponding to the speech utterance; determining, using natural language understanding (NLU) on the text data, intent data associated with the speech utterance, the intent data representing a request for a client device to perform an action; determining, based at least in part on the device state data, that the first voice-enabled device is capable of performing the action responsive to the speech utterance; determining a command to cause the first voice-enabled device to perform the action; and sending, to the first voice-enabled device, data indicating the command.

2. The system of claim 1, the operations further comprising causing the second voice-enabled device to stop transmitting the second audio data, the second voice-enabled device being stopped from transmitting the second audio data prior to the first voice-enabled device stopping transmitting the first audio data, wherein generating the text data is performed using ASR on the first audio data.

3. The system of claim 1, the operations further comprising: determining that the first voice-enabled device is included in a stored grouping of devices that includes the first voice-enabled device and a third voice-enabled device; identifying device state data associated with the stored grouping of devices; and determining that the stored grouping of devices is capable of performing the action responsive to the speech utterance.

4. The system of claim 1, wherein identifying the device state data associated with the first voice-enabled device comprises: sending a request to an event component to provide an indication of the device state data associated with the first voice-enabled device; and receiving, from the event component, the device state data.

5. A system comprising: one or more processors; computer-readable media storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising: receiving a first device identifier of a first device; receiving first audio data associated with the first device identifier, the first audio data representing a speech utterance; receiving a second device identifier of a second device; receiving second audio data associated with the second device identifier, the second audio data representing the speech utterance; determining intent data associated with the speech utterance, the intent data representing a machine response for a device to perform responsive to the speech utterance; identifying first device state data associated with the first device; identifying second device state data associated with the second device; based at least in part on the second device state data, determining the second device is to be used for the machine response; determining command data to cause the second device to perform the machine response; and sending, to the second device, the command data to perform the machine response.

6. The system of claim 5, further comprising determining, based on the first device state data, that the first device is offline.

7. The system of claim 5, the operations further comprising: determining that the first device is included in a stored grouping of devices that includes the first device and a third device; identifying device state data associated with the stored grouping of devices; and determining, based on the device state data associated with the stored grouping of devices, that the stored grouping of devices is offline.

8. The system of claim 5, the operations further comprising: determining that the first device is associated with a secondary device; identifying third device state data associated with the secondary device; and determining, based on the third device state data, that the secondary device is offline.

9. The system of claim 5, the operations further comprising: determining, based on the first device state data, that the first device is offline; and storing an indication that the second device is to perform the machine response.

10. The system of claim 5, the operations further comprising receiving an indication that the first device is ranked higher than the second device based at least in part on a first audio signal metric associated with the first audio data and a second audio signal metric associated with the second audio data.

11. The system of claim 10, wherein: the first audio signal metric associated with the first audio data comprises at least one of: a first signal-to-noise value of the first audio data; a first amplitude of the first audio data; or a first level of voice activity in the first audio data; and the second audio signal metric associated with the second audio data comprises at least one of: a second signal-to-noise value of the second audio data; a second amplitude of the second audio data; or a second level of voice activity in the second audio data.

12. The system of claim 5, the operations further comprising receiving an indication that the first device is ranked higher than the second device, wherein the first device and the second device are ranked based on one or more of: input received via an input control of the first device; a distance of a user to the first device; or image data indicating that the user is at least partially facing the first device.

13. A method comprising: receiving first audio data associated with a first device, the first audio data representing a speech utterance; receiving second audio data associated with a second device, the second audio data representing the speech utterance; identifying first device state data associated with the first device; identifying second device state data associated with the second device; determining intent data associated with the speech utterance, the intent data representing a machine response for a device to perform responsive to the speech utterance; based at least in part on the second device state data, determining the second device is to be used for the machine response; determining command data to cause the second device to perform the machine response; and sending, to the second device, the command data to perform the machine response.

14. The method of claim 13, further comprising determining, based on the first device state data, that the first device is offline.

15. The method of claim 13, further comprising: determining that the first device is included in a stored grouping of devices that includes the first device and a third device; id
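
Claims 10 and 11 rank devices by audio signal metrics such as a signal-to-noise value or an amplitude, but the claim text shown here does not say how those values are computed. The sketch below assumes common textbook definitions (root-mean-square amplitude, and SNR in decibels as 20·log10 of the speech-to-noise RMS ratio). The function names and the idea of passing a separate noise-only segment are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of a block of samples (0.0 if empty)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(speech: Sequence[float], noise: Sequence[float]) -> float:
    """SNR in dB from a speech segment and an assumed noise-only segment."""
    signal_rms, noise_rms = rms(speech), rms(noise)
    if signal_rms == 0.0:
        return -math.inf  # no signal energy at all
    if noise_rms == 0.0:
        return math.inf  # signal present, no measurable noise
    return 20.0 * math.log10(signal_rms / noise_rms)


def rank_by_snr(
    captures: Dict[str, Tuple[Sequence[float], Sequence[float]]]
) -> List[str]:
    """Return device identifiers ordered from highest to lowest SNR."""
    return sorted(captures, key=lambda d: snr_db(*captures[d]), reverse=True)


# Made-up sample data: the "near" device hears the speech five times louder.
captures = {
    "far": ([0.2, -0.2], [0.1, -0.1]),
    "near": ([1.0, -1.0], [0.1, -0.1]),
}
ranking = rank_by_snr(captures)
```

A ranking produced this way would only be the input to arbitration; per the independent claims, device state data still decides which device actually receives the command.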

Assignees

Amazon Tech Inc

Inventors

Classifications

  • G10L15/22 · Primary

    Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • Execution procedure of a spoken command · CPC title

  • for discriminating voice from noise · CPC title

  • Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning · CPC title

  • Constructional details of speech recognition systems · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10546583B2 cover?
This disclosure describes, in part, context-based device arbitration techniques to select a voice-enabled device from multiple voice-enabled devices to provide a response to a command included in a speech utterance of a user. In some examples, the context-driven arbitration techniques may include determining a ranked list of voice-enabled devices that are ranked based on audio signal metric val…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/22. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 28, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).