Implicit target selection for multiple audio playback devices in an environment

US10839795B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10839795-B2
Application number: US-201715433953-A
Country: US
Kind code: B2
Filing date: Feb 15, 2017
Priority date: Feb 15, 2017
Publication date: Nov 17, 2020
Grant date: Nov 17, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A user can utter a voice command in an environment where multiple audio playback devices are located to have audio output on a single device, or a predefined group of devices in a synchronized manner. In instances when the voice command uttered by the user does not specify a target for audio output, an implicit target selection algorithm can evaluate one or more criteria to determine an appropriate target for output of the audio corresponding to the voice command. An example criterion is met if a predetermined time period has lapsed since a last utterance was detected by a device in the environment. However, other criteria can be evaluated for determining a target output device(s).
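The time-based criterion in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Environment`, `select_target`, and `TIMEOUT_SECONDS`, the 300-second value, and the treatment of the exact boundary as "lapsed" are all assumptions made for illustration. The fallback to the device that heard the command follows the first claims quoted on this page.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical value: the patent only says "a predetermined time period".
TIMEOUT_SECONDS = 300.0


@dataclass
class Environment:
    """Remembers the most recent output target and when the last utterance was heard."""
    last_target: Optional[str] = None          # e.g. a group name such as "Everywhere"
    last_utterance_at: Optional[float] = None  # timestamp in seconds


def select_target(env: Environment, receiving_device: str,
                  explicit_target: Optional[str], now: float,
                  timeout: float = TIMEOUT_SECONDS) -> str:
    """Pick an output target for a voice command.

    1. If the command names a target, use it.
    2. Otherwise, if the timer has not lapsed since the last utterance,
       reuse the previous target (for example, the group).
    3. Otherwise, fall back to the device that heard the command.
    """
    if explicit_target is not None:
        target = explicit_target
    elif (env.last_target is not None
          and env.last_utterance_at is not None
          and now - env.last_utterance_at < timeout):
        target = env.last_target
    else:
        target = receiving_device
    # Every utterance restarts the timer (an assumption of this sketch).
    env.last_target = target
    env.last_utterance_at = now
    return target


env = Environment()
print(select_target(env, "kitchen", "Everywhere", now=0.0))  # explicit target
print(select_target(env, "kitchen", None, now=60.0))         # timer not lapsed
print(select_target(env, "kitchen", None, now=360.0))        # timer lapsed
```

In this sketch a command without a target is sent to the group while the timer is running, and to the single device that received the speech once the timer has lapsed.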

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: determining, by a speech processing system and based at least in part on first speech data received from a first audio playback device in an environment, first audio content and a group of devices that is to output the first audio content, the group of devices including the first audio playback device and a second audio playback device; sending first audio data to the first audio playback device for synchronized output of the first audio content by the first audio playback device and the second audio playback device; determining, by the speech processing system and based at least in part on second speech data received from the first audio playback device, second audio content; determining, by the speech processing system, that the second speech data omits to target a specific output device; determining that a predetermined time period has not lapsed since the first speech data was received at a time of, or before, receipt of the second speech data; selecting the group of devices for output of the second audio content based at least in part on the predetermined time period having not lapsed at the time; and sending second audio data to the first audio playback device for synchronized output of the second audio content by the first audio playback device and the second audio playback device.

2. The method of claim 1, further comprising: determining, by the speech processing system and based at least in part on third speech data received from the first audio playback device, third audio content; determining, by the speech processing system, that the third speech data omits to target a specific output device; determining that the predetermined time period has lapsed at a second time of, or before, receipt of the third speech data; selecting, based at least in part on the predetermined time period having lapsed at the second time, the first audio playback device for output of the third audio content; and sending third audio data to the first audio playback device for output of the third audio content by the first audio playback device.

3. The method of claim 2, further comprising: determining that the synchronized output of the second audio content has stopped, wherein determining that the predetermined time period has lapsed at the second time comprises determining that the predetermined time period has lapsed since the synchronized output of the second audio content stopped.

4. The method of claim 2, wherein selecting the first audio playback device for the output of the third audio content is further based on a stored preference specifying the first audio playback device as a preferred output device in response to determining that the predetermined time period has lapsed.

5. The method of claim 1, further comprising: determining, based at least in part on third speech data received from an audio playback device among the first audio playback device and the second audio playback device, that the third speech data omits to target a specific output device; determining that the predetermined time period has not lapsed at a second time of receipt of the third speech data; generating a text-to-speech (TTS) output based at least in part on the third speech data; determining, based at least in part on generating the TTS output, that the audio playback device is to output the TTS output; and sending the TTS output to the audio playback device for output of the TTS output by the audio playback device.

6. The method of claim 1, further comprising selecting the predetermined time period from among multiple different time periods based at least in part on the first speech data having been received from the first audio playback device.

7. A method comprising: determining, based at least in part on first speech data received from a first audio playback device, first audio content and a group of output devices including the first audio playback device and a second audio playback device; sending first audio data, to at least one of the first audio playback device or the second audio playback device, for synchronized output of the first audio content by the first audio playback device and the second audio playback device; determining, based at least in part on second speech data received from at least one of the first audio playback device or the second audio playback device, second audio content; determining that the second speech data omits a specific output device; determining that a predetermined time period has lapsed since the first speech data was received at a time of, or before, receipt of the second speech data; selecting an audio playback device among the first audio playback device and the second audio playback device for output of the second audio content based at least in part on the predetermined time period having lapsed at the time; and sending second audio data to the audio playback device for output of the second audio content by the audio playback device.

8. The method of claim 7, further comprising selecting the predetermined time period from among multiple different time periods based at least in part on the first speech data having been received from the first audio playback device.

9. The method of claim 7, wherein selecting the audio playback device is further based on a stored preference specifying the audio playback device as a preferred output device.

10. The method of claim 7, further comprising determining that an additional criterion is met at the time, wherein determining that the additional criterion is met comprises determining that the first audio content is not being output by the first audio playback device and the second audio playback device in a synchronized manner at the time, and wherein selecting the audio playback device for output of the second audio content is further based on the additional criterion being met at the time.

11. The method of claim 7, further comprising determining that an additional criterion is met at the time, wherein determining that the additional criterion is met comprises determining that a command based on the second speech data is not associated with a category of music-related commands, and wherein selecting the audio playback device for output of the second audio content is further based on the additional criterion being met at the time.

12. The method of claim 7, wherein the audio playback device selected for output of the second audio content comprises the first audio playback device, the method further comprising: determining, based at least in part on third speech data received from the second audio playback device, that the third speech data omits a specific output device; determining that the predetermined time period has not lapsed since the second speech data was received at a second time of, or before, receipt of the third speech data; generating a text-to-speech (TTS) output based at least in part on the third speech data; determining, based at least in part on generating the TTS output, that the second audio playback device is to output the TTS output; and sending the TTS output to the second audio playback device for output of the TTS output by the second audio playback device.

13. The method of claim 7, further comprising: determining, based at least in part on third speech data received from at least one audio playback device among the first audio playback device and the second audio playback device, third audio content and the group of output devices; and sending third audio data, to at least one of the first audio playback device or the second audio playback device, for synchronized output of the t
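
The dependent claims add further ingredients to the timer: a time period chosen per device (claims 6 and 8), extra criteria such as "the group is not currently playing in sync" (claim 10) and "the command is not music-related" (claim 11), and TTS responses that go back to the device that heard the request (claims 5 and 12). The Python sketch below shows one way these pieces could fit together. All names, the timeout values, the intent labels, and the decision to require every listed criterion at once are assumptions of this sketch; the claims state each criterion separately and do not say how they combine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical per-device time periods (claims 6 and 8 only say the period
# is selected "from among multiple different time periods").
TIMEOUTS: Dict[str, float] = {"kitchen": 300.0, "living_room": 600.0}
DEFAULT_TIMEOUT = 300.0

# Hypothetical intent labels standing in for "a category of music-related commands".
MUSIC_INTENTS = {"play_music", "next_track", "pause_music"}


@dataclass(frozen=True)
class Request:
    device: str                # device that captured this utterance
    previous_device: str       # device that captured the earlier utterance
    intent: str                # hypothetical label produced by speech processing
    seconds_since_last: float  # time elapsed since the earlier utterance
    group_is_playing: bool     # True while the group is outputting audio in sync


Criterion = Callable[[Request], bool]


def timer_lapsed(req: Request) -> bool:
    """Time criterion, with the period looked up by the earlier device."""
    timeout = TIMEOUTS.get(req.previous_device, DEFAULT_TIMEOUT)
    return req.seconds_since_last >= timeout


def group_not_playing(req: Request) -> bool:
    """Additional criterion in the spirit of claim 10."""
    return not req.group_is_playing


def not_music_command(req: Request) -> bool:
    """Additional criterion in the spirit of claim 11."""
    return req.intent not in MUSIC_INTENTS


def route(req: Request, group: List[str], criteria: List[Criterion],
          is_tts: bool = False) -> List[str]:
    """Return the devices that should output the response.

    TTS output always goes to the device that heard the request. Other audio
    goes to that single device only if every supplied criterion is met
    (an assumption of this sketch); otherwise the group is kept.
    """
    if is_tts or all(criterion(req) for criterion in criteria):
        return [req.device]
    return list(group)
```

A short usage example under the same assumptions:

```python
group = ["kitchen", "living_room"]
criteria = [timer_lapsed, group_not_playing, not_music_command]

follow_up = Request(device="kitchen", previous_device="kitchen",
                    intent="play_music", seconds_since_last=30.0,
                    group_is_playing=True)
later = Request(device="kitchen", previous_device="kitchen",
                intent="weather", seconds_since_last=900.0,
                group_is_playing=False)

print(route(follow_up, group, criteria))  # group is kept
print(route(later, group, criteria))      # single device
```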

Assignees

Inventors

Classifications

  • Phonemes, fenemes or fenones being the recognition units · CPC title

  • using context dependencies, e.g. language models · CPC title

  • Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title

  • Distributed recognition, e.g. in client-server systems, for mobile phones or network applications · CPC title

  • Execution procedure of a spoken command · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10839795B2 cover?
A user can utter a voice command in an environment where multiple audio playback devices are located to have audio output on a single device, or a predefined group of devices in a synchronized manner. In instances when the voice command uttered by the user does not specify a target for audio output, an implicit target selection algorithm can evaluate one or more criteria to determine an appropr…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/22. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 17, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).