Apparatus, system and method for directing voice input in a controlling device
US-2019019504-A1 · Jan 17, 2019 · US
US10685652B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-10685652-B1 |
| Application number | US-201815928682-A |
| Country | US |
| Kind code | B1 |
| Filing date | Mar 22, 2018 |
| Priority date | Mar 22, 2018 |
| Publication date | Jun 16, 2020 |
| Grant date | Jun 16, 2020 |
Official abstract text for this publication.
This disclosure describes, in part, techniques for determining device groupings, or clusters, for multiple voice-enabled devices. The device clusters may be determined based on metadata for audio signals (or audio data) generated by each of the multiple voice-enabled devices. For example, a remote system may analyze timestamp data for the audio signals received from the devices, and determine that the devices detected the same voice command of a user based on the timestamp data indicating that the audio signals were received within a threshold period of time from each other. Additionally, the remote system may analyze other metadata of the audio data, such as signal-to-noise (SNR) values, and determine that the SNR values are within a threshold value. The remote system may determine device clusters for the voice-enabled devices of a user based on these, and potentially other, types of metadata of the audio signals.
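The pairing logic in the abstract can be illustrated with a minimal Python sketch. It is not the patent's implementation: the `AudioEvent` type, the `associate_devices` function, the device names, and the threshold values (0.5 seconds, 10 dB) are all hypothetical, and the SNR test follows the reading in claim 2 (both values at or above a threshold), which is one possible interpretation of the abstract's wording.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class AudioEvent:
    """Metadata for one audio signal (hypothetical structure)."""
    device_id: str
    received_at: float  # seconds; arrival time at the remote system
    snr_db: float       # signal-to-noise ratio of the audio data


def associate_devices(events, max_gap_s=0.5, min_snr_db=10.0):
    """Return the set of device pairs judged to have heard the same sound.

    Both thresholds are illustrative assumptions, not values from the patent.
    """
    pairs = set()
    for a, b in combinations(events, 2):
        if a.device_id == b.device_id:
            continue
        # Audio data received within a threshold period of time of each other.
        close_in_time = abs(a.received_at - b.received_at) <= max_gap_s
        # Both SNR values at or above a threshold SNR value.
        both_clear = a.snr_db >= min_snr_db and b.snr_db >= min_snr_db
        if close_in_time and both_clear:
            pairs.add(frozenset((a.device_id, b.device_id)))
    return pairs


events = [
    AudioEvent("kitchen", 100.00, 18.0),
    AudioEvent("living_room", 100.20, 14.5),
    AudioEvent("garage", 130.00, 20.0),
]
clusters = associate_devices(events)
```

Here the kitchen and living-room devices are paired because their audio arrived 0.2 seconds apart with adequate SNR, while the garage device, which reported 30 seconds later, is left unassociated.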
Opening claim text (preview).
What is claimed is:
1. A method comprising: receiving, from a first device, first audio data representing first sound; receiving, from a second device, second audio data representing second sound captured by a second microphone of the second device; determining that the first audio data was received within a threshold period of time of when the second audio data was received; based at least in part on the first audio data being received within the threshold period of time of when the second audio data was received, generating an association between the first device and the second device indicating that the first device is located in a same physical environment as the second device; and storing the association indicating that the first device is in the same physical environment as the second device, wherein the association is to be used in future processing of audio data received from only the first device.
2. The method of claim 1, further comprising: identifying a first signal-to-noise (SNR) value associated with the first audio data; identifying a second SNR value associated with the second audio data; determining that the first SNR value is greater than or equal to a threshold SNR value; determining that the second SNR value is greater than or equal to the threshold SNR value; and wherein the generating the association between the first device and the second device is further based at least in part on the first SNR value and the second SNR value being greater than or equal to the threshold SNR value.
3. The method of claim 1, further comprising: identifying a first audio-signal metric associated with the first audio data; identifying a second audio-signal metric associated with the second audio data; determining that the first audio-signal metric is within a threshold amount to the second audio-signal metric; and wherein the generating the association between the first device and the second device is further based at least in part on the first audio-signal metric is within a threshold amount to the second audio-signal metric.
4. The method of claim 1, further comprising: determining a number of instances where audio data was received from the first device within the threshold period of time of when audio data was received from the second device; determining that the number of instances is greater than or equal to a threshold number of instances; and wherein the generating the association between the first device and the second device is further based at least in part on the number of instances being greater than or equal to the threshold number of instances.
5. The method of claim 1, further comprising: determining a number of instances where audio data was received from the first device within the threshold period of time of when audio data was received from the second device; identifying first signal-to-noise (SNR) values associated with the first device, wherein an SNR value of the first SNR values is associated with corresponding audio data received from the first device in the number of instances; identifying second SNR values associated with the second device, wherein an SNR value of the second SNR values is associated with corresponding audio data received from the second device in the number of instances; determining that, for more than a threshold number of the number of the instances, the first SNR values and the second SNR values are greater than or equal to a threshold SNR value; and wherein the generating the association between the first device and the second device is further based at least in part on the determining that, for more than the threshold number of the number of the instances, the first SNR values and the second SNR values are greater than or equal to the threshold SNR value.
6. The method of claim 1, further comprising, prior to the generating the association: storing, in memory of a network-based computing device, an initial association between the first device, the second device, and a third device; determining that third audio data was not received from the third device within the threshold period of time from when at least one of the first audio data or the second audio data was received; and based at least in part on the third audio data not being received from the third device within the threshold period of time from when at least one of the first audio data or the second audio data was received, removing the initial association from the memory of the network-based computing device.
7. The method of claim 1, further comprising: storing, in memory of one or more network-based computing devices, the association between the first device and the second device; determining metadata for the association between the first device and the second device, the metadata indicating at least one of: a device name assigned to the first device; an action previously performed by the first device; or an identity of a user that issued a voice command represented by the first audio data; and storing the metadata in the memory of the one or more network-based computing devices.
8. A system comprising: one or more processors; and computer-readable media storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising: receiving, from a first device, first audio data representing first sound captured by a first microphone of the first device; receiving, from a second device, second audio data representing second sound captured by a second microphone of the second device; determining that the first audio data was received within a threshold period of time of when the second audio data was received; based at least in part on the first audio data being received within the threshold period of time of when the second audio data was received, generating an association between the first device and the second device; receiving, from the first device, third audio data representing a speech utterance captured by the first microphone of the first device; determining intent data representing the speech utterance; determining, based at least in part on the association and the intent data, a command to cause the second device to perform an action; and sending, to the second device, command data indicating the command.
9. The system of claim 8, the operations further comprising: determining a first signal-to-noise (SNR) value associated with the first audio data; determining a second SNR value associated with the second audio data; determining that the first SNR value is greater than or equal to a threshold SNR value; and determining that the second SNR value is greater than or equal to the threshold SNR value, wherein generating the association between the first device is further based at least in part on the first SNR value and second SNR value being greater than or equal to the threshold SNR value.
10. The system of claim 8, the operations further comprising: identifying a first audio-signal metric associated with the first audio data; identifying a second audio-signal metric associated with the second audio data; and determining that the first audio-signal metric is within a threshold amount to the second audio-signal metric, wherein the generating the association between the first device and the second device is further based at least in part on the first audio-signal metric is within a threshold amount to the second audio-signal metric.
11. The system of claim 8, the operations further comprising: determining a number of instances where audio data was received from the first device within the threshold period of tim
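Claim 4 conditions the association on a count of repeated co-occurrences, and claim 8 later uses the stored association to pick a target device. The Python sketch below illustrates only that bookkeeping under stated assumptions: the `AssociationStore` class, its method names, the device names, and the threshold of three instances are hypothetical and do not come from the patent.

```python
from collections import Counter


class AssociationStore:
    """Counts co-occurrences and stores associations (hypothetical helper)."""

    def __init__(self, min_instances=3):
        # Threshold number of instances; the value 3 is an assumption.
        self.min_instances = min_instances
        self.counts = Counter()
        self.associations = set()

    def record_co_occurrence(self, device_a, device_b):
        """Record one instance of two devices reporting audio close in time."""
        pair = frozenset((device_a, device_b))
        self.counts[pair] += 1
        # Generate the association only once the count reaches the threshold.
        if self.counts[pair] >= self.min_instances:
            self.associations.add(pair)

    def peers_of(self, device_id):
        """Return devices associated with device_id, sorted by name."""
        return sorted(
            other
            for pair in self.associations
            if device_id in pair
            for other in pair
            if other != device_id
        )


store = AssociationStore(min_instances=3)
for _ in range(3):
    store.record_co_occurrence("kitchen", "living_room")
store.record_co_occurrence("kitchen", "garage")  # a single instance only

peers = store.peers_of("kitchen")
```

A command router in the spirit of claim 8 could consult `peers_of` for the device that captured the utterance and choose a target among the returned peers; how the intent data narrows that choice is not sketched here.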
Speaker identification or verification techniques · CPC title
for comparison or discrimination · CPC title
the extracted parameters being power information · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title