Voice detection by multiple devices
US-11664023-B2 · May 30, 2023 · US
US2024029731A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2024029731-A1 |
| Application number | US-202318323726-A |
| Country | US |
| Kind code | A1 |
| Filing date | May 25, 2023 |
| Priority date | Jul 15, 2016 |
| Publication date | Jan 25, 2024 |
| Grant date | — |
Official abstract text for this publication.
Disclosed herein are example techniques for voice detection by multiple NMDs. An example implementation may involve one or more servers receiving, via a network interface, data representing multiple audio recordings of a voice input spoken by a given user, each audio recording recorded by a respective NMD of the multiple NMDs, wherein the voice input comprises a detected wake-word. Based on respective sound pressure levels of the multiple audio recordings of the voice input, the servers (i) select a particular NMD of the multiple NMDs and (ii) forego selection of other NMDs of the multiple NMDs. The servers send, via the network interface to the particular NMD, data representing a playback command that corresponds to a voice command in the voice input represented in the multiple audio recordings, wherein the data representing the playback command causes the particular NMD to play back audio content according to the playback command.
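The arbitration step in the abstract (select the NMD whose recording of the voice input is loudest and forego the others) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation. Each recording is assumed to arrive as normalized PCM samples. The level used is an uncalibrated RMS level in dBFS, standing in for sound pressure level, because true SPL would require per-device microphone calibration. The names `Recording`, `level_dbfs`, and `arbitrate` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Recording:
    nmd_id: str                # hypothetical identifier of the NMD that made the recording
    samples: Sequence[float]   # normalized PCM samples in [-1.0, 1.0]


def level_dbfs(samples: Sequence[float]) -> float:
    """Uncalibrated RMS level in dBFS, used as a stand-in for sound pressure level."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def arbitrate(recordings: List[Recording]) -> Tuple[str, List[str]]:
    """Select the NMD with the loudest recording; every other NMD is foregone."""
    if not recordings:
        raise ValueError("no recordings to arbitrate")
    levels = {r.nmd_id: level_dbfs(r.samples) for r in recordings}
    selected = max(levels, key=levels.get)  # ties resolve to the first NMD listed
    foregone = [nmd for nmd in levels if nmd != selected]
    return selected, foregone


recs = [
    Recording("kitchen", [0.02, -0.03, 0.025, -0.02]),
    Recording("living_room", [0.2, -0.25, 0.22, -0.18]),
]
selected, foregone = arbitrate(recs)
print(selected, foregone)  # living_room ['kitchen']
```

In this sketch, only the NMD returned as `selected` would then receive the playback command. Breaking ties by list order is an arbitrary choice made here, not something the abstract specifies.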
Opening claim text (preview).
1. A system comprising: a first microphone array; a second microphone array; at least one processor; and at least one non-transitory computer-readable medium comprising program instructions that are executable by the at least one processor such that the system is configured to: receive first voice data captured via the first microphone array, the first voice data representing a first portion of a voice input; receive second voice data captured via the second microphone array, the second voice data representing the first portion of the voice input; based on (i) one or more characteristics of the first voice data and (ii) one or more characteristics of the second voice data, select the first voice data from among (a) the first voice data and (b) the second voice data; receive third voice data captured via the first microphone array, the third voice data representing a second portion of the voice input; receive fourth voice data captured via the second microphone array, the fourth voice data representing the second portion of the voice input; based on (i) one or more characteristics of the third voice data and (ii) one or more characteristics of the fourth voice data, select the fourth voice data from among (a) the third voice data and (b) the fourth voice data; and send the selected first voice data and the selected fourth voice data to a voice assistant for processing of the voice input.
2. The system of claim 1, further comprising a processing network microphone device (NMD), and wherein the processing NMD comprises the at least one processor and the at least one non-transitory computer-readable medium.
3. The system of claim 2, wherein the at least one non-transitory computer-readable medium further comprises program instructions that are executable by the at least one processor such that the system is configured to: determine, via the voice assistant, one or more commands represented in the voice input; and cause one or more devices to carry out the determined one or more commands.
4. The system of claim 1, wherein the program instructions that are executable by the at least one processor such that the system is configured to send the selected first voice data and the selected fourth voice data to the voice assistant for processing of the voice input comprise program instructions that are executable by the at least one processor such that the system is configured to: send, via a network interface to at least one server comprising the voice assistant, data representing the selected first voice data and the selected fourth voice data.
5. The system of claim 1, wherein the program instructions that are executable by the at least one processor such that the system is configured to receive the second voice data captured via the second microphone array comprise program instructions that are executable by the at least one processor such that the system is configured to: receive, via an 802.15-compatible wireless network interface, the second voice data.
6. The system of claim 1, wherein the one or more characteristics of the first voice data comprise sound pressure levels of the first portion of the voice input as detected by the first microphone array, wherein the one or more characteristics of the second voice data comprise sound pressure levels of the first portion of the voice input as detected by the second microphone array, and wherein the program instructions that are executable by the at least one processor such that the system is configured to select the first voice data comprise program instructions that are executable by the at least one processor such that the system is configured to: determine that the sound pressure levels of the first portion of the voice input as detected by the first microphone array are greater than the sound pressure levels of the first portion of the voice input as detected by the second microphone array.
7. The system of claim 1, wherein the first portion of the voice input and the second portion of the voice input at least partially overlap.
8. The system of claim 1, wherein a wearable playback device comprises the first microphone array and the second microphone array.
9. The system of claim 1, wherein a first wearable playback device comprises the first microphone array and a second wearable playback device comprises the second microphone array.
10. At least one non-transitory computer-readable medium comprising program instructions that are executable by at least one processor such that a device is configured to: receive first voice data captured via a first microphone array of a playback device, the first voice data representing a first portion of a voice input; receive second voice data captured via a second microphone array of the playback device, the second voice data representing the first portion of the voice input; based on (i) one or more characteristics of the first voice data and (ii) one or more characteristics of the second voice data, select the first voice data from among (a) the first voice data and (b) the second voice data; receive third voice data captured via the first microphone array, the third voice data representing a second portion of the voice input; receive fourth voice data captured via the second microphone array, the fourth voice data representing the second portion of the voice input; based on (i) one or more characteristics of the third voice data and (ii) one or more characteristics of the fourth voice data, select the fourth voice data from among (a) the third voice data and (b) the fourth voice data; and send the selected first voice data and the selected fourth voice data to a voice assistant for processing of the voice input.
11. The at least one non-transitory computer-readable medium of claim 10, wherein the at least one non-transitory computer-readable medium further comprises program instructions that are executable by the at least one processor such that the device is configured to: determine, via the voice assistant, one or more commands represented in the voice input; and cause one or more devices to carry out the determined one or more commands.
12. The at least one non-transitory computer-readable medium of claim 10, wherein the program instructions that are executable by the at least one processor such that the device is configured to send the selected first voice data and the selected fourth voice data to the voice assistant for processing of the voice input comprise program instructions that are executable by the at least one processor such that the device is configured to: send, via a network interface to at least one server comprising the voice assistant, data representing the selected first voice data and the selected fourth voice data.
13. The at least one non-transitory computer-readable medium of claim 10, wherein the program instructions that are executable by the at least one processor such that the device is configured to receive the second voice data captured via the second microphone array comprise program instructions that are executable by the at least one processor such that the device is configured to: receive, via an 802.15-compatible wireless network interface, the second voice data.
14. The at least one non-transitory computer-readable medium of claim 10, wherein the one or more characteristics of the first voice data comprise sound pressure levels of the first portion of the voice input as detected by the first microphone array, wherein the one or more characteristics of the second voice data comprise sound pressure levels of the first portion of the voice input …
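Claims 1 and 6 differ from the abstract because selection happens per portion of the voice input. The first portion may be taken from one microphone array and the second portion from the other. Below is a minimal Python sketch of that idea. It assumes the two arrays deliver time-aligned, non-overlapping portions; claim 7 also covers overlapping portions, which this sketch does not handle. It uses RMS level in dBFS as the comparison characteristic. `select_per_portion`, `assemble_payload`, and the source labels "A"/"B" are hypothetical, and the actual transfer to a voice assistant is not shown.

```python
import math
from typing import List, Sequence, Tuple

Portion = Sequence[float]  # one portion of the voice input as normalized PCM samples


def rms_dbfs(portion: Portion) -> float:
    """Uncalibrated RMS level in dBFS, used here as a stand-in for sound pressure level."""
    if not portion:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in portion) / len(portion))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def select_per_portion(
    array_a: List[Portion], array_b: List[Portion]
) -> List[Tuple[str, Portion]]:
    """For each portion captured by both arrays, keep the louder copy (ties go to A)."""
    if len(array_a) != len(array_b):
        raise ValueError("both arrays must capture the same portions")
    picks: List[Tuple[str, Portion]] = []
    for a, b in zip(array_a, array_b):
        if rms_dbfs(a) >= rms_dbfs(b):
            picks.append(("A", a))
        else:
            picks.append(("B", b))
    return picks


def assemble_payload(picks: List[Tuple[str, Portion]]) -> List[float]:
    """Concatenate the selected portions in order into one stream for the voice assistant."""
    return [sample for _, portion in picks for sample in portion]


# Array A captured the first and third voice data; array B the second and fourth.
mic_a = [[0.4, -0.4], [0.05, -0.05]]
mic_b = [[0.1, -0.1], [0.3, -0.3]]
picks = select_per_portion(mic_a, mic_b)
print([source for source, _ in picks])  # ['A', 'B']
print(assemble_payload(picks))          # [0.4, -0.4, 0.3, -0.3]
```

With the sample data, the first portion is louder on array A and the second on array B. This mirrors the selection of the first and fourth voice data in claim 1.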
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech (G10L21/02 takes precedence) · CPC title
Feature extraction for speech recognition; Selection of recognition unit · CPC title
Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing · CPC title
Execution procedure of a spoken command · CPC title