Voice Detection By Multiple Devices

US2024029731A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2024029731-A1
Application number: US-202318323726-A
Country: US
Kind code: A1
Filing date: May 25, 2023
Priority date: Jul 15, 2016
Publication date: Jan 25, 2024
Grant date: (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed herein are example techniques for voice detection by multiple NMDs. An example implementation may involve one or more servers receiving, via a network interface, data representing multiple audio recordings of a voice input spoken by a given user, each audio recording recorded by a respective NMD of the multiple NMDs, wherein the voice input comprises a detected wake-word. Based on respective sound pressure levels of the multiple audio recordings of the voice input, the servers (i) select a particular NMD of the multiple NMDs and (ii) forego selection of other NMDs of the multiple NMDs. The servers send, via the network interface to the particular NMD, data representing a playback command that corresponds to a voice command in the voice input represented in the multiple audio recordings, wherein the data representing the playback command causes the particular NMD to play back audio content according to the playback command.
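The selection step in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes that the NMD whose recording has the highest level is the one selected (the abstract says only that selection is "based on" sound pressure levels), and it uses RMS level in dB relative to full scale as a stand-in for sound pressure level. The names `level_db` and `select_nmd` and the NMD identifiers are hypothetical.

```python
import math


def level_db(samples):
    """RMS level in dB relative to full scale; an assumed stand-in for SPL."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def select_nmd(recordings):
    """Pick one NMD and forgo the rest.

    recordings: dict mapping a (hypothetical) NMD id to a list of float
    samples in [-1.0, 1.0], all capturing the same voice input.
    Returns (selected_id, list_of_foregone_ids).
    Assumption: the loudest recording wins.
    """
    if not recordings:
        raise ValueError("at least one recording is required")
    selected = max(recordings, key=lambda nmd: level_db(recordings[nmd]))
    foregone = [nmd for nmd in recordings if nmd != selected]
    return selected, foregone


if __name__ == "__main__":
    recordings = {
        "kitchen": [0.5, -0.5, 0.5, -0.5],
        "living_room": [0.1, -0.1, 0.1, -0.1],
    }
    selected, foregone = select_nmd(recordings)
    # A server following the abstract would then send the playback
    # command only to the selected NMD.
    print(selected, foregone)
```

In this sketch the server would send the playback command only to the device returned as `selected`; the devices in `foregone` receive nothing.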

First claim

Opening claim text (preview).

1. A system comprising: a first microphone array; a second microphone array; at least one processor; and at least one non-transitory computer-readable medium comprising program instructions that are executable by the at least one processor such that the system is configured to: receive first voice data captured via the first microphone array, the first voice data representing a first portion of a voice input; receive second voice data captured via the second microphone array, the second voice data representing the first portion of the voice input; based on (i) one or more characteristics of the first voice data and (ii) one or more characteristics of the second voice data, select the first voice data from among (a) the first voice data and (b) the second voice data; receive third voice data captured via the first microphone array, the third voice data representing a second portion of the voice input; receive fourth voice data captured via the second microphone array, the fourth voice data representing the second portion of the voice input; based on (i) one or more characteristics of the third voice data and (ii) one or more characteristics of the fourth voice data, select the fourth voice data from among (a) the third voice data and (b) the fourth voice data; and send the selected first voice data and the selected fourth voice data to a voice assistant for processing of the voice input.

2. The system of claim 1, further comprising a processing network microphone device (NMD), and wherein the processing NMD comprises the at least one processor and the at least one non-transitory computer-readable medium.

3. The system of claim 2, wherein the at least one non-transitory computer-readable medium further comprises program instructions that are executable by the at least one processor such that the system is configured to: determine, via the voice assistant, one or more commands represented in the voice input; and cause one or more devices to carry out the determined one or more commands.

4. The system of claim 1, wherein the program instructions that are executable by the at least one processor such that the system is configured to send the selected first voice data and the selected fourth voice data to the voice assistant for processing of the voice input comprise program instructions that are executable by the at least one processor such that the system is configured to: send, via a network interface to at least one server comprising the voice assistant, data representing the selected first voice data and the selected fourth voice data.

5. The system of claim 1, wherein the program instructions that are executable by the at least one processor such that the system is configured to receive the second voice data captured via the second microphone array comprise program instructions that are executable by the at least one processor such that the system is configured to: receive, via an 802.15-compatible wireless network interface, the second voice data.

6. The system of claim 1, wherein the one or more characteristics of the first voice data comprise sound pressure levels of the first portion of the voice input as detected by the first microphone array, wherein the one or more characteristics of the second voice data comprise sound pressure levels of the first portion of the voice input as detected by the second microphone array, and wherein the program instructions that are executable by the at least one processor such that the system is configured to select the first voice data comprise program instructions that are executable by the at least one processor such that the system is configured to: determine that the sound pressure levels of the first portion of the voice input as detected by the first microphone array are greater than then the sound pressure levels of the first portion of the voice input as detected by the second microphone array.

7. The system of claim 1, wherein the first portion of the voice input and the second portion of the voice input at least partially overlap.

8. The system of claim 1, wherein a wearable playback device comprises the first microphone array and the second microphone array.

9. The system of claim 1, wherein a first wearable playback device comprises the first microphone array and a second wearable playback device comprises the second microphone array.

10. At least one non-transitory computer-readable medium comprising program instructions that are executable by at least one processor such that a device is configured to: receive first voice data captured via a first microphone array of a playback device, the first voice data representing a first portion of a voice input; receive second voice data captured via a second microphone array of the playback device, the second voice data representing the first portion of the voice input; based on (i) one or more characteristics of the first voice data and (ii) one or more characteristics of the second voice data, select the first voice data from among (a) the first voice data and (b) the second voice data; receive third voice data captured via the first microphone array, the third voice data representing a second portion of the voice input; receive fourth voice data captured via the second microphone array, the fourth voice data representing the second portion of the voice input; based on (i) one or more characteristics of the third voice data and (ii) one or more characteristics of the fourth voice data, select the fourth voice data from among (a) the third voice data and (b) the fourth voice data; and send the selected first voice data and the selected fourth voice data to a voice assistant for processing of the voice input.

11. The at least one non-transitory computer-readable medium of claim 10, wherein the at least one non-transitory computer-readable medium further comprises program instructions that are executable by the at least one processor such that the device is configured to: determine, via the voice assistant, one or more commands represented in the voice input; and cause one or more devices to carry out the determined one or more commands.

12. The at least one non-transitory computer-readable medium of claim 10, wherein the program instructions that are executable by the at least one processor such that the device is configured to send the selected first voice data and the selected fourth voice data to the voice assistant for processing of the voice input comprise program instructions that are executable by the at least one processor such that the device is configured to: send, via a network interface to at least one server comprising the voice assistant, data representing the selected first voice data and the selected fourth voice data.

13. The at least one non-transitory computer-readable medium of claim 10, wherein the program instructions that are executable by the at least one processor such that the device is configured to receive the second voice data captured via the second microphone array comprise program instructions that are executable by the at least one processor such that the device is configured to: receive, via an 802.15-compatible wireless network interface, the second voice data.

14. The at least one non-transitory computer-readable medium of claim 10, wherein the one or more characteristics of the first voice data comprise sound pressure levels of the first portion of the voice input as detected by the first microphone array, wherein the one or more characteristics of the second voice data comprise sound pressure levels of the first portion of the voice input
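The per-portion selection recited in claim 1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed system: the claim leaves the "one or more characteristics" open, so mean energy is used here as a hypothetical choice (claim 6 names sound pressure level as one option), ties are assumed to go to the first array, and the function and variable names are invented for the example.

```python
from typing import Callable, List, Sequence, Tuple

Samples = Sequence[float]


def mean_energy(samples: Samples) -> float:
    """Hypothetical characteristic: mean squared amplitude of a portion."""
    return sum(s * s for s in samples) / len(samples) if samples else 0.0


def select_per_portion(
    first_array: Sequence[Samples],
    second_array: Sequence[Samples],
    characteristic: Callable[[Samples], float] = mean_energy,
) -> List[Tuple[str, Samples]]:
    """For each portion of the voice input, keep the data from one array.

    first_array[i] and second_array[i] are the two captures of portion i.
    Returns a list of (source, samples) pairs, one per portion, that a
    caller could then forward to a voice assistant.
    """
    if len(first_array) != len(second_array):
        raise ValueError("both arrays must cover the same portions")
    selected = []
    for first, second in zip(first_array, second_array):
        # Assumption: the higher characteristic wins; ties go to "first".
        if characteristic(first) >= characteristic(second):
            selected.append(("first", first))
        else:
            selected.append(("second", second))
    return selected


if __name__ == "__main__":
    # Portion 1 is louder on the first array, portion 2 on the second,
    # mirroring the "first voice data" / "fourth voice data" outcome.
    first = [[0.8, -0.8], [0.1, -0.1]]
    second = [[0.2, -0.2], [0.6, -0.6]]
    for source, samples in select_per_portion(first, second):
        print(source, samples)
```

Passing the characteristic as a parameter keeps the sketch neutral about which signal property drives the choice, which matches the breadth of the independent claims.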

Assignees

Sonos Inc

Inventors

Classifications

  • G10L15/22 · Primary

    Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech (G10L21/02 takes precedence) · CPC title

  • Feature extraction for speech recognition; Selection of recognition unit · CPC title

  • Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing · CPC title

  • Execution procedure of a spoken command · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2024029731A1 cover?
It covers techniques for voice detection by multiple network microphone devices (NMDs). In the example implementation described in the abstract, one or more servers receive multiple audio recordings of the same voice input, which includes a detected wake-word, each recorded by a different NMD. Based on the recordings' sound pressure levels, the servers select one NMD and send it a playback command corresponding to the voice command in the input.
Who is the assignee on this patent?
Sonos Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/22. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 25, 2024 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).