Device grouping for audio based interactivity
US-9431021-B1 · Aug 30, 2016 · US
US11308959B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11308959-B2 |
| Application number | US-202016787993-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 11, 2020 |
| Priority date | Feb 11, 2020 |
| Publication date | Apr 19, 2022 |
| Grant date | Apr 19, 2022 |
Official abstract text for this publication.
Systems and methods are provided for detecting wake words. An electronic device detects an audio signal; identifies two spatial zones as first and second sources of audio associated with the audio signal; processes the audio signal at two wake word detection engines, where each detection engine is associated with a respective spatial zone; determines, based on the processing at the wake word detection engines, whether the audio signal represents a wake word for the electronic device; and in accordance with a determination that the audio signal does represent a wake word, adjusts a wake word detection threshold for at least one of the wake word detection engines.
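The flow in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `ZoneEngine` class, the `process` function, the 0-to-1 score scale, the starting threshold, and the step size are all hypothetical names and values chosen for illustration. Real engines would compute scores from audio; here the scores are passed in directly. The adjustment policy (tightening the zones that did not detect the wake word) is one possible reading of "adjusts a wake word detection threshold for at least one of the wake word detection engines".

```python
from dataclasses import dataclass


@dataclass
class ZoneEngine:
    """Hypothetical stand-in for one wake word detection engine tied to a spatial zone."""
    zone: str
    threshold: float = 0.5  # assumed scale: a score in [0, 1] must reach this value

    def detects(self, score: float) -> bool:
        return score >= self.threshold


def process(engines, scores, step=0.1):
    """Return the zones whose engine detected a wake word, then adjust thresholds.

    `scores` maps zone name -> confidence score. A real engine would derive the
    score from the audio signal; supplying it directly is an assumption.
    """
    hits = [e.zone for e in engines if e.detects(scores[e.zone])]
    if hits:
        for e in engines:
            if e.zone not in hits:
                # Assumed policy: make non-detecting zones stricter, capped at 1.0.
                e.threshold = min(1.0, round(e.threshold + step, 6))
    return hits
```

For example, with engines for two zones and scores of 0.8 and 0.3, only the first zone reports a detection, and the second zone's threshold is raised for subsequent audio.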
Opening claim text (preview).
What is claimed is:
1. A method, comprising: at an electronic voice-controlled speaker device including an audio front end system having a microphone array, one or more processors, and memory storing instructions for execution by the one or more processors: detecting, from the microphone array, an audio signal in an environment proximate to the audio front end system; identifying a first spatial zone of a plurality of spatial zones in the environment as a first source of audio associated with the audio signal; assigning a first wake word detection engine of a plurality of wake word detection engines to the first spatial zone in accordance with the identifying of the first spatial zone as a first source of audio; identifying a second spatial zone of the plurality of spatial zones in the environment as a second source of audio associated with the audio signal, wherein the second spatial zone is different from the first spatial zone; assigning a second wake word detection engine of the plurality of wake word detection engines to the second spatial zone in accordance with the identifying of the second spatial zone as a second source of audio; processing the audio signal at the first wake word detection engine assigned to the first spatial zone; processing the audio signal at the second wake word detection engine assigned to the second spatial zone; determining, based on the processing at the first wake word detection engine and based on the processing at the second wake word detection engine, whether the audio signal represents a wake word for the electronic voice-controlled speaker device; and in accordance with a determination that the audio signal represents a wake word for the electronic voice-controlled speaker device, adjusting a wake word detection threshold for at least one of the first wake word detection engine and the second wake word detection engine.
2. The method of claim 1, wherein: identifying the first spatial zone comprises aligning first component sound waves of the audio signal detected from the first source of audio; identifying the second spatial zone comprises aligning second component sound waves of the audio signal detected from the second source of audio; processing the audio signal at the first wake word detection engine comprises performing a wake word detection process on the aligned first component sound waves; and processing the audio signal at the second wake word detection engine comprises performing a wake word detection process on the aligned second component sound waves.
3. The method of claim 2, wherein: determining whether the audio signal represents a wake word for the electronic voice-controlled speaker device comprises applying a noise cancelation process at the first wake word detection engine based on the processing of the audio signal at the second wake word detection engine.
4. The method of claim 2, wherein: adjusting the wake word detection threshold comprises adjusting a wake word detection threshold associated with the second wake word detection engine based on a determination by the first wake word detection engine that the audio signal represents a wake word for the electronic voice-controlled speaker device.
5. The method of claim 4, wherein: adjusting the wake word detection threshold associated with the second wake word detection engine comprises increasing the wake word detection threshold for the second wake word detection engine on a spectrum of strictness.
6. The method of claim 1, further comprising: identifying a third spatial zone of the plurality of spatial zones in the environment as a third source of audio associated with the audio signal, wherein the third spatial zone is different from the first and second spatial zones; causing a third wake word detection engine to be made available for processing the audio signal; and assigning the third wake word detection engine to the third spatial zone.
7. The method of claim 6, wherein causing the third wake word detection engine to be made available for processing the audio signal comprises: dynamically adjusting how many wake word detection engines are available for processing the audio signal.
8. The method of claim 1, wherein: the wake word detection threshold is used in subsequent processing of audio signals at the first wake word detection engine, and corresponds with a probability that the audio signal represents a wake word.
9. The method of claim 1, wherein adjusting the wake word detection threshold comprises adjusting the wake word detection threshold based on a spatial model of the environment representing locations and probabilities of wake word source zones.
10. The method of claim 9, wherein the spatial model is based on a Bayesian inference analysis using a probability distribution to determine the probability of detecting a valid wake word.
11. The method of claim 1, further comprising: configuring the audio front end system to detect subsequent audio signals from a direction associated with a spatial zone corresponding with the determination that the audio signal represents a wake word for the electronic voice-controlled speaker device; and maintaining the audio front end configuration until the electronic voice-controlled speaker device receives an end of speech feedback signal from a distinct voice service process.
12. An electronic voice-controlled speaker device including an audio front end system having a microphone array, one or more processors, and memory storing one or more programs to be executed by the one or more processors, the one or more programs including instructions for: detecting, from the microphone array, an audio signal in an environment proximate to the audio front end system; identifying a first spatial zone of a plurality of spatial zones in the environment as a first source of audio associated with the audio signal; assigning a first wake word detection engine of a plurality of wake word detection engines to the first spatial zone in accordance with the identifying of the first spatial zone as a first source of audio; identifying a second spatial zone of the plurality of spatial zones in the environment as a second source of audio associated with the audio signal, wherein the second spatial zone is different from the first spatial zone; assigning a second wake word detection engine of the plurality of wake word detection engines to the second spatial zone in accordance with the identifying of the second spatial zone as a second source of audio; processing the audio signal at the first wake word detection engine assigned to the first spatial zone; processing the audio signal at the second wake word detection engine assigned to the second spatial zone; determining, based on the processing at the first wake word detection engine and based on the processing at the second wake word detection engine, whether the audio signal represents a wake word for the electronic voice-controlled speaker device; and in accordance with a determination that the audio signal represents a wake word for the electronic voice-controlled speaker device, adjusting a wake word detection threshold for at least one of the first wake word detection engine and the second wake word detection engine.
13. The electronic voice-controlled speaker device of claim 12, wherein the instructions for: identifying the first spatial zone include instructions for aligning first component sound waves of the audio signal detected from the first source of audio; identifying the second spatial zone include instructions for aligning second component sound waves of the audio signal detected from the second so
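
Claims 9 and 10 tie the threshold to a spatial model built with Bayesian inference, without fixing a particular distribution in the text shown here. As one hedged illustration, the sketch below swaps in a Beta-Bernoulli model per zone: each confirmed or rejected detection updates a Beta posterior, and the posterior mean stands in for "the probability of detecting a valid wake word". The names `ZoneBelief` and `threshold_for`, the uniform prior, and the linear mapping from probability to threshold are assumptions, not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class ZoneBelief:
    """Beta-Bernoulli belief that detections from one zone are valid wake words."""
    alpha: float = 1.0  # assumed prior pseudo-count of valid detections
    beta: float = 1.0   # assumed prior pseudo-count of invalid detections

    def update(self, valid: bool) -> None:
        # Bayesian update: one more observed success or failure for this zone.
        if valid:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def p_valid(self) -> float:
        # Posterior mean of the Beta(alpha, beta) distribution.
        return self.alpha / (self.alpha + self.beta)


def threshold_for(belief, strict=0.9, lenient=0.5):
    """Assumed mapping: zones more likely to hold a speaker get a more lenient threshold."""
    return strict - (strict - lenient) * belief.p_valid
```

Under these assumptions, a zone that has produced three valid detections and one false one ends up with a lower (more lenient) threshold than a zone with no history.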
Two microphones, one receiving mainly the noise signal and the other one mainly the speech signal · CPC title
Noise filtering · CPC title
Word spotting · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
microphones · CPC title