Dynamic adjustment of wake word acceptance tolerance thresholds in voice-controlled devices

US11308959B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11308959-B2
Application number: US-202016787993-A
Country: US
Kind code: B2
Filing date: Feb 11, 2020
Priority date: Feb 11, 2020
Publication date: Apr 19, 2022
Grant date: Apr 19, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods are provided for detecting wake words. An electronic device detects an audio signal; identifies two spatial zones as first and second sources of audio associated with the audio signal; processes the audio signal at two wake word detection engines, where each detection engine is associated with a respective spatial zone; determines, based on the processing at the wake word detection engines, whether the audio signal represents a wake word for the electronic device; and in accordance with a determination that the audio signal does represent a wake word, adjusts a wake word detection threshold for at least one of the wake word detection engines.
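The flow in the abstract can be illustrated with a small sketch. It is not the patented implementation: the class `WakeWordEngine`, the function `process`, the zone names, the numeric scores, and the fixed `step` are all hypothetical, and the scores are assumed to come from an upstream per-zone model. The adjustment direction (making the non-detecting zone stricter) follows claims 4 and 5; the abstract itself only says that a threshold is adjusted.

```python
from dataclasses import dataclass


@dataclass
class WakeWordEngine:
    """Hypothetical per-zone detector; `threshold` is the minimum score to accept."""
    zone: str
    threshold: float = 0.5

    def detects(self, score: float) -> bool:
        return score >= self.threshold


def process(engines, scores, step=0.1):
    """Run each zone's engine on its score, then adjust thresholds on a detection.

    scores maps zone name -> wake word score in [0, 1] (assumed input).
    Returns the zones whose engines accepted the audio as a wake word.
    """
    # Decide first, using the thresholds as they were before any adjustment.
    hits = [e.zone for e in engines if e.detects(scores[e.zone])]
    if hits:
        for e in engines:
            if e.zone not in hits:
                # Assumption: zones that did not detect become stricter.
                e.threshold = min(1.0, e.threshold + step)
    return hits


engines = [WakeWordEngine("zone_a"), WakeWordEngine("zone_b")]
hits = process(engines, {"zone_a": 0.9, "zone_b": 0.2})
```

After this call, `hits` contains only `"zone_a"`, and the engine for `"zone_b"` carries a higher threshold for subsequent audio.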

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: at an electronic voice-controlled speaker device including an audio front end system having a microphone array, one or more processors, and memory storing instructions for execution by the one or more processors: detecting, from the microphone array, an audio signal in an environment proximate to the audio front end system; identifying a first spatial zone of a plurality of spatial zones in the environment as a first source of audio associated with the audio signal; assigning a first wake word detection engine of a plurality of wake word detection engines to the first spatial zone in accordance with the identifying of the first spatial zone as a first source of audio; identifying a second spatial zone of the plurality of spatial zones in the environment as a second source of audio associated with the audio signal, wherein the second spatial zone is different from the first spatial zone; assigning a second wake word detection engine of the plurality of wake word detection engines to the second spatial zone in accordance with the identifying of the second spatial zone as a second source of audio; processing the audio signal at the first wake word detection engine assigned to the first spatial zone; processing the audio signal at the second wake word detection engine assigned to the second spatial zone; determining, based on the processing at the first wake word detection engine and based on the processing at the second wake word detection engine, whether the audio signal represents a wake word for the electronic voice-controlled speaker device; and in accordance with a determination that the audio signal represents a wake word for the electronic voice-controlled speaker device, adjusting a wake word detection threshold for at least one of the first wake word detection engine and the second wake word detection engine.

2. The method of claim 1, wherein: identifying the first spatial zone comprises aligning first component sound waves of the audio signal detected from the first source of audio; identifying the second spatial zone comprises aligning second component sound waves of the audio signal detected from the second source of audio; processing the audio signal at the first wake word detection engine comprises performing a wake word detection process on the aligned first component sound waves; and processing the audio signal at the second wake word detection engine comprises performing a wake word detection process on the aligned second component sound waves.

3. The method of claim 2, wherein: determining whether the audio signal represents a wake word for the electronic voice-controlled speaker device comprises applying a noise cancelation process at the first wake word detection engine based on the processing of the audio signal at the second wake word detection engine.

4. The method of claim 2, wherein: adjusting the wake word detection threshold comprises adjusting a wake word detection threshold associated with the second wake word detection engine based on a determination by the first wake word detection engine that the audio signal represents a wake word for the electronic voice-controlled speaker device.

5. The method of claim 4, wherein: adjusting the wake word detection threshold associated with the second wake word detection engine comprises increasing the wake word detection threshold for the second wake word detection engine on a spectrum of strictness.

6. The method of claim 1, further comprising: identifying a third spatial zone of the plurality of spatial zones in the environment as a third source of audio associated with the audio signal, wherein the third spatial zone is different from the first and second spatial zones; causing a third wake word detection engine to be made available for processing the audio signal; and assigning the third wake word detection engine to the third spatial zone.

7. The method of claim 6, wherein causing the third wake word detection engine to be made available for processing the audio signal comprises: dynamically adjusting how many wake word detection engines are available for processing the audio signal.

8. The method of claim 1, wherein: the wake word detection threshold is used in subsequent processing of audio signals at the first wake word detection engine, and corresponds with a probability that the audio signal represents a wake word.

9. The method of claim 1, wherein adjusting the wake word detection threshold comprises adjusting the wake word detection threshold based on a spatial model of the environment representing locations and probabilities of wake word source zones.

10. The method of claim 9, wherein the spatial model is based on a Bayesian inference analysis using a probability distribution to determine the probability of detecting a valid wake word.

11. The method of claim 1, further comprising: configuring the audio front end system to detect subsequent audio signals from a direction associated with a spatial zone corresponding with the determination that the audio signal represents a wake word for the electronic voice-controlled speaker device; and maintaining the audio front end configuration until the electronic voice-controlled speaker device receives an end of speech feedback signal from a distinct voice service process.

12. An electronic voice-controlled speaker device including an audio front end system having a microphone array, one or more processors, and memory storing one or more programs to be executed by the one or more processors, the one or more programs including instructions for: detecting, from the microphone array, an audio signal in an environment proximate to the audio front end system; identifying a first spatial zone of a plurality of spatial zones in the environment as a first source of audio associated with the audio signal; assigning a first wake word detection engine of a plurality of wake word detection engines to the first spatial zone in accordance with the identifying of the first spatial zone as a first source of audio; identifying a second spatial zone of the plurality of spatial zones in the environment as a second source of audio associated with the audio signal, wherein the second spatial zone is different from the first spatial zone; assigning a second wake word detection engine of the plurality of wake word detection engines to the second spatial zone in accordance with the identifying of the second spatial zone as a second source of audio; processing the audio signal at the first wake word detection engine assigned to the first spatial zone; processing the audio signal at the second wake word detection engine assigned to the second spatial zone; determining, based on the processing at the first wake word detection engine and based on the processing at the second wake word detection engine, whether the audio signal represents a wake word for the electronic voice-controlled speaker device; and in accordance with a determination that the audio signal represents a wake word for the electronic voice-controlled speaker device, adjusting a wake word detection threshold for at least one of the first wake word detection engine and the second wake word detection engine.

13. The electronic voice-controlled speaker device of claim 12, wherein the instructions for: identifying the first spatial zone include instructions for aligning first component sound waves of the audio signal detected from the first source of audio; identifying the second spatial zone include instructions for aligning second component sound waves of the audio signal detected from the second so
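Claims 9 and 10 tie the threshold to a spatial model built with Bayesian inference over a probability distribution, without naming the distribution. As one possible illustration only, the sketch below assumes a Beta-Bernoulli model per zone and a linear mapping from the posterior mean to a threshold. The names `ZoneBelief` and `threshold_for`, the uniform prior, and the `strict`/`lenient` bounds are hypothetical choices, not details taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class ZoneBelief:
    """Beta(alpha, beta) belief that audio from one zone is a valid wake word."""
    alpha: float = 1.0  # prior pseudo-count of valid wake words (assumed uniform prior)
    beta: float = 1.0   # prior pseudo-count of rejected candidates

    def update(self, valid: bool) -> None:
        # Conjugate update: each observation adds one count to the matching side.
        if valid:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def probability(self) -> float:
        # Posterior mean of the Beta distribution.
        return self.alpha / (self.alpha + self.beta)


def threshold_for(belief: ZoneBelief, strict: float = 0.9, lenient: float = 0.3) -> float:
    """Interpolate linearly: likely zones approach `lenient`, unlikely zones approach `strict`."""
    return strict - (strict - lenient) * belief.probability


couch = ZoneBelief()
for _ in range(3):
    couch.update(valid=True)   # three confirmed wake words from this zone
tv = ZoneBelief()
tv.update(valid=False)         # one rejected candidate from this zone
```

In this sketch a zone that has produced confirmed wake words (here `couch`) ends up with a lower threshold than a zone that has only produced rejections (here `tv`), which is one way to read "locations and probabilities of wake word source zones".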

Assignees

Inventors

Classifications

  • Two microphones, one receiving mainly the noise signal and the other one mainly the speech signal · CPC title

  • Noise filtering · CPC title

  • Word spotting · CPC title

  • G10L15/22 (Primary)

    Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • microphones · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11308959B2 cover?
Systems and methods are provided for detecting wake words. An electronic device detects an audio signal; identifies two spatial zones as first and second sources of audio associated with the audio signal; processes the audio signal at two wake word detection engines, where each detection engine is associated with a respective spatial zone; determines, based on the processing at the wake word de…
Who is the assignee on this patent?
Spotify AB
What technology area does this patent fall under?
Primary CPC classification G10L15/22. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 19, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).