On-device custom wake word detection

US11798535B2 · US · B2

Patent metadata
Publication number: US-11798535-B2
Application number: US-202117474829-A
Country: US
Kind code: B2
Filing date: Sep 14, 2021
Priority date: May 5, 2019
Publication date: Oct 24, 2023
Grant date: Oct 24, 2023

Abstract

Official abstract text for this publication.

Generally discussed herein are devices, systems, and methods for on-device detection of a wake word. A device can include a memory including model parameters that define a custom wake word detection model, the wake word detection model including a recurrent neural network transducer (RNNT) and a lookup table (LUT), the LUT indicating a hidden vector to be provided in response to a phoneme of a user-specified wake word, a microphone to capture audio, and processing circuitry to receive the audio from the microphone, determine, using the wake word detection model, whether the audio includes an utterance of the user-specified wake word, and wake up a personal assistant after determining the audio includes the utterance of the user-specified wake word.
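The lookup table described in the abstract can be illustrated with a minimal Python sketch. It assumes the LUT is keyed by phoneme and filled once, when the user picks a wake word, so that decoding only reads the table. The names `toy_prediction_network`, `build_lut`, and `hidden_vector` are hypothetical, and the toy function merely stands in for a trained prediction network; none of this is taken from the patent's implementation.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def toy_prediction_network(phoneme_id: int, dim: int = 4) -> Vector:
    """Hypothetical stand-in for a trained prediction network (not the patent's model)."""
    return [(((phoneme_id + 1) * (i + 1)) % 7) / 7.0 for i in range(dim)]


def build_lut(
    wake_word_phonemes: Sequence[str],
    phoneme_ids: Dict[str, int],
    predict: Callable[[int], Vector],
) -> Dict[str, Vector]:
    """Run the prediction network once per distinct phoneme and cache the result."""
    # dict.fromkeys removes repeated phonemes while keeping their first-seen order.
    return {p: predict(phoneme_ids[p]) for p in dict.fromkeys(wake_word_phonemes)}


def hidden_vector(lut: Dict[str, Vector], phoneme: str) -> Vector:
    """Decode-time lookup: a table read, with no network evaluation."""
    return lut[phoneme]


if __name__ == "__main__":
    # Assumed phoneme inventory and wake word phoneme sequence, for illustration only.
    ids = {"HH": 0, "EY": 1, "K": 2}
    lut = build_lut(["HH", "EY", "HH"], ids, toy_prediction_network)
    print(sorted(lut))
    print(hidden_vector(lut, "EY"))
```

Under this assumption, the cost of the prediction network is paid once per wake word rather than once per audio frame, which is one plausible reason to precompute the vectors on a constrained device.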

First claim

Opening claim text (preview).

What is claimed is:

1. A device comprising: memory including model parameters that define a custom wake word detection model, the wake word detection model including a recurrent neural network transducer (RNNT) and a lookup table (LUT), the LUT including pre-computed hidden vectors produced by a prediction network responsive to the user-specified wake word; a microphone to capture audio; processing circuitry to: receive the audio from the microphone; determine, using the wake word detection model and including using the LUT to decode for the user-specified wake word, whether the audio includes an utterance of the user-specified wake word; and wake up a personal assistant after determining the audio includes the utterance of the user-specified wake word.

2. The device of claim 1, wherein the wake word detection model is trained using standard phonemes and whole word phonemes.

3. The device of claim 1, wherein the processing circuitry is further to reset the wake word detection model to erase a history of processed audio.

4. The device of claim 3, wherein the reset occurs in response to determining one of the wake word was uttered and a specified period of time has elapsed.

5. The device of claim 1, wherein the wake word detection model is compressed using single value decomposition (SVD).

6. The device of claim 5, wherein the wake word detection model includes weights quantized to 8-bit or 16-bit values.

7. The device of claim 1, wherein the processing circuitry is further to: receive the wake word from a user; provide the wake word to a wake word model engine; and receive from the wake word model engine, a wake word graph of the wake word indicating a phoneme sequence of the wake word and alternate pronunciations of the wake word; wherein the wake word graph is part of the wake word detection model.

8. The device of claim 1, wherein the processing circuitry is to: receive the wake word from a user; provide the wake word to a wake word model engine; and receive from the wake word model engine, a wake word graph of the wake word indicating a phoneme sequence of the wake word and alternate pronunciations of the wake word and a background language model with unigrams and bi-grams of the wake word removed therefrom; wherein the wake word graph and the background language model are part of the wake word detection model.

9. A method of on-device custom wake word detection comprising: receiving audio from a microphone of a device; determining, using a wake word detection model, whether the audio includes an utterance of a user-specified wake word, the wake word detection model including a recurrent neural network transducer (RNNT) and a lookup table (LUT), the LUT including pre-computed hidden vectors produced by a prediction network responsive to a user-specified wake word; and waking up a personal assistant after determining the audio includes the utterance of the user-specified wake word.

10. The method of claim 9, wherein the wake word detection model is trained using standard phonemes and whole word phonemes.

11. The method of claim 9, further comprising resetting the wake word detection model to erase a history of processed audio.

12. The method of claim 11, wherein the reset occurs in response to determining one of the wake word was uttered and a specified period of time has elapsed.

13. The method of claim 9, wherein the wake word detection model is compressed using single value decomposition (SVD).

14. The method of claim 13, wherein the wake word detection model includes weights quantized to 8-bit or 16-bit values.

15. The method of claim 9, further comprising: receiving the user-specified wake word from a user; providing the user-specified wake word to a wake word model engine; and receiving from the wake word model engine, a wake word graph of the user-specified wake word indicating a phoneme sequence of the user-specified wake word and alternate pronunciations of the user-specified wake word; wherein the wake word graph is part of the wake word detection model.

16. A non-transitory machine-readable medium including instructions that, when executed by a machine, cause the machine to perform operations for on-device custom wake word detection, the operations comprising: receiving audio from a microphone of a device; determining, using a wake word detection model, whether the audio includes an utterance of a user-specified wake word, the wake word detection model including a recurrent neural network transducer (RNNT) and a lookup table (LUT), the LUT including pre-computed hidden vectors produced by a prediction network responsive to the user-specified wake word; waking up a personal assistant after determining the audio includes the utterance of the user-specified wake word.

17. The non-transitory machine-readable medium of claim 16, wherein the wake word detection model is trained using standard phonemes and whole word phonemes.

18. The non-transitory machine-readable medium of claim 16, wherein the operations further comprise resetting the wake word detection model to erase a history of processed audio in response to determining one of the wake word was uttered and a specified period of time has elapsed.

19. The non-transitory machine-readable medium of claim 16, wherein the operations further comprise: receiving the user-specified wake word from a user; providing the user-specified wake word to a wake word model engine; and receiving from the wake word model engine, a wake word graph of the user-specified wake word indicating a phoneme sequence of the user-specified wake word and alternate pronunciations of the user-specified wake word; wherein the wake word graph is part of the wake word detection model.

20. The non-transitory machine-readable medium of claim 16, wherein the operations further include: receiving the wake word from a user; providing the wake word to a wake word model engine; and receiving from the wake word model engine, a wake word graph of the wake word indicating a phoneme sequence of the wake word and alternate pronunciations of the wake word and a background language model with unigrams and bi-grams of the wake word removed therefrom; wherein the wake word graph and the background language model are part of the wake word detection model.
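Claims 6 and 14 recite weights quantized to 8-bit or 16-bit values but do not say how. As a rough illustration only, the sketch below uses symmetric linear quantization with a single per-tensor scale, which is an assumed scheme and not one the claims specify. The function names `quantize` and `dequantize` are hypothetical.

```python
from typing import List, Tuple


def quantize(weights: List[float], bits: int = 8) -> Tuple[List[int], float]:
    """Map float weights to signed integers of the given width (assumed scheme)."""
    if bits not in (8, 16):
        raise ValueError("this sketch only covers 8-bit and 16-bit values")
    qmax = 2 ** (bits - 1) - 1  # 127 for 8-bit, 32767 for 16-bit
    max_abs = max((abs(w) for w in weights), default=0.0)
    # One scale for the whole tensor; fall back to 1.0 when all weights are zero.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the integers and the scale."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25]
    q8, scale8 = quantize(weights, bits=8)
    print(q8, scale8)
    print(dequantize(q8, scale8))
```

With this scheme the rounding error per weight is at most about half the scale, so the 16-bit variant trades twice the storage for a much finer grid than the 8-bit one.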

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]

  • Supervised learning

  • Quantised networks; Sparse networks; Compressed networks

  • G10L15/16 (Primary): using artificial neural networks

  • Learning methods

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11798535B2 cover?
Generally discussed herein are devices, systems, and methods for on-device detection of a wake word. A device can include a memory including model parameters that define a custom wake word detection model, the wake word detection model including a recurrent neural network transducer (RNNT) and a lookup table (LUT), the LUT indicating a hidden vector to be provided in response to a phoneme of a …
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G10L15/16. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 24, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).