Wake word selection assistance architectures and methods
US-2022254334-A1 · Aug 11, 2022 · US
US11798535B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11798535-B2 |
| Application number | US-202117474829-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 14, 2021 |
| Priority date | May 5, 2019 |
| Publication date | Oct 24, 2023 |
| Grant date | Oct 24, 2023 |
Official abstract text for this publication.
Generally discussed herein are devices, systems, and methods for on-device detection of a wake word. A device can include a memory including model parameters that define a custom wake word detection model, the wake word detection model including a recurrent neural network transducer (RNNT) and a lookup table (LUT), the LUT indicating a hidden vector to be provided in response to a phoneme of a user-specified wake word, a microphone to capture audio, and processing circuitry to receive the audio from the microphone, determine, using the wake word detection model, whether the audio includes an utterance of the user-specified wake word, and wake up a personal assistant after determining the audio includes the utterance of the user-specified wake word.
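The lookup-table idea in the abstract can be illustrated with a minimal sketch. It assumes the table is keyed by phoneme, as the abstract describes. The `ToyPredictionNetwork` class, the `build_lut` helper, the vector dimension, and the phoneme symbols are all hypothetical stand-ins, not the patent's model or data. The point is only that the network is evaluated once when the wake word is set, and detection then reads stored vectors.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


class ToyPredictionNetwork:
    """Hypothetical stand-in for the RNNT prediction network (not the patent's model)."""

    def __init__(self, dim: int = 4) -> None:
        self.dim = dim
        self.calls = 0  # counts how often the "network" is actually evaluated

    def __call__(self, phoneme: str) -> Vector:
        self.calls += 1
        seed = sum(ord(ch) for ch in phoneme)
        # Deterministic toy vector; a real network would run learned layers here.
        return [math.sin(seed * (i + 1)) for i in range(self.dim)]


def build_lut(phonemes: Sequence[str], network: ToyPredictionNetwork) -> Dict[str, Vector]:
    """Pre-compute one hidden vector per distinct phoneme of the wake word."""
    return {p: network(p) for p in dict.fromkeys(phonemes)}


# Illustrative phoneme sequence for a wake word; the symbols are assumptions.
WAKE_WORD_PHONEMES = ["HH", "EY", "K", "AH", "M", "P", "Y", "UW", "T", "ER"]

network = ToyPredictionNetwork()
lut = build_lut(WAKE_WORD_PHONEMES, network)
calls_after_build = network.calls

# At detection time the decoder reads the table instead of running the network.
hidden_vectors = [lut[p] for p in WAKE_WORD_PHONEMES]
```

In this sketch the network call count stays fixed after `build_lut` returns, which is the saving a pre-computed table is meant to provide on a constrained device.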
Opening claim text (preview).
What is claimed is:
1. A device comprising: memory including model parameters that define a custom wake word detection model, the wake word detection model including a recurrent neural network transducer (RNNT) and a lookup table (LUT), the LUT including pre-computed hidden vectors produced by a prediction network responsive to the user-specified wake word; a microphone to capture audio; processing circuitry to: receive the audio from the microphone; determine, using the wake word detection model and including using the LUT to decode for the user-specified wake word, whether the audio includes an utterance of the user-specified wake word; and wake up a personal assistant after determining the audio includes the utterance of the user-specified wake word.
2. The device of claim 1, wherein the wake word detection model is trained using standard phonemes and whole word phonemes.
3. The device of claim 1, wherein the processing circuitry is further to reset the wake word detection model to erase a history of processed audio.
4. The device of claim 3, wherein the reset occurs in response to determining one of the wake word was uttered and a specified period of time has elapsed.
5. The device of claim 1, wherein the wake word detection model is compressed using single value decomposition (SVD).
6. The device of claim 5, wherein the wake word detection model includes weights quantized to 8-bit or 16-bit values.
7. The device of claim 1, wherein the processing circuitry is further to: receive the wake word from a user; provide the wake word to a wake word model engine; and receive from the wake word model engine, a wake word graph of the wake word indicating a phoneme sequence of the wake word and alternate pronunciations of the wake word; wherein the wake word graph is part of the wake word detection model.
8. The device of claim 1, wherein the processing circuitry is to: receive the wake word from a user; provide the wake word to a wake word model engine; and receive from the wake word model engine, a wake word graph of the wake word indicating a phoneme sequence of the wake word and alternate pronunciations of the wake word and a background language model with unigrams and bi-grams of the wake word removed therefrom; wherein the wake word graph and the background language model are part of the wake word detection model.
9. A method of on-device custom wake word detection comprising: receiving audio from a microphone of a device; determining, using a wake word detection model, whether the audio includes an utterance of a user-specified wake word, the wake word detection model including a recurrent neural network transducer (RNNT) and a lookup table (LUT), the LUT including pre-computed hidden vectors produced by a prediction network responsive to a user-specified wake word; and waking up a personal assistant after determining the audio includes the utterance of the user-specified wake word.
10. The method of claim 9, wherein the wake word detection model is trained using standard phonemes and whole word phonemes.
11. The method of claim 9, further comprising resetting the wake word detection model to erase a history of processed audio.
12. The method of claim 11, wherein the reset occurs in response to determining one of the wake word was uttered and a specified period of time has elapsed.
13. The method of claim 9, wherein the wake word detection model is compressed using single value decomposition (SVD).
14. The method of claim 13, wherein the wake word detection model includes weights quantized to 8-bit or 16-bit values.
15. The method of claim 9, further comprising: receiving the user-specified wake word from a user; providing the user-specified wake word to a wake word model engine; and receiving from the wake word model engine, a wake word graph of the user-specified wake word indicating a phoneme sequence of the user-specified wake word and alternate pronunciations of the user-specified wake word; wherein the wake word graph is part of the wake word detection model.
16. A non-transitory machine-readable medium including instructions that, when executed by a machine, cause the machine to perform operations for on-device custom wake word detection, the operations comprising: receiving audio from a microphone of a device; determining, using a wake word detection model, whether the audio includes an utterance of a user-specified wake word, the wake word detection model including a recurrent neural network transducer (RNNT) and a lookup table (LUT), the LUT including pre-computed hidden vectors produced by a prediction network responsive to the user-specified wake word; waking up a personal assistant after determining the audio includes the utterance of the user-specified wake word.
17. The non-transitory machine-readable medium of claim 16, wherein the wake word detection model is trained using standard phonemes and whole word phonemes.
18. The non-transitory machine-readable medium of claim 16, wherein the operations further comprise resetting the wake word detection model to erase a history of processed audio in response to determining one of the wake word was uttered and a specified period of time has elapsed.
19. The non-transitory machine-readable medium of claim 16, wherein the operations further comprise: receiving the user-specified wake word from a user; providing the user-specified wake word to a wake word model engine; and receiving from the wake word model engine, a wake word graph of the user-specified wake word indicating a phoneme sequence of the user-specified wake word and alternate pronunciations of the user-specified wake word; wherein the wake word graph is part of the wake word detection model.
20. The non-transitory machine-readable medium of claim 16, wherein the operations further include: receiving the wake word from a user; providing the wake word to a wake word model engine; and receiving from the wake word model engine, a wake word graph of the wake word indicating a phoneme sequence of the wake word and alternate pronunciations of the wake word and a background language model with unigrams and bi-grams of the wake word removed therefrom; wherein the wake word graph and the background language model are part of the wake word detection model.
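Claims 6 and 14 recite weights quantized to 8-bit or 16-bit values without naming a scheme. As a rough illustration only, the sketch below assumes symmetric, per-tensor linear quantization to signed 8-bit integers. The function names `quantize_int8` and `dequantize` and the sample weights are hypothetical and do not come from the patent.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale (assumed scheme)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Guard against an all-zero tensor, where any scale would work.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in quantized]


# Made-up example weights, purely for illustration.
weights = [0.5, -1.0, 0.25, 0.0]
q_weights, q_scale = quantize_int8(weights)
restored = dequantize(q_weights, q_scale)
```

Each stored weight then occupies one byte instead of a 32-bit float, at the cost of a rounding error of at most half a quantization step per weight under this scheme.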
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Supervised learning · CPC title
Quantised networks; Sparse networks; Compressed networks · CPC title
using artificial neural networks · CPC title
Learning methods · CPC title