Who is the assignee on this patent?

Microsoft Technology Licensing, LLC

What technology area does this patent fall under?

Primary CPC classification G10L15/16. Mapped technology areas include Physics.

When was this patent published?

Publication date: Oct 24, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

On-device custom wake word detection

US11798535B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11798535-B2
Application number	US-202117474829-A
Country	US
Kind code	B2
Filing date	Sep 14, 2021
Priority date	May 5, 2019
Publication date	Oct 24, 2023
Grant date	Oct 24, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Generally discussed herein are devices, systems, and methods for on-device detection of a wake word. A device can include a memory including model parameters that define a custom wake word detection model, the wake word detection model including a recurrent neural network transducer (RNNT) and a lookup table (LUT), the LUT indicating a hidden vector to be provided in response to a phoneme of a user-specified wake word, a microphone to capture audio, and processing circuitry to receive the audio from the microphone, determine, using the wake word detection model, whether the audio includes an utterance of the user-specified wake word, and wake up a personal assistant after determining the audio includes the utterance of the user-specified wake word.
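The lookup-table idea in the abstract can be illustrated with a minimal sketch. Because the wake word's phoneme sequence is known in advance, the hidden vector a prediction network would emit after each phoneme can be computed once and stored, so nothing recurrent has to run at detection time. Everything below is an assumption made for illustration and does not come from the patent: the phoneme inventory, the toy tanh recurrence standing in for the prediction network, the deterministic pseudo-weights, and the names `prediction_step`, `build_lut`, and `lookup`.

```python
import math

# Hypothetical phoneme inventory and toy sizes; not taken from the patent.
PHONEMES = ["<blank>", "HH", "EY", "K", "AE", "T", "AH", "N"]
HIDDEN = 4


def _weight(i, j, salt):
    # Deterministic pseudo-weights so the example is reproducible.
    return math.sin(1.0 + 3.0 * i + 7.0 * j + salt)


def prediction_step(prev_hidden, phoneme):
    """One step of a toy recurrent prediction network (a plain tanh RNN)."""
    p = PHONEMES.index(phoneme)
    out = []
    for i in range(HIDDEN):
        acc = _weight(i, p, 0.5)  # contribution of the input phoneme
        for j in range(HIDDEN):
            acc += 0.3 * _weight(i, j, 2.0) * prev_hidden[j]  # recurrent term
        out.append(math.tanh(acc))
    return out


def build_lut(wake_word_phonemes):
    """Precompute the hidden vector after each phoneme prefix of the wake word."""
    lut = {}
    hidden = [0.0] * HIDDEN
    for position, phoneme in enumerate(wake_word_phonemes):
        hidden = prediction_step(hidden, phoneme)
        lut[(position, phoneme)] = hidden
    return lut


def lookup(lut, position, phoneme):
    """Fetch a precomputed hidden vector instead of running the network."""
    return lut[(position, phoneme)]


# Illustrative phoneme sequence for a made-up wake word.
WAKE_WORD = ["HH", "EY", "K", "AE", "T"]
LUT = build_lut(WAKE_WORD)
```

In this sketch the table is keyed by position and phoneme so that a phoneme occurring twice in the wake word still maps to the correct prefix state. How the patented LUT is actually indexed is not stated on this page.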

First claim

Opening claim text (preview).

What is claimed is:
1. A device comprising: memory including model parameters that define a custom wake word detection model, the wake word detection model including a recurrent neural network transducer (RNNT) and a lookup table (LUT), the LUT including pre-computed hidden vectors produced by a prediction network responsive to the user-specified wake word; a microphone to capture audio; processing circuitry to: receive the audio from the microphone; determine, using the wake word detection model and including using the LUT to decode for the user-specified wake word, whether the audio includes an utterance of the user-specified wake word; and wake up a personal assistant after determining the audio includes the utterance of the user-specified wake word.
2. The device of claim 1, wherein the wake word detection model is trained using standard phonemes and whole word phonemes.
3. The device of claim 1, wherein the processing circuitry is further to reset the wake word detection model to erase a history of processed audio.
4. The device of claim 3, wherein the reset occurs in response to determining one of the wake word was uttered and a specified period of time has elapsed.
5. The device of claim 1, wherein the wake word detection model is compressed using single value decomposition (SVD).
6. The device of claim 5, wherein the wake word detection model includes weights quantized to 8-bit or 16-bit values.
7. The device of claim 1, wherein the processing circuitry is further to: receive the wake word from a user; provide the wake word to a wake word model engine; and receive from the wake word model engine, a wake word graph of the wake word indicating a phoneme sequence of the wake word and alternate pronunciations of the wake word; wherein the wake word graph is part of the wake word detection model.
8. The device of claim 1, wherein the processing circuitry is to: receive the wake word from a user; provide the wake word to a wake word model engine; and receive from the wake word model engine, a wake word graph of the wake word indicating a phoneme sequence of the wake word and alternate pronunciations of the wake word and a background language model with unigrams and bi-grams of the wake word removed therefrom; wherein the wake word graph and the background language model are part of the wake word detection model.
9. A method of on-device custom wake word detection comprising: receiving audio from a microphone of a device; determining, using a wake word detection model, whether the audio includes an utterance of a user-specified wake word, the wake word detection model including a recurrent neural network transducer (RNNT) and a lookup table (LUT), the LUT including pre-computed hidden vectors produced by a prediction network responsive to a user-specified wake word; and waking up a personal assistant after determining the audio includes the utterance of the user-specified wake word.
10. The method of claim 9, wherein the wake word detection model is trained using standard phonemes and whole word phonemes.
11. The method of claim 9, further comprising resetting the wake word detection model to erase a history of processed audio.
12. The method of claim 11, wherein the reset occurs in response to determining one of the wake word was uttered and a specified period of time has elapsed.
13. The method of claim 9, wherein the wake word detection model is compressed using single value decomposition (SVD).
14. The method of claim 13, wherein the wake word detection model includes weights quantized to 8-bit or 16-bit values.
15. The method of claim 9, further comprising: receiving the user-specified wake word from a user; providing the user-specified wake word to a wake word model engine; and receiving from the wake word model engine, a wake word graph of the user-specified wake word indicating a phoneme sequence of the user-specified wake word and alternate pronunciations of the user-specified wake word; wherein the wake word graph is part of the wake word detection model.
16. A non-transitory machine-readable medium including instructions that, when executed by a machine, cause the machine to perform operations for on-device custom wake word detection, the operations comprising: receiving audio from a microphone of a device; determining, using a wake word detection model, whether the audio includes an utterance of a user-specified wake word, the wake word detection model including a recurrent neural network transducer (RNNT) and a lookup table (LUT), the LUT including pre-computed hidden vectors produced by a prediction network responsive to the user-specified wake word; waking up a personal assistant after determining the audio includes the utterance of the user-specified wake word.
17. The non-transitory machine-readable medium of claim 16, wherein the wake word detection model is trained using standard phonemes and whole word phonemes.
18. The non-transitory machine-readable medium of claim 16, wherein the operations further comprise resetting the wake word detection model to erase a history of processed audio in response to determining one of the wake word was uttered and a specified period of time has elapsed.
19. The non-transitory machine-readable medium of claim 16, wherein the operations further comprise: receiving the user-specified wake word from a user; providing the user-specified wake word to a wake word model engine; and receiving from the wake word model engine, a wake word graph of the user-specified wake word indicating a phoneme sequence of the user-specified wake word and alternate pronunciations of the user-specified wake word; wherein the wake word graph is part of the wake word detection model.
20. The non-transitory machine-readable medium of claim 16, wherein the operations further include: receiving the wake word from a user; providing the wake word to a wake word model engine; and receiving from the wake word model engine, a wake word graph of the wake word indicating a phoneme sequence of the wake word and alternate pronunciations of the wake word and a background language model with unigrams and bi-grams of the wake word removed therefrom; wherein the wake word graph and the background language model are part of the wake word detection model.
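Claims 6 and 14 recite weights quantized to 8-bit or 16-bit values without saying how. As a rough illustration only, the sketch below applies symmetric linear quantization to 8-bit integers. The scheme, the helper names `quantize_int8` and `dequantize`, and the sample weights are all assumptions and are not drawn from the patent.

```python
def quantize_int8(weights):
    """Symmetric linear quantization of floats to integers in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude to 127; fall back to 1.0 for all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in quantized]


# Made-up weights for illustration.
weights = [0.82, -1.27, 0.003, 0.5, -0.64]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Each stored weight then occupies one byte plus a shared scale factor, and the rounding error per weight is bounded by half the scale.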

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

G06N3/0442
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
G06N3/09
Supervised learning · CPC title
G06N3/0495
Quantised networks; Sparse networks; Compressed networks · CPC title
G10L15/16 (Primary)
using artificial neural networks · CPC title
G06N3/08
Learning methods · CPC title

Patent family

Related publications grouped by family.

View patent family 73016681


Related patents

Wake word selection assistance architectures and methods

Customizable keyword spotting system with keyword adaptation

Generating input alternatives

Speech recognition with sequence-to-sequence models

Multi-user authentication on a device

System and method for managing models for embedded speech and language processing