Online verification of custom wake word

US11158305B2 · US · B2

Patent metadata
Publication number: US-11158305-B2
Application number: US-201916522401-A
Country: US
Kind code: B2
Filing date: Jul 25, 2019
Priority date: May 5, 2019
Publication date: Oct 26, 2021
Grant date: Oct 26, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Generally discussed herein are devices, systems, and methods for wake word verification. A method can include receiving, at a server, a message from a device indicating that an utterance of a user-defined wake word was detected at the device, the message including (a) audio samples or features extracted from the audio samples and (b) data indicating the user-defined wake word, retrieving or generating, at the server, a custom decoding graph for the user-defined wake word, wherein the decoding graph and the static portion of the wake word verification model form a custom wake word verification model for the user-defined wake word, executing the wake word verification model to determine a likelihood that the wake word was uttered, and providing a message to the device indicating whether wake was uttered based on the determined likelihood.
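
The request/response flow in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `VerifyRequest`, `VerifyResponse`, `WakeWordVerifier`, `toy_build_graph`, and `toy_score`, the 0.5 threshold, and the in-memory graph cache are all assumptions made for the example. The scoring function is a toy stand-in for the real verification model.

```python
from dataclasses import dataclass


@dataclass
class VerifyRequest:
    """First message: sent by the device after local wake word detection."""
    wake_word: str   # data indicating the user-defined wake word
    features: list   # audio samples or features extracted from them


@dataclass
class VerifyResponse:
    """Second message: tells the device whether the wake word was uttered."""
    wake_word_uttered: bool
    likelihood: float


class WakeWordVerifier:
    """Hypothetical server-side handler: retrieve or generate a graph, then score."""

    def __init__(self, build_graph, score, threshold=0.5):
        self._build_graph = build_graph  # generates a custom decoding graph
        self._score = score              # stand-in for the verification model
        self._threshold = threshold      # assumed decision threshold
        self._graphs = {}                # cache of previously generated graphs

    def get_graph(self, wake_word):
        # "Retrieve or generate": reuse a cached graph when one exists.
        key = wake_word.strip().lower()
        if key not in self._graphs:
            self._graphs[key] = self._build_graph(key)
        return self._graphs[key]

    def handle(self, request):
        graph = self.get_graph(request.wake_word)
        likelihood = self._score(graph, request.features)
        return VerifyResponse(likelihood >= self._threshold, likelihood)


build_calls = []


def toy_build_graph(word):
    # Toy stand-in: treat each letter as a "phoneme".
    build_calls.append(word)
    return list(word.replace(" ", ""))


def toy_score(graph, features):
    # Toy stand-in: fraction of graph symbols found, in order, in the features.
    remaining = iter(features)
    hits = sum(1 for symbol in graph if symbol in remaining)
    return hits / len(graph) if graph else 0.0


verifier = WakeWordVerifier(toy_build_graph, toy_score)
accepted = verifier.handle(VerifyRequest("Hey Nova", list("heynova")))
rejected = verifier.handle(VerifyRequest("hey nova", list("hello")))
print(accepted.wake_word_uttered, rejected.wake_word_uttered)
```

In this sketch the second request reuses the graph built for the first one, which mirrors the "retrieving or generating" wording: generation happens once per wake word and later requests only retrieve.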

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: memory having stored model parameters that define a static portion of a wake word verification model; and processing circuitry to: receive a first message from a device indicating that an utterance of a user-defined wake word was detected at the device, the message including (a) audio samples or features extracted from the audio samples and (b) data indicating the user-defined wake word; retrieve or generate a custom decoding graph that decodes for the user-defined wake word, wherein the custom decoding graph and the static portion of the wake word verification model form a custom wake word verification model for the user-defined wake word, the custom wake word verification model including, in parallel, the custom decoding graph, a background language model (BLM), and an acoustic model, the acoustic model receives audio features and generates likelihood vectors that the audio features correspond to phonemes, the custom decoding graph receives the user-defined wake word and generates a sequence of phonemes in the user-defined wake word, and the BLM receives the user-defined wake word and generates a probability of observing a next gram given previous grams; execute the custom wake word verification model to determine a likelihood that the wake word was uttered; and provide a second message to the device indicating whether wake was uttered based on the determined likelihood.

2. The system of claim 1, wherein the wake word is one of a plurality of wake words for the device and the decoding graph includes only the plurality of wake words and alternate pronunciations thereof.

3. The system of claim 1, wherein the acoustic model includes a plurality of neural network layers quantized to a specified number of bits.

4. The system of claim 3, wherein an input neural network layer and an output neural network layer of the neural network layers are quantized to 16-bits and the remaining neural network layers are quantized to 8-bits.

5. The system of claim 1, wherein the wake word verification model further includes a beam search decoder to determine, based on the predicted series of phonemes from the acoustic model, and a union of the sequence of phonemes in the user-defined wake word from the custom decoding graph, and the probability of observing a next gram given previous grams from the BLM, a confidence that the wake word was uttered.

6. The system of claim 1, wherein the wake word verification model is to determine that the wake word is present only in response to determining that the wake word was predicted to be uttered in at least two intermediate hypotheses since a last silence.

7. The system of claim 1, wherein the processing circuitry is further to determine, in parallel, a task present in the audio samples or features extracted from the audio samples.

8. A method of custom wake word verification comprising: receiving, at a server, a first message from a device indicating that an utterance of a user-defined wake word was detected at the device, the message including (a) audio samples or features extracted from the audio samples and (b) data indicating the user-defined wake word; retrieving or generating, at the server, a custom decoding graph that decodes for the user-defined wake word, wherein the custom decoding graph and a static portion of the wake word verification model form a custom wake word verification model for the user-defined wake word, the custom wake word verification model including, in parallel, the custom decoding graph, a background language model (BLM), and an acoustic model, the acoustic model receives audio features and generates likelihood vectors that the audio features correspond to phonemes, the custom decoding graph receives the user-defined wake word and generates a sequence of phonemes in the user-defined wake word, and the BLM receives the user-defined wake word and generates a probability of observing a next gram given previous grams; executing the custom wake word verification model to determine a likelihood that the wake word was uttered; and providing a second message to the device indicating whether wake was uttered based on the determined likelihood.

9. The method of claim 8, wherein the user-specified wake word is one of a plurality of wake words for the device and the custom decoding graph includes only the plurality of wake words and alternate pronunciations thereof.

10. The method of claim 8, wherein the acoustic model includes a plurality of neural network layers quantized to a specified number of bits.

11. The method of claim 10, wherein an input neural network layer and an output neural network layer of the neural network layers are quantized to 16-bits and the remaining neural network layers are quantized to 8-bits.

12. A non-transitory machine-readable medium including instructions that, when executed by a machine, cause the machine to perform operations for custom wake word verification, the operations comprising: receiving a first message from a device indicating that an utterance of a user-defined wake word was detected at the device, the message including (a) audio samples or features extracted from the audio samples and (b) data indicating the user-defined wake word; retrieving or generating, at the server, a custom decoding graph for the user-defined wake word, wherein the custom decoding graph and a static portion of the wake word verification model form a custom wake word verification model for the user-defined wake word, the custom wake word verification model including, in parallel, the custom decoding graph, a background language model (BLM), and an acoustic model, the acoustic model receives audio features and generates likelihood vectors that the audio features correspond to phonemes, the custom decoding graph receives the user-defined wake word and generates a sequence of phonemes in the user-defined wake word, and the BLM receives the user-defined wake word and generates a probability of observing a next gram given previous grams; executing the custom wake word verification model to determine a likelihood that the wake word was uttered; and providing a second message to the device indicating whether wake was uttered based on the determined likelihood.

13. The non-transitory machine-readable medium of claim 12, wherein the wake word verification model further includes a beam search decoder to determine, based on the predicted series of phonemes from the acoustic model, and a union of the sequence of phonemes in the user-defined wake word from the custom decoding graph and the probability of observing a next gram given previous grams from the BLM a confidence that the wake word was uttered.

14. The non-transitory machine-readable medium of claim 12, wherein the operations further include determining, in parallel, a task present in the audio samples or features extracted from the audio samples.
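
Claims 4 and 11 describe a mixed-precision layout: the input and output layers of the acoustic model are quantized to 16 bits and the remaining layers to 8 bits. The claims do not say which quantization scheme is used, so the sketch below assumes symmetric uniform quantization with one scale per layer. The function names and the flat-list representation of layer weights are also assumptions made for illustration.

```python
def quantize(weights, bits):
    """Symmetric uniform quantization of floats to signed `bits`-bit codes."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits, 32767 for 16 bits
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


def bits_for_layer(index, num_layers):
    """16 bits for the first (input) and last (output) layer, 8 bits otherwise."""
    return 16 if index in (0, num_layers - 1) else 8


def quantize_model(layers):
    """Quantize each layer's weights; returns (bits, codes, scale) per layer."""
    quantized = []
    for index, weights in enumerate(layers):
        bits = bits_for_layer(index, len(layers))
        codes, scale = quantize(weights, bits)
        quantized.append((bits, codes, scale))
    return quantized


# Four toy "layers", each a flat list of weights.
layers = [[0.5, -1.0], [0.25, 1.0], [0.1, -0.2], [1.0, 0.0]]
model = quantize_model(layers)
print([bits for bits, _, _ in model])
```

With this scheme the reconstruction error per weight is at most half the layer's scale, so the 16-bit input and output layers keep far more precision than the 8-bit hidden layers.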

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

  • Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

  • Execution procedure of a spoken command

  • The user being prompted to utter a password or a predefined phrase

  • Word spotting

  • Using artificial neural networks

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11158305B2 cover?
It covers devices, systems, and methods for wake word verification. A device that detects a user-defined wake word sends a server a message containing audio samples (or features extracted from them) and data indicating the wake word. The server retrieves or generates a custom decoding graph for that wake word, combines it with the static portion of a wake word verification model, executes the resulting model to determine the likelihood that the wake word was uttered, and tells the device the result.
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G10L15/08. Mapped technology areas include Physics.
When was this patent published?
Published Oct 26, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).