US12424215B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12424215-B2 |
| Application number | US-202217804544-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 27, 2022 |
| Priority date | May 27, 2022 |
| Publication date | Sep 23, 2025 |
| Grant date | Sep 23, 2025 |
Official abstract text for this publication.
Methods and systems for pre-wakeword speech processing are disclosed. Speech audio, comprising command speech spoken before a wakeword, may be stored in a buffer in oldest to newest order. Upon detection of the wakeword, reverse acoustic models and language models, such as reverse automatic speech recognition (R-ASR) can be applied to the buffered audio, in newest to oldest order, starting from before the wakeword. The speech is converted into a sequence of words. Natural language grammar models, such as natural language understanding (NLU), can be applied to match the sequence of words to a complete command, the complete command being associated with invoking a computer operation.
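The buffering step in the abstract can be illustrated with a minimal sketch. It assumes frames are opaque values held in a fixed-capacity buffer and that the wakeword detector reports the wakeword's initial frame as an index into the current buffer contents. The names `FrameBuffer`, `push`, and `reversed_before` are hypothetical and do not come from the patent.

```python
from collections import deque
from typing import Deque, Iterator


class FrameBuffer:
    """Hypothetical fixed-capacity buffer that keeps frames in oldest-to-newest order."""

    def __init__(self, capacity: int) -> None:
        # A deque with maxlen silently drops the oldest frame when full.
        self._frames: Deque[str] = deque(maxlen=capacity)

    def push(self, frame: str) -> None:
        """Append the newest frame at the end of the buffer."""
        self._frames.append(frame)

    def reversed_before(self, wakeword_start: int) -> Iterator[str]:
        """Yield frames newest to oldest, starting with the frame just before
        the wakeword's estimated initial frame (an index into the buffer)."""
        frames = list(self._frames)
        if not 0 <= wakeword_start <= len(frames):
            raise ValueError("wakeword_start is outside the buffered range")
        for index in range(wakeword_start - 1, -1, -1):
            yield frames[index]


# Seven frames arrive, but only the five most recent are retained.
buffer = FrameBuffer(capacity=5)
for n in range(7):
    buffer.push(f"f{n}")

# Assume the detector places the wakeword's initial frame at buffer index 3 ("f5").
pre_wakeword = list(buffer.reversed_before(3))
```

A real system would hand each yielded frame to the reverse recognizer as it is produced. The list is materialized here only to make the ordering easy to inspect.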
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method of recognizing a command spoken before a wakeword, the method comprising: receiving an audio signal comprising speech; storing the audio signal in a sequence of spectral frames in a buffer; detecting a wakeword in the audio signal; estimating an initial frame of the wakeword; converting a plurality of frames into a sequence of words using reverse automatic speech recognition (R-ASR), in newest to oldest order, beginning from a first frame before the initial frame of the wakeword; matching the sequence of words to a complete command; identifying a mid-sentence correction between the command and the wakeword, the mid-sentence correction corresponding to words matching a second complete command; and invoking a function associated with the second complete command.
2. The computer-implemented method of claim 1, wherein using R-ASR further comprises referencing at least a phonetic dictionary, the phonetic dictionary having reverse pronunciations of words.
3. The computer-implemented method of claim 1, wherein using R-ASR further comprises referencing at least a language model, the language model having reverse orders of word sequences.
4. The computer-implemented method of claim 1, further comprising: transforming the audio signal into a sequence of reversed phonemes through R-ASR.
5. The computer-implemented method of claim 1, further comprising: terminating R-ASR when the sequence of words matches the complete command.
6. The computer-implemented method of claim 1, further comprising: estimating a last frame of the wakeword; converting a second plurality of frames into a second sequence of words using automatic speech recognition (ASR) system, in oldest to newest order, beginning from a first frame after the last frame of the wakeword; combining the sequence of words and the second sequence of words into a combined sequence of words; matching the combined sequence of words to the complete command; and invoking the function associated with the complete command.
7. The computer-implemented method of claim 6, wherein using ASR further comprises referencing at least a second phonetic dictionary, the second phonetic dictionary having forward pronunciations of words.
8. The computer-implemented method of claim 6, wherein using ASR further comprises referencing at least a second language model, the second language model having forward orders of word sequences.
9. The computer-implemented method of claim 6, further comprising: converting the plurality of frames using R-ASR and converting the second plurality of frames using ASR in separate simultaneous threads.
10. The computer-implemented method of claim 6, further comprising: converting the plurality of frames using R-ASR on a high-performance processor; and converting the second plurality of frames using ASR on a low-performance processor.
11. The computer-implemented method of claim 1, further comprising: detecting a pause in the audio signal; and converting the plurality of frames using R-ASR, in newest to oldest order, beginning from the first frame before the initial frame of the wakeword toward the pause.
12. The computer-implemented method of claim 1, wherein the wakeword is a high frequency phrase.
13. A computer-implemented method of recognizing a command, the method comprising: receiving an audio signal comprising speech; detecting a wakeword in the audio signal; estimating a beginning time of the wakeword; converting the audio signal into a sequence of words using reverse automatic speech recognition (R-ASR), in newest to oldest order, from before the beginning time of the wakeword; matching the sequence of words to a complete command; identifying a mid-sentence correction between the command and the wakeword, the mid-sentence correction corresponding to words matching a second complete command; and invoking a function associated with the second complete command.
14. The computer-implemented method of claim 13, wherein using R-ASR further comprises referencing at least a phonetic dictionary, the phonetic dictionary having reverse pronunciations of words.
15. The computer-implemented method of claim 13, wherein using R-ASR further comprises referencing at least a language model, the language model having reverse orders of word sequences.
16. The computer-implemented method of claim 13, further comprising: transforming the audio signal into a sequence of reversed phonemes through R-ASR.
17. The computer-implemented method of claim 13, further comprising: terminating R-ASR when the sequence of words matches the complete command.
18. The computer-implemented method of claim 13, further comprising: estimating a last frame of the wakeword; converting a second plurality of frames into a second sequence of words using automatic speech recognition (ASR) system, in oldest to newest order, beginning from a first frame after the last frame of the wakeword; combining the sequence of words and the second sequence of words into a combined sequence of words; matching the combined sequence of words to the complete command; and invoking the function associated with the complete command.
19. The computer-implemented method of claim 13, wherein using ASR further comprises referencing at least a second phonetic dictionary, the second phonetic dictionary having forward pronunciations of words.
20. The computer-implemented method of claim 13, wherein using ASR further comprises referencing at least a second language model, the second language model having forward orders of word sequences.
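
The reverse dictionary, reverse language model, and early-termination ideas in the dependent claims can be sketched with toy data. This is an illustration under stated assumptions, not the patented implementation: it assumes pronunciations are tuples of phoneme symbols, the language model is a table of n-gram counts, and the recognizer already emits words in newest-to-oldest order. All function names and sample entries are hypothetical, and mid-sentence correction is not modeled.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Phones = Tuple[str, ...]
Words = Tuple[str, ...]


def reverse_dictionary(forward: Dict[str, Phones]) -> Dict[str, Phones]:
    """Build reverse pronunciations by reversing each word's phoneme order."""
    return {word: tuple(reversed(phones)) for word, phones in forward.items()}


def reverse_ngrams(forward: Dict[Words, int]) -> Dict[Words, int]:
    """Build a reverse-order n-gram table by reversing each word sequence."""
    return {tuple(reversed(ngram)): count for ngram, count in forward.items()}


def match_reverse(words_newest_first: Iterable[str],
                  commands: Set[Words]) -> Optional[Words]:
    """Consume words newest to oldest and stop as soon as the words collected
    so far, read in spoken order, equal a complete command."""
    collected: List[str] = []
    for word in words_newest_first:
        collected.append(word)
        candidate = tuple(reversed(collected))  # restore spoken order
        if candidate in commands:
            return candidate  # early termination: older words stay unread
    return None


# Toy data, assumed for illustration only.
rev_dict = reverse_dictionary({"lights": ("L", "AY", "T", "S")})
rev_lm = reverse_ngrams({("turn", "on"): 3})
commands = {("turn", "on", "the", "lights")}

# Spoken as "please turn on the lights <wakeword>", then decoded backwards.
result = match_reverse(["lights", "the", "on", "turn", "please"], commands)
```

Because matching stops at the first complete command, the word "please" is never consumed in this example, which mirrors the idea of terminating R-ASR once a match is found.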
Speech classification or search · CPC title
Discriminating between voiced and unvoiced parts of speech signals (G10L25/90 takes precedence) · CPC title
Word spotting · CPC title
Detection of presence or absence of voice signals (switching of direction of transmission by voice frequency in two-way loud-speaking telephone systems H04M9/10) · CPC title
using context dependencies, e.g. language models · CPC title