On-device custom wake word detection
US-2020349927-A1 · Nov 5, 2020 · US
US12125482B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12125482-B2 |
| Application number | US-201916692150-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 22, 2019 |
| Priority date | Nov 22, 2019 |
| Publication date | Oct 22, 2024 |
| Grant date | Oct 22, 2024 |
Abstract
An example apparatus for recognizing speech includes an audio receiver to receive a stream of audio. The apparatus also includes a key phrase detector to detect a key phrase in the stream of audio. The apparatus further includes a model adapter to dynamically adapt a model based on the detected key phrase. The apparatus also includes a query recognizer to detect a voice query following the key phrase in a stream of audio via the adapted model.
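The abstract names four cooperating components: an audio receiver, a key phrase detector, a model adapter, and a query recognizer. The Python sketch below wires them together to make the control flow concrete. It is a minimal illustration, not the patented design, and it rests on loudly stated assumptions: each audio frame has already been reduced to a string token, the detector is an exact-match stand-in for a trained detection model, and the voice query spans a fixed number of frames. `WakeWordPipeline`, `toy_adapt`, and `query_len` are hypothetical names. The toy adapter does not change any model weights.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Callable, Iterable, List, Optional, Sequence

# Hypothetical frame type: a real system would use per-frame feature vectors;
# here each frame is reduced to a string token to keep the control flow visible.
Frame = str
Recognizer = Callable[[Sequence[Frame]], str]


@dataclass
class WakeWordPipeline:
    key_phrase: Sequence[Frame]                     # frames that make up the key phrase
    adapt: Callable[[Sequence[Frame]], Recognizer]  # model adapter (hypothetical signature)
    query_len: int = 3                              # assumed fixed query length, in frames

    def run(self, stream: Iterable[Frame]) -> Optional[str]:
        """Return the recognized query after the key phrase, or None if it never occurs."""
        frames = iter(stream)  # audio receiver: consume the stream one frame at a time
        n = len(self.key_phrase)
        window: List[Frame] = []
        for frame in frames:
            window = (window + [frame])[-n:]  # sliding window over the last n frames
            if window == list(self.key_phrase):  # key phrase detector (exact-match stand-in)
                recognizer = self.adapt(window)  # model adapter sees only the key phrase
                query = list(islice(frames, self.query_len))  # frames after the key phrase
                return recognizer(query)  # query recognizer using the adapted model
        return None


def toy_adapt(key_frames: Sequence[Frame]) -> Recognizer:
    """Hypothetical adapter: a real one would update ASR weights from key_frames."""
    def recognize(query_frames: Sequence[Frame]) -> str:
        return " ".join(query_frames)
    return recognize


pipeline = WakeWordPipeline(key_phrase=["hey", "device"], adapt=toy_adapt)
print(pipeline.run(["noise", "hey", "device", "turn", "on", "lights", "extra"]))
# prints: turn on lights
```

In this arrangement only the cheap detector loop touches every frame; the adapter and recognizer run once, after the key phrase has been found, which is the division of labor the abstract describes.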
Claims
What is claimed is:

1. An apparatus comprising: interface circuitry; instructions; and programmable circuitry to be programmed by the instructions to: detect a key phrase in a stream of audio with a key phrase detection model; dynamically adapt an automatic speech recognition (ASR) model based on estimated speaker features in the detected key phrase and estimated acoustic environment features in the detected key phrase by: computing an error based on a comparison of a recognized senone probability distribution of the key phrase which was spoken by a speaker and an optimal distribution; adjusting the ASR model based on the error; adjusting a weight of a history vector of the ASR model in a backward pass; and adjusting a weight of a feature vector of the ASR model in a backward pass; detect a voice query following the key phrase in a stream of audio via the adapted ASR model; and cause execution of an action based on the voice query.
2. The apparatus of claim 1, wherein the programmable circuitry includes a digital signal processor.
3. The apparatus of claim 1, wherein the programmable circuitry implements the key phrase detection model with a neural network to detect the key phrase.
4. The apparatus of claim 1, wherein the programmable circuitry is to operate in a first power mode to detect the key phrase and to operate in a second power mode to detect the voice query.
5. The apparatus of claim 1, wherein the key phrase includes a wake-on phrase.
6. The apparatus of claim 1, wherein the programmable circuitry is to compute a stream of speech features based on the stream of audio.
7. The apparatus of claim 1, wherein the ASR model includes an acoustic model to generate the probability distributions over senones.
8. The apparatus of claim 1, wherein the ASR model includes a language model to compute a final letter sequence.
9. The apparatus of claim 1, wherein the ASR model includes a recurrent neural network.
10. The apparatus of claim 1, wherein the ASR model includes a time delay neural network.
11. A method for recognizing speech, the method comprising: detecting, by executing a key phrase detection model with at least one processor, a key phrase in the stream of audio; dynamically adapting, by executing instructions with at least one of the at least one processor, an automatic speech recognition (ASR) model based on estimated speaker features in the detected key phrase and estimated acoustic environment features in the detected key phrase by: comparing a recognized senone probability distribution of the key phrase which was spoken by a speaker with an optimal distribution to compute an error, propagating the error to an initial state of the ASR model, adjusting a weight of a history vector of the ASR model in a backward pass, and adjusting a weight of a feature vector of the ASR model in a backward pass; detecting, by executing the adapted ASR model with at least one of the at least one processor, a voice query following the key phrase in a stream of audio; and causing, by executing instructions with at least one of the at least one processor, execution of an action associated with the voice query.
12. The method of claim 11, wherein the dynamically adapting of the ASR model includes propagating the error at each time step back to an initial state of the ASR model.
13. The method of claim 11, wherein the detecting of the key phrase includes executing a forward pass on the ASR model.
14. The method of claim 11, wherein the detecting of the key phrase includes processing the stream of audio via an ultra-low power mode.
15. The method of claim 11, further including generating a stream of speech features based on the stream of audio, wherein the key phrase is detected based on the stream of features.
16. The method of claim 11, wherein the detecting of the voice query includes generating probability distributions over senones.
17. The method of claim 11, wherein the detecting of the voice query includes computing a final letter sequence.
18. At least one computer readable storage device or storage disk comprising instructions to cause programmable circuitry to at least: detect a key phrase in a stream of audio via a forward pass of a key phrase detection model; dynamically adapt an automatic speech recognition (ASR) model into an adapted ASR model based on estimated speaker features in the detected key phrase and estimated acoustic environment features in the detected key phrase by: computing an error based on a senone probability distribution of the key-phrase; and propagating the error to an initial state of the ASR model, and adjusting weights of a history vector and a feature vector of the ASR model in a backward pass to generate the adapted ASR model; execute the adapted ASR model to detect a voice query following the key phrase in the stream of audio; and cause performance of an action based on the voice query.
19. The at least one computer readable storage device or storage disc of claim 18, wherein the instructions cause the programmable circuitry to propagate the error at each time step back to the initial state.
20. The at least one computer readable storage device or storage disc of claim 18, wherein the instructions cause the programmable circuitry to compute the error by comparing the senone probability distribution with a reference distribution.
21. The apparatus of claim 1, wherein the programmable circuitry is to detect the voice query by computing a final letter sequence.
22. The apparatus of claim 4, wherein the programmable circuitry is to consume more power in the second power mode than in the first power mode.
23. The apparatus of claim 1, wherein the programmable circuitry is to dynamically adapt the ASR model in a hidden state of the ASR model by adjusting the weight of the history vector and the weight of the feature vector in parallel.
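Claims 1, 11, 12, and 18 to 20 describe the adaptation step: compute an error between the recognized senone distribution for the key phrase and an optimal (reference) distribution, propagate that error at each time step back to the model's initial state, and adjust the history-vector and feature-vector weights in a backward pass. One plausible reading is ordinary backpropagation through time (BPTT) on a recurrent model. The pure-Python sketch below follows that reading on a single-layer Elman RNN. It is an interpretation under assumptions, not the patented implementation. The assumptions are: the "feature vector" is the per-frame input `x_t`, the "history vector" is the previous hidden state `h_{t-1}`, the error is softmax cross-entropy against the reference distribution, the output weights `Wo` stay frozen, and all sizes and data are toy values. Speaker and acoustic-environment characteristics are not modeled explicitly; they enter only through the key-phrase frames. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vec = List[float]
Mat = List[List[float]]


def matvec(m: Mat, v: Sequence[float]) -> Vec:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matvec_t(m: Mat, v: Sequence[float]) -> Vec:
    """Compute transpose(m) @ v without building the transpose."""
    return [sum(m[i][j] * v[i] for i in range(len(m))) for j in range(len(m[0]))]


def softmax(z: Sequence[float]) -> Vec:
    top = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - top) for v in z]
    total = sum(e)
    return [v / total for v in e]


def forward(Wx: Mat, Wh: Mat, Wo: Mat, xs: Sequence[Vec]) -> Tuple[List[Vec], List[Vec]]:
    """Elman RNN: h_t = tanh(Wx x_t + Wh h_{t-1}), p_t = softmax(Wo h_t)."""
    hs: List[Vec] = [[0.0] * len(Wh)]  # hs[0] is the initial state h_0
    ps: List[Vec] = []  # per-frame senone probability distributions
    for x in xs:
        a = [u + w for u, w in zip(matvec(Wx, x), matvec(Wh, hs[-1]))]
        hs.append([math.tanh(v) for v in a])
        ps.append(softmax(matvec(Wo, hs[-1])))
    return hs, ps


def loss(ps: Sequence[Vec], qs: Sequence[Vec]) -> float:
    """Cross-entropy of recognized distributions p_t against reference q_t."""
    return -sum(q * math.log(p) for pt, qt in zip(ps, qs) for p, q in zip(pt, qt) if q > 0)


def adapt_step(Wx: Mat, Wh: Mat, Wo: Mat, xs: Sequence[Vec], qs: Sequence[Vec], lr: float = 0.05):
    """One BPTT step on the key phrase; returns updated Wx, Wh, their gradients, and dL/dh_0."""
    hs, ps = forward(Wx, Wh, Wo, xs)
    H, D = len(Wx), len(Wx[0])
    dWx = [[0.0] * D for _ in range(H)]  # feature-vector weight gradients
    dWh = [[0.0] * H for _ in range(H)]  # history-vector weight gradients
    dh_next = [0.0] * H  # error arriving from later time steps
    for t in reversed(range(len(xs))):
        dz = [p - q for p, q in zip(ps[t], qs[t])]  # softmax + cross-entropy error at step t
        dh = [a + b for a, b in zip(matvec_t(Wo, dz), dh_next)]
        da = [g * (1.0 - v * v) for g, v in zip(dh, hs[t + 1])]  # back through tanh
        for i in range(H):
            for j in range(D):
                dWx[i][j] += da[i] * xs[t][j]
            for j in range(H):
                dWh[i][j] += da[i] * hs[t][j]  # hs[t] is h_{t-1} for this step
        dh_next = matvec_t(Wh, da)  # carry the error one step back
    # After the loop, dh_next is the gradient with respect to the initial state h_0.
    # Both weight matrices are updated together from the same backward pass.
    new_Wx = [[w - lr * g for w, g in zip(rw, rg)] for rw, rg in zip(Wx, dWx)]
    new_Wh = [[w - lr * g for w, g in zip(rw, rg)] for rw, rg in zip(Wh, dWh)]
    return new_Wx, new_Wh, dWx, dWh, dh_next


def rand_mat(rows: int, cols: int) -> Mat:
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
D, H, S, T = 2, 3, 3, 4  # toy sizes: feature dim, hidden dim, senones, frames
Wx, Wh, Wo = rand_mat(H, D), rand_mat(H, H), rand_mat(S, H)
xs = [[random.uniform(-1.0, 1.0) for _ in range(D)] for _ in range(T)]  # toy key-phrase features
qs = [[1.0 if s == t % S else 0.0 for s in range(S)] for t in range(T)]  # toy reference senones
before = loss(forward(Wx, Wh, Wo, xs)[1], qs)
Wx2, Wh2, dWx, dWh, grad_h0 = adapt_step(Wx, Wh, Wo, xs, qs)
after = loss(forward(Wx2, Wh2, Wo, xs)[1], qs)
print(f"key-phrase loss before {before:.4f}, after {after:.4f}")
```

The claims do not specify a loss function, learning rate, or network size, so those choices here are placeholders; the point of the sketch is only that one backward pass over the key phrase yields updates for both the history weights `Wh` and the feature weights `Wx` at the same time.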
Supervised learning · CPC title
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Speaker identification or verification techniques · CPC title
Word spotting · CPC title