Adaptively recognizing speech using key phrases

US12125482B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12125482-B2
Application number: US-201916692150-A
Country: US
Kind code: B2
Filing date: Nov 22, 2019
Priority date: Nov 22, 2019
Publication date: Oct 22, 2024
Grant date: Oct 22, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An example apparatus for recognizing speech includes an audio receiver to receive a stream of audio. The apparatus also includes a key phrase detector to detect a key phrase in the stream of audio. The apparatus further includes a model adapter to dynamically adapt a model based on the detected key phrase. The apparatus also includes a query recognizer to detect a voice query following the key phrase in a stream of audio via the adapted model.
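
The flow described in the abstract (receive audio, spot the key phrase, adapt on it, then recognize the query that follows) can be sketched as a small stream processor. This is an illustrative toy, not the patented implementation: word tokens stand in for audio frames, and `KEY_PHRASE`, `process_stream`, and `query_len` are hypothetical names introduced only for this sketch.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical wake phrase; word tokens stand in for audio frames.
KEY_PHRASE = ["hello", "computer"]


def process_stream(
    frames: Iterable[str], query_len: int = 3
) -> Optional[Tuple[List[str], List[str]]]:
    """Return (adaptation_frames, query_frames), or None if no key phrase is heard."""
    window: List[str] = []
    it = iter(frames)
    for frame in it:
        # Keep a sliding window as long as the key phrase.
        window = (window + [frame])[-len(KEY_PHRASE):]
        if window == KEY_PHRASE:  # key phrase detector fires
            # These frames are what a model adapter would consume.
            adaptation_frames = list(window)
            # The next frames are handed to the query recognizer.
            query_frames = [f for _, f in zip(range(query_len), it)]
            return adaptation_frames, query_frames
    return None


stream = ["noise", "hello", "computer", "play", "some", "jazz", "now"]
result = process_stream(stream)
print(result)
```

The point of the split is that the key phrase serves twice: once as a trigger, and once as a short sample of the current speaker and room that the model can adapt to before the query arrives.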

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus comprising: interface circuitry; instructions; and programmable circuitry to be programmed by the instructions to: detect a key phrase in a stream of audio with a key phrase detection model; dynamically adapt an automatic speech recognition (ASR) model based on estimated speaker features in the detected key phrase and estimated acoustic environment features in the detected key phrase by: computing an error based on a comparison of a recognized senone probability distribution of the key phrase which was spoken by a speaker and an optimal distribution; adjusting the ASR model based on the error; adjusting a weight of a history vector of the ASR model in a backward pass; and adjusting a weight of a feature vector of the ASR model in a backward pass; detect a voice query following the key phrase in a stream of audio via the adapted ASR model; and cause execution of an action based on the voice query.

2. The apparatus of claim 1, wherein the programmable circuitry includes a digital signal processor.

3. The apparatus of claim 1, wherein the programmable circuitry implements the key phrase detection model with a neural network to detect the key phrase.

4. The apparatus of claim 1, wherein the programmable circuitry is to operate in a first power mode to detect the key phrase and to operate in a second power mode to detect the voice query.

5. The apparatus of claim 1, wherein the key phrase includes a wake-on phrase.

6. The apparatus of claim 1, wherein the programmable circuitry is to compute a stream of speech features based on the stream of audio.

7. The apparatus of claim 1, wherein the ASR model includes an acoustic model to generate the probability distributions over senones.

8. The apparatus of claim 1, wherein the ASR model includes a language model to compute a final letter sequence.

9. The apparatus of claim 1, wherein the ASR model includes a recurrent neural network.

10. The apparatus of claim 1, wherein the ASR model includes a time delay neural network.

11. A method for recognizing speech, the method comprising: detecting, by executing a key phrase detection model with at least one processor, a key phrase in the stream of audio; dynamically adapting, by executing instructions with at least one of the at least one processor, an automatic speech recognition (ASR) model based on estimated speaker features in the detected key phrase and estimated acoustic environment features in the detected key phrase by: comparing a recognized senone probability distribution of the key phrase which was spoken by a speaker with an optimal distribution to compute an error, propagating the error to an initial state of the ASR model, adjusting a weight of a history vector of the ASR model in a backward pass, and adjusting a weight of a feature vector of the ASR model in a backward pass; detecting, by executing the adapted ASR model with at least one of the at least one processor, a voice query following the key phrase in a stream of audio; and causing, by executing instructions with at least one of the at least one processor, execution of an action associated with the voice query.

12. The method of claim 11, wherein the dynamically adapting of the ASR model includes propagating the error at each time step back to an initial state of the ASR model.

13. The method of claim 11, wherein the detecting of the key phrase includes executing a forward pass on the ASR model.

14. The method of claim 11, wherein the detecting of the key phrase includes processing the stream of audio via an ultra-low power mode.

15. The method of claim 11, further including generating a stream of speech features based on the stream of audio, wherein the key phrase is detected based on the stream of features.

16. The method of claim 11, wherein the detecting of the voice query includes generating probability distributions over senones.

17. The method of claim 11, wherein the detecting of the voice query includes computing a final letter sequence.

18. At least one computer readable storage device or storage disk comprising instructions to cause programmable circuitry to at least: detect a key phrase in a stream of audio via a forward pass of a key phrase detection model; dynamically adapt an automatic speech recognition (ASR) model into an adapted ASR model based on estimated speaker features in the detected key phrase and estimated acoustic environment features in the detected key phrase by: computing an error based on a senone probability distribution of the key-phrase; and propagating the error to an initial state of the ASR model, and adjusting weights of a history vector and a feature vector of the ASR model in a backward pass to generate the adapted ASR model; execute the adapted ASR model to detect a voice query following the key phrase in the stream of audio; and cause performance of an action based on the voice query.

19. The at least one computer readable storage device or storage disc of claim 18, wherein the instructions cause the programmable circuitry to propagate the error at each time step back to the initial state.

20. The at least one computer readable storage device or storage disc of claim 18, wherein the instructions cause the programmable circuitry to compute the error by comparing the senone probability distribution with a reference distribution.

21. The apparatus of claim 1, wherein the programmable circuitry is to detect the voice query by computing a final letter sequence.

22. The apparatus of claim 4, wherein the programmable circuitry is to consume more power in the second power mode than in the first power mode.

23. The apparatus of claim 1, wherein the programmable circuitry is to dynamically adapt the ASR model in a hidden state of the ASR model by adjusting the weight of the history vector and the weight of the feature vector in parallel.
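
The adaptation step recited in the independent claims (compare the recognized senone distribution with a reference, propagate the error back toward the initial state, and adjust the weights on the history and feature vectors in a backward pass) resembles backpropagation through time on a recurrent network. The following is a minimal sketch under stated assumptions, not the patent's implementation: it assumes an Elman-style toy network, a one-hot reference distribution per frame, a cross-entropy error, and plain gradient descent. `TinyRNN`, `adapt`, the layer sizes, the learning rate, and the random "features" are all hypothetical.

```python
import math
import random


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def softmax(z):
    m = max(z)
    e = [math.exp(a - m) for a in z]
    s = sum(e)
    return [a / s for a in e]


class TinyRNN:
    """Toy acoustic model: h_t = tanh(W_h h_{t-1} + W_x x_t), y_t = softmax(W_o h_t)."""

    def __init__(self, n_feat, n_hidden, n_senones, seed=0):
        rng = random.Random(seed)
        init = lambda r, c: [[rng.uniform(-0.5, 0.5) for _ in range(c)] for _ in range(r)]
        self.W_h = init(n_hidden, n_hidden)   # weights on the history vector
        self.W_x = init(n_hidden, n_feat)     # weights on the feature vector
        self.W_o = init(n_senones, n_hidden)  # output layer, left fixed in this sketch
        self.h0 = [0.0] * n_hidden            # initial state

    def forward(self, frames):
        hs, ys = [self.h0], []
        for x in frames:
            a = [p + q for p, q in zip(matvec(self.W_h, hs[-1]), matvec(self.W_x, x))]
            hs.append([math.tanh(v) for v in a])
            ys.append(softmax(matvec(self.W_o, hs[-1])))
        return hs, ys


def adapt(model, frames, targets, lr=0.1):
    """One forward and one backward pass over the key phrase; updates W_h and W_x only."""
    hs, ys = model.forward(frames)
    H, D, n = len(model.h0), len(frames[0]), len(frames)
    gW_h = [[0.0] * H for _ in range(H)]
    gW_x = [[0.0] * D for _ in range(H)]
    dh_next = [0.0] * H
    for t in reversed(range(n)):
        # Output error: recognized distribution minus the one-hot reference.
        dy = list(ys[t])
        dy[targets[t]] -= 1.0
        # Error reaching h_t from the output and from the following time step.
        dh = [sum(model.W_o[k][j] * dy[k] for k in range(len(dy))) + dh_next[j]
              for j in range(H)]
        da = [dh[j] * (1.0 - hs[t + 1][j] ** 2) for j in range(H)]  # through tanh
        for j in range(H):
            for i in range(H):
                gW_h[j][i] += da[j] * hs[t][i]       # hs[t] is the previous state
            for i in range(D):
                gW_x[j][i] += da[j] * frames[t][i]
        # Pass the error one step further back, ending at the initial state.
        dh_next = [sum(model.W_h[j][i] * da[j] for j in range(H)) for i in range(H)]
    for j in range(H):
        for i in range(H):
            model.W_h[j][i] -= lr * gW_h[j][i] / n
        for i in range(D):
            model.W_x[j][i] -= lr * gW_x[j][i] / n
    # Mean cross-entropy measured before this update.
    return -sum(math.log(y[k]) for y, k in zip(ys, targets)) / n


rng = random.Random(1)
key_phrase_frames = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(6)]
reference_senones = [0, 1, 2, 1, 0, 2]  # assumed known alignment of the key phrase
model = TinyRNN(n_feat=3, n_hidden=4, n_senones=3)
losses = [adapt(model, key_phrase_frames, reference_senones) for _ in range(30)]
print(f"error before adaptation: {losses[0]:.3f}, after: {losses[-1]:.3f}")
```

Because the key phrase is known in advance, its reference senone sequence can serve as a supervision signal without any user-provided transcript, which is what makes adapting on the wake phrase plausible. The sketch leaves the output layer untouched so that only the two weight sets named in the claims change.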

Assignees

Intel Corp

Inventors

Classifications

  • Supervised learning

  • Characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]

  • Convolutional networks [CNN, ConvNet]

  • Speaker identification or verification techniques

  • Word spotting

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12125482B2 cover?
An example apparatus for recognizing speech includes an audio receiver to receive a stream of audio. The apparatus also includes a key phrase detector to detect a key phrase in the stream of audio. The apparatus further includes a model adapter to dynamically adapt a model based on the detected key phrase. The apparatus also includes a query recognizer to detect a voice query following the key …
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G10L15/16. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 22, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).