Keyword spotting with competitor models
US-9159319-B1 · Oct 13, 2015 · US
US9704486B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9704486-B2 |
| Application number | US-201213711510-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 11, 2012 |
| Priority date | Dec 11, 2012 |
| Publication date | Jul 11, 2017 |
| Grant date | Jul 11, 2017 |
Official abstract text for this publication.
Power consumption for a computing device may be managed by one or more keywords. For example, if an audio input obtained by the computing device includes a keyword, a network interface module and/or an application processing module of the computing device may be activated. The audio input may then be transmitted via the network interface module to a remote computing device, such as a speech recognition server. Alternately, the computing device may be provided with a speech recognition engine configured to process the audio input for on-device speech recognition.
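As a rough illustration of the gating described in the abstract, the sketch below keeps a network interface powered down until a keyword appears in the input, then either transmits the input or hands it to an on-device recognizer. It is a minimal sketch under stated assumptions, not the patented implementation: the names `Device`, `NetworkInterface`, `contains_keyword`, and `local_recognizer` are hypothetical, and plain text stands in for audio.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class NetworkInterface:
    """Hypothetical stand-in for a radio that is powered down by default."""
    powered: bool = False
    sent: List[str] = field(default_factory=list)

    def transmit(self, payload: str) -> None:
        if not self.powered:
            raise RuntimeError("network interface is not powered")
        self.sent.append(payload)


@dataclass
class Device:
    keyword: str
    network: NetworkInterface = field(default_factory=NetworkInterface)
    # Optional on-device recognizer; when absent, input goes to a remote server.
    local_recognizer: Optional[Callable[[str], str]] = None

    def contains_keyword(self, audio: str) -> bool:
        # Assumption: text stands in for audio. A real detector would score
        # audio frames rather than match words.
        return self.keyword.lower() in audio.lower().split()

    def handle(self, audio: str) -> Optional[str]:
        if not self.contains_keyword(audio):
            return None  # no keyword: nothing is activated
        if self.local_recognizer is not None:
            return self.local_recognizer(audio)  # on-device recognition path
        self.network.powered = True  # activate only after the keyword is seen
        self.network.transmit(audio)
        return "sent"
```

With `Device(keyword="computer")`, an input without the keyword leaves `network.powered` as `False`; an input containing it powers the interface and records the transmission.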
Opening claim text (preview).
What is claimed is:
1. A system comprising: an audio input module; an audio detection module in communication with the audio input module; a speech detection module in communication with the audio detection module; a wakeword recognition module in communication with the speech detection module; and a network interface module in communication with the wakeword recognition module, wherein: the audio detection module is configured to: receive audio input from the audio input module; determine a volume of at least a portion of the audio input; cause the audio input module to increase a sampling rate of the audio input based at least in part on the volume exceeding a threshold; and cause activation of the speech detection module based at least in part on the volume exceeding the threshold; the speech detection module is configured to determine a first score indicating a likelihood that the audio input comprises speech and cause activation of the wakeword recognition module based at least on part on the score; and the wakeword recognition module is configured to: determine a second score indicating a likelihood that the audio input comprises a wakeword; and cause activation of a network interface module based on the second score by providing power to the network interface module; and the network interface module is configured to transmit at least a portion of the obtained audio input to a remote computing device.
2. The system of claim 1, wherein the audio input device comprises a microphone, the audio detection module comprises a first digital signal processor, the speech detection module comprises a second digital signal processor, and the wakeword recognition module comprises a microprocessor.
3. The system of claim 1, wherein: the speech detection module is further configured to determine the first score using at least one of a hidden Markov model, a Gaussian mixture model, energies in a plurality of spectral bands, or signal to noise ratios in a plurality of spectral bands; and the wakeword recognition module is further configured to determine the second score using at least one of an application processing module, a hidden Markov model, and a Gaussian mixture model.
4. The system of claim 1, wherein: the wakeword recognition module is further configured to cause deactivation of the audio detection module based at least in part on the first score; and the wakeword recognition module is further configured to cause deactivation of the speech detection module based at least in part on the second score.
5. A computer-implemented method of operating a first computing device, the method comprising: receiving an audio input; determining one or more values from the audio input, wherein the one or more values comprise at least one of: a first value indicating an energy level of the audio input; or a second value indicating a likelihood that the audio input comprises speech; increasing a sampling rate of the audio input, from a first lower sampling rate to a second higher sampling rate, based at least in part on the one or more values; activating a first module of the first computing device based at least in part on the one or more values; performing an operation, by the first module, wherein the operation comprises at least one of: determining that the audio input comprises a wakeword and causing activation of a network interface module in response to determining that the audio input comprises a wakeword, wherein causing activation of the network interface module comprises providing power to the network interface module; performing speech recognition on at least a portion of the audio input to obtain speech recognition results; or causing transmission of at least a portion of the audio input to a second computing device.
6. The computer-implemented method of claim 5, wherein: the first module comprises a processor that is switchable between a low-power state and a high-power state; and the processor only performs the operation when it is in the high-power state.
7. The computer-implemented method of claim 6, wherein activating the first module comprises switching the processor from the low-power state to the high-power state.
8. The computer-implemented method of claim 6 further comprising deactivating the first module, wherein deactivating the first module comprises switching the processor from the high-power state to the low-power state.
9. The computer-implemented method of claim 6, wherein the processor comprises at least one of a digital signal processor or a microprocessor.
10. The computer-implemented method of claim 5, wherein the first module comprises a software module configured to be executed by a microprocessor.
11. The computer-implemented method of claim 10, wherein activating the first module comprises causing the microprocessor to execute the software module.
12. The computer-implemented method of claim 5, wherein the operation further comprising receiving speech recognition results from the second computing device.
13. The computer-implemented method of claim 12, wherein the speech recognition results comprise at least one of a transcription of at least a portion of the audio input and a response to an intelligent agent query included in at least a portion of the audio input.
14. The computer-implemented method of claim 12 further comprising: activating a second module of the first computing device based at least in part on the one or more values, wherein the second module is configured to implement a speech recognition application; and processing the speech recognition results with the speech recognition application.
15. The computer-implemented method of claim 5, wherein providing power to the network interface module causes the network interface module to transition from a deactivated state to an activated state.
16. The computer-implemented method of claim 15, wherein communications sent via the network interface module are prevented while the network interface module is in the deactivated state.
17. The computer-implemented method of claim 15, wherein communications sent via the network interface module are enabled while the network interface module is in the activated state.
18. The computer-implemented method of claim 5, wherein providing power to the network interface module comprises providing power to a processor of the first computing device.
19. The computer-implemented method of claim 5, further comprising determining that the energy level of the audio input satisfies a threshold, wherein the increasing the sampling rate is performed in response to determining that the energy level of the audio input satisfies the threshold.
20. A device comprising: a first processor configured to: determine one or more values, wherein the one or more values comprise at least one of a first value indicating an energy level of an audio input or a second value indicating a likelihood that the audio input comprises speech; and cause an increase in a sampling rate of the audio input, from a first lower sampling rate to a second higher sampling rate, based at least in part on the one or more values; cause activation of a second processor based at least in part on the one or more values; the second processor configured to perform an operation, wherein the operation comprises at least one of: determining that the audio input comprises a wakeword and causing activation of a network interface module in response to determining that the audio inpu
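The staged activation recited in claim 1 (volume check, then speech score, then wakeword score, then network power) might be sketched as follows. This is an illustrative sketch only. The class `Cascade`, the RMS volume measure, all threshold values, the 8000 Hz and 16000 Hz sampling rates, and the idea of comparing each score to a fixed threshold are assumptions made for the example; the claims only say that activation is based on the volume exceeding a threshold and on the scores. The two scorers are passed in as hypothetical callables rather than implemented.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence


def rms_volume(samples: Sequence[float]) -> float:
    """Root-mean-square level of a frame, used here as the 'volume'."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


@dataclass
class Cascade:
    speech_score: Callable[[Sequence[float]], float]    # hypothetical scorer
    wakeword_score: Callable[[Sequence[float]], float]  # hypothetical scorer
    volume_threshold: float = 0.1       # assumed value
    speech_threshold: float = 0.5       # assumed value
    wakeword_threshold: float = 0.5     # assumed value
    sampling_rate_hz: int = 8000        # assumed low-power rate
    high_sampling_rate_hz: int = 16000  # assumed full rate
    network_powered: bool = False

    def process(self, frame: Sequence[float]) -> List[str]:
        """Return the stages activated by this frame, in order."""
        activated: List[str] = []
        # Stage 1: audio detection. Quiet frames activate nothing.
        if rms_volume(frame) <= self.volume_threshold:
            return activated
        self.sampling_rate_hz = self.high_sampling_rate_hz
        activated.append("speech_detection")
        # Stage 2: speech detection produces the first score.
        if self.speech_score(frame) <= self.speech_threshold:
            return activated
        activated.append("wakeword_recognition")
        # Stage 3: wakeword recognition produces the second score.
        if self.wakeword_score(frame) <= self.wakeword_threshold:
            return activated
        # Stage 4: power the network interface so audio can be transmitted.
        self.network_powered = True
        activated.append("network_interface")
        return activated
```

Each stage runs only if the cheaper stage before it passed, which is the source of the power saving: a quiet frame never reaches the speech scorer, and non-speech never reaches the wakeword scorer.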
Distributed recognition, e.g. in client-server systems, for mobile phones or network applications · CPC title
Word spotting · CPC title
Constructional details of speech recognition systems · CPC title
Detection of presence or absence of voice signals (switching of direction of transmission by voice frequency in two-way loud-speaking telephone systems H04M9/10) · CPC title
Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems · CPC title