Method and apparatus for activating application by speech input
US-2015302855-A1 · Oct 22, 2015 · US
US11657804B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11657804-B2 |
| Application number | US-202017090716-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 5, 2020 |
| Priority date | Jun 20, 2014 |
| Publication date | May 23, 2023 |
| Grant date | May 23, 2023 |
Abstract
Features are disclosed for detecting words in audio using contextual information in addition to automatic speech recognition results. A detection model can be generated and used to determine whether a particular word, such as a keyword or “wake word,” has been uttered. The detection model can operate on features derived from an audio signal, contextual information associated with generation of the audio signal, and the like. In some embodiments, the detection model can be customized for particular users or groups of users based on usage patterns associated with the users.
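The abstract describes a detection model that scores features drawn from ASR output, the audio signal, and context, and that can be customized for particular users. A minimal sketch of that idea follows, assuming a logistic-regression scorer (the abstract says only "detection model"). The three feature names, the weights, and the `select_model` fallback from a per-user model to a shared one are hypothetical illustrations, not taken from the patent.

```python
import math
from dataclasses import dataclass


@dataclass
class DetectionFeatures:
    # Hypothetical features: the abstract only says they derive from the audio
    # signal, ASR results, and contextual information. All values are in 0..1.
    asr_wake_word_confidence: float  # ASR confidence for the wake-word hypothesis
    acoustic_score: float            # e.g. a normalized prosody or energy cue
    context_prior: float             # e.g. share of this user's past sessions started at this hour

    def as_vector(self) -> list[float]:
        return [self.asr_wake_word_confidence, self.acoustic_score, self.context_prior]


@dataclass
class DetectionModel:
    # Logistic regression stands in for the unspecified statistical model.
    weights: list[float]
    bias: float

    def score(self, features: DetectionFeatures) -> float:
        # Weighted sum of features squashed to a 0..1 detection score.
        z = self.bias + sum(w * x for w, x in zip(self.weights, features.as_vector()))
        return 1.0 / (1.0 + math.exp(-z))

    def detects(self, features: DetectionFeatures, threshold: float = 0.5) -> bool:
        return self.score(features) >= threshold


def select_model(user_id: str, user_models: dict[str, DetectionModel],
                 default_model: DetectionModel) -> DetectionModel:
    # Per-user customization: use the user's own model when one has been
    # generated, otherwise fall back to the shared model.
    return user_models.get(user_id, default_model)


if __name__ == "__main__":
    shared = DetectionModel(weights=[4.0, 2.0, 1.0], bias=-4.0)
    per_user = {"alice": DetectionModel(weights=[4.0, 2.0, 3.0], bias=-4.5)}
    features = DetectionFeatures(asr_wake_word_confidence=0.9, acoustic_score=0.6, context_prior=0.8)
    for user in ("alice", "bob"):
        model = select_model(user, per_user, shared)
        print(user, round(model.score(features), 3), model.detects(features))
```

Here the hypothetical user "alice" weights the contextual prior more heavily than the shared model does, which is one simple way a usage pattern could shift detection for a single user.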
Claims
What is claimed is:
1. A system comprising: computer-readable memory storing executable instructions; and one or more processors in communication with the computer-readable memory, wherein the one or more processors are programmed by the executable instructions to at least: receive audio data from a computing device subsequent to the computing device determining a wake word was detected in a first portion of the audio data; generate speech recognition result data using at least the first portion of the audio data; generate acoustic data representing an acoustic property of a voice represented by the audio data; generate feature data using the speech recognition result data and the acoustic data; generate wake word detection data using a statistical wake word detection model trained to receive the feature data as input; and determine, based on the wake word detection data, that the audio data fails to satisfy a detection criterion related to detecting a representation of the wake word.
2. The system of claim 1, wherein the one or more processors are programmed by further executable instructions to close an audio data stream from the computing device subsequent to determining that the audio data fails to satisfy the detection criterion.
3. The system of claim 1, wherein the one or more processors are programmed by further executable instructions to: generate detection score data using the statistical wake word detection model and the feature data; and determine that the detection score data fails to satisfy a detection threshold, wherein the detection score data failing to satisfy the detection threshold indicates the audio data fails to satisfy the detection criterion.
4. The system of claim 1, wherein the one or more processors are programmed by further executable instructions to determine the acoustic property, wherein the acoustic property comprises at least one of: prosody, energy, or speaking rate.
5. The system of claim 1, wherein the one or more processors are programmed by further executable instructions to generate environmental data representing a property of an environment in which the computing device is located.
6. The system of claim 5, wherein the environmental data represents at least one of: a noise level of the environment, an acoustic property of the environment, or a distance between a user and a microphone of the computing device.
7. The system of claim 1, wherein the one or more processors are programmed by further executable instructions to: generate a user-specific statistical wake word detection model based on at least one of: the speech recognition result data, the acoustic data, or the feature data; generate second feature data based at least partly on second audio data; and determine, using the user-specific statistical wake word detection model, that the second audio data satisfies the detection criterion.
8. The system of claim 1, wherein the one or more processors are programmed by further executable instructions to: generate a user-specific statistical wake word detection model based on at least one of: the speech recognition result data, the acoustic data, or the feature data; and send the user-specific statistical wake word detection model to the computing device.
9. The system of claim 1, wherein the one or more processors are programmed by further executable instructions to receive identity data representing at least one of a start position or an end position of the first portion of audio data within the audio data.
10. The system of claim 1, wherein the statistical wake word detection model comprises a neural network.
11. A computer-implemented method comprising: as implemented by a computing system comprising one or more processors configured to execute specific instructions, receiving audio data subsequent to a wake word being detected in a first portion of the audio data; generating speech recognition result data using at least the first portion of the audio data; generating acoustic data representing an acoustic property of a voice represented by the audio data; generating feature data using the speech recognition result data and the acoustic data; generating wake word detection data using a statistical wake word detection model trained to receive the feature data as input; and determining, based on the wake word detection data, that the audio data satisfies a detection criterion related to detecting a representation of the wake word.
12. The computer-implemented method of claim 11, wherein the receiving the audio data comprises receiving the audio data over a network from a user computing device, wherein the audio data is received subsequent to the user computing device determining the wake word was detected in the first portion of the audio data.
13. The computer-implemented method of claim 12, further comprising receiving, over the network from the user computing device, identity data representing at least one of a start position or an end position of the first portion of audio data within the audio data.
14. The computer-implemented method of claim 11, wherein the receiving the audio data comprises receiving the audio data from a microphone of the computing system.
15. The computer-implemented method of claim 11, further comprising: generating natural language understanding data using the audio data, wherein the natural language understanding data represents an action requested in the audio data; and performing the action.
16. The computer-implemented method of claim 11, further comprising: generating detection score data using the statistical wake word detection model and the feature data; and determining that the detection score data satisfies a detection threshold, wherein the detection score data satisfying the detection threshold indicates the audio data satisfies the detection criterion.
17. The computer-implemented method of claim 11, further comprising generating a user-specific statistical wake word detection model based on at least one of the speech recognition result data, the acoustic data, or the feature data.
18. The computer-implemented method of claim 17, further comprising: generating second feature data based at least partly on second audio data; and determining, using the user-specific statistical wake word detection model, that the second audio data satisfies the detection criterion.
19. The computer-implemented method of claim 17, further comprising sending the user-specific statistical wake word detection model to a user computing device.
20. The computer-implemented method of claim 11, further comprising generating environmental data representing an environmental property of an environment in which a user computing device is located, wherein generating the feature data comprises using the environmental data.
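Claims 1 to 4, 6, 9 and 20 together describe a server-side second check. After the device reports a wake word and the start and end of that portion, the server combines speech recognition output, acoustic properties (energy, speaking rate), and an environmental property (noise level) into feature data. It then scores that data with a statistical model, compares the score to a threshold, and closes the stream when the check fails (claim 2). The sketch below is illustrative only. The recognizer is stubbed as an `AsrResult` input, a logistic scorer stands in for the statistical model (claim 10 names a neural network as one option), and `AudioUpload`, `build_features`, `verify_wake_word`, `WEIGHTS`, `BIAS`, and the wake word "computer" are all hypothetical.

```python
import math
from dataclasses import dataclass

# Hypothetical, untrained parameters for a logistic model over four features:
# [wake word in ASR hypothesis, ASR confidence, SNR / 20 dB, speaking rate / 5 words per s]
WEIGHTS = [3.0, 2.0, 1.5, -0.5]
BIAS = -3.5


@dataclass
class AsrResult:
    # Stand-in for the output of an unspecified speech recognizer.
    text: str
    confidence: float  # 0..1


@dataclass
class AudioUpload:
    samples: list[float]  # mono PCM scaled to [-1, 1]
    sample_rate: int      # samples per second
    wake_start: int       # device-reported start of the wake-word portion (sample index)
    wake_end: int         # device-reported end of that portion (exclusive)


def rms_db(samples: list[float]) -> float:
    # Root-mean-square energy in dBFS, floored at -100 dB so silence avoids log(0).
    if not samples:
        return -100.0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(max(rms, 1e-5))


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def build_features(audio: AudioUpload, asr: AsrResult, wake_word: str) -> list[float]:
    words = asr.text.lower().split()
    # Speech recognition result data: did the recognizer hear the wake word?
    has_wake_word = 1.0 if wake_word in words else 0.0
    # Acoustic data: energy of the wake-word portion and overall speaking rate.
    speech_db = rms_db(audio.samples[audio.wake_start:audio.wake_end])
    duration_s = max(len(audio.samples) / audio.sample_rate, 1e-3)
    speaking_rate = len(words) / duration_s
    # Environmental data: background level outside the wake-word portion.
    background = audio.samples[:audio.wake_start] + audio.samples[audio.wake_end:]
    snr_db = speech_db - rms_db(background)
    return [has_wake_word, asr.confidence, snr_db / 20.0, speaking_rate / 5.0]


def verify_wake_word(audio: AudioUpload, asr: AsrResult,
                     wake_word: str = "computer", threshold: float = 0.5) -> str:
    features = build_features(audio, asr, wake_word)
    score = sigmoid(BIAS + sum(w * x for w, x in zip(WEIGHTS, features)))
    # Below the threshold: treat the device-side detection as a false wake and
    # close the stream; otherwise continue processing the request.
    return "continue" if score >= threshold else "close_stream"
```

Returning a decision rather than closing a connection directly keeps the sketch independent of any particular streaming transport. A real system would instead pass the result to whatever component owns the audio stream.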
Speech classification or search · CPC title
Distributed recognition, e.g. in client-server systems, for mobile phones or network applications · CPC title
Speech classification or search using natural language modelling · CPC title
Word spotting · CPC title