Method and apparatus for activating application by speech input
US-2015302855-A1 · Oct 22, 2015 · US
US10229686B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10229686-B2 |
| Application number | US-201415329354-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 18, 2014 |
| Priority date | Aug 18, 2014 |
| Publication date | Mar 12, 2019 |
| Grant date | Mar 12, 2019 |
Official abstract text for this publication.
Methods and apparatus to process microphone signals by a speech enhancement module to generate an audio stream signal including first and second metadata for use by a speech recognition module. In an embodiment, speech recognition is performed using endpointing information including transitioning from a silence state to a maybe speech state, in which data is buffered, based on the first metadata and transitioning to a speech state, in which speech recognition is performed, based upon the second metadata.
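The endpointing behaviour summarized in the abstract can be illustrated as a small state machine. The following is a minimal sketch, not the patented implementation: the names `Frame`, `Endpointer`, `fast_flag`, `slow_flag`, and `max_maybe_frames` are hypothetical, and both the timeout that abandons a "maybe speech" state and the rule that ends the speech state are assumptions the abstract does not specify.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional, Sequence, Tuple


class State(Enum):
    SILENCE = auto()
    MAYBE_SPEECH = auto()
    SPEECH = auto()


@dataclass
class Frame:
    samples: bytes    # audio payload for one frame
    fast_flag: bool   # stands in for the first metadata (low latency)
    slow_flag: bool   # stands in for the second metadata (higher confidence)


@dataclass
class Endpointer:
    max_maybe_frames: int = 30  # assumed timeout, not taken from the source
    state: State = State.SILENCE
    buffer: List[Frame] = field(default_factory=list)

    def push(self, frame: Frame) -> Optional[List[Frame]]:
        """Feed one frame; return frames to hand to the recognizer, if any."""
        if self.state is State.SILENCE:
            if frame.fast_flag:
                # First metadata fired: start buffering from this endpoint.
                self.state = State.MAYBE_SPEECH
                self.buffer = [frame]
            return None
        if self.state is State.MAYBE_SPEECH:
            self.buffer.append(frame)
            if frame.slow_flag:
                # Second metadata confirmed: release the buffered audio.
                self.state = State.SPEECH
                out, self.buffer = self.buffer, []
                return out
            if len(self.buffer) >= self.max_maybe_frames:
                # Assumed behaviour: treat an unconfirmed start as a false alarm.
                self.state = State.SILENCE
                self.buffer = []
            return None
        # SPEECH state. Assumed end-of-speech rule: both detectors are quiet.
        if not frame.fast_flag and not frame.slow_flag:
            self.state = State.SILENCE
            return None
        return [frame]


def run(flags: Sequence[Tuple[bool, bool]]) -> Tuple[List[int], State]:
    """Drive the endpointer with (fast, slow) pairs; return recognized frame ids."""
    ep = Endpointer()
    recognized: List[int] = []
    for i, (fast, slow) in enumerate(flags):
        out = ep.push(Frame(bytes([i]), fast, slow))
        if out:
            recognized.extend(f.samples[0] for f in out)
    return recognized, ep.state
```

Because buffering begins as soon as the fast flag fires, the recognizer receives audio from the start of the utterance even though it is only invoked once the slower flag confirms speech.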
Opening claim text (preview).
The invention claimed is:
1. A method of performing automated speech recognition (ASR) in a system having a speech enhancement module for generating an audio stream signal and metadata, coupled to an ASR module for performing speech recognition on the audio stream signal using the metadata, the method comprising: by the speech enhancement module, processing microphone signals to generate the audio stream signal; by a first speech detector having a first response latency, generating first metadata that indicate the possible presence of speech in the audio stream signal with a first confidence level; by a second speech detector having a second response latency that is higher than the first response latency, generating second metadata that indicate the possible presence of speech in the audio stream signal with a second confidence level that is higher than the first confidence level; by the ASR module based on the first metadata, initiating buffering of the audio stream signal from an endpoint; and by the ASR module based on the second metadata, initiating speech recognition on the buffered audio stream signal from the endpoint.
2. The method according to claim 1, wherein the first metadata has a frame-by-frame time scale.
3. The method according to claim 1, wherein the second metadata has a sequence of frames time scale.
4. The method according to claim 1, further including performing one or more of barge-in, beamforming, and/or echo cancellation for generating the first and/or second metadata.
5. The method according to claim 1, further including tuning a speech detection threshold for a given latency for the first metadata.
6. The method according to claim 1, further including adjusting latency for a given confidence level of voice activity detection for the second metadata.
7. The method according to claim 1, further including controlling computation of the second metadata using the first metadata or computation of the first metadata using the second metadata.
8. The method according to claim 1, further including performing one or more of barge-in, beamforming, and/or echo cancellation for generating further metadata.
9. The method according to claim 1, wherein at least one of the first and second metadata is encoded into the audio signal.
10. An article, comprising a non-transitory computer readable medium having stored instructions that when executed perform a method of automated speech recognition (ASR) in a system having a speech enhancement module for generating an audio stream signal and metadata, coupled to an ASR module for performing speech recognition on the audio stream signal using the metadata, the method comprising: by the speech enhancement module, processing microphone signals to generate the audio stream signal; by a first speech detector having a first response latency, generating first metadata that indicate the possible presence of speech in the audio stream signal with a first confidence level; by a second speech detector having a second response latency that is higher than the first response latency, generating second metadata that indicate the possible presence of speech in the audio stream signal with a second confidence level that is higher than the first confidence level; by the ASR module based on the first metadata, initiating buffering of the audio stream signal from an endpoint; and by the ASR module based on the second metadata, initiating speech recognition on the buffered audio stream signal from the endpoint.
11. The article according to claim 10, wherein the first metadata has a frame-by-frame time scale.
12. The article according to claim 10, wherein the second metadata has a sequence of frames time scale.
13. The article according to claim 10, further including instructions to perform one or more of barge-in, beamforming, and/or echo cancellation for generating the first and second metadata.
14. The article according to claim 10, further including instructions to tune speech detector parameters for a given latency for the first metadata.
15. The article according to claim 10, further including instructions to adjust latency for a given confidence level of voice activity detection for the second metadata.
16. The article according to claim 10, further including instructions to control computation of the second metadata using the first metadata or computation of the first metadata using the second metadata.
17. The article according to claim 10, further including instructions to perform one or more of barge-in, beamforming, and/or echo cancellation for generating further metadata.
18. A system for performing automated speech recognition (ASR) comprising a speech enhancement module for generating an audio stream signal and metadata, coupled to an ASR module for performing speech recognition on the audio stream signal using the metadata, the system further comprising: in the speech enhancement module, electronic circuitry configured to provide: a first speech detector having a first response latency for generating first metadata that indicate the possible presence of speech in the audio stream signal with a first confidence level; and a second speech detector having a second response latency that is higher than the first response latency for generating second metadata that indicate the possible presence of speech in the audio stream signal with a second confidence level that is higher than the first confidence level; and in the ASR module, electronic circuitry configured to provide: an endpointing module for initiating, based on the first metadata, buffering of the audio stream signal from an endpoint, and for initiating, based on the second metadata, speech recognition on the buffered audio stream signal from the endpoint.
19. The system according to claim 18, further including a further speech detector to perform one or more of barge-in, beamforming, and/or echo cancellation for generating further metadata for use by the endpointing module.
20. The system according to claim 18, wherein the first speech detector is further configured to tune detector parameters for a given latency for the first metadata.
21. The system according to claim 18, wherein the second speech detector is further configured to adjust latency for a given confidence level of voice activity detection using the second metadata.
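The claims distinguish a first detector working on a frame-by-frame time scale from a second detector working on a sequence of frames. One way to picture that trade-off is the sketch below, which is an illustration under stated assumptions and not the detectors the patent describes: simple energy thresholding is swapped in as the detection technique, "higher confidence" is approximated by requiring agreement across a window of frames, and `FastDetector`, `SlowDetector`, `tag_frames`, and all threshold and window values are hypothetical.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame (0.0 for an empty frame)."""
    return sum(s * s for s in frame) / max(len(frame), 1)


class FastDetector:
    """Frame-by-frame decision: answers immediately, prone to false alarms."""

    def __init__(self, threshold: float = 0.01) -> None:
        self.threshold = threshold  # assumed value for illustration

    def update(self, frame: Sequence[float]) -> bool:
        return frame_energy(frame) > self.threshold


class SlowDetector:
    """Decision over a window of frames: responds later, rejects short bursts."""

    def __init__(self, threshold: float = 0.01, window: int = 5,
                 min_active: int = 4) -> None:
        self.threshold = threshold
        self.min_active = min_active
        self.history: Deque[bool] = deque(maxlen=window)

    def update(self, frame: Sequence[float]) -> bool:
        self.history.append(frame_energy(frame) > self.threshold)
        # Fire only when most recent frames in the window agree.
        return sum(self.history) >= self.min_active


def tag_frames(frames: Sequence[Sequence[float]]) -> List[Tuple[bool, bool]]:
    """Return (fast_flag, slow_flag) per frame, standing in for the two metadata."""
    fast, slow = FastDetector(), SlowDetector()
    return [(fast.update(f), slow.update(f)) for f in frames]
```

With these assumed settings, a single loud frame surrounded by quiet frames raises only the fast flag, while sustained loud frames raise the slow flag a few frames after the fast one. Raising `window` and `min_active` trades more latency for fewer false confirmations, which loosely mirrors the latency and confidence tuning mentioned in claims 5 and 6.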
Detection of presence or absence of voice signals (switching of direction of transmission by voice frequency in two-way loud-speaking telephone systems H04M9/10) · CPC title
Speech enhancement, e.g. noise reduction or echo cancellation (reducing echo effects in line transmission systems H04B3/20; echo suppression in hands-free telephones H04M9/08) · CPC title
Constructional details of speech recognition systems · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech (G10L21/02 takes precedence) · CPC title