US9613624B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9613624-B1 |
| Application number | US-201414314563-A |
| Country | US |
| Kind code | B1 |
| Filing date | Jun 25, 2014 |
| Priority date | Jun 25, 2014 |
| Publication date | Apr 4, 2017 |
| Grant date | Apr 4, 2017 |
Official abstract text for this publication.
In a dynamic automatic speech recognition (ASR) processing system, ASR processing may be configured to estimate a latency of returning speech results to a user based on work being done by an ASR processor. The ASR processing system may measure work done by an ASR processor by measuring one or more time independent metrics and comparing the metrics to threshold values. If the metrics exceed the thresholds, the ASR system may take steps to reduce latency associated with processing the utterance, including adjusting a speech recognition parameter.
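The abstract's idea of estimating latency from time-independent work metrics can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the patent: the `FrameWork` record, the per-unit cost constants, the `shrink` factor, and the `floor` value are all hypothetical, and the choice of two metrics (active nodes and Gaussian components scored) is just one combination of the metrics the document lists.

```python
from dataclasses import dataclass


@dataclass
class FrameWork:
    """Work counters recorded for one audio frame (hypothetical record)."""
    active_nodes: int      # decoding-graph nodes considered for this frame
    gaussians_scored: int  # Gaussian mixture components scored for this frame


# Hypothetical calibration constants: seconds of compute per unit of work.
# A real system would measure these on its own hardware.
COST_PER_NODE = 2e-6
COST_PER_GAUSSIAN = 5e-7


def estimate_seconds(frames):
    """Estimate processing time from work counters instead of a wall clock."""
    return sum(
        f.active_nodes * COST_PER_NODE + f.gaussians_scored * COST_PER_GAUSSIAN
        for f in frames
    )


def adjust_score_range(score_range, frames, time_threshold,
                       shrink=0.8, floor=1.0):
    """Narrow the hypothesis score range when the estimate exceeds the threshold.

    The shrink factor and floor are illustrative choices, not values taken
    from the patent.
    """
    if frames and estimate_seconds(frames) > time_threshold:
        return max(floor, score_range * shrink)
    return score_range


# Ten frames of heavy work, checked against a 0.1 s budget.
first_portion = [FrameWork(active_nodes=10_000, gaussians_scored=20_000)] * 10
new_range = adjust_score_range(12.0, first_portion, time_threshold=0.1)
```

Because the estimate is built from counters rather than elapsed time, it does not depend on how loaded the machine happens to be when the measurement is taken, which appears to be the motivation for the "time independent" wording.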
Opening claim text (preview).
What is claimed is:
1. A method for dynamically adjusting speech recognition processing to reduce latency, the method comprising: receiving an audio signal, the audio signal corresponding to an utterance; performing first speech recognition processing on a first portion of the audio signal, the first speech recognition processing involving a first plurality of hypotheses, each of the first plurality having a respective score within a first hypothesis score range; determining a processing value for each frame of the first portion, wherein the processing value corresponds to an amount of processing performed by one or more processors; determining a total number of frames in the first portion; determining, using the processing value, an estimated amount of time to process the first portion; determining that the estimated amount of time is above a time threshold; determining an adjusted hypothesis score range to increase a speed of speech recognition processing in response to the estimated amount of time being above the time threshold; and performing second speech recognition processing on a second portion of the audio signal using, the second speech recognition processing involving a second plurality of hypotheses, each of the first plurality having a respective score within the adjusted hypothesis score range, the second plurality including fewer hypotheses than the first plurality.
2. The method of claim 1, wherein the processing value corresponds to at least one of: a number of active nodes of a speech decoding graph being considered; a number of Gaussian mixture-components scored; a number of nodes of the speech decoding graph traversed; an audio quality metric value of the audio signal; a number of arcs added to a decoding graph; or a number of processor instructions executed.
3. A computing device, comprising: at least one processor; a memory device including instructions operable to be executed by the at least one processor to perform a set of actions, configuring the at least one processor to: receive an audio signal, the audio signal corresponding to an utterance; perform first speech recognition processing on a first portion of the audio signal, the first speech recognition processing involving a first plurality of hypotheses, each of the first plurality having a respective score within a first hypothesis score range, the first portion comprising a number of frames; determine a first processing value corresponding to a quantity of speech recognition processing performed for the number of frames; determine, using the first processing value, an estimated amount of time to process the first portion; determine that the estimated amount of time is above a time threshold; determine a first adjusted hypothesis score range in response to the estimated amount of time being above the time threshold; and perform second speech recognition processing on a second portion of the audio signal using, the second speech recognition processing involving a second plurality of hypotheses, each of the second plurality having a respective score within the first adjusted hypothesis score range, the second plurality including fewer hypotheses than the first plurality.
4. The computing device of claim 3, wherein the at least one processor is further configured to adjust the time threshold based on an estimated real time factor of the speech recognition processing of the audio signal.
5. The computing device of claim 3, wherein first processing value comprises at least one of: a number of active nodes of a speech decoding graph being considered during the speech recognition processing of the number of frames; a number of Gaussian mixture-components scored during the speech recognition processing of the number of frames; a number of nodes of the speech decoding graph traversed over the number of frames; an audio quality value of the audio signal; a number of arcs of a decoding graph considered during the speech recognition processing of the number of frames; or a number of processor instructions executed during speech recognition processing of the number of frames.
6. The computing device of claim 3, wherein the at least one processor is further configured to: determine a new processing value corresponding to a quantity of speech recognition processing performed for a third portion of the audio signal; and determine a second adjusted hypothesis score range for a fourth portion of the audio signal in response to the determination of the new processing value.
7. The computing device of claim 3, wherein the at least one processor is further configured to: determine a new processing value corresponding to a quantity of speech recognition processing performed for a minimum number of frames of the audio signal; and determine a second adjusted hypothesis score range in response to processing the minimum number of frames of the first portion of the audio signal.
8. The computing device of claim 3, wherein the at least one processor is further configured to: determine a second processing value corresponding to a quantity of speech recognition processing performed for the number of frames; and determine a second adjusted hypothesis score range based on the first processing value and/or the second processing value for each frame of the first portion of the audio signal.
9. A non-transitory computer-readable storage medium storing processor-executable instructions for controlling a computing device, comprising program code to: receive an audio signal, the audio signal corresponding to an utterance; perform first speech recognition processing on a first portion of the audio signal the first speech recognition processing involving a first plurality of hypotheses, each of the first plurality having a respective score within a first hypothesis score range, the first portion comprising a number of frames; determine a first processing value corresponding to a quantity of speech recognition processing performed for the number of frames; determine, using the first processing value, an estimated amount of time to process the first portion; determine that the estimated amount of time is above a time threshold; determine a first adjusted hypothesis score range in response to the estimated amount of time being above the time threshold; and perform second speech recognition processing on a second portion of the audio signal using, the second speech recognition processing involving a second plurality of hypotheses, each of the second plurality having a respective score within the first adjusted hypothesis score range, the second plurality including fewer hypotheses than the first plurality.
10. The non-transitory computer-readable storage medium of claim 9, further comprising program code to adjust the time threshold based on an estimated real time factor of the speech recognition processing of the audio signal.
11. The non-transitory computer-readable storage medium of claim 9, wherein first processing value comprises at least one of: a number of active nodes of a speech decoding graph being considered during the speech recognition processing of the number of frames; a number of Gaussian mixture-components scored during the speech recognition processing of the number of frames; a number of nodes of the speech decoding graph traversed over the number of frames; an audio quality value of the audio signal; a number of arcs of a decoding graph considered during the speech recognition processing of the number of frames; or a number of processor instructions executed during speech recognition processing of the number of frames.
12. The non-transitory
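The independent claims recite that narrowing the hypothesis score range leaves fewer hypotheses for the second portion. One common way to read a score range is as a pruning beam measured from the best-scoring hypothesis; the sketch below assumes that reading, which the claim text itself does not spell out. The `prune` function, the log-probability scores, and the two range values are hypothetical.

```python
def prune(hypotheses, score_range):
    """Keep hypotheses whose score lies within score_range of the best one.

    Scores are assumed to be log-probabilities, so higher is better.
    """
    if not hypotheses:
        return {}
    best = max(hypotheses.values())
    return {h: s for h, s in hypotheses.items() if best - s <= score_range}


# Hypothetical candidate transcriptions with log-probability scores.
candidates = {"a": -1.0, "b": -4.0, "c": -9.0, "d": -12.5}

first_pass = prune(candidates, score_range=12.0)   # wide range, first portion
second_pass = prune(candidates, score_range=6.0)   # adjusted, narrower range
```

With the wide range all four candidates survive; with the narrower range only `a` and `b` remain, so the second pass carries fewer hypotheses forward at the cost of possibly discarding a path that would have scored well later.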
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis (in musical instruments G10H) · CPC title
Methods for reducing search complexity, pruning · CPC title
Speech classification or search · CPC title