What technology area does this patent fall under?

Primary CPC classification G10L15/22. Mapped technology areas include Physics.

When was this patent published?

Publication date Apr 4, 2017 (B1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (publications linked by citation in our corpus, or others sharing the same primary CPC).

Dynamic pruning in speech recognition

US9613624B1 · US · B1

Patent metadata
Field	Value
Publication number	US-9613624-B1
Application number	US-201414314563-A
Country	US
Kind code	B1
Filing date	Jun 25, 2014
Priority date	Jun 25, 2014
Publication date	Apr 4, 2017
Grant date	Apr 4, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In a dynamic automatic speech recognition (ASR) processing system, ASR processing may be configured to estimate a latency of returning speech results to a user based on work being done by an ASR processor. The ASR processing system may measure work done by an ASR processor by measuring one or more time independent metrics and comparing the metrics to threshold values. If the metrics exceed the thresholds, the ASR system may take steps to reduce latency associated with processing the utterance, including adjusting a speech recognition parameter.
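The abstract's feedback loop (measure time-independent work, turn it into a latency estimate, compare against a threshold, and tighten pruning when over budget) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: the per-frame metric is taken to be a count of active decoding-graph nodes (one of the options listed in claim 2), and `PruningConfig`, `estimate_time`, `next_beam`, the per-unit cost, the threshold, and the multiplicative shrink rule are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PruningConfig:
    """Hypothetical tuning knobs; none of these values come from the patent."""

    beam: float = 16.0              # current width of the hypothesis score range
    min_beam: float = 6.0           # floor so the range is never narrowed to nothing
    shrink: float = 0.75            # multiplicative narrowing applied when over budget
    seconds_per_unit: float = 2e-6  # assumed cost of one unit of measured work
    time_threshold: float = 0.25    # assumed latency budget, in seconds


def estimate_time(work_per_frame: Sequence[int], cfg: PruningConfig) -> float:
    """Estimate processing time from a time-independent work metric.

    Each entry is a per-frame count, e.g. active decoding-graph nodes.
    Scaling the total by an assumed per-unit cost yields a time estimate
    without reading a wall clock.
    """
    return sum(work_per_frame) * cfg.seconds_per_unit


def next_beam(work_per_frame: Sequence[int], cfg: PruningConfig) -> float:
    """Return the beam to use for the next portion of the audio."""
    if estimate_time(work_per_frame, cfg) > cfg.time_threshold:
        # Over budget: narrow the score range so fewer hypotheses survive.
        return max(cfg.min_beam, cfg.beam * cfg.shrink)
    # Within budget: keep the current range.
    return cfg.beam


cfg = PruningConfig()
heavy = [2000] * 100  # 100 frames with 2,000 active nodes each
light = [500] * 100   # 100 frames with 500 active nodes each
print(next_beam(heavy, cfg), next_beam(light, cfg))  # prints "12.0 16.0"
```

Because the estimate is derived from counted work rather than elapsed time, it is unaffected by how busy the host happens to be at the moment of measurement. In a streaming decoder this check would plausibly run once per portion of audio, loosely mirroring the re-evaluation described in claims 6 and 7.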

First claim

Opening claim text (preview).

What is claimed is:

1. A method for dynamically adjusting speech recognition processing to reduce latency, the method comprising: receiving an audio signal, the audio signal corresponding to an utterance; performing first speech recognition processing on a first portion of the audio signal, the first speech recognition processing involving a first plurality of hypotheses, each of the first plurality having a respective score within a first hypothesis score range; determining a processing value for each frame of the first portion, wherein the processing value corresponds to an amount of processing performed by one or more processors; determining a total number of frames in the first portion; determining, using the processing value, an estimated amount of time to process the first portion; determining that the estimated amount of time is above a time threshold; determining an adjusted hypothesis score range to increase a speed of speech recognition processing in response to the estimated amount of time being above the time threshold; and performing second speech recognition processing on a second portion of the audio signal using, the second speech recognition processing involving a second plurality of hypotheses, each of the first plurality having a respective score within the adjusted hypothesis score range, the second plurality including fewer hypotheses than the first plurality.

2. The method of claim 1, wherein the processing value corresponds to at least one of: a number of active nodes of a speech decoding graph being considered; a number of Gaussian mixture-components scored; a number of nodes of the speech decoding graph traversed; an audio quality metric value of the audio signal; a number of arcs added to a decoding graph; or a number of processor instructions executed.

3. A computing device, comprising: at least one processor; a memory device including instructions operable to be executed by the at least one processor to perform a set of actions, configuring the at least one processor to: receive an audio signal, the audio signal corresponding to an utterance; perform first speech recognition processing on a first portion of the audio signal, the first speech recognition processing involving a first plurality of hypotheses, each of the first plurality having a respective score within a first hypothesis score range, the first portion comprising a number of frames; determine a first processing value corresponding to a quantity of speech recognition processing performed for the number of frames; determine, using the first processing value, an estimated amount of time to process the first portion; determine that the estimated amount of time is above a time threshold; determine a first adjusted hypothesis score range in response to the estimated amount of time being above the time threshold; and perform second speech recognition processing on a second portion of the audio signal using, the second speech recognition processing involving a second plurality of hypotheses, each of the second plurality having a respective score within the first adjusted hypothesis score range, the second plurality including fewer hypotheses than the first plurality.

4. The computing device of claim 3, wherein the at least one processor is further configured to adjust the time threshold based on an estimated real time factor of the speech recognition processing of the audio signal.

5. The computing device of claim 3, wherein first processing value comprises at least one of: a number of active nodes of a speech decoding graph being considered during the speech recognition processing of the number of frames; a number of Gaussian mixture-components scored during the speech recognition processing of the number of frames; a number of nodes of the speech decoding graph traversed over the number of frames; an audio quality value of the audio signal; a number of arcs of a decoding graph considered during the speech recognition processing of the number of frames; or a number of processor instructions executed during speech recognition processing of the number of frames.

6. The computing device of claim 3, wherein the at least one processor is further configured to: determine a new processing value corresponding to a quantity of speech recognition processing performed for a third portion of the audio signal; and determine a second adjusted hypothesis score range for a fourth portion of the audio signal in response to the determination of the new processing value.

7. The computing device of claim 3, wherein the at least one processor is further configured to: determine a new processing value corresponding to a quantity of speech recognition processing performed for a minimum number of frames of the audio signal; and determine a second adjusted hypothesis score range in response to processing the minimum number of frames of the first portion of the audio signal.

8. The computing device of claim 3, wherein the at least one processor is further configured to: determine a second processing value corresponding to a quantity of speech recognition processing performed for the number of frames; and determine a second adjusted hypothesis score range based on the first processing value and/or the second processing value for each frame of the first portion of the audio signal.

9. A non-transitory computer-readable storage medium storing processor-executable instructions for controlling a computing device, comprising program code to: receive an audio signal, the audio signal corresponding to an utterance; perform first speech recognition processing on a first portion of the audio signal the first speech recognition processing involving a first plurality of hypotheses, each of the first plurality having a respective score within a first hypothesis score range, the first portion comprising a number of frames; determine a first processing value corresponding to a quantity of speech recognition processing performed for the number of frames; determine, using the first processing value, an estimated amount of time to process the first portion; determine that the estimated amount of time is above a time threshold; determine a first adjusted hypothesis score range in response to the estimated amount of time being above the time threshold; and perform second speech recognition processing on a second portion of the audio signal using, the second speech recognition processing involving a second plurality of hypotheses, each of the second plurality having a respective score within the first adjusted hypothesis score range, the second plurality including fewer hypotheses than the first plurality.

10. The non-transitory computer-readable storage medium of claim 9, further comprising program code to adjust the time threshold based on an estimated real time factor of the speech recognition processing of the audio signal.

11. The non-transitory computer-readable storage medium of claim 9, wherein first processing value comprises at least one of: a number of active nodes of a speech decoding graph being considered during the speech recognition processing of the number of frames; a number of Gaussian mixture-components scored during the speech recognition processing of the number of frames; a number of nodes of the speech decoding graph traversed over the number of frames; an audio quality value of the audio signal; a number of arcs of a decoding graph considered during the speech recognition processing of the number of frames; or a number of processor instructions executed during speech recognition processing of the number of frames.

12. The non-transitory
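In the claims, the "hypothesis score range" reads much like a conventional pruning beam: hypotheses whose scores fall too far below the best one are discarded, so narrowing the range leaves fewer hypotheses (the "second plurality including fewer hypotheses" of claims 1, 3, and 9). The sketch below illustrates that relationship under assumptions the patent does not state: scores are log-probabilities where higher is better, the range is measured down from the best score, and the function name `prune`, the hypothesis labels, and the scores are invented. For simplicity both ranges are applied to the same score set, whereas in the claims they apply to different portions of the audio.

```python
from typing import Dict


def prune(hypotheses: Dict[str, float], beam: float) -> Dict[str, float]:
    """Keep hypotheses whose score lies within `beam` of the best score.

    Scores are assumed to be log-probabilities (higher is better), so the
    retained score range is [best - beam, best].
    """
    if not hypotheses:
        return {}
    best = max(hypotheses.values())
    return {h: s for h, s in hypotheses.items() if s >= best - beam}


# Invented scores for four competing partial hypotheses.
scores = {
    "play music": -10.0,
    "play muse": -14.5,
    "pay music": -19.0,
    "plate music": -27.0,
}

first = prune(scores, beam=16.0)   # wider range: keeps scores >= -26.0
second = prune(scores, beam=6.0)   # adjusted, narrower range: keeps scores >= -16.0
print(len(first), len(second))     # prints "3 2"
```

Fewer surviving hypotheses means fewer graph nodes and acoustic-model evaluations per frame, which is why narrowing the range is a plausible lever for reducing the estimated processing time.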

Assignees

Amazon Tech Inc

Inventors

Classifications

G10L15/22 (primary): Procedures used during a speech recognition process, e.g. man-machine dialogue
G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis (in musical instruments G10H)
G10L2015/085: Methods for reducing search complexity, pruning
G10L15/08 (primary): Speech classification or search

Patent family

Related publications grouped by family.

Patent family 58419195.
