Incremental utterance decoder combination for efficient and accurate decoding

US9552817B2 · US · B2

Patent metadata
Publication number: US-9552817-B2
Application number: US-201414219642-A
Country: US
Kind code: B2
Filing date: Mar 19, 2014
Priority date: Mar 19, 2014
Publication date: Jan 24, 2017
Grant date: Jan 24, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An incremental speech recognition system. The incremental speech recognition system incrementally decodes a spoken utterance using an additional utterance decoder only when the additional utterance decoder is likely to add significant benefit to the combined result. The available utterance decoders are ordered in a series based on accuracy, performance, diversity, and other factors. A recognition management engine coordinates decoding of the spoken utterance by the series of utterance decoders, combines the decoded utterances, and determines whether additional processing is likely to significantly improve the recognition result. If so, the recognition management engine engages the next utterance decoder and the cycle continues. If the accuracy cannot be significantly improved, the result is accepted and decoding stops. Accordingly, a decoded utterance with accuracy approaching the maximum for the series is obtained without decoding the spoken utterance using all utterance decoders in the series, thereby minimizing resource usage.
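The decode, combine, and stop cycle described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Decoder` callable type, the `combine` rule (a noisy-OR boost when two hypotheses agree, otherwise keep the more confident one), and the fixed `accept_threshold` are all assumptions introduced here. The abstract's test of whether another decoder is "likely to significantly improve" the result is reduced to a simple confidence threshold.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical types: a hypothesis is (text, confidence in [0, 1]),
# and a decoder maps raw audio bytes to one hypothesis.
Hypothesis = Tuple[str, float]
Decoder = Callable[[bytes], Hypothesis]


@dataclass
class DecodeOutcome:
    text: str
    confidence: float
    decoders_used: int


def combine(previous: Optional[Hypothesis], new: Hypothesis) -> Hypothesis:
    """Assumed combination rule, for illustration only."""
    if previous is None:
        return new
    if previous[0] == new[0]:
        # Agreement raises confidence (noisy-OR of the two scores).
        return previous[0], 1.0 - (1.0 - previous[1]) * (1.0 - new[1])
    # Disagreement: keep whichever hypothesis is more confident.
    return max(previous, new, key=lambda h: h[1])


def incremental_decode(
    audio: bytes,
    decoders: Sequence[Decoder],
    accept_threshold: float = 0.9,
) -> DecodeOutcome:
    """Engage decoders in series order until the combined result is accepted."""
    if not decoders:
        raise ValueError("at least one decoder is required")
    combined: Optional[Hypothesis] = None
    used = 0
    for decoder in decoders:
        combined = combine(combined, decoder(audio))
        used += 1
        if combined[1] >= accept_threshold:
            break  # good enough: skip the remaining decoders
    return DecodeOutcome(combined[0], combined[1], used)
```

With three stub decoders where the first two agree, the loop would stop after the second one, so the third is never run. That saving is the point of the incremental approach.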

First claim

Opening claim text (preview).

What is claimed is:

1. A method for accurately decoding spoken utterances with a plurality of utterance decoders, the method comprising the acts of: determining a first utterance decoder based on word error rate; calculating a system diversity metric value for each of the plurality of other utterance decoders based on each individual utterance decoder's word error rate and the likelihood of agreement with the first utterance decoder's result; ordering the plurality of utterance decoders into a series, the plurality of utterance decoders ordered according to the calculated system diversity metric values; decoding a spoken utterance with the first utterance decoder in the series and a second utterance decoder in the series; and merging results from the first utterance decoder and the second utterance decoder and comparing the merged results with an accuracy threshold to determine whether to decode the spoken utterance with a third utterance decoder in the series.

2. The method of claim 1 further comprising determining the accuracy of the results based on confidence values returned by the plurality of utterance decoders.

3. The method of claim 1 further comprising accepting the merged results if accurate.

4. The method of claim 3 further comprising: decoding the spoken utterance with an additional utterance decoder from the series to obtain an additional recognition result; and combining the additional recognition result with the merged results to produce combined recognition result; and accepting the combined recognition result if accurate.

5. The method of claim 4 further comprising the acts of: determining that spoken utterance cannot be accurately decoded using any of the utterance decoders in the series; and abandoning decoding of the spoken utterance.

6. The method of claim 4 wherein the act of accepting the combined recognition result if accurate further comprises the acts of: determining the reliability of the combined recognition result; and accepting the combined recognition result when the accuracy of the combined recognition result exceeds a threshold value.

7. The method of claim 1 wherein the act of ordering the plurality of utterance decoders into a series further comprises the act of ranking the plurality of utterance decoders based on at least one of recognition accuracy metrics and resource usage associated with each utterance decoder.

8. The method of claim 4 further comprising the act of building a statistical classifier based on results obtained by decoding training data using the series of utterance decoders.

9. The method of claim 8 wherein the act of accepting the combined recognition result if accurate further comprises the acts of: supplying recognition accuracy scores associated with each recognition result in the combined recognition result as inputs to the statistical classifier; determining the accuracy of the combined recognition result using the statistical classifier; and accepting the combined recognition result when the accuracy determined by the statistical classifier reaches a threshold value.

10. An incremental speech recognition system for accurately decoding spoken utterances with a plurality of speech decoding models comprising: at least one processor; and a memory operatively connected to the at least one processor, the memory comprising computer-executable instructions that, when executed by the at least one processor, perform a method comprising: storing audio data corresponding to at least one spoken utterance; determining a first speech decoding model based on word error rate; calculating a system diversity metric value for each of the plurality of other speech decoding models based on each individual utterance decoder's word error rate and the likelihood of agreement with the first speech decoding model's result; ordering the plurality of speech decoding models into a series, the plurality of speech decoding models ordered according to the calculated system diversity metric values; and sequentially engaging the plurality of speech decoding models in the series to decode a spoken utterance and contribute to a decoded combination until the decoded combination is deemed accurate enough to accept as a final decoded utterance for the spoken utterance based on performance scores associated with each decoded utterance included in the decoded combination.

11. The incremental speech recognition system of claim 10 further comprising: a speech decoding model sequence configuration defining an order of operation for the plurality of speech decoding models; an utterance decoder interface operable to sequentially engage speech decoding models according to the speech decoding model sequence configuration until a stop condition is generated; a result combiner operable to combine the decoded utterance received from the current speech decoding model with the previous decoded utterance combination or the previous decoded utterance when no previous decoded utterance combination is available; a reliability estimator operable to estimate a reliability for the decoded utterance combination based on one or more performance scores associated with the decoded utterance; and a reliability evaluator operable to generate a stop condition when the reliability of the decoded utterance combination indicates that the accuracy of the decoded utterance combination reaches a threshold level for acceptance.

12. The incremental speech recognition system of claim 11 wherein the reliability evaluator is further operable to generate a stop condition when the reliability of the decoded utterance combination indicates that the spoken utterance cannot be accurately decoded by any of the speech decoding models in the series.

13. The incremental speech recognition system of claim 11 further comprising a statistical classifier operable to determine the accuracy of the decoded utterance combination from one or more performance scores corresponding to the decoded utterance included in the decoded utterance combination.

14. The incremental speech recognition system of claim 11 wherein the utterance decoder interface is further operable to: pass audio data for the spoken utterance to the first speech decoding model in the series; and receive a decoded utterance corresponding to the spoken utterance and a performance score from the first speech decoding model.

15. The incremental speech recognition system of claim 10 wherein the performance scores are recognition confidence scores.

16. The incremental speech recognition system of claim 10 wherein the performance scores are selected from recognition confidence scores, acoustic model scores, and language model scores.

17. A computer readable storage device containing computer executable instructions which, when executed by a computer, perform a method for accurately decoding spoken utterances with a plurality of utterance decoders, the method comprising the acts of: determining a first utterance decoder based on word error rate; calculating a system diversity metric value for each of the plurality of other utterance decoders based on each individual utterance decoder's word error rate and the likelihood of agreement with the first decoder's result; decoding a spoken utterance with a next utterance decoder in a series of utterance decoders to obtain a recognition result, wherein the series of utterance decoders is ordered according to the calculated system diversity metric values; when a previous recognition result for the spoken utterance is available, combining the recognition
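The ordering step recited in the independent claims (pick the first decoder by word error rate, then rank the others by a system diversity metric built from each decoder's word error rate and its likelihood of agreeing with the first decoder) might look like the sketch below. The claim preview does not give a formula, so the score used here, `(1 - WER) * (1 - agreement)`, is purely an assumption for illustration. The function name `order_decoders` and the idea of estimating agreement from per-utterance outputs on a held-out set are also hypothetical.

```python
from typing import Dict, List


def order_decoders(
    wer: Dict[str, float],
    outputs: Dict[str, List[str]],
) -> List[str]:
    """Return decoder names ordered into a series.

    wer:     word error rate per decoder, measured on some held-out set.
    outputs: each decoder's transcript for the same held-out utterances.
    """
    # The first decoder is the one with the lowest word error rate.
    first = min(wer, key=wer.get)
    reference = outputs[first]
    if not reference:
        raise ValueError("held-out outputs must not be empty")

    def agreement(name: str) -> float:
        # Fraction of utterances where this decoder matches the first one.
        same = sum(a == b for a, b in zip(outputs[name], reference))
        return same / len(reference)

    def diversity(name: str) -> float:
        # ASSUMED metric: favor accurate decoders that disagree with the
        # first decoder, since they are more likely to correct its errors.
        return (1.0 - wer[name]) * (1.0 - agreement(name))

    others = sorted((n for n in wer if n != first), key=diversity, reverse=True)
    return [first] + others
```

Under this assumed metric, a decoder that always agrees with the first one scores zero and lands last in the series, even if its own word error rate is low, because running it would rarely change the combined result.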

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

  • G10L19/005 · Primary

    Correction of errors induced by the transmission channel, if related to the coding algorithm · CPC title

  • G10L15/32 · Primary

    Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems · CPC title

  • using statistical models, e.g. Hidden Markov Models [HMMs] (G10L15/18 takes precedence) · CPC title

  • Training · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9552817B2 cover?
An incremental speech recognition system. The incremental speech recognition system incrementally decodes a spoken utterance using an additional utterance decoder only when the additional utterance decoder is likely to add significant benefit to the combined result. The available utterance decoders are ordered in a series based on accuracy, performance, diversity, and other factors. A recogniti…
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G10L19/005. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 24, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).