Incremental utterance decoder combination for efficient and accurate decoding

US9922654B2 · US · B2

Patent metadata
Publication number: US-9922654-B2
Application number: US-201615377267-A
Country: US
Kind code: B2
Filing date: Dec 13, 2016
Priority date: Mar 19, 2014
Publication date: Mar 20, 2018
Grant date: Mar 20, 2018

Abstract

An incremental speech recognition system. The incremental speech recognition system incrementally decodes a spoken utterance using an additional utterance decoder only when the additional utterance decoder is likely to add significant benefit to the combined result. The available utterance decoders are ordered in a series based on accuracy, performance, diversity, and other factors. A recognition management engine coordinates decoding of the spoken utterance by the series of utterance decoders, combines the decoded utterances, and determines whether additional processing is likely to significantly improve the recognition result. If so, the recognition management engine engages the next utterance decoder and the cycle continues. If the accuracy cannot be significantly improved, the result is accepted and decoding stops. Accordingly, a decoded utterance with accuracy approaching the maximum for the series is obtained without decoding the spoken utterance using all utterance decoders in the series, thereby minimizing resource usage.
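The abstract describes a loop: decode with the next decoder in the series, merge all results so far, and stop once the combined result is good enough. The Python below is a minimal sketch of that loop under simplifying assumptions that are ours, not the patent's. Each decoder is a hypothetical callable returning a position-aligned list of (word, confidence) pairs. Merging is a confidence-weighted vote at each word position. "Good enough" is a fixed threshold on the weakest merged word confidence. The patent leaves the combination method and the stopping test more general; claim 16, for instance, mentions a statistical classifier built from decoding results on training data, which this sketch does not implement.

```python
from collections import defaultdict
from typing import Callable, List, Sequence, Tuple

# A hypothesis is a list of (word, confidence) pairs.
# ASSUMPTION: all decoders emit equal-length, already position-aligned hypotheses;
# a real combiner would first align them (e.g. in a word transition network).
Hypothesis = List[Tuple[str, float]]
Decoder = Callable[[bytes], Hypothesis]  # hypothetical decoder interface


def merge(hypotheses: Sequence[Hypothesis]) -> Hypothesis:
    """Confidence-weighted vote at each word position."""
    merged: Hypothesis = []
    for slot in zip(*hypotheses):
        votes = defaultdict(float)
        for word, conf in slot:
            votes[word] += conf
        best = max(votes, key=votes.get)
        # Divide by the number of decoders, so disagreement lowers the merged
        # confidence and a single decoder keeps its own confidence.
        merged.append((best, votes[best] / len(slot)))
    return merged


def utterance_confidence(hyp: Hypothesis) -> float:
    """Score the utterance by its weakest word (an illustrative choice)."""
    return min((conf for _, conf in hyp), default=0.0)


def incremental_decode(audio: bytes, series: Sequence[Decoder], threshold: float = 0.8):
    """Engage decoders in series order; stop once the merged result clears the threshold."""
    results: List[Hypothesis] = []
    merged: Hypothesis = []
    for decoder in series:
        results.append(decoder(audio))
        merged = merge(results)
        if utterance_confidence(merged) >= threshold:
            break  # accept the result; the remaining decoders never run
    return merged, len(results)
```

If the first two decoders agree on every word with enough combined confidence, the function returns after two calls and the third decoder is never invoked, which is the resource saving the abstract describes.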

First claim

What is claimed is:

1. A method for accurately decoding spoken utterances with a plurality of utterance decoders, the method comprising: determining a first utterance decoder based on word error rate; calculating a system diversity metric value for each of the plurality of other utterance decoders based on each individual utterance decoder's word error rate and likelihood of agreement with a result of the first utterance decoder; ordering the plurality of utterance decoders into a series, the plurality of utterance decoders ordered according to the calculated system diversity metric values; decoding a spoken utterance with the first utterance decoder in the series and a second utterance decoder in the series; and merging results from the first utterance decoder and the second utterance decoder.

2. The method of claim 1, further comprising: evaluating an accuracy level of the merged results; and upon determining that the accuracy level of the merged results is acceptable, terminating decoding of the spoken utterance.

3. The method of claim 2, wherein the accuracy level is determined based on confidence values returned by the plurality of utterance decoders.

4. The method of claim 1, further comprising: determining that each additional utterance decoder of the plurality of utterance decoders is unlikely to be capable of accurately decoding the spoken utterance; and terminating decoding of the spoken utterance.

5. The method of claim 4, wherein the accuracy of the decoded spoken utterance is determined based on confidence values returned by the plurality of utterance decoders.

6. The method of claim 1, further comprising: decoding the spoken utterance with an additional utterance decoder of the plurality of utterance decoders to obtain an additional result; combining the additional result with the merged results to produce a combined recognition result; and accepting the combined recognition result if accurate.

7. The method of claim 1, wherein the first utterance decoder has a lowest word error rate of the plurality of utterance decoders, and the second utterance decoder is identified based on a metric comprising a weighted sum derived from its word error rate and its agreement rate with the first utterance decoder.

8. An incremental speech recognition system comprising: at least one processor; and a memory operatively connected to the at least one processor, the memory comprising computer-executable instructions that, when executed by the at least one processor, perform a method comprising: storing audio data corresponding to a spoken utterance; determining a first speech decoding model of a plurality of speech decoding models based on word error rate; decoding the spoken utterance with the first speech decoding model; calculating a system diversity metric value for each of the plurality of other speech decoding models based on each individual speech decoding model's word error rate and likelihood of agreement with a result of the first speech decoding model for the spoken utterance; ordering the plurality of speech decoding models into a series, the plurality of speech decoding models ordered according to the calculated system diversity metric values; and decoding the spoken utterance with the second speech decoding model in the series.

9. The incremental speech recognition system of claim 8, wherein the computer-executable instructions are further executable by the at least one processor for: merging results from the first speech decoding model and the second speech decoding model.

10. The incremental speech recognition system of claim 9, wherein the computer-executable instructions are further executable by the at least one processor for: evaluating an accuracy level of the merged results; and upon determining that the accuracy level of the merged results is acceptable, terminating decoding of the spoken utterance.

11. The incremental speech recognition system of claim 10, wherein the accuracy level is determined based on confidence values returned by the first and second speech decoding models.

12. The incremental speech recognition system of claim 9, wherein the computer-executable instructions are further executable by the at least one processor for: determining that each additional speech decoding model of the plurality of speech decoding models is unlikely to be capable of accurately decoding the spoken utterance; and terminating decoding of the spoken utterance.

13. The incremental speech recognition system of claim 12, wherein the accuracy of the decoded spoken utterance is determined based on confidence values returned by the first speech decoding model and the second speech decoding model.

14. The incremental speech recognition system of claim 9, wherein the computer-executable instructions are further executable by the at least one processor for: decoding the spoken utterance with an additional speech decoding model of the plurality of speech decoding models to obtain an additional result; and combining the additional result with the merged results to produce a combined recognition result.

15. The incremental speech recognition system of claim 14, wherein the computer-executable instructions are further executable by the at least one processor for: accepting the combined recognition result if accurate.

16. The incremental speech recognition system of claim 9, wherein the computer-executable instructions are further executable by the at least one processor for: building a statistical classifier based on results obtained by decoding training data using the series of speech decoding models.

17. A computer readable storage device containing computer-executable instructions which, when executed by a computer, perform a method for decoding spoken utterances with a plurality of utterance decoders, the method comprising: determining a first utterance decoder based on word error rate; calculating a system diversity metric value for each of the plurality of other utterance decoders based on each individual utterance decoder's word error rate and likelihood of agreement with a result of the first utterance decoder; ordering the plurality of utterance decoders into a series, the plurality of utterance decoders ordered according to the calculated system diversity metric values; decoding a spoken utterance with the first utterance decoder in the series and a second utterance decoder in the series; and merging results from the first utterance decoder and the second utterance decoder.

18. The computer readable storage device of claim 17, wherein the computer-executable instructions are further executable by the computer for: determining that each additional utterance decoder of the plurality of utterance decoders is unlikely to be capable of accurately decoding the spoken utterance; and terminating decoding of the spoken utterance.

19. The computer readable storage device of claim 18, wherein the accuracy of the decoded spoken utterance is determined based on confidence values returned by the first utterance decoder in the series and the second utterance decoder in the series.

20. The computer readable storage device of claim 18, wherein the accuracy of the decoded spoken utterance is determined based on confidence values returned by each utterance decoder of the plurality of utterance decoders that have decoded the spoken utterance.
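Claims 1 and 7 order the decoders by picking the lowest-word-error-rate decoder first, then ranking the rest by a "system diversity metric": a weighted sum of each decoder's word error rate and its agreement with the first decoder. The Python below is a minimal sketch of that ordering under assumptions the claims do not state. Agreement is measured as the fraction of held-out utterances on which two decoders produce identical transcripts. A lower metric is treated as better, so high agreement (little new information) pushes a decoder later in the series. The weights `w_wer` and `w_agree`, the `DecoderStats` structure, and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DecoderStats:
    """Held-out statistics for one decoder (hypothetical structure)."""
    name: str
    wer: float              # word error rate on a held-out set
    outputs: Sequence[str]  # one transcript per held-out utterance


def agreement_rate(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of held-out utterances on which two decoders give identical transcripts."""
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def diversity_metric(s: DecoderStats, first: DecoderStats, w_wer: float, w_agree: float) -> float:
    # ASSUMPTION: lower is better. A low error rate and low agreement with the
    # first decoder both suggest the decoder adds useful new evidence when merged.
    return w_wer * s.wer + w_agree * agreement_rate(s.outputs, first.outputs)


def order_series(stats: Sequence[DecoderStats], w_wer: float = 1.0, w_agree: float = 0.2) -> List[str]:
    """First decoder = lowest WER; the rest sorted by ascending diversity metric."""
    first = min(stats, key=lambda s: s.wer)
    rest = sorted(
        (s for s in stats if s is not first),
        key=lambda s: diversity_metric(s, first, w_wer, w_agree),
    )
    return [first.name] + [s.name for s in rest]
```

With these illustrative weights, a decoder that merely duplicates the first decoder's output can fall behind a slightly less accurate but more independent one, while setting `w_agree=0.0` reduces the ordering to plain word error rate.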

Assignees

Microsoft Technology Licensing LLC

Classifications

  • G10L15/32 (primary)

    Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems · CPC title

  • using statistical models, e.g. Hidden Markov Models [HMMs] (G10L15/18 takes precedence) · CPC title

  • Training · CPC title

  • Correction of errors induced by the transmission channel, if related to the coding algorithm · CPC title

Frequently asked questions

What does patent US9922654B2 cover?
An incremental speech recognition system that decodes a spoken utterance with a series of utterance decoders, ordered by accuracy, performance, diversity, and other factors, and engages each additional decoder only when it is likely to add significant benefit to the combined result. This yields accuracy approaching the maximum for the series without running every decoder.
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G10L15/32. Mapped technology areas include Physics.
When was this patent published?
The B2 publication is dated March 20, 2018. Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).