Confidence features for automated speech recognition arbitration

US10706852B2 · US · B2

Patent metadata
Publication number: US-10706852-B2
Application number: US-201514941058-A
Country: US
Kind code: B2
Filing date: Nov 13, 2015
Priority date: Nov 13, 2015
Publication date: Jul 7, 2020
Grant date: Jul 7, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The described technology provides arbitration between speech recognition results generated by different automatic speech recognition (ASR) engines, such as ASR engines trained according to different language or acoustic models. The system includes an arbitrator that selects between a first speech recognition result representing an acoustic utterance as transcribed by a first ASR engine and a second speech recognition result representing the acoustic utterance as transcribed by a second ASR engine. This selection is based on a set of confidence features that is initially used by the first ASR engine or the second ASR engine to generate the first and second speech recognition results.
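The selection step described in the abstract can be illustrated with a minimal sketch. It assumes a simple linear scoring rule over feature differences; the abstract does not state how the arbitrator combines confidence features, so `AsrResult`, `arbitrate`, the feature names `acoustic_score` and `lm_score`, and the weights are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class AsrResult:
    """One engine's transcription plus the confidence features behind it."""
    engine: str
    text: str
    confidence_features: Dict[str, float]


def arbitrate(first: AsrResult, second: AsrResult,
              weights: Dict[str, float], bias: float = 0.0) -> AsrResult:
    """Pick a result using a weighted sum of feature differences.

    Assumption: a linear rule stands in for whatever model a real
    arbitrator would use. A non-negative score favors the first engine.
    """
    score = bias
    for name, weight in weights.items():
        diff = (first.confidence_features.get(name, 0.0)
                - second.confidence_features.get(name, 0.0))
        score += weight * diff
    return first if score >= 0 else second


# Hypothetical feature values for one utterance decoded by two engines.
first = AsrResult("first", "call mom",
                  {"acoustic_score": 0.9, "lm_score": 0.4})
second = AsrResult("second", "call tom",
                   {"acoustic_score": 0.6, "lm_score": 0.8})
weights = {"acoustic_score": 1.0, "lm_score": 1.0}

chosen = arbitrate(first, second, weights)
print(chosen.engine, chosen.text)
```

Here the arbitrator sees the features themselves rather than only a final confidence score from each engine, which is the distinction the abstract draws.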

First claim

Opening claim text (preview).

What is claimed is:

1. A speech recognition system for transforming an acoustic utterance into a transcribed speech recognition result by arbitrating between speech recognition results generated by a first automated speech recognition (ASR) engine and a second ASR engine, the system comprising: at least one memory device; at least one processing device; an arbitrator stored in the at least one memory device and executable by the at least one processing device, the arbitrator configured to receive a set of confidence features of an utterance and to select between a first speech recognition result representing the acoustic utterance as transcribed by the first ASR engine and a second speech recognition result representing the acoustic utterance as transcribed by the second ASR engine, the selection being based on the received set of confidence features.

2. The system of claim 1, wherein the arbitrator is trained on datasets correlating confidence features with success and failure scenarios of the first ASR engine and the second ASR engine.

3. The system of claim 1, wherein the first ASR engine decodes the acoustic utterance based on a first acoustic model and a first language model and the second ASR engine decodes the acoustic utterance based on a second acoustic model and a second language model.

4. The system of claim 1, wherein the arbitrator further selects between the first speech recognition result and the second speech recognition result based on a first confidence score computed by the first ASR engine and a second confidence score computed by the second ASR engine.

5. The system of claim 1, wherein the arbitrator is configured to select between the first speech recognition result and the second speech recognition result based on a first set of confidence features generated by the first ASR engine and a second set of confidence features generated by the second ASR engine.

6. The system of claim 1, wherein the confidence features include acoustic model features.

7. The system of claim 1, wherein the confidence features include language-model features.

8. The system of claim 1, wherein the first ASR engine includes a speech recognizer that implements a confidence classifier trained on the confidence features.

9. A method of transforming an acoustic utterance into a transcribed speech recognition result by arbitrating to select between a first automated speech recognition (ASR) result and a second ASR result, the method comprising: receiving from a first ASR engine a set of confidence features of an acoustic utterance and an associated first speech recognition result representing the acoustic utterance selected from a plurality of potential results based on analysis of the set of confidence features; receiving from a second ASR engine a second speech recognition result representing the acoustic utterance; and selecting between the first speech recognition result and the second speech recognition result based on one or more of the confidence features.

10. The method of claim 9, wherein the speech recognition result is derived by a confidence classifier trained on a dataset including the confidence features to maximally discriminate between correct and incorrect definitions.

11. The method of claim 9, wherein the first ASR engine decodes the acoustic utterance based on a first acoustic model and a first language model and the second ASR engine decodes the acoustic utterance based on a second acoustic model and a second language model.

12. The method of claim 9, further comprising: receiving from the second ASR engine another set of confidence features used in generating the second first speech recognition result.

13. The method of claim 9, wherein the first ASR engine is executed by a processor on a client device and the method further comprises: transmitting the selected result to a client device.

14. The method of claim 9, further comprising receiving a confidence score from each of the first ASR engine and the second ASR engine.

15. The method of claim 9, wherein the first speech recognition result and the second speech recognition result are received at an arbitrator and the method further comprises: training the arbitrator on datasets correlating confidence features with success and failure scenarios of the first ASR engine and the second ASR engine.

16. The method of claim 9, wherein the confidence features include acoustic model features.

17. The method of claim 9, wherein the confidence features include language-model features.

18. A method of transforming an acoustic utterance into a transcribed speech recognition result by initiating arbitration to select between a first speech recognition result and a second speech recognition result, the method comprising: transmitting a set of confidence features of an acoustic utterance and the first speech recognition result to an arbitrator, the first speech recognition result representing the acoustic utterance as transcribed by a first ASR engine selected from a plurality of potential results based on analysis of the set of confidence features; and receiving, from the arbitrator, an arbitrated result representing the first speech recognition result transcribed by either the first ASR engine or the second speech recognition transcribed by a second ASR engine, the arbitrated result selected based on the set of confidence features.

19. The method of claim 18, further comprising: computing the confidence features based on the acoustic utterance; and computing a confidence score to select the speech recognition result by maximally discriminating between correct and incorrect recognitions of the acoustic utterance.

20. The method of claim 18, wherein the confidence features include at least one of acoustic-model features and language-model features.

21. The system of claim 1, wherein the confidence features are derived from the acoustic utterance.

22. The system of claim 1, wherein the confidence features each quantify an auditory, linguistic or syntactical aspect of the acoustic utterance.
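Claims 2 and 15 recite training the arbitrator on datasets that correlate confidence features with success and failure scenarios of the two engines. The sketch below shows one way such training could look, assuming a logistic-regression model fitted by stochastic gradient descent. The claims do not name a model type, so the model choice, the function names `train_arbitrator` and `prefer_first`, the two-feature layout, and the toy data are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (confidence features, label)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_arbitrator(examples: List[Example], epochs: int = 500,
                     lr: float = 0.5) -> Tuple[List[float], float]:
    """Fit logistic-regression weights with plain stochastic gradient descent.

    Label 1 means the first engine's result was the correct one for that
    utterance; label 0 means the second engine's result was correct.
    """
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            z = bias + sum(w * x for w, x in zip(weights, features))
            error = sigmoid(z) - label  # gradient of log loss w.r.t. z
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


def prefer_first(weights: Sequence[float], bias: float,
                 features: Sequence[float]) -> bool:
    """True when the trained model favors the first engine's result."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z) >= 0.5


# Toy dataset: [first-engine feature, second-engine feature] per utterance.
training_data: List[Example] = [
    ([0.9, 0.2], 1),
    ([0.8, 0.3], 1),
    ([0.2, 0.9], 0),
    ([0.3, 0.7], 0),
]
weights, bias = train_arbitrator(training_data)
print(prefer_first(weights, bias, [0.85, 0.25]))
print(prefer_first(weights, bias, [0.25, 0.80]))
```

In practice the feature vector would hold the acoustic-model and language-model features named in claims 6, 7, 16, and 17, and the labels would come from utterances with known reference transcriptions.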

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

  • using context dependencies, e.g. language models

  • Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

  • G10L15/32 (Primary)

    Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10706852B2 cover?
The described technology provides arbitration between speech recognition results generated by different automatic speech recognition (ASR) engines, such as ASR engines trained according to different language or acoustic models. The system includes an arbitrator that selects between a first speech recognition result representing an acoustic utterance as transcribed by a first ASR engine and a se…
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G10L15/32. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 7, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).