US10706852B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10706852-B2 |
| Application number | US-201514941058-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 13, 2015 |
| Priority date | Nov 13, 2015 |
| Publication date | Jul 7, 2020 |
| Grant date | Jul 7, 2020 |
Official abstract text for this publication.
The described technology provides arbitration between speech recognition results generated by different automatic speech recognition (ASR) engines, such as ASR engines trained according to different language or acoustic models. The system includes an arbitrator that selects between a first speech recognition result representing an acoustic utterance as transcribed by a first ASR engine and a second speech recognition result representing the acoustic utterance as transcribed by a second ASR engine. This selection is based on a set of confidence features that is initially used by the first ASR engine or the second ASR engine to generate the first and second speech recognition results.
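The selection step described in the abstract can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: the `AsrResult` container, the feature names `acoustic_score` and `lm_score`, and the weighted-sum scoring rule are assumptions chosen only to show an arbitrator picking between two engines' results using their confidence features.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class AsrResult:
    """Hypothetical container for one engine's output."""
    engine: str
    transcript: str
    confidence_features: Dict[str, float]


def score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    # Weighted sum of confidence features; features without a weight are ignored.
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def arbitrate(first: AsrResult, second: AsrResult,
              weights: Dict[str, float]) -> AsrResult:
    # Select the result whose confidence features score higher.
    # Ties go to the first engine (an arbitrary choice for this sketch).
    if score(first.confidence_features, weights) >= score(second.confidence_features, weights):
        return first
    return second


# Illustrative values only; none of these numbers come from the patent.
weights = {"acoustic_score": 1.0, "lm_score": 1.0}
first = AsrResult("first", "call mom", {"acoustic_score": 0.9, "lm_score": 0.2})
second = AsrResult("second", "call tom", {"acoustic_score": 0.6, "lm_score": 0.8})

chosen = arbitrate(first, second, weights)
print(chosen.engine, chosen.transcript)
```

A fixed weighted sum is the simplest possible stand-in. The claims describe an arbitrator that is trained on data, so a learned model would take the place of the hand-set `weights` in practice.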
Opening claim text (preview).
What is claimed is:
1. A speech recognition system for transforming an acoustic utterance into a transcribed speech recognition result by arbitrating between speech recognition results generated by a first automated speech recognition (ASR) engine and a second ASR engine, the system comprising: at least one memory device; at least one processing device; an arbitrator stored in the at least one memory device and executable by the at least one processing device, the arbitrator configured to receive a set of confidence features of an utterance and to select between a first speech recognition result representing the acoustic utterance as transcribed by the first ASR engine and a second speech recognition result representing the acoustic utterance as transcribed by the second ASR engine, the selection being based on the received set of confidence features.
2. The system of claim 1, wherein the arbitrator is trained on datasets correlating confidence features with success and failure scenarios of the first ASR engine and the second ASR engine.
3. The system of claim 1, wherein the first ASR engine decodes the acoustic utterance based on a first acoustic model and a first language model and the second ASR engine decodes the acoustic utterance based on a second acoustic model and a second language model.
4. The system of claim 1, wherein the arbitrator further selects between the first speech recognition result and the second speech recognition result based on a first confidence score computed by the first ASR engine and a second confidence score computed by the second ASR engine.
5. The system of claim 1, wherein the arbitrator is configured to select between the first speech recognition result and the second speech recognition result based on a first set of confidence features generated by the first ASR engine and a second set of confidence features generated by the second ASR engine.
6. The system of claim 1, wherein the confidence features include acoustic model features.
7. The system of claim 1, wherein the confidence features include language-model features.
8. The system of claim 1, wherein the first ASR engine includes a speech recognizer that implements a confidence classifier trained on the confidence features.
9. A method of transforming an acoustic utterance into a transcribed speech recognition result by arbitrating to select between a first automated speech recognition (ASR) result and a second ASR result, the method comprising: receiving from a first ASR engine a set of confidence features of an acoustic utterance and an associated first speech recognition result representing the acoustic utterance selected from a plurality of potential results based on analysis of the set of confidence features; receiving from a second ASR engine a second speech recognition result representing the acoustic utterance; and selecting between the first speech recognition result and the second speech recognition result based on one or more of the confidence features.
10. The method of claim 9, wherein the speech recognition result is derived by a confidence classifier trained on a dataset including the confidence features to maximally discriminate between correct and incorrect definitions.
11. The method of claim 9, wherein the first ASR engine decodes the acoustic utterance based on a first acoustic model and a first language model and the second ASR engine decodes the acoustic utterance based on a second acoustic model and a second language model.
12. The method of claim 9, further comprising: receiving from the second ASR engine another set of confidence features used in generating the second first speech recognition result.
13. The method of claim 9, wherein the first ASR engine is executed by a processor on a client device and the method further comprises: transmitting the selected result to a client device.
14. The method of claim 9, further comprising receiving a confidence score from each of the first ASR engine and the second ASR engine.
15. The method of claim 9, wherein the first speech recognition result and the second speech recognition result are received at an arbitrator and the method further comprises: training the arbitrator on datasets correlating confidence features with success and failure scenarios of the first ASR engine and the second ASR engine.
16. The method of claim 9, wherein the confidence features include acoustic model features.
17. The method of claim 9, wherein the confidence features include language-model features.
18. A method of transforming an acoustic utterance into a transcribed speech recognition result by initiating arbitration to select between a first speech recognition result and a second speech recognition result, the method comprising: transmitting a set of confidence features of an acoustic utterance and the first speech recognition result to an arbitrator, the first speech recognition result representing the acoustic utterance as transcribed by a first ASR engine selected from a plurality of potential results based on analysis of the set of confidence features; and receiving, from the arbitrator, an arbitrated result representing the first speech recognition result transcribed by either the first ASR engine or the second speech recognition transcribed by a second ASR engine, the arbitrated result selected based on the set of confidence features.
19. The method of claim 18, further comprising: computing the confidence features based on the acoustic utterance; and computing a confidence score to select the speech recognition result by maximally discriminating between correct and incorrect recognitions of the acoustic utterance.
20. The method of claim 18, wherein the confidence features include at least one of acoustic-model features and language-model features.
21. The system of claim 1, wherein the confidence features are derived from the acoustic utterance.
22. The system of claim 1, wherein the confidence features each quantify an auditory, linguistic or syntactical aspect of the acoustic utterance.
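Claims 2 and 15 refer to training the arbitrator on datasets that correlate confidence features with success and failure scenarios of the two engines. The claims do not name a model, so the following is only a sketch under stated assumptions: logistic regression fitted by stochastic gradient descent, a two-element feature vector (one hypothetical confidence feature per engine), and a label of 1 when the first engine's result was the correct one. All names and data values are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (feature vector, 1 if first engine was correct)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_arbitrator(examples: List[Example], epochs: int = 500,
                     lr: float = 0.5) -> Tuple[List[float], float]:
    # Assumed model: logistic regression trained with plain SGD.
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            z = bias + sum(w * x for w, x in zip(weights, features))
            error = sigmoid(z) - label  # gradient of log loss w.r.t. z
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


def prefer_first(features: Sequence[float], weights: Sequence[float],
                 bias: float) -> bool:
    # True when the trained model favors the first engine's result.
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z) >= 0.5


# Toy dataset: [first-engine feature, second-engine feature] -> label.
training_data: List[Example] = [
    ([0.9, 0.3], 1),
    ([0.8, 0.2], 1),
    ([0.2, 0.7], 0),
    ([0.3, 0.9], 0),
]

weights, bias = train_arbitrator(training_data)
print(prefer_first([0.85, 0.25], weights, bias))
print(prefer_first([0.25, 0.80], weights, bias))
```

In this sketch the learned weights end up favoring whichever engine reports the stronger feature value. A real arbitrator would presumably use many more confidence features, including the acoustic-model and language-model features named in the claims.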
using context dependencies, e.g. language models · CPC title
Distributed recognition, e.g. in client-server systems, for mobile phones or network applications · CPC title
Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems · CPC title