Using models to detect potential significant errors in speech recognition results

US9564126B2 · US · B2

Patent metadata
Publication number: US-9564126-B2
Application number: US-201414557321-A
Country: US
Kind code: B2
Filing date: Dec 1, 2014
Priority date: Jul 9, 2012
Publication date: Feb 7, 2017
Grant date: Feb 7, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In some embodiments, a recognition result produced by a speech processing system based on an analysis of a speech input is evaluated for indications of potential errors. In some embodiments, sets of words/phrases that may be acoustically similar or otherwise confusable, the misrecognition of which can be significant in the domain, may be used together with a language model to evaluate a recognition result to determine whether the recognition result includes such an indication. In some embodiments, a word/phrase of a set that appears in the result is iteratively replaced with each of the other words/phrases of the set. The result of the replacement may be evaluated using a language model to determine a likelihood of the newly-created string of words appearing in a language and/or domain. The likelihood may then be evaluated to determine whether the result of the replacement is sufficiently likely for an alert to be triggered.
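The replace-and-score loop described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the toy corpus, the add-one-smoothed bigram model, the single-word set members, the log-likelihood threshold, and all function names (`train_bigram_model`, `log_likelihood`, `find_alerts`) are assumptions introduced here for the example.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for domain text; a real system
# would use a language model trained on far more data.
CORPUS = [
    "patient has hypertension",
    "patient has hypotension",
    "patient has hypertension and diabetes",
    "no evidence of hypertension",
]


def train_bigram_model(sentences):
    """Count unigram contexts and bigrams, with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    vocab = {"<s>", "</s>"}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        unigrams.update(tokens[:-1])          # every token that starts a bigram
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams, len(vocab)


def log_likelihood(sentence, model):
    """Log-probability of a sentence under the add-one-smoothed bigram model."""
    unigrams, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total


def find_alerts(result, confusable_set, model, threshold):
    """Replace each set member found in the result with every other member,
    score the modified string, and collect those scoring above the threshold."""
    words = result.split()
    alerts = []
    for i, word in enumerate(words):
        if word not in confusable_set:
            continue
        for alternative in sorted(confusable_set):
            if alternative == word:
                continue
            modified = " ".join(words[:i] + [alternative] + words[i + 1:])
            score = log_likelihood(modified, model)
            if score > threshold:
                alerts.append((modified, score))
    return alerts


model = train_bigram_model(CORPUS)
confusable = {"hypertension", "hypotension"}
for text, score in find_alerts("patient has hypertension", confusable, model, -8.0):
    print(f"possible misrecognition: {text!r} (log-likelihood {score:.2f})")
```

Here an alert is raised when the substituted string is itself plausible in the domain, which is the case the abstract singles out: a likely alternative suggests the original recognition may have confused the two members.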

First claim

Opening claim text (preview).

What is claimed is:

1. A method of processing a result of a recognition by an automatic speech recognition (ASR) system on a speech input, the method comprising: determining whether the result includes a member of a set of words or phrases, the set of words or phrases comprising a plurality of members and each member of the set comprising a word or phrase; and when it is determined that the result includes a word or phrase of a first member of the set, producing a modified result by substituting a word or phrase of a second member of the set for the word or phrase of the first member in the result, evaluating the modified result using at least one language model and/or at least one acoustic model, and determining, based on a result of the evaluating, whether to trigger presentation of an alert to a user via a user interface.

2. The method of claim 1, wherein: the evaluating the modified result comprises evaluating the modified result using a language model related to a domain to which the speech input relates and/or a language of the speech input; and the determining whether to trigger an alert comprises triggering an alert based at least in part on an evaluation of the likelihood.

3. The method of claim 2, wherein: the evaluating the modified result using the language model related to the domain and/or the language comprises determining a likelihood of the modified result occurring in the domain and/or the language; and the determining whether to trigger an alert comprises weighting the likelihood produced by the evaluating using the language model based on information indicating a significance of consequence in the domain associated with misrecognizing the speech input as specifying the first member rather than the second member.

4. The method of claim 2, wherein: the evaluating the modified result using the language model related to the domain and/or the language comprises determining a likelihood of the modified result occurring in the domain and/or the language; and the determining the likelihood of the modified result comprises determining the likelihood using a language model for the domain that produces a likelihood weighted using information indicating a significance of consequence in the domain associated with misrecognizing at least the second member.

5. The method of claim 2, wherein the determining whether to trigger an alert based on a result of the evaluating comprises: determining whether the likelihood of the modified result occurring is above a threshold; and when the likelihood is above the threshold, triggering the alert.

6. The method of claim 1, wherein: the determining whether the result includes the member of the set comprises, when the member is a null word, determining that the result includes the null word without evaluating words of the result, and the producing the modified result by substituting the word or phrase of the second member comprises inserting the word or phrase of the second member into the result.

7. The method of claim 1, further comprising, when it is determined that an alert should be triggered, triggering an alert identifying that a speaker from which the speech input was received may have misspoken.

8. At least one non-transitory computer-readable storage medium having encoded thereon computer-executable instructions that, when executed by at least one computer, cause the at least one computer to carry out a method of processing a result of a recognition by an automatic speech recognition (ASR) system on a speech input, the method comprising: determining whether the result includes a member of a set of words or phrases, the set of words or phrases comprising a plurality of members and each member of the set comprising a word or phrase; and when it is determined that the result includes a word or phrase of a first member of the set, producing a modified result by substituting a word or phrase of a second member of the set for the word or phrase of the first member in the result, evaluating the modified result using at least one language model and/or at least one acoustic model, and determining, based on a result of the evaluating, whether to trigger presentation of an alert to a user via a user interface.

9. The at least one computer-readable storage medium of claim 8, wherein the evaluating the modified result comprises evaluating the modified result using a language model related to a domain to which the speech input relates and/or a language of the speech input; and the determining whether to trigger an alert comprises triggering an alert based at least in part on an evaluation of the likelihood.

10. The at least one computer-readable storage medium of claim 9, wherein: the evaluating the modified result using the language model related to the domain and/or the language comprises determining a likelihood of the modified result occurring in the domain and/or the language; and the determining whether to trigger an alert comprises weighting the likelihood produced by the evaluating using the language model based on information indicating a significance of consequence in the domain associated with misrecognizing the speech input as specifying the first member rather than the second member.

11. The at least one computer-readable storage medium of claim 9, wherein the determining whether to trigger an alert based on a result of the evaluating comprises: determining whether the likelihood of the modified result occurring is above a threshold; and when the likelihood is above the threshold, triggering the alert.

12. An apparatus comprising: at least one processor; and at least one storage medium having encoded thereon processor-executable instructions that, when executed by the at least one processor, cause the at least one processor to carry out a method of processing a result of a recognition by an automatic speech recognition (ASR) system on a speech input, the method comprising: determining whether the result includes a member of a set of words or phrases, the set of words or phrases comprising a plurality of members and each member of the set comprising a word or phrase; and when it is determined that the result includes a word or phrase of a first member of the set, producing a modified result by substituting a word or phrase of a second member of the set for the word or phrase of the first member in the result, evaluating the modified result using at least one language model and/or at least one acoustic model, and determining, based on a result of the evaluating, whether to trigger presentation of an alert to a user via a user interface.

13. The apparatus of claim 12, wherein: the evaluating the modified result comprises evaluating the modified result using a language model related to a domain to which the speech input relates and/or a language of the speech input; and the determining whether to trigger an alert comprises triggering an alert based at least in part on an evaluation of the likelihood.

14. The apparatus of claim 13, wherein: the evaluating the modified result using the language model related to the domain and/or the language comprises determining a likelihood of the modified result occurring in the domain and/or the language; and the determining whether to trigger an alert comprises weighting the likelihood produced by the evaluating using the language model based on information indicating a significance of consequence in the domain associated with misrecognizing the speech input as specifying the first member rather than the second member.

15. The apparatus of claim 13, wherein determin

Assignees

Inventors

Classifications

  • using context dependencies, e.g. language models · CPC title

  • G10L15/18 (Primary) · using natural language modelling · CPC title

  • G10L15/01 (Primary) · Assessment or evaluation of speech recognition systems · CPC title

  • Speech to text systems (G10L15/08 takes precedence) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9564126B2 cover?
In some embodiments, a recognition result produced by a speech processing system based on an analysis of a speech input is evaluated for indications of potential errors. In some embodiments, sets of words/phrases that may be acoustically similar or otherwise confusable, the misrecognition of which can be significant in the domain, may be used together with a language model to evaluate a recogni…
Who is the assignee on this patent?
Nuance Communications Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/18. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 7, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 9 related publications on this page (citations in our corpus or others sharing the same primary CPC).