System and method for estimating the reliability of alternate speech recognition hypotheses in real time

US9653066B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9653066-B2
Application number: US-60465009-A
Country: US
Kind code: B2
Filing date: Oct 23, 2009
Priority date: Oct 23, 2009
Publication date: May 16, 2017
Grant date: May 16, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed herein are systems, methods, and computer-readable storage media for estimating reliability of alternate speech recognition hypotheses. A system configured to practice the method receives an N-best list of speech recognition hypotheses and features describing the N-best list, determines a first probability of correctness for each hypothesis in the N-best list based on the received features, determines a second probability that the N-best list does not contain a correct hypothesis, and uses the first probability and the second probability in a spoken dialog. The features can describe properties of at least one of a lattice, a word confusion network, and a garbage model. In one aspect, the N-best lists are not reordered according to reranking scores. The determination of the first probability of correctness can include a first stage of training a probabilistic model and a second stage of distributing mass over items in a tail of the N-best list.
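The two-stage idea in the abstract can be illustrated with a minimal sketch. It assumes that a first-stage model has already produced three numbers that sum to one: the probability that the top hypothesis is correct, the probability that the correct hypothesis lies somewhere in the tail (positions 2 to N), and the probability that the list contains no correct hypothesis. The function name `distribute_nbest_mass`, the three-way split, and the geometric `decay` weighting over the tail are all illustrative assumptions; the abstract does not specify how the tail mass is distributed.

```python
from typing import List, Tuple


def distribute_nbest_mass(
    p_top: float,
    p_tail: float,
    p_none: float,
    n: int,
    decay: float = 0.5,
) -> Tuple[List[float], float]:
    """Spread stage-one probability mass over an N-best list.

    Returns (per-hypothesis probabilities, probability that no
    hypothesis in the list is correct). The geometric decay over
    tail positions is an assumption made for illustration only.
    """
    if n < 1:
        raise ValueError("N-best list must contain at least one hypothesis")
    if abs(p_top + p_tail + p_none - 1.0) > 1e-9:
        raise ValueError("stage-one probabilities must sum to 1")
    if not 0.0 < decay <= 1.0:
        raise ValueError("decay must be in (0, 1]")

    if n == 1:
        # No tail exists, so the tail mass cannot belong to any listed
        # item; this sketch folds it into the "none correct" outcome.
        return [p_top], p_none + p_tail

    # Weight tail position k (0-based within the tail) by decay**k,
    # then normalise so the tail items share exactly p_tail.
    weights = [decay ** k for k in range(n - 1)]
    total = sum(weights)
    tail = [p_tail * w / total for w in weights]
    return [p_top] + tail, p_none


if __name__ == "__main__":
    probs, none_correct = distribute_nbest_mass(0.6, 0.3, 0.1, n=3)
    print(probs, none_correct)
```

The per-hypothesis values and the "none correct" value together form a single distribution, which is what lets a dialog system compare "accept item k" against "nothing on the list is right" on the same scale.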

First claim

Opening claim text (preview).

We claim:

1. A method comprising: receiving an N-best list of speech recognition hypotheses from a speech utterance, wherein the N-best list of speech recognition hypotheses comprises words recognized from the speech utterance; receiving an acoustic score of each word in the N-best list of speech recognition hypotheses; receiving a count indicating a number of words associated with each hypothesis in the speech recognition hypotheses; receiving an indication of problematic words in the each hypothesis in the N-best list of speech recognition hypotheses, wherein the indication is determined by a reliability estimator; determining, via a processor and based on a feature set evaluated by an algorithm, a first probability of correctness for the each hypothesis in the N-best list of speech recognition hypotheses, the feature set evaluated by the algorithm comprising the count, the acoustic score, and the indication of problematic words; determining, via the processor, a second probability that the N-best list of speech recognition hypotheses does not contain a correct hypothesis using the reliability estimator; and using the first probability and the second probability in a spoken dialog.

2. The method of claim 1, wherein the speech recognition hypotheses are stored in a word confusion network.

3. The method of claim 1, wherein the processor is configured to perform speech language generation.

4. The method of claim 1, wherein determining the first probability of correctness comprises two stages.

5. The method of claim 4, wherein a first stage of the two stages comprises training a discriminative model P_a.

6. The method of claim 5, wherein a second stage of the two stages comprises distributing mass over items in a tail of the N-best list.

7. The method of claim 1, wherein the processor is configured to perform spoken language understanding.

8. The method of claim 1, the processor is configured to perform automatic speech recognition, and further comprising using the first probability and the second probability in the automatic speech recognition.

9. A system comprising: a processor; and a computer-readable storage medium having instructions stored which, when executed by the processor, result in the processor performing operations comprising: receiving an N-best list of speech recognition hypotheses from a speech utterance, wherein the N-best list of speech recognition hypotheses comprises words recognized from the speech utterance; receiving an acoustic score of each word in the N-best list of speech recognition hypotheses; receiving a count indicating a number of words associated with each hypothesis in the speech recognition hypotheses; receiving an indication of problematic words in the each hypothesis in the N-best list of speech recognition hypotheses, wherein the indication is determined by a reliability estimator; determining, based on a feature set evaluated by an algorithm, a first probability of correctness for the each hypothesis in the N-best list of speech recognition hypotheses, the feature set evaluated by the algorithm comprising the count, the acoustic score, and the indication of problematic words; determining a second probability that the N-best list of speech recognition hypotheses does not contain a correct hypothesis using the reliability estimator; and using the first probability and the second probability in a spoken dialog.

10. The system of claim 9, wherein the speech recognition hypotheses are stored in a garbage model.

11. The system of claim 9, wherein the processor is configured to perform speech language generation.

12. The system of claim 9, wherein determining the first probability of correctness comprises two stages.

13. The system of claim 12, wherein a first stage of the two stages comprises training a discriminative model P_a.

14. The system of claim 13, wherein a second stage of the two stages comprises distributing mass over items in a tail of the N-best list.

15. A computer-readable storage medium having instructions stored which, when executed by a computing device, cause the computing device perform operations comprising: receiving an N-best list of speech recognition hypotheses from a speech utterance, wherein the N-best list of speech recognition hypotheses comprises words recognized from the speech utterance; receiving an acoustic score of each word in the N-best list of speech recognition hypotheses; receiving a count indicating a number of words associated with each hypothesis in the speech recognition hypotheses; receiving an indication of problematic words in the each hypothesis in the N-best list of speech recognition hypotheses, wherein the indication is determined by a reliability estimator; determining, based on a feature set evaluated by an algorithm, a first probability of correctness for the each hypothesis in the N-best list of speech recognition hypotheses, the feature set evaluated by the algorithm comprising the count, the acoustic score, and the indication of problematic words; determining a second probability that the N-best list of speech recognition hypotheses does not contain a correct hypothesis using the reliability estimator; and using the first probability and the second probability in a spoken dialog.

16. The computer-readable storage medium of claim 15, wherein the speech recognition hypotheses are stored in one of a word confusion network and a garbage model.

17. The computer-readable storage medium of claim 15, wherein the computing device is configured to perform speech language generation.

18. The computer-readable storage medium of claim 15, wherein determining the first probability of correctness comprises two stages.

19. The computer-readable storage medium of claim 18, wherein a first stage of the two stages comprises training a discriminative model P_a.

20. The computer-readable storage medium of claim 19, wherein a second stage of the two stages comprises distributing mass over items in a tail of the N-best list.
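As a rough illustration of the independent claims, the sketch below builds the three claimed features for one hypothesis (word count, acoustic score, and an indication of problematic words) and then shows one way the two probabilities might be used in a spoken dialog. Everything named here is hypothetical: the `Hypothesis` class, the choice to summarise per-word scores by their mean and per-word flags by their fraction, the `choose_action` policy, and its thresholds are assumptions for illustration and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """One entry of an N-best list (hypothetical structure)."""
    words: List[str]
    acoustic_scores: List[float]  # one acoustic score per word
    problematic: List[bool]       # per-word flag from a reliability estimator


def feature_vector(h: Hypothesis) -> List[float]:
    """Return [word count, mean acoustic score, fraction of flagged words]."""
    n = len(h.words)
    if n == 0 or len(h.acoustic_scores) != n or len(h.problematic) != n:
        raise ValueError("hypothesis needs one score and one flag per word")
    mean_acoustic = sum(h.acoustic_scores) / n
    frac_problematic = sum(h.problematic) / n
    return [float(n), mean_acoustic, frac_problematic]


def choose_action(
    p_correct: List[float],
    p_none: float,
    accept: float = 0.8,
    reject: float = 0.5,
) -> str:
    """Pick a dialog move from the first and second probabilities.

    The thresholds are arbitrary illustrative values.
    """
    if not p_correct:
        raise ValueError("need at least one hypothesis probability")
    if p_none >= reject:
        return "reprompt"   # the list probably holds no correct hypothesis
    if max(p_correct) >= accept:
        return "accept"     # some hypothesis is confidently correct
    return "confirm"        # plausible but uncertain: ask the user


if __name__ == "__main__":
    hyp = Hypothesis(["call", "home"], [-1.0, -3.0], [False, True])
    print(feature_vector(hyp))
    print(choose_action([0.85, 0.10], p_none=0.05))
```

Keeping feature extraction separate from the dialog policy mirrors the claim structure, where the probabilities are determined first and then used in the dialog as a distinct step.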

Assignees

Inventors

Classifications

  • using statistical models, e.g. Hidden Markov Models [HMMs] (G10L15/18 takes precedence)

  • Recognition networks (G10L15/142, G10L15/16 take precedence)

  • Segmentation; Word boundary detection

  • Constructional details of speech recognition systems

  • Procedures used during a speech recognition process, e.g. man-machine dialogue

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9653066B2 cover?
Disclosed herein are systems, methods, and computer-readable storage media for estimating reliability of alternate speech recognition hypotheses. A system configured to practice the method receives an N-best list of speech recognition hypotheses and features describing the N-best list, determines a first probability of correctness for each hypothesis in the N-best list based on the received fea…
Who is the assignee on this patent?
Williams Jason, Balakrishnan Suhrid, Nuance Communications Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/01. Mapped technology areas include Physics.
When was this patent published?
Publication date May 16, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).