Methods, systems and apparatuses for improved speech recognition and transcription
US-11869507-B2 · Jan 9, 2024 · US
US9653066B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9653066-B2 |
| Application number | US-60465009-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 23, 2009 |
| Priority date | Oct 23, 2009 |
| Publication date | May 16, 2017 |
| Grant date | May 16, 2017 |
Official abstract text for this publication.
Disclosed herein are systems, methods, and computer-readable storage media for estimating reliability of alternate speech recognition hypotheses. A system configured to practice the method receives an N-best list of speech recognition hypotheses and features describing the N-best list, determines a first probability of correctness for each hypothesis in the N-best list based on the received features, determines a second probability that the N-best list does not contain a correct hypothesis, and uses the first probability and the second probability in a spoken dialog. The features can describe properties of at least one of a lattice, a word confusion network, and a garbage model. In one aspect, the N-best lists are not reordered according to reranking scores. The determination of the first probability of correctness can include a first stage of training a probabilistic model and a second stage of distributing mass over items in a tail of the N-best list.
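The abstract names two quantities (a per-hypothesis probability of correctness and a probability that the list holds no correct hypothesis) and a two-stage computation, but gives no formulas. The following is a minimal sketch of one way these could fit together, not the patent's actual method. Everything specific in it is an assumption made for illustration: the function name `nbest_probabilities`, the softmax over model scores, the `head_size` cutoff, and the uniform `tail_mass` split are all hypothetical.

```python
import math
from typing import List, Tuple


def nbest_probabilities(
    scores: List[float],
    none_score: float,
    head_size: int = 2,
    tail_mass: float = 0.1,
) -> Tuple[List[float], float]:
    """Return (per-hypothesis probabilities, probability that none is correct).

    scores: one model score per hypothesis, in N-best order (assumed to come
        from some trained probabilistic model; the model itself is not shown).
    none_score: model score for the "list contains no correct hypothesis" case.
    head_size, tail_mass: hypothetical parameters chosen for this sketch.
    """
    if not scores:
        # With no hypotheses at all, the list cannot contain a correct one.
        return [], 1.0

    head = scores[:head_size]
    tail_len = len(scores) - len(head)

    # Stage 1 (assumed form): softmax over the head scores plus the
    # "none correct" option, shifted by the maximum for numerical stability.
    logits = head + [none_score]
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)

    # Hold back mass for the tail only when a tail exists.
    reserved = tail_mass if tail_len > 0 else 0.0
    scale = (1.0 - reserved) / total
    head_probs = [e * scale for e in exps[:-1]]
    p_none = exps[-1] * scale

    # Stage 2 (assumed form): spread the reserved mass evenly over the tail.
    tail_probs = [reserved / tail_len] * tail_len if tail_len else []

    return head_probs + tail_probs, p_none
```

In this sketch the per-hypothesis probabilities and the "none correct" probability always sum to one, so a dialog manager could compare them directly, for example confirming the top hypothesis only when its probability clearly exceeds the "none correct" probability.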
Claim text.
We claim:
1. A method comprising: receiving an N-best list of speech recognition hypotheses from a speech utterance, wherein the N-best list of speech recognition hypotheses comprises words recognized from the speech utterance; receiving an acoustic score of each word in the N-best list of speech recognition hypotheses; receiving a count indicating a number of words associated with each hypothesis in the speech recognition hypotheses; receiving an indication of problematic words in the each hypothesis in the N-best list of speech recognition hypotheses, wherein the indication is determined by a reliability estimator; determining, via a processor and based on a feature set evaluated by an algorithm, a first probability of correctness for the each hypothesis in the N-best list of speech recognition hypotheses, the feature set evaluated by the algorithm comprising the count, the acoustic score, and the indication of problematic words; determining, via the processor, a second probability that the N-best list of speech recognition hypotheses does not contain a correct hypothesis using the reliability estimator; and using the first probability and the second probability in a spoken dialog.
2. The method of claim 1, wherein the speech recognition hypotheses are stored in a word confusion network.
3. The method of claim 1, wherein the processor is configured to perform speech language generation.
4. The method of claim 1, wherein determining the first probability of correctness comprises two stages.
5. The method of claim 4, wherein a first stage of the two stages comprises training a discriminative model P_a.
6. The method of claim 5, wherein a second stage of the two stages comprises distributing mass over items in a tail of the N-best list.
7. The method of claim 1, wherein the processor is configured to perform spoken language understanding.
8. The method of claim 1, the processor is configured to perform automatic speech recognition, and further comprising using the first probability and the second probability in the automatic speech recognition.
9. A system comprising: a processor; and a computer-readable storage medium having instructions stored which, when executed by the processor, result in the processor performing operations comprising: receiving an N-best list of speech recognition hypotheses from a speech utterance, wherein the N-best list of speech recognition hypotheses comprises words recognized from the speech utterance; receiving an acoustic score of each word in the N-best list of speech recognition hypotheses; receiving a count indicating a number of words associated with each hypothesis in the speech recognition hypotheses; receiving an indication of problematic words in the each hypothesis in the N-best list of speech recognition hypotheses, wherein the indication is determined by a reliability estimator; determining, based on a feature set evaluated by an algorithm, a first probability of correctness for the each hypothesis in the N-best list of speech recognition hypotheses, the feature set evaluated by the algorithm comprising the count, the acoustic score, and the indication of problematic words; determining a second probability that the N-best list of speech recognition hypotheses does not contain a correct hypothesis using the reliability estimator; and using the first probability and the second probability in a spoken dialog.
10. The system of claim 9, wherein the speech recognition hypotheses are stored in a garbage model.
11. The system of claim 9, wherein the processor is configured to perform speech language generation.
12. The system of claim 9, wherein determining the first probability of correctness comprises two stages.
13. The system of claim 12, wherein a first stage of the two stages comprises training a discriminative model P_a.
14. The system of claim 13, wherein a second stage of the two stages comprises distributing mass over items in a tail of the N-best list.
15. A computer-readable storage medium having instructions stored which, when executed by a computing device, cause the computing device perform operations comprising: receiving an N-best list of speech recognition hypotheses from a speech utterance, wherein the N-best list of speech recognition hypotheses comprises words recognized from the speech utterance; receiving an acoustic score of each word in the N-best list of speech recognition hypotheses; receiving a count indicating a number of words associated with each hypothesis in the speech recognition hypotheses; receiving an indication of problematic words in the each hypothesis in the N-best list of speech recognition hypotheses, wherein the indication is determined by a reliability estimator; determining, based on a feature set evaluated by an algorithm, a first probability of correctness for the each hypothesis in the N-best list of speech recognition hypotheses, the feature set evaluated by the algorithm comprising the count, the acoustic score, and the indication of problematic words; determining a second probability that the N-best list of speech recognition hypotheses does not contain a correct hypothesis using the reliability estimator; and using the first probability and the second probability in a spoken dialog.
16. The computer-readable storage medium of claim 15, wherein the speech recognition hypotheses are stored in one of a word confusion network and a garbage model.
17. The computer-readable storage medium of claim 15, wherein the computing device is configured to perform speech language generation.
18. The computer-readable storage medium of claim 15, wherein determining the first probability of correctness comprises two stages.
19. The computer-readable storage medium of claim 18, wherein a first stage of the two stages comprises training a discriminative model P_a.
20. The computer-readable storage medium of claim 19, wherein a second stage of the two stages comprises distributing mass over items in a tail of the N-best list.
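The independent claims recite a feature set made up of a word count, per-word acoustic scores, and an indication of problematic words. As a rough illustration only, the sketch below packs those three inputs into a numeric vector for one hypothesis. The `Hypothesis` class, the `feature_vector` function, and the choice to summarize per-word values by their mean and fraction are assumptions made for this example; the claims do not say how the features are aggregated.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """One entry of an N-best list (hypothetical container for this sketch)."""
    words: List[str]
    acoustic_scores: List[float]  # one acoustic score per word
    problematic: List[bool]       # per-word flag, assumed to come from a reliability estimator


def feature_vector(h: Hypothesis) -> List[float]:
    """Build [word count, mean acoustic score, fraction of problematic words]."""
    n = len(h.words)
    if n != len(h.acoustic_scores) or n != len(h.problematic):
        raise ValueError("words, acoustic_scores and problematic must align")
    # Aggregation by mean and fraction is an assumption, not taken from the claims.
    mean_acoustic = sum(h.acoustic_scores) / n if n else 0.0
    frac_problematic = sum(h.problematic) / n if n else 0.0
    return [float(n), mean_acoustic, frac_problematic]


example = Hypothesis(
    words=["call", "home"],
    acoustic_scores=[-1.0, -3.0],
    problematic=[False, True],
)
print(feature_vector(example))
```

A vector like this could then be scored by whatever algorithm estimates the first probability of correctness; that scoring step is left out here because the claims do not specify it.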
using statistical models, e.g. Hidden Markov Models [HMMs] (G10L15/18 takes precedence) · CPC title
Recognition networks (G10L15/142, G10L15/16 take precedence) · CPC title
Segmentation; Word boundary detection · CPC title
Constructional details of speech recognition systems · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title