Methods, systems and apparatuses for improved speech recognition and transcription
US-11869507-B2 · Jan 9, 2024 · US
US9542931B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9542931-B2 |
| Application number | US-201414521990-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 23, 2014 |
| Priority date | Oct 27, 2010 |
| Publication date | Jan 10, 2017 |
| Grant date | Jan 10, 2017 |
Official abstract text for this publication.
On a computing device a speech utterance is received from a user. The speech utterance is a section of a speech dialog that includes a plurality of speech utterances. One or more features from the speech utterance are identified. Each identified feature from the speech utterance is a specific characteristic of the speech utterance. One or more features from the speech dialog are identified. Each identified feature from the speech dialog is associated with one or more events in the speech dialog. The one or more events occur prior to the speech utterance. One or more identified features from the speech utterance and one or more identified features from the speech dialog are used to calculate a confidence score for the speech utterance.
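The combination described in the abstract can be illustrated with a minimal sketch. It assumes a simple logistic model; the feature names (`acoustic_match`, `noise_level`, `prior_reprompts`, `prior_turns`) and all weights are hypothetical and are not taken from the patent, which does not specify a particular formula here.

```python
import math
from dataclasses import dataclass


@dataclass
class UtteranceFeatures:
    """Characteristics of the current utterance (hypothetical names)."""
    acoustic_match: float  # 0..1, higher means a better acoustic match
    noise_level: float     # 0..1, higher means a noisier signal


@dataclass
class DialogFeatures:
    """Features tied to events that occurred BEFORE the current utterance."""
    prior_reprompts: int   # how many times the system already re-prompted
    prior_turns: int       # how many utterances preceded this one


# Illustrative weights only; a real system would learn these from logged data.
BIAS = -1.0
W_ACOUSTIC = 4.0
W_NOISE = -2.0
W_REPROMPT = -0.8
W_TURN = -0.05


def confidence_score(utt: UtteranceFeatures, dlg: DialogFeatures) -> float:
    """Map utterance-level and dialog-level features to a score in (0, 1)."""
    z = (
        BIAS
        + W_ACOUSTIC * utt.acoustic_match
        + W_NOISE * utt.noise_level
        + W_REPROMPT * dlg.prior_reprompts
        + W_TURN * dlg.prior_turns
    )
    return 1.0 / (1.0 + math.exp(-z))  # logistic squashing


if __name__ == "__main__":
    utt = UtteranceFeatures(acoustic_match=0.9, noise_level=0.1)
    print(confidence_score(utt, DialogFeatures(prior_reprompts=0, prior_turns=0)))
    print(confidence_score(utt, DialogFeatures(prior_reprompts=2, prior_turns=3)))
```

With these assumed weights, the same utterance receives a lower score when earlier turns in the dialog already required re-prompting, which is the kind of context dependence the abstract describes.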
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method for improving speech recognition on a computing device, the method comprising: accessing, on the computing device, a log file comprising a plurality of speech dialogues, wherein each of the plurality of speech dialogues is associated with an interaction context and comprises one or more speech utterances; automatically extracting from the log file: one or more features associated with the one or more speech utterances; a first confidence score associated with the one or more features associated with the one or more speech utterances; one or more dialog-level features from the speech dialog, each dialog-level feature corresponding to a specific characteristic of the speech dialog; and a value associated with each of the one or more dialog-level features from the speech dialog; and using the values associated with the one or more dialog-level features to adjust the first confidence score associated with one or more of the speech utterances.
2. The method of claim 1, wherein the dialog-level features include a position of each utterance in the speech dialog.
3. The method of claim 1, wherein the dialog-level features include a degree in which re-prompting occurred for one or more utterances in the speech dialog.
4. The method of claim 1, wherein the adjusting of the first confidence score is associated with recalibrating one or more speech recognition models on the computing device.
5. The method of claim 1, wherein the adjusting of the first confidence score is associated with recalibrating a confidence classifier module.
6. The method of claim 1, wherein the one more features associated with the one or more speech utterances includes a degree to which an acoustic match is determined for each speech utterance.
7. The method of claim 1, wherein the one or more features associated with the one or more speech utterances includes a noise of an acoustic signal associated with each speech utterance.
8. The method of claim 1, wherein the one or more features associated with the one or more speech utterances includes a degree to which a first recognition for a speech utterance is similar to a second recognition for the speech utterance.
9. A system comprising: one or more processors; and a memory coupled to the one or more processors, the memory for storing instructions which, when executed by the one or more processors, performs a method for improving speech recognition on a computing device, the method comprising: accessing, on the computing device, a log file comprising a plurality of speech dialogues, wherein each of the plurality of speech dialogues is associated with an interaction context and comprises one or more speech utterances; automatically extracting from the log file: one or more features associated with the one or more speech utterances; a first confidence score associated with the one or more features associated with the one or more speech utterances; one or more dialog-level features from the speech dialog, each dialog-level feature corresponding to a specific characteristic of the speech dialog; and a value associated with each of the one or more dialog-level features from the speech dialog; and using the values associated with the one or more dialog-level features to adjust the first confidence score associated with one or more of the speech utterances.
10. The system of claim 9, wherein the dialog-level features further include at least a degree in which re-prompting occurred for one or more utterances in the speech dialog.
11. The system of claim 9, wherein the adjusting of the first confidence score is associated with recalibrating one or more speech recognition models on the computing device.
12. The system of claim 9, wherein the adjusting of the first confidence score is associated with recalibrating a confidence classifier module.
13. The system of claim 9, wherein the one more features associated with the one or more speech utterances includes a degree to which an acoustic match is determined for each speech utterance.
14. The system of claim 9, wherein the one or more features associated with the one or more speech utterances includes a noise of an acoustic signal associated with each speech utterance.
15. The system of claim 9, wherein the one or more features associated with the one or more speech utterances includes a degree to which a first recognition for a speech utterance is similar to a second recognition for the speech utterance.
16. The system of claim 9, wherein the log file includes contextual information for the one or more speech utterances.
17. The system of claim 16, wherein the contextual information includes information from previous and future speech utterances in the speech dialog.
18. The system of claim 9, wherein the speech dialog includes a plurality of dialog events, and wherein the one or more dialog-level features are derived from the log file for a first dialog event of the plurality of dialog events occurring previous to a current speech utterance and for a second dialog event of the plurality of dialog events occurring after the current speech utterance.
19. A hardware device comprising instructions that, when executed by a computing device, cause the computing device to: access, on the computing device, a log file comprising a plurality of speech dialogues, wherein each of the plurality of speech dialogues is associated with an interaction context and comprises one or more speech utterances; automatically extract from the log file: one or more features associated with the one or more speech utterances; a first confidence score associated with the one or more features associated with the one or more speech utterances; one or more dialog-level features from the speech dialog, each dialog-level feature corresponding to a specific characteristic of the speech dialog; and a value associated with each of the one or more dialog-level features from the speech dialog; and use the values associated with the one or more dialog-level features to adjust the first confidence score associated with one or more of the speech utterances.
20. The hardware device of claim 19, wherein the dialog-level features further include a value associated with re-prompting for the one or more utterances in the speech dialog.
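As a rough illustration of the steps recited in claim 1 (read logged dialogs, extract dialog-level features such as utterance position and re-prompting, then adjust the first confidence score), the following sketch may help. The log layout, the field names (`context`, `utterances`, `prompt_id`, `first_confidence`), and the subtractive penalty rule are all assumptions made for this example; the claims do not prescribe any of them.

```python
from typing import Dict, List

# Hypothetical penalty applied per earlier re-prompt of the same question.
REPROMPT_PENALTY = 0.1


def dialog_level_features(dialog: Dict) -> List[Dict[str, int]]:
    """For each utterance, record its position in the dialog and how many
    times the same prompt was already asked earlier (a re-prompt count)."""
    asked_so_far: Dict[str, int] = {}
    features = []
    for position, utt in enumerate(dialog["utterances"]):
        prompt = utt["prompt_id"]
        reprompts = asked_so_far.get(prompt, 0)
        features.append({"position": position, "reprompts": reprompts})
        asked_so_far[prompt] = reprompts + 1
    return features


def adjust_confidences(log: List[Dict]) -> List[List[float]]:
    """Return adjusted scores per dialog, lowering the first confidence
    score in proportion to prior re-prompting and clamping to [0, 1]."""
    adjusted = []
    for dialog in log:
        feats = dialog_level_features(dialog)
        scores = []
        for utt, feat in zip(dialog["utterances"], feats):
            score = utt["first_confidence"] - REPROMPT_PENALTY * feat["reprompts"]
            scores.append(min(1.0, max(0.0, score)))
        adjusted.append(scores)
    return adjusted


if __name__ == "__main__":
    # A tiny in-memory stand-in for a parsed log file (hypothetical data).
    log = [
        {
            "context": "flight_booking",
            "utterances": [
                {"prompt_id": "city", "first_confidence": 0.8},
                {"prompt_id": "city", "first_confidence": 0.7},
                {"prompt_id": "date", "first_confidence": 0.9},
            ],
        }
    ]
    print(adjust_confidences(log))
```

A fixed penalty is used here only to keep the example short. Claims 4 and 5 tie the adjustment to recalibrating recognition models or a confidence classifier, which would typically involve learned parameters instead.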