Incremental utterance processing and semantic stability determination
US-10102851-B1 · Oct 16, 2018 · US
US11600279B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11600279-B2 |
| Application number | US-201917279512-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 26, 2019 |
| Priority date | Oct 8, 2018 |
| Publication date | Mar 7, 2023 |
| Grant date | Mar 7, 2023 |
Official abstract text for this publication.
A method to transcribe communications may include obtaining audio data originating at a first device during a communication session between the first device and a second device and providing the audio data to an automated speech recognition system configured to transcribe the audio data. The method may further include obtaining multiple hypothesis transcriptions generated by the automated speech recognition system. Each of the multiple hypothesis transcriptions may include one or more words determined by the automated speech recognition system to be a transcription of a portion of the audio data. The method may further include determining one or more consistent words that are included in two or more of the multiple hypothesis transcriptions and in response to determining the one or more consistent words, providing the one or more consistent words to the second device for presentation of the one or more consistent words by the second device.
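The abstract's "consistent words" step can be illustrated with a minimal sketch. It rests on an assumption that is not stated in the text above: that a word counts as consistent once two consecutive hypotheses agree on it as part of their shared leading words. The function names `consistent_prefix` and `stream_consistent_words` are hypothetical and are not taken from the patent.

```python
from typing import List


def consistent_prefix(first: List[str], second: List[str]) -> List[str]:
    """Return the leading words on which two hypotheses agree (case-insensitive)."""
    agreed: List[str] = []
    for a, b in zip(first, second):
        if a.lower() != b.lower():
            break
        agreed.append(b)
    return agreed


def stream_consistent_words(hypotheses: List[str]) -> List[str]:
    """Collect words in the order they would be forwarded to the device.

    Assumption: each hypothesis covers the audio from the start, and a word is
    forwarded once it appears in the agreed prefix of two consecutive hypotheses.
    """
    sent: List[str] = []
    previous: List[str] = []
    for text in hypotheses:
        current = text.split()
        agreed = consistent_prefix(previous, current)
        # Forward only the words beyond what has already been sent.
        if len(agreed) > len(sent):
            sent.extend(agreed[len(sent):])
        previous = current
    return sent


if __name__ == "__main__":
    partials = ["I", "I want", "I want two", "I want to go", "I want to go home"]
    print(stream_consistent_words(partials))
```

In this sketch the unstable word ("two", later "to") is held back until two successive hypotheses agree on it, and the last word is not forwarded because only one hypothesis contains it so far.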
Opening claim text (preview).
We claim:
1. A method to transcribe communications, the method comprising: obtaining a plurality of hypothesis transcriptions of audio data; determining a plurality of consistent words that are included in two or more of the plurality of hypothesis transcriptions; in response to determining the plurality of consistent words, directing the plurality of consistent words to a device for presentation of the plurality of consistent words; and presenting the plurality of consistent words in a rolling fashion, a pace of the presentation of the plurality of consistent words in the rolling fashion being variable such that when more words are to be presented the words are presented quicker than when fewer words are to be presented.
2. The method of claim 1, further comprising: determining an update word in a final transcription of the audio data that is different from any of the plurality of consistent words; and in response to determining the update word, directing an indication of the update word to the device, the update word replacing one or more of the plurality of consistent words in the presentation of the plurality of consistent words.
3. The method of claim 1, wherein the presentation of the plurality of consistent words by the device is configured to occur before a final transcription of the audio data is provided to the device.
4. The method of claim 1, wherein the plurality of hypothesis transcriptions are obtained from a speech recognition system that includes a plurality of speech engines configured to recognize speech.
5. At least one non-transitory computer-readable media configured to store one or more instructions that when executed by at least one processor cause or direct a system to perform the method of claim 1.
6. A method to transcribe communications, the method comprising: obtaining a plurality of hypothesis transcriptions of audio data; determining one or more consistent words that are included in two or more of the plurality of hypothesis transcriptions; in response to determining the one or more consistent words, directing the one or more consistent words to a device for presentation of the one or more consistent words; determining an update word in a final transcription of the audio data that is different from any of the one or more consistent words; and in response to determining the update word and after directing the one or more consistent words to the device, directing an indication of the update word to the device, the update word replacing one or more of the one or more consistent words in the presentation of the one or more consistent words on the device.
7. The method of claim 6, further comprising obtaining the audio data during a communication session between a second device and the device, the audio data originating at the second device.
8. The method of claim 6, wherein the presentation of the one or more consistent words by the device is configured to occur before the final transcription of the audio data is provided to the device.
9. The method of claim 6, wherein the one or more consistent words are presented in a rolling fashion, a pace of the presentation of the one or more consistent words in the rolling fashion being variable.
10. The method of claim 6, wherein the plurality of hypothesis transcriptions are obtained sequentially over time and determining the one or more consistent words includes comparing a first hypothesis transcription of the plurality of hypothesis transcriptions with a second hypothesis transcription of the plurality of hypothesis transcriptions, the second hypothesis transcription directly following the first hypothesis transcription among the plurality of hypothesis transcriptions.
11. The method of claim 6, wherein the plurality of hypothesis transcriptions are obtained from a speech recognition system that includes a plurality of speech engines configured to recognize speech.
12. At least one non-transitory computer-readable media configured to store one or more instructions that when executed by at least one processor cause or direct a system to perform the method of claim 6.
13. A method to transcribe communications, the method comprising: obtaining a plurality of hypothesis transcriptions of audio data, each of the plurality of hypothesis transcriptions including one or more words determined to be a transcription of portions of the audio data; determining one or more consistent words that are included in two or more of the plurality of hypothesis transcriptions, each of the two or more of the plurality of hypothesis transcriptions including words from a first portion of the audio data; in response to determining the one or more consistent words, providing the one or more consistent words to a device for presentation of the one or more consistent words by the device, the presentation of the one or more consistent words configured to occur before a final transcription of the audio data is provided to the device; and presenting the one or more consistent words in a rolling fashion, a pace of the presentation of the one or more consistent words in the rolling fashion being variable such that when more words are to be presented the words are presented quicker than when fewer words are to be presented.
14. The method of claim 13, wherein the plurality of hypothesis transcriptions are obtained from a speech recognition system that includes a plurality of speech engines configured to recognize speech.
15. The method of claim 13, wherein the plurality of hypothesis transcriptions are obtained from a speech recognition system that includes a single speech engine configured to recognize speech.
16. The method of claim 13, wherein a first portion of the audio data associated with a first one of the plurality of hypothesis transcriptions includes all of the audio data associated with at least one of the plurality of hypothesis transcriptions obtained previous to obtaining the first one of the plurality of hypothesis transcriptions.
17. The method of claim 13, wherein the plurality of hypothesis transcriptions are obtained sequentially over time and the determining the one or more consistent words includes comparing a first hypothesis transcription of the plurality of hypothesis transcriptions with a second hypothesis transcription of the plurality of hypothesis transcriptions, the second hypothesis transcription directly following the first hypothesis transcription among the plurality of hypothesis transcriptions.
18. The method of claim 13, further comprising: determining an update word in the final transcription that is different from any of the one or more consistent words; and in response to determining the update word and after directing the one or more consistent words to the device, directing an indication of the update word to the device, the update word replacing one or more of the one or more consistent words in the presentation of the one or more consistent words on the device.
19. At least one non-transitory computer-readable media configured to store one or more instructions that when executed by at least one processor cause or direct a system to perform the method of claim 13.
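Two ideas recur in the claims: a rolling presentation whose pace speeds up when more words are waiting, and update words from the final transcription that replace words already shown. The sketch below is one possible reading, not the claimed implementation. The delay formula, the default values, the position-by-position comparison, and the names `word_delay`, `find_updates`, and `apply_updates` are all assumptions made for illustration.

```python
from typing import List, Tuple


def word_delay(pending: int, base_delay: float = 0.4, min_delay: float = 0.05) -> float:
    """Seconds to wait before showing the next word.

    Assumed rule: the delay shrinks in proportion to the backlog, so a larger
    number of pending words is presented more quickly, down to a floor.
    """
    if pending <= 0:
        return base_delay
    return max(min_delay, base_delay / pending)


def find_updates(shown: List[str], final: List[str]) -> List[Tuple[int, str]]:
    """Return (position, new_word) pairs where the final text differs from what was shown.

    Assumption: words are compared position by position; a real system would
    likely need a proper alignment to handle insertions and deletions.
    """
    return [
        (index, new)
        for index, (old, new) in enumerate(zip(shown, final))
        if old != new
    ]


def apply_updates(shown: List[str], updates: List[Tuple[int, str]]) -> List[str]:
    """Replace previously shown words with their update words."""
    corrected = list(shown)
    for position, word in updates:
        corrected[position] = word
    return corrected


if __name__ == "__main__":
    for backlog in (1, 4, 20):
        print(backlog, word_delay(backlog))
    shown = ["I", "want", "two", "go"]
    final = ["I", "want", "to", "go"]
    updates = find_updates(shown, final)
    print(updates, apply_updates(shown, updates))
```

Sending only the changed positions, rather than the whole final transcription, mirrors the claim language about directing "an indication of the update word" to the device after the consistent words have already been presented.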
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
Speech to text systems (G10L15/08 takes precedence) · CPC title
Announcement of recognition results · CPC title
Speech classification or search · CPC title
using speech recognition · CPC title