US10832679B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10832679-B2 |
| Application number | US-201816196245-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 20, 2018 |
| Priority date | Nov 20, 2018 |
| Publication date | Nov 10, 2020 |
| Grant date | Nov 10, 2020 |
Official abstract text for this publication.
One embodiment provides a computer program product for improving accuracy of a transcript of a spoken interaction. The computer program product comprises a computer readable storage medium having program instructions embodied therewith. The program instructions are executable by a processor to cause the processor to identify a plurality of patterns in the transcript. The plurality of patterns are indicative of a group of acoustically similar words in the transcript and a corresponding local, sequential context of the group of acoustically similar words. The program instructions are further executable by the processor to cause the processor to predict conditional probabilities for the group of acoustically similar words based on a predictive model and the plurality of patterns, detect one or more transcription errors in the transcript based on the conditional probabilities, and correct the one or more transcription errors by applying a multi-pass correction on the one or more transcription errors.
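The detection step in the abstract can be illustrated with a minimal sketch. It assumes a count-based bigram estimate of P(word | previous word) as a stand-in for the predictive model (the claims name a recurrent neural network as one option), and it assumes the group of acoustically similar words is already known. The function names, the one-word context, and the threshold value are hypothetical and do not come from the patent.

```python
from collections import Counter


def bigram_conditional_probs(tokens, group):
    """Estimate P(word | previous word) for every occurrence of a group word.

    Assumption: a bigram count model replaces the patent's predictive model,
    and the "local, sequential context" is reduced to the single prior word.
    """
    pairs = list(zip(tokens, tokens[1:]))
    pair_counts = Counter(pairs)
    # How often any member of the group follows a given previous word.
    context_counts = Counter(prev for prev, word in pairs if word in group)
    probs = []
    for i in range(1, len(tokens)):
        prev, word = tokens[i - 1], tokens[i]
        if word in group:
            probs.append((i, word, pair_counts[(prev, word)] / context_counts[prev]))
    return probs


def detect_errors(probs, threshold):
    """Flag occurrences whose conditional probability is below the threshold."""
    return [(i, word) for i, word, p in probs if p < threshold]


tokens = ("book a flight to austin please a flight to austin today "
          "one flight to boston").split()
group = {"austin", "boston"}  # hypothetical acoustically similar group
flagged = detect_errors(bigram_conditional_probs(tokens, group), threshold=0.5)
print(flagged)
```

In this toy transcript, "austin" follows "to" twice and "boston" once, so only the "boston" occurrence falls below the 0.5 threshold and is flagged as a candidate error.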
Opening claim text (preview).
What is claimed is:

1. A computer program product for improving accuracy of a transcript of a spoken interaction, the computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to: identify a group of acoustically similar words in the transcript; identify a corresponding local, sequential context of the group of acoustically similar words, wherein the corresponding local, sequential context is indicative of one or more common features across portions of the transcript in which the group of acoustically similar words occur; predict conditional probabilities for the group of acoustically similar words based on a predictive model and the corresponding local, sequential context; detect one or more transcription errors in the transcript based on the conditional probabilities; and correct the one or more transcription errors by applying a multi-pass correction on the one or more transcription errors.

2. The computer program product of claim 1, wherein the predictive model comprises a recurrent neural network.

3. The computer program product of claim 1, wherein the program instructions are further executable by the processor to cause the processor to: identify the group of acoustically similar words within and across turns-at-talk of the spoken interaction; wherein the corresponding local, sequential context represents the one or more common features identified in environments in which word repeats of the group of acoustically similar words occurs.

4. The computer program product of claim 3, wherein the program instructions are further executable by the processor to cause the processor to: for each word of the group of acoustically similar words, across speaker turns, increase a weight of one or more subsequent word mentions of the word occurring within one or more subsequent speaker turns over one or more prior word mentions of the word if the one or more subsequent word mentions occur in one or more environments substantially similar to the corresponding local, sequential context.

5. The computer program product of claim 3, wherein the program instructions are further executable by the processor to cause the processor to: for each word of the group of acoustically similar words, across speaker turns, increase a weight of one or more longer local phrases around one or more word mentions of the word that match.

6. The computer program product of claim 1, wherein the program instructions are further executable by the processor to cause the processor to: for each word of the group of acoustically similar words: determine whether a corresponding conditional probability for the word is less than a pre-determined threshold; and label the word as a transcription error in response to determining the corresponding conditional probability is less than the pre-determined threshold.

7. The computer program product of claim 1, wherein the program instructions are further executable by the processor to cause the processor to: for each word of the group of acoustically similar words, determine a corresponding contextual score for the word, wherein the contextual score is a sum of a corresponding conditional probability for the word and a number of repetition occurrences of the word in the transcript; select a word from the group of acoustically similar words with a corresponding contextual score that is the highest across the group of acoustically similar words; and correct each transcription error based on the word selected.

8. The computer program product of claim 7, wherein the multi-pass correction is repeated until a confidence level of a transcript resulting from the multi-pass correction meets or exceeds a pre-defined threshold for accuracy.

9. The computer program product of claim 1, wherein the predictive model is trained offline on a server device.

10. A system for improving accuracy of a transcript of a spoken interaction, comprising: at least one processor; and a non-transitory processor-readable memory device storing instructions that when executed by the at least one processor causes the at least one processor to perform operations including: identifying a group of acoustically similar words in the transcript; identifying a corresponding local, sequential context of the group of acoustically similar words, wherein the corresponding local, sequential context is indicative of one or more common features across portions of the transcript in which the group of acoustically similar words occur; predicting conditional probabilities for the group of acoustically similar words based on a predictive model and the corresponding local, sequential context; detecting one or more transcription errors in the transcript based on the conditional probabilities; and correcting the one or more transcription errors by applying a multi-pass correction on the one or more transcription errors.

11. The system of claim 10, wherein the predictive model comprises a recurrent neural network.

12. The system of claim 10, wherein identifying a plurality of patterns in a transcript comprises: identifying the group of acoustically similar words within and across turns-at-talk of the spoken interaction; wherein the corresponding local, sequential context represents the one or more common features identified in environments in which word repeats of the group of acoustically similar words occurs.

13. The system of claim 12, wherein identifying a plurality of patterns in a transcript further comprises: for each word of the group of acoustically similar words, across speaker turns, increase a weight of one or more subsequent word mentions of the word occurring within one or more subsequent speaker turns over one or more prior word mentions of the word if the one or more subsequent word mentions occur in one or more environments substantially similar to the corresponding local, sequential context.

14. The system of claim 12, wherein identifying a plurality of patterns in a transcript further comprises: for each word of the group of acoustically similar words, across speaker turns, increase a weight of one or more longer local phrases around one or more word mentions of the word that match.

15. The system of claim 10, wherein detecting one or more transcription errors in the transcript based on the conditional probabilities comprises: for each word of the group of acoustically similar words: determining whether a corresponding conditional probability for the word is less than a pre-determined threshold; and labeling the word as a transcription error in response to determining the corresponding conditional probability is less than the pre-determined threshold.

16. The system of claim 10, wherein correcting the one or more transcription errors by applying a multi-pass correction on the one or more transcription errors comprises: for each word of the group of acoustically similar words, determining a corresponding contextual score for the word, wherein the contextual score is a sum of a corresponding conditional probability for the word and a number of repetition occurrences of the word in the transcript; selecting a word from the group of acoustically similar words with a corresponding contextual score that is the highest across the group of acoustically similar words; and correcting each transcription error based on the word selected.

17. The system of claim 16, wherein the multi-pass correction is repeated until a
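The contextual score recited in claims 7 and 16 (conditional probability plus repetition count, with the highest-scoring word used for correction) can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented implementation: each word is given a single conditional probability in a plain dictionary, errors are flagged with the threshold test of claims 6 and 15, and only one correction pass is shown. The names `contextual_scores`, `correct_pass`, and `cond_prob` are hypothetical.

```python
def contextual_scores(tokens, group, cond_prob):
    """Contextual score = conditional probability + number of repetitions.

    Assumption: cond_prob maps each group word to one probability; in practice
    the probability would come from the predictive model per occurrence.
    """
    return {word: cond_prob[word] + tokens.count(word) for word in sorted(group)}


def correct_pass(tokens, group, cond_prob, threshold):
    """Run one correction pass over the transcript tokens.

    Group words whose probability is below the threshold are treated as
    transcription errors and replaced by the highest-scoring group word.
    """
    scores = contextual_scores(tokens, group, cond_prob)
    best = max(scores, key=scores.get)
    return [
        best if (tok in group and cond_prob[tok] < threshold) else tok
        for tok in tokens
    ]


tokens = ["to", "austin", "to", "austin", "to", "boston"]
group = {"austin", "boston"}                   # hypothetical similar-word group
cond_prob = {"austin": 2 / 3, "boston": 1 / 3}  # hypothetical model output
print(correct_pass(tokens, group, cond_prob, threshold=0.5))
```

Here "austin" scores about 2.67 and "boston" about 1.33, so the low-probability "boston" is replaced by "austin". A multi-pass variant, as in claim 8, would recompute the probabilities and repeat this pass until a confidence measure reaches its threshold; how that confidence is computed is not specified in the text shown here.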
Recurrent networks, e.g. Hopfield networks · CPC title
Supervised learning · CPC title
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Learning methods · CPC title
Speech to text systems (G10L15/08 takes precedence) · CPC title