Method and system for correcting speech-to-text auto-transcription using local context of talk

US10832679B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10832679-B2
Application number  US-201816196245-A
Country             US
Kind code           B2
Filing date         Nov 20, 2018
Priority date       Nov 20, 2018
Publication date    Nov 10, 2020
Grant date          Nov 10, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

One embodiment provides a computer program product for improving accuracy of a transcript of a spoken interaction. The computer program product comprises a computer readable storage medium having program instructions embodied therewith. The program instructions are executable by a processor to cause the processor to identify a plurality of patterns in the transcript. The plurality of patterns are indicative of a group of acoustically similar words in the transcript and a corresponding local, sequential context of the group of acoustically similar words. The program instructions are further executable by the processor to cause the processor to predict conditional probabilities for the group of acoustically similar words based on a predictive model and the plurality of patterns, detect one or more transcription errors in the transcript based on the conditional probabilities, and correct the one or more transcription errors by applying a multi-pass correction on the one or more transcription errors.
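
The detection step described in the abstract can be illustrated with a minimal sketch. It assumes a toy bigram model estimated from the transcript itself as a stand-in for the predictive model (the claims mention a recurrent neural network, which is not reproduced here). The function names, the example transcript, and the threshold value of 0.5 are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def bigram_probabilities(tokens):
    """Estimate P(word | previous word) from the transcript (toy stand-in model)."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    prev_counts = Counter(tokens[:-1])
    return {
        (prev, word): count / prev_counts[prev]
        for (prev, word), count in pair_counts.items()
    }


def detect_errors(tokens, similar_group, threshold):
    """Return indices of group words whose conditional probability is below threshold."""
    probs = bigram_probabilities(tokens)
    flagged = []
    for i in range(1, len(tokens)):
        if tokens[i] in similar_group:
            # Local, sequential context is reduced here to the single preceding word.
            p = probs.get((tokens[i - 1], tokens[i]), 0.0)
            if p < threshold:
                flagged.append(i)
    return flagged


# Hypothetical transcript in which "fright" is a mis-transcription of "flight".
tokens = "book the flight book the flight book the fright".split()
similar_group = {"flight", "fright"}  # assumed to be already identified
flagged = detect_errors(tokens, similar_group, threshold=0.5)
```

In this toy transcript, "flight" follows "the" in two of three cases and "fright" in one, so only the final token falls below the assumed threshold and is flagged.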

First claim

Opening claim text (preview).

What is claimed is:

1. A computer program product for improving accuracy of a transcript of a spoken interaction, the computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to: identify a group of acoustically similar words in the transcript; identify a corresponding local, sequential context of the group of acoustically similar words, wherein the corresponding local, sequential context is indicative of one or more common features across portions of the transcript in which the group of acoustically similar words occur; predict conditional probabilities for the group of acoustically similar words based on a predictive model and the corresponding local, sequential context; detect one or more transcription errors in the transcript based on the conditional probabilities; and correct the one or more transcription errors by applying a multi-pass correction on the one or more transcription errors.

2. The computer program product of claim 1, wherein the predictive model comprises a recurrent neural network.

3. The computer program product of claim 1, wherein the program instructions are further executable by the processor to cause the processor to: identify the group of acoustically similar words within and across turns-at-talk of the spoken interaction; wherein the corresponding local, sequential context represents the one or more common features identified in environments in which word repeats of the group of acoustically similar words occurs.

4. The computer program product of claim 3, wherein the program instructions are further executable by the processor to cause the processor to: for each word of the group of acoustically similar words, across speaker turns, increase a weight of one or more subsequent word mentions of the word occurring within one or more subsequent speaker turns over one or more prior word mentions of the word if the one or more subsequent word mentions occur in one or more environments substantially similar to the corresponding local, sequential context.

5. The computer program product of claim 3, wherein the program instructions are further executable by the processor to cause the processor to: for each word of the group of acoustically similar words, across speaker turns, increase a weight of one or more longer local phrases around one or more word mentions of the word that match.

6. The computer program product of claim 1, wherein the program instructions are further executable by the processor to cause the processor to: for each word of the group of acoustically similar words: determine whether a corresponding conditional probability for the word is less than a pre-determined threshold; and label the word as a transcription error in response to determining the corresponding conditional probability is less than the pre-determined threshold.

7. The computer program product of claim 1, wherein the program instructions are further executable by the processor to cause the processor to: for each word of the group of acoustically similar words, determine a corresponding contextual score for the word, wherein the contextual score is a sum of a corresponding conditional probability for the word and a number of repetition occurrences of the word in the transcript; select a word from the group of acoustically similar words with a corresponding contextual score that is the highest across the group of acoustically similar words; and correct each transcription error based on the word selected.

8. The computer program product of claim 7, wherein the multi-pass correction is repeated until a confidence level of a transcript resulting from the multi-pass correction meets or exceeds a pre-defined threshold for accuracy.

9. The computer program product of claim 1, wherein the predictive model is trained offline on a server device.

10. A system for improving accuracy of a transcript of a spoken interaction, comprising: at least one processor; and a non-transitory processor-readable memory device storing instructions that when executed by the at least one processor causes the at least one processor to perform operations including: identifying a group of acoustically similar words in the transcript; identifying a corresponding local, sequential context of the group of acoustically similar words, wherein the corresponding local, sequential context is indicative of one or more common features across portions of the transcript in which the group of acoustically similar words occur; predicting conditional probabilities for the group of acoustically similar words based on a predictive model and the corresponding local, sequential context; detecting one or more transcription errors in the transcript based on the conditional probabilities; and correcting the one or more transcription errors by applying a multi-pass correction on the one or more transcription errors.

11. The system of claim 10, wherein the predictive model comprises a recurrent neural network.

12. The system of claim 10, wherein identifying a plurality of patterns in a transcript comprises: identifying the group of acoustically similar words within and across turns-at-talk of the spoken interaction; wherein the corresponding local, sequential context represents the one or more common features identified in environments in which word repeats of the group of acoustically similar words occurs.

13. The system of claim 12, wherein identifying a plurality of patterns in a transcript further comprises: for each word of the group of acoustically similar words, across speaker turns, increase a weight of one or more subsequent word mentions of the word occurring within one or more subsequent speaker turns over one or more prior word mentions of the word if the one or more subsequent word mentions occur in one or more environments substantially similar to the corresponding local, sequential context.

14. The system of claim 12, wherein identifying a plurality of patterns in a transcript further comprises: for each word of the group of acoustically similar words, across speaker turns, increase a weight of one or more longer local phrases around one or more word mentions of the word that match.

15. The system of claim 10, wherein detecting one or more transcription errors in the transcript based on the conditional probabilities comprises: for each word of the group of acoustically similar words: determining whether a corresponding conditional probability for the word is less than a pre-determined threshold; and labeling the word as a transcription error in response to determining the corresponding conditional probability is less than the pre-determined threshold.

16. The system of claim 10, wherein correcting the one or more transcription errors by applying a multi-pass correction on the one or more transcription errors comprises: for each word of the group of acoustically similar words, determining a corresponding contextual score for the word, wherein the contextual score is a sum of a corresponding conditional probability for the word and a number of repetition occurrences of the word in the transcript; selecting a word from the group of acoustically similar words with a corresponding contextual score that is the highest across the group of acoustically similar words; and correcting each transcription error based on the word selected.

17. The system of claim 16, wherein the multi-pass correction is repeated until a
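
The contextual score and the repeated correction passes recited in the claims can be sketched roughly as follows. The score follows the claim wording (conditional probability plus number of repetitions in the transcript). Everything else is an assumption made for illustration: the function names, the `detect` and `confidence` callables, the probability values, and the `max_passes` safety limit are hypothetical, and the claims as shown here do not say how the confidence level is computed.

```python
from collections import Counter


def contextual_score(word, cond_prob, tokens):
    """Score per the claim wording: conditional probability plus repetition count."""
    return cond_prob[word] + Counter(tokens)[word]


def correct_once(tokens, similar_group, cond_prob, error_indices):
    """One pass: replace each flagged word with the highest-scoring group member."""
    best = max(similar_group, key=lambda w: contextual_score(w, cond_prob, tokens))
    corrected = list(tokens)
    for i in error_indices:
        corrected[i] = best
    return corrected


def multi_pass(tokens, similar_group, cond_prob, detect, confidence, target,
               max_passes=5):
    """Repeat passes until confidence(tokens) >= target or max_passes is reached."""
    for _ in range(max_passes):
        if confidence(tokens) >= target:
            break
        tokens = correct_once(tokens, similar_group, cond_prob, detect(tokens))
    return tokens


# Hypothetical inputs: a toy transcript and assumed conditional probabilities.
tokens = "book the flight book the flight book the fright".split()
similar_group = {"flight", "fright"}
cond_prob = {"flight": 0.67, "fright": 0.33}


def detect(t):
    # Stand-in detector: flags every occurrence of the low-probability word.
    return [i for i, w in enumerate(t) if w == "fright"]


def confidence(t):
    # Stand-in confidence: fraction of tokens that are not flagged.
    return 1 - len(detect(t)) / len(t)


corrected = multi_pass(tokens, similar_group, cond_prob, detect, confidence,
                       target=1.0)
```

With these assumed values, "flight" scores 0.67 + 2 and "fright" scores 0.33 + 1, so the single flagged token is replaced by "flight" in the first pass and the loop stops on the second check.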

Assignees

Inventors

Classifications

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • Supervised learning · CPC title

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • Learning methods · CPC title

  • G10L15/26 · Primary

    Speech to text systems (G10L15/08 takes precedence) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10832679B2 cover?
One embodiment provides a computer program product for improving accuracy of a transcript of a spoken interaction. The computer program product comprises a computer readable storage medium having program instructions embodied therewith. The program instructions are executable by a processor to cause the processor to identify a plurality of patterns in the transcript. The plurality of patterns a…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G10L15/26. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 10, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).