What technology area does this patent fall under?

Primary CPC classification G10L15/26. Mapped technology areas include Physics.

When was this patent published?

Publication date: Nov 10, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Method and system for correcting speech-to-text auto-transcription using local context of talk

US10832679B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10832679-B2
Application number	US-201816196245-A
Country	US
Kind code	B2
Filing date	Nov 20, 2018
Priority date	Nov 20, 2018
Publication date	Nov 10, 2020
Grant date	Nov 10, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

One embodiment provides a computer program product for improving accuracy of a transcript of a spoken interaction. The computer program product comprises a computer readable storage medium having program instructions embodied therewith. The program instructions are executable by a processor to cause the processor to identify a plurality of patterns in the transcript. The plurality of patterns are indicative of a group of acoustically similar words in the transcript and a corresponding local, sequential context of the group of acoustically similar words. The program instructions are further executable by the processor to cause the processor to predict conditional probabilities for the group of acoustically similar words based on a predictive model and the plurality of patterns, detect one or more transcription errors in the transcript based on the conditional probabilities, and correct the one or more transcription errors by applying a multi-pass correction on the one or more transcription errors.
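The abstract's first step, finding groups of acoustically similar words and their local, sequential context, can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented method: acoustic similarity is approximated by a simplified Soundex-style phonetic key, and "local context" is taken to be a window of neighbouring tokens on each side. The names `phonetic_key`, `find_similar_groups`, `_CODES` and the `window` parameter are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Simplified Soundex-style letter codes (hypothetical proxy for acoustic similarity).
_CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def phonetic_key(word: str) -> str:
    """First letter plus up to three consonant codes, collapsing adjacent repeats."""
    word = word.lower()
    if not word:
        return ""
    key, prev = word[0], _CODES.get(word[0], "")
    for ch in word[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code  # uncoded letters (vowels, h, w, y) reset prev
    return (key + "000")[:4]


# (word, left-context tokens, right-context tokens) for one occurrence.
Context = Tuple[str, Tuple[str, ...], Tuple[str, ...]]


def find_similar_groups(
    tokens: List[str], window: int = 2
) -> Tuple[Dict[str, List[str]], Dict[str, List[Context]]]:
    """Group distinct words sharing a phonetic key and record each occurrence's context."""
    words = [t.lower() for t in tokens]
    by_key: Dict[str, set] = defaultdict(set)
    for w in words:
        by_key[phonetic_key(w)].add(w)
    # Only keys covering two or more distinct words form an "acoustically similar" group.
    groups = {k: sorted(ws) for k, ws in by_key.items() if len(ws) > 1}
    contexts: Dict[str, List[Context]] = defaultdict(list)
    for i, w in enumerate(words):
        key = phonetic_key(w)
        if key in groups:
            left = tuple(words[max(0, i - window):i])
            right = tuple(words[i + 1:i + 1 + window])
            contexts[key].append((w, left, right))
    return groups, dict(contexts)


if __name__ == "__main__":
    transcript = "put it over there and take their car over there".split()
    groups, contexts = find_similar_groups(transcript)
    print(groups)  # → {'t600': ['their', 'there']}
```

In this toy transcript, "there" and "their" share the key `t600` and form one group, and each of their three occurrences is stored with up to two neighbours on either side. A production system might instead compare phoneme sequences from the recognizer or a pronunciation lexicon; the phonetic key here only keeps the sketch self-contained.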

First claim

Opening claim text (preview).

What is claimed is:

1. A computer program product for improving accuracy of a transcript of a spoken interaction, the computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to: identify a group of acoustically similar words in the transcript; identify a corresponding local, sequential context of the group of acoustically similar words, wherein the corresponding local, sequential context is indicative of one or more common features across portions of the transcript in which the group of acoustically similar words occur; predict conditional probabilities for the group of acoustically similar words based on a predictive model and the corresponding local, sequential context; detect one or more transcription errors in the transcript based on the conditional probabilities; and correct the one or more transcription errors by applying a multi-pass correction on the one or more transcription errors.

2. The computer program product of claim 1, wherein the predictive model comprises a recurrent neural network.

3. The computer program product of claim 1, wherein the program instructions are further executable by the processor to cause the processor to: identify the group of acoustically similar words within and across turns-at-talk of the spoken interaction; wherein the corresponding local, sequential context represents the one or more common features identified in environments in which word repeats of the group of acoustically similar words occurs.

4. The computer program product of claim 3, wherein the program instructions are further executable by the processor to cause the processor to: for each word of the group of acoustically similar words, across speaker turns, increase a weight of one or more subsequent word mentions of the word occurring within one or more subsequent speaker turns over one or more prior word mentions of the word if the one or more subsequent word mentions occur in one or more environments substantially similar to the corresponding local, sequential context.

5. The computer program product of claim 3, wherein the program instructions are further executable by the processor to cause the processor to: for each word of the group of acoustically similar words, across speaker turns, increase a weight of one or more longer local phrases around one or more word mentions of the word that match.

6. The computer program product of claim 1, wherein the program instructions are further executable by the processor to cause the processor to: for each word of the group of acoustically similar words: determine whether a corresponding conditional probability for the word is less than a pre-determined threshold; and label the word as a transcription error in response to determining the corresponding conditional probability is less than the pre-determined threshold.

7. The computer program product of claim 1, wherein the program instructions are further executable by the processor to cause the processor to: for each word of the group of acoustically similar words, determine a corresponding contextual score for the word, wherein the contextual score is a sum of a corresponding conditional probability for the word and a number of repetition occurrences of the word in the transcript; select a word from the group of acoustically similar words with a corresponding contextual score that is the highest across the group of acoustically similar words; and correct each transcription error based on the word selected.

8. The computer program product of claim 7, wherein the multi-pass correction is repeated until a confidence level of a transcript resulting from the multi-pass correction meets or exceeds a pre-defined threshold for accuracy.

9. The computer program product of claim 1, wherein the predictive model is trained offline on a server device.

10. A system for improving accuracy of a transcript of a spoken interaction, comprising: at least one processor; and a non-transitory processor-readable memory device storing instructions that when executed by the at least one processor causes the at least one processor to perform operations including: identifying a group of acoustically similar words in the transcript; identifying a corresponding local, sequential context of the group of acoustically similar words, wherein the corresponding local, sequential context is indicative of one or more common features across portions of the transcript in which the group of acoustically similar words occur; predicting conditional probabilities for the group of acoustically similar words based on a predictive model and the corresponding local, sequential context; detecting one or more transcription errors in the transcript based on the conditional probabilities; and correcting the one or more transcription errors by applying a multi-pass correction on the one or more transcription errors.

11. The system of claim 10, wherein the predictive model comprises a recurrent neural network.

12. The system of claim 10, wherein identifying a plurality of patterns in a transcript comprises: identifying the group of acoustically similar words within and across turns-at-talk of the spoken interaction; wherein the corresponding local, sequential context represents the one or more common features identified in environments in which word repeats of the group of acoustically similar words occurs.

13. The system of claim 12, wherein identifying a plurality of patterns in a transcript further comprises: for each word of the group of acoustically similar words, across speaker turns, increase a weight of one or more subsequent word mentions of the word occurring within one or more subsequent speaker turns over one or more prior word mentions of the word if the one or more subsequent word mentions occur in one or more environments substantially similar to the corresponding local, sequential context.

14. The system of claim 12, wherein identifying a plurality of patterns in a transcript further comprises: for each word of the group of acoustically similar words, across speaker turns, increase a weight of one or more longer local phrases around one or more word mentions of the word that match.

15. The system of claim 10, wherein detecting one or more transcription errors in the transcript based on the conditional probabilities comprises: for each word of the group of acoustically similar words: determining whether a corresponding conditional probability for the word is less than a pre-determined threshold; and labeling the word as a transcription error in response to determining the corresponding conditional probability is less than the pre-determined threshold.

16. The system of claim 10, wherein correcting the one or more transcription errors by applying a multi-pass correction on the one or more transcription errors comprises: for each word of the group of acoustically similar words, determining a corresponding contextual score for the word, wherein the contextual score is a sum of a corresponding conditional probability for the word and a number of repetition occurrences of the word in the transcript; selecting a word from the group of acoustically similar words with a corresponding contextual score that is the highest across the group of acoustically similar words; and correcting each transcription error based on the word selected.

17. The system of claim 16, wherein the multi-pass correction is repeated until a
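Claims 6 through 8 describe threshold-based error detection, correction by a contextual score, and repetition of the correction pass. The Python sketch below strings those steps together under explicit assumptions that are not taken from the patent: an add-one-smoothed bigram model stands in for the recurrent neural network of claim 2, the conditional probability of a word is taken as P(word | previous word), and the transcript "confidence level" of claim 8 is defined, hypothetically, as the share of group-word positions not flagged as errors. All function and parameter names (`bigram_model`, `correct_transcript`, `threshold`, `target_confidence`, `max_passes`) are illustrative.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# P(word | previous word); any callable with this shape could be plugged in.
ProbFn = Callable[[str, str], float]


def bigram_model(corpus: Sequence[str]) -> ProbFn:
    """Add-one-smoothed bigram model: a toy stand-in for the claimed RNN."""
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))
    vocab_size = len(unigrams) + 1  # +1 reserves probability mass for unseen words

    def prob(prev: str, word: str) -> float:
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

    return prob


def correct_transcript(
    tokens: Sequence[str],
    groups: List[Tuple[str, ...]],
    prob: ProbFn,
    threshold: float,
    target_confidence: float,
    max_passes: int = 5,
) -> List[str]:
    """Multi-pass detection and correction over groups of acoustically similar words."""
    tokens = list(tokens)
    word_to_group = {w: g for g in groups for w in g}
    for _ in range(max_passes):
        # Positions holding a group word that has a left neighbour to condition on.
        slots = [i for i, t in enumerate(tokens) if i > 0 and t in word_to_group]
        # Claim 6: label a word as an error if its conditional probability is below threshold.
        errors = [i for i in slots if prob(tokens[i - 1], tokens[i]) < threshold]
        # Hypothetical confidence level: share of group-word positions not labelled as errors.
        confidence = 1.0 - len(errors) / len(slots) if slots else 1.0
        if confidence >= target_confidence or not errors:
            break
        counts = Counter(tokens)  # repetition occurrences in the current transcript
        changed = False
        for i in errors:
            prev = tokens[i - 1]
            # Claim 7: contextual score = conditional probability + repetition count.
            best = max(word_to_group[tokens[i]], key=lambda w: prob(prev, w) + counts[w])
            if best != tokens[i]:
                tokens[i] = best
                changed = True
        if not changed:  # a further pass would make no difference
            break
    return tokens


if __name__ == "__main__":
    corpus = "i like their team and their team won over there".split()
    transcript = "i like their team and there team won".split()
    model = bigram_model(corpus)
    print(correct_transcript(transcript, [("their", "there")], model, 0.15, 0.9))
    # → ['i', 'like', 'their', 'team', 'and', 'their', 'team', 'won']
```

In the example, "there" after "and" scores 0.1 under the toy model, falls below the 0.15 threshold, and is replaced by "their"; the second pass flags nothing and the loop stops. One consequence of reading claim 7 literally in this sketch: the repetition count is an integer while the probability is at most 1, so the count dominates and the probability only breaks ties between equally frequent candidates. The patent's actual weighting may differ.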

Assignees

IBM

Inventors

Classifications

Code	CPC title
G06N3/044	Recurrent networks, e.g. Hopfield networks
G06N3/09	Supervised learning
G06N3/0442	characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
G06N3/08	Learning methods
G10L15/26 (primary)	Speech to text systems (G10L15/08 takes precedence)

Patent family

Related publications grouped by family.

Patent family ID: 70726704


Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?: IBM


Related patents

Detection and labeling of conversational actions

Data driven speech enabled self-help systems and methods of operating thereof

System and Method of Automated Model Adaptation
