Learning transcription errors in speech recognition tasks

US11211046B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11211046-B2
Application number	US-202016740761-A
Country	US
Kind code	B2
Filing date	Jan 13, 2020
Priority date	Jan 7, 2018
Publication date	Dec 28, 2021
Grant date	Dec 28, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection. Read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A mistranscription generated by a speech recognition system is identified. A received utterance is matched to a first utterance member within a set of known utterance members. The matching operation matches fewer than the first plural number of words in the received utterance and the received utterance varies in a first particular manner as compared to a first word in a first slot in the first utterance member. The received utterance is sent to a mistranscription analyzer component which increments evidence that the received utterance is evidence of a mistranscription. Once the incremented evidence for the mistranscription exceeds a threshold, future received utterances containing the mistranscription are treated as though the first word was recognized.
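The evidence-and-threshold mechanism described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the class name `MistranscriptionAnalyzer`, the method names, the per-observation `weight`, and the threshold value are all assumptions made for illustration, and the sketch handles only single-word substitutions.

```python
from collections import defaultdict


class MistranscriptionAnalyzer:
    """Hypothetical accumulator: counts evidence that `heard` stands for `expected`."""

    def __init__(self, threshold: float = 3.0):
        # Assumed value; the abstract does not state a threshold.
        self.threshold = threshold
        self.evidence = defaultdict(float)

    def record(self, heard: str, expected: str, weight: float = 1.0) -> None:
        """Increment evidence for one observed (heard, expected) pair."""
        self.evidence[(heard, expected)] += weight

    def is_learned(self, heard: str, expected: str) -> bool:
        """True once accumulated evidence exceeds the threshold."""
        return self.evidence[(heard, expected)] > self.threshold

    def correct(self, words: list[str]) -> list[str]:
        """Treat learned mistranscriptions as though the expected word was recognized."""
        learned = {
            heard: expected
            for (heard, expected), score in self.evidence.items()
            if score > self.threshold
        }
        return [learned.get(word, word) for word in words]


analyzer = MistranscriptionAnalyzer(threshold=2.0)
for _ in range(3):
    analyzer.record("flights", "lights")
print(analyzer.correct("turn on the flights".split()))
```

Keeping evidence keyed by the (heard, expected) pair means a word is only rewritten after repeated observations, which mirrors the abstract's requirement that evidence exceed a threshold before future utterances are treated differently.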

First claim

Opening claim text (preview).

Having described our invention, what we now claim is as follows:

1. A method for identification of a mistranscription generated by a speech recognition system comprising: matching a received utterance to a first utterance member within a set of known utterance members, wherein fewer than a first number of words in the received utterance are matched to the first number of words in the first utterance member and the received utterance varies in a first particular manner as compared to a first word in a first slot in the first utterance member; sending the received utterance to a mistranscription analyzer component; incrementing evidence, by the mistranscription analyzer, that the received utterance is evidence of a mistranscription, wherein the evidence is an occurrence of one of a substitution error, a deletion error or an insertion error; and responsive to incremented evidence for the mistranscription exceeding a threshold, treating a future received utterance containing the mistranscription as though the first word was recognized in the future received utterance.

2. The method as recited in claim 1, wherein the received utterance contains a replacement error uses a second word in place of the first word used in a first slot in the first utterance member; wherein the incremented evidence by the mistranscription analyzer is that the received utterance is evidence of a mistranscription replacing the second word for the first word.

3. The method as recited in claim 1, further comprising: responsive to matching a second received utterance to the first utterance member, sending the second received utterance to a mistranscription analyzer, wherein the matching matches a first plurality of the words, and a second plurality of the remaining words in the received utterance are candidate mistranscriptions; generating a first synthetic utterance via a text-to-speech sub-system of an audio stream based on replacing a first contiguous set of words assumed to be a mistranscription from the second plurality of remaining words in the first utterance member with an assumed correct replacement; transmitting the first synthetic utterance to a speech recognition engine with the above correcting feature; and responsive to a correction of the synthetic utterance to the first utterance member, accumulating evidence that the first contiguous set of words is a mistranscription of the assumed correct replacement.

4. The method as recited in claim 1, wherein the mistranscription analyzer matches a received utterance to a respective utterance member with a different number of words and with a single first candidate mistranscription results in a greater evidence for the first candidate mistranscription that contains one more contiguous words that do not exactly match one or more contiguous words in the respective utterance member.

5. The method as recited in claim 2, wherein the mistranscription analyzer uses a rule that increments evidence for a mistranscription of the second word for the first word in a second utterance member which also includes the first word, based on the received utterance matching the first utterance member, wherein an amount of evidence incremented for the mistranscription in the second utterance member is less than an amount of evidence incremented for the mistranscription in the first utterance member.

6. The method as recited in claim 1, wherein the mistranscription analyzer increments evidence for a mistranscription in the first manner at the first slot based on multiple received utterances from a first user having the mistranscription in the first manner at the first slot.

7. The method as recited in claim 2, further comprising incrementing evidence, by the mistranscription analyzer, that the received utterance is evidence of a mistranscription of the second word for the first word each time a received utterance is matched to the first utterance member so that with each received utterance where the second word is transcribed in place of the first word in the received utterance, the greater the evidence that is accumulated for the mistranscription.

8. The method as recited in claim 1, wherein the mistranscription analyzer uses a phonetic based rule that a greater degree of phonetic similarity between a second word in the received utterance and the first word at the first slot in the first utterance member results in a greater amount of evidence being incremented per instance of the received utterance than if no such phonetic similarity is detected.

9. The method as recited in claim 1, wherein the evidence is a deletion error where the first word is omitted in the received utterance at the first slot.

10. Apparatus, comprising: a processor; computer memory holding computer program instructions executed by the processor for identification of a mistranscription generated by a speech recognition system, the computer program instructions comprising: program code, operative to match a received utterance to a first utterance member within a set of known utterance members, wherein fewer than a first number of words in the received utterance are matched to the first number of words in the first utterance member and the received utterance varies in a first particular manner as compared to a first word in a first slot in the first utterance member; program code, operative to send the received utterance to a mistranscription analyzer component; program code, operative to increment evidence, by the mistranscription analyzer, that the received utterance is evidence of a mistranscription, wherein the evidence is an occurrence of one of a substitution error, a deletion error or an insertion error; and program code responsive to incremented evidence for the mistranscription exceeding a threshold, operative to treat a future received utterance containing the mistranscription as though the first word was recognized in the future received utterance.

11. The apparatus as recited in claim 10, wherein the received utterance contains a replacement error which uses a second word in place of the first word used in a first slot in the first utterance member; wherein the incremented evidence by the mistranscription analyzer is that the received utterance is evidence of a mistranscription replacing the second word for the first word.

12. The apparatus as recited in claim 10, further comprising: program code responsive to matching a second received utterance to the first utterance member, operative to send the second received utterance to a mistranscription analyzer, wherein the matching matches a first plurality of the words, and a second plurality of the remaining words in the received utterance are candidate mistranscriptions; program code, operative to generate a first synthetic utterance via a text-to-speech sub-system of an audio stream based on replacing a first contiguous set of words assumed to be a mistranscription from the second plurality of remaining words in the first utterance member with an assumed correct replacement; program code, operative to transmit the first synthetic utterance to a speech recognition engine with the above correcting feature; and program code responsive to a correction of the synthetic utterance to the first utterance member, operative to accumulate evidence that the first contiguous set of words is a mistranscription of the assumed correct replacement.

13. The apparatus as recited in claim 11, wherein the mistranscription analyzer increments evidence for a mistranscription of the second word for the first word wherein evidence of a mistranscription for a first user who uttered the received utterance is greater than eviden…
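Claim 1 names three kinds of evidence: substitution, deletion, and insertion errors. One way to tell them apart, sketched here under assumptions, is to align the words of the known utterance member against the received utterance with Python's standard `difflib.SequenceMatcher`. The function name `classify_errors` and the tuple layout are hypothetical, and the claims do not say that the patented system uses this alignment method.

```python
from difflib import SequenceMatcher


def classify_errors(member: str, received: str):
    """Return (kind, slot, expected_words, heard_words) for each mismatch.

    `slot` is the word index in the known utterance member where the
    mismatch starts. Illustrative only; not the patent's own matcher.
    """
    expected, heard = member.split(), received.split()
    kinds = {"replace": "substitution", "delete": "deletion", "insert": "insertion"}
    errors = []
    matcher = SequenceMatcher(a=expected, b=heard, autojunk=False)
    # Opcodes describe how to turn `expected` into `heard`.
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            errors.append((kinds[tag], i1, expected[i1:i2], heard[j1:j2]))
    return errors


print(classify_errors("turn on the lights", "turn on the flights"))
print(classify_errors("turn on the lights", "turn on lights"))
```

Each returned tuple could then be passed to an evidence accumulator as one observation for the slot it names.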

Assignees

Inventors

Classifications

G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
G10L15/01 (Primary): Assessment or evaluation of speech recognition systems
G06N20/20: Ensemble learning
G06N20/00: Machine learning
G10L15/063: Training

Patent family

Related publications grouped by family.

View patent family 67140986

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11211046B2 cover?: A mistranscription generated by a speech recognition system is identified. A received utterance is matched to a first utterance member within a set of known utterance members. The matching operation matches fewer than the first plural number of words in the received utterance and the received utterance varies in a first particular manner as compared to a first word in a first slot in the first …
Who is the assignee on this patent?: IBM
What technology area does this patent fall under?: Primary CPC classification G10L15/01. Mapped technology areas include Physics.
When was this patent published?: Publication date Dec 28, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Generating structured text content using speech recognition models

Method for recognizing voice signal and electronic device supporting the same

Negative n-gram biasing

Concatenated expected responses for speech recognition

Generating Language Models

Methods and systems for identifying errors in a speech regonition system
