Systems and methods for correcting automatic speech recognition errors

US11922926B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11922926-B2
Application number  US-202117474080-A
Country             US
Kind code           B2
Filing date         Sep 14, 2021
Priority date       Sep 14, 2021
Publication date    Mar 5, 2024
Grant date          Mar 5, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system may include processor(s), and memory in communication with the processor(s) and storing instructions configured to cause the system to correct ASR errors. The system may receive a transcription comprising transcribed word(s) and may determine whether the transcribed word(s) exceed associated predefined confidence level(s). Responsive to determining a transcribed word does not exceed a predefined confidence level, the system may generate a predicted word. The system may calculate a distance between numerical representations of the transcribed word and the predicted word and may determine whether the distance exceeds a predefined threshold. Responsive to determining the distance exceeds the predefined threshold, the system may determine whether at least one red flag word of a list of red flag words corresponds to a context of the transcription, and, responsive to making that determination, may classify the transcription as associated with a first category.
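
The gating logic in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the `predict` and `embed` callables (stand-ins for the machine learning model), and the choice of cosine distance are all assumptions made for the example.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumption: treat a zero vector as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def needs_red_flag_check(
    words: Sequence[str],
    index: int,
    confidence: float,
    confidence_level: float,
    distance_threshold: float,
    predict: Callable[[Sequence[str], int], str],  # hypothetical word predictor
    embed: Callable[[str], Vector],                # hypothetical embedding lookup
) -> bool:
    """Decide whether the word at `index` should go on to the red flag check."""
    # A word whose confidence exceeds its predefined level is trusted as transcribed.
    if confidence > confidence_level:
        return False
    # Otherwise predict a replacement and compare the two numerical representations.
    predicted = predict(words, index)
    distance = cosine_distance(embed(words[index]), embed(predicted))
    # Only a distance above the threshold triggers the red flag check.
    return distance > distance_threshold
```

With toy two-dimensional embeddings in which "frog" and "fraud" are orthogonal, a low-confidence "frog" whose predicted word is "fraud" yields a distance of 1.0 and passes the gate for any threshold below that.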

First claim

Opening claim text (preview).

What is claimed is:

1. A system for correcting automatic speech recognition (ASR) errors comprising: one or more processors; and a memory in communication with the one or more processors and storing instructions that, when executed by the one or more processors, are configured to cause the system to: receive, via an ASR model, a transcription comprising one or more transcribed words; retrieve one or more respective predefined confidence levels associated with the one or more transcribed words; determine whether the one or more transcribed words exceed the one or more respective predefined confidence levels; responsive to determining that a first transcribed word of the one or more transcribed words does not exceed a first respective predefined confidence level, generate, using a first machine learning model, a first predicted word; generate, using the first machine learning model, a first numerical representation of the first transcribed word and a second numerical representation of the first predicted word; calculate a distance between the first and second numerical representations; determine whether the distance exceeds a predefined threshold; and responsive to determining that the distance exceeds the predefined threshold: retrieve a first list comprising a plurality of predefined red flag words; determine whether at least one of the plurality of predefined red flag words corresponds to a context of a grouping of transcribed words surrounding the first transcribed word by iteratively substituting each predefined red flag word of the plurality of predefined red flag words for the first transcribed word; and responsive to determining the at least one of the plurality of predefined red flag words corresponds to the context of the grouping of transcribed words surrounding the first transcribed word, classify the transcription as being associated with a first category.

2. The system of claim 1, wherein retrieving the first list comprising the plurality of predefined red flag words comprises: comparing, using a second machine learning model, a first sample of predefined calls associated with the first category to a second sample of predefined calls associated with a second category; identifying one or more words frequently used in the first sample but not in the second sample; and labeling the one or more words as being associated with the first category.

3. The system of claim 1, wherein classifying the transcription as being associated with the first category comprises transmitting the transcription to a transcription review queue.

4. The system of claim 1, wherein the instructions are further configured to cause the system to: generate, using a second machine learning model, a second list of words, wherein each word of the second list of words is a synonym to at least one predefined red flag word of the plurality of predefined red flag words; determine whether at least one of the second list of words corresponds to the context of the grouping of transcribed words surrounding the first transcribed word by iteratively substituting each word of the second list of words for the first transcribed word; and responsive to determining the at least one of the second list of words corresponds to the context of the grouping of transcribed words surrounding the first transcribed word, classify the transcription as being associated with the first category.

5. The system of claim 1, wherein the instructions are further configured to cause the system to: identify, using a second machine learning model, one or more first respective phonemes associated with each of the plurality of predefined red flag words; identify, using the second machine learning model, one or more second respective phonemes associated with each of the one or more transcribed words; determine whether at least one of the one or more first respective phonemes matches at least one of the one or more second respective phonemes; and responsive to determining the at least one of the one or more first respective phonemes matches the at least one of the one or more second respective phonemes, classify the transcription as being associated with the first category.

6. The system of claim 1, wherein the instructions are further configured to cause the system to: responsive to determining that the one or more transcribed words exceed the one or more respective predefined confidence levels, determine whether a second transcribed word of the one or more transcribed words is indicative of the first category; responsive to determining the second transcribed word of the one or more transcribed words is indicative of the first category, classify the transcription as being associated with the first category; and responsive to determining the second transcribed word of the one or more transcribed words is not indicative of the first category, classify the transcription as being associated with a second category.

7. The system of claim 1, wherein the instructions are further configured to cause the system to: responsive to determining that a cosine distance does not exceed the predefined threshold, classify the transcription as being associated with a second category; and responsive to determining that none of the plurality of predefined red flag words corresponds to the context of the grouping of transcribed words surrounding the first transcribed word, classify the transcription as being associated with the second category.

8. A system for correcting automatic speech recognition (ASR) errors comprising: one or more processors; and a memory in communication with the one or more processors and storing instructions that, when executed by the one or more processors, are configured to cause the system to: receive, via an ASR model, a transcription comprising one or more transcribed words; retrieve one or more respective predefined confidence levels associated with the one or more transcribed words; determine whether the one or more transcribed words exceed the one or more respective predefined confidence levels; responsive to determining that a first transcribed word of the one or more transcribed words does not exceed a first respective predefined confidence level, generate, using a first machine learning model, a first predicted word; generate, using the first machine learning model, a first numerical representation of the first transcribed word and a second numerical representation of the first predicted word; calculate a distance between the first and second numerical representations; determine whether the distance exceeds a predefined threshold; and responsive to determining that the distance exceeds the predefined threshold: retrieve a first list of words, wherein each word of the first list of words comprises a synonym to one or more predefined red flag words; determine whether at least one word of the first list of words corresponds to a context of a grouping of transcribed words surrounding the first transcribed word by iteratively substituting each word of the first list of words for the first transcribed word; and responsive to determining the at least one word of the first list of words corresponds to the context of the grouping of transcribed words surrounding the first transcribed word, classify the transcription as being associated with a first category.

9. The system of claim 8, wherein the distance comprises one or more of cosine distance, Euclidean distance, Hamming distance, Manhattan distance, Chebyshev distance, Minkowski distance, Jaccard distance, Haversine distance, Sørensen-Dice distance, or combinations thereof.

10. The system of claim 8, wherein the instructions are further configured to cause
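
The iterative substitution step recited in claims 1, 4, and 8 might look roughly like the sketch below. The claims do not say how a candidate is judged to correspond to the context, so the `context_score` callable, the `min_score` cutoff, the `window` size, and the category labels are all hypothetical placeholders introduced for illustration.

```python
from typing import Callable, Iterable, Optional, Sequence


def find_fitting_candidate(
    words: Sequence[str],
    index: int,
    candidates: Iterable[str],
    context_score: Callable[[Sequence[str]], float],  # hypothetical scorer
    min_score: float,
    window: int = 3,
) -> Optional[str]:
    """Substitute each candidate for words[index]; return the first that fits."""
    # The grouping is the suspect word plus up to `window` words on each side.
    start = max(0, index - window)
    end = min(len(words), index + window + 1)
    for candidate in candidates:
        grouping = list(words[start:index]) + [candidate] + list(words[index + 1:end])
        if context_score(grouping) >= min_score:
            return candidate
    return None


def classify_transcription(
    words: Sequence[str],
    index: int,
    red_flag_words: Iterable[str],
    context_score: Callable[[Sequence[str]], float],
    min_score: float,
) -> str:
    """Label the transcription by whether any red flag word fits the context."""
    match = find_fitting_candidate(words, index, red_flag_words, context_score, min_score)
    # Labels are illustrative; claim 3 routes the first category to a review queue.
    return "first_category" if match is not None else "second_category"
```

Passing a list of red flag words corresponds to claim 1, while passing a list of their synonyms corresponds to claims 4 and 8. In practice the scorer would likely be a language model that rates how plausible the substituted grouping is, but that is an assumption rather than something the claim text specifies.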

Assignees

Capital One Services, LLC

Inventors

Classifications

  • G10L15/01 (Primary)

    Assessment or evaluation of speech recognition systems · CPC title

  • Clustering; Classification · CPC title

  • Knowledge engineering; Knowledge acquisition · CPC title

  • Feature extraction for speech recognition; Selection of recognition unit · CPC title

  • using distance or distortion measures between unknown speech and reference templates · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11922926B2 cover?
A system may include processor(s), and memory in communication with the processor(s) and storing instructions configured to cause the system to correct ASR errors. The system may receive a transcription comprising transcribed word(s) and may determine whether the transcribed word(s) exceed associated predefined confidence level(s). Responsive to determining a transcribed word does not exceed a …
Who is the assignee on this patent?
Capital One Services, LLC
What technology area does this patent fall under?
Primary CPC classification G10L15/01. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 5, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).