Error correction in speech recognition
US-2022392432-A1 · Dec 8, 2022 · US
US11922926B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11922926-B2 |
| Application number | US-202117474080-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 14, 2021 |
| Priority date | Sep 14, 2021 |
| Publication date | Mar 5, 2024 |
| Grant date | Mar 5, 2024 |
Abstract
A system may include processor(s), and memory in communication with the processor(s) and storing instructions configured to cause the system to correct ASR errors. The system may receive a transcription comprising transcribed word(s) and may determine whether the transcribed word(s) exceed associated predefined confidence level(s). Responsive to determining a transcribed word does not exceed a predefined confidence level, the system may generate a predicted word. The system may calculate a distance between numerical representations of the transcribed word and the predicted word and may determine whether the distance exceeds a predefined threshold. Responsive to determining the distance exceeds the predefined threshold, the system may determine whether at least one red flag word of a list of red flag words corresponds to a context of the transcription, and, responsive to making that determination, may classify the transcription as associated with a first category.
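The decision flow in the abstract can be sketched in Python. This is a minimal sketch under stated assumptions, not the patented implementation. `predict_word`, `embed`, and `fits_context` are hypothetical callables standing in for the machine learning models. One `confidence_floor` replaces the per-word predefined confidence levels. `window` (the size of the surrounding grouping) is a hypothetical parameter. Cosine distance is assumed as the distance measure; claim 9 lists it among several options.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]

FIRST_CATEGORY = "first category"    # hypothetical label, e.g. flagged for review
SECOND_CATEGORY = "second category"


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def classify_transcription(
    words: List[Tuple[str, float]],                 # (transcribed word, ASR confidence)
    confidence_floor: float,                        # hypothetical single confidence level
    predict_word: Callable[[List[str], int], str],  # hypothetical stand-in for the first ML model
    embed: Callable[[str], Vector],                 # hypothetical numerical representation
    distance_threshold: float,
    red_flags: List[str],
    fits_context: Callable[[List[str]], bool],      # hypothetical context test
    window: int = 3,                                # hypothetical size of the surrounding grouping
) -> str:
    tokens = [w for w, _ in words]
    for i, (word, confidence) in enumerate(words):
        if confidence > confidence_floor:
            continue  # word exceeds its confidence level; this sketch does not re-check it
        predicted = predict_word(tokens, i)
        if cosine_distance(embed(word), embed(predicted)) <= distance_threshold:
            continue  # transcription and prediction agree closely enough
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for flag in red_flags:
            # Substitute each red flag word for the low-confidence word and
            # test whether it fits the grouping of surrounding words.
            candidate = tokens[lo:i] + [flag] + tokens[i + 1:hi]
            if fits_context(candidate):
                return FIRST_CATEGORY
    return SECOND_CATEGORY


# Toy usage: "dispute" misrecognized as "dispatch" with low confidence.
toy_vectors: Dict[str, Vector] = {"dispatch": (1.0, 0.0), "pay": (0.0, 1.0)}
result = classify_transcription(
    words=[("i", 0.99), ("want", 0.98), ("to", 0.97),
           ("dispatch", 0.40), ("a", 0.95), ("charge", 0.93)],
    confidence_floor=0.6,
    predict_word=lambda tokens, i: "pay",
    embed=lambda w: toy_vectors[w],
    distance_threshold=0.5,
    red_flags=["fraud", "dispute"],
    fits_context=lambda ws: "dispute" in ws and "charge" in ws,
)
print(result)  # first category
```

Falling through to the second category mirrors claim 7. A transcription lands there when the distance stays under the threshold, or when no red flag word fits the context. The path in claim 6, where confident words are checked directly, is not modeled here.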
Claims (excerpt)
What is claimed is:
1. A system for correcting automatic speech recognition (ASR) errors comprising: one or more processors; and a memory in communication with the one or more processors and storing instructions that, when executed by the one or more processors, are configured to cause the system to: receive, via an ASR model, a transcription comprising one or more transcribed words; retrieve one or more respective predefined confidence levels associated with the one or more transcribed words; determine whether the one or more transcribed words exceed the one or more respective predefined confidence levels; responsive to determining that a first transcribed word of the one or more transcribed words does not exceed a first respective predefined confidence level, generate, using a first machine learning model, a first predicted word; generate, using the first machine learning model, a first numerical representation of the first transcribed word and a second numerical representation of the first predicted word; calculate a distance between the first and second numerical representations; determine whether the distance exceeds a predefined threshold; and responsive to determining that the distance exceeds the predefined threshold: retrieve a first list comprising a plurality of predefined red flag words; determine whether at least one of the plurality of predefined red flag words corresponds to a context of a grouping of transcribed words surrounding the first transcribed word by iteratively substituting each predefined red flag word of the plurality of predefined red flag words for the first transcribed word; and responsive to determining the at least one of the plurality of predefined red flag words corresponds to the context of the grouping of transcribed words surrounding the first transcribed word, classify the transcription as being associated with a first category.
2. The system of claim 1, wherein retrieving the first list comprising the plurality of predefined red flag words comprises: comparing, using a second machine learning model, a first sample of predefined calls associated with the first category to a second sample of predefined calls associated with a second category; identifying one or more words frequently used in the first sample but not in the second sample; and labeling the one or more words as being associated with the first category.
3. The system of claim 1, wherein classifying the transcription as being associated with the first category comprises transmitting the transcription to a transcription review queue.
4. The system of claim 1, wherein the instructions are further configured to cause the system to: generate, using a second machine learning model, a second list of words, wherein each word of the second list of words is a synonym to at least one predefined red flag word of the plurality of predefined red flag words; determine whether at least one of the second list of words corresponds to the context of the grouping of transcribed words surrounding the first transcribed word by iteratively substituting each word of the second list of words for the first transcribed word; and responsive to determining the at least one of the second list of words corresponds to the context of the grouping of transcribed words surrounding the first transcribed word, classify the transcription as being associated with the first category.
5. The system of claim 1, wherein the instructions are further configured to cause the system to: identify, using a second machine learning model, one or more first respective phonemes associated with each of the plurality of predefined red flag words; identify, using the second machine learning model, one or more second respective phonemes associated with each of the one or more transcribed words; determine whether at least one of the one or more first respective phonemes matches at least one of the one or more second respective phonemes; and responsive to determining the at least one of the one or more first respective phonemes matches the at least one of the one or more second respective phonemes, classify the transcription as being associated with the first category.
6. The system of claim 1, wherein the instructions are further configured to cause the system to: responsive to determining that the one or more transcribed words exceed the one or more respective predefined confidence levels, determine whether a second transcribed word of the one or more transcribed words is indicative of the first category; responsive to determining the second transcribed word of the one or more transcribed words is indicative of the first category, classify the transcription as being associated with the first category; and responsive to determining the second transcribed word of the one or more transcribed words is not indicative of the first category, classify the transcription as being associated with a second category.
7. The system of claim 1, wherein the instructions are further configured to cause the system to: responsive to determining that a cosine distance does not exceed the predefined threshold, classify the transcription as being associated with a second category; and responsive to determining that none of the plurality of predefined red flag words corresponds to the context of the grouping of transcribed words surrounding the first transcribed word, classify the transcription as being associated with the second category.
8. A system for correcting automatic speech recognition (ASR) errors comprising: one or more processors; and a memory in communication with the one or more processors and storing instructions that, when executed by the one or more processors, are configured to cause the system to: receive, via an ASR model, a transcription comprising one or more transcribed words; retrieve one or more respective predefined confidence levels associated with the one or more transcribed words; determine whether the one or more transcribed words exceed the one or more respective predefined confidence levels; responsive to determining that a first transcribed word of the one or more transcribed words does not exceed a first respective predefined confidence level, generate, using a first machine learning model, a first predicted word; generate, using the first machine learning model, a first numerical representation of the first transcribed word and a second numerical representation of the first predicted word; calculate a distance between the first and second numerical representations; determine whether the distance exceeds a predefined threshold; and responsive to determining that the distance exceeds the predefined threshold: retrieve a first list of words, wherein each word of the first list of words comprises a synonym to one or more predefined red flag words; determine whether at least one word of the first list of words corresponds to a context of a grouping of transcribed words surrounding the first transcribed word by iteratively substituting each word of the first list of words for the first transcribed word; and responsive to determining the at least one word of the first list of words corresponds to the context of the grouping of transcribed words surrounding the first transcribed word, classify the transcription as being associated with a first category.
9. The system of claim 8, wherein the distance comprises one or more of cosine distance, Euclidean distance, hamming distance, Manhattan distance, Chebyshev distance, Minkowski distance, Jaccard distance, Haversine distance, Sørensen-Dice distance, or combinations thereof.
10. The system of claim 8, wherein the instructions are further configured to cause
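Claim 2 builds the red flag list by contrasting calls from the two categories. It keeps words used frequently in the first sample but not in the second. The claim attributes this comparison to a second machine learning model. The sketch below swaps in plain word counts instead, so it is an illustration of the selection rule only. `min_count` and `max_other` are hypothetical thresholds for "frequently used" and "not in the second sample".

```python
from collections import Counter
from typing import Iterable, List


def word_counts(calls: Iterable[str]) -> Counter:
    """Count lowercase whitespace-separated tokens across call transcripts."""
    return Counter(word for call in calls for word in call.lower().split())


def mine_red_flag_words(
    first_sample: Iterable[str],   # calls already labeled with the first category
    second_sample: Iterable[str],  # calls already labeled with the second category
    min_count: int = 2,            # hypothetical: how often counts as "frequently used"
    max_other: int = 0,            # hypothetical: tolerated uses in the second sample
) -> List[str]:
    first, second = word_counts(first_sample), word_counts(second_sample)
    # Counter returns 0 for missing keys, so words unseen in the second sample pass.
    return sorted(
        word for word, count in first.items()
        if count >= min_count and second[word] <= max_other
    )


flagged_calls = ["i want to dispute this charge", "this is fraud please dispute it"]
routine_calls = ["i want to pay this bill", "please update my address"]
print(mine_red_flag_words(flagged_calls, routine_calls))  # ['dispute']
```

With `min_count=1`, the same toy data also yields filler words such as "is" and "it". This suggests why a raw count contrast would likely need stop-word filtering or a learned model in practice.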
CPC classification titles:
- Assessment or evaluation of speech recognition systems
- Clustering; Classification
- Knowledge engineering; Knowledge acquisition
- Feature extraction for speech recognition; Selection of recognition unit
- using distance or distortion measures between unknown speech and reference templates