Learning transcription errors in speech recognition tasks
US-2019213996-A1 · Jul 11, 2019 · US
US10811007B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10811007-B2 |
| Application number | US-201816004229-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 8, 2018 |
| Priority date | Jun 8, 2018 |
| Publication date | Oct 20, 2020 |
| Grant date | Oct 20, 2020 |
Official abstract text for this publication.
A computer-implemented method, according to one embodiment, includes: receiving a complex audio signal which includes an intended audio signal and at least one interfering audio signal. The complex audio signal is converted into text which represents a plurality of words included in the complex audio signal, and at least some of the text is identified as representing words which correspond to the at least one interfering audio signal. The identified text is discarded, and a remaining portion of the text is evaluated to determine whether the remaining portion of the text represents words which convey the voice-based command at an accuracy that is in a predetermined range. Furthermore, the remaining portion of the text is output in response to determining that the remaining portion of the text represents words which convey the voice-based command at an accuracy that is in the predetermined range.
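The discard, evaluate, and output steps in the abstract can be sketched on text that has already been transcribed. This is a minimal illustration under stated assumptions, not the patented implementation: speech-to-text conversion is out of scope, the set of interfering words is supplied by the caller, and the names `KNOWN_COMMANDS`, `ACCURACY_RANGE`, `score`, and `filter_transcript` are hypothetical. Accuracy is approximated with string similarity from Python's `difflib`, which the patent does not prescribe.

```python
from difflib import SequenceMatcher

# Hypothetical data; the patent does not specify any of these values.
KNOWN_COMMANDS = ["turn on the kitchen lights", "play some music"]
ACCURACY_RANGE = (0.8, 1.0)  # assumed "predetermined range"


def score(text, commands=KNOWN_COMMANDS):
    """Return (best_similarity, best_command) for text against the commands."""
    best = max(commands, key=lambda c: SequenceMatcher(None, text, c).ratio())
    return SequenceMatcher(None, text, best).ratio(), best


def filter_transcript(words, interfering_words):
    """Discard words flagged as interference, then evaluate what remains.

    Returns the remaining text if its similarity to a known command falls
    inside ACCURACY_RANGE, otherwise None.
    """
    remaining = [w for w in words if w not in interfering_words]
    text = " ".join(remaining)
    accuracy, _ = score(text)
    low, high = ACCURACY_RANGE
    return text if low <= accuracy <= high else None


if __name__ == "__main__":
    transcript = "turn on breaking news the kitchen weather lights".split()
    print(filter_transcript(transcript, {"breaking", "news", "weather"}))
```

When no words are flagged as interference, the same transcript scores below the assumed range and the function returns `None`, which corresponds to the case where the text is not output.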
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method, comprising: receiving a complex audio signal, wherein the complex audio signal includes an intended audio signal and at least one interfering audio signal, wherein the intended audio signal is a voice-based command originating from a user, wherein the at least one interfering audio signal is background noise; converting the intended audio signal and the at least one interfering audio signal into text which represents a plurality of words included in the complex audio signal; identifying at least some of the text as representing words which correspond to the at least one interfering audio signal; discarding the identified text; evaluating a remaining portion of the text to determine whether the remaining portion of the text represents words which convey the voice-based command at an accuracy that is in a predetermined range; and outputting the remaining portion of the text in response to determining that the remaining portion of the text represents words which convey the voice-based command at an accuracy that is in the predetermined range.
2. The computer-implemented method of claim 1, comprising: identifying at least some of the text in the remaining portion of the text as representing words which correspond to the at least one interfering audio signal in response to determining that the remaining portion of the text represents words which convey the voice-based command at an accuracy that is not in the predetermined range; discarding the identified text from the remaining portion of the text; evaluating an updated remaining portion of the text to determine whether the updated remaining portion of the text represents words which convey the voice-based command at an accuracy that is in the predetermined range; and outputting the updated remaining portion of the text in response to determining that the updated remaining portion of the text represents words which convey the voice-based command at an accuracy that is in the predetermined range.
3. The computer-implemented method of claim 1, wherein identifying at least some of the text as representing words which correspond to the at least one interfering audio signal includes: applying one or more natural language processing techniques to the text.
4. The computer-implemented method of claim 3, wherein applying one or more natural language processing techniques to the text includes: comparing the text to known voice-based commands, wherein the known voice-based commands are previously logged commands; detecting matches between portions of the text and the known voice-based commands; and identifying the remaining text which does not match any of the known voice-based commands as representing words which correspond to the at least one interfering audio signal, wherein comparing the text to known voice-based commands includes applying a clustering algorithm to the text.
5. The computer-implemented method of claim 1, comprising: receiving information which corresponds to the at least one interfering audio signal, wherein the received information includes one or more audio samples collected by one or more other users at about the same time that the voice-based command originated from the user, wherein identifying at least some of the text as representing words which correspond to the at least one interfering audio signal includes: comparing the one or more audio samples collected by the one or more other users against the complex audio signal, and identifying any matches between the one or more audio samples and the complex audio signal as portions of the at least one interfering audio signal.
6. The computer-implemented method of claim 3, wherein applying one or more natural language processing techniques to the text includes: comparing the text to a grammatical template; detecting portions of the text which comply with the grammatical template; and identifying the remaining text which does not comply with the grammatical template as representing words which correspond to the at least one interfering audio signal.
7. The computer-implemented method of claim 3, wherein applying one or more natural language processing techniques to the text includes: using heuristic algorithms to compare the text to a word bank, wherein the word bank includes a plurality of common words that are detected frequently; identifying portions of the text that match entries in the word bank as representing common words; and identifying remaining portions of the text that do not match the entries in the word bank as representing words which correspond to the at least one interfering audio signal.
8. The computer-implemented method of claim 1, wherein outputting the remaining portion of the text includes: selecting a known command which matches the remaining portion of the text most closely; and outputting the known command, wherein discarding the identified text includes erasing the identified text from memory.
9. A computer program product comprising a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, the program instructions readable and/or executable by a processor to cause the processor to perform a method comprising: receiving, by the processor, a complex audio signal, wherein the complex audio signal includes an intended audio signal and at least one interfering audio signal, wherein the intended audio signal is a voice-based command originating from a user, wherein the at least one interfering audio signal is background noise; converting, by the processor, the intended audio signal and the at least one interfering audio signal into text which represents a plurality of words included in the complex audio signal; identifying, by the processor, at least some of the text as representing words which correspond to the at least one interfering audio signal; discarding, by the processor, the identified text; evaluating, by the processor, a remaining portion of the text to determine whether the remaining portion of the text represents words which convey the voice-based command at an accuracy that is in a predetermined range; and outputting, by the processor, the remaining portion of the text in response to determining that the remaining portion of the text represents words which convey the voice-based command at an accuracy that is in the predetermined range.
10. The computer program product of claim 9, the program instructions readable and/or executable by the processor to cause the processor to perform the method comprising: receiving, by the processor, information which corresponds to the at least one interfering audio signal, wherein identifying at least some of the text as representing words which correspond to the at least one interfering audio signal includes using the received information to identify the at least some of the text.
11. The computer program product of claim 10, wherein the received information includes: a full copy of an audio file which produced the at least one interfering audio signal; and a timing offset which identifies a portion of the audio file that matches the at least one interfering audio signal, wherein using the received information to identify the at least some of the text as representing words which correspond to the at least one interfering audio signal includes comparing the audio file at the timing offset to the complex audio signal.
12. The computer program product of claim 9, wherein identifying at least some of the text as representing words which correspond to the at least one interfering
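The repeated identify, discard, and evaluate cycle of claim 2, combined with the word-bank comparison of claim 7 and the closest-command output of claim 8, might look roughly like the sketch below. Everything here is an assumption made for illustration: the command list, the word bank, the 0.8 to 1.0 range, and the names `accuracy` and `refine` are hypothetical, and plain vocabulary membership stands in for the clustering-based comparison that claim 4 recites.

```python
from difflib import SequenceMatcher, get_close_matches

# Hypothetical examples; the claims do not list specific commands or words.
KNOWN_COMMANDS = ["turn on the kitchen lights", "play some music"]
WORD_BANK = {"turn", "on", "off", "the", "play", "some", "music",
             "kitchen", "lights", "news", "weather", "today"}
COMMAND_VOCAB = {w for c in KNOWN_COMMANDS for w in c.split()}


def accuracy(words):
    """Best similarity between the joined words and any known command."""
    text = " ".join(words)
    return max(SequenceMatcher(None, text, c).ratio() for c in KNOWN_COMMANDS)


def refine(words, low=0.8, high=1.0):
    """Filter in successive passes until accuracy falls in [low, high]."""
    passes = [
        # Pass 1 (in the spirit of claim 7): drop words missing from the word bank.
        lambda ws: [w for w in ws if w in WORD_BANK],
        # Pass 2 (simplified stand-in for claim 4): drop words that never
        # occur in a known command.
        lambda ws: [w for w in ws if w in COMMAND_VOCAB],
    ]
    remaining = list(words)
    for keep in passes:
        remaining = keep(remaining)
        if low <= accuracy(remaining) <= high:
            # In the spirit of claim 8: output the closest known command.
            match = get_close_matches(" ".join(remaining), KNOWN_COMMANDS,
                                      n=1, cutoff=0.0)
            return match[0]
    return None  # no pass reached the assumed accuracy range


if __name__ == "__main__":
    heard = "turn on zxqv news the weather kitchen today lights".split()
    print(refine(heard))
```

In this example the word-bank pass alone leaves too much background speech, so the second pass runs before a command is returned. A real system would need a stopping rule and filters chosen for its own data; the two passes here are only placeholders.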
Natural language analysis (semantic analysis of natural language G06F40/30) · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
Speech to text systems (G10L15/08 takes precedence) · CPC title
Execution procedure of a spoken command · CPC title
for comparison or discrimination · CPC title