Filtering audio-based interference from voice commands using natural language processing

US10811007B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10811007-B2
Application number  US-201816004229-A
Country             US
Kind code           B2
Filing date         Jun 8, 2018
Priority date       Jun 8, 2018
Publication date    Oct 20, 2020
Grant date          Oct 20, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method, according to one embodiment, includes: receiving a complex audio signal which includes an intended audio signal and at least one interfering audio signal. The complex audio signal is converted into text which represents a plurality of words included in the complex audio signal, and at least some of the text is identified as representing words which correspond to the at least one interfering audio signal. The identified text is discarded, and a remaining portion of the text is evaluated to determine whether the remaining portion of the text represents words which convey the voice-based command at an accuracy that is in a predetermined range. Furthermore, the remaining portion of the text is output in response to determining that the remaining portion of the text represents words which convey the voice-based command at an accuracy that is in the predetermined range.
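
The steps in the abstract (convert to text, identify interfering words, discard them, evaluate the remainder, output it if the accuracy is in range) can be sketched as follows. This is a minimal illustration, not the patented implementation: the speech-to-text step is assumed to have already produced a word list, and `filter_command`, `is_interference`, `score`, `accepted_range`, and the sample vocabularies are all hypothetical names and values chosen for the example. The abstract does not say how accuracy is computed, so the scorer here is a stand-in.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def filter_command(
    words: Iterable[str],
    is_interference: Callable[[str], bool],
    score: Callable[[List[str]], float],
    accepted_range: Tuple[float, float] = (0.8, 1.0),  # assumed range
) -> Optional[str]:
    """Discard words flagged as interference, then output the remainder
    only if its score falls inside the accepted range."""
    # Identify and discard text attributed to the interfering signal.
    remaining = [w for w in words if not is_interference(w)]
    # Evaluate the remaining text against the predetermined range.
    low, high = accepted_range
    if low <= score(remaining) <= high:
        return " ".join(remaining)
    return None  # not accurate enough to output


# Hypothetical transcript: a command mixed with words from a nearby TV.
transcript = "turn weather on the forecast kitchen lights".split()
tv_words = {"weather", "forecast"}
command_vocab = {"turn", "on", "off", "the", "kitchen", "lights"}


def vocab_score(ws: List[str]) -> float:
    """Stand-in accuracy: fraction of words found in a command vocabulary."""
    return sum(w in command_vocab for w in ws) / len(ws) if ws else 0.0


result = filter_command(transcript, lambda w: w in tv_words, vocab_score)
print(result)  # turn on the kitchen lights
```

Passing the interference test and the scorer as callables keeps the sketch neutral about which identification technique is used; the dependent claims describe several.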

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, comprising: receiving a complex audio signal, wherein the complex audio signal includes an intended audio signal and at least one interfering audio signal, wherein the intended audio signal is a voice-based command originating from a user, wherein the at least one interfering audio signal is background noise; converting the intended audio signal and the at least one interfering audio signal into text which represents a plurality of words included in the complex audio signal; identifying at least some of the text as representing words which correspond to the at least one interfering audio signal; discarding the identified text; evaluating a remaining portion of the text to determine whether the remaining portion of the text represents words which convey the voice-based command at an accuracy that is in a predetermined range; and outputting the remaining portion of the text in response to determining that the remaining portion of the text represents words which convey the voice-based command at an accuracy that is in the predetermined range.

2. The computer-implemented method of claim 1, comprising: identifying at least some of the text in the remaining portion of the text as representing words which correspond to the at least one interfering audio signal in response to determining that the remaining portion of the text represents words which convey the voice-based command at an accuracy that is not in the predetermined range; discarding the identified text from the remaining portion of the text; evaluating an updated remaining portion of the text to determine whether the updated remaining portion of the text represents words which convey the voice-based command at an accuracy that is in the predetermined range; and outputting the updated remaining portion of the text in response to determining that the updated remaining portion of the text represents words which convey the voice-based command at an accuracy that is in the predetermined range.

3. The computer-implemented method of claim 1, wherein identifying at least some of the text as representing words which correspond to the at least one interfering audio signal includes: applying one or more natural language processing techniques to the text.

4. The computer-implemented method of claim 3, wherein applying one or more natural language processing techniques to the text includes: comparing the text to known voice-based commands, wherein the known voice-based commands are previously logged commands; detecting matches between portions of the text and the known voice-based commands; and identifying the remaining text which does not match any of the known voice-based commands as representing words which correspond to the at least one interfering audio signal, wherein comparing the text to known voice-based commands includes applying a clustering algorithm to the text.

5. The computer-implemented method of claim 1, comprising: receiving information which corresponds to the at least one interfering audio signal, wherein the received information includes one or more audio samples collected by one or more other users at about the same time that the voice-based command originated from the user, wherein identifying at least some of the text as representing words which correspond to the at least one interfering audio signal includes: comparing the one or more audio samples collected by the one or more other users against the complex audio signal, and identifying any matches between the one or more audio samples and the complex audio signal as portions of the at least one interfering audio signal.

6. The computer-implemented method of claim 3, wherein applying one or more natural language processing techniques to the text includes: comparing the text to a grammatical template; detecting portions of the text which comply with the grammatical template; and identifying the remaining text which does not comply with the grammatical template as representing words which correspond to the at least one interfering audio signal.

7. The computer-implemented method of claim 3, wherein applying one or more natural language processing techniques to the text includes: using heuristic algorithms to compare the text to a word bank, wherein the word bank includes a plurality of common words that are detected frequently; identifying portions of the text that match entries in the word bank as representing common words; and identifying remaining portions of the text that do not match the entries in the word bank as representing words which correspond to the at least one interfering audio signal.

8. The computer-implemented method of claim 1, wherein outputting the remaining portion of the text includes: selecting a known command which matches the remaining portion of the text most closely; and outputting the known command, wherein discarding the identified text includes erasing the identified text from memory.

9. A computer program product comprising a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, the program instructions readable and/or executable by a processor to cause the processor to perform a method comprising: receiving, by the processor, a complex audio signal, wherein the complex audio signal includes an intended audio signal and at least one interfering audio signal, wherein the intended audio signal is a voice-based command originating from a user, wherein the at least one interfering audio signal is background noise; converting, by the processor, the intended audio signal and the at least one interfering audio signal into text which represents a plurality of words included in the complex audio signal; identifying, by the processor, at least some of the text as representing words which correspond to the at least one interfering audio signal; discarding, by the processor, the identified text; evaluating, by the processor, a remaining portion of the text to determine whether the remaining portion of the text represents words which convey the voice-based command at an accuracy that is in a predetermined range; and outputting, by the processor, the remaining portion of the text in response to determining that the remaining portion of the text represents words which convey the voice-based command at an accuracy that is in the predetermined range.

10. The computer program product of claim 9, the program instructions readable and/or executable by the processor to cause the processor to perform the method comprising: receiving, by the processor, information which corresponds to the at least one interfering audio signal, wherein identifying at least some of the text as representing words which correspond to the at least one interfering audio signal includes using the received information to identify the at least some of the text.

11. The computer program product of claim 10, wherein the received information includes: a full copy of an audio file which produced the at least one interfering audio signal; and a timing offset which identifies a portion of the audio file that matches the at least one interfering audio signal, wherein using the received information to identify the at least some of the text as representing words which correspond to the at least one interfering audio signal includes comparing the audio file at the timing offset to the complex audio signal.

12. The computer program product of claim 9, wherein identifying at least some of the text as representing words which correspond to the at least one interfering
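
Claims 4 and 8 describe comparing the text to previously logged commands, treating unmatched text as interference, and outputting the closest known command. The sketch below illustrates that idea under stated assumptions. Claim 4 calls for a clustering algorithm; this example swaps in a much simpler vocabulary-membership test, and uses Python's standard `difflib.get_close_matches` for the closest-match step. The command log, the function names, and the 0.8 cutoff are hypothetical.

```python
import difflib
from typing import List, Optional, Tuple

# Hypothetical log of previously issued commands.
KNOWN_COMMANDS = [
    "turn on the kitchen lights",
    "turn off the kitchen lights",
    "play some music",
    "set a timer for ten minutes",
]


def split_by_known_commands(
    text: str, known: List[str]
) -> Tuple[List[str], List[str]]:
    """Keep words that occur in any known command; flag the rest as
    interference. A simple stand-in for the clustering step in claim 4."""
    vocab = {w for cmd in known for w in cmd.split()}
    kept: List[str] = []
    flagged: List[str] = []
    for word in text.lower().split():
        (kept if word in vocab else flagged).append(word)
    return kept, flagged


def closest_known_command(
    remaining: List[str], known: List[str], cutoff: float = 0.8
) -> Optional[str]:
    """Return the known command most similar to the remaining text,
    or None if nothing reaches the (assumed) similarity cutoff."""
    candidate = " ".join(remaining)
    matches = difflib.get_close_matches(candidate, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


kept, flagged = split_by_known_commands(
    "turn breaking on the news kitchen lights", KNOWN_COMMANDS
)
print(kept)     # ['turn', 'on', 'the', 'kitchen', 'lights']
print(flagged)  # ['breaking', 'news']
print(closest_known_command(kept, KNOWN_COMMANDS))  # turn on the kitchen lights
```

A word-level membership test will keep interfering words that happen to appear in some logged command, which is one reason a second filtering pass, as in claim 2, may be needed when the first result scores outside the accepted range.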

Assignees

IBM

Inventors

Classifications

  • G06F40/20 · Primary

    Natural language analysis (semantic analysis of natural language G06F40/30) · CPC title

  • G10L15/22 · Primary

    Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • Speech to text systems (G10L15/08 takes precedence) · CPC title

  • Execution procedure of a spoken command · CPC title

  • for comparison or discrimination · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10811007B2 cover?
A computer-implemented method, according to one embodiment, includes: receiving a complex audio signal which includes an intended audio signal and at least one interfering audio signal. The complex audio signal is converted into text which represents a plurality of words included in the complex audio signal, and at least some of the text is identified as representing words which correspond to t…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F40/20. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 20, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).