What technology area does this patent fall under?

Primary CPC classification G06F40/20. Mapped technology areas include Physics.

When was this patent published?

Publication date: Oct 20, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Filtering audio-based interference from voice commands using natural language processing

US10811007B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10811007-B2
Application number	US-201816004229-A
Country	US
Kind code	B2
Filing date	Jun 8, 2018
Priority date	Jun 8, 2018
Publication date	Oct 20, 2020
Grant date	Oct 20, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method, according to one embodiment, includes: receiving a complex audio signal which includes an intended audio signal and at least one interfering audio signal. The complex audio signal is converted into text which represents a plurality of words included in the complex audio signal, and at least some of the text is identified as representing words which correspond to the at least one interfering audio signal. The identified text is discarded, and a remaining portion of the text is evaluated to determine whether the remaining portion of the text represents words which convey the voice-based command at an accuracy that is in a predetermined range. Furthermore, the remaining portion of the text is output in response to determining that the remaining portion of the text represents words which convey the voice-based command at an accuracy that is in the predetermined range.
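The flow described in the abstract (identify interfering text, discard it, evaluate what remains, and output it only if the accuracy falls in a predetermined range) can be illustrated with a minimal Python sketch. This is not the patented implementation: it assumes speech-to-text has already produced a list of words, and the names `filter_command`, `is_interference`, `score`, `NOISE`, and `KNOWN`, as well as the 0.8 to 1.0 range, are hypothetical choices made for illustration.

```python
from typing import Callable, List, Optional


def filter_command(
    words: List[str],
    is_interference: Callable[[str], bool],
    score: Callable[[List[str]], float],
    low: float = 0.8,   # assumed lower bound of the "predetermined range"
    high: float = 1.0,  # assumed upper bound of the "predetermined range"
) -> Optional[List[str]]:
    """Return the remaining words if they score inside [low, high], else None."""
    # Identify text that corresponds to interference and discard it.
    remaining = [w for w in words if not is_interference(w)]
    # Evaluate the remaining portion against the predetermined range.
    if low <= score(remaining) <= high:
        return remaining  # output only when the accuracy is in range
    return None


# Toy stand-ins (assumptions): a fixed noise vocabulary and a known command.
NOISE = {"weather", "tonight", "rain"}
KNOWN = ["turn", "on", "the", "lights"]


def toy_score(remaining: List[str]) -> float:
    """Fraction of remaining words that belong to the known command."""
    if not remaining:
        return 0.0
    hits = sum(1 for w in remaining if w in KNOWN)
    return hits / len(remaining)


# Words as a speech-to-text step might have produced them (assumed input).
transcript = "turn on rain tonight the lights".split()
result = filter_command(transcript, lambda w: w in NOISE, toy_score)
print(result)
```

Passing the interference test and the scoring function in as callables keeps the sketch neutral about how either step is actually performed; the abstract does not prescribe a particular technique for them.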

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, comprising: receiving a complex audio signal, wherein the complex audio signal includes an intended audio signal and at least one interfering audio signal, wherein the intended audio signal is a voice-based command originating from a user, wherein the at least one interfering audio signal is background noise; converting the intended audio signal and the at least one interfering audio signal into text which represents a plurality of words included in the complex audio signal; identifying at least some of the text as representing words which correspond to the at least one interfering audio signal; discarding the identified text; evaluating a remaining portion of the text to determine whether the remaining portion of the text represents words which convey the voice-based command at an accuracy that is in a predetermined range; and outputting the remaining portion of the text in response to determining that the remaining portion of the text represents words which convey the voice-based command at an accuracy that is in the predetermined range.

2. The computer-implemented method of claim 1, comprising: identifying at least some of the text in the remaining portion of the text as representing words which correspond to the at least one interfering audio signal in response to determining that the remaining portion of the text represents words which convey the voice-based command at an accuracy that is not in the predetermined range; discarding the identified text from the remaining portion of the text; evaluating an updated remaining portion of the text to determine whether the updated remaining portion of the text represents words which convey the voice-based command at an accuracy that is in the predetermined range; and outputting the updated remaining portion of the text in response to determining that the updated remaining portion of the text represents words which convey the voice-based command at an accuracy that is in the predetermined range.

3. The computer-implemented method of claim 1, wherein identifying at least some of the text as representing words which correspond to the at least one interfering audio signal includes: applying one or more natural language processing techniques to the text.

4. The computer-implemented method of claim 3, wherein applying one or more natural language processing techniques to the text includes: comparing the text to known voice-based commands, wherein the known voice-based commands are previously logged commands; detecting matches between portions of the text and the known voice-based commands; and identifying the remaining text which does not match any of the known voice-based commands as representing words which correspond to the at least one interfering audio signal, wherein comparing the text to known voice-based commands includes applying a clustering algorithm to the text.

5. The computer-implemented method of claim 1, comprising: receiving information which corresponds to the at least one interfering audio signal, wherein the received information includes one or more audio samples collected by one or more other users at about the same time that the voice-based command originated from the user, wherein identifying at least some of the text as representing words which correspond to the at least one interfering audio signal includes: comparing the one or more audio samples collected by the one or more other users against the complex audio signal, and identifying any matches between the one or more audio samples and the complex audio signal as portions of the at least one interfering audio signal.

6. The computer-implemented method of claim 3, wherein applying one or more natural language processing techniques to the text includes: comparing the text to a grammatical template; detecting portions of the text which comply with the grammatical template; and identifying the remaining text which does not comply with the grammatical template as representing words which correspond to the at least one interfering audio signal.

7. The computer-implemented method of claim 3, wherein applying one or more natural language processing techniques to the text includes: using heuristic algorithms to compare the text to a word bank, wherein the word bank includes a plurality of common words that are detected frequently; identifying portions of the text that match entries in the word bank as representing common words; and identifying remaining portions of the text that do not match the entries in the word bank as representing words which correspond to the at least one interfering audio signal.

8. The computer-implemented method of claim 1, wherein outputting the remaining portion of the text includes: selecting a known command which matches the remaining portion of the text most closely; and outputting the known command, wherein discarding the identified text includes erasing the identified text from memory.

9. A computer program product comprising a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, the program instructions readable and/or executable by a processor to cause the processor to perform a method comprising: receiving, by the processor, a complex audio signal, wherein the complex audio signal includes an intended audio signal and at least one interfering audio signal, wherein the intended audio signal is a voice-based command originating from a user, wherein the at least one interfering audio signal is background noise; converting, by the processor, the intended audio signal and the at least one interfering audio signal into text which represents a plurality of words included in the complex audio signal; identifying, by the processor, at least some of the text as representing words which correspond to the at least one interfering audio signal; discarding, by the processor, the identified text; evaluating, by the processor, a remaining portion of the text to determine whether the remaining portion of the text represents words which convey the voice-based command at an accuracy that is in a predetermined range; and outputting, by the processor, the remaining portion of the text in response to determining that the remaining portion of the text represents words which convey the voice-based command at an accuracy that is in the predetermined range.

10. The computer program product of claim 9, the program instructions readable and/or executable by the processor to cause the processor to perform the method comprising: receiving, by the processor, information which corresponds to the at least one interfering audio signal, wherein identifying at least some of the text as representing words which correspond to the at least one interfering audio signal includes using the received information to identify the at least some of the text.

11. The computer program product of claim 10, wherein the received information includes: a full copy of an audio file which produced the at least one interfering audio signal; and a timing offset which identifies a portion of the audio file that matches the at least one interfering audio signal, wherein using the received information to identify the at least some of the text as representing words which correspond to the at least one interfering audio signal includes comparing the audio file at the timing offset to the complex audio signal.

12. The computer program product of claim 9, wherein identifying at least some of the text as representing words which correspond to the at least one interfering
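Two of the dependent claims in the preview lend themselves to a short illustration: the word-bank comparison of claim 7 and the selection of the closest known command in claim 8. The Python sketch below is a simplified reading of those claims, not the patented method. Plain set membership stands in for the "heuristic algorithms" of claim 7, the standard-library `difflib` similarity ratio stands in for "matches most closely" in claim 8, and `WORD_BANK`, `KNOWN_COMMANDS`, `keep_word_bank_matches`, `closest_known_command`, and the 0.6 cutoff are all hypothetical.

```python
import difflib
from typing import List, Optional

# Assumed word bank of common command words (claim 7) and an assumed
# list of known commands (claim 8); both are illustrative only.
WORD_BANK = {"turn", "on", "off", "the", "lights", "play", "music", "volume", "up"}
KNOWN_COMMANDS = [
    "turn on the lights",
    "turn off the lights",
    "play music",
    "volume up",
]


def keep_word_bank_matches(words: List[str]) -> List[str]:
    """Keep words found in the word bank; everything else is treated as interference."""
    return [w for w in words if w.lower() in WORD_BANK]


def closest_known_command(words: List[str], cutoff: float = 0.6) -> Optional[str]:
    """Return the known command most similar to the remaining text, if any clears the cutoff."""
    candidate = " ".join(words)
    matches = difflib.get_close_matches(candidate, KNOWN_COMMANDS, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Transcribed words mixing a command with background speech (assumed input).
transcript = "turn breaking news on the forecast lights".split()
remaining = keep_word_bank_matches(transcript)
command = closest_known_command(remaining)
print(remaining, command)
```

Returning `None` when no known command clears the cutoff leaves room for a retry pass over the remaining text, in the spirit of claim 2, although that loop is not shown here.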

Assignees

IBM

Inventors

Classifications

G06F40/20 (Primary)
Natural language analysis (semantic analysis of natural language G06F40/30) · CPC title
G10L15/22 (Primary)
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
G10L15/26
Speech to text systems (G10L15/08 takes precedence) · CPC title
G10L2015/223
Execution procedure of a spoken command · CPC title
G10L25/51
for comparison or discrimination · CPC title

Patent family

Related publications grouped by family.

View patent family 68765314

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10811007B2 cover?: A computer-implemented method, according to one embodiment, includes: receiving a complex audio signal which includes an intended audio signal and at least one interfering audio signal. The complex audio signal is converted into text which represents a plurality of words included in the complex audio signal, and at least some of the text is identified as representing words which correspond to t…
Who is the assignee on this patent?: IBM