Speech identification and extraction from noise using extended high frequency information

US12444429B2 · US · B2

Patent metadata
Publication number: US-12444429-B2
Application number: US-202218085705-A
Country: US
Kind code: B2
Filing date: Dec 21, 2022
Priority date: Dec 21, 2021
Publication date: Oct 14, 2025
Grant date: Oct 14, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Improved systems and methods are provided herein for extracting target speech from audio signals that can contain masking speech or other unwanted noise content. These systems and methods include detection of target speech in an input signal by detecting elevated frequency content in the signal above a threshold frequency. Portions of the signal determined to contain such elevated high frequency content are then used to generate audio filters to extract target speech from subsequently-obtained audio signals. This can include performing non-negative matrix factorization to determine a set of basis vectors to represent noise content in the spectral domain and then using the set of basis vectors to decompose subsequently-obtained audio signals into noise signals that can then be removed from the audio signals.
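The sketch below illustrates the detection step from the abstract, using the spectrogram form recited in claim 6: sum the spectral energy above 5.6 kHz in each frame and compare it with a threshold. It is a minimal, dependency-free example and is not the patentee's implementation. It makes several assumptions: non-overlapping unwindowed frames, a direct DFT in place of an STFT library, a caller-chosen threshold, and the hypothetical function names `frame_signal`, `power_spectrum`, `high_band_energy` and `flag_high_band_frames`.

```python
import cmath
import math

def frame_signal(x, frame_len):
    """Split x into non-overlapping frames; an incomplete final frame is dropped."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, frame_len)]

def power_spectrum(frame):
    """One-sided power spectrum |X[k]|^2, k = 0..N/2, via a direct O(N^2) DFT."""
    n = len(frame)
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(frame))) ** 2
        for k in range(n // 2 + 1)
    ]

def high_band_energy(frame, fs, cutoff_hz=5600.0):
    """Total power in DFT bins whose frequency k * fs / N lies above cutoff_hz."""
    n = len(frame)
    return sum(p for k, p in enumerate(power_spectrum(frame)) if k * fs / n > cutoff_hz)

def flag_high_band_frames(x, fs, frame_len, threshold, cutoff_hz=5600.0):
    """True for each frame whose above-cutoff energy exceeds the (assumed) threshold."""
    return [high_band_energy(f, fs, cutoff_hz) > threshold for f in frame_signal(x, frame_len)]
```

At fs = 16 kHz with 320-sample (20 ms) frames, each bin is 50 Hz wide, so bins 113 and above count toward the high band. The cutoff is only meaningful when fs exceeds 11.2 kHz. A practical version would use a windowed FFT rather than the direct DFT.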

First claim

Opening claim text (preview).

We claim:

1. A non-transitory computer readable medium comprising program instructions executable by at least one processor to cause the at least one processor to perform a method comprising: obtaining a first audio sample; determining that a first portion of the first audio sample contains frequency content at frequencies higher than 5.6 kilohertz that exceeds a threshold energy level; responsive to determining that the first portion contains frequency content at frequencies higher than 5.6 kilohertz that exceeds the threshold energy level, determining a first audio filter based on the first portion of the first audio sample by: determining a first spectrogram for the first portion; and performing non-negative matrix factorization to generate a first matrix and a second matrix whose product corresponds to a low-frequency portion of the first spectrogram that is below a threshold frequency, wherein the first matrix is composed of a set of column vectors that span along a frequency dimension of the first spectrogram, and wherein the second matrix is composed of a set of row vectors that span along a time dimension of the first spectrogram; subsequent to obtaining the first audio sample, obtaining a second audio sample; and applying the first audio filter to the second audio sample to generate a first audio output by: determining a second spectrogram for the second audio sample; applying the first matrix to a low-frequency portion of the second spectrogram that is below the threshold frequency to generate a third spectrogram that represents noise content of the second audio sample; and using the third spectrogram to remove the noise content from the second audio sample, thereby generating the first audio output.

2. The non-transitory computer readable medium of claim 1, wherein the method further comprises: determining a plurality of zero-crossing rates across time for the first audio sample; and determining a plurality of signal energy levels across time for the first audio sample, wherein determining that the first portion contains frequency content at frequencies higher than 5.6 kilohertz that exceeds the threshold energy level comprises determining (i) that a zero-crossing rate, of the plurality of zero-crossing rates, that corresponds to the first portion exceeds a threshold zero-crossing rate and (ii) that a signal energy level, of the plurality of signal energy levels, that corresponds to the first portion exceeds a threshold signal energy level.

3. The non-transitory computer readable medium of claim 1, wherein the first audio sample is divided into a plurality of non-overlapping frames, and wherein determining that the first portion contains frequency content at frequencies higher than 5.6 kilohertz that exceeds the threshold energy level comprises: determining that a contiguous subset of the plurality of non-overlapping frames of the first audio sample all contain frequency content at frequencies higher than 5.6 kilohertz that exceeds the threshold energy level, wherein the first portion consists of the contiguous subset of frames of the first audio sample.

4. The non-transitory computer readable medium of claim 3, wherein each frame of the plurality of non-overlapping frames of the first audio sample has a duration between 15 milliseconds and 50 milliseconds.

5. The non-transitory computer readable medium of claim 1, wherein the method further comprises: prior to obtaining the first audio sample, obtaining a third audio sample; determining that a second portion of the third audio sample contains frequency content at frequencies higher than 5.6 kilohertz that exceeds the threshold energy level; and responsive to determining that the second portion of the third audio sample contains frequency content at frequencies higher than 5.6 kilohertz that exceeds the threshold energy level, determining a second audio filter based on the second portion of the third audio sample by: determining a fourth spectrogram for the second portion; and performing non-negative matrix factorization to generate a third matrix and a fourth matrix whose product corresponds to a portion of the fourth spectrogram below the threshold frequency, wherein the third matrix is composed of a further set of column vectors that span along a frequency dimension of the fourth spectrogram, and wherein the fourth matrix is composed of a further set of row vectors that span along a time dimension of the fourth spectrogram, wherein performing non-negative matrix factorization to generate the first matrix and the second matrix comprises using, as an initial estimate of the first matrix, the third matrix.

6. The non-transitory computer readable medium of claim 1, wherein determining that the first portion contains frequency content at frequencies higher than 5.6 kilohertz that exceeds the threshold energy level comprises: determining a spectrogram for the first portion; and determining that a total energy in the spectrogram above 5.6 kilohertz exceeds the threshold energy level.

7. The non-transitory computer readable medium of claim 1, wherein using the third spectrogram to remove the noise content from the second audio sample comprises: performing an inverse transform on the third spectrogram to generate a time-domain noise signal; and subtracting the time-domain noise signal from the second audio sample to generate the first audio output.

8. The non-transitory computer readable medium of claim 1, wherein the method further comprises: prior to obtaining the first audio sample, obtaining a third audio sample; determining that a second portion of the third audio sample contains frequency content at frequencies higher than 5.6 kilohertz that exceeds the threshold energy level; and responsive to determining that the second portion of the third audio sample contains frequency content at frequencies higher than 5.6 kilohertz that exceeds the threshold energy level, determining a second audio filter based on the second portion of the third audio sample, wherein determining the first audio filter based on the first portion comprises: determining a third audio filter based on the first portion; and determining the first audio filter as a weighted combination of the second audio filter and the third audio filter.

9. A method comprising: obtaining a first audio sample; determining that a first portion of the first audio sample contains frequency content at frequencies higher than 5.6 kilohertz that exceeds a threshold energy level; responsive to determining that the first portion contains frequency content at frequencies higher than 5.6 kilohertz that exceeds the threshold energy level, determining a first audio filter based on the first portion of the first audio sample by: determining a first spectrogram for the first portion; and performing non-negative matrix factorization to generate a first matrix and a second matrix whose product corresponds to a low-frequency portion of the first spectrogram that is below a threshold frequency, wherein the first matrix is composed of a set of column vectors that span along a frequency dimension of the first spectrogram, and wherein the second matrix is composed of a set of row vectors that span along a time dimension of the first spectrogram; subsequent to obtaining the first audio sample, obtaining a second audio sample; and applying the first audio filter to the second audio sample to generate a first audio output by: determining a second spectrogram for the second audio sample; applying the first matrix to a low-frequency portion of the second spectrogram that is below the threshold frequency to generate a third spectrogram that represents noise content of the second audio sample; a…
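Claims 2 to 4 recite a second way to find the target portion. A frame qualifies when both its zero-crossing rate and its signal energy exceed thresholds. The portion is then a contiguous run of such non-overlapping frames, each 15 to 50 ms long. The Python sketch below is a minimal illustration under several assumptions: mean-square energy as the energy measure, caller-chosen threshold values, a `min_frames` run-length parameter, and the hypothetical helper names `split_frames`, `zero_crossing_rate`, `mean_energy`, `candidate_flags` and `contiguous_runs`.

```python
def split_frames(x, frame_len):
    """Non-overlapping frames; an incomplete tail is discarded."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, frame_len)]

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (zero counts as non-negative)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def mean_energy(frame):
    """Mean-square amplitude of the frame (an assumed energy measure)."""
    return sum(s * s for s in frame) / len(frame)

def candidate_flags(x, fs, frame_ms, zcr_threshold, energy_threshold):
    """Flag frames whose ZCR and energy both exceed their thresholds (cf. claim 2)."""
    if not 15 <= frame_ms <= 50:
        raise ValueError("claim 4 recites frame durations between 15 ms and 50 ms")
    frame_len = int(fs * frame_ms / 1000)
    return [zero_crossing_rate(f) > zcr_threshold and mean_energy(f) > energy_threshold
            for f in split_frames(x, frame_len)]

def contiguous_runs(flags, min_frames=1):
    """(start, stop) frame indices, stop exclusive, of runs of True at least min_frames long."""
    runs, start = [], None
    for i, flag in enumerate(list(flags) + [False]):  # trailing False closes an open run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_frames:
                runs.append((start, i))
            start = None
    return runs
```

Requiring both tests prevents two kinds of false positive. Quiet hiss has a high zero-crossing rate but fails the energy test, and loud low-frequency sound has high energy but fails the crossing-rate test.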
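The filter in claim 1 can be sketched as non-negative matrix factorization of the low-frequency rows of a magnitude spectrogram, V ≈ W·H. The columns of W span frequency and the rows of H span time. The claims do not specify an update rule, so this sketch uses the standard multiplicative updates for the Euclidean cost. It also makes these assumptions:

- The caller has already sliced `V_low` to the rows below the threshold frequency.
- The claim 5 warm start is exposed through a `W_init` argument.
- The claim 8 weighted combination is modelled by a hypothetical `blend_filters` helper that averages two W matrices.
- Noise is removed by magnitude-domain spectral subtraction. This is a simplification: claim 7 instead inverse-transforms the noise spectrogram and subtracts the result in the time domain.

All names, the rank and the iteration counts are illustrative.

```python
import random

def matmul(a, b):
    """Plain list-of-lists matrix product."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def nmf(V, rank, iters=200, W_init=None, fix_W=False, seed=0, eps=1e-9):
    """Factor non-negative V (freq x time) as W @ H with multiplicative updates.

    W is freq x rank (columns span frequency); H is rank x time (rows span time).
    W_init warm-starts W (cf. claim 5); fix_W=True keeps W constant and fits only H.
    """
    rng = random.Random(seed)
    n_freq, n_time = len(V), len(V[0])
    if W_init is not None:
        W = [row[:] for row in W_init]
    else:
        W = [[rng.random() + eps for _ in range(rank)] for _ in range(n_freq)]
    H = [[rng.random() + eps for _ in range(n_time)] for _ in range(rank)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)  # H <- H * (W^T V) / (W^T W H)
        H = [[h * nu / (d + eps) for h, nu, d in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        if not fix_W:
            Ht = transpose(H)
            num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))  # W <- W * (V H^T) / (W H H^T)
            W = [[w * nu / (d + eps) for w, nu, d in zip(wr, nr, dr)]
                 for wr, nr, dr in zip(W, num, den)]
    return W, H

def estimate_and_remove_noise(V_low, W, iters=200):
    """Fit activations for a fixed basis W; per claim 1, W @ H is treated as noise content."""
    _, H = nmf(V_low, len(W[0]), iters=iters, W_init=W, fix_W=True)
    noise = matmul(W, H)
    # Magnitude spectral subtraction (a simplification of claim 7's time-domain subtraction).
    cleaned = [[max(v - nz, 0.0) for v, nz in zip(vr, nr)] for vr, nr in zip(V_low, noise)]
    return cleaned, noise

def blend_filters(W_old, W_new, weight=0.5):
    """Hypothetical weighted combination of two learned bases (cf. claim 8)."""
    return [[weight * a + (1 - weight) * b for a, b in zip(ra, rb)]
            for ra, rb in zip(W_old, W_new)]
```

W is held fixed on the second sample, which mirrors the claim's "applying the first matrix" step. Only the time activations are re-estimated, so the reconstruction is limited to the spectral patterns learned from the earlier portion.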

Assignees

Univ Illinois, The Board Of Regents Of The Univ Of Illinois

Inventors

Classifications

  • characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques · CPC title

  • for discriminating voice from noise · CPC title

  • Voice signal separating · CPC title

  • the extracted parameters being spectral information of each sub-band · CPC title

  • Processing in the time domain · CPC title


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12444429B2 cover?
Systems and methods for extracting target speech from audio signals that may contain masking speech or other unwanted noise. Target speech is detected as elevated frequency content above a threshold frequency (5.6 kHz in the claims). The detected portions are used to generate audio filters, via non-negative matrix factorization, that remove noise from subsequently obtained audio.
Who is the assignee on this patent?
Univ Illinois, The Board Of Regents Of The Univ Of Illinois
What technology area does this patent fall under?
Primary CPC classification G10L21/0224. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 14, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).