Neural sidelobe canceller for target speech separation
US-12300261-B1 · May 13, 2025 · US
US2024428818A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2024428818-A1 |
| Application number | US-202418751015-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jun 21, 2024 |
| Priority date | Jun 23, 2023 |
| Publication date | Dec 26, 2024 |
| Grant date | — |
Official abstract text for this publication.
A method including identifying an audio capture device and a target direction associated with the audio capture device, detecting first audio associated with the target direction, enhancing the first audio using a machine learning model configured to detect audio associated with the target direction, optionally, detecting second audio associated with a direction different from the target direction, and optionally, diminishing the second audio using the machine learning model.
Opening claim text (preview).
What is claimed is:
1. A method comprising: identifying an audio capture device and a target direction associated with the audio capture device; detecting first audio associated with the target direction; and enhancing the first audio using a first machine learning model configured to detect audio associated with the target direction.
2. The method of claim 1, further comprising: detecting second audio associated with a direction different from the target direction; and diminishing the second audio using a second machine learning model.
3. The method of claim 2, wherein diminishing the second audio includes decreasing an amplitude of at least one sound wave associated with the second audio.
4. The method of claim 2, wherein diminishing the second audio includes attenuating the second audio by reducing a signal strength associated with the second audio.
5. The method of claim 2, wherein diminishing the second audio includes eliminating the second audio by removing the second audio from an output of the second machine learning model.
6. The method of claim 2, wherein the first machine learning model and the second machine learning model are a same machine learning model.
7. The method of claim 1, wherein enhancing the first audio includes increasing an amplitude of at least one sound wave associated with the first audio.
8. The method of claim 1, wherein enhancing the first audio includes de-reverbing the first audio by removing resonant frequencies from the first audio.
9. The method of claim 1, wherein enhancing the first audio includes de-noising the first audio by filtering the first audio.
10. The method of claim 1, wherein the target direction is associated with a focus region.
11. The method of claim 1, wherein the first machine learning model is trained to detect the audio associated with the target direction using an impulse response dataset.
12. The method of claim 1, wherein the enhancing of the first audio using the first machine learning model includes: compressing the first audio using a first machine learning model; and decompressing the compressed audio using a second machine learning model.
13. The method of claim 1, wherein the first machine learning model is a neural network model, the neural network model is configured to detect the audio associated with the target direction by training the neural network model, and training the neural network model includes: receiving first training data including at least one first audio signal; receiving second training data including at least one second audio signal; receiving an impulse response dataset; convolving the first training data with a first subset of the impulse response dataset as a first convolved audio, the first subset of the impulse response dataset being associated with the target direction; convolving the second training data with a second subset of the impulse response dataset as a second convolved audio; and training the neural network model based on the first convolved audio and the second convolved audio.
14. The method of claim 13, wherein training the neural network model includes training a first neural network model and a second neural network model, the first neural network model being associated with compressing the first audio as compressed first audio, and the second neural network model being associated with decompressing the compressed first audio.
15. The method of claim 13, wherein the first training data is associated with a focus region, and the first subset of the impulse response dataset represents an impulse response associated with the focus region.
16. A non-transitory computer-readable storage medium comprising instructions stored thereon that, when executed by a processor, are configured to cause a computing system to: identify an audio capture device and a target direction associated with the audio capture device; detect first audio associated with the target direction; enhance the first audio using a machine learning model configured to detect audio associated with the target direction; detect second audio associated with a direction different from the target direction; and diminish the second audio using the machine learning model.
17. The non-transitory computer-readable storage medium of claim 16, wherein diminishing the second audio includes one of decreasing an amplitude of the second audio, attenuating the second audio, or eliminating the second audio.
18. The non-transitory computer-readable storage medium of claim 16, wherein enhancing the first audio includes at least one of increasing an amplitude of the first audio, de-reverbing the first audio and de-noising the first audio.
19. The non-transitory computer-readable storage medium of claim 16, wherein the target direction is associated with a focus region.
20. The non-transitory computer-readable storage medium of claim 16, wherein the enhancing of the first audio using the machine learning model includes: compressing the first audio using a first machine learning model; and decompressing the compressed audio using a second machine learning model.
21. The non-transitory computer-readable storage medium of claim 16, wherein the machine learning model is a neural network model, the neural network model is configured to detect the audio associated with the target direction by training the neural network model, and training the neural network model includes: receiving first training data including at least one first audio signal; receiving second training data including at least one second audio signal; receiving an impulse response dataset; convolving the first training data with a first subset of the impulse response dataset as a first convolved audio, the first subset of the impulse response dataset being associated with the target direction; convolving the second training data with a second subset of the impulse response dataset as a second convolved audio; and training the neural network model based on the first convolved audio and the second convolved audio.
22. The non-transitory computer-readable storage medium of claim 21, wherein training the neural network model includes training a first neural network model and a second neural network model, the first neural network model being associated with compressing the first audio as compressed first audio, and the second neural network model being associated with decompressing the compressed first audio.
23. An apparatus comprising: at least one processor; and at least one memory including computer program code; the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to: identify an audio capture device and a target direction associated with the audio capture device; detect first audio associated with the target direction; enhance the first audio using a machine learning model configured to detect audio associated with the target direction; detect second audio associated with a direction different from the target direction; and diminish the second audio using the machine learning model.
24. The apparatus of claim 23, wherein diminishing the second audio includes one of decreasing an amplitude of the second audio, attenuating the second audio, or eliminating the second audio, and enhancing
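The training-data steps recited in claims 13 and 21 (convolving target-direction audio and other-direction audio with separate subsets of an impulse response dataset) might be sketched roughly as follows. This is an illustrative, single-channel simplification and not the patent's implementation: the function names `convolve` and `make_training_pair`, the choice to sum the two convolved signals into a mixture, and the use of the first convolved audio as the training target are all assumptions made for this example. The claims only state that the model is trained "based on" the two convolved signals.

```python
import random


def convolve(signal, impulse_response):
    """Full discrete linear convolution of two 1-D sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def make_training_pair(first_audio, second_audio, target_irs, other_irs, rng):
    """Build one (mixture, target) example (hypothetical helper).

    target_irs: impulse responses associated with the target direction
                (the "first subset" in the claim language).
    other_irs:  impulse responses for other directions (the "second subset").
    """
    # Convolve each training signal with an impulse response from its subset.
    first_convolved = convolve(first_audio, rng.choice(target_irs))
    second_convolved = convolve(second_audio, rng.choice(other_irs))

    # Assumption: zero-pad to a common length and sum to simulate what the
    # audio capture device would record; the clean target-direction signal
    # serves as the training target.
    n = max(len(first_convolved), len(second_convolved))
    first_padded = first_convolved + [0.0] * (n - len(first_convolved))
    second_padded = second_convolved + [0.0] * (n - len(second_convolved))
    mixture = [a + b for a, b in zip(first_padded, second_padded)]
    return mixture, first_padded


if __name__ == "__main__":
    rng = random.Random(0)
    mixture, target = make_training_pair(
        first_audio=[1.0, 2.0],
        second_audio=[2.0, 4.0],
        target_irs=[[1.0]],        # toy impulse response: direct path only
        other_irs=[[0.0, 0.5]],    # toy impulse response: delayed, attenuated
        rng=rng,
    )
    print(mixture, target)
```

A multi-microphone device would need one impulse response per microphone for each direction, and a real pipeline would typically use an FFT-based convolution for speed; both are omitted here to keep the sketch short.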
the noise being echo, reverberation of the speech · CPC title
for improving intelligibility · CPC title
Processing in the frequency domain · CPC title
Automatic adjustment · CPC title
Training · CPC title