Time-frequency directional processing of audio signals
US-2015086038-A1 · Mar 26, 2015 · US
US10090001B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10090001-B2 |
| Application number | US-201615225595-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 1, 2016 |
| Priority date | Aug 1, 2016 |
| Publication date | Oct 2, 2018 |
| Grant date | Oct 2, 2018 |
Official abstract text for this publication.
(ii) selecting speech included in the training accelerometer signal and in the training acoustic signal, and (iii) spatially localizing the speech by setting a weight parameter in the neural network based on the selected speech included in the training accelerometer signal and in the training acoustic signal. The neural network that is trained offline is then used to generate a speech reference signal based on an accelerometer signal from the at least one accelerometer and an acoustic signal received from the at least one microphone. Other embodiments are described.
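As a rough illustration of the offline step described in the abstract, the sketch below sets weights from labeled training frames and then uses them to produce a speech presence probability from one accelerometer feature and one microphone feature. It is a deliberately simplified stand-in, not the patented network: a single logistic unit trained by gradient descent replaces the MLP or CDNN, and the feature choice (per-frame energy), the toy data, and the names `speech_presence` and `train_offline` are all assumptions made for this example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def speech_presence(acc_feat, mic_feat, w, b):
    """Hypothetical single-unit stand-in for the network: fuse one
    accelerometer feature and one microphone feature into a probability."""
    return sigmoid(w[0] * acc_feat + w[1] * mic_feat + b)

def train_offline(frames, labels, lr=0.5, epochs=500):
    """Set the weights offline with per-sample gradient descent on the
    cross-entropy loss (an assumed training rule, not taken from the patent)."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (acc, mic), y in zip(frames, labels):
            err = speech_presence(acc, mic, w, b) - y
            w[0] -= lr * err * acc
            w[1] -= lr * err * mic
            b -= lr * err
    return w, b

# Made-up training frames: (accelerometer energy, microphone energy).
# Label 1 marks frames selected as speech, 0 marks noise-only frames.
frames = [(0.9, 0.8), (0.8, 0.9), (0.1, 0.9), (0.0, 0.1), (0.1, 0.7), (0.9, 0.6)]
labels = [1, 1, 0, 0, 0, 1]

w, b = train_offline(frames, labels)

# After training, the fixed weights yield a speech reference value per frame.
p_speech = speech_presence(0.85, 0.8, w, b)   # strong accelerometer pickup
p_noise = speech_presence(0.05, 0.45, w, b)   # microphone-only energy
```

In this toy setup the accelerometer feature dominates the decision because it is high only when the wearer speaks, which is the intuition behind combining the two sensors.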
Opening claim text (preview).
What is claimed is:

1. A system for performing speech enhancement using a Neural Network based combined signal comprising: at least one microphone to receive at least one of a near-end speaker signal and ambient noise signal, and to generate an acoustic signal; at least one accelerometer to receive at least one of the near-end speaker signal and the ambient noise signal, and to generate an accelerometer signal; and a neural network to receive the acoustic signal and the accelerometer signal, and to generate a speech reference signal, wherein the neural network is trained offline by: exciting the at least one accelerometer and the at least one microphone using a training accelerometer signal and a training acoustic signal, respectively, wherein the training accelerometer signal and the training acoustic signal have speech segments, selecting speech included in the training accelerometer signal and in the training acoustic signal, and spatially localizing the speech by setting a weight parameter in the neural network based on the selected speech included in the training accelerometer signal and in the training acoustic signal.
2. The system of claim 1, wherein the neural network provides spatial localization of features, weight sharing and sub sampling of hidden units.
3. The system of claim 1, wherein the neural network generates the speech reference signal based on the weight parameter set in the neural network.
4. The system of claim 1, wherein the speech reference signal includes at least one of: speech presence probabilities, artificial speech or artificial speech magnitude.
5. The system of claim 1, wherein the neural network is a multilayer perception (MLP) neural network or a convolution deep neural network (CDNN).
6. The system of claim 1, further comprising: a speech suppressor to receive the speech reference signal and the acoustic signal, and to generate a noise reference signal using spectral subtraction; and a noise suppressor to receive the acoustic signal, the noise reference signal, and the speech reference signal, and to generate an enhanced speech signal.
7. The system of claim 6, further comprising: a signal-to-noise ratio (SNR) detector that receives the enhanced speech signal, the noise reference signal and the acoustic signal to generate an SNR information signal; and a neural network training unit that receives the SNR information signal, generates an update signal based on the SNR information signal, and transmits the update signal to the neural network to cause updates to the weight parameter in the neural network.
8. The system of claim 7, wherein the neural network training unit causes in-the-field weight updates to the neural network.
9. A method of speech enhancement using a Neural Network based combined signal comprising: training a neural network offline, wherein training the neural network offline includes: exciting at least one accelerometer and at least one microphone using a training accelerometer signal and a training acoustic signal, respectively, wherein the training accelerometer signal and the training acoustic signal are correlated during clean speech segments, selecting speech included in the training accelerometer signal and in the training acoustic signal, and spatially localizing the speech by setting a weight parameter in the neural network based on the selected speech included in the training accelerometer signal and in the training acoustic signal; and generating by the neural network a speech reference signal based on an accelerometer signal from the at least one accelerometer and an acoustic signal received from the at least one microphone.
10. The method of claim 9, wherein the neural network provides spatial localization of features, weight sharing and subsampling of hidden units.
11. The method of claim 9, wherein the neural network generates the speech reference signal based on the weight parameter set in the neural network.
12. The method of claim 9, wherein the speech reference signal includes at least one of: speech presence probabilities, artificial speech or artificial speech magnitude.
13. The method of claim 9, wherein the neural network is a multilayer perception (MLP) neural network or a convolution deep neural network (CDNN).
14. The method of claim 9, wherein the at least one microphone receives at least one of a near-end speaker signal and ambient noise signal and generates an acoustic signal, and wherein the at least one accelerometer receives at least one of the near-end speaker signal and the ambient noise signal, and generates the accelerometer signal.
15. The method of claim 9, further comprising generating by a speech suppressor a noise reference signal using spectral subtraction of the speech reference signal from the acoustic signal; and generating an enhanced speech signal by a noise suppressor using the acoustic signal, the noise reference signal, and the speech reference signal.
16. The method of claim 15, further comprising: generating by a signal-to-noise ratio (SNR) detector an SNR information signal using the enhanced speech signal, the noise reference signal and the acoustic signal; and generating by a neural network training unit an update signal based on the SNR information signal; and transmitting the update signal to the neural network.
17. The method of claim 16, further comprising: updating by the neural network the weight parameter based on the update signal.
18. The method of claim 17, wherein the neural network training unit causes in-the-field weight updates to the neural network.
19. A computer-readable non-transitory storage medium have stored thereon instructions, which when executed by a processor, causes the processor to perform a method of speech enhancement using a Neural Network based combined signal comprising: training a neural network offline, wherein training the neural network offline includes: exciting at least one accelerometer and at least one microphone using a training accelerometer signal and a training acoustic signal, respectively, wherein the training accelerometer signal and the training acoustic signal are correlated during clean speech segments, selecting speech included in the training accelerometer signal and in the training acoustic signal, and spatially localizing the speech by setting a weight parameter in the neural network based on the selected speech included in the training accelerometer signal and in the training acoustic signal; and causing the neural network to generate a speech reference signal based on an accelerometer signal from the at least one accelerometer and an acoustic signal received from the at least one microphone.
20. The computer-readable storage medium of claim 19, having stored therein instructions, when executed by the processor, causes the processor to perform the method further comprising: generating a noise reference signal using spectral subtraction of the speech reference signal from the acoustic signal; and generating an enhanced speech signal using the acoustic signal, the noise reference signal, and the speech reference signal.
21. The computer-readable storage medium of claim 20, having stored therein instructions, when executed by the processor, causes the processor to perform the method further comprising: generating an SNR information signal using the enhanced speech signal, the noise reference signal and the acoustic signal; and generating an update signal based on the SNR i
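
The speech suppressor and noise suppressor stages recited in the claims can be pictured with a small per-bin sketch. The claims name spectral subtraction for the noise reference but do not state how the noise suppressor combines its three inputs, so the Wiener-style gain below is an assumption chosen for illustration, as are the function names `noise_reference` and `enhance` and the made-up magnitude values.

```python
def noise_reference(mic_mag, speech_mag):
    """Spectral subtraction: subtract the speech reference magnitude from the
    microphone magnitude in each frequency bin, flooring the result at zero."""
    return [max(m - s, 0.0) for m, s in zip(mic_mag, speech_mag)]

def enhance(mic_mag, noise_mag, speech_mag, eps=1e-12):
    """Apply a Wiener-style gain per bin (assumed rule, not from the patent):
    bins dominated by the speech reference pass, noisy bins are attenuated."""
    out = []
    for m, n, s in zip(mic_mag, noise_mag, speech_mag):
        gain = (s * s) / (s * s + n * n + eps)
        out.append(gain * m)
    return out

# Made-up magnitude spectra for one frame with four frequency bins.
mic_mag = [1.0, 0.5, 2.0, 0.3]      # acoustic signal
speech_mag = [0.8, 0.0, 2.0, 0.4]   # speech reference from the network

noise_mag = noise_reference(mic_mag, speech_mag)
enhanced = enhance(mic_mag, noise_mag, speech_mag)
```

Here the second bin carries no speech reference energy, so its gain is zero and it is removed entirely, while the third bin has no residual noise and passes essentially unchanged.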
for discriminating voice from noise · CPC title
using neural networks · CPC title
for transmitting results of analysis · CPC title
Processing in the frequency domain · CPC title
using properties of sound source · CPC title