System and method for performing speech enhancement using a neural network-based combined signal

US10090001B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10090001-B2
Application number: US-201615225595-A
Country: US
Kind code: B2
Filing date: Aug 1, 2016
Priority date: Aug 1, 2016
Publication date: Oct 2, 2018
Grant date: Oct 2, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

… (ii) selecting speech included in the training accelerometer signal and in the training acoustic signal, and (iii) spatially localizing the speech by setting a weight parameter in the neural network based on the selected speech included in the training accelerometer signal and in the training acoustic signal. The neural network that is trained offline is then used to generate a speech reference signal based on an accelerometer signal from the at least one accelerometer and an acoustic signal received from the at least one microphone. Other embodiments are described.
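As a rough illustration of the inference step the abstract describes, the sketch below feeds combined microphone and accelerometer features through a small multilayer network and returns a value in (0, 1) that could serve as a speech presence probability, one of the speech reference forms listed in the claims. The class `TinyMLP`, the function `speech_reference`, the layer sizes, the activations, and the random weights are all assumptions made for illustration; the patent text on this page does not specify a feature set or architecture details, and no offline training is shown.

```python
import math
import random
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Logistic function mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


class TinyMLP:
    """Hypothetical one-hidden-layer network; sizes and weights are illustrative."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.n_in = n_in
        # Random placeholders standing in for weights that would be set offline.
        self.w1: List[List[float]] = [
            [rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)
        ]
        self.b1: List[float] = [0.0] * n_hidden
        self.w2: List[float] = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2: float = 0.0

    def forward(self, x: Sequence[float]) -> float:
        if len(x) != self.n_in:
            raise ValueError("input length does not match network input size")
        # Hidden layer with tanh activation (an assumed choice).
        hidden = [
            math.tanh(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(self.w1, self.b1)
        ]
        # Single sigmoid output, read here as a speech presence probability.
        return sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)


def speech_reference(
    net: TinyMLP, mic_features: Sequence[float], accel_features: Sequence[float]
) -> float:
    """Concatenate both sensors' features and run them through the network."""
    combined = list(mic_features) + list(accel_features)
    return net.forward(combined)
```

A caller would build the network once, for example `TinyMLP(n_in=6, n_hidden=4)`, and call `speech_reference` per frame with three microphone features and three accelerometer features.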

First claim

Opening claim text (preview).

What is claimed is:

1. A system for performing speech enhancement using a Neural Network based combined signal comprising: at least one microphone to receive at least one of a near-end speaker signal and ambient noise signal, and to generate an acoustic signal; at least one accelerometer to receive at least one of the near-end speaker signal and the ambient noise signal, and to generate an accelerometer signal; and a neural network to receive the acoustic signal and the accelerometer signal, and to generate a speech reference signal, wherein the neural network is trained offline by: exciting the at least one accelerometer and the at least one microphone using a training accelerometer signal and a training acoustic signal, respectively, wherein the training accelerometer signal and the training acoustic signal have speech segments, selecting speech included in the training accelerometer signal and in the training acoustic signal, and spatially localizing the speech by setting a weight parameter in the neural network based on the selected speech included in the training accelerometer signal and in the training acoustic signal.

2. The system of claim 1, wherein the neural network provides spatial localization of features, weight sharing and sub sampling of hidden units.

3. The system of claim 1, wherein the neural network generates the speech reference signal based on the weight parameter set in the neural network.

4. The system of claim 1, wherein the speech reference signal includes at least one of: speech presence probabilities, artificial speech or artificial speech magnitude.

5. The system of claim 1, wherein the neural network is a multilayer perception (MLP) neural network or a convolution deep neural network (CDNN).

6. The system of claim 1, further comprising: a speech suppressor to receive the speech reference signal and the acoustic signal, and to generate a noise reference signal using spectral subtraction; and a noise suppressor to receive the acoustic signal, the noise reference signal, and the speech reference signal, and to generate an enhanced speech signal.

7. The system of claim 6, further comprising: a signal-to-noise ratio (SNR) detector that receives the enhanced speech signal, the noise reference signal and the acoustic signal to generate an SNR information signal; and a neural network training unit that receives the SNR information signal, generates an update signal based on the SNR information signal, and transmits the update signal to the neural network to cause updates to the weight parameter in the neural network.

8. The system of claim 7, wherein the neural network training unit causes in-the-field weight updates to the neural network.

9. A method of speech enhancement using a Neural Network based combined signal comprising: training a neural network offline, wherein training the neural network offline includes: exciting at least one accelerometer and at least one microphone using a training accelerometer signal and a training acoustic signal, respectively, wherein the training accelerometer signal and the training acoustic signal are correlated during clean speech segments, selecting speech included in the training accelerometer signal and in the training acoustic signal, and spatially localizing the speech by setting a weight parameter in the neural network based on the selected speech included in the training accelerometer signal and in the training acoustic signal; and generating by the neural network a speech reference signal based on an accelerometer signal from the at least one accelerometer and an acoustic signal received from the at least one microphone.

10. The method of claim 9, wherein the neural network provides spatial localization of features, weight sharing and subsampling of hidden units.

11. The method of claim 9, wherein the neural network generates the speech reference signal based on the weight parameter set in the neural network.

12. The method of claim 9, wherein the speech reference signal includes at least one of: speech presence probabilities, artificial speech or artificial speech magnitude.

13. The method of claim 9, wherein the neural network is a multilayer perception (MLP) neural network or a convolution deep neural network (CDNN).

14. The method of claim 9, wherein the at least one microphone receives at least one of a near-end speaker signal and ambient noise signal and generates an acoustic signal, and wherein the at least one accelerometer receives at least one of the near-end speaker signal and the ambient noise signal, and generates the accelerometer signal.

15. The method of claim 9, further comprising generating by a speech suppressor a noise reference signal using spectral subtraction of the speech reference signal from the acoustic signal; and generating an enhanced speech signal by a noise suppressor using the acoustic signal, the noise reference signal, and the speech reference signal.

16. The method of claim 15, further comprising: generating by a signal-to-noise ratio (SNR) detector an SNR information signal using the enhanced speech signal, the noise reference signal and the acoustic signal; and generating by a neural network training unit an update signal based on the SNR information signal; and transmitting the update signal to the neural network.

17. The method of claim 16, further comprising: updating by the neural network the weight parameter based on the update signal.

18. The method of claim 17, wherein the neural network training unit causes in-the-field weight updates to the neural network.

19. A computer-readable non-transitory storage medium have stored thereon instructions, which when executed by a processor, causes the processor to perform a method of speech enhancement using a Neural Network based combined signal comprising: training a neural network offline, wherein training the neural network offline includes: exciting at least one accelerometer and at least one microphone using a training accelerometer signal and a training acoustic signal, respectively, wherein the training accelerometer signal and the training acoustic signal are correlated during clean speech segments, selecting speech included in the training accelerometer signal and in the training acoustic signal, and spatially localizing the speech by setting a weight parameter in the neural network based on the selected speech included in the training accelerometer signal and in the training acoustic signal; and causing the neural network to generate a speech reference signal based on an accelerometer signal from the at least one accelerometer and an acoustic signal received from the at least one microphone.

20. The computer-readable storage medium of claim 19, having stored therein instructions, when executed by the processor, causes the processor to perform the method further comprising: generating a noise reference signal using spectral subtraction of the speech reference signal from the acoustic signal; and generating an enhanced speech signal using the acoustic signal, the noise reference signal, and the speech reference signal.

21. The computer-readable storage medium of claim 20, having stored therein instructions, when executed by the processor, causes the processor to perform the method further comprising: generating an SNR information signal using the enhanced speech signal, the noise reference signal and the acoustic signal; and generating an update signal based on the SNR i…
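The signal chain in claims 6, 7, 15 and 16 (speech suppressor, noise suppressor, SNR detector) can be sketched on per-bin magnitude spectra as follows. Only the use of spectral subtraction for the noise reference comes from the claims. The Wiener-style gain in `enhance`, the decibel formula in `snr_db`, and all function names, parameter names, and numeric values are assumptions chosen for illustration, and the inputs are assumed to be magnitude spectra of one frame computed elsewhere.

```python
import math
from typing import List, Sequence


def noise_reference(
    acoustic_mag: Sequence[float], speech_ref_mag: Sequence[float], floor: float = 0.0
) -> List[float]:
    """Spectral subtraction: remove the speech reference from the acoustic spectrum."""
    if len(acoustic_mag) != len(speech_ref_mag):
        raise ValueError("spectra must have the same number of bins")
    # Clamp at `floor` so a magnitude never goes negative.
    return [max(a - s, floor) for a, s in zip(acoustic_mag, speech_ref_mag)]


def enhance(
    acoustic_mag: Sequence[float],
    noise_ref_mag: Sequence[float],
    speech_ref_mag: Sequence[float],
    eps: float = 1e-12,
) -> List[float]:
    """Assumed noise suppressor: per-bin Wiener-style gain applied to the acoustic spectrum."""
    enhanced = []
    for a, n, s in zip(acoustic_mag, noise_ref_mag, speech_ref_mag):
        gain = (s * s) / (s * s + n * n + eps)  # near 1 for speech-dominated bins
        enhanced.append(gain * a)
    return enhanced


def snr_db(
    enhanced_mag: Sequence[float], noise_ref_mag: Sequence[float], eps: float = 1e-12
) -> float:
    """Assumed SNR detector: ratio of enhanced-speech energy to noise-reference energy in dB."""
    signal_energy = sum(e * e for e in enhanced_mag)
    noise_energy = sum(n * n for n in noise_ref_mag)
    return 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))


# Example frame with four frequency bins (made-up values).
acoustic = [1.0, 0.8, 0.3, 0.2]
speech_ref = [0.9, 0.7, 0.0, 0.5]
noise_ref = noise_reference(acoustic, speech_ref)
enhanced = enhance(acoustic, noise_ref, speech_ref)
snr = snr_db(enhanced, noise_ref)
```

In this sketch, a bin where the speech reference is zero is fully suppressed, while a bin where the noise reference is zero passes through almost unchanged. How the SNR value would be turned into the update signal of claims 7 and 16 is not described on this page and is left out.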

Assignees

Apple Inc

Inventors

Classifications

  • for discriminating voice from noise · CPC title

  • G10L25/30 · Primary

    using neural networks · CPC title

  • for transmitting results of analysis · CPC title

  • Processing in the frequency domain · CPC title

  • G10L21/028 · Primary

    using properties of sound source · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10090001B2 cover?
It covers a system, method, and storage medium for speech enhancement in which a neural network receives an acoustic signal from at least one microphone and an accelerometer signal from at least one accelerometer, and generates a speech reference signal. The neural network is trained offline by exciting the accelerometer and microphone with training signals, selecting the speech in those signals, and spatially localizing the speech by setting a weight parameter in the network.
Who is the assignee on this patent?
Apple Inc
What technology area does this patent fall under?
Primary CPC classification G10L25/30. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 2, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).