Method of analysing an audio signal
US-8990081-B2 · Mar 24, 2015 · US
US9805738B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9805738-B2 |
| Application number | US-201214423543-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 4, 2012 |
| Priority date | Sep 4, 2012 |
| Publication date | Oct 31, 2017 |
| Grant date | Oct 31, 2017 |
Official abstract text for this publication.
An arrangement is described for speech signal processing. An input microphone signal is received that includes a speech signal component and a noise component. The microphone signal is transformed into a frequency domain set of short-term spectra signals. Then speech formant components within the spectra signals are estimated based on detecting regions of high energy density in the spectra signals. One or more dynamically adjusted gain factors are applied to the spectra signals to enhance the speech formant components.
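The processing chain in the abstract (transform to short-term spectra, locate a high-energy region, apply gain around it) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names, the frame length, the 3-bin moving average, the Hann-shaped gain bump, and the fixed 6 dB boost are all assumptions chosen for brevity, and the gain here is static rather than dynamically adjusted.

```python
import numpy as np

def short_term_spectra(x, frame_len=256, hop=128):
    """Split x into Hann-windowed frames and return their one-sided FFTs."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)  # shape: (n_frames, frame_len // 2 + 1)

def high_energy_bin(spectrum, smooth_bins=3):
    """Return the bin where the lightly smoothed power spectrum is largest.

    A single peak stands in for 'regions of high energy density' (assumption).
    """
    power = np.abs(spectrum) ** 2
    kernel = np.ones(smooth_bins) / smooth_bins
    smoothed = np.convolve(power, kernel, mode="same")
    return int(np.argmax(smoothed))

def boost_around(spectrum, center_bin, max_gain_db=6.0, half_width=4):
    """Multiply the spectrum by a Hann-shaped gain bump centered on center_bin."""
    gains = np.ones(len(spectrum))
    bump = np.hanning(2 * half_width + 1)  # 0 at the edges, 1 at the center
    idx = np.arange(center_bin - half_width, center_bin + half_width + 1)
    keep = (idx >= 0) & (idx < len(spectrum))  # clip the bump at the band edges
    gains[idx[keep]] = 10 ** (max_gain_db * bump[keep] / 20)
    return spectrum * gains, gains

# Synthetic "microphone" signal: a 1 kHz tone plus weak noise (hypothetical data).
fs = 8000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
mic = np.sin(2 * np.pi * 1000 * t) + 0.01 * rng.standard_normal(fs)

spectra = short_term_spectra(mic)
peak = high_energy_bin(spectra[0])
enhanced, gains = boost_around(spectra[0], peak)
```

With an 8 kHz sampling rate and 256-sample frames, each bin spans 31.25 Hz, so the 1 kHz tone falls on bin 32. A real system would track several formants per frame and vary the gain over time, as the claims describe.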
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method employing at least one hardware implemented computer processor for speech signal processing comprising: receiving an input microphone signal having a speech signal component and a noise component; transforming the microphone signal into a frequency domain set of short term spectra signals; estimating speech formant components within the spectra signals based on detecting regions of high energy density in the spectra signals; applying one or more dynamically adjusted gain factors to the spectra signals to enhance the speech formant components only during voiced speech phonemes and on the speech formant components having signal-to-noise ratio above a threshold; adjusting the gain factors around a center frequency of the speech formant components based upon a presumed reliability of the estimation of the speech formant components, including adjusting the gain factors to boost the speech formant components more for higher reliability formant estimations than lower reliability formant estimations; and requiring a minimum clearance between ones of the speech formant components.
2. The method according to claim 1, wherein the speech formant components are estimated based on finding spectral peaks using a linear predictive coding filter.
3. The method according to claim 1, wherein the speech formant components are estimated based on infinite impulse response smoothing of the spectral signals using a plurality of different smoothing constants.
4. The method according to claim 1, wherein the gain factors are based on shaped windows concentrated on frequency regions corresponding to the speech formant components.
5. The method according to claim 4, wherein the shaped windows are dynamically adjusted as a function of a corresponding phoneme associated with the speech signal component.
6. The method according to claim 4, wherein the shaped windows are dynamically adjusted as a function of a signal to noise ratio of the microphone signal.
7. The method according to claim 1, wherein the gain factors are applied to underestimate the noise component so as to reduce speech distortion in formant regions of the spectra signals.
8. The method according to claim 1, further comprising: combining the gain factors with one or more noise suppression coefficients to increase broadband signal to noise ratio.
9. The method according to claim 1, further comprising: outputting the formant enhanced spectra signals to at least one of a mobile telephony application and a speech recognition application.
10. The method according to claim 1, wherein local maxima are determined by finding zeros of a derivative of the spectra signals after smoothing.
11. The method according to claim 1, further including applying the one or more dynamically adjusted gain factors at a substantial center of the respective speech formant components.
12. The method according to claim 1, wherein the speech signal component comprises non-whispered speech.
13. A speech signal processing system comprising: a speech signal input for receiving a microphone signal having a speech signal component and a noise component; a signal pre-processor for transforming the microphone signal into a frequency domain set of short term spectra signals; a formant estimating module for estimating speech formant components within the spectra signals based on detecting regions of high energy density in the spectra signals; and a formant enhancement module for applying one or more dynamically adjusted gain factors to the spectra signals to enhance the speech formant components only during voiced speech phonemes and on the speech formant components having signal-to-noise ratio above a threshold and for adjusting the gain factors around a center frequency of the speech formant components based upon a presumed reliability of the estimation of the speech formant components, wherein the gain factors are adjusted to boost the speech formant components more for higher reliability formant estimations than lower reliability formant estimations, and wherein there is a minimum clearance between ones of the speech formant components.
14. The system according to claim 13, wherein the formant estimating module estimates the speech formant components based on finding spectral peaks in a linear predictive coding filter.
15. The system according to claim 13, wherein the formant estimating module estimates the speech formant components based on infinite impulse response smoothing of the spectral signals using a plurality of different smoothing constants.
16. The system according to claim 13, wherein the gain factors are based on shaped windows concentrated on frequency regions corresponding to the speech formant components.
17. The system according to claim 16, the formant enhancement module dynamically adjusts the shaped windows as a function of a corresponding phoneme associated with the speech signal component.
18. The system according to claim 16, wherein the formant enhancement module dynamically adjusts the shaped windows as a function of a signal to noise ratio of the microphone signal.
19. The system according to claim 13, wherein the formant enhancement module applies the gain factors to underestimate the noise component so as to reduce speech distortion in formant regions of the spectra signals.
20. The system according to claim 13, wherein the formant enhancement module further combines the gain factors with one or more noise suppression coefficients to increase broadband signal to noise ratio.
21. The system according to claim 13, further comprising: a processing output for providing the formant enhanced spectra signals to at least one of a mobile telephony application and a speech recognition application.
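Several claim elements lend themselves to a small worked sketch: smoothing with more than one IIR constant (claims 3 and 15), locating local maxima where the derivative of the smoothed spectrum changes sign (claim 10), the minimum clearance between formants, and reliability-scaled shaped-window gains applied only to voiced frames above an SNR threshold (claims 1, 4, 13, 16). The claims do not say how these pieces are combined, so everything below is an assumption for illustration: the function names, the comparison of a fine against a coarse smoothing, the Hann window shape, and every numeric default are hypothetical.

```python
import numpy as np

def iir_smooth(values, alpha):
    """First-order IIR smoothing along frequency, forward then backward."""
    out = np.asarray(values, dtype=float).copy()
    for k in range(1, len(out)):            # forward pass
        out[k] = alpha * out[k - 1] + (1 - alpha) * out[k]
    for k in range(len(out) - 2, -1, -1):   # backward pass removes the lag
        out[k] = alpha * out[k + 1] + (1 - alpha) * out[k]
    return out

def find_formant_bins(power_db, alpha_fine=0.5, alpha_coarse=0.9, min_clearance=8):
    """Return candidate formant bins from a dB power spectrum (assumed scheme)."""
    fine = iir_smooth(power_db, alpha_fine)
    coarse = iir_smooth(power_db, alpha_coarse)
    slope = np.diff(fine)
    # Local maxima: the slope goes from positive to non-positive.
    peaks = np.where((slope[:-1] > 0) & (slope[1:] <= 0))[0] + 1
    # Keep peaks that rise above the coarser envelope (high energy density).
    peaks = [int(p) for p in peaks if fine[p] > coarse[p]]
    # Enforce the minimum clearance, keeping the strongest peaks first.
    kept = []
    for p in sorted(peaks, key=lambda p: fine[p], reverse=True):
        if all(abs(p - q) >= min_clearance for q in kept):
            kept.append(p)
    return sorted(kept)

def formant_gains(n_bins, formants, reliability, snr_db, voiced,
                  snr_threshold_db=5.0, max_boost_db=6.0, half_width=4):
    """Linear gain per bin: Hann-shaped boosts scaled by reliability in [0, 1]."""
    gains_db = np.zeros(n_bins)
    if not voiced:                           # boost only voiced frames
        return np.ones(n_bins)
    shape = np.hanning(2 * half_width + 1)   # peak value 1 at the center
    for center, rel in zip(formants, reliability):
        if snr_db[center] <= snr_threshold_db:
            continue                         # skip formants buried in noise
        idx = np.arange(center - half_width, center + half_width + 1)
        keep = (idx >= 0) & (idx < n_bins)
        bump = max_boost_db * rel * shape[keep]
        gains_db[idx[keep]] = np.maximum(gains_db[idx[keep]], bump)
    return 10 ** (gains_db / 20)

# Hypothetical spectrum: two bumps at bins 20 and 60 on a flat 0 dB floor.
k = np.arange(129)
power_db = (20 * np.exp(-0.5 * ((k - 20) / 3) ** 2)
            + 15 * np.exp(-0.5 * ((k - 60) / 3) ** 2))
formants = find_formant_bins(power_db)
gains = formant_gains(len(k), formants, reliability=[1.0, 0.5],
                      snr_db=np.full(len(k), 10.0), voiced=True)
```

In this toy case the more reliable formant receives the full 6 dB boost at its center and the less reliable one receives 3 dB. Raising `min_clearance` above the spacing of the two bumps leaves only the stronger one.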
Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients · CPC title
the extracted parameters being spectral information of each sub-band · CPC title
Codebook for LPC parameters · CPC title
Processing in the frequency domain · CPC title
Speech enhancement, e.g. noise reduction or echo cancellation (reducing echo effects in line transmission systems H04B3/20; echo suppression in hands-free telephones H04M9/08) · CPC title