Formant dependent speech signal enhancement

US9805738B2 · US · B2

Patent metadata
Field · Value
Publication number · US-9805738-B2
Application number · US-201214423543-A
Country · US
Kind code · B2
Filing date · Sep 4, 2012
Priority date · Sep 4, 2012
Publication date · Oct 31, 2017
Grant date · Oct 31, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An arrangement is described for speech signal processing. An input microphone signal is received that includes a speech signal component and a noise component. The microphone signal is transformed into a frequency domain set of short-term spectra signals. Then speech formant components within the spectra signals are estimated based on detecting regions of high energy density in the spectra signals. One or more dynamically adjusted gain factors are applied to the spectra signals to enhance the speech formant components.
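
The processing chain in the abstract can be illustrated for a single frame with a minimal sketch. This is not the patented implementation: the function names (`short_term_spectrum`, `smooth`, `enhance_frame`), the Hann window, the moving-average smoothing, the single-peak search, and the fixed `boost` factor are all assumptions made for brevity, whereas the abstract describes dynamically adjusted gain factors.

```python
import cmath
import math


def short_term_spectrum(frame):
    """Naive DFT of one Hann-windowed frame; returns bins 0..N/2."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    return [sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n))
            for k in range(n // 2 + 1)]


def smooth(values, width):
    """Moving average across frequency bins (assumed smoother, not from the patent)."""
    half = width // 2
    out = []
    for k in range(len(values)):
        lo, hi = max(0, k - half), min(len(values), k + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def enhance_frame(frame, boost=2.0, width=3):
    """Boost the bins around the strongest high-energy region of one frame."""
    spectrum = short_term_spectrum(frame)
    power = [abs(c) ** 2 for c in spectrum]
    envelope = smooth(power, width)
    # The strongest interior bin of the smoothed envelope stands in for a
    # formant centre; a real system would track several formants.
    peak = max(range(1, len(envelope) - 1), key=lambda k: envelope[k])
    gains = [1.0] * len(spectrum)
    for k in range(max(0, peak - width), min(len(spectrum), peak + width + 1)):
        gains[k] = boost  # fixed gain here; the abstract calls for dynamic gains
    return peak, [g * c for g, c in zip(gains, spectrum)]
```

Calling `enhance_frame` on a frame of samples returns the bin index chosen as the formant centre and the gain-weighted spectrum. The naive DFT keeps the example dependency-free; an FFT routine would normally replace it.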

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method employing at least one hardware implemented computer processor for speech signal processing comprising:
    receiving an input microphone signal having a speech signal component and a noise component;
    transforming the microphone signal into a frequency domain set of short term spectra signals;
    estimating speech formant components within the spectra signals based on detecting regions of high energy density in the spectra signals;
    applying one or more dynamically adjusted gain factors to the spectra signals to enhance the speech formant components only during voiced speech phonemes and on the speech formant components having signal-to-noise ratio above a threshold;
    adjusting the gain factors around a center frequency of the speech formant components based upon a presumed reliability of the estimation of the speech formant components, including adjusting the gain factors to boost the speech formant components more for higher reliability formant estimations than lower reliability formant estimations; and
    requiring a minimum clearance between ones of the speech formant components.

2. The method according to claim 1, wherein the speech formant components are estimated based on finding spectral peaks using a linear predictive coding filter.

3. The method according to claim 1, wherein the speech formant components are estimated based on infinite impulse response smoothing of the spectral signals using a plurality of different smoothing constants.

4. The method according to claim 1, wherein the gain factors are based on shaped windows concentrated on frequency regions corresponding to the speech formant components.

5. The method according to claim 4, wherein the shaped windows are dynamically adjusted as a function of a corresponding phoneme associated with the speech signal component.

6. The method according to claim 4, wherein the shaped windows are dynamically adjusted as a function of a signal to noise ratio of the microphone signal.

7. The method according to claim 1, wherein the gain factors are applied to underestimate the noise component so as to reduce speech distortion in formant regions of the spectra signals.

8. The method according to claim 1, further comprising: combining the gain factors with one or more noise suppression coefficients to increase broadband signal to noise ratio.

9. The method according to claim 1, further comprising: outputting the formant enhanced spectra signals to at least one of a mobile telephony application and a speech recognition application.

10. The method according to claim 1, wherein local maxima are determined by finding zeros of a derivative of the spectra signals after smoothing.

11. The method according to claim 1, further including applying the one or more dynamically adjusted gain factors at a substantial center of the respective speech formant components.

12. The method according to claim 1, wherein the speech signal component comprises non-whispered speech.

13. A speech signal processing system comprising:
    a speech signal input for receiving a microphone signal having a speech signal component and a noise component;
    a signal pre-processor for transforming the microphone signal into a frequency domain set of short term spectra signals;
    a formant estimating module for estimating speech formant components within the spectra signals based on detecting regions of high energy density in the spectra signals; and
    a formant enhancement module for applying one or more dynamically adjusted gain factors to the spectra signals to enhance the speech formant components only during voiced speech phonemes and on the speech formant components having signal-to-noise ratio above a threshold and for adjusting the gain factors around a center frequency of the speech formant components based upon a presumed reliability of the estimation of the speech formant components, wherein the gain factors are adjusted to boost the speech formant components more for higher reliability formant estimations than lower reliability formant estimations, and wherein there is a minimum clearance between ones of the speech formant components.

14. The system according to claim 13, wherein the formant estimating module estimates the speech formant components based on finding spectral peaks in a linear predictive coding filter.

15. The system according to claim 13, wherein the formant estimating module estimates the speech formant components based on infinite impulse response smoothing of the spectral signals using a plurality of different smoothing constants.

16. The system according to claim 13, wherein the gain factors are based on shaped windows concentrated on frequency regions corresponding to the speech formant components.

17. The system according to claim 16, the formant enhancement module dynamically adjusts the shaped windows as a function of a corresponding phoneme associated with the speech signal component.

18. The system according to claim 16, wherein the formant enhancement module dynamically adjusts the shaped windows as a function of a signal to noise ratio of the microphone signal.

19. The system according to claim 13, wherein the formant enhancement module applies the gain factors to underestimate the noise component so as to reduce speech distortion in formant regions of the spectra signals.

20. The system according to claim 13, wherein the formant enhancement module further combines the gain factors with one or more noise suppression coefficients to increase broadband signal to noise ratio.

21. The system according to claim 13, further comprising: a processing output for providing the formant enhanced spectra signals to at least one of a mobile telephony application and a speech recognition application.
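
Several claim elements can be made concrete with a small sketch: smoothing with more than one constant (claims 3 and 15), peak finding via the sign change of a derivative (claim 10), the signal-to-noise threshold, minimum clearance, voiced-only gating and reliability-scaled boost (claims 1 and 13), and shaped gain windows (claims 4 and 16). Everything specific here is an assumption for illustration and does not come from the patent text: the function names, the forward-backward first-order smoother, the rule that a candidate must rise above the more heavily smoothed curve, the raised-cosine window, and all numeric defaults (`alphas`, `snr_threshold_db`, `min_clearance`, `max_boost_db`, `half_width`).

```python
import math


def iir_smooth(values, alpha):
    """First-order IIR smoothing over frequency bins, run forward then
    backward so that peaks are not dragged toward higher bins."""
    out = list(values)
    for k in range(1, len(out)):
        out[k] = alpha * out[k - 1] + (1 - alpha) * out[k]
    for k in range(len(out) - 2, -1, -1):
        out[k] = alpha * out[k + 1] + (1 - alpha) * out[k]
    return out


def local_maxima(values):
    """Bins where the discrete derivative goes from positive to non-positive."""
    return [k for k in range(1, len(values) - 1)
            if values[k] - values[k - 1] > 0 and values[k + 1] - values[k] <= 0]


def pick_formants(power_db, noise_db, snr_threshold_db=6.0,
                  min_clearance=4, alphas=(0.5, 0.8)):
    """Return candidate formant bins (hypothetical selection rule)."""
    fine = iir_smooth(power_db, alphas[0])    # light smoothing
    coarse = iir_smooth(power_db, alphas[1])  # heavy smoothing
    candidates = [k for k in local_maxima(fine)
                  if fine[k] > coarse[k]
                  and power_db[k] - noise_db[k] > snr_threshold_db]
    kept = []
    # Strongest candidates win; weaker ones closer than min_clearance are dropped.
    for k in sorted(candidates, key=lambda k: fine[k], reverse=True):
        if all(abs(k - other) >= min_clearance for other in kept):
            kept.append(k)
    return sorted(kept)


def formant_gains(n_bins, formants, reliability, voiced=True,
                  max_boost_db=6.0, half_width=3):
    """Per-bin gain in dB: one raised-cosine window per formant, scaled by a
    reliability value in [0, 1]; no boost at all for unvoiced frames."""
    gains_db = [0.0] * n_bins
    if not voiced:
        return gains_db
    for centre, rel in zip(formants, reliability):
        lo, hi = max(0, centre - half_width), min(n_bins, centre + half_width + 1)
        for k in range(lo, hi):
            shape = 0.5 * (1 + math.cos(math.pi * (k - centre) / (half_width + 1)))
            gains_db[k] = max(gains_db[k], rel * max_boost_db * shape)
    return gains_db
```

In this sketch a formant with reliability 1.0 receives the full `max_boost_db` at its centre bin, while one with reliability 0.5 receives half of it, which mirrors the claimed behaviour of boosting more reliable estimates more strongly. How reliability and voicing are actually derived is left open here.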

Assignees

Inventors

Classifications

  • Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients · CPC title

  • G10L25/18 · Primary

    the extracted parameters being spectral information of each sub-band · CPC title

  • Codebook for LPC parameters · CPC title

  • Processing in the frequency domain · CPC title

  • G10L21/02 · Primary

    Speech enhancement, e.g. noise reduction or echo cancellation (reducing echo effects in line transmission systems H04B3/20; echo suppression in hands-free telephones H04M9/08) · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9805738B2 cover?
An arrangement is described for speech signal processing. An input microphone signal is received that includes a speech signal component and a noise component. The microphone signal is transformed into a frequency domain set of short-term spectra signals. Then speech formant components within the spectra signals are estimated based on detecting regions of high energy density in the spectra sign…
Who is the assignee on this patent?
Krini Mohamed, Schalk-Schupp Ingo, Buck Markus, and 1 more
What technology area does this patent fall under?
Primary CPC classification G10L25/18. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 31, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).