Audio processing device and a method for estimating a signal-to-noise-ratio of a sound signal
US-2021058713-A1 · Feb 25, 2021 · US
US12597433B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12597433-B2 |
| Application number | US-202318484927-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 11, 2023 |
| Priority date | Apr 16, 2021 |
| Publication date | Apr 7, 2026 |
| Grant date | Apr 7, 2026 |
Official abstract text for this publication.
A speech signal enhancement method includes: performing noise reduction processing on a first speech signal according to a first time-frequency spectrum and a first power spectrum to obtain a second speech signal, where the first time-frequency spectrum is used to indicate a time domain feature and a frequency domain feature of the first speech signal, and the first power spectrum is a power spectrum of a noise signal in the first speech signal; determining a voiced signal in the second speech signal, and performing gain compensation on the voiced signal; and determining a damage compensation gain of the second speech signal according to the voiced signal on which the gain compensation has been performed, and performing gain compensation on the second speech signal based on the damage compensation gain.
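The voiced-signal compensation and damage compensation gain summarized in the abstract can be illustrated for a single denoised frame. The sketch below is a minimal illustration in plain Python, not the patented implementation. It follows the cepstral route given in claims 1 and 4: forward homomorphic analysis, amplification of the largest cepstral coefficient, inverse homomorphic analysis, and a gain taken from the log-spectrum difference. The function names, the quefrency search range (`q_min`, `q_max`, with `q_max` at most half the frame length), the `boost` factor, the conversion of the log difference to a linear gain with `exp`, and the naive DFT are all assumptions made for this example.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) DFT, used only to keep the example dependency-free."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def damage_compensation_gain(magnitude, q_min, q_max, threshold, boost):
    """Return a per-bin linear gain for one denoised frame.

    magnitude: full symmetric magnitude spectrum of the frame (length N).
    q_min, q_max: hypothetical quefrency search range, q_max <= N // 2.
    threshold: cepstral value at or above which the frame counts as voiced.
    boost: hypothetical amplification factor for the peak coefficient.
    """
    n = len(magnitude)
    log_mag = [math.log(max(m, 1e-12)) for m in magnitude]

    # Forward homomorphic analysis: cepstrum = inverse DFT of the log magnitude.
    cep = [c.real for c in dft(log_mag, inverse=True)]

    # Largest cepstral coefficient inside the search range.
    peak = max(range(q_min, q_max + 1), key=lambda q: cep[q])
    if cep[peak] < threshold:
        return [1.0] * n  # treated as unvoiced: no compensation

    # Amplify the peak; mirror it so the rebuilt log spectrum stays real.
    boosted = list(cep)
    boosted[peak] = cep[peak] * boost
    boosted[n - peak] = boosted[peak]

    # Inverse homomorphic analysis: forward DFT of the modified cepstrum.
    new_log = [v.real for v in dft(boosted)]

    # Gain from the difference of the two log spectra (exp is an assumption).
    return [math.exp(a - b) for a, b in zip(new_log, log_mag)]


# Synthetic frame whose log spectrum has a harmonic ripple at quefrency 8.
N = 64
frame = [math.exp(1.0 + 0.5 * math.cos(2 * math.pi * 8 * k / N)) for k in range(N)]
gain = damage_compensation_gain(frame, q_min=2, q_max=32, threshold=0.1, boost=2.0)
flat_gain = damage_compensation_gain([2.0] * N, q_min=2, q_max=32, threshold=0.1, boost=2.0)
```

In this synthetic case the gain rises above 1 at the ripple peaks and falls below 1 between them, while the flat frame is left unchanged because its largest cepstral coefficient stays under the threshold. Multiplying the denoised spectrum by `gain` would correspond to the final gain compensation step.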
Opening claim text (preview).
What is claimed is:
1. A speech signal enhancement method, comprising: performing noise reduction processing on a first speech signal according to a first time-frequency spectrum and a first power spectrum to obtain a second speech signal, wherein the first time-frequency spectrum is used to indicate a time domain feature and a frequency domain feature of the first speech signal, and the first power spectrum is a power spectrum of a noise signal in the first speech signal; determining a voiced signal in the second speech signal, and performing gain compensation on the voiced signal, wherein the voiced signal is a signal with a cepstral coefficient greater than or equal to a preset threshold in the second speech signal; and determining a damage compensation gain of the second speech signal according to the voiced signal on which the gain compensation has been performed, and performing gain compensation on the second speech signal based on the damage compensation gain; wherein the determining a voiced signal in the second speech signal, and performing gain compensation on the voiced signal comprises: performing homomorphic positive analysis processing on the second speech signal to obtain a target cepstral coefficient of the second speech signal; determining a maximum cepstral coefficient in the target cepstral coefficient, and determining a signal corresponding to the maximum cepstral coefficient in the second speech signal as the voiced signal; and performing gain amplification processing on the maximum cepstral coefficient, to perform gain compensation on the voiced signal.
2. The method according to claim 1, wherein before the performing noise reduction processing on a first speech signal according to a first time-frequency spectrum and a first power spectrum, the method further comprises: performing a short-time Fourier transform on the first speech signal to obtain the first time-frequency spectrum; determining a power spectrum of the first speech signal according to the first time-frequency spectrum, and determining a target power spectrum in the power spectrum of the first speech signal, wherein the target power spectrum is a power spectrum of a signal with a smallest power spectrum in signals within a preset time window; and performing recursive smoothing processing on the target power spectrum to obtain the first power spectrum.
3. The method according to claim 1, wherein the performing noise reduction processing on a first speech signal according to a first time-frequency spectrum and a first power spectrum comprises: determining a posterior signal-to-noise ratio corresponding to the first speech signal according to the first power spectrum and the power spectrum of the first speech signal, and performing recursive smoothing processing on the posterior signal-to-noise ratio to obtain a prior signal-to-noise ratio corresponding to the first speech signal; determining a target noise reduction gain according to the posterior signal-to-noise ratio and the prior signal-to-noise ratio; and performing noise reduction processing on the first speech signal according to the first time-frequency spectrum and the target noise reduction gain.
4. The method according to claim 1, wherein the determining a damage compensation gain of the second speech signal according to the voiced signal on which the gain compensation has been performed comprises: performing homomorphic inverse analysis processing on a first cepstral coefficient and the maximum cepstral coefficient on which the gain amplification processing has been performed, to obtain a first logarithmic time-frequency spectrum, wherein the first cepstral coefficient is a cepstral coefficient in the target cepstral coefficient other than the maximum cepstral coefficient; and determining a logarithmic time-frequency spectrum of the second speech signal according to a time-frequency spectrum of the second speech signal, and determining the damage compensation gain according to a difference between the first logarithmic time-frequency spectrum and the logarithmic time-frequency spectrum of the second speech signal.
5. The method according to claim 1, wherein the second speech signal is a signal obtained by performing noise reduction processing on a target frequency domain signal, and the target frequency domain signal is a signal obtained by performing a short-time Fourier transform on the first speech signal; and after the performing gain compensation on the second speech signal based on the damage compensation gain, the method further comprises: performing time-frequency inverse transform processing on the second speech signal on which the gain compensation has been performed, to obtain a target time domain signal, and outputting the target time domain signal.
6. A chip, comprising a processor and a communication interface, wherein the communication interface is coupled to the processor, and the processor is configured to run a program or an instruction to implement the speech signal enhancement method according to claim 1.
7. An electronic device, comprising a processor, a memory, and a program or an instruction stored in the memory and runnable on the processor, wherein the program or the instruction is executed by the processor to implement: performing noise reduction processing on a first speech signal according to a first time-frequency spectrum and a first power spectrum to obtain a second speech signal, wherein the first time-frequency spectrum is used to indicate a time domain feature and a frequency domain feature of the first speech signal, and the first power spectrum is a power spectrum of a noise signal in the first speech signal; determining a voiced signal in the second speech signal, and performing gain compensation on the voiced signal, wherein the voiced signal is a signal with a cepstral coefficient greater than or equal to a preset threshold in the second speech signal; and determining a damage compensation gain of the second speech signal according to the voiced signal on which the gain compensation has been performed, and performing gain compensation on the second speech signal based on the damage compensation gain; wherein the determining a voiced signal in the second speech signal, and performing gain compensation on the voiced signal comprises: performing homomorphic positive analysis processing on the second speech signal to obtain a target cepstral coefficient of the second speech signal; determining a maximum cepstral coefficient in the target cepstral coefficient, and determining a signal corresponding to the maximum cepstral coefficient in the second speech signal as the voiced signal; and performing gain amplification processing on the maximum cepstral coefficient, to perform gain compensation on the voiced signal.
8. The electronic device according to claim 7, wherein before the performing noise reduction processing on a first speech signal according to a first time-frequency spectrum and a first power spectrum, the method further comprises: performing a short-time Fourier transform on the first speech signal to obtain the first time-frequency spectrum; determining a power spectrum of the first speech signal according to the first time-frequency spectrum, and determining a target power spectrum in the power spectrum of the first speech signal, wherein the target power spectrum is a power spectrum of a signal with a smallest power spectrum in signals within a preset time window; and performing recursive smoothing processing on the target power spectrum to obtain the first power spectrum.
9. The electronic device according to claim 7, wherein the performing noise reduction processing on a first speech s
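The noise-estimation and noise-reduction steps recited in claims 2 and 3 (repeated in claim 8) can be sketched for a single frequency bin tracked across frames. The following is a rough illustration under stated assumptions, not the method as implemented by the applicant. The function names, the sliding-window minimum, the first-order recursive smoothers with factors `alpha` and `beta`, the use of `max(posterior - 1, 0)` as the smoothed quantity, and the Wiener-style gain `prior / (1 + prior)` are all choices made for this example. The claims only state that the gain is determined from the posterior and prior signal-to-noise ratios and do not fix a formula.

```python
def estimate_noise(power, window, alpha):
    """Noise power for one bin: windowed minimum, then recursive smoothing.

    power: per-frame power values of the bin (from the STFT power spectrum).
    window: number of frames in the minimum-search window (assumed parameter).
    alpha: smoothing factor in [0, 1) (assumed parameter).
    """
    noise = []
    smoothed = None
    for t in range(len(power)):
        start = max(0, t - window + 1)
        local_min = min(power[start:t + 1])  # smallest power inside the window
        if smoothed is None:
            smoothed = local_min
        else:
            smoothed = alpha * smoothed + (1 - alpha) * local_min
        noise.append(smoothed)
    return noise


def noise_reduction_gain(power, noise, beta, floor=1e-12):
    """Per-frame gain for one bin from posterior and prior SNR estimates."""
    gains = []
    prior = None
    for p, n in zip(power, noise):
        posterior = p / max(n, floor)      # posterior SNR
        instant = max(posterior - 1.0, 0.0)
        # Prior SNR as a recursive average of the posterior-derived value.
        prior = instant if prior is None else beta * prior + (1 - beta) * instant
        gains.append(prior / (1.0 + prior))  # Wiener-style gain (assumption)
    return gains


# One bin: steady noise at power 1.0 with a two-frame speech burst at 9.0.
power = [1.0, 1.0, 1.0, 9.0, 9.0, 1.0]
noise = estimate_noise(power, window=3, alpha=0.5)
gains = noise_reduction_gain(power, noise, beta=0.5)
```

With a three-frame window the burst never fills the whole window, so the noise estimate stays at the noise floor, and the gain opens only during and just after the burst. Multiplying each bin of the first time-frequency spectrum by its gain would give the spectrum of the second speech signal.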
the extracted parameters being power information · CPC title
Discriminating between voiced and unvoiced parts of speech signals (G10L25/90 takes precedence) · CPC title
the extracted parameters being the cepstrum · CPC title
Processing in the frequency domain · CPC title
Processing in the time domain · CPC title