Speech signal enhancement method and apparatus, and electronic device

US12597433B2 · US · B2

Patent metadata
Publication number: US-12597433-B2
Application number: US-202318484927-A
Country: US
Kind code: B2
Filing date: Oct 11, 2023
Priority date: Apr 16, 2021
Publication date: Apr 7, 2026
Grant date: Apr 7, 2026

Abstract

Official abstract text for this publication.

A speech signal enhancement method includes: performing noise reduction processing on a first speech signal according to a first time-frequency spectrum and a first power spectrum to obtain a second speech signal, where the first time-frequency spectrum is used to indicate a time domain feature and a frequency domain feature of the first speech signal, and the first power spectrum is a power spectrum of a noise signal in the first speech signal; determining a voiced signal in the second speech signal, and performing gain compensation on the voiced signal; and determining a damage compensation gain of the second speech signal according to the voiced signal on which the gain compensation has been performed, and performing gain compensation on the second speech signal based on the damage compensation gain.
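As a rough illustration of the cepstral steps the abstract summarizes, the sketch below processes one denoised frame. It takes the real cepstrum of the log-magnitude spectrum, boosts the largest coefficient in an assumed pitch range, and derives a per-bin gain from the difference between the boosted and the original log spectra. The function name `damage_compensation_gain`, the quefrency search range, the boost factor, and the threshold value are assumptions made for illustration; none of them is taken from the patent.

```python
import numpy as np

def damage_compensation_gain(mag, peak_boost=1.5, threshold=0.1,
                             min_quefrency=20, max_quefrency=160):
    """Per-bin compensation gain for one denoised frame (illustrative sketch).

    mag: magnitude spectrum of the frame, shape (n_fft // 2 + 1,).
    Assumes n_fft is at least 2 * max_quefrency. All default values are
    hypothetical tuning choices, not values from the patent.
    """
    eps = 1e-12
    log_mag = np.log(mag + eps)

    # Forward homomorphic analysis: the real cepstrum is the inverse FFT
    # of the log-magnitude spectrum.
    cep = np.fft.irfft(log_mag)

    # Largest cepstral coefficient inside the assumed pitch range.
    idx = min_quefrency + int(np.argmax(cep[min_quefrency:max_quefrency]))

    # Treat the frame as unvoiced when the peak is below the threshold.
    if cep[idx] < threshold:
        return np.ones_like(mag), None

    # Amplify the peak and its mirror so the cepstrum stays symmetric.
    boosted = cep.copy()
    boosted[idx] *= peak_boost
    boosted[-idx] *= peak_boost

    # Inverse homomorphic analysis: back to a log-magnitude spectrum.
    log_mag_boosted = np.fft.rfft(boosted).real

    # The gain is the difference of the two log spectra, moved back to
    # the linear domain.
    gain = np.exp(log_mag_boosted - log_mag)
    return gain, idx
```

Multiplying the denoised frame's spectrum by the returned gain raises the harmonic peaks of a voiced frame and leaves an unvoiced frame unchanged, because the function then returns a gain of one in every bin.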

First claim

Opening claim text (preview).

What is claimed is:

1. A speech signal enhancement method, comprising: performing noise reduction processing on a first speech signal according to a first time-frequency spectrum and a first power spectrum to obtain a second speech signal, wherein the first time-frequency spectrum is used to indicate a time domain feature and a frequency domain feature of the first speech signal, and the first power spectrum is a power spectrum of a noise signal in the first speech signal; determining a voiced signal in the second speech signal, and performing gain compensation on the voiced signal, wherein the voiced signal is a signal with a cepstral coefficient greater than or equal to a preset threshold in the second speech signal; and determining a damage compensation gain of the second speech signal according to the voiced signal on which the gain compensation has been performed, and performing gain compensation on the second speech signal based on the damage compensation gain; wherein the determining a voiced signal in the second speech signal, and performing gain compensation on the voiced signal comprises: performing homomorphic positive analysis processing on the second speech signal to obtain a target cepstral coefficient of the second speech signal; determining a maximum cepstral coefficient in the target cepstral coefficient, and determining a signal corresponding to the maximum cepstral coefficient in the second speech signal as the voiced signal; and performing gain amplification processing on the maximum cepstral coefficient, to perform gain compensation on the voiced signal.

2. The method according to claim 1, wherein before the performing noise reduction processing on a first speech signal according to a first time-frequency spectrum and a first power spectrum, the method further comprises: performing a short-time Fourier transform on the first speech signal to obtain the first time-frequency spectrum; determining a power spectrum of the first speech signal according to the first time-frequency spectrum, and determining a target power spectrum in the power spectrum of the first speech signal, wherein the target power spectrum is a power spectrum of a signal with a smallest power spectrum in signals within a preset time window; and performing recursive smoothing processing on the target power spectrum to obtain the first power spectrum.

3. The method according to claim 1, wherein the performing noise reduction processing on a first speech signal according to a first time-frequency spectrum and a first power spectrum comprises: determining a posterior signal-to-noise ratio corresponding to the first speech signal according to the first power spectrum and the power spectrum of the first speech signal, and performing recursive smoothing processing on the posterior signal-to-noise ratio to obtain a prior signal-to-noise ratio corresponding to the first speech signal; determining a target noise reduction gain according to the posterior signal-to-noise ratio and the prior signal-to-noise ratio; and performing noise reduction processing on the first speech signal according to the first time-frequency spectrum and the target noise reduction gain.

4. The method according to claim 1, wherein the determining a damage compensation gain of the second speech signal according to the voiced signal on which the gain compensation has been performed comprises: performing homomorphic inverse analysis processing on a first cepstral coefficient and the maximum cepstral coefficient on which the gain amplification processing has been performed, to obtain a first logarithmic time-frequency spectrum, wherein the first cepstral coefficient is a cepstral coefficient in the target cepstral coefficient other than the maximum cepstral coefficient; and determining a logarithmic time-frequency spectrum of the second speech signal according to a time-frequency spectrum of the second speech signal, and determining the damage compensation gain according to a difference between the first logarithmic time-frequency spectrum and the logarithmic time-frequency spectrum of the second speech signal.

5. The method according to claim 1, wherein the second speech signal is a signal obtained by performing noise reduction processing on a target frequency domain signal, and the target frequency domain signal is a signal obtained by performing a short-time Fourier transform on the first speech signal; and after the performing gain compensation on the second speech signal based on the damage compensation gain, the method further comprises: performing time-frequency inverse transform processing on the second speech signal on which the gain compensation has been performed, to obtain a target time domain signal, and outputting the target time domain signal.

6. A chip, comprising a processor and a communication interface, wherein the communication interface is coupled to the processor, and the processor is configured to run a program or an instruction to implement the speech signal enhancement method according to claim 1.

7. An electronic device, comprising a processor, a memory, and a program or an instruction stored in the memory and runnable on the processor, wherein the program or the instruction is executed by the processor to implement: performing noise reduction processing on a first speech signal according to a first time-frequency spectrum and a first power spectrum to obtain a second speech signal, wherein the first time-frequency spectrum is used to indicate a time domain feature and a frequency domain feature of the first speech signal, and the first power spectrum is a power spectrum of a noise signal in the first speech signal; determining a voiced signal in the second speech signal, and performing gain compensation on the voiced signal, wherein the voiced signal is a signal with a cepstral coefficient greater than or equal to a preset threshold in the second speech signal; and determining a damage compensation gain of the second speech signal according to the voiced signal on which the gain compensation has been performed, and performing gain compensation on the second speech signal based on the damage compensation gain; wherein the determining a voiced signal in the second speech signal, and performing gain compensation on the voiced signal comprises: performing homomorphic positive analysis processing on the second speech signal to obtain a target cepstral coefficient of the second speech signal; determining a maximum cepstral coefficient in the target cepstral coefficient, and determining a signal corresponding to the maximum cepstral coefficient in the second speech signal as the voiced signal; and performing gain amplification processing on the maximum cepstral coefficient, to perform gain compensation on the voiced signal.

8. The electronic device according to claim 7, wherein before the performing noise reduction processing on a first speech signal according to a first time-frequency spectrum and a first power spectrum, the method further comprises: performing a short-time Fourier transform on the first speech signal to obtain the first time-frequency spectrum; determining a power spectrum of the first speech signal according to the first time-frequency spectrum, and determining a target power spectrum in the power spectrum of the first speech signal, wherein the target power spectrum is a power spectrum of a signal with a smallest power spectrum in signals within a preset time window; and performing recursive smoothing processing on the target power spectrum to obtain the first power spectrum.

9. The electronic device according to claim 7, wherein the performing noise reduction processing on a first speech s
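The noise reduction stage described in the dependent claims (minimum tracking over a time window, recursive smoothing, posterior and prior signal-to-noise ratios, then a gain) can be sketched roughly as follows. This is a minimal sketch under stated assumptions: the claim text shown here does not give the gain formula, so a Wiener-style rule with a gain floor is swapped in, and the function name `denoise_frames`, the window length, and the smoothing constants are hypothetical.

```python
import numpy as np

def denoise_frames(spec, window=8, alpha_noise=0.9, alpha_snr=0.98,
                   gain_floor=0.1):
    """Apply a per-bin noise reduction gain to an STFT (illustrative sketch).

    spec: complex STFT of the noisy signal, shape (n_frames, n_bins).
    All parameter defaults are hypothetical, not values from the patent.
    """
    eps = 1e-12
    power = np.abs(spec) ** 2
    n_frames, n_bins = power.shape

    noise = power[0].copy()      # running noise power estimate
    prior = np.zeros(n_bins)     # running prior SNR estimate
    out = np.empty_like(spec)

    for t in range(n_frames):
        # Target power spectrum: per-bin minimum inside the time window.
        start = max(0, t - window + 1)
        target = power[start:t + 1].min(axis=0)

        # Recursive smoothing of the target gives the noise power spectrum.
        noise = alpha_noise * noise + (1.0 - alpha_noise) * target

        # Posterior SNR, then a recursively smoothed prior SNR.
        posterior = power[t] / (noise + eps)
        prior = (alpha_snr * prior
                 + (1.0 - alpha_snr) * np.maximum(posterior - 1.0, 0.0))

        # Assumed Wiener-style gain with a floor to limit distortion.
        gain = np.maximum(prior / (1.0 + prior), gain_floor)
        out[t] = gain * spec[t]

    return out
```

With stationary noise the prior signal-to-noise ratio stays near zero and every bin is attenuated to the floor, while a bin that suddenly carries much more power than its tracked minimum keeps most of its energy.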

Assignees

Vivo Mobile Communication Co Ltd

Inventors

Classifications

  • the extracted parameters being power information · CPC title

  • Discriminating between voiced and unvoiced parts of speech signals (G10L25/90 takes precedence) · CPC title

  • the extracted parameters being the cepstrum · CPC title

  • Processing in the frequency domain · CPC title

  • Processing in the time domain · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12597433B2 cover?
A speech signal enhancement method includes: performing noise reduction processing on a first speech signal according to a first time-frequency spectrum and a first power spectrum to obtain a second speech signal, where the first time-frequency spectrum is used to indicate a time domain feature and a frequency domain feature of the first speech signal, and the first power spectrum is a power sp…
Who is the assignee on this patent?
Vivo Mobile Communication Co Ltd
What technology area does this patent fall under?
Primary CPC classification G10L21/0232. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 7, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).