Voice enhancement method, apparatus and system, and computer-readable storage medium

US12469511B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12469511-B2
Application number  US-202118263357-A
Country             US
Kind code           B2
Filing date         Jun 30, 2021
Priority date       Jan 28, 2021
Publication date    Nov 11, 2025
Grant date          Nov 11, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed are a voice enhancement method, apparatus and system and a computer-readable storage medium. The method includes acquiring a time-domain microphone signal and a time-domain bone conduction signal at the current moment; determining whether the signals are voice signals, if yes, performing a noise cancellation processing to the time-domain microphone signal by a pre-established DNN noise cancellation model, performing a frequency-domain noise cancellation processing to the time-domain bone conduction signal, if not, setting an output signal at the current moment as zero; performing a high-pass filtering processing to the time-domain microphone signal from which noise has been cancelled, to obtain a first output time-domain signal, performing a low-pass filtering processing to the time-domain bone conduction signal from which noise has been cancelled, to obtain a second output time-domain signal; obtaining an output time-domain signal at the current moment according to the first and second output time-domain signals.
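The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names, the 16 kHz sample rate, the 1 kHz crossover, the brick-wall FFT filters, and the choice to sum the two branches are all assumptions, since the abstract only says the output is obtained "according to" the two filtered signals. The two denoisers are passed in as hypothetical callables standing in for the DNN model and the frequency-domain noise cancellation.

```python
import numpy as np

def fft_band_split(x, sample_rate, cutoff_hz, keep):
    """Brick-wall filter via FFT (an assumed, simplistic filter design).

    keep="low" keeps bins below cutoff_hz, keep="high" keeps the rest.
    """
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)
    mask = (freqs < cutoff_hz) if keep == "low" else (freqs >= cutoff_hz)
    return np.fft.irfft(spectrum * mask, n=len(x))

def enhance_frame(mic, bone, is_voice, denoise_mic, denoise_bone,
                  sample_rate=16000, cutoff_hz=1000.0):
    """Process one frame of microphone and bone conduction samples."""
    if not is_voice:
        # Non-voice frame: the output at the current moment is set to zero.
        return np.zeros_like(mic)
    mic_clean = denoise_mic(mic)      # placeholder for the DNN noise cancellation model
    bone_clean = denoise_bone(bone)   # placeholder for frequency-domain noise cancellation
    first = fft_band_split(mic_clean, sample_rate, cutoff_hz, "high")   # high-pass branch
    second = fft_band_split(bone_clean, sample_rate, cutoff_hz, "low")  # low-pass branch
    # Assumption: the two branches are combined by simple addition.
    return first + second
```

With this split, the bone conduction branch supplies the low band and the microphone branch supplies the high band; because the two masks are complementary, feeding the same signal to both branches with identity denoisers returns that signal unchanged.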

First claim

Opening claim text (preview).

What is claimed is:

1. A voice enhancement method, comprising: acquiring a time-domain microphone signal and a time-domain bone conduction signal at the current moment; determining whether the time-domain microphone signal and the time-domain bone conduction signal are voice signals, if the time-domain microphone signal and the time-domain bone conduction signal are voice signals, performing a noise cancellation processing to the time-domain microphone signal by a pre-established DNN noise cancellation model so as to obtain a time-domain microphone signal from which noise has been cancelled, and performing a frequency-domain noise cancellation processing to the time-domain bone conduction signal so as to obtain a time-domain bone conduction signal from which noise has been cancelled, if the time-domain microphone signal and the time-domain bone conduction signal are not voice signals, setting an output signal at the current moment as zero; performing a high-pass filtering processing to the time-domain microphone signal from which noise has been cancelled, so as to obtain a first output time-domain signal, and performing a low-pass filtering processing to the time-domain bone conduction signal from which noise has been cancelled, so as to obtain a second output time-domain signal; and obtaining an output time-domain signal at the current moment according to the first output time-domain signal and the second output time-domain signal.

2. The voice enhancement method of claim 1, wherein performing a frequency-domain noise cancellation processing to the time-domain bone conduction signal, so as to obtain a time-domain bone conduction signal from which noise has been cancelled comprises: converting the time-domain bone conduction signal into a frequency-domain bone conduction signal through time-to-frequency transformation; performing a frequency-domain noise cancellation processing to the frequency-domain bone conduction signal so as to obtain a frequency-domain bone conduction signal from which noise has been cancelled; and determining whether a bandwidth of the frequency-domain bone conduction signal from which noise has been cancelled reaches a preset bandwidth, if the bandwidth of the frequency-domain bone conduction signal from which noise has been cancelled reaches the preset bandwidth, directly performing frequency-to-time inverse transformation to the frequency-domain bone conduction signal from which noise has been cancelled so as to obtain the time-domain bone conduction signal from which noise has been cancelled, if the bandwidth of the frequency-domain bone conduction signal from which noise has been cancelled does not reach the preset bandwidth, expanding the bandwidth of the frequency-domain bone conduction signal from which noise has been cancelled by using a pre-established DNN bandwidth expanding model so that the expanded bandwidth reaches the preset bandwidth, and performing frequency-to-time transformation to the expanded frequency-domain bone conduction signal so as to obtain the time-domain bone conduction signal from which noise has been cancelled.

3. The voice enhancement method of claim 1, wherein performing a noise cancellation processing to the time-domain microphone signal by a pre-established DNN noise cancellation model so as to obtain a time-domain microphone signal from which noise has been cancelled comprises: performing a time-to-frequency transformation to the time-domain microphone signal to obtain a corresponding frequency-domain microphone signal; extracting a first signal feature of the frequency-domain microphone signal, and processing the first signal feature by using the pre-established DNN noise cancellation model, so as to obtain first gains corresponding to first frequency points of the frequency-domain microphone signal respectively; calculating the product of spectral signals corresponding to the first frequency points in the frequency-domain microphone signal and corresponding first gains, to obtain spectral signals from which noise has been cancelled corresponding to the first frequency points respectively, so as to obtain a frequency-domain microphone signal from which noise has been cancelled; and performing a frequency-to-time transformation to the frequency-domain microphone signal from which noise has been cancelled to obtain the time-domain microphone signal from which noise has been cancelled.

4. The voice enhancement method of claim 1, wherein determining whether the time-domain microphone signal and the time-domain bone conduction signal are voice signals comprises: performing a voice activation detection to the time-domain bone conduction signal to determine whether the time-domain bone conduction signal is a voice signal; and when the time-domain bone conduction signal is a voice signal, the time-domain microphone signal is a voice signal.

5. The voice enhancement method of claim 4, wherein performing a voice activation detection to the time-domain bone conduction signal to determine whether the time-domain bone conduction signal is a voice signal comprises: calculating a zero-crossing rate and a pitch period corresponding to the time-domain bone conduction signal; performing time-to-frequency transformation to the time-domain bone conduction signal to obtain a frequency-domain bone conduction signal; calculating a spectral energy and a spectral centroid corresponding to the frequency-domain bone conduction signal; comprehensively determining the zero-crossing rate, the pitch period, the spectral energy and the spectral centroid to obtain a voice activation detection flag bit corresponding to the time-domain bone conduction signal; and determining whether the time-domain bone conduction signal is a voice signal according to the voice activation detection flag bit.

6. The voice enhancement method of claim 5, wherein comprehensively determining the zero-crossing rate, the pitch period, the spectral energy and the spectral centroid to obtain a voice activation detection flag bit corresponding to the time-domain bone conduction signal comprises: determining whether the spectrum energy is less than a first preset value, if the spectrum energy is less than the first preset value, the voice activation detection flag bit corresponding to the time-domain bone conduction signal is 0, if the spectrum energy is not less than the first preset value, proceed to a next step for determination; determining whether the zero-crossing rate is greater than a second preset value, if the zero-crossing rate is greater than the second preset value, the voice activation detection flag bit corresponding to the time-domain bone conduction signal is 0, if the zero-crossing rate is not greater than the second preset value, proceed to a next step for determination; determining whether the pitch period is greater than a third preset value or less than a fourth preset value, if the pitch period is greater than the third preset value or less than the fourth preset value, the voice activation detection flag bit corresponding to the time-domain bone conduction signal is 0, if the pitch period is not greater than the third preset value and not less than the fourth preset value, proceed to a next step for determination; determining whether the spectral centroid is greater than a fifth preset value, if the spectral centroid is greater than the fifth preset value, the voice activation detection flag bit corresponding to the time-domain bone conduction signal is 0, if the spectral centroid is not greater than the fifth preset value, the voice activation detection flag bit corresponding to the time-domain bone conduction signal is 1; and determining whether the time-domain bone conduction signal is a voice signal according to the voice activation
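The voice activation detection of claims 5 and 6 can be illustrated with a short sketch. The order of the four checks follows claim 6; everything else is an assumption made for illustration: the claims do not say how the features are computed, so the autocorrelation pitch estimate, the function names, and every threshold value below are hypothetical stand-ins for the five "preset values".

```python
import numpy as np

def zero_crossing_rate(x):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.signbit(x)
    return float(np.mean(signs[1:] != signs[:-1]))

def spectral_energy_and_centroid(x, sample_rate):
    """Total power of the one-sided spectrum and its power-weighted mean frequency."""
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)
    energy = float(np.sum(power))
    centroid = float(np.sum(freqs * power) / energy) if energy > 0 else 0.0
    return energy, centroid

def pitch_period(x, sample_rate, f_min=60.0, f_max=500.0):
    """Pitch period in seconds from the autocorrelation peak (assumed estimator)."""
    x = x - np.mean(x)
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(x) - 1)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return lag / sample_rate

def vad_flag(energy, zcr, period, centroid, th):
    """Decision cascade in the order given by claim 6; returns 0 or 1."""
    if energy < th["energy_min"]:        # first preset value
        return 0
    if zcr > th["zcr_max"]:              # second preset value
        return 0
    if period > th["period_max"] or period < th["period_min"]:  # third and fourth
        return 0
    if centroid > th["centroid_max"]:    # fifth preset value
        return 0
    return 1

def bone_vad(x, sample_rate, th):
    """Compute the four features for one bone conduction frame and return the flag."""
    energy, centroid = spectral_energy_and_centroid(x, sample_rate)
    return vad_flag(energy, zero_crossing_rate(x),
                    pitch_period(x, sample_rate), centroid, th)

# Hypothetical thresholds; the patent text shown here gives no numeric values.
EXAMPLE_THRESHOLDS = {"energy_min": 1.0, "zcr_max": 0.3,
                      "period_min": 1 / 400.0, "period_max": 1 / 70.0,
                      "centroid_max": 2000.0}
```

Under claim 4, the flag obtained from the bone conduction frame is then reused as the voice decision for the microphone frame, so only one detector has to run per frame.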

Assignees

Goertek Inc

Inventors

Classifications

  • characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques

  • Processing in the frequency domain

  • Electric hearing aids

  • acting directly on the eardrum, the ossicles or the skull, e.g. mastoid, tooth, maxillary or mandibular bone, or mechanically stimulating the cochlea, e.g. at the oval window

  • Aspects relating to mechanical or electronic switches or control elements, e.g. functioning

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12469511B2 cover?
Disclosed are a voice enhancement method, apparatus and system and a computer-readable storage medium. The method includes acquiring a time-domain microphone signal and a time-domain bone conduction signal at the current moment; determining whether the signals are voice signals, if yes, performing a noise cancellation processing to the time-domain microphone signal by a pre-established DNN nois…
Who is the assignee on this patent?
Goertek Inc
What technology area does this patent fall under?
Primary CPC classification G10L21/0308. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 11, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).