Audio signal generation model and training method using generative adversarial network

US12548586B2 · US · B2

Patent metadata
Publication number: US-12548586-B2
Application number: US-202318097062-A
Country: US
Kind code: B2
Filing date: Jan 13, 2023
Priority date: Feb 22, 2022
Publication date: Feb 10, 2026
Grant date: Feb 10, 2026

Abstract

Official abstract text for this publication.

A generative adversarial network-based audio signal generation model for generating a high quality audio signal may comprise: a generator generating an audio signal with an external input; a harmonic-percussive separation model separating the generated audio signal into a harmonic component signal and a percussive component signal; and at least one discriminator evaluating whether each of the harmonic component signal and the percussive component signal is real or fake.
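The separation step can be illustrated with a short sketch. Claim 3 describes the separation model as a short-time Fourier transform, harmonic and percussive masking, and an inverse transform. The code below follows that outline but swaps in fixed median-filter masks, a classic signal-processing heuristic, in place of the patent's masking models, which per the claims are trained together with the rest of the network. The function name `separate_hp` and all parameter values (`fs`, `nperseg`, `kernel`) are assumptions chosen for illustration and do not come from the patent.

```python
import numpy as np
from scipy.ndimage import median_filter
from scipy.signal import stft, istft


def separate_hp(audio, fs=16000, nperseg=1024, kernel=17, eps=1e-10):
    """Split a mono waveform into (harmonic, percussive) parts.

    Illustrative only: uses fixed median-filter masks rather than
    learned masking models.
    """
    # STFT: waveform -> complex spectrogram of shape (freq_bins, frames)
    _, _, spec = stft(audio, fs=fs, nperseg=nperseg)
    mag = np.abs(spec)

    # Harmonic content forms horizontal ridges (stable over time), so a
    # median along the time axis keeps it and suppresses short transients.
    harm_est = median_filter(mag, size=(1, kernel))
    # Percussive content forms vertical ridges (broadband, short), so a
    # median along the frequency axis keeps it and suppresses narrow tones.
    perc_est = median_filter(mag, size=(kernel, 1))

    # Soft masks that sum to one in every time-frequency bin
    harm_mask = harm_est**2 / (harm_est**2 + perc_est**2 + eps)
    perc_mask = 1.0 - harm_mask

    # Inverse STFT: masked spectrograms -> waveforms, trimmed to input length
    _, harmonic = istft(spec * harm_mask, fs=fs, nperseg=nperseg)
    _, percussive = istft(spec * perc_mask, fs=fs, nperseg=nperseg)
    n = len(audio)
    return harmonic[:n], percussive[:n]
```

Because the two masks sum to one, the harmonic and percussive outputs add back up to the input waveform. In the patented model each output would then be passed to its own discriminator.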

First claim

Opening claim text (preview).

What is claimed is:

1. A generative adversarial network-based audio signal generation model executed by a processor to generate a high quality audio signal, the audio signal generation model comprising: a generator generating an audio signal with an external input; a harmonic-percussive separation model separating the generated audio signal into a harmonic component signal and a percussive component signal; a first discriminator evaluating whether the harmonic component signal is real or fake; and a second discriminator evaluating whether the percussive component signal is real or fake, wherein the first discriminator has a first kernel dilation factor greater than a second kernel dilation factor of the second discriminator, and the first discriminator has a first receptive field greater than a second receptive field of the second discriminator, wherein the generator is trained to minimize errors between samples of real signals and audio signals generated by the generator, using a restoration loss function applied to the generator, in a first phase training, and wherein the generator, the harmonic-percussive separation model, the first discriminator, and the second discriminator are adversarial trained through end-to-end learning, after the first phase training, in a second phase training.

2. The signal generation model of claim 1, wherein the generator and the at least one discriminator allow error backpropagation of a loss function.

3. The signal generation model of claim 1, wherein the harmonic-percussive separation model comprises: a short-time Fourier transform model converting the generated audio signal into a spectrogram; a harmonic masking model and a percussive masking model masking a harmonic component and a percussive component, respectively; and an inverse short-time Fourier transform module converting the masked spectrogram into the audio signal.

4. A learning method of a generative adversarial network-based audio signal generation model executed by a processor, wherein the method comprising: (a) generating, by a generator, an audio signal; (b) separating the generated audio signal into a harmonic component signal and a percussive component signal using a harmonic-percussive separation model; (c) evaluating, by a first discriminator, whether the harmonic component signal is real or fake, and (d) evaluating, by a second discriminator, whether the percussive component signal is real or fake, wherein the first discriminator has a first kernel dilation factor greater than a second kernel dilation factor of the second discriminator, and the first discriminator has a first receptive field greater than a second receptive field of the second discriminator, wherein the generator is trained to minimize errors between samples of real signals and audio signals generated by the generator, using a restoration loss function applied to the generator, in a first phase training, and wherein (a) to (d) are performed repeatedly for the generator, the harmonic-percussive separation model, the first discriminator, and the second discriminator to learn in a backward propagation manner for adversarial training through end-to-end learning after the first phase training, as a second phase training.

5. An apparatus for generating an audio signal using a generative adversarial network, the apparatus comprising: a memory configured to store at least one instruction; a processor configured to execute the at least one instruction stored in the memory, a generator generating an audio signal with an external input; a harmonic-percussive separation model separating the generated audio signal into a harmonic component signal and a percussive component signal; a first discriminator evaluating whether the harmonic component signal is real or fake; and a second discriminator evaluating whether the percussive component signal is real or fake, wherein the first discriminator has a first kernel dilation factor greater than a second kernel dilation factor of the second discriminator, and the first discriminator has a first receptive field greater than a second receptive field of the second discriminator, wherein the processor is configured to: train the generator to minimize errors between samples of real signals and audio signals generated by the generator, using a restoration loss function applied to the generator, in a first phase training, and adversarial train the generator, the harmonic-percussive separation model, the first discriminator, and the second discriminator, through end-to-end learning, after the first phase training, in a second phase training.

6. The apparatus of claim 5, wherein the generator and the at least one discriminator allow error backpropagation of a loss function.
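The claims tie a larger kernel dilation factor in the first (harmonic) discriminator to a larger receptive field than that of the second (percussive) discriminator. A small worked calculation shows how dilation drives receptive field in a stack of 1-D convolutions. The layer configurations below are hypothetical; the patent text shown here gives no kernel sizes, dilation values, or layer counts, only the ordering between the two discriminators.

```python
def receptive_field(layers):
    """Receptive field, in input samples, of stacked 1-D convolutions.

    layers: iterable of (kernel_size, dilation, stride) tuples,
    listed from the input side to the output side.
    """
    rf = 1    # a single output sample sees one input sample before any layer
    jump = 1  # spacing, in input samples, between adjacent layer inputs
    for kernel_size, dilation, stride in layers:
        rf += (kernel_size - 1) * dilation * jump
        jump *= stride
    return rf


# Hypothetical stacks, chosen only to illustrate the claimed relationship
harmonic_disc = [(3, d, 1) for d in (1, 2, 4, 8)]  # growing dilation
percussive_disc = [(3, 1, 1)] * 4                  # dilation fixed at 1

print(receptive_field(harmonic_disc))    # 31
print(receptive_field(percussive_disc))  # 9
```

With the same depth and kernel size, the dilated stack covers 31 input samples against 9 for the undilated one, which matches the ordering the claims require.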

Assignees

Electronics & Telecommunications Res Inst, Univ Yonsei Iacf

Inventors

Classifications

  • using neural networks · CPC title

  • Non-supervised learning, e.g. competitive learning · CPC title

  • Backpropagation, e.g. using gradient descent · CPC title

  • Activation functions · CPC title

  • G06N3/0475 · Primary

    Generative networks · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12548586B2 cover?
A generative adversarial network-based audio signal generation model for generating a high quality audio signal may comprise: a generator generating an audio signal with an external input; a harmonic-percussive separation model separating the generated audio signal into a harmonic component signal and a percussive component signal; and at least one discriminator evaluating whether each of the h…
Who is the assignee on this patent?
Electronics & Telecommunications Res Inst, Univ Yonsei Iacf
What technology area does this patent fall under?
Primary CPC classification G06N3/0475. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 10, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).