Audio signal encoding and decoding method using a neural network model to generate a quantized latent vector, and encoder and decoder for performing the same

US12205605B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12205605-B2
Application number: US-202217670172-A
Country: US
Kind code: B2
Filing date: Feb 11, 2022
Priority date: Apr 15, 2021
Publication date: Jan 21, 2025
Grant date: Jan 21, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An audio signal encoding and decoding method using a neural network model, and an encoder and decoder for performing the same are disclosed. A method of encoding an audio signal using a neural network model, the method may include identifying an input signal, generating a quantized latent vector by inputting the input signal into a neural network model encoding the input signal, and generating a bitstream corresponding to the quantized latent vector, wherein the neural network model may include i) a feature extraction layer generating a latent vector by extracting a feature of the input signal, ii) a plurality of downsampling blocks downsampling the latent vector, and iii) a plurality of quantization blocks performing quantization of a downsampled latent vector.
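The encoder structure described in the abstract can be sketched in plain Python. This is only an illustration under stated assumptions, not the patented implementation: the feature extraction layer is skipped (the input is taken to be the latent frames already), each learned downsampling block is replaced by plain max pooling over pairs of frames, the blocks are assumed to be applied in cascade, and the names `max_pool`, `nearest_code`, and `encode` are hypothetical.

```python
def max_pool(frames, factor=2):
    """Reduce time resolution: element-wise max over groups of `factor` frames."""
    return [
        [max(col) for col in zip(*frames[i:i + factor])]
        for i in range(0, len(frames) - factor + 1, factor)
    ]


def nearest_code(vec, codebook):
    """Index of the codebook entry with the smallest squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((v - c) ** 2 for v, c in zip(vec, codebook[k])),
    )


def encode(latent, codebooks):
    """Return one list of code indices per time resolution (assumed cascade)."""
    indices = []
    current = latent
    for codebook in codebooks:
        # Stand-in for a learned downsampling block (assumption: max pooling only).
        current = max_pool(current)
        # Stand-in for a quantization block (assumption: no conversion layer).
        indices.append([nearest_code(frame, codebook) for frame in current])
    return indices


# Toy input: 4 latent frames of dimension 2, and one 2-entry codebook per level.
latent = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
codebook = [[0.0, 0.0], [1.0, 1.0]]
levels = encode(latent, [codebook, codebook])
```

In this toy run, `levels` holds two index lists, one with two entries and one with a single entry, reflecting two different time resolutions. A bitstream would then be formed from these indices; that packing step is not shown.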

First claim

Opening claim text (preview).

What is claimed is:

1. A method of encoding an audio signal using a neural network model, the method comprising: identifying an input signal; generating a quantized latent vector by inputting the input signal into the neural network model encoding the input signal; and generating a bitstream corresponding to the quantized latent vector, wherein the neural network model comprises i) a feature extraction layer generating a latent vector by extracting a feature of the input signal, ii) a plurality of downsampling blocks producing a plurality of downsampled latent vectors corresponding to downsamplings of the latent vector respectively, and iii) a plurality of quantization blocks performing quantization of the plurality of downsampled latent vectors respectively, and wherein each of the plurality of quantization blocks comprises a conversion layer converting the respective downsampled latent vector to produce a respective converted latent vector, and a vector quantization layer performing vector quantization on the respective converted latent vector based on a codebook.

2. The method of claim 1, wherein the plurality of downsampled latent vectors respectively correspond to downsamplings of the quantized latent vector to different respective time resolutions.

3. The method of claim 1, wherein the vector quantization layer performs vector quantization of the converted latent vector by determining a code in the codebook in a nearest distance from the converted latent vector.

4. The method of claim 1, wherein a downsampling block of the plurality of downsampling blocks comprises a convolution layer performing a convolution operation and a maxpool layer processing a max-pooling operation on an operation result of the convolution layer.

5. The method of claim 4, wherein the downsampling block further comprises a residual block increasing non-linearity of an operation result of the maxpool layer, and the residual block comprises a convolution layer performing a convolution operation, a batch normalization layer performing batch normalization, and an activation layer.

6. A method of decoding an audio signal using a neural network model, the method comprising: identifying a bitstream generated by an encoder; and generating an output signal by inputting the bitstream into the neural network model generating an output signal from the bitstream, wherein the neural network model comprises a plurality of inverse-quantization blocks extracting respective inverse-quantized latent vectors having different respective time resolutions from the bitstream, a plurality of upsampling blocks upsampling the inversely-quantized latent vectors to produce upsampled latent vectors, respectively, and a restoration layer generating an output signal from the upsampled latent vectors.

7. The method of claim 6, wherein the plurality of upsampling blocks upsamples the inversely-quantized latent vectors in an ascending order of time resolutions, and a current upsampling block of the plurality of upsampling blocks upsamples a combination of i) a latent vector, among the inversely-quantized latent vectors, having a same time resolution as an upsampled latent vector produced by a previous upsampling block of the plurality of upsampling blocks and ii) the upsampled latent vector produced by the previous upsampling block.

8. The method of claim 6, wherein the inverse-quantization block comprises a residual block increasing non-linearity of the latent vector, and a convolution layer performing a convolution operation.

9. An encoder for performing a method of encoding an audio signal using a neural network model, the encoder comprising: a processor configured to identify an input signal, generate a quantized latent vector by inputting the input signal to the neural network model encoding the input signal, and generate a bitstream corresponding to the quantized latent vector, wherein the neural network model comprises i) a feature extraction layer generating a latent vector by extracting a feature of the input signal, ii) a plurality of downsampling blocks producing a plurality of downsampled latent vector corresponding to downsamplings of the latent vector respectively, and iii) a plurality of quantization blocks performing quantization of the plurality of downsampled latent vectors respectively, and wherein each of the plurality of quantization blocks comprises a conversion layer converting the respective downsampled latent vector to produce a respective converted latent vector, and a vector quantization layer performing vector quantization on the respective converted latent vector based on a codebook.

10. The encoder of claim 9, wherein the plurality of downsampled latent vectors respectively correspond to downsamplings of the quantized latent vector to different respective time resolutions.

11. The encoder of claim 9, wherein the vector quantization layer performs vector quantization of the converted latent vector by determining a code in the codebook in a nearest distance from the converted latent vector.

12. The encoder of claim 9, wherein a downsampling block of the plurality of downsampling blocks comprises a convolution layer performing a convolution operation and a maxpool layer processing a max-pooling operation on an operation result of the convolution layer.

13. The encoder of claim 12, wherein the downsampling block further comprises a residual block increasing non-linearity of an operation result of the maxpool layer, and the residual block comprises a convolution layer performing a convolution operation, a batch normalization layer performing batch normalization, and an activation layer.
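The decoding order recited in claims 6 and 7 (coarsest time resolution first, with each upsampling block receiving a combination of the previous block's output and the inverse-quantized latent vector of the same resolution) might look roughly like the sketch below. Several points are assumptions made for illustration: inverse quantization is reduced to a codebook lookup, upsampling is simple frame repetition by a factor of 2, the "combination" is an element-wise sum (the claims do not say how the two are combined), and the restoration layer is omitted. All function names are hypothetical.

```python
def dequantize(indices, codebook):
    """Assumed inverse quantization: look up each index in the codebook."""
    return [list(codebook[k]) for k in indices]


def upsample(frames, factor=2):
    """Assumed upsampling block: repeat each frame `factor` times."""
    return [list(frame) for frame in frames for _ in range(factor)]


def combine(a, b):
    """Assumed combination: element-wise sum of two equally sized sequences."""
    return [[x + y for x, y in zip(fa, fb)] for fa, fb in zip(a, b)]


def decode(indices_per_level, codebooks):
    """Upsample from the coarsest level to the finest, merging as in claim 7."""
    # Sort levels by frame count so processing runs in ascending time resolution.
    levels = sorted(
        (dequantize(idx, cb) for idx, cb in zip(indices_per_level, codebooks)),
        key=len,
    )
    out = upsample(levels[0])
    for level in levels[1:]:
        # `level` has the same number of frames as the previous block's output.
        out = upsample(combine(level, out))
    return out


# Toy input: two levels of code indices and one 2-entry codebook per level.
codebook = [[0.0, 0.0], [1.0, 1.0]]
restored = decode([[1, 1], [1]], [codebook, codebook])
```

Here the single coarse frame is upsampled to two frames, summed with the two-frame level, and upsampled again to four frames. A real decoder would pass the result through the restoration layer to obtain the output signal.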

Assignees

Inventors

Classifications

  • using neural networks · CPC title

  • Codebooks · CPC title

  • Multi-stage vector quantisation · CPC title

  • Backpropagation, e.g. using gradient descent · CPC title

  • Quantised networks; Sparse networks; Compressed networks · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12205605B2 cover?
An audio signal encoding and decoding method using a neural network model, and an encoder and decoder for performing the same are disclosed. A method of encoding an audio signal using a neural network model, the method may include identifying an input signal, generating a quantized latent vector by inputting the input signal into a neural network model encoding the input signal, and generating …
Who is the assignee on this patent?
Electronics & Telecommunications Res Inst, Univ Yonsei Iacf
What technology area does this patent fall under?
Primary CPC classification G10L19/038. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 21, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).