Sliding-window rate-distortion optimization in neural network-based video coding
US-2024323416-A1 · Sep 26, 2024 · US
US12587664B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12587664-B2 |
| Application number | US-202418758390-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 28, 2024 |
| Priority date | Sep 11, 2023 |
| Publication date | Mar 24, 2026 |
| Grant date | Mar 24, 2026 |
Abstract
A method for learned image compression implemented in an autoencoder comprising a learnable encoder and a decoder includes: a) extracting a latent space from an image by the learnable encoder; b) quantizing the latent space by a quantizer to obtain a quantized latent space; c) entropy coding the quantized latent space by an entropy encoder to obtain a bitstream, wherein an entropy model used to encode the latent space is represented by a probability distribution; d) entropy decoding the bitstream by an entropy decoder to obtain an entropy decoded bitstream; e) feeding the entropy decoded bitstream to the decoder; f) recovering a reconstructed image by the decoder; g) training the autoencoder via standard gradient descent of the backpropagated error gradient by finding learnable parameters of the learnable encoder and of the decoder that minimize a rate-distortion cost function, wherein the entropy encoder is based on a differentiable formulation of a soft frequency counter.
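Steps a) to f) of the abstract, together with a cost of the kind minimized in step g), can be illustrated with a toy pure-Python sketch. Everything concrete here is an assumption rather than the patent's method: the encoder and decoder are a hypothetical scaling by `step`, the quantizer `U|Q` is read as additive uniform noise during training and rounding at inference (a common convention in learned compression), the rate is a hard-count first-order entropy instead of an actual entropy-coded bitstream, and the cost is taken as $L = R + \lambda D$ with mean squared error as $D$.

```python
import math
import random
from collections import Counter


def quantize(y, training, rng=None):
    """Hypothetical U|Q quantizer: additive uniform noise U(-0.5, 0.5) while training,
    rounding to the nearest integer at inference (assumption, not from the patent)."""
    if training:
        rng = rng or random.Random(0)
        return [v + rng.uniform(-0.5, 0.5) for v in y]
    return [float(math.floor(v + 0.5)) for v in y]


def hard_entropy_bits(y_hat):
    """First-order entropy in bits/symbol from a hard frequency count of quantized values."""
    n = len(y_hat)
    return -sum((c / n) * math.log2(c / n) for c in Counter(y_hat).values())


def mse(x, x_hat):
    """Mean squared error, used here as the distortion D."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def rd_cost(rate_bits, distortion, lam):
    """Assumed weighting L = R + lam * D; the abstract does not state the exact form."""
    return rate_bits + lam * distortion


def toy_pipeline(x, step, lam):
    """Steps a)-f) with a hypothetical scaling encoder/decoder and no real bitstream."""
    y = [v / step for v in x]             # a) toy encoder f_a: scale down
    y_hat = quantize(y, training=False)   # b) quantize (inference branch)
    rate = hard_entropy_bits(y_hat)       # c) rate estimate instead of actual entropy coding
    x_hat = [v * step for v in y_hat]     # d)-f) lossless decode assumed; toy decoder f_s
    return rd_cost(rate, mse(x, x_hat), lam)


print(toy_pipeline([0.0, 1.0, 2.0, 3.0], step=1.0, lam=0.1))  # 2.0 (2 bits, no distortion)
print(toy_pipeline([0.0, 1.0, 2.0, 3.0], step=4.0, lam=0.1))  # fewer bits, more distortion
```

Raising `step` lowers the rate and raises the distortion. Because a hard count of rounded values is piecewise constant in the latent values, its gradient is zero almost everywhere; that is the gap the claimed differentiable soft frequency counter is meant to close.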
Claims (preview)
The invention claimed is:

1. A method for learned image compression implemented in an autoencoder comprising a learnable encoder ($f_a$) and a decoder ($f_s$), said method comprising the steps of: a) extracting from an image ($x$) a latent space ($y$) by means of said learnable encoder ($f_a$); b) quantizing said latent space ($y$) by means of a quantizer ($U|Q$) to obtain a quantized latent space ($\hat{y}$); c) entropy coding said quantized latent space ($\hat{y}$) by means of an entropy encoder to obtain a bitstream, wherein an entropy model used to encode said latent space ($y$) is represented by a probability distribution $p_{\hat{y}}$; d) entropy decoding said bitstream by means of an entropy decoder to obtain an entropy decoded bitstream; e) feeding said entropy decoded bitstream to said decoder ($f_s$); f) recovering a reconstructed image ($\hat{x}$) by means of said decoder ($f_s$); g) training said autoencoder via standard gradient descent of the backpropagated error gradient by finding learnable parameters ($\theta_f, \theta_g$) of said learnable encoder ($f_a$) and of said decoder ($f_s$) that minimize a rate-distortion cost function $L$, wherein said entropy encoder is based on a differentiable formulation of a soft frequency counter (SFC).

2. The method according to claim 1, wherein said latent space ($y$) comprises a number $N_c$ of latent space channels having a dimension $N_d$, and wherein, given a $j$-th channel of said latent space ($y$) and a quantization level $l_i^j$ of said $j$-th channel, the soft frequency counter (SFC) associates every value of said latent space $y_n^j$ to a weight inversely proportional to its distance from $l_i^j$, where $n$ varies within the same channel and ranges from 1 to $N_d$.

3. The method according to claim 2, wherein said soft frequency counter (SFC) relies on a scalar function $\phi_i^j$ and wherein a first-order entropy $H_{\tilde{p}}$ of a probability distribution $\tilde{p}^j$ for every single channel of said latent space is:

$$H_{\tilde{p}} = -\frac{1}{N_c}\sum_{j=1}^{N_c} H_{\tilde{p}^j} = -\frac{1}{N_c}\sum_{j=1}^{N_c}\sum_{i=1}^{L} \mathrm{SFC}(l_i^j)\,\log_2\!\left[\mathrm{SFC}(l_i^j)\right]$$

where

$$\mathrm{SFC}(l_i^j) = \frac{\sum_{n=1}^{N_d} \phi_i^j(y_n^j)}{\sum_{m=1}^{L}\sum_{n=1}^{N_d} \phi_m^j(y_n^j)}$$

and $\phi_i^j$
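The SFC and the channel-averaged entropy of claim 3 can be sketched in plain Python. The definition of the scalar function $\phi_i^j$ is cut off in this preview, so the kernel below, $1/(|y - l| + \varepsilon)$, is a hypothetical choice that merely satisfies the "inversely proportional to the distance" wording of claim 2. The `eps` constant, the use of one shared level list for every channel, and all helper names are assumptions. With this particular kernel the absolute value has a kink exactly at a level, so it is differentiable only almost everywhere.

```python
import math


def phi(y, level, eps=1e-3):
    """Hypothetical phi_i^j: weight inversely proportional to |y - level| (claim 2);
    eps keeps the weight finite when y sits exactly on the level."""
    return 1.0 / (abs(y - level) + eps)


def soft_frequency_counter(channel, levels, eps=1e-3):
    """SFC(l_i) = sum_n phi_i(y_n) / sum_m sum_n phi_m(y_n) for one channel."""
    per_level = [sum(phi(y, l, eps) for y in channel) for l in levels]
    total = sum(per_level)
    return [s / total for s in per_level]


def soft_entropy_bits(latent, levels, eps=1e-3):
    """Channel-averaged first-order entropy: -(1/N_c) * sum_j sum_i SFC * log2(SFC).
    Assumes the same quantization levels (L = len(levels)) for every channel."""
    h = 0.0
    for channel in latent:
        for p in soft_frequency_counter(channel, levels, eps):
            h -= p * math.log2(p)  # p > 0 because phi is strictly positive
    return h / len(latent)


def central_difference(f, x, h=1e-6):
    """Numerical derivative, used only to show the soft rate responds smoothly to y."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


levels = [0.0, 1.0]
# Values exactly on the levels with tiny eps: SFC is close to the hard histogram [0.75, 0.25].
print(soft_frequency_counter([0.0, 0.0, 0.0, 1.0], levels, eps=1e-6))
# Nonzero slope of the rate with respect to one latent value.
print(central_difference(lambda v: soft_entropy_bits([[v, 0.9, 1.1]], levels), 0.2))
```

When values sit exactly on the levels and `eps` is small, the soft counts reproduce the hard frequency counts; away from that limit the entropy varies smoothly with each $y_n^j$, which is what allows it to enter a gradient-descent rate term.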
Auto-encoder networks; Encoder-decoder networks · CPC title
Entropy coding, e.g. variable length coding [VLC] or arithmetic coding · CPC title
Quantisation · CPC title
according to rate distortion criteria (rate-distortion as a criterion for motion estimation H04N19/567) · CPC title
Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC] · CPC title