US11375194B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11375194-B2 |
| Application number | US-202017017020-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 10, 2020 |
| Priority date | Nov 16, 2019 |
| Publication date | Jun 28, 2022 |
| Grant date | Jun 28, 2022 |
Official abstract text for this publication.
Systems and methods for video compression using conditional entropy coding. An ordered sequence of image frames can be transformed to produce an entropy coding for each image frame. Each of the entropy codings provides a compressed form of image information based on a prior image frame and a current image frame (the current image frame occurring after the prior image frame). In this manner, the compression model can capture temporal relationships between image frames, or between encoded representations of the image frames, using a conditional entropy encoder trained to approximate the joint entropy between frames in the image frame sequence.
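The benefit of conditioning on a prior frame can be illustrated with a small calculation. The sketch below is not taken from the patent: it assumes a toy stream of binary "latent symbols" in which each symbol repeats the previous one 90% of the time, and the helper names `entropy` and `conditional_entropy` are hypothetical. It compares the average bits per symbol needed when each symbol is coded on its own with the bits needed when the coder knows the previous symbol.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of an empirical distribution given as counts."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def conditional_entropy(pairs):
    """Estimate H(cur | prev) = H(prev, cur) - H(prev) from (prev, cur) pairs."""
    joint = Counter(pairs)
    prev = Counter(p for p, _ in pairs)
    return entropy(joint) - entropy(prev)


# Toy data (an assumption for illustration): the current symbol equals the
# previous symbol in 90 of 100 observed pairs.
pairs = [(0, 0)] * 45 + [(0, 1)] * 5 + [(1, 1)] * 45 + [(1, 0)] * 5

marginal_bits = entropy(Counter(cur for _, cur in pairs))
conditional_bits = conditional_entropy(pairs)

print(f"bits/symbol, coded independently:       {marginal_bits:.3f}")
print(f"bits/symbol, conditioned on prior frame: {conditional_bits:.3f}")
```

In this toy setting, conditioning lowers the cost from 1 bit to roughly 0.47 bits per symbol. A learned model would estimate the conditional distribution with neural networks rather than with counts.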
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method for encoding a video that comprises at least two image frames having a sequential order, the method comprising: encoding, using an encoder model, a prior image frame of the at least two image frames to generate a first latent representation; encoding, using the encoder model, a current image frame that occurs after the prior image frame based on the sequential order to generate a second latent representation; determining, and using a hyperprior encoder model, a hyperprior code based on the first latent representation and the second latent representation, wherein the hyperprior code is indicative of differences between the current image frame and the prior image frame, the prior image frame occurring before the current image frame in the sequential order; one or more of the encoder model and the hyperprior encoder model having been trained with a loss function comprising a term associated with a probability of determining the second latent representation, given the first latent representation and the hyperprior code; determining, using a hyperprior decoder model, one or more conditional probability parameters based on the first latent representation and the hyperprior code; generating, using an entropy coder, an entropy coding of the current image frame based on the one or more conditional probability parameters and the second latent representation; and storing the entropy coding and the hyperprior code.
2. The computer-implemented method of claim 1, further comprising: encoding, using the encoder model, a third image frame of the at least two image frames that occurs after the current image frame to generate a third latent representation.
3. The computer-implemented method of claim 1, wherein the current image frame occurs immediately after the prior image frame.
4. The computer-implemented method of claim 1, further comprising: performing internal learning to optimize the second latent representation, the hyperprior code, or both the second latent representation and the hyperprior code.
5. The computer-implemented method of claim 4, wherein performing internal learning comprises: setting as learnable parameters one or more of the second latent representation, the hyperprior code, or both the second latent representation and the hyperprior code; modifying the learnable parameters to reduce the loss function, the loss function evaluating one or both of: a difference between the current image frame and a decoded image frame generated from the entropy coding of the current image frame; and the probability of determining the second latent representation, given the first latent representation and the hyperprior code.
6. The computer-implemented method of claim 5, wherein modifying the learnable parameters to reduce the loss function comprises: backpropagating gradients for the learnable parameters over a number of iterations; and updating values for one or more of the learnable parameters at one or more iterations of the number of iterations; wherein during said modifying, all hyperprior decoder model and decoder model parameters are fixed.
7. The computer-implemented method of claim 1, wherein the hyperprior encoder model comprises a trained neural network.
8. The computer-implemented method of claim 1, wherein: determining, using the hyperprior encoder model, the hyperprior code is based only on image information included in the first latent representation and the second latent representation.
9. A computer-implemented method for decoding a video that comprises two or more image frames having a sequential order, the method comprising: for the two or more image frames, respectively; obtaining a hyperprior code for a current image frame and a decoded version of a latent representation of a previous sequential image frame, wherein the hyperprior code is indicative of differences between the current image frame and the previous sequential image frame, the previous sequential image frame occurring before the current image frame in the sequential order; determining, using a hyperprior decoder model, one or more conditional probability parameters for the current image frame based at least in part on the hyperprior code for the current image frame and the decoded version of the latent representation of the previous sequential image frame, wherein the hyperprior decoder model has been trained with a loss function comprising a term associated with a probability of determining a latent representation of the current image frame, given the latent representation of the previous sequential image frame and the hyperprior code; decoding, using the one or more conditional probability parameters for the current frame, an entropy code for the current image frame to obtain a decoded version of the latent representation of the current image frame; and providing the decoded version of the latent representation of the current image frame for use in decoding a next entropy code for a next sequential image frame.
10. The computer-implemented method of claim 9, further comprising: decoding, using a decoder model, the decoded version of a latent representation of the current image frame to obtain a reconstructed version of the current image frame.
11. One or more non-transitory computer-readable media that store: a video compression model, the video compression model comprising: a hyperprior encoder model, the hyperprior encoder model having been trained with a loss function comprising a term associated with a probability of determining a second latent representation, given a first latent representation and a hyperprior code; and a hyperprior decoder model; and instructions for performing encoding comprising: obtaining a video comprising an ordered sequence of image frames; determining a latent representation for at least two sequential image frames in the ordered sequence, wherein the latent representation for the at least two sequential image frames includes the first latent representation associated with a prior image frame and the second latent representation associated with a current image frame; generating the hyperprior code for the at least two sequential image frames by providing the first latent representation and the second latent representation to the hyperprior encoder model, wherein the hyperprior code is indicative of differences between the current image frame and the prior image frame; generating one or more conditional probability parameters for the at least two sequential image frames by providing the hyperprior code associated with the current image frame and the first latent representation to the hyperprior decoder model; and determining an entropy coding for the at least two sequential image frames by providing the conditional probability parameters for the current image frame and the first latent representation associated with the prior image frame to an entropy coder.
12. The one or more non-transitory computer-readable media of claim 11, wherein the one or more non-transitory computer-readable media further store: an encoder model and a decoder model, and wherein determining the latent representation for the at least two sequential image frames in the ordered sequence comprises: encoding, using the encoder model, the at least two sequential image frames in the ordered sequence.
13. The one or more non-transitory computer-readable media of claim 11, wherein the one or more non-transitory computer-readable media further store: instructions for performing decoding comprising: obtaining the hyperprior code for the current image frame and a decoded versio
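The encoding steps recited in claim 1 can be sketched as a data flow. This is only an illustration under strong assumptions, not the patented models: `encoder`, `hyperprior_encoder`, and `hyperprior_decoder` are hypothetical hand-written stand-ins for trained neural networks, the conditional probability parameters are assumed to be a Gaussian mean and scale per latent value, and instead of running a real entropy coder the code reports the ideal code length, -log2 P(second latent | first latent, hyperprior code).

```python
import math


def gaussian_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def bits_for_symbol(y, mu, sigma):
    """Ideal code length of integer y under a Gaussian discretized to unit bins."""
    p = gaussian_cdf(y + 0.5, mu, sigma) - gaussian_cdf(y - 0.5, mu, sigma)
    return -math.log2(max(p, 1e-12))


def encoder(frame):
    # Hypothetical stand-in for the encoder model: coarse quantization.
    return [round(v / 4) for v in frame]


def hyperprior_encoder(y_prev, y_cur):
    # Hypothetical stand-in: a single integer summarizing how much the
    # current latent differs from the prior latent.
    diffs = [abs(a - b) for a, b in zip(y_prev, y_cur)]
    return round(sum(diffs) / len(diffs))


def hyperprior_decoder(y_prev, z):
    # Hypothetical stand-in: predict each current latent value as centered on
    # the prior latent value, with a scale that grows with the hyperprior code.
    sigma = 0.5 + z
    return [(float(m), sigma) for m in y_prev]


def encode_step(prev_frame, cur_frame):
    y1 = encoder(prev_frame)              # first latent representation
    y2 = encoder(cur_frame)               # second latent representation
    z = hyperprior_encoder(y1, y2)        # hyperprior code
    params = hyperprior_decoder(y1, z)    # conditional probability parameters
    bits = sum(bits_for_symbol(y, mu, s) for y, (mu, s) in zip(y2, params))
    return y2, z, bits


prev_frame = [100, 104, 96, 120]   # toy pixel values, assumed for illustration
cur_frame = [101, 104, 98, 121]

y2, z, conditional_bits = encode_step(prev_frame, cur_frame)
# Baseline for comparison: a fixed, wide prior that ignores the prior frame.
unconditional_bits = sum(bits_for_symbol(y, 0.0, 20.0) for y in y2)

print(f"hyperprior code z = {z}")
print(f"estimated bits, conditional:   {conditional_bits:.2f}")
print(f"estimated bits, unconditional: {unconditional_bits:.2f}")
```

Because the two toy frames are nearly identical, the conditional estimate comes out well below the unconditional one. In a full codec the hyperprior code would also be entropy coded and stored alongside the entropy coding of the current frame, which this sketch omits.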
Combinations of networks · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title
Supervised learning · CPC title
Incoming video signal characteristics or properties · CPC title