Instance-adaptive image and video compression using machine learning systems
US-2022103839-A1 · Mar 31, 2022 · US
US12120359B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12120359-B2 |
| Application number | US-202217704692-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 25, 2022 |
| Priority date | Apr 8, 2021 |
| Publication date | Oct 15, 2024 |
| Grant date | Oct 15, 2024 |
Official abstract text for this publication.
A system processing hardware executes a machine learning (ML) model-based video compression encoder to receive uncompressed video content and corresponding motion compensated video content, compare the uncompressed and motion compensated video content to identify an image space residual, transform the image space residual to a latent space representation of the uncompressed video content, and transform, using a trained image compression ML model, the motion compensated video content to a latent space representation of the motion compensated video content. The ML model-based video compression encoder further encodes the latent space representation of the image space residual to produce an encoded latent residual, encodes, using the trained image compression ML model, the latent space representation of the motion compensated video content to produce an encoded latent video content, and generates, using the encoded latent residual and the encoded latent video content, a compressed video content corresponding to the uncompressed video content.
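The data flow in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the patent text shown here does not specify the comparison, the transforms, or the encoding, so the sketch uses a subtraction for the comparison, pairwise averaging for both transforms, and uniform quantization for the encoding. The names `ToyImageCompressionModel`, `pair_average`, `quantize`, and `CompressedContent` are hypothetical.

```python
from dataclasses import dataclass
from typing import List

Frame = List[float]  # flattened frame samples; a toy stand-in for real pixel data


def image_space_residual(uncompressed: Frame, motion_compensated: Frame) -> Frame:
    """Compare the two frames; the comparison is assumed here to be a subtraction."""
    if len(uncompressed) != len(motion_compensated):
        raise ValueError("frames must have the same number of samples")
    return [x - m for x, m in zip(uncompressed, motion_compensated)]


def pair_average(values: List[float]) -> List[float]:
    """Toy transform: average adjacent pairs (a trailing odd sample is dropped)."""
    return [(values[i] + values[i + 1]) / 2.0 for i in range(0, len(values) - 1, 2)]


def quantize(latent: List[float], step: float = 0.5) -> List[int]:
    """Toy encoding: uniform quantization of each latent value to an integer."""
    return [round(v / step) for v in latent]


class ToyImageCompressionModel:
    """Hypothetical stand-in for the trained image compression ML model."""

    def transform(self, frame: Frame) -> List[float]:
        return pair_average(frame)

    def encode(self, latent: List[float]) -> List[int]:
        return quantize(latent)


@dataclass
class CompressedContent:
    encoded_latent_residual: List[int]
    encoded_latent_video: List[int]


def compress(uncompressed: Frame, motion_compensated: Frame,
             model: ToyImageCompressionModel) -> CompressedContent:
    residual = image_space_residual(uncompressed, motion_compensated)
    latent_residual = pair_average(residual)            # residual branch (assumed transform)
    latent_video = model.transform(motion_compensated)  # image-model branch
    return CompressedContent(quantize(latent_residual), model.encode(latent_video))


if __name__ == "__main__":
    current = [10.0, 12.0, 14.0, 16.0]
    predicted = [9.0, 12.0, 13.0, 17.0]
    print(compress(current, predicted, ToyImageCompressionModel()))
```

With these toy inputs the residual branch yields `[1, 0]` and the image-model branch yields `[21, 30]`. The two branches do not depend on each other after the residual is computed, which is consistent with the claims that describe producing the two encoded outputs in parallel.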
Claim text (claims 1 to 20).
What is claimed is:
1. A system comprising: a computing platform including a processing hardware and a system memory storing a machine learning (ML) model-based video compression encoder and a trained image compression ML model; the processing hardware configured to execute the ML model-based video compression encoder to: receive an uncompressed video content and a motion compensated video content corresponding to the uncompressed video content; compare the uncompressed video content with the motion compensated video content to identify an image space residual corresponding to the uncompressed video content; transform the image space residual to a latent space representation of the image space residual; receive, using the trained image compression ML model, the motion compensated video content; transform, using the trained image compression ML model, the motion compensated video content to a latent space representation of the motion compensated video content; encode the latent space representation of the image space residual to produce an encoded latent residual; encode, using the trained image compression ML model, the latent space representation of the motion compensated video content to produce an encoded latent video content; and generate, using the encoded latent residual and the encoded latent video content, a compressed video content corresponding to the uncompressed video content.
2. The system of claim 1, wherein the encoded latent residual and the encoded latent video content are produced in parallel.
3. The system of claim 1, wherein the processing hardware is configured to execute the ML model-based video compression encoder to generate the compressed video content corresponding to the uncompressed video content based on a difference between the encoded latent residual and the encoded latent video content.
4. The system of claim 1, wherein the trained image compression ML model comprises a trained artificial neural network (NN).
5. The system of claim 4, wherein the trained NN is trained using an objective function including an adversarial loss.
6. The system of claim 4, wherein the trained NN comprises a generative adversarial network (GAN).
7. A method for use by a system including a computing platform having a processing hardware and a system memory storing a machine learning (ML) model-based video compression encoder and a trained image compression ML model, the method comprising: receiving, by the ML model-based video compression encoder executed by the processing hardware, an uncompressed video content and a motion compensated video content corresponding to the uncompressed video content; comparing, by the ML model-based video compression encoder executed by the processing hardware, the uncompressed video content with the motion compensated video content, thereby identifying an image space residual corresponding to the uncompressed video content; transforming, by the ML model-based video compression encoder executed by the processing hardware, the image space residual to a latent space representation of the image space residual; receiving, by the trained image compression ML model executed by the processing hardware, the motion compensated video content; transforming, by the trained image compression ML model executed by the processing hardware, the motion compensated video content to a latent space representation of the motion compensated video content; encoding, by the ML model-based video compression encoder executed by the processing hardware, the latent space representation of the image space residual to produce an encoded latent residual; encoding, by the trained image compression ML model executed by the processing hardware, the latent space representation of the motion compensated video content to produce an encoded video content; and generating, by the ML model-based video compression encoder executed by the processing hardware and using the encoded latent residual and the encoded latent video content, a compressed video content corresponding to the uncompressed video content.
8. The method of claim 7, wherein the encoded latent residual and the encoded latent video content are produced in parallel.
9. The method of claim 7, wherein the compressed video content corresponding to the uncompressed video content is generated based on a difference between the encoded latent residual and the encoded latent video content.
10. The method of claim 7, wherein the trained image compression ML model comprises a trained artificial neural network (NN).
11. The method of claim 10, wherein the trained NN is trained using an objective function including an adversarial loss.
12. The method of claim 7, wherein the trained NN comprises a generative adversarial network (GAN).
13. A system comprising: a computing platform including a processing hardware and a system memory storing a machine learning (ML) model-based video compression encoder and a trained image compression ML model; the processing hardware configured to execute the ML model-based video compression encoder to: receive, using the trained image compression ML model, an uncompressed video content and a motion compensated video content corresponding to the uncompressed video content; transform, using the trained image compression ML model, the uncompressed video content to a first latent space representation of the uncompressed video content; transform, using the trained image compression ML model, the motion compensated video content to a second latent space representation of the uncompressed video content; and generate a bitstream for transmitting a compressed video content corresponding to the uncompressed video content based on the first latent space representation and the second latent space representation.
14. The system of claim 13, wherein the processing hardware is further configured to execute the ML model-based video compression encoder to: determine, using the first latent space representation and the second latent space representation, a latent space residual.
15. The system of claim 14, wherein the processing hardware is further configured to execute the ML model-based video compression encoder to: generate the bitstream for transmitting the compressed video content corresponding to the uncompressed video content using the latent space residual.
16. The system of claim 14, wherein the latent space residual is based on a difference between the first latent space representation and the second latent space representation.
17. The system of claim 13, wherein the transformation of the uncompressed video content to the first latent space representation, and the transformation of the motion compensated video content to the second latent space representation, are performed in parallel.
18. The system of claim 13, wherein the trained image compression ML model comprises a trained artificial neural network (NN).
19. The system of claim 13, wherein the trained NN is trained using an objective function including an adversarial loss.
20. The system of claim 13, wherein the trained NN comprises a generative adversarial network (GAN).
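The latent space residual variant in claims 13 to 17 can be sketched in the same spirit. This is an illustration under stated assumptions, not the patented implementation: `toy_image_model_transform` is a hypothetical stand-in for the trained image compression ML model, the residual is taken as a plain element-wise difference, and `to_bitstream` uses an invented packing (a 16-bit count followed by signed 16-bit quantized values) purely so that the example produces bytes.

```python
import struct
from concurrent.futures import ThreadPoolExecutor
from typing import List


def toy_image_model_transform(frame: List[float]) -> List[float]:
    """Hypothetical stand-in for the trained model's transform: average adjacent pairs."""
    return [(frame[i] + frame[i + 1]) / 2.0 for i in range(0, len(frame) - 1, 2)]


def latent_space_residual(first: List[float], second: List[float]) -> List[float]:
    """Element-wise difference between the first and second latent representations."""
    if len(first) != len(second):
        raise ValueError("latent representations must have the same length")
    return [a - b for a, b in zip(first, second)]


def to_bitstream(latent_residual: List[float], step: float = 0.25) -> bytes:
    """Invented packing: quantize, then write a count and signed 16-bit symbols."""
    symbols = [round(v / step) for v in latent_residual]
    return struct.pack(f"<H{len(symbols)}h", len(symbols), *symbols)


def compress(uncompressed: List[float], motion_compensated: List[float]) -> bytes:
    # The two transforms are independent, so they are submitted concurrently.
    # Threads are used only to show that independence; in CPython they do not
    # speed up CPU-bound pure-Python work.
    with ThreadPoolExecutor(max_workers=2) as pool:
        first_future = pool.submit(toy_image_model_transform, uncompressed)
        second_future = pool.submit(toy_image_model_transform, motion_compensated)
        first, second = first_future.result(), second_future.result()
    return to_bitstream(latent_space_residual(first, second))


if __name__ == "__main__":
    current = [10.0, 12.0, 14.0, 16.0]
    predicted = [9.0, 12.0, 13.0, 17.0]
    print(compress(current, predicted).hex())
```

Compared with the image space residual flow of claim 1, the subtraction here happens after both inputs have passed through the same model, so only one transform needs to be trained and the residual is formed directly in the latent domain.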
Supervised learning · CPC title
Adversarial learning · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Generative networks · CPC title