Machine learning techniques for video downsampling
US-2022198607-A1 · Jun 23, 2022 · US
US12278969B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12278969-B2 |
| Application number | US-202318230409-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 4, 2023 |
| Priority date | Oct 13, 2021 |
| Publication date | Apr 15, 2025 |
| Grant date | Apr 15, 2025 |
Official abstract text for this publication.
A system includes a machine learning (ML) model-based video downsampler configured to receive an input video sequence having a first display resolution, and to map the input video sequence to a lower resolution video sequence having a second display resolution lower than the first display resolution. The system also includes a neural network-based (NN-based) proxy video codec configured to transform the lower resolution video sequence into a decoded proxy bitstream. In addition, the system includes an upsampler configured to produce an output video sequence using the decoded proxy bitstream.
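The data flow in the abstract (downsample, pass through a proxy codec, upsample) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: average pooling stands in for the ML model-based downsampler, a quantize/dequantize round trip stands in for the NN-based proxy codec, and nearest-neighbour repetition stands in for the upsampler. All function names and the `step` parameter are hypothetical. Note that rounding is not differentiable, whereas a proxy codec used for training would typically need to be.

```python
from typing import List

Frame = List[List[float]]  # one grayscale frame as rows of pixel values


def downsample(frame: Frame, factor: int) -> Frame:
    """Stand-in for the ML model-based downsampler: average pooling."""
    h, w = len(frame), len(frame[0])
    if h % factor or w % factor:
        raise ValueError("frame size must be divisible by factor")
    return [
        [
            sum(
                frame[y * factor + dy][x * factor + dx]
                for dy in range(factor)
                for dx in range(factor)
            ) / factor ** 2
            for x in range(w // factor)
        ]
        for y in range(h // factor)
    ]


def proxy_codec(frame: Frame, step: float = 8.0) -> Frame:
    """Stand-in for the NN-based proxy codec: quantize, then dequantize."""
    return [[round(p / step) * step for p in row] for row in frame]


def upsample(frame: Frame, factor: int) -> Frame:
    """Stand-in for the upsampler: nearest-neighbour pixel repetition."""
    out: Frame = []
    for row in frame:
        wide = [p for p in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def run_pipeline(video: List[Frame], factor: int = 2, step: float = 8.0) -> List[Frame]:
    """Map each input frame to an output frame at the original resolution."""
    return [upsample(proxy_codec(downsample(f, factor), step), factor) for f in video]
```

Calling `run_pipeline(video)` on a list of frames returns frames at the input resolution, each having passed through the lower-resolution, lossy intermediate stage.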
Opening claim text (preview).
What is claimed is:
1. A video processing system comprising: an upsampler; a video codec; a trained machine learning (ML) model-based video downsampler trained using a neural network-based (NN-based) proxy video codec; and a processing hardware configured to: receive an input video sequence having a first display resolution; extract a content sample of the input video sequence; map, using the trained ML model-based video downsampler, the content sample to a lower resolution sample; transform, using one of the video codec or the NN-based proxy video codec, the lower resolution sample into a decoded sample bitstream; predict, using the upsampler and the decoded sample bitstream, an output sample corresponding to the content sample; and modify, based on the predicted output sample, one or more parameters of the trained ML model-based video downsampler; wherein the ML model-based video downsampler is trained using the input video sequence, the output sample, and an objective function based on an estimated rate of the lower resolution sample and a plurality of perceptual loss functions.
2. The video processing system of claim 1, wherein the NN-based proxy video codec is differentiable.
3. The video processing system of claim 1, wherein modifying the one or more parameters of the trained ML model-based video downsampler renders the trained ML model-based video downsampler content adaptive.
4. The video processing system of claim 1, wherein the trained ML model-based video downsampler is configured to support arbitrary scaling factors.
5. The system of claim 1, wherein the objective function comprises the estimated rate of the lower resolution sample in combination with a weighted sum of the plurality of perceptual loss functions.
6. The system of claim 5, wherein the ML model-based video downsampler is further configured to receive a plurality of weighting factors included in the weighted sum of the plurality of perceptual loss functions, and wherein the ML model-based video downsampler is trained further using the plurality of weighting factors.
7. The video processing system of claim 1, further comprising a simulation module including the upsampler.
8. The video processing system of claim 1, wherein the upsampler comprises an ML model-based upsampler.
9. The video processing system of claim 8, wherein the ML model-based upsampler and the ML model-based video downsampler are trained concurrently.
10. A method for use by a video processing system including an upsampler, a video codec, and a trained machine learning (ML) model-based video downsampler trained using a neural network-based (NN-based) proxy video codec, the method comprising: receiving an input video sequence having a first display resolution; extracting a content sample of the input video sequence; mapping, using the trained ML model-based video downsampler, the content sample to a lower resolution sample; transforming, using one of the video codec or the NN-based proxy video codec, the lower resolution sample into a decoded sample bitstream; predicting, using the upsampler and the decoded sample bitstream, an output sample corresponding to the content sample; and modifying, based on the predicted output sample, one or more parameters of the trained ML model-based video downsampler; wherein the ML model-based video downsampler is trained using the input video sequence, the output sample, and an objective function based on an estimated rate of the lower resolution sample and a plurality of perceptual loss functions.
11. The method of claim 10, wherein the NN-based proxy video codec is differentiable.
12. The method of claim 10, wherein modifying the one or more parameters of the trained ML model-based video downsampler renders the trained ML model-based video downsampler content adaptive.
13. The method of claim 10, wherein the trained ML model-based video downsampler is configured to support arbitrary scaling factors.
14. The method of claim 10, wherein the objective function comprises the estimated rate of the lower resolution sample in combination with a weighted sum of the plurality of perceptual loss functions.
15. The method of claim 14, wherein the ML model-based video downsampler is further configured to receive a plurality of weighting factors included in the weighted sum of the plurality of perceptual loss functions, and wherein the ML model-based video downsampler is trained further using the plurality of weighting factors.
16. The method of claim 10, wherein the upsampler is included in a simulation module.
17. The method of claim 10, wherein the upsampler comprises an ML model-based upsampler.
18. The method of claim 17, wherein the ML model-based upsampler and the ML model-based video downsampler are trained concurrently.
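The objective function recited in the claims (an estimated rate of the lower resolution sample combined with a weighted sum of several loss functions) might be sketched as below. The claims do not say how the rate is estimated, which perceptual losses are used, or how the terms are combined, so everything here is an assumption made for illustration: the rate is an empirical-entropy estimate over quantized symbols, mean squared error and mean absolute error stand in for perceptual losses, and the two terms are simply added. All names are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Sequence

Loss = Callable[[Sequence[float], Sequence[float]], float]


def estimated_rate_bits(symbols: Sequence[int]) -> float:
    """Assumed rate estimate: total empirical entropy of the symbols, in bits."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c * math.log2(c / n) for c in counts.values())


def mse(ref: Sequence[float], out: Sequence[float]) -> float:
    """Mean squared error (stand-in for a perceptual loss)."""
    return sum((a - b) ** 2 for a, b in zip(ref, out)) / len(ref)


def mae(ref: Sequence[float], out: Sequence[float]) -> float:
    """Mean absolute error (stand-in for a second perceptual loss)."""
    return sum(abs(a - b) for a, b in zip(ref, out)) / len(ref)


def objective(
    ref: Sequence[float],
    out: Sequence[float],
    symbols: Sequence[int],
    losses: Sequence[Loss],
    weights: Sequence[float],
) -> float:
    """Estimated rate plus a weighted sum of the supplied loss functions."""
    if len(losses) != len(weights):
        raise ValueError("need exactly one weight per loss function")
    distortion = sum(w * loss(ref, out) for w, loss in zip(weights, losses))
    return estimated_rate_bits(symbols) + distortion
```

For example, `objective(ref, out, symbols, [mse, mae], [0.5, 2.0])` weights the two stand-in losses differently; passing the weights in as an argument loosely mirrors the dependent claims in which the downsampler receives the weighting factors.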
using neural networks · CPC title
Learning methods · CPC title
the unit being bits, e.g. of the compressed video stream · CPC title
Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking · CPC title