Hybrid spatio-temporal neural models for video compression
US-2025191353-A1 · Jun 12, 2025 · US
| Field | Value |
|---|---|
| Publication number | US-12598317-B2 |
| Application number | US-202318532638-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 7, 2023 |
| Priority date | Dec 7, 2023 |
| Publication date | Apr 7, 2026 |
| Grant date | Apr 7, 2026 |
Official abstract text for this publication.
Systems and methods are provided for decoding visual content using a hybrid framework based on convolutional and neural radiance networks. A decoder receives bitstreams of model parameters, a sequence level representation, and a cross-resolution representation for reconstructing a sequence of frames. The model parameters comprise neural radiance network parameters. The decoder decodes the bitstreams of the model parameters, the sequence level representation, and the cross-resolution representation. The decoder generates, via a channel transformer, a combined representation based on the sequence level representation and the cross-resolution representation. The decoder adapts a neural network model based on the neural radiance network parameters. The decoder reconstructs the sequence of frames by determining, via the adapted neural network model based on the combined representation, pixel attribute information for each frame of the reconstructed sequence of frames. The decoder generates, for display at a client device, the reconstructed sequence of frames.
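The reconstruction step in the abstract can be illustrated with a minimal sketch. It assumes a small fully connected network that maps normalized spatio-temporal pixel coordinates (x, y, t), concatenated with a per-frame latent vector, to an RGB triple. The helper names (`make_mlp`, `mlp_forward`, `reconstruct`), the layer sizes, the random weights standing in for decoded model parameters, and the concatenation of coordinates with latents are all illustrative assumptions; the abstract does not specify the network architecture or how the combined representation conditions it.

```python
import math
import random


def make_mlp(sizes, seed=0):
    """Build random weights for a tiny fully connected network.

    Assumption: random values stand in for decoded model parameters.
    """
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def mlp_forward(layers, x):
    """Run one input vector through the network (ReLU hidden, sigmoid output)."""
    last = len(layers) - 1
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < last:
            x = [max(0.0, v) for v in x]
        else:
            x = [1.0 / (1.0 + math.exp(-v)) for v in x]  # colors in [0, 1]
    return x


def reconstruct(layers, frame_latents, height, width):
    """Query the network once per pixel of every frame.

    frame_latents: one latent vector per frame (hypothetical stand-in for
    the combined representation).
    """
    num_frames = len(frame_latents)
    frames = []
    for t, latent in enumerate(frame_latents):
        t_norm = t / max(num_frames - 1, 1)
        frame = []
        for y in range(height):
            row = []
            for x in range(width):
                coords = [x / max(width - 1, 1), y / max(height - 1, 1), t_norm]
                row.append(mlp_forward(layers, coords + latent))
            frame.append(row)
        frames.append(frame)
    return frames


# Two frames of 2x3 pixels, each frame with a 2-value latent vector.
latents = [[0.1, 0.2], [0.3, 0.4]]
network = make_mlp([3 + 2, 8, 3])
video = reconstruct(network, latents, height=2, width=3)
```

Here `video[t][y][x]` holds the three color values for one pixel of frame `t`. A practical decoder would batch these queries on an accelerator rather than loop per pixel.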
Opening claim text (preview).
What is claimed is:
1. A method comprising: receiving, at a client device, bitstreams of a plurality of model parameters, a sequence level representation, and a cross-resolution representation for reconstructing a sequence of frames, wherein the plurality of model parameters comprises neural radiance network parameters; decoding the bitstreams of the plurality of model parameters, the sequence level representation, and the cross-resolution representation; generating, via a channel transformer, a combined representation based on the sequence level representation and the cross-resolution representation; adapting a neural network model based on the neural radiance network parameters; reconstructing the sequence of frames by determining, via the adapted neural network model based on the combined representation, pixel attribute information for each frame of the reconstructed sequence of frames; and generating, for display at the client device, the reconstructed sequence of frames, wherein the cross-resolution representation comprises latent features corresponding to each frame of the sequence of frames.
2. The method of claim 1, wherein the adapted neural network model is trained to output pixel attribute information based on a plurality of pixel coordinates.
3. The method of claim 2, wherein the plurality of pixel coordinates comprises a timeline corresponding to the sequence of frames.
4. The method of claim 1, wherein determining, via the adapted neural network model, pixel attribute information for each frame of the reconstructed sequence of frames comprises determining a plurality of color values corresponding to a pixel of each frame based on respective spatio-temporal coordinates of the pixel.
5. The method of claim 1, wherein adapting the neural network model based on the plurality of neural radiance network parameters comprises selecting a network configuration from a plurality of pre-determined network configurations.
6. The method of claim 5, wherein the network configuration is selected based on a target reconstruction quality for the reconstructed sequence of frames.
7. The method of claim 1, wherein the client device is an extended reality (XR) device, and wherein the sequence of frames collectively defines a field of view (FoV) at the XR device.
8. The method of claim 7, wherein the sequence of frames corresponds to one or more display refresh cycles at the XR device.
9. A system comprising: communications circuitry configured to receive, at a client device, bitstreams of a plurality of model parameters, a sequence level representation, and a cross-resolution representation for reconstructing a sequence of frames, wherein the plurality of model parameters comprises neural radiance network parameters; and control circuitry configured to: decode the bitstreams of the plurality of model parameters, the sequence level representation, and the cross-resolution representation; generate, via a channel transformer, a combined representation based on the sequence level representation and the cross-resolution representation; adapt a neural network model based on the neural radiance network parameters; reconstruct the sequence of frames by determining, via the adapted neural network model based on the combined representation, pixel attribute information for each frame of the reconstructed sequence of frames; and generate, for display at the client device, the reconstructed sequence of frames, wherein the cross-resolution representation comprises latent features corresponding to each frame of the sequence of frames.
10. The system of claim 9, wherein the sequence of frames corresponds to a first resolution, and wherein the control circuitry is further configured to use a convolutional network model comprising a plurality of residual spatial attention blocks to generate the combined representation at the first resolution.
11. The system of claim 10, wherein the control circuitry is configured to output, using the adapted neural network model, pixel attribute information based on a plurality of pixel coordinates.
12. The system of claim 11, wherein the plurality of pixel coordinates comprises a timeline corresponding to the sequence of frames.
13. The system of claim 9, wherein the control circuitry is further configured to determine a plurality of color values corresponding to a pixel of each frame based on respective spatio-temporal coordinates of the pixel.
14. The system of claim 13, wherein the control circuitry is further configured to select a network configuration from a plurality of pre-determined network configurations.
15. The system of claim 14, wherein the control circuitry is configured to select the network configuration based on a target reconstruction quality for the reconstructed sequence of frames.
16. The system of claim 9, wherein the client device is an extended reality (XR) device, and the sequence of frames collectively defines a field of view (FoV) at the XR device.
17. The system of claim 16, wherein the sequence of frames corresponds to one or more display refresh cycles at the XR device.
18. A method comprising: receiving, at a client device, bitstreams of a plurality of model parameters, a sequence level representation, and a cross-resolution representation for reconstructing a sequence of frames, wherein the plurality of model parameters comprises neural radiance network parameters; decoding the bitstreams of the plurality of model parameters, the sequence level representation, and the cross-resolution representation; generating, via a channel transformer, a combined representation based on the sequence level representation and the cross-resolution representation; adapting a neural network model based on the neural radiance network parameters; reconstructing the sequence of frames by determining, via the adapted neural network model based on the combined representation, pixel attribute information for each frame of the reconstructed sequence of frames; and generating, for display at the client device, the reconstructed sequence of frames, wherein the sequence of frames corresponds to a first resolution, the method further comprising using a convolutional network model comprising a plurality of residual spatial attention blocks to generate the combined representation at the first resolution.
19. The method of claim 18, wherein the cross-resolution representation comprises latent features corresponding to each frame of the sequence of frames.
20. The method of claim 18, wherein the adapted neural network model is trained to output pixel attribute information based on a plurality of pixel coordinates.
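Claims 5, 6, 14, and 15 recite selecting a network configuration from pre-determined configurations based on a target reconstruction quality. One way this might look is sketched below, under stated assumptions. The `NetworkConfig` fields, the three presets, the PSNR figures, and the rule "pick the smallest preset that meets the target" are hypothetical illustrations and are not taken from the patent text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkConfig:
    """Hypothetical description of one pre-determined network configuration."""
    name: str
    hidden_layers: int
    hidden_width: int
    expected_psnr_db: float  # assumed quality rating, not from the patent


# Illustrative presets; the values are made up for this sketch.
PRESETS = (
    NetworkConfig("small", 2, 32, 30.0),
    NetworkConfig("medium", 4, 64, 34.0),
    NetworkConfig("large", 6, 128, 38.0),
)


def select_config(target_psnr_db, presets=PRESETS):
    """Return the cheapest preset whose rated quality meets the target.

    Cost is approximated as hidden_layers * hidden_width. If no preset
    reaches the target, fall back to the most expensive one.
    """
    ordered = sorted(presets, key=lambda c: c.hidden_layers * c.hidden_width)
    for config in ordered:
        if config.expected_psnr_db >= target_psnr_db:
            return config
    return ordered[-1]


chosen = select_config(33.0)
```

With these presets, a target of 33.0 dB selects the "medium" configuration, since "small" is rated below the target. A real decoder could equally key the choice on bitrate, device capability, or an index signaled in the bitstream.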
the region being a picture, frame or field · CPC title
characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation (H04N19/635 takes precedence) · CPC title