US2025330601A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2025330601-A1 |
| Application number | US-202519016225-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jan 10, 2025 |
| Priority date | Apr 23, 2024 |
| Publication date | Oct 23, 2025 |
| Grant date | — |
Official abstract text for this publication.
Systems and methods are provided for streaming free-viewpoint videos (FVV) of a dynamic 3D scene. Each time step of the dynamic 3D scene is represented as a set of 3D Gaussians, each 3D Gaussian having a set of Gaussian attributes. Gaussian residuals are encoded for every time step. In at least one embodiment, position residuals are sparsified and non-position attribute residuals are quantized, thereby achieving compression factors without sacrificing reconstruction quality.
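As a rough illustration of the two residual models named in the abstract, the sketch below follows the wording of the claims: a position residual is a gate times a full-precision residual, and a non-position residual is an integer latent times a shared codebook. The function names, the hard 0/1 gates, and the matrix shapes (latent length by attribute dimension) are assumptions made for this example and are not taken from the publication.

```python
from typing import List

Vector = List[float]


def gated_position_residuals(gates: List[int], full_residuals: List[Vector]) -> List[Vector]:
    """Position residual per Gaussian = gate * full-precision residual.

    Assumption: gates are already hard 0/1 values; a gate of 0 zeroes the
    residual, so a Gaussian that did not move costs almost nothing to store.
    """
    return [[gate * c for c in residual] for gate, residual in zip(gates, full_residuals)]


def decode_attribute_residuals(latents: List[List[int]], codebook: List[Vector]) -> List[Vector]:
    """Non-position residual = integer latent (row vector) times a shared codebook.

    Assumption: the codebook is a (latent length x attribute dimension) matrix
    common to all Gaussians for one attribute, e.g. rotation or color.
    """
    dim = len(codebook[0])
    return [
        [sum(z * row[d] for z, row in zip(latent, codebook)) for d in range(dim)]
        for latent in latents
    ]


def sparsity(gates: List[int]) -> float:
    """Fraction of Gaussians whose position residual is gated off."""
    return gates.count(0) / len(gates)


if __name__ == "__main__":
    gates = [1, 0]
    full = [[0.1, -0.2, 0.3], [0.5, 0.5, 0.5]]
    print(gated_position_residuals(gates, full))
    print(decode_attribute_residuals([[2, -1]], [[0.5, 0.0], [0.25, 1.0]]))
    print(sparsity(gates))
```

Under these assumptions only the nonzero position residuals, the small integer latents, and one codebook per attribute would need to be transmitted for each time step, which is where the compression would come from.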
Opening claim text (preview).
What is claimed is:
1. A method for encoding a free-viewpoint video (FVV), the method comprising: receiving, as input, a first multi-view frame of a scene corresponding to a first time; generating, based on the first multi-view frame, a plurality of three-dimensional (3D) Gaussians that collectively represent the scene at the first time, wherein each 3D Gaussian includes a position attribute and one or more non-position attributes; receiving, as further input, a second multi-view frame of the scene corresponding to a second time; and determining, for each 3D Gaussian of the plurality of 3D Gaussians, a position residual and one or more non-position residuals, the determining comprising: modeling the position residuals using a sparsity framework, modeling the non-position residuals using a quantization framework, and learning the position residuals and the non-position residuals using machine learning techniques.
2. The method according to claim 1, wherein modeling the position residuals using the sparsity framework comprises representing each position residual as a product of a learnable gate and a learnable full-precision position residual.
3. The method according to claim 2, wherein modeling the position residuals using the sparsity framework further comprises initializing parameters of each learnable gate based on a score vector computed, at least in part, from a gradient of a reconstruction loss with respect to a position of a corresponding Gaussian at the second time and a gradient of a reconstruction loss with respect to a position of the corresponding Gaussian at the first time.
4. The method according to claim 1, wherein modeling the non-position residuals using the quantization framework comprises representing each non-position residual as a product of a learnable integer latent and a learnable codebook.
5. The method according to claim 4, wherein the one or more non-position attributes comprise: a rotation attribute, a scale attribute, an opacity attribute, and a color attribute, wherein the one or more non-position residuals comprise: a rotation residual, a scale residual, an opacity residual, and a color residual, and wherein each respective rotation residual is modeled as a product of a corresponding respective rotation integer latent and a common rotation codebook, each respective scale residual is modeled as a product of a corresponding respective scale integer latent and a common scale codebook, each respective opacity residual is modeled as a product of a corresponding respective opacity integer latent and a common opacity codebook, and each respective color residual is modeled as a product of a corresponding respective color integer latent and a common color codebook.
6. The method according to claim 1, wherein the learning the position residuals and the non-position residuals using machine learning techniques comprises: rendering, during a forward pass, an output image corresponding to a viewpoint from which a respective image of the second multi-view frame was captured, the rendering being performed using the position attributes, the non-position attributes, learnable position residuals, and learnable non-position residuals, computing a loss by comparing the output image to the respective image of the second multi-view frame, calculating, during a backward pass, gradients of the computed loss with respect to the learnable position residuals and the learnable non-position residuals, and updating the learnable position residuals and the learnable non-position residuals based on the calculated gradients.
7. The method according to claim 6, wherein the loss includes a reconstruction loss and a regularization loss, wherein the reconstruction loss measures a difference between the output image and the respective image of the second multi-view frame, and wherein the regularization loss decreases as the sparsity of the learnable position residuals increases.
8. The method according to claim 6, wherein the learning the position residuals and the non-position residuals further comprises defining, for the respective image of the second multi-view frame, static and dynamic regions, and wherein the rendering the output image is performed only for the dynamic regions of the respective image of the second multi-view frame.
9. The method according to claim 1, further comprising generating, based on the first multi-view frame, a point cloud, wherein the point cloud is generated using a structure-from-motion algorithm and enhanced using depth map provided via monocular depth estimation.
10. A system for encoding a free-viewpoint video (FVV), the system comprising: processing circuitry configured to: receive, as input, a first multi-view frame of a scene corresponding to a first time; generate, based on the first multi-view frame, a plurality of three-dimensional (3D) Gaussians that collectively represent the scene at the first time, wherein each 3D Gaussian includes a position attribute and one or more non-position attributes, receive, as further input, a second multi-view frame of the scene corresponding to a second time, and determine, for each 3D Gaussian of the plurality of 3D Gaussians, a position residual and one or more non-position residuals, the processing circuitry being configured to determine the position residuals and the non-position residuals by: modeling the position residuals using a sparsity framework, modeling the non-position residuals using a quantization framework, and learning the position residuals and the non-position residuals using machine learning techniques; and one or more memories configured to store the first multi-view frame, the plurality of 3D Gaussians, the second multi-view frame, and the position residuals and the non-position residuals.
11. The system according to claim 10, the processing circuitry being configured to model the position residuals using the sparsity framework by representing each position residual as a product of a learnable gate and a learnable full-precision position residual.
12. The system according to claim 11, the processing circuitry being configured to model the position residuals using the sparsity framework by further initializing parameters of each learnable gate based on a score vector computed, at least in part, from a gradient of a reconstruction loss with respect to a position of a corresponding Gaussian at the second time and a gradient of a reconstruction loss with respect to a position of the corresponding Gaussian at the first time.
13. The system according to claim 10, the processing circuitry being configured to model the non-position residuals using the quantization framework by representing each non-position residual as a product of a learnable integer latent and a learnable codebook.
14. The system according to claim 13, wherein the one or more non-position attributes comprise: a rotation attribute, a scale attribute, an opacity attribute, and a color attribute, wherein the one or more non-position residuals comprise: a rotation residual, a scale residual, an opacity residual, and a color residual, and wherein the processing circuitry is configured to: model each respective rotation residual as a product of a corresponding respective rotation integer latent and a common rotation codebook, model each respective scale residual as a product of a corresponding respective scale integer latent and a common scale codebook, model each respective opacity residual as a product of a corresponding respective opacity integer latent and a common opacity codebook, and model each respecti
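The learning procedure recited in claims 6 and 7 (forward render, loss with a reconstruction term and a sparsity-favoring regularization term, backward pass, update) can be sketched in a deliberately simplified form. Everything below is an assumption made for illustration: positions are scalars, the "renderer" is a toy identity function standing in for a differentiable Gaussian rasterizer, the regularizer is an L1 penalty on the residuals, and gradients are written out by hand instead of using an autodiff library.

```python
from typing import List


def sign(x: float) -> float:
    """Subgradient of |x|, taking 0 at x == 0."""
    return float((x > 0) - (x < 0))


def total_loss(base: List[float], residuals: List[float], target: List[float], lam: float) -> float:
    """Reconstruction loss (squared error) plus L1 regularization on the residuals."""
    recon = sum((b + r - t) ** 2 for b, r, t in zip(base, residuals, target))
    reg = lam * sum(abs(r) for r in residuals)
    return recon + reg


def train_residuals(
    base: List[float],
    target: List[float],
    lam: float = 0.01,
    lr: float = 0.1,
    steps: int = 200,
) -> List[float]:
    """Learn one residual per (toy, scalar) Gaussian position by gradient descent."""
    residuals = [0.0] * len(base)
    for _ in range(steps):
        # Forward pass: the toy "render" is just the updated position.
        rendered = [b + r for b, r in zip(base, residuals)]
        # Backward pass: d/dr of (rendered - target)^2 + lam * |r|.
        grads = [
            2.0 * (y - t) + lam * sign(r)
            for y, t, r in zip(rendered, target, residuals)
        ]
        # Update step.
        residuals = [r - lr * g for r, g in zip(residuals, grads)]
    return residuals


if __name__ == "__main__":
    base = [0.0, 1.0]      # positions at the first time
    target = [0.0, 1.5]    # toy observations at the second time
    learned = train_residuals(base, target)
    print(learned, total_loss(base, learned, target, lam=0.01))
```

In this toy setting the residual of the first, static point never leaves zero, while the second settles slightly below the true displacement of 0.5 because the penalty pulls it toward zero. That is the trade-off claim 7 describes between reconstruction quality and sparsity, shown here only in miniature.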
CPC classification titles:
- specially adapted for multi-view video sequence encoding
- Quantisation
- by compressing encoding parameters before transmission
- Position within a video image, e.g. region of interest [ROI]
- the region being a picture, frame or field