Systems and Methods for Compression of Three-Dimensional Volumetric Representations
US-2023154051-A1 · May 18, 2023 · US
US12548234B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12548234-B2 |
| Application number | US-202318497945-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 30, 2023 |
| Priority date | Nov 15, 2022 |
| Publication date | Feb 10, 2026 |
| Grant date | Feb 10, 2026 |
Official abstract text for this publication.
In various embodiments, a scene reconstruction model generates three-dimensional (3D) representations of scenes. The scene reconstruction model computes a first 3D feature grid based on a set of red, blue, green, and depth (RGBD) images associated with a first scene. The scene reconstruction model maps the first 3D feature grid to a first 3D representation of the first scene. The scene reconstruction model computes a first reconstruction loss based on the first 3D representation and the set of RGBD images. The scene reconstruction model modifies at least one of the first 3D feature grid, a first pre-trained geometry decoder, or a first pre-trained texture decoder based on the first reconstruction loss to generate a second 3D representation of the first scene.
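The loop the abstract describes (map a feature grid to a representation, compute a reconstruction loss, then modify the grid) can be illustrated with a deliberately small NumPy sketch. This is not the patented method. The linear `decoder_w` is a toy stand-in for a pre-trained decoder, `targets` stands in for values derived from the RGBD images, and the function name, array layout, learning rate, and step count are all assumptions made for illustration. Only the feature grid is updated here; the decoder stays frozen.

```python
import numpy as np

def refine_feature_grid(grid, decoder_w, targets, lr=0.1, steps=200):
    """Refine per-voxel features by gradient descent with a frozen decoder.

    grid:      (V, F) per-voxel feature vectors (hypothetical layout)
    decoder_w: (F,) weights of a toy linear stand-in for a pre-trained decoder
    targets:   (V,) toy stand-in for values observed in the input images
    """
    grid = grid.copy()          # leave the caller's grid untouched
    losses = []
    for _ in range(steps):
        pred = grid @ decoder_w                 # map features to a representation
        residual = pred - targets
        losses.append(float(np.mean(residual ** 2)))  # reconstruction loss (MSE)
        # d(loss)/d(grid[v, f]) = (2 / V) * residual[v] * decoder_w[f]
        grad = (2.0 / len(targets)) * residual[:, None] * decoder_w[None, :]
        grid -= lr * grad                       # modify the feature grid only
    return grid, losses
```

Calling `refine_feature_grid(np.zeros((4, 3)), np.array([1.0, -0.5, 2.0]), np.array([0.5, -0.2, 0.1, 0.3]))` returns a refined grid whose decoded values approach the targets, along with a decreasing loss history. A real system would render RGBD images from the representation and would likely rely on automatic differentiation rather than a hand-derived gradient.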
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method for generating three-dimensional (3D) representations of scenes, the method comprising: computing a first 3D feature grid based on a set of red, blue, green, and depth (RGBD) images associated with a first scene; mapping the first 3D feature grid to a first 3D representation of the first scene; computing a first reconstruction loss based on the first 3D representation and the set of RGBD images; and modifying at least one of the first 3D feature grid, a first pre-trained geometry decoder, or a first pre-trained texture decoder based on the first reconstruction loss to generate a second 3D representation of the first scene.
2. The computer-implemented method of claim 1, wherein computing the first 3D feature grid comprises performing one or more spatial interpolation operations on a fused surface representation of the first scene.
3. The computer-implemented method of claim 1, wherein computing the first 3D feature grid comprises assigning a first geometry feature vector and a first texture feature vector to a first voxel to generate a first grid cell.
4. The computer-implemented method of claim 1, wherein mapping the first 3D feature grid comprises: aggregating a plurality of positional encodings associated with the first 3D feature grid and a plurality of geometry feature vectors included in the first 3D feature grid to generate a plurality of input vectors; and executing the first pre-trained geometry decoder on the plurality of input vectors to generate a plurality of signed distance function values.
5. The computer-implemented method of claim 1, wherein mapping the first 3D feature grid comprises: generating a plurality of texture input vectors based on a plurality of texture feature vectors included in the first 3D feature grid and a plurality of signed distance function values generated by the first pre-trained geometry decoder; and executing the first pre-trained texture decoder on the plurality of texture input vectors to generate a plurality of radiance values.
6. The computer-implemented method of claim 1, wherein computing the first reconstruction loss comprises rendering a first reconstructed RGBD image based on the first 3D representation and a first viewpoint associated with a first RGBD image included in the set of RGBD images.
7. The computer-implemented method of claim 1, wherein computing the first reconstruction loss comprises computing at least one of a pixel-wise rendering loss or an approximated signed distance function loss.
8. The computer-implemented method of claim 1, wherein modifying the at least one of the first 3D feature grid, the first trained geometry decoder, or the first trained texture decoder comprises replacing a first value for a first geometry feature vector included in the first 3D feature grid with a second value.
9. The computer-implemented method of claim 1, wherein modifying the at least one of the first 3D feature grid, the first pre-trained geometry decoder, or the first pre-trained texture decoder comprises replacing a first value for a first learnable parameter included in the first pre-trained geometry decoder or the first pre-trained texture decoder with a second value.
10. The computer-implemented method of claim 1, further comprising, prior to generating the second 3D representation, removing at least one of a first voxel, a first geometry feature vector associated with the first voxel, or a first texture feature vector associated with the first voxel from the first 3D feature grid based on a first signed distance function value associated with the first voxel.
11. One or more non-transitory computer readable media including instructions that, when executed by one or more processors, cause the one or more processors to generate three-dimensional (3D) representations of scenes by performing the steps of: computing a first 3D feature grid based on a set of red, blue, green, and depth (RGBD) images associated with a first scene; mapping the first 3D feature grid to a first 3D representation of the first scene; computing a first reconstruction loss based on the first 3D representation and the set of RGBD images; and modifying at least one of the first 3D feature grid, a first pre-trained geometry decoder, or a first pre-trained texture decoder based on the first reconstruction loss to generate a second 3D representation of the first scene.
12. The one or more non-transitory computer readable media of claim 11, wherein computing the first 3D feature grid comprises performing one or more spatial interpolation operations on a fused surface representation of the first scene.
13. The one or more non-transitory computer readable media of claim 11, wherein computing the first 3D feature grid comprises assigning a first geometry feature vector and a first texture feature vector to a first voxel to generate a first grid cell.
14. The one or more non-transitory computer readable media of claim 11, wherein mapping the first 3D feature grid comprises: aggregating a plurality of positional encodings associated with the first 3D feature grid and a plurality of geometry feature vectors included in the first 3D feature grid to generate a plurality of input vectors; and executing the first pre-trained geometry decoder on the plurality of input vectors to generate a plurality of signed distance function values.
15. The one or more non-transitory computer readable media of claim 11, wherein mapping the first 3D feature grid comprises: generating a plurality of texture input vectors based on a plurality of texture feature vectors included in the first 3D feature grid and a plurality of signed distance function values generated by the first pre-trained geometry decoder; and executing the first pre-trained texture decoder on the plurality of texture input vectors to generate a plurality of radiance values.
16. The one or more non-transitory computer readable media of claim 11, wherein computing the first reconstruction loss comprises rendering a first reconstructed RGBD image based on the first 3D representation and a first viewpoint associated with a first RGBD image included in the set of RGBD images.
17. The one or more non-transitory computer readable media of claim 16, wherein the first viewpoint is specified by at least one of a rotation matrix, a 3D translation, or an intrinsic matrix associated with a camera.
18. The one or more non-transitory computer readable media of claim 11, wherein modifying the at least one of the first 3D feature grid, the first trained geometry decoder, or the first trained texture decoder comprises replacing a first value for a first texture feature vector included in the first 3D feature grid with a second value.
19. The one or more non-transitory computer readable media of claim 11, further comprising, prior to generating the second 3D representation, removing one or more voxels from the first 3D feature grid based on a plurality of signed distance function (SDF) values included in the first 3D representation and a threshold SDF value.
20. A system comprising: one or more memories storing instructions; and one or more processors coupled to the one or more memories that, when executing the instructions, perform the steps of: computing a first 3D feature grid based on a set of red, blue, green, and depth (RGBD) images associated with a first scene; mapping the first 3D feature grid to a first 3D representation of
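As a rough illustration of the geometry path in the claims (positional encodings aggregated with geometry feature vectors, a decoder producing signed distance function values, and SDF-based voxel removal), the following NumPy sketch may help. It rests on several assumptions that the claims do not state: aggregation is done by concatenation, the encoding is a sin/cos encoding at octave-spaced frequencies, the decoder is a toy two-layer MLP with caller-supplied weights `w1` and `w2`, and pruning keeps voxels whose absolute SDF is at most the threshold. All function names are hypothetical.

```python
import numpy as np

def positional_encoding(xyz, num_freqs=4):
    """Sin/cos encoding of (V, 3) coordinates; returns (V, 6 * num_freqs)."""
    freqs = 2.0 ** np.arange(num_freqs)          # octave-spaced frequencies
    angles = xyz[:, :, None] * freqs             # (V, 3, num_freqs)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(xyz.shape[0], -1)

def decode_sdf(xyz, geo_feats, w1, w2):
    """Toy two-layer MLP standing in for a pre-trained geometry decoder."""
    # Aggregate encodings and geometry features (concatenation is an assumption).
    inputs = np.concatenate([positional_encoding(xyz), geo_feats], axis=1)
    hidden = np.maximum(inputs @ w1, 0.0)        # ReLU activation
    return hidden @ w2                           # one SDF value per voxel, (V,)

def prune_voxels(xyz, geo_feats, sdf, threshold):
    """Keep voxels with |SDF| <= threshold (assumed pruning criterion)."""
    keep = np.abs(sdf) <= threshold
    return xyz[keep], geo_feats[keep], keep
```

For example, with SDF values `[0.01, -0.5, 0.2, -0.04]` and a threshold of `0.1`, `prune_voxels` keeps the first and last voxels, which lie closest to the surface. Dropping far-from-surface voxels is one plausible way such pruning could shrink the stored grid; the claims themselves only require that removal be based on SDF values and a threshold.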
Range image; Depth image; 3D point clouds · CPC title
Artificial neural networks [ANN] · CPC title
Training; Learning · CPC title
Volume rendering · CPC title
Color image · CPC title