Systems and Methods for Compression of Three-Dimensional Volumetric Representations
US-2023154051-A1 · May 18, 2023 · US
| Field | Value |
|---|---|
| Publication number | US-12548258-B2 |
| Application number | US-202318497938-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 30, 2023 |
| Priority date | Nov 15, 2022 |
| Publication date | Feb 10, 2026 |
| Grant date | Feb 10, 2026 |
Official abstract text for this publication.
In various embodiments, a training application trains a machine learning model to generate three-dimensional (3D) representations of two-dimensional images. The training application maps a depth image and a viewpoint to signed distance function (SDF) values associated with 3D query points. The training application maps a red, blue, and green (RGB) image to radiance values associated with the 3D query points. The training application computes a red, blue, green, and depth (RGBD) reconstruction loss based on at least the SDF values and the radiance values. The training application modifies at least one of a pre-trained geometry encoder, a pre-trained geometry decoder, an untrained texture encoder, or an untrained texture decoder based on the RGBD reconstruction loss to generate a trained machine learning model that generates 3D representations of RGBD images.
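The abstract does not give a formula for the RGBD reconstruction loss. As a rough illustration only, the sketch below compares a rendered RGBD image against a target RGBD image pixel by pixel. The L1 error, the `depth_weight` parameter, and the function name `rgbd_pixel_loss` are all assumptions made for this example and are not taken from the patent.

```python
from typing import Sequence, Tuple

# One pixel as (red, green, blue, depth). This layout is an assumption.
Pixel = Tuple[float, float, float, float]


def rgbd_pixel_loss(
    rendered: Sequence[Pixel],
    target: Sequence[Pixel],
    depth_weight: float = 1.0,
) -> float:
    """Mean per-pixel L1 colour error plus a weighted L1 depth error.

    Hypothetical stand-in for a pixel-wise rendering loss; the actual
    loss used in the patent is not specified in this excerpt.
    """
    if len(rendered) != len(target) or not rendered:
        raise ValueError("images must be non-empty and the same size")

    total = 0.0
    for (r0, g0, b0, d0), (r1, g1, b1, d1) in zip(rendered, target):
        # Average absolute error over the three colour channels.
        color_err = (abs(r0 - r1) + abs(g0 - g1) + abs(b0 - b1)) / 3.0
        # Absolute error on the depth channel.
        depth_err = abs(d0 - d1)
        total += color_err + depth_weight * depth_err
    return total / len(rendered)
```

In a training loop, a value like this would be minimized by adjusting the encoder and decoder parameters. Setting `depth_weight` to zero reduces the sketch to a colour-only loss.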
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method for training a machine learning model to generate three-dimensional representations of two-dimensional images, the method comprising: mapping a first depth image and a first viewpoint to a first plurality of signed distance function (SDF) values associated with a first plurality of three-dimensional (3D) query points; mapping a first red, blue, and green (RGB) image to a first plurality of radiance values associated with the first plurality of 3D query points; computing a first red, blue, green, and depth (RGBD) reconstruction loss based on at least the first plurality of SDF values and the first plurality of radiance values; and modifying at least one of a first pre-trained geometry encoder, a first pre-trained geometry decoder, a first untrained texture encoder, or a first untrained texture decoder based on the first RGBD reconstruction loss to generate a trained machine learning model that generates 3D representations of RGBD images.
2. The computer-implemented method of claim 1, wherein computing the first RGBD reconstruction loss comprises rendering a first reconstructed RGBD image based on the first plurality of SDF values, the first plurality of radiance values, and the first viewpoint.
3. The computer-implemented method of claim 1, wherein computing the first RGBD reconstruction loss comprises computing at least one of a pixel-wise rendering loss or an approximated SDF loss.
4. The computer-implemented method of claim 1, wherein modifying at least one of the first pre-trained geometry encoder, the first pre-trained geometry decoder, the first untrained texture encoder, or the first untrained texture decoder comprises replacing a first value for a first learnable parameter included in the first pre-trained geometry encoder, the first pre-trained geometry decoder, the first untrained texture encoder, or the first untrained texture decoder with a second value.
5. The computer-implemented method of claim 1, wherein mapping the first depth image and the first viewpoint to the first plurality of SDF values comprises projecting the first depth image into a world coordinate system based on the first viewpoint.
6. The computer-implemented method of claim 1, wherein mapping the first depth image and the first viewpoint to the first plurality of SDF values comprises: determining a first plurality of 3D surface points based on the first depth image and the first viewpoint; and computing a first plurality of geometry feature vectors associated with the first plurality of 3D surface points.
7. The computer-implemented method of claim 1, wherein mapping the first RGB image to the first plurality of radiance values comprises: determining a first plurality of input vectors based on a first plurality of query points and a first texture surface representation generated by the first untrained texture encoder; and executing the first untrained texture decoder on the first plurality of input vectors.
8. The computer-implemented method of claim 1, further comprising: mapping a second depth image and a second viewpoint to a second plurality of SDF values associated with a second plurality of 3D query points; computing a geometric reconstruction loss based on at least the second plurality of SDF values; and modifying a first untrained geometry encoder and a first untrained geometry decoder based on the geometric reconstruction loss to generate the first pre-trained geometry encoder and the first pre-trained geometry decoder.
9. The computer-implemented method of claim 8, wherein the first depth image and the second depth image are associated with different scenes.
10. The computer-implemented method of claim 1, wherein the first viewpoint is specified by at least one of a rotation matrix, a 3D translation, or an intrinsic matrix associated with a camera.
11. One or more non-transitory computer readable media including instructions that, when executed by one or more processors, cause the one or more processors to generate three-dimensional representations of two-dimensional images by performing the steps of: mapping a first depth image and a first viewpoint to a first plurality of signed distance function (SDF) values associated with a first plurality of three-dimensional (3D) query points; mapping a first red, blue, green (RGB) image to a first plurality of radiance values associated with the first plurality of 3D query points; computing a first red, blue, green, and depth (RGBD) reconstruction loss based on at least the first plurality of SDF values and the first plurality of radiance values; and modifying at least one of a first pre-trained geometry encoder, a first pre-trained geometry decoder, a first untrained texture encoder, or a first untrained texture decoder based on the first RGBD reconstruction loss to generate a trained machine learning model that generates 3D representations of RGBD images.
12. The one or more non-transitory computer readable media of claim 11, wherein computing the first RGBD reconstruction loss comprises rendering a first reconstructed RGBD image based on the first plurality of SDF values, the first plurality of radiance values, and the first viewpoint.
13. The one or more non-transitory computer readable media of claim 11, wherein computing the first RGBD reconstruction loss comprises computing at least one of a pixel-wise rendering loss or an approximated SDF loss.
14. The one or more non-transitory computer readable media of claim 11, wherein modifying at least one of the first pre-trained geometry encoder, the first pre-trained geometry decoder, the first untrained texture encoder, or the first untrained texture decoder comprises replacing a first value for a first learnable parameter included in the first pre-trained geometry encoder, the first pre-trained geometry decoder, the first untrained texture encoder, or the first untrained texture decoder with a second value.
15. The one or more non-transitory computer readable media of claim 11, wherein mapping the first depth image and the first viewpoint to the first plurality of SDF values comprises projecting the first depth image into a world coordinate system based on the first viewpoint.
16. The one or more non-transitory computer readable media of claim 11, wherein mapping the first depth image and the first viewpoint to the first plurality of SDF values further comprises: determining a first plurality of input vectors based on a first plurality of query points and a first geometric surface representation generated by the first pre-trained geometry encoder; and executing the first pre-trained geometry decoder on the first plurality of input vectors.
17. The one or more non-transitory computer readable media of claim 11, wherein mapping the first RGB image to the first plurality of radiance values comprises executing the first untrained texture encoder on the first RGB image to generate a plurality of texture feature vectors associated with a plurality of pixels included in the first RGB image.
18. The one or more non-transitory computer readable media of claim 11, further comprising: mapping the first depth image and the first viewpoint to a second plurality of SDF values associated with the first plurality of 3D query points; computing a geometric reconstruction loss based on at least the second plurality of SDF values; and modifying a first untrained geometry encoder and a first untrained geometry decoder based on the geometric reconstruction los
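Claims 5 and 10 describe projecting a depth image into a world coordinate system using a viewpoint given by a rotation matrix, a 3D translation, and a camera intrinsic matrix. The sketch below shows one common way to do this under a pinhole camera model. The claims do not state a convention, so the camera-to-world pose (world = R * cam + t), the intrinsics reduced to `fx`, `fy`, `cx`, `cy`, the treatment of non-positive depths as missing, and the function name `backproject_depth` are all assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def backproject_depth(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    rotation: Sequence[Sequence[float]],
    translation: Sequence[float],
) -> List[Vec3]:
    """Lift each valid depth pixel to a 3D point in world coordinates.

    Assumes a pinhole camera and a camera-to-world pose, so that
    world = rotation * cam + translation. Depths <= 0 are skipped.
    """
    points: List[Vec3] = []
    for v, row in enumerate(depth):        # v indexes image rows (y)
        for u, z in enumerate(row):        # u indexes image columns (x)
            if z <= 0.0:
                continue                   # treat as a missing measurement
            # Invert the pinhole projection to get camera-space coordinates.
            cam = ((u - cx) * z / fx, (v - cy) * z / fy, z)
            # Apply the rigid transform from camera space to world space.
            world = tuple(
                sum(rotation[i][j] * cam[j] for j in range(3)) + translation[i]
                for i in range(3)
            )
            points.append(world)
    return points
```

The returned points correspond to the kind of 3D surface points mentioned in claim 6. A world-to-camera pose convention would instead require applying the inverse transform.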