Techniques for fine-tuning a machine learning model to reconstruct a three-dimensional scene

US12548234B2 · US · B2

Patent metadata
Publication number: US-12548234-B2
Application number: US-202318497945-A
Country: US
Kind code: B2
Filing date: Oct 30, 2023
Priority date: Nov 15, 2022
Publication date: Feb 10, 2026
Grant date: Feb 10, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In various embodiments, a scene reconstruction model generates three-dimensional (3D) representations of scenes. The scene reconstruction model computes a first 3D feature grid based on a set of red, blue, green, and depth (RGBD) images associated with a first scene. The scene reconstruction model maps the first 3D feature grid to a first 3D representation of the first scene. The scene reconstruction model computes a first reconstruction loss based on the first 3D representation and the set of RGBD images. The scene reconstruction model modifies at least one of the first 3D feature grid, a first pre-trained geometry decoder, or a first pre-trained texture decoder based on the first reconstruction loss to generate a second 3D representation of the first scene.
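The loop the abstract describes (decode the grid, compare against the observations, then adjust the grid or the decoders) can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration: the "feature grid" is a flat list of scalars, the "pre-trained decoder" is a frozen linear function, the observations stand in for depth values, and only the grid is updated. The patent also allows updating either decoder, and this is not the patented implementation.

```python
def decode(feature, weight, bias):
    # Hypothetical frozen "pre-trained decoder": maps one grid feature
    # to one predicted depth-like value.
    return weight * feature + bias


def reconstruction_loss(features, observed, weight, bias):
    # Mean squared error between decoded values and observations.
    n = len(features)
    return sum((decode(f, weight, bias) - o) ** 2
               for f, o in zip(features, observed)) / n


def fine_tune(features, observed, weight, bias, lr=0.1, steps=200):
    # Gradient descent on the grid features only; the decoder stays fixed.
    feats = list(features)
    n = len(feats)
    for _ in range(steps):
        # Analytic gradient of the MSE with respect to each feature.
        grads = [2.0 * weight * (decode(f, weight, bias) - o) / n
                 for f, o in zip(feats, observed)]
        feats = [f - lr * g for f, g in zip(feats, grads)]
    return feats


observed = [1.0, 2.0, 1.5, 0.5]      # stand-in for observed depth samples
initial = [0.0, 0.0, 0.0, 0.0]       # "first" feature grid
weight, bias = 2.0, 0.1              # frozen decoder parameters

loss_before = reconstruction_loss(initial, observed, weight, bias)
tuned = fine_tune(initial, observed, weight, bias)   # "second" grid
loss_after = reconstruction_loss(tuned, observed, weight, bias)
print(loss_before, loss_after)
```

Decoding the tuned grid plays the role of the "second 3D representation": the same frozen decoder, applied to the updated features, now reproduces the observations far more closely.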

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for generating three-dimensional (3D) representations of scenes, the method comprising: computing a first 3D feature grid based on a set of red, blue, green, and depth (RGBD) images associated with a first scene; mapping the first 3D feature grid to a first 3D representation of the first scene; computing a first reconstruction loss based on the first 3D representation and the set of RGBD images; and modifying at least one of the first 3D feature grid, a first pre-trained geometry decoder, or a first pre-trained texture decoder based on the first reconstruction loss to generate a second 3D representation of the first scene.

2. The computer-implemented method of claim 1, wherein computing the first 3D feature grid comprises performing one or more spatial interpolation operations on a fused surface representation of the first scene.

3. The computer-implemented method of claim 1, wherein computing the first 3D feature grid comprises assigning a first geometry feature vector and a first texture feature vector to a first voxel to generate a first grid cell.

4. The computer-implemented method of claim 1, wherein mapping the first 3D feature grid comprises: aggregating a plurality of positional encodings associated with the first 3D feature grid and a plurality of geometry feature vectors included in the first 3D feature grid to generate a plurality of input vectors; and executing the first pre-trained geometry decoder on the plurality of input vectors to generate a plurality of signed distance function values.

5. The computer-implemented method of claim 1, wherein mapping the first 3D feature grid comprises: generating a plurality of texture input vectors based on a plurality of texture feature vectors included in the first 3D feature grid and a plurality of signed distance function values generated by the first pre-trained geometry decoder; and executing the first pre-trained texture decoder on the plurality of texture input vectors to generate a plurality of radiance values.

6. The computer-implemented method of claim 1, wherein computing the first reconstruction loss comprises rendering a first reconstructed RGBD image based on the first 3D representation and a first viewpoint associated with a first RGBD image included in the set of RGBD images.

7. The computer-implemented method of claim 1, wherein computing the first reconstruction loss comprises computing at least one of a pixel-wise rendering loss or an approximated signed distance function loss.

8. The computer-implemented method of claim 1, wherein modifying the at least one of the first 3D feature grid, the first trained geometry decoder, or the first trained texture decoder comprises replacing a first value for a first geometry feature vector included in the first 3D feature grid with a second value.

9. The computer-implemented method of claim 1, wherein modifying the at least one of the first 3D feature grid, the first pre-trained geometry decoder, or the first pre-trained texture decoder comprises replacing a first value for a first learnable parameter included in the first pre-trained geometry decoder or the first pre-trained texture decoder with a second value.

10. The computer-implemented method of claim 1, further comprising, prior to generating the second 3D representation, removing at least one of a first voxel, a first geometry feature vector associated with the first voxel, or a first texture feature vector associated with the first voxel from the first 3D feature grid based on a first signed distance function value associated with the first voxel.

11. One or more non-transitory computer readable media including instructions that, when executed by one or more processors, cause the one or more processors to generate three-dimensional (3D) representations of scenes by performing the steps of: computing a first 3D feature grid based on a set of red, blue, green, and depth (RGBD) images associated with a first scene; mapping the first 3D feature grid to a first 3D representation of the first scene; computing a first reconstruction loss based on the first 3D representation and the set of RGBD images; and modifying at least one of the first 3D feature grid, a first pre-trained geometry decoder, or a first pre-trained texture decoder based on the first reconstruction loss to generate a second 3D representation of the first scene.

12. The one or more non-transitory computer readable media of claim 11, wherein computing the first 3D feature grid comprises performing one or more spatial interpolation operations on a fused surface representation of the first scene.

13. The one or more non-transitory computer readable media of claim 11, wherein computing the first 3D feature grid comprises assigning a first geometry feature vector and a first texture feature vector to a first voxel to generate a first grid cell.

14. The one or more non-transitory computer readable media of claim 11, wherein mapping the first 3D feature grid comprises: aggregating a plurality of positional encodings associated with the first 3D feature grid and a plurality of geometry feature vectors included in the first 3D feature grid to generate a plurality of input vectors; and executing the first pre-trained geometry decoder on the plurality of input vectors to generate a plurality of signed distance function values.

15. The one or more non-transitory computer readable media of claim 11, wherein mapping the first 3D feature grid comprises: generating a plurality of texture input vectors based on a plurality of texture feature vectors included in the first 3D feature grid and a plurality of signed distance function values generated by the first pre-trained geometry decoder; and executing the first pre-trained texture decoder on the plurality of texture input vectors to generate a plurality of radiance values.

16. The one or more non-transitory computer readable media of claim 11, wherein computing the first reconstruction loss comprises rendering a first reconstructed RGBD image based on the first 3D representation and a first viewpoint associated with a first RGBD image included in the set of RGBD images.

17. The one or more non-transitory computer readable media of claim 16, wherein the first viewpoint is specified by at least one of a rotation matrix, a 3D translation, or an intrinsic matrix associated with a camera.

18. The one or more non-transitory computer readable media of claim 11, wherein modifying the at least one of the first 3D feature grid, the first trained geometry decoder, or the first trained texture decoder comprises replacing a first value for a first texture feature vector included in the first 3D feature grid with a second value.

19. The one or more non-transitory computer readable media of claim 11, further comprising, prior to generating the second 3D representation, removing one or more voxels from the first 3D feature grid based on a plurality of signed distance function (SDF) values included in the first 3D representation and a threshold SDF value.

20. A system comprising: one or more memories storing instructions; and one or more processors coupled to the one or more memories that, when executing the instructions, perform the steps of: computing a first 3D feature grid based on a set of red, blue, green, and depth (RGBD) images associated with a first scene; mapping the first 3D feature grid to a first 3D representation of
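Two of the dependent claims describe steps that are easy to picture in code: claims 4 and 14 combine positional encodings with geometry feature vectors before running the geometry decoder, and claims 10 and 19 remove voxels based on their signed distance function (SDF) values. The sketch below is only a rough illustration under several assumptions that the claims do not state: "aggregating" is taken to mean concatenation, the positional encoding is a sine/cosine encoding, the decoder is a hypothetical stand-in that reads the SDF from the last input component, and pruning drops voxels whose absolute SDF exceeds the threshold. All function and variable names are invented for this example.

```python
import math


def positional_encoding(point, num_freqs=2):
    # Sine/cosine encoding of each coordinate at frequencies 1, 2, 4, ...
    # (an assumed encoding; the claims do not specify one).
    return [f(2 ** k * c)
            for k in range(num_freqs)
            for c in point
            for f in (math.sin, math.cos)]


def geometry_inputs(grid):
    # Concatenate each voxel's positional encoding with its geometry
    # feature vector (concatenation is an assumed form of "aggregating").
    return {v: positional_encoding(v) + cell["geometry"]
            for v, cell in grid.items()}


def toy_geometry_decoder(vec):
    # Hypothetical stand-in for a pre-trained geometry decoder: the SDF is
    # simply the last input component, so the example stays checkable.
    return vec[-1]


def prune(grid, sdf, threshold):
    # Keep only voxels whose |SDF| is within the threshold (assumed rule:
    # voxels far from the surface are removed).
    return {v: cell for v, cell in grid.items() if abs(sdf[v]) <= threshold}


# Each grid cell holds a geometry and a texture feature vector (claim 3).
grid = {
    (0, 0, 0): {"geometry": [0.1, 0.02], "texture": [0.5]},
    (1, 0, 0): {"geometry": [0.3, 0.90], "texture": [0.2]},
    (0, 1, 0): {"geometry": [0.0, -0.05], "texture": [0.7]},
}

inputs = geometry_inputs(grid)
sdf = {v: toy_geometry_decoder(vec) for v, vec in inputs.items()}
pruned = prune(grid, sdf, threshold=0.1)
print(sorted(pruned))
```

With these toy values, the voxel at (1, 0, 0) has an SDF of 0.90 and is dropped, while the two voxels near the surface are kept along with their geometry and texture features.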

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12548234B2 cover?
In various embodiments, a scene reconstruction model generates three-dimensional (3D) representations of scenes. The scene reconstruction model computes a first 3D feature grid based on a set of red, blue, green, and depth (RGBD) images associated with a first scene. The scene reconstruction model maps the first 3D feature grid to a first 3D representation of the first scene. The scene reconstr…
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G06T7/40. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 10, 2026 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).