Techniques for fine-tuning a machine learning model to reconstruct a three-dimensional scene

US12548234B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12548234-B2
Application number	US-202318497945-A
Country	US
Kind code	B2
Filing date	Oct 30, 2023
Priority date	Nov 15, 2022
Publication date	Feb 10, 2026
Grant date	Feb 10, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In various embodiments, a scene reconstruction model generates three-dimensional (3D) representations of scenes. The scene reconstruction model computes a first 3D feature grid based on a set of red, blue, green, and depth (RGBD) images associated with a first scene. The scene reconstruction model maps the first 3D feature grid to a first 3D representation of the first scene. The scene reconstruction model computes a first reconstruction loss based on the first 3D representation and the set of RGBD images. The scene reconstruction model modifies at least one of the first 3D feature grid, a first pre-trained geometry decoder, or a first pre-trained texture decoder based on the first reconstruction loss to generate a second 3D representation of the first scene.
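
The loop the abstract describes (decode a feature grid, compare against observations, adjust the grid) can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration and not taken from the patent: the grid is a 1D list of scalar features, the "pre-trained decoder" is a frozen linear function, the reconstruction loss is a mean squared error against made-up depth values, and the names `decode`, `reconstruction_loss`, and `fine_tune_grid` are hypothetical. Only the grid features are updated, which corresponds to one of the three options the abstract lists.

```python
# Toy stand-ins: a 1D "feature grid" of scalars and a frozen linear "decoder".
# None of these names or values come from the patent.

def decode(feature, weight, bias):
    """Map one grid feature to a predicted depth value (hypothetical decoder)."""
    return weight * feature + bias


def reconstruction_loss(grid, weight, bias, observed):
    """Mean squared error between decoded values and observed depths."""
    errors = [(decode(f, weight, bias) - o) ** 2 for f, o in zip(grid, observed)]
    return sum(errors) / len(errors)


def fine_tune_grid(grid, weight, bias, observed, lr=0.1, steps=200):
    """Gradient descent on the grid features only; the decoder stays frozen."""
    grid = list(grid)  # work on a copy so the caller's grid is untouched
    n = len(grid)
    for _ in range(steps):
        for i in range(n):
            residual = decode(grid[i], weight, bias) - observed[i]
            # d(loss)/d(grid[i]) for the mean squared error above
            grad = 2.0 * residual * weight / n
            grid[i] -= lr * grad
    return grid


WEIGHT, BIAS = 2.0, 0.5             # frozen "pre-trained" decoder parameters
observed_depths = [1.5, 2.5, 0.5]   # made-up per-cell depth observations
initial_grid = [0.0, 0.0, 0.0]      # made-up initial feature grid

loss_before = reconstruction_loss(initial_grid, WEIGHT, BIAS, observed_depths)
tuned_grid = fine_tune_grid(initial_grid, WEIGHT, BIAS, observed_depths)
loss_after = reconstruction_loss(tuned_grid, WEIGHT, BIAS, observed_depths)
```

In this toy setting the loss falls as the grid features move toward values that the frozen decoder maps onto the observations. A real system would presumably use neural decoders, rendered RGBD images, and automatic differentiation, none of which this sketch attempts.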

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for generating three-dimensional (3D) representations of scenes, the method comprising: computing a first 3D feature grid based on a set of red, blue, green, and depth (RGBD) images associated with a first scene; mapping the first 3D feature grid to a first 3D representation of the first scene; computing a first reconstruction loss based on the first 3D representation and the set of RGBD images; and modifying at least one of the first 3D feature grid, a first pre-trained geometry decoder, or a first pre-trained texture decoder based on the first reconstruction loss to generate a second 3D representation of the first scene.

2. The computer-implemented method of claim 1, wherein computing the first 3D feature grid comprises performing one or more spatial interpolation operations on a fused surface representation of the first scene.

3. The computer-implemented method of claim 1, wherein computing the first 3D feature grid comprises assigning a first geometry feature vector and a first texture feature vector to a first voxel to generate a first grid cell.

4. The computer-implemented method of claim 1, wherein mapping the first 3D feature grid comprises: aggregating a plurality of positional encodings associated with the first 3D feature grid and a plurality of geometry feature vectors included in the first 3D feature grid to generate a plurality of input vectors; and executing the first pre-trained geometry decoder on the plurality of input vectors to generate a plurality of signed distance function values.

5. The computer-implemented method of claim 1, wherein mapping the first 3D feature grid comprises: generating a plurality of texture input vectors based on a plurality of texture feature vectors included in the first 3D feature grid and a plurality of signed distance function values generated by the first pre-trained geometry decoder; and executing the first pre-trained texture decoder on the plurality of texture input vectors to generate a plurality of radiance values.

6. The computer-implemented method of claim 1, wherein computing the first reconstruction loss comprises rendering a first reconstructed RGBD image based on the first 3D representation and a first viewpoint associated with a first RGBD image included in the set of RGBD images.

7. The computer-implemented method of claim 1, wherein computing the first reconstruction loss comprises computing at least one of a pixel-wise rendering loss or an approximated signed distance function loss.

8. The computer-implemented method of claim 1, wherein modifying the at least one of the first 3D feature grid, the first trained geometry decoder, or the first trained texture decoder comprises replacing a first value for a first geometry feature vector included in the first 3D feature grid with a second value.

9. The computer-implemented method of claim 1, wherein modifying the at least one of the first 3D feature grid, the first pre-trained geometry decoder, or the first pre-trained texture decoder comprises replacing a first value for a first learnable parameter included in the first pre-trained geometry decoder or the first pre-trained texture decoder with a second value.

10. The computer-implemented method of claim 1, further comprising, prior to generating the second 3D representation, removing at least one of a first voxel, a first geometry feature vector associated with the first voxel, or a first texture feature vector associated with the first voxel from the first 3D feature grid based on a first signed distance function value associated with the first voxel.

11. One or more non-transitory computer readable media including instructions that, when executed by one or more processors, cause the one or more processors to generate three-dimensional (3D) representations of scenes by performing the steps of: computing a first 3D feature grid based on a set of red, blue, green, and depth (RGBD) images associated with a first scene; mapping the first 3D feature grid to a first 3D representation of the first scene; computing a first reconstruction loss based on the first 3D representation and the set of RGBD images; and modifying at least one of the first 3D feature grid, a first pre-trained geometry decoder, or a first pre-trained texture decoder based on the first reconstruction loss to generate a second 3D representation of the first scene.

12. The one or more non-transitory computer readable media of claim 11, wherein computing the first 3D feature grid comprises performing one or more spatial interpolation operations on a fused surface representation of the first scene.

13. The one or more non-transitory computer readable media of claim 11, wherein computing the first 3D feature grid comprises assigning a first geometry feature vector and a first texture feature vector to a first voxel to generate a first grid cell.

14. The one or more non-transitory computer readable media of claim 11, wherein mapping the first 3D feature grid comprises: aggregating a plurality of positional encodings associated with the first 3D feature grid and a plurality of geometry feature vectors included in the first 3D feature grid to generate a plurality of input vectors; and executing the first pre-trained geometry decoder on the plurality of input vectors to generate a plurality of signed distance function values.

15. The one or more non-transitory computer readable media of claim 11, wherein mapping the first 3D feature grid comprises: generating a plurality of texture input vectors based on a plurality of texture feature vectors included in the first 3D feature grid and a plurality of signed distance function values generated by the first pre-trained geometry decoder; and executing the first pre-trained texture decoder on the plurality of texture input vectors to generate a plurality of radiance values.

16. The one or more non-transitory computer readable media of claim 11, wherein computing the first reconstruction loss comprises rendering a first reconstructed RGBD image based on the first 3D representation and a first viewpoint associated with a first RGBD image included in the set of RGBD images.

17. The one or more non-transitory computer readable media of claim 16, wherein the first viewpoint is specified by at least one of a rotation matrix, a 3D translation, or an intrinsic matrix associated with a camera.

18. The one or more non-transitory computer readable media of claim 11, wherein modifying the at least one of the first 3D feature grid, the first trained geometry decoder, or the first trained texture decoder comprises replacing a first value for a first texture feature vector included in the first 3D feature grid with a second value.

19. The one or more non-transitory computer readable media of claim 11, further comprising, prior to generating the second 3D representation, removing one or more voxels from the first 3D feature grid based on a plurality of signed distance function (SDF) values included in the first 3D representation and a threshold SDF value.

20. A system comprising: one or more memories storing instructions; and one or more processors coupled to the one or more memories that, when executing the instructions, perform the steps of: computing a first 3D feature grid based on a set of red, blue, green, and depth (RGBD) images associated with a first scene; mapping the first 3D feature grid to a first 3D representation of
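
Claims 10 and 19 describe removing voxels from the feature grid based on signed distance function (SDF) values and a threshold. A minimal sketch of one way this could look follows. The claims do not say which side of the threshold is removed, so the rule used here (drop voxels whose absolute SDF value exceeds the threshold, meaning voxels far from any surface) is an assumption, as are the dictionary-based sparse grid, the sample values, and the function name `prune_voxels`.

```python
def prune_voxels(feature_grid, sdf_values, threshold):
    """Return a new grid without voxels whose |SDF| exceeds `threshold`.

    Assumption: voxels far from the surface are the ones removed; the
    claim text only says removal depends on SDF values and a threshold.
    """
    return {
        voxel: features
        for voxel, features in feature_grid.items()
        if abs(sdf_values[voxel]) <= threshold
    }


# Hypothetical sparse grid: voxel index -> geometry and texture feature vectors.
feature_grid = {
    (0, 0, 0): {"geometry": [0.1, 0.2], "texture": [0.9, 0.8]},
    (0, 0, 1): {"geometry": [0.3, 0.4], "texture": [0.7, 0.6]},
    (5, 5, 5): {"geometry": [0.5, 0.6], "texture": [0.5, 0.4]},
}
# Made-up SDF values, as a geometry decoder might produce per voxel.
sdf_values = {(0, 0, 0): 0.02, (0, 0, 1): -0.04, (5, 5, 5): 0.75}

pruned_grid = prune_voxels(feature_grid, sdf_values, threshold=0.1)
```

Here the voxel at `(5, 5, 5)` is dropped together with both of its feature vectors, while the two near-surface voxels are kept. Returning a new dictionary instead of mutating the input keeps the original grid available for comparison.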

Assignees

Nvidia Corp

Inventors

Classifications

G06T2207/10028
Range image; Depth image; 3D point clouds
G06T2207/20084
Artificial neural networks [ANN]
G06T2207/20081
Training; Learning
G06T15/08
Volume rendering
G06T2207/10024
Color image

Patent family

Related publications grouped by family.

View patent family 91024022

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12548234B2 cover?
In various embodiments, a scene reconstruction model generates three-dimensional (3D) representations of scenes. The scene reconstruction model computes a first 3D feature grid based on a set of red, blue, green, and depth (RGBD) images associated with a first scene. The scene reconstruction model maps the first 3D feature grid to a first 3D representation of the first scene. The scene reconstr…
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G06T7/40. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 10, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).