Techniques for training a machine learning model to reconstruct different three-dimensional scenes

US12548258B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12548258-B2
Application number: US-202318497938-A
Country: US
Kind code: B2
Filing date: Oct 30, 2023
Priority date: Nov 15, 2022
Publication date: Feb 10, 2026
Grant date: Feb 10, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In various embodiments, a training application trains a machine learning model to generate three-dimensional (3D) representations of two-dimensional images. The training application maps a depth image and a viewpoint to signed distance function (SDF) values associated with 3D query points. The training application maps a red, blue, and green (RGB) image to radiance values associated with the 3D query points. The training application computes a red, blue, green, and depth (RGBD) reconstruction loss based on at least the SDF values and the radiance values. The training application modifies at least one of a pre-trained geometry encoder, a pre-trained geometry decoder, an untrained texture encoder, or an untrained texture decoder based on the RGBD reconstruction loss to generate a trained machine learning model that generates 3D representations of RGBD images.
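The abstract says an RGBD reconstruction loss is computed from SDF values and radiance values at 3D query points, but it does not say how. The following is a minimal sketch of one plausible way to do it for a single camera ray, not the patented method: the logistic SDF-to-opacity mapping, the L1 error, the truncation distance, and every function and parameter name here are assumptions introduced for illustration.

```python
import math


def sdf_to_alpha(sdf, sharpness=50.0):
    """Assumed mapping from SDF to opacity: near 0 outside (sdf > 0), near 1 inside."""
    x = max(-60.0, min(60.0, sharpness * sdf))  # clamp to keep exp() in range
    return 1.0 / (1.0 + math.exp(x))


def render_ray(sdf_values, radiance_values, sample_depths):
    """Alpha-composite radiance and depth along one ray, front to back."""
    color = [0.0, 0.0, 0.0]
    depth = 0.0
    transmittance = 1.0  # fraction of the ray not yet blocked
    for sdf, rgb, t in zip(sdf_values, radiance_values, sample_depths):
        alpha = sdf_to_alpha(sdf)
        weight = transmittance * alpha
        color = [c + weight * ch for c, ch in zip(color, rgb)]
        depth += weight * t
        transmittance *= 1.0 - alpha
    return color, depth


def rgbd_pixel_loss(rendered_rgb, rendered_depth, target_rgb, target_depth,
                    depth_weight=1.0):
    """Pixel-wise L1 rendering loss: mean colour error plus weighted depth error."""
    color_term = sum(abs(a - b) for a, b in zip(rendered_rgb, target_rgb)) / 3.0
    return color_term + depth_weight * abs(rendered_depth - target_depth)


def approx_sdf_loss(sdf_values, sample_depths, target_depth, truncation=0.1):
    """Assumed approximation: SDF at a ray sample ~ (observed depth - sample depth),
    clamped to +/- truncation, compared with the predicted SDF by mean L1 error."""
    total = 0.0
    for sdf, t in zip(sdf_values, sample_depths):
        approx = max(-truncation, min(truncation, target_depth - t))
        total += abs(sdf - approx)
    return total / len(sdf_values)


# Usage: three samples along one ray, with the surface crossed at depth 1.0.
sdf = [0.5, 0.0, -0.5]
radiance = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
depths = [0.5, 1.0, 1.5]
rgb, d = render_ray(sdf, radiance, depths)
loss = rgbd_pixel_loss(rgb, d, (0.0, 0.5, 0.5), 1.25)
```

In a real training loop these quantities would be computed with a differentiable framework so that the loss can update the encoder and decoder parameters; the plain-Python version only shows the arithmetic.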

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for training a machine learning model to generate three-dimensional representations of two-dimensional images, the method comprising: mapping a first depth image and a first viewpoint to a first plurality of signed distance function (SDF) values associated with a first plurality of three-dimensional (3D) query points; mapping a first red, blue, and green (RGB) image to a first plurality of radiance values associated with the first plurality of 3D query points; computing a first red, blue, green, and depth (RGBD) reconstruction loss based on at least the first plurality of SDF values and the first plurality of radiance values; and modifying at least one of a first pre-trained geometry encoder, a first pre-trained geometry decoder, a first untrained texture encoder, or a first untrained texture decoder based on the first RGBD reconstruction loss to generate a trained machine learning model that generates 3D representations of RGBD images.

2. The computer-implemented method of claim 1, wherein computing the first RGBD reconstruction loss comprises rendering a first reconstructed RGBD image based on the first plurality of SDF values, the first plurality of radiance values, and the first viewpoint.

3. The computer-implemented method of claim 1, wherein computing the first RGBD reconstruction loss comprises computing at least one of a pixel-wise rendering loss or an approximated SDF loss.

4. The computer-implemented method of claim 1, wherein modifying at least one of the first pre-trained geometry encoder, the first pre-trained geometry decoder, the first untrained texture encoder, or the first untrained texture decoder comprises replacing a first value for a first learnable parameter included in the first pre-trained geometry encoder, the first pre-trained geometry decoder, the first untrained texture encoder, or the first untrained texture decoder with a second value.

5. The computer-implemented method of claim 1, wherein mapping the first depth image and the first viewpoint to the first plurality of SDF values comprises projecting the first depth image into a world coordinate system based on the first viewpoint.

6. The computer-implemented method of claim 1, wherein mapping the first depth image and the first viewpoint to the first plurality of SDF values comprises: determining a first plurality of 3D surface points based on the first depth image and the first viewpoint; and computing a first plurality of geometry feature vectors associated with the first plurality of 3D surface points.

7. The computer-implemented method of claim 1, wherein mapping the first RGB image to the first plurality of radiance values comprises: determining a first plurality of input vectors based on a first plurality of query points and a first texture surface representation generated by the first untrained texture encoder; and executing the first untrained texture decoder on the first plurality of input vectors.

8. The computer-implemented method of claim 1, further comprising: mapping a second depth image and a second viewpoint to a second plurality of SDF values associated with a second plurality of 3D query points; computing a geometric reconstruction loss based on at least the second plurality of SDF values; and modifying a first untrained geometry encoder and a first untrained geometry decoder based on the geometric reconstruction loss to generate the first pre-trained geometry encoder and the first pre-trained geometry decoder.

9. The computer-implemented method of claim 8, wherein the first depth image and the second depth image are associated with different scenes.

10. The computer-implemented method of claim 1, wherein the first viewpoint is specified by at least one of a rotation matrix, a 3D translation, or an intrinsic matrix associated with a camera.

11. One or more non-transitory computer readable media including instructions that, when executed by one or more processors, cause the one or more processors to generate three-dimensional representations of two-dimensional images by performing the steps of: mapping a first depth image and a first viewpoint to a first plurality of signed distance function (SDF) values associated with a first plurality of three-dimensional (3D) query points; mapping a first red, blue, green (RGB) image to a first plurality of radiance values associated with the first plurality of 3D query points; computing a first red, blue, green, and depth (RGBD) reconstruction loss based on at least the first plurality of SDF values and the first plurality of radiance values; and modifying at least one of a first pre-trained geometry encoder, a first pre-trained geometry decoder, a first untrained texture encoder, or a first untrained texture decoder based on the first RGBD reconstruction loss to generate a trained machine learning model that generates 3D representations of RGBD images.

12. The one or more non-transitory computer readable media of claim 11, wherein computing the first RGBD reconstruction loss comprises rendering a first reconstructed RGBD image based on the first plurality of SDF values, the first plurality of radiance values, and the first viewpoint.

13. The one or more non-transitory computer readable media of claim 11, wherein computing the first RGBD reconstruction loss comprises computing at least one of a pixel-wise rendering loss or an approximated SDF loss.

14. The one or more non-transitory computer readable media of claim 11, wherein modifying at least one of the first pre-trained geometry encoder, the first pre-trained geometry decoder, the first untrained texture encoder, or the first untrained texture decoder comprises replacing a first value for a first learnable parameter included in the first pre-trained geometry encoder, the first pre-trained geometry decoder, the first untrained texture encoder, or the first untrained texture decoder with a second value.

15. The one or more non-transitory computer readable media of claim 11, wherein mapping the first depth image and the first viewpoint to the first plurality of SDF values comprises projecting the first depth image into a world coordinate system based on the first viewpoint.

16. The one or more non-transitory computer readable media of claim 11, wherein mapping the first depth image and the first viewpoint to the first plurality of SDF values further comprises: determining a first plurality of input vectors based on a first plurality of query points and a first geometric surface representation generated by the first pre-trained geometry encoder; and executing the first pre-trained geometry decoder on the first plurality of input vectors.

17. The one or more non-transitory computer readable media of claim 11, wherein mapping the first RGB image to the first plurality of radiance values comprises executing the first untrained texture encoder on the first RGB image to generate a plurality of texture feature vectors associated with a plurality of pixels included in the first RGB image.

18. The one or more non-transitory computer readable media of claim 11, further comprising: mapping the first depth image and the first viewpoint to a second plurality of SDF values associated with the first plurality of 3D query points; computing a geometric reconstruction loss based on at least the second plurality of SDF values; and modifying a first untrained geometry encoder and a first untrained geometry decoder based on the geometric reconstruction los…
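Claim 5 recites projecting the depth image into a world coordinate system based on the viewpoint, and claim 10 says the viewpoint may be specified by a rotation matrix, a 3D translation, or a camera intrinsic matrix. As a rough illustration of that step, the sketch below back-projects a depth image with a pinhole camera model. The patent text shown here does not fix a convention, so the camera-to-world form `p_world = R * p_cam + t`, the treatment of non-positive depth as missing, and the names `backproject_depth`, `fx`, `fy`, `cx`, and `cy` are all assumptions.

```python
def backproject_depth(depth, fx, fy, cx, cy, rotation, translation):
    """Lift each valid depth pixel to a 3D point in world coordinates.

    depth:       2D list of per-pixel depths, indexed as depth[v][u]
    fx, fy:      focal lengths taken from the intrinsic matrix
    cx, cy:      principal point taken from the intrinsic matrix
    rotation:    3x3 camera-to-world rotation matrix (assumed convention)
    translation: length-3 camera-to-world translation (assumed convention)
    """
    points = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d <= 0.0:  # assumption: non-positive depth marks a missing pixel
                continue
            # Pinhole model: undo the projection to get camera-space coordinates.
            p_cam = ((u - cx) * d / fx, (v - cy) * d / fy, d)
            # Rigid transform from camera space to world space.
            p_world = tuple(
                sum(rotation[i][j] * p_cam[j] for j in range(3)) + translation[i]
                for i in range(3)
            )
            points.append(p_world)
    return points


# Usage: a 2x2 depth image, identity rotation, camera shifted one unit along x.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
surface_points = backproject_depth(
    [[2.0, 0.0], [2.0, 2.0]], 1.0, 1.0, 0.0, 0.0, identity, (1.0, 0.0, 0.0)
)
```

The resulting world-space points correspond to the "3D surface points" that claim 6 derives from the depth image and viewpoint; how geometry feature vectors are then computed from them is not covered by this sketch.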

Assignees

Nvidia Corp

Inventors

Classifications

  • G06T17/00 (Primary)

    Three-dimensional [3D] modelling for computer graphics · CPC title

  • Depth or shape recovery · CPC title

  • G06T17/20 (Primary)

    Finite element generation, e.g. wire-frame surface description, {tesselation} · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12548258B2 cover?
In various embodiments, a training application trains a machine learning model to generate three-dimensional (3D) representations of two-dimensional images. The training application maps a depth image and a viewpoint to signed distance function (SDF) values associated with 3D query points. The training application maps a red, blue, and green (RGB) image to radiance values associated with the 3D…
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G06T17/00. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 10, 2026 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).