Machine learning-based generation of three-dimensional representations

US2026030837A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2026030837-A1
Application number: US-202418780747-A
Country: US
Kind code: A1
Filing date: Jul 23, 2024
Priority date: Jul 23, 2024
Publication date: Jan 29, 2026
Grant date:

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An apparatus comprises at least one processing device configured to extract a set of features from a user prompt using a natural language processing model, to initialize a three-dimensional scene reconstruction model utilizing a set of parameters determined based at least in part on the set of features extracted from the user prompt, and to generate, utilizing the three-dimensional scene reconstruction model, a set of two-dimensional images of a given scene from two or more different viewpoint perspectives. The at least one processing device is also configured to apply an image diffusion model to the generated set of two-dimensional images to generate a refined set of two-dimensional images, to modify the three-dimensional scene reconstruction model based at least in part on the refined set of two-dimensional images, and to utilize the modified three-dimensional scene reconstruction model to generate a three-dimensional representation of the given scene.
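
The refinement step in the abstract (and in claim 6 on this page) predicts the noise in a rendered image and removes it. As a rough illustration only, the sketch below uses the standard DDPM forward-process formula and inverts it for a given noise estimate. The linear beta schedule, the function names, and the use of the true noise as an "oracle" prediction are assumptions made for this example; the page does not describe the patent's actual network, schedule, or number of denoising steps.

```python
import math
import random


def alpha_bar(t, betas):
    """Cumulative product of (1 - beta_s) for s = 0..t."""
    prod = 1.0
    for beta in betas[: t + 1]:
        prod *= 1.0 - beta
    return prod


def add_noise(x0, noise, t, betas):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    a = alpha_bar(t, betas)
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * n for x, n in zip(x0, noise)]


def remove_predicted_noise(x_t, predicted_noise, t, betas):
    """Invert the forward process for a given noise estimate."""
    a = alpha_bar(t, betas)
    return [
        (x - math.sqrt(1.0 - a) * n) / math.sqrt(a)
        for x, n in zip(x_t, predicted_noise)
    ]


# Assumed linear schedule over 1000 timesteps (hypothetical values for the demo).
betas = [1e-4 + i * (0.02 - 1e-4) / 999 for i in range(1000)]

rng = random.Random(0)
pixels = [0.2, 0.5, 0.9]  # three pixel intensities standing in for an image
noise = [rng.gauss(0.0, 1.0) for _ in pixels]

noisy = add_noise(pixels, noise, t=200, betas=betas)
# A trained diffusion model would predict the noise; here the true noise is
# passed in as a stand-in, so the original pixels are recovered.
restored = remove_predicted_noise(noisy, noise, t=200, betas=betas)
```

With a learned predictor in place of the oracle, the recovered image would only approximate the clean one, which is why practical samplers typically repeat the predict-and-remove step over many timesteps.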

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus comprising:
    at least one processing device comprising a processor coupled to a memory;
    the at least one processing device being configured:
    to extract a set of features from a user prompt using a natural language processing model;
    to initialize a three-dimensional scene reconstruction model utilizing a set of parameters determined based at least in part on the set of features extracted from the user prompt;
    to generate, utilizing the three-dimensional scene reconstruction model, a set of two-dimensional images of a given scene from two or more different viewpoint perspectives;
    to apply an image diffusion model to the generated set of two-dimensional images to generate a refined set of two-dimensional images;
    to modify the three-dimensional scene reconstruction model based at least in part on the refined set of two-dimensional images; and
    to utilize the modified three-dimensional scene reconstruction model to generate a three-dimensional representation of the given scene.

2. The apparatus of claim 1 wherein the three-dimensional scene reconstruction model comprises a Neural Radiance Field (NeRF) model configured to take as input a three-dimensional position vector and a two-dimensional viewing direction and output a color and density at each of two or more points of the given scene.

3. The apparatus of claim 2 wherein initializing the three-dimensional scene reconstruction model comprises initializing weights of a neural network that represents a neural radiance field.

4. The apparatus of claim 1 wherein generating the set of two-dimensional images of the given scene from two or more different viewpoint perspectives comprises:
    selecting the two or more different viewpoint perspectives to capture a range of perspectives of the given scene;
    for each of the two or more different viewpoint perspectives, performing ray tracing through the given scene for a plurality of rays, where a color and density of each of the plurality of rays is computed using the three-dimensional scene reconstruction model; and
    synthesizing the set of two-dimensional images of the given scene using the plurality of rays.

5. The apparatus of claim 1 wherein the image diffusion model comprises a denoising diffusion probabilistic model (DDPM).

6. The apparatus of claim 1 wherein applying the image diffusion model to the generated set of two-dimensional images comprises applying a noise-reduction process to the generated set of two-dimensional images by:
    inputting the generated set of two-dimensional images to the image diffusion model;
    predicting noise added at each timestep based at least in part on an output of the image diffusion model; and
    removing the predicted noise from the generated set of two-dimensional images to generate the refined set of two-dimensional images.

7. The apparatus of claim 1 wherein modifying the three-dimensional scene reconstruction model based at least in part on the refined set of two-dimensional images comprises:
    estimating probability densities for pixels of the refined set of two-dimensional images; and
    adjusting the set of parameters of the three-dimensional scene reconstruction model based at least in part on the estimated probability densities.

8. The apparatus of claim 7 wherein estimating the probability densities for the pixels of the refined set of two-dimensional images utilizes a density estimation model that takes the refined set of two-dimensional images and the user prompt as input and computes probability density likelihoods of the pixels of the refined set of two-dimensional images.

9. The apparatus of claim 7 wherein adjusting the set of parameters of the three-dimensional scene reconstruction model comprises utilizing a gradient descent algorithm that utilizes a loss function comprising a negative log-likelihood of the estimated probability densities for the pixels of the refined set of two-dimensional images.

10. The apparatus of claim 1 wherein the user prompt comprises a natural language description of a design of a product, and wherein utilizing the modified three-dimensional scene reconstruction model to generate the three-dimensional representation of the given scene comprises generating a three-dimensional representation of a prototype of the product.

11. The apparatus of claim 1 wherein the user prompt comprises a natural language description of a virtual showroom of one or more products, and wherein utilizing the modified three-dimensional scene reconstruction model to generate the three-dimensional representation of the given scene comprises generating a three-dimensional representation of the one or more products for the virtual showroom.

12. The apparatus of claim 1 wherein the user prompt comprises a natural language description specifying one or more customizations of a product, and wherein utilizing the modified three-dimensional scene reconstruction model to generate the three-dimensional representation of the given scene comprises generating a three-dimensional representation of a customized version of the product based at least in part on the specified one or more customizations.

13. The apparatus of claim 1 wherein the user prompt comprises a natural language description of one or more features of a product, and wherein utilizing the modified three-dimensional scene reconstruction model to generate the three-dimensional representation of the given scene comprises generating a three-dimensional representation of a training simulation for the one or more features of the product.

14. The apparatus of claim 1 wherein the user prompt comprises a natural language description of a configuration of an information technology infrastructure environment, and utilizing the modified three-dimensional scene reconstruction model to generate the three-dimensional representation of the given scene comprises generating a three-dimensional representation of the configuration of the information technology infrastructure environment.

15. A computer program product comprising a non-transitory processor-readable storage medium having stored therein program code of one or more software programs, wherein the program code when executed by at least one processing device causes the at least one processing device:
    to extract a set of features from a user prompt using a natural language processing model;
    to initialize a three-dimensional scene reconstruction model utilizing a set of parameters determined based at least in part on the set of features extracted from the user prompt;
    to generate, utilizing the three-dimensional scene reconstruction model, a set of two-dimensional images of a given scene from two or more different viewpoint perspectives;
    to apply an image diffusion model to the generated set of two-dimensional images to generate a refined set of two-dimensional images;
    to modify the three-dimensional scene reconstruction model based at least in part on the refined set of two-dimensional images; and
    to utilize the modified three-dimensional scene reconstruction model to generate a three-dimensional representation of the given scene.

16. The computer program product of claim 15 wherein the three-dimensional scene reconstruction model comprises a Neural Radiance Field (NeRF) model configured to take as input a three-dimensional position vector and a two-dimensional viewing direction and output a color and density at each of two or more points of the given scene.

17. The computer program product of claim 15 wherein modifying the three-dimensio
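
Claims 2, 4 and 16 describe a model that returns a color and density at points in the scene, and images synthesized by tracing rays through it. The sketch below shows one common way such per-point outputs are blended into a pixel, namely the usual NeRF-style compositing sum. It is an illustration under stated assumptions, not the patent's implementation: `toy_field` is a hypothetical stand-in for a trained model (a red unit sphere), the viewing direction is passed as a 3D vector rather than the two-angle form named in the claims, and the sample count and ray bounds are arbitrary.

```python
import math


def toy_field(position, direction):
    """Hypothetical stand-in for a trained model: a red unit sphere at the origin.

    Returns (rgb, density). The direction is accepted but ignored here.
    """
    inside = sum(c * c for c in position) <= 1.0
    return (1.0, 0.0, 0.0), (5.0 if inside else 0.0)


def composite_ray(samples, deltas):
    """Blend (rgb, density) samples, ordered near to far, into one pixel color."""
    transmittance = 1.0  # fraction of light not yet absorbed along the ray
    pixel = [0.0, 0.0, 0.0]
    for (rgb, sigma), delta in zip(samples, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weight = transmittance * alpha
        for k in range(3):
            pixel[k] += weight * rgb[k]
        transmittance *= 1.0 - alpha
    return tuple(pixel), transmittance


def render_ray(origin, direction, near=0.0, far=4.0, n_samples=64):
    """Sample the field at segment midpoints along a ray and composite them."""
    step = (far - near) / n_samples
    samples = []
    for i in range(n_samples):
        t = near + (i + 0.5) * step
        position = tuple(o + t * d for o, d in zip(origin, direction))
        samples.append(toy_field(position, direction))
    return composite_ray(samples, [step] * n_samples)


# One ray through the sphere's center and one that misses it entirely.
hit_color, hit_transmittance = render_ray((0.0, 0.0, -2.0), (0.0, 0.0, 1.0))
miss_color, miss_transmittance = render_ray((2.0, 0.0, -2.0), (0.0, 0.0, 1.0))
```

Rendering a full image would repeat `render_ray` for one ray per pixel, and rendering from a second viewpoint would only change the ray origins and directions.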

Assignees

Dell Products Lp

Inventors

Classifications

  • Range image; Depth image; 3D point clouds · CPC title

  • G06N20/00 · Primary

    Machine learning · CPC title

  • G06T17/00 · Primary

    Three-dimensional [3D] modelling for computer graphics · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2026030837A1 cover?
An apparatus comprises at least one processing device configured to extract a set of features from a user prompt using a natural language processing model, to initialize a three-dimensional scene reconstruction model utilizing a set of parameters determined based at least in part on the set of features extracted from the user prompt, and to generate, utilizing the three-dimensional scene recons…
Who is the assignee on this patent?
Dell Products Lp
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 29, 2026 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).