Synthesizing three-dimensional shapes using latent diffusion models in content generation systems and applications

US2024005604A1 · US · A1

Patent metadata
Publication number: US-2024005604-A1
Application number: US-202318320716-A
Country: US
Kind code: A1
Filing date: May 19, 2023
Priority date: May 19, 2022
Publication date: Jan 4, 2024
Grant date: (none listed)

Abstract

Official abstract text for this publication.

Approaches presented herein provide for the unconditional generation of novel three dimensional (3D) object shape representations, such as point clouds or meshes. In at least one embodiment, a first denoising diffusion model (DDM) can be trained to synthesize a 1D shape latent from Gaussian noise, and a second DDM can be trained to generate a set of latent points conditioned on this 1D shape latent. The shape latent and set of latent points can be provided to a decoder to generate a 3D point cloud representative of a random object from among the object classes on which the models were trained. A surface reconstruction process may be used to generate a surface mesh from this generated point cloud. Such an approach can scale to complex and/or multimodal distributions, and can be highly flexible as it can be adapted to various tasks such as multimodal voxel- or text-guided synthesis.
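The data flow described in the abstract can be sketched as follows. This is a minimal illustration in plain Python, not the implementation disclosed in the publication: the three `toy_*` functions are hypothetical stand-ins for the trained first DDM, second DDM, and decoder, and the linear noise schedule, step count, and latent sizes (`shape_dim`, `point_dim`, `num_points`) are assumptions chosen only to keep the example small.

```python
import math
import random

def ddpm_sample(eps_model, dim, cond=None, steps=50, seed=0):
    """Ancestral DDPM sampling: start from Gaussian noise, denoise step by step."""
    rng = random.Random(seed)
    # Assumed linear beta schedule; the publication does not fix one here.
    betas = [1e-4 + (0.02 - 1e-4) * t / (steps - 1) for t in range(steps)]
    alpha_bar, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        alpha_bar.append(prod)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in reversed(range(steps)):
        eps = eps_model(x, t, cond)  # predicted noise at step t
        scale = betas[t] / math.sqrt(1.0 - alpha_bar[t])
        mean = [(xi - scale * ei) / math.sqrt(1.0 - betas[t])
                for xi, ei in zip(x, eps)]
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0  # no noise on the final step
        x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
    return x

def toy_shape_eps(x, t, cond):
    # Hypothetical stand-in for the first (unconditional) trained DDM.
    return [0.5 * xi for xi in x]

def toy_points_eps(x, t, cond):
    # Hypothetical stand-in for the second DDM; `cond` is the 1D shape latent.
    bias = sum(cond) / len(cond)
    return [0.5 * xi - 0.01 * bias for xi in x]

def toy_decoder(shape_latent, latent_points):
    # Hypothetical decoder: maps each latent point to an (x, y, z) point,
    # using the shape latent as a global offset.
    shift = 0.1 * sum(shape_latent) / len(shape_latent)
    return [(p[0] + shift, p[1] + shift, p[2] + shift) for p in latent_points]

def generate_point_cloud(num_points=64, shape_dim=8, point_dim=4, seed=0):
    # Stage 1: synthesize the 1D shape latent from Gaussian noise.
    shape_latent = ddpm_sample(toy_shape_eps, shape_dim, seed=seed)
    # Stage 2: synthesize latent points conditioned on the shape latent.
    flat = ddpm_sample(toy_points_eps, num_points * point_dim,
                       cond=shape_latent, seed=seed + 1)
    latent_points = [flat[i:i + point_dim] for i in range(0, len(flat), point_dim)]
    # Stage 3: decode both latents into a 3D point cloud.
    return toy_decoder(shape_latent, latent_points)

cloud = generate_point_cloud()
print(len(cloud), cloud[0])
```

With trained networks in place of the stand-ins, the resulting point cloud would then be passed to a surface reconstruction step to obtain a mesh, as the abstract notes.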

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, comprising: generating, using a first generative diffusion model, a shape latent representing a shape of a three-dimensional object; generating, using a second generative diffusion model and the shape latent, a set of latent points representative of latent features of the three-dimensional object; providing the shape latent and the set of latent points as input to a decoder network; and receiving, from the decoder network, a point cloud comprising a set of points representative of the three-dimensional object.
2. The computer-implemented method of claim 1, further comprising: generating, using the point cloud, a three-dimensional mesh for use in rendering a two-dimensional image of the three-dimensional object.
3. The computer-implemented method of claim 1, further comprising: providing the shape latent as a conditioning input to the second generative diffusion model.
4. The computer-implemented method of claim 1, further comprising: providing Gaussian noise as input to the first generative diffusion model.
5. The computer-implemented method of claim 1, wherein the shape latent is a one-dimensional, vector-valued global shape latent.
6. The computer-implemented method of claim 1, further comprising: training the first diffusion network using a set of shape latents of a first latent space generated using a hierarchical variational autoencoder (VAE) trained to generate shape latents from a set of input point clouds.
7. The computer-implemented method of claim 6, wherein the hierarchical variational autoencoder (VAE) is further trained to generate latent point clouds from the set of input point clouds, the method further comprising: training the second diffusion network using a set of latent point clouds of a second latent space generated using the hierarchical variational autoencoder (VAE).
8. The computer-implemented method of claim 1, wherein the three-dimensional object is determined unconditionally from one of a set of object classes on which at least the first diffusion network was trained.
9. The computer-implemented method of claim 1, further comprising: providing, as input to an encoder, a voxel-based representation of the three-dimensional object in order to condition at least the first diffusion network to generate the shape latent approximating the voxel-based representation.
10. The computer-implemented method of claim 1, further comprising: providing, as input to an encoder, a noisy input shape in order to condition at least the first diffusion network to generate the shape latent approximating the noisy input shape.
11. The computer-implemented method of claim 1, further comprising: providing, as input to the first diffusion network, a text encoding in order to condition at least the first diffusion network to generate the shape latent based in part on text used to generate the text encoding.
12. The computer-implemented method of claim 1, further comprising: manipulating one or more of the shape latent or the latent point cloud in order to modify the point cloud to be received from the decoder.
13. A processor, comprising: one or more circuits to: generate, using a first generative diffusion model, a shape latent representing a shape of a three-dimensional object; generate, using a second generative diffusion model and the shape latent, a latent point cloud representative of latent features of the three-dimensional object; provide the shape latent and the latent point cloud as input to a decoder network; and receive, from the decoder network, a point cloud comprising a set of points representative of the three-dimensional object.
14. The processor of claim 13, wherein the one or more circuits are further to provide the shape latent as a conditioning input to the second generative diffusion model, wherein the shape latent is a one-dimensional, vector-valued global shape latent.
15. The processor of claim 13, wherein the one or more circuits are further to train the first diffusion network using a set of shape latents of a first latent space, and to train the second diffusion network using a set of latent point clouds of a second latent space, generated using a hierarchical variational autoencoder (VAE).
16. The processor of claim 13, wherein the processor is comprised in at least one of: a system for performing simulation operations; a system for performing simulation operations to test or validate autonomous machine applications; a system for performing digital twin operations; a system for performing light transport simulation; a system for rendering graphical output; a system for performing deep learning operations; a system implemented using an edge device; a system for generating or presenting virtual reality (VR) content; a system for generating or presenting augmented reality (AR) content; a system for generating or presenting mixed reality (MR) content; a system incorporating one or more Virtual Machines (VMs); a system implemented at least partially in a data center; a system for performing hardware testing using simulation; a system for performing generative content operations using a language model; a system for synthetic data generation; a system for performing generative AI operations using a large language model (LLM), a collaborative content creation platform for 3D assets; or a system implemented at least partially using cloud computing resources.
17. A system, comprising: one or more processors to generate a point cloud representing a random three-dimensional object from a set of object classes, the point cloud generated using a shape latent determined using a first generative diffusion network and a set of latent points determined using a second generative diffusion network.
18. The system of claim 17, wherein the one or more processors are further to provide the shape latent as a conditioning input to the second generative diffusion model, wherein the shape latent is a one-dimensional, vector-valued global shape latent.
19. The system of claim 17, wherein the one or more processors are further to train the first diffusion network using a set of shape latents of a first latent space, and to train the second diffusion network using a set of latent point clouds of a second latent space, generated using a hierarchical variational autoencoder (VAE).
20. The system of claim 17, wherein the system comprises at least one of: a system for performing simulation operations; a system for performing simulation operations to test or validate autonomous machine applications; a system for performing digital twin operations; a system for performing light transport simulation; a system for rendering graphical output; a system for performing deep learning operations; a system for performing generative AI operations using a large language model (LLM), a system implemented using an edge device; a system for generating or presenting virtual reality (VR) content; a system for generating or presenting augmented reality (AR) content; a system for generating or presenting mixed reality (MR) content; a system incorporating one or more Virtual Machines (VMs); a system implemented at least partially in a data center; a system for performing hardware testing using simulation; a system for performing generative content operations using a language model; a system for synthetic data generation; a col…
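
The claims on training the diffusion networks recite that they are trained on latents produced by a hierarchical VAE, but the preview does not state a loss function. As a hedged illustration only, the sketch below assumes the common noise-prediction (epsilon) objective from denoising diffusion models: a latent is noised with the closed-form forward process, and a model is scored on how well it predicts that noise. The linear schedule, the example `shape_latent` values, and all function names are hypothetical.

```python
import math
import random

def alpha_bar_schedule(steps=100, beta_min=1e-4, beta_max=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    out, prod = [], 1.0
    for t in range(steps):
        beta = beta_min + (beta_max - beta_min) * t / (steps - 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out

def noise_latent(z0, t, alpha_bar, rng):
    """Forward diffusion: z_t = sqrt(abar_t) * z0 + sqrt(1 - abar_t) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    a = math.sqrt(alpha_bar[t])
    s = math.sqrt(1.0 - alpha_bar[t])
    return [a * z + s * e for z, e in zip(z0, eps)], eps

def eps_loss(eps_pred, eps):
    """Mean squared error between predicted and true noise."""
    return sum((p - e) ** 2 for p, e in zip(eps_pred, eps)) / len(eps)

rng = random.Random(0)
alpha_bar = alpha_bar_schedule()
shape_latent = [0.3, -1.2, 0.8, 0.1]  # hypothetical output of the VAE encoder
t = rng.randrange(len(alpha_bar))     # random diffusion step for this sample
z_t, eps = noise_latent(shape_latent, t, alpha_bar, rng)
zero_pred = [0.0] * len(eps)          # an untrained model that predicts no noise
print(t, eps_loss(zero_pred, eps))
```

Under this assumption, the same procedure would apply to the second network, with the latent point cloud as `z0` and the shape latent supplied to the model as a conditioning input.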

Assignees

Nvidia Corp

Classifications

  • G06T17/20 (Primary)

    Finite element generation, e.g. wire-frame surface description, {tesselation} · CPC title

  • Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components · CPC title

  • G06T19/20 (Primary)

    Editing of three-dimensional [3D] images, e.g. changing shapes or colours, aligning objects or positioning parts · CPC title

  • Particle system, point based geometry or rendering · CPC title

  • Shape modification · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2024005604A1 cover?
Approaches presented herein provide for the unconditional generation of novel three dimensional (3D) object shape representations, such as point clouds or meshes. In at least one embodiment, a first denoising diffusion model (DDM) can be trained to synthesize a 1D shape latent from Gaussian noise, and a second DDM can be trained to generate a set of latent points conditioned on this 1D shape la…
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G06T17/20. Mapped technology areas include Physics.
When was this patent published?
Published on January 4, 2024 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).