What technology area does this patent fall under?

Primary CPC classification G06T15/08. Mapped technology areas include Physics.

When was this patent published?

Publication date Dec 9, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Autodecoding latent 3D diffusion models

Patent metadata
Field	Value
Publication number	US-12494013-B2
Application number	US-202318211149-A
Country	US
Kind code	B2
Filing date	Jun 16, 2023
Priority date	Jun 16, 2023
Publication date	Dec 9, 2025
Grant date	Dec 9, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for generating static and articulated 3D assets are provided that include a 3D autodecoder at their core. The 3D autodecoder framework embeds properties learned from the target dataset in the latent space, which can then be decoded into a volumetric representation for rendering view-consistent appearance and geometry. The appropriate intermediate volumetric latent space is then identified and robust normalization and de-normalization operations are implemented to learn a 3D diffusion from 2D images or monocular videos of rigid or articulated objects. The methods are flexible enough to use either existing camera supervision or no camera information at all—instead efficiently learning the camera information during training. The generated results are shown to outperform state-of-the-art alternatives on various benchmark datasets and metrics, including multi-view image datasets of synthetic objects, real in-the-wild videos of moving people, and a large-scale, real video dataset of static objects.
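The abstract says the latent is decoded into a volumetric representation (density and radiance) that is then rendered into images. A common way to render such a grid is front-to-back emission-absorption alpha compositing along each camera ray, as in NeRF-style quadrature. The sketch below assumes that quadrature, because the text on this page does not spell out the renderer. The function name `composite_ray` and its argument layout are hypothetical.

```python
import numpy as np

def composite_ray(sigma: np.ndarray, rgb: np.ndarray, deltas: np.ndarray):
    """Alpha-composite S samples along one ray, front to back (assumed quadrature).

    sigma:  (S,) non-negative densities at the samples
    rgb:    (S, 3) radiance (color) at the samples
    deltas: (S,) distances between consecutive samples
    Returns (color, opacity) for the ray.
    """
    alpha = 1.0 - np.exp(-sigma * deltas)            # per-sample opacity
    # Transmittance T_i = prod_{j<i} (1 - alpha_j): light that reaches sample i
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    weights = trans * alpha                           # each sample's contribution
    color = (weights[:, None] * rgb).sum(axis=0)
    return color, weights.sum()
```

An empty ray (all densities zero) renders as black with zero opacity, while a very dense first sample hides everything behind it and returns that sample's color.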

First claim

Opening claim text (preview).

What is claimed is:

1. A method of training a three-dimensional (3D) diffusion model to embed properties from two-dimensional (2D) images learned from a target dataset in a latent space using an autodecoder, comprising: processing embedding vectors of an autodecoder (G) comprising a library of embedding vectors corresponding to objects in a training dataset to generate a latent 3D feature volume; decoding, by the autodecoder, the latent 3D feature volume into a 3D voxel grid for density and radiance representative of an object's shape and appearance; splitting the autodecoder into a first part G1 and a second part G2; normalizing features, F̂ = (F − m) / IQR, before using features F from the latent 3D feature volume for diffusion by the 3D diffusion model, where median m is a center of distribution of the latent 3D feature volume and a Normalized InterQuartile Range (IQR) approximates a scale of the latent 3D feature volume; training, using the autodecoder, the 3D diffusion model operating in a 3D latent space obtained from the first part G1 using volumetric rendering of the 3D voxel grid with two-dimensional (2D) reconstruction supervision from training images in a training dataset to extract structure and appearance properties from the training dataset; and generating, using the second part G2 and the structure and appearance properties extracted from the training dataset, a 3D representation of the object.

2. The method of claim 1, further comprising progressively upsampling the latent 3D feature volume before decoding the upsampled latent 3D feature volume into the 3D voxel grid.

3. The method of claim 1, further comprising, during inference, denormalizing the features F from the structure and appearance properties extracted from the training dataset by the second part G2 as F × IQR + m prior to generating the 3D representation of the object.

4. The method of claim 1, further comprising learning the embedding vectors by the autodecoder.

5. The method of claim 1, wherein decoding by the autodecoder comprises providing at least four residual blocks at each resolution in the autodecoder and using self-attention layers in a second level of resolution 8³ and in a third level of resolution 16³ of the autodecoder.

6. The method of claim 1, wherein the object is in a canonical pose and training the 3D voxel grid comprises training the 3D voxel grid using ground truth poses, poses estimated using structure from motion, or poses learned from the training dataset during training.

7. The method of claim 6, wherein the canonical pose comprises a canonical voxel representation of a density grid that is a discrete representation of a density field and a canonical representation of a red, green, blue (RGB) radiance field, further comprising tri-linearly interpolating density values and RGB values from the 3D voxel grid after decoding.

8. The method of claim 1, further comprising removing a background of the training images in the training dataset prior to training the 3D diffusion model.

9. The method of claim 1, wherein the object is an articulated non-rigid object, further comprising modeling a shape of the object and local motion from dynamic poses as well as a corresponding non-rigid deformation of a local region.

10. The method of claim 9, further comprising estimating, using a differentiable Perspective-n-Point algorithm, camera poses for each component of the non-rigid object and progressively refining estimated camera poses during training using a combination of learned 3D keypoints for each component of the non-rigid object and corresponding 2D projections predicted in each image, and combining the components with plausible deformations using a learned volumetric linear blend skinning (LBS) algorithm having skinning weights for each component of the non-rigid object that are estimated during training of the 3D diffusion model.

11. The method of claim 1, further comprising representing each object in the training dataset by an embedding vector comprising a concatenation of smaller embedding vectors, wherein representing each object comprises using a deterministic mapping from each training object index to its corresponding concatenated embedding vector using a hashing function where for object index k, the corresponding embedding index is: m(k) = [(a · k) mod 2^w] ≫ (w − r), for a table having 2^r entries, where w and a are heuristic hashing parameters used to reduce a number of collisions while maintaining an appropriate table size.

12. The method of claim 1, further comprising decomposing a target non-rigid object into regions, where each region contains 3D keypoints and corresponding 2D projections per image, that are shared across all non-rigid objects and aligning the non-rigid objects in a learned canonical space to allow for motion transfer between the non-rigid objects.

13. The method of claim 1, wherein the training comprises extracting a text description of an object in the training dataset by providing a hint and a first view of the object along with a question requesting a description of a shape and color of the object for use in an inference stage to identify the object.

14. A system that embeds properties learned from a target dataset in a latent space into a volumetric representation of an object for rendering, comprising: a volumetric autodecoder (G) that learns embedding vectors of a library of embedding vectors corresponding to objects in a training dataset to generate a latent 3D feature volume and that decodes the latent 3D feature volume into a 3D voxel grid for density and radiance representative of an object's shape and appearance, the autodecoder comprising a first part G1 and a second part G2; and a 3D diffusion model that is trained on a latent representation by the volumetric autodecoder, the 3D diffusion model operating in a 3D latent space obtained from the first part G1 using volumetric rendering of the voxel grid with two-dimensional (2D) reconstruction supervision from training images in a training dataset to extract structure and appearance properties from the training dataset, wherein the volumetric autodecoder normalizes features F̂ = (F − m …
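Claims 1 and 3 normalize the latent features with the median m and a normalized interquartile range before diffusion, then undo it at inference as F × IQR + m. The sketch below shows that round trip. Two points are assumptions, not taken from the claims: statistics are computed per channel over all other axes, and "normalized IQR" is taken in its usual statistical sense, IQR / 1.349, which matches the standard deviation for Gaussian data. All names are hypothetical.

```python
import numpy as np

NIQR_SCALE = 1.349  # IQR of a standard normal; dividing by it "normalizes" the IQR

def robust_stats(features: np.ndarray, channel_axis: int = 1):
    """Per-channel median m and normalized IQR over all non-channel axes (assumed)."""
    axes = tuple(i for i in range(features.ndim) if i != channel_axis)
    m = np.median(features, axis=axes, keepdims=True)
    q75, q25 = np.percentile(features, [75, 25], axis=axes, keepdims=True)
    niqr = (q75 - q25) / NIQR_SCALE
    return m, niqr

def normalize(features, m, niqr, eps=1e-8):
    """F_hat = (F - m) / IQR, applied before the diffusion model sees the latents."""
    return (features - m) / (niqr + eps)

def denormalize(features_hat, m, niqr, eps=1e-8):
    """F = F_hat * IQR + m, applied to sampled latents before decoding."""
    return features_hat * (niqr + eps) + m
```

Median and IQR are far less sensitive to outlier activations than mean and standard deviation, which is a plausible reason for preferring them when the latent volume has heavy-tailed values.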
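Claim 7 trilinearly interpolates density and RGB values from the decoded voxel grid so the renderer can query points between voxel centers. A minimal sketch follows, assuming density and RGB are stored as channels of one (D, H, W, C) array and the query point is given in voxel coordinates; `trilinear_sample` is a hypothetical name.

```python
import numpy as np

def trilinear_sample(grid: np.ndarray, p) -> np.ndarray:
    """Trilinearly interpolate grid (D, H, W, C) at continuous voxel coords p = (z, y, x)."""
    D, H, W = grid.shape[:3]
    upper = [D - 1, H - 1, W - 1]
    # Clamp to the grid so all 8 corner lookups stay in bounds
    p = np.clip(np.asarray(p, dtype=float), 0.0, upper)
    p0 = np.floor(p).astype(int)
    p1 = np.minimum(p0 + 1, upper)
    f = p - p0  # fractional offsets in [0, 1)
    out = np.zeros(grid.shape[3:])
    for dz in (0, 1):
        for dy in (0, 1):
            for dx in (0, 1):
                z = p1[0] if dz else p0[0]
                y = p1[1] if dy else p0[1]
                x = p1[2] if dx else p0[2]
                # Corner weight is the product of per-axis linear weights
                w = ((f[0] if dz else 1 - f[0]) *
                     (f[1] if dy else 1 - f[1]) *
                     (f[2] if dx else 1 - f[2]))
                out = out + w * grid[z, y, x]
    return out
```

Trilinear interpolation reproduces any affine field exactly, which makes a linear ramp a convenient sanity check.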
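Claim 10 combines the components of an articulated object using linear blend skinning (LBS): each point is moved by every component's rigid transform, and the results are blended with that point's skinning weights, x' = Σ_k w_k (R_k x + t_k). In the claim the weights are volumetric and learned during training; the sketch below simply takes them as inputs. The function name and array layout are hypothetical.

```python
import numpy as np

def linear_blend_skinning(points, weights, rotations, translations):
    """Deform canonical points with per-component rigid transforms blended by weights.

    points:       (N, 3) canonical-space points
    weights:      (N, K) skinning weights, each row summing to 1
    rotations:    (K, 3, 3) per-component rotation matrices
    translations: (K, 3) per-component translations
    Returns (N, 3) posed points.
    """
    # Apply every component's transform to every point: shape (K, N, 3)
    per_part = np.einsum("kij,nj->kni", rotations, points) + translations[:, None, :]
    # Blend the K candidate positions with each point's weights
    return np.einsum("nk,kni->ni", weights, per_part)
```

Points with equal weights on two components that translate in different directions land halfway between the two rigid results.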
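Claims 10 and 12 pair learned 3D keypoints with their predicted 2D projections in each image, and a Perspective-n-Point solver recovers the camera pose that best explains those pairs. The sketch below shows only the quantity such a solver typically minimizes, the pinhole reprojection error; it is not a PnP solver, and the patent's differentiable PnP algorithm is not reproduced here. The camera model and names are assumptions.

```python
import numpy as np

def project_points(points_world, R, t, K):
    """Project (N, 3) world points to (N, 2) pixels with an assumed pinhole camera.

    R (3, 3), t (3,): world-to-camera rotation and translation
    K (3, 3): camera intrinsics
    """
    cam = points_world @ R.T + t        # world -> camera coordinates
    uvw = cam @ K.T                      # camera -> homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3]      # perspective divide

def reprojection_error(points_world, pixels, R, t, K):
    """Mean pixel distance between projected 3D keypoints and observed 2D keypoints."""
    return np.linalg.norm(project_points(points_world, R, t, K) - pixels, axis=1).mean()
```

A pose that explains the 2D keypoints exactly gives zero error, so refining poses during training amounts to driving this value down.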
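Claim 11 maps each object index k to an embedding row with the multiplicative hash m(k) = [(a · k) mod 2^w] ≫ (w − r), which yields an index into a table of 2^r entries. The sketch implements that formula directly. How the "concatenation of smaller embedding vectors" is assembled is not stated in the claim text shown; here it is assumed, hypothetically, that each sub-vector comes from its own table looked up with its own odd multiplier a. The multipliers are arbitrary example constants, and the function names are hypothetical.

```python
import numpy as np

def hash_index(k: int, a: int, w: int, r: int) -> int:
    """m(k) = [(a * k) mod 2^w] >> (w - r); returns an index in [0, 2^r)."""
    return ((a * k) % (1 << w)) >> (w - r)

def object_embedding(k: int, tables, multipliers, w: int = 32) -> np.ndarray:
    """Concatenate one sub-vector per table (assumed: one multiplier per table)."""
    parts = []
    for table, a in zip(tables, multipliers):
        r = table.shape[0].bit_length() - 1   # table must have 2^r rows
        parts.append(table[hash_index(k, a, w, r)])
    return np.concatenate(parts)
```

Keeping the top r bits of (a · k) mod 2^w spreads consecutive object indices across the table, so the table can be much smaller than the dataset while collisions stay limited.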

Assignees

Snap Inc

Inventors

Classifications

G06T2219/2021
Shape modification · CPC title
G06T2219/2016
Rotation, translation, scaling · CPC title
G06T2210/21
Collision detection, intersection · CPC title
G06T19/20
Editing of three-dimensional [3D] images, e.g. changing shapes or colours, aligning objects or positioning parts · CPC title
G06T15/20
Perspective computation · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 91664753


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12494013B2 cover?: Systems and methods for generating static and articulated 3D assets are provided that include a 3D autodecoder at their core. The 3D autodecoder framework embeds properties learned from the target dataset in the latent space, which can then be decoded into a volumetric representation for rendering view-consistent appearance and geometry. The appropriate intermediate volumetric latent space is t…
Who is the assignee on this patent?: Snap Inc

Related patents

Neural Radiance Field Generative Modeling of Object Classes from Single Two-Dimensional Views

Synthetic data generation using morphable models with identity and expression embeddings

Geometry-Free Neural Scene Representations Through Novel-View Synthesis

Learning Articulated Shape Reconstruction from Imagery

Deformable neural radiance fields

Figure-Ground Neural Radiance Fields For Three-Dimensional Object Category Modelling

Detection of prostate cancer in multi-parametric mri using random forest with instance weighting & mr prostate segmentation by deep learning with holistically-nested networks
