Unsupervised volumetric animation

US12400388B2 · US · B2

Patent metadata
Publication number: US-12400388-B2
Application number: US-202218089984-A
Country: US
Kind code: B2
Filing date: Dec 28, 2022
Priority date: Dec 28, 2022
Publication date: Aug 26, 2025
Grant date: Aug 26, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Unsupervised volumetric 3D animation (UVA) of non-rigid deformable objects without annotations learns the 3D structure and dynamics of objects solely from single-view red/green/blue (RGB) videos and decomposes the single-view RGB videos into semantically meaningful parts that can be tracked and animated. Using a 3D autodecoder framework, paired with a keypoint estimator via a differentiable perspective-n-point (PnP) algorithm, the UVA model learns the underlying object 3D geometry and parts decomposition in an entirely unsupervised manner from still or video images. This allows the UVA model to perform 3D segmentation, 3D keypoint estimation, novel view synthesis, and animation. The UVA model can obtain animatable 3D objects from a single or a few images. The UVA method also features a space in which all objects are represented in their canonical, animation-ready form. Applications include the creation of lenses from images or videos for social media applications.
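The abstract pairs a keypoint estimator with a perspective-n-point (PnP) step, which recovers a pose from matched 2D and 3D keypoints. The sketch below illustrates only the objective such a step minimizes, the reprojection error, and is not the patent's differentiable PnP formulation. The pinhole camera with unit focal length, the single rotation angle about the z axis, the keypoint coordinates, and the brute-force grid search are all assumptions made for illustration, as are the function names.

```python
import math

def rotation_z(theta):
    """3x3 rotation about the z axis (an assumed stand-in for a part's rotation)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def project(points_3d, R, t, focal=1.0):
    """Apply the pose (R, t) to canonical 3D keypoints, then pinhole-project to 2D."""
    projected = []
    for p in points_3d:
        x, y, z = (sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
        projected.append((focal * x / z, focal * y / z))
    return projected

def reprojection_error(points_3d, points_2d, R, t, focal=1.0):
    """Mean squared distance between projected 3D keypoints and observed 2D keypoints."""
    proj = project(points_3d, R, t, focal)
    return sum((u - a) ** 2 + (v - b) ** 2
               for (u, v), (a, b) in zip(proj, points_2d)) / len(proj)

# Hypothetical canonical 3D keypoints for one rigid part.
canonical_kp = [(0.5, 0.0, 0.0), (0.0, 0.3, 0.0), (0.0, 0.0, 0.2), (-0.4, -0.1, 0.1)]

# Synthesize "observed" 2D keypoints from a known pose.
true_theta, t = 0.3, (0.1, -0.2, 4.0)
observed_kp = project(canonical_kp, rotation_z(true_theta), t)

# Brute-force search over the rotation angle (translation assumed known here).
candidates = [i * 0.05 for i in range(-20, 21)]
best_theta = min(
    candidates,
    key=lambda th: reprojection_error(canonical_kp, observed_kp, rotation_z(th), t),
)
print(best_theta)  # the candidate closest to true_theta
```

A practical solver estimates the full rotation and translation jointly and, in a learning setting, must let gradients flow back to the 2D keypoint predictor; the grid search here only makes the objective visible.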

First claim

Opening claim text (preview).

What is claimed is:

1. An unsupervised volumetric animation system for three-dimensional (3D) animation of a non-rigid deformable object, comprising: a canonical voxel generator to produce a 3D volumetric representation of the non-rigid deformable object in a canonical pose parameterized as a voxel grid, wherein the non-rigid deformable object is represented as a set of moving rigid parts, and assigns each 3D point of the non-rigid deformable object to a corresponding moving rigid part of the non-rigid deformable object; a two-dimensional (2D) keypoint predictor to estimate a pose, in a given image frame, of each moving rigid part of an input object to be animated; a volumetric skinning algorithm to map a canonical object volume of the non-rigid deformable object into a deformed volume that represents, as a deformed object, the input object to be animated with the pose in a current frame; and a volumetric renderer to render the deformed object as an image of the input object.

2. The system of claim 1, wherein the input object to be animated is extracted from a video or a still image.

3. The system of claim 1, wherein the 2D keypoint predictor uses a pose extracted from the input object to be animated to predict a set of 2D keypoints that correspond to 3D keypoints of the input object to be animated.

4. The system of claim 3, wherein the volumetric renderer takes a deformed density and radiance of the deformed volume produced via volumetric skinning using a canonical density (V DENSITY) of the non-rigid deformable object, a radiance of the non-rigid deformable object, a set of poses for different moving rigid parts of the input object to be animated, and moving rigid parts of the input object to be animated represented as linear blend skinning (LBS) weights.

5. The system of claim 4, wherein the volumetric renderer volumetrically renders the deformed radiance to produce the image.

6. The system of claim 1, wherein the 2D keypoint predictor estimates the pose of each moving rigid part by learning a set of 3D keypoints in a canonical space and comprises a 2D convolutional neural network that detects 2D projections of the moving rigid part to provide a set of corresponding 2D keypoints in a current frame.

7. The system of claim 6, further comprising a perspective-n-point (PnP) algorithm that processes a differentiable PnP formulation to recover the pose of each moving rigid part from corresponding 2D keypoints and 3D keypoints.

8. The system of claim 7, wherein the 2D keypoint predictor introduces $N_k$ learnable canonical 3D keypoints for each moving rigid part, shares 3D keypoints $K_p^{3D}$ of the moving rigid part among objects in a dataset, defines a 2D keypoints prediction network $C$ that takes frame $F_i$ as input and outputs 2D keypoints $K_p^{2D}$ for each part $p$, where each 2D keypoint corresponds to its respective 3D keypoint, and recovers the pose of moving rigid part $p$ as:

$$T_p^{-1} = \mathrm{PnP}(K_p^{2D}, K_p^{3D}) = \mathrm{PnP}(C(F_i), K_p^{3D}).$$

9. A method of providing three-dimensional (3D) animation of a non-rigid deformable object, comprising: producing, using a canonical voxel generator, a 3D volumetric representation of the non-rigid deformable object in a canonical pose parameterized as a voxel grid, wherein the non-rigid deformable object is represented as a set of moving rigid parts; assigning, by the canonical voxel generator, each 3D point of the non-rigid deformable object to a corresponding moving rigid part of the non-rigid deformable object; estimating, by a two-dimensional (2D) keypoint predictor, a pose, in a given image frame, of each moving rigid part of an input object to be animated; mapping, by a volumetric skinning algorithm, a canonical object volume of the non-rigid deformable object into a deformed volume that represents, as a deformed object, the input object to be animated with the pose in a current frame; and rendering, by a volumetric renderer, the deformed object as an image of the input object.

10. The method of claim 9, further comprising extracting the input object to be animated from a video or a still image.

11. The method of claim 9, wherein the assigning comprises learning, for each moving rigid part, a set of canonical 3D keypoints during training.

12. The method of claim 9, further comprising using, by the 2D keypoint predictor, a pose extracted from the input object to be animated to predict a set of 2D keypoints that correspond to 3D keypoints of the input object to be animated.

13. The method of claim 12, wherein the mapping comprises the volumetric renderer taking a deformed density and radiance of the deformed volume produced via volumetric skinning using a canonical density (V DENSITY) of the non-rigid deformable object, a radiance of the non-rigid deformable object, a set of poses for different moving rigid parts of the input object to be animated, and moving rigid parts of the input object to be animated represented as linear blend skinning (LBS) weights.

14. The method of claim 13, wherein the rendering comprises volumetrically rendering the deformed radiance by the volumetric renderer to produce the image.

15. The method of claim 9, wherein estimating the pose of each moving rigid part comprises learning a set of 3D keypoints in a canonical space and detecting 2D projections of the moving rigid part to provide a set of corresponding 2D keypoints in a current frame using a 2D convolutional neural network.

16. The method of claim 15, wherein the estimating the pose of each moving rigid part further comprises using, by a perspective-n-point (PnP) algorithm, a differentiable PnP formulation to recover the pose of each moving rigid part from corresponding 2D keypoints and 3D keypoints.

17. The method of claim 16, further comprising introducing $N_k$ learnable canonical 3D keypoints for each moving rigid part, sharing 3D keypoints $K_p^{3D}$ of the moving rigid part among objects in a datase
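The claims describe two operations that a short sketch can make concrete: linear blend skinning (LBS), which moves a point according to a weighted mix of per-part rigid poses, and volumetric rendering, which turns density and radiance samples along a camera ray into a pixel colour. The code below is a minimal illustration using the textbook forms of both; the function names, the two-part example, and the uniform sample spacing are assumptions, and the claim text shown here does not specify these details.

```python
import math

def lbs_deform(x_c, weights, poses):
    """Linear blend skinning: x_d = sum_p w_p * (R_p @ x_c + t_p).

    x_c     -- canonical 3D point
    weights -- one LBS weight per rigid part (assumed to sum to 1)
    poses   -- one (R, t) pair per rigid part; R is 3x3, t has length 3
    """
    x_d = [0.0, 0.0, 0.0]
    for w, (R, t) in zip(weights, poses):
        for i in range(3):
            x_d[i] += w * (sum(R[i][j] * x_c[j] for j in range(3)) + t[i])
    return x_d

def composite_ray(densities, radiances, delta):
    """Emission-absorption quadrature along one ray, front to back.

    alpha_i = 1 - exp(-sigma_i * delta); each sample's colour is weighted by
    alpha_i times the transmittance remaining after the samples in front of it.
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for sigma, rgb in zip(densities, radiances):
        alpha = 1.0 - math.exp(-sigma * delta)
        for k in range(3):
            color[k] += transmittance * alpha * rgb[k]
        transmittance *= 1.0 - alpha
    return color, 1.0 - transmittance  # pixel colour and accumulated opacity

# Two hypothetical rigid parts: one stays put, one shifts by +1 along x.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
poses = [(identity, (0.0, 0.0, 0.0)), (identity, (1.0, 0.0, 0.0))]

# A point shared equally by both parts moves halfway.
moved = lbs_deform((0.0, 0.0, 0.0), (0.5, 0.5), poses)

# Empty space (density 0) in front of a dense green sample.
pixel, opacity = composite_ray(
    densities=[0.0, 50.0],
    radiances=[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    delta=1.0,
)
print(moved, pixel, opacity)
```

In this toy case the red sample contributes nothing because its density is zero, so the pixel comes out green and almost fully opaque. A renderer of the kind claimed would sample density and radiance from the deformed volume rather than from hand-written lists.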

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12400388B2 cover?
Unsupervised volumetric 3D animation (UVA) of non-rigid deformable objects without annotations learns the 3D structure and dynamics of objects solely from single-view red/green/blue (RGB) videos and decomposes the single-view RGB videos into semantically meaningful parts that can be tracked and animated. Using a 3D autodecoder framework, paired with a keypoint estimator via a differentiable per…
Who is the assignee on this patent?
Chai Menglei, Lee Hsin Ying, Menapace Willi, and 6 more
What technology area does this patent fall under?
Primary CPC classification G06T17/00. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 26, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).