3d human body pose estimation using a model trained from unlabeled multi-view data
US-2021248772-A1 · Aug 12, 2021 · US
US12400388B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12400388-B2 |
| Application number | US-202218089984-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 28, 2022 |
| Priority date | Dec 28, 2022 |
| Publication date | Aug 26, 2025 |
| Grant date | Aug 26, 2025 |
Official abstract text for this publication.
Unsupervised volumetric 3D animation (UVA) of non-rigid deformable objects without annotations learns the 3D structure and dynamics of objects solely from single-view red/green/blue (RGB) videos and decomposes the single-view RGB videos into semantically meaningful parts that can be tracked and animated. Using a 3D autodecoder framework, paired with a keypoint estimator via a differentiable perspective-n-point (PnP) algorithm, the UVA model learns the underlying object 3D geometry and parts decomposition in an entirely unsupervised manner from still or video images. This allows the UVA model to perform 3D segmentation, 3D keypoint estimation, novel view synthesis, and animation. The UVA model can obtain animatable 3D objects from a single or a few images. The UVA method also features a space in which all objects are represented in their canonical, animation-ready form. Applications include the creation of lenses from images or videos for social media applications.
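The abstract pairs a keypoint estimator with a perspective-n-point (PnP) step. As a minimal sketch of the quantity such a step works with, the following computes the reprojection error between detected 2D keypoints and canonical 3D keypoints placed under a candidate rigid pose. It is not the patent's implementation: the pinhole camera with unit focal length, the helper names (`rotate_z`, `transform`, `project`, `reprojection_error`), and the sample keypoints are all assumptions made for illustration, and no actual PnP solver is included.

```python
import math

def rotate_z(angle):
    """3x3 rotation about the z axis (hypothetical helper for the demo)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def transform(R, t, X):
    """Apply the rigid transform X -> R @ X + t to one 3D point."""
    return [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]

def project(point_cam, focal=1.0):
    """Pinhole projection of a camera-space point (assumed camera model)."""
    x, y, z = point_cam
    return (focal * x / z, focal * y / z)

def reprojection_error(R, t, keypoints_3d, keypoints_2d, focal=1.0):
    """Mean squared distance between projected 3D and detected 2D keypoints."""
    total = 0.0
    for X, (u, v) in zip(keypoints_3d, keypoints_2d):
        pu, pv = project(transform(R, t, X), focal)
        total += (pu - u) ** 2 + (pv - v) ** 2
    return total / len(keypoints_3d)

# Made-up canonical 3D keypoints for one part.
canonical_kp = [[0.1, 0.0, 0.0], [0.0, 0.2, 0.0],
                [-0.1, -0.1, 0.05], [0.05, 0.1, -0.05]]

# Simulate "detected" 2D keypoints by projecting under a known pose.
true_R, true_t = rotate_z(0.3), [0.0, 0.1, 2.0]
detected_2d = [project(transform(true_R, true_t, X)) for X in canonical_kp]

err_true = reprojection_error(true_R, true_t, canonical_kp, detected_2d)
err_wrong = reprojection_error(rotate_z(0.0), true_t, canonical_kp, detected_2d)
```

A PnP solver searches for the rotation and translation that drive this error toward zero. Here the true pose gives zero error by construction, while the unrotated pose does not.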
Opening claim text (preview).
What is claimed is:

1. An unsupervised volumetric animation system for three-dimensional (3D) animation of a non-rigid deformable object, comprising: a canonical voxel generator to produce a 3D volumetric representation of the non-rigid deformable object in a canonical pose parameterized as a voxel grid, wherein the non-rigid deformable object is represented as a set of moving rigid parts, and assigns each 3D point of the non-rigid deformable object to a corresponding moving rigid part of the non-rigid deformable object; a two-dimensional (2D) keypoint predictor to estimate a pose, in a given image frame, of each moving rigid part of an input object to be animated; a volumetric skinning algorithm to map a canonical object volume of the non-rigid deformable object into a deformed volume that represents, as a deformed object, the input object to be animated with the pose in a current frame; and a volumetric renderer to render the deformed object as an image of the input object.

2. The system of claim 1, wherein the input object to be animated is extracted from a video or a still image.

3. The system of claim 1, wherein the 2D keypoint predictor uses a pose extracted from the input object to be animated to predict a set of 2D keypoints that correspond to 3D keypoints of the input object to be animated.

4. The system of claim 3, wherein the volumetric renderer takes a deformed density and radiance of the deformed volume produced via volumetric skinning using a canonical density (V DENSITY) of the non-rigid deformable object, a radiance of the non-rigid deformable object, a set of poses for different moving rigid parts of the input object to be animated, and moving rigid parts of the input object to be animated represented as linear blend skinning (LBS) weights.

5. The system of claim 4, wherein the volumetric renderer volumetrically renders the deformed radiance to produce the image.

6. The system of claim 1, wherein the 2D keypoint predictor estimates the pose of each moving rigid part by learning a set of 3D keypoints in a canonical space and comprises a 2D convolutional neural network that detects 2D projections of the moving rigid part to provide a set of corresponding 2D keypoints in a current frame.

7. The system of claim 6, further comprising a perspective-n-point (PnP) algorithm that processes a differentiable PnP formulation to recover the pose of each moving rigid part from corresponding 2D keypoints and 3D keypoints.

8. The system of claim 7, wherein the 2D keypoint predictor introduces $N_k$ learnable canonical 3D keypoints for each moving rigid part, shares 3D keypoints $K_p^{3D}$ of the moving rigid part among objects in a dataset, defines a 2D keypoints prediction network $C$ that takes frame $F_i$ as input and outputs 2D keypoints $K_p^{2D}$ for each part $p$, where each 2D keypoint corresponds to its respective 3D keypoint, and recovers the pose of moving rigid part $p$ as: $T_p^{-1} = \mathrm{PnP}(K_p^{2D}, K_p^{3D}) = \mathrm{PnP}(C(F_i), K_p^{3D})$.

9. A method of providing three-dimensional (3D) animation of a non-rigid deformable object, comprising: producing, using a canonical voxel generator, a 3D volumetric representation of the non-rigid deformable object in a canonical pose parameterized as a voxel grid, wherein the non-rigid deformable object is represented as a set of moving rigid parts; assigning, by the canonical voxel generator, each 3D point of the non-rigid deformable object to a corresponding moving rigid part of the non-rigid deformable object; estimating, by a two-dimensional (2D) keypoint predictor, a pose, in a given image frame, of each moving rigid part of an input object to be animated; mapping, by a volumetric skinning algorithm, a canonical object volume of the non-rigid deformable object into a deformed volume that represents, as a deformed object, the input object to be animated with the pose in a current frame; and rendering, by a volumetric renderer, the deformed object as an image of the input object.

10. The method of claim 9, further comprising extracting the input object to be animated from a video or a still image.

11. The method of claim 9, wherein the assigning comprises learning, for each moving rigid part, a set of canonical 3D keypoints during training.

12. The method of claim 9, further comprising using, by the 2D keypoint predictor, a pose extracted from the input object to be animated to predict a set of 2D keypoints that correspond to 3D keypoints of the input object to be animated.

13. The method of claim 12, wherein the mapping comprises the volumetric renderer taking a deformed density and radiance of the deformed volume produced via volumetric skinning using a canonical density (V DENSITY) of the non-rigid deformable object, a radiance of the non-rigid deformable object, a set of poses for different moving rigid parts of the input object to be animated, and moving rigid parts of the input object to be animated represented as linear blend skinning (LBS) weights.

14. The method of claim 13, wherein the rendering comprises volumetrically rendering the deformed radiance by the volumetric renderer to produce the image.

15. The method of claim 9, wherein estimating the pose of each moving rigid part comprises learning a set of 3D keypoints in a canonical space and detecting 2D projections of the moving rigid part to provide a set of corresponding 2D keypoints in a current frame using a 2D convolutional neural network.

16. The method of claim 15, wherein the estimating the pose of each moving rigid part further comprises using, by a perspective-n-point (PnP) algorithm, a differentiable PnP formulation to recover the pose of each moving rigid part from corresponding 2D keypoints and 3D keypoints.

17. The method of claim 16, further comprising introducing $N_k$ learnable canonical 3D keypoints for each moving rigid part, sharing 3D keypoints $K_p^{3D}$ of the moving rigid part among objects in a datase
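
The claims recite linear blend skinning (LBS) weights over moving rigid parts and a volumetric renderer that turns density and radiance into an image. The sketch below illustrates both ideas in their generic textbook form, assuming nothing about the patented implementation: `lbs_warp` blends per-part rigid transforms of a point by its skinning weights, and `composite` applies standard emission-absorption compositing to density and color samples along one ray. All function names, the uniform step size, and the toy inputs are hypothetical.

```python
import math

def apply_rigid(R, t, x):
    """Apply the rigid transform x -> R @ x + t to one 3D point."""
    return [sum(R[i][j] * x[j] for j in range(3)) + t[i] for i in range(3)]

def lbs_warp(x, part_poses, weights):
    """Blend the per-part rigid transforms of x; weights are assumed to sum to 1."""
    out = [0.0, 0.0, 0.0]
    for (R, t), w in zip(part_poses, weights):
        moved = apply_rigid(R, t, x)
        for i in range(3):
            out[i] += w * moved[i]
    return out

def composite(densities, colors, step):
    """Front-to-back compositing of samples along a ray with uniform spacing."""
    transmittance = 1.0          # fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    for sigma, rgb in zip(densities, colors):
        alpha = 1.0 - math.exp(-sigma * step)   # opacity of this sample
        for i in range(3):
            pixel[i] += transmittance * alpha * rgb[i]
        transmittance *= 1.0 - alpha
    return pixel, transmittance

# Two toy parts: one stays put, one shifts by 1 along x.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
poses = [(identity, [0.0, 0.0, 0.0]), (identity, [1.0, 0.0, 0.0])]
warped = lbs_warp([0.0, 0.0, 0.0], poses, [0.25, 0.75])

# One ray: an empty sample followed by a fully opaque green sample.
pixel, leftover = composite([0.0, 1e9], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], 0.1)
```

In this toy setup the point lands three quarters of the way toward the shifted part, and the ray returns pure green because the first sample has zero density. A full system would sample density and radiance from the deformed volume rather than from hand-written lists.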
Aligning objects, relative positioning of parts · CPC title
Shape modification · CPC title
Face · CPC title
Artificial neural networks [ANN] · CPC title
Training; Learning · CPC title