Body mesh reconstruction from RGB image

US12394154B2 · US · B2

Patent metadata
Publication number: US-12394154-B2
Application number: US-202318339780-A
Country: US
Kind code: B2
Filing date: Jun 22, 2023
Priority date: Apr 13, 2023
Publication date: Aug 19, 2025
Grant date: Aug 19, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and systems are disclosed for generating a body mesh from a single image. The system predicts both a volumetric reconstruction tensor of the monocular image and a pose of an object by applying a first machine learning model to a monocular image. The system identifies a portion of the pose of the object that corresponds to a point in a canonical space associated with a set of position encoding information. The system obtains a point of the volumetric reconstruction tensor corresponding to the identified portion of the pose. The system classifies the obtained point as being inside or outside of a canonical volume by applying a second machine learning model to the obtained point of the volumetric reconstruction tensor together with the set of position encoding information. The system generates a three-dimensional (3D) mesh representing the object in the canonical space.
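
The query step in the abstract can be sketched in Python with NumPy: map a canonical-space point into the predicted pose, read the volumetric reconstruction tensor at that location, then classify the point from that feature plus its position encoding. This is a minimal illustration under stated assumptions, not the patented implementation. Linear blend skinning as the canonical-to-posed mapping, trilinear interpolation for reading the tensor, a one-hot body-part vector as the "position encoding information" (claim 6 ties the encoding to body parts), and a tiny two-layer MLP as the second model are all assumptions. The function names `lbs_point`, `trilinear_sample` and `occupancy_mlp` are hypothetical.

```python
import numpy as np


def lbs_point(x_c, weights, bone_transforms):
    """Map a canonical-space point to posed space with linear blend skinning.

    Assumption: the predicted pose is J rigid 4x4 bone transforms, and the
    canonical point carries J skinning weights that sum to 1.
    """
    x_h = np.append(x_c, 1.0)  # homogeneous coordinates
    blended = np.einsum("j,jab->ab", weights, bone_transforms)  # weighted sum of transforms
    return (blended @ x_h)[:3]


def trilinear_sample(volume, p):
    """Read a (D, H, W, C) feature tensor at continuous voxel coordinates p."""
    upper = np.array(volume.shape[:3]) - 1
    p = np.clip(p, 0, upper)  # clamp to the tensor bounds
    i0 = np.floor(p).astype(int)
    i1 = np.minimum(i0 + 1, upper)
    f = p - i0  # fractional offset along each axis
    out = np.zeros(volume.shape[3])
    for corner in np.ndindex(2, 2, 2):  # the 8 surrounding voxels
        idx = tuple(np.where(corner, i1, i0))
        w = np.prod(np.where(corner, f, 1.0 - f))
        out += w * volume[idx]
    return out


def occupancy_mlp(feature, part_encoding, params):
    """Second model (assumed tiny MLP): probability that the point lies inside."""
    h = np.concatenate([feature, part_encoding])
    h = np.maximum(0.0, params["W1"] @ h + params["b1"])  # ReLU hidden layer
    logit = params["W2"] @ h + params["b2"]
    return float(1.0 / (1.0 + np.exp(-logit)))


# Demo with random stand-ins for the first model's outputs.
rng = np.random.default_rng(0)
C, J, P, HID = 8, 4, 5, 16  # feature channels, bones, body parts, hidden units
volume = rng.normal(size=(16, 16, 16, C))  # stand-in volumetric reconstruction tensor
bones = np.tile(np.eye(4), (J, 1, 1))
bones[:, :3, 3] = 8.0  # demo pose: every bone shifts the point toward the tensor centre
params = {"W1": 0.1 * rng.normal(size=(HID, C + P)), "b1": np.zeros(HID),
          "W2": 0.1 * rng.normal(size=HID), "b2": 0.0}

x_c = np.array([0.5, -1.0, 2.0])          # canonical-space query point
weights = np.array([0.7, 0.3, 0.0, 0.0])  # its skinning weights
part = np.eye(P)[2]                       # one-hot body-part encoding (hypothetical)
x_p = lbs_point(x_c, weights, bones)      # assumed to be in voxel units of the tensor
prob = occupancy_mlp(trilinear_sample(volume, x_p), part, params)
inside = prob > 0.5
```

Repeating this query over a dense grid of canonical points yields an occupancy volume. One common way to turn such a volume into the 3D mesh, again an assumption rather than something the abstract states, is marching cubes at a 0.5 level, for example with `skimage.measure.marching_cubes`.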

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: accessing a monocular image depicting an object; predicting both a volumetric reconstruction tensor of the monocular image and a pose of the object by applying a first machine learning model to the monocular image; identifying a portion of the pose of the object that corresponds to a point in a canonical space associated with a set of position encoding information; obtaining a point of the volumetric reconstruction tensor corresponding to the identified portion of the pose; classifying the obtained point as being inside or outside of a canonical volume by applying a second machine learning model to the obtained point of the volumetric reconstruction tensor together with the set of position encoding information; and generating a three-dimensional (3D) mesh representing the object in the canonical space in response to classifying the obtained point as being inside or outside of the canonical volume.

2. The method of claim 1, wherein the pose of the object comprises a skeletal representation of the object.

3. The method of claim 1, further comprising: selecting the point in the canonical space associated with the set of position encoding information.

4. The method of claim 1, further comprising: selecting a second portion of the pose of the object that corresponds to a second point in the canonical space associated with a second set of position encoding information; obtaining a second point of the volumetric reconstruction tensor corresponding to the second portion of the pose; and classifying the second point as being inside or outside of the canonical volume by applying the second machine learning model to the second point of the volumetric reconstruction tensor together with the second set of position encoding information.

5. The method of claim 4, further comprising repeating selection of portions of the pose and classification of points of the volumetric reconstruction tensor corresponding to the selected portions of the pose as being inside or outside of the canonical volume for each portion of the canonical volume.

6. The method of claim 1, wherein the set of position encoding information represents different body parts of the canonical volume.

7. The method of claim 1, further comprising training the first and second machine learning models by performing training operations comprising: obtaining a set of training images depicting objects, each of the set of training images being associated with 3D scans of the objects depicted in the set of training images; and generating ground truth canonical space representations of the objects depicted in the set of training images by modifying poses of the 3D scans of the objects to correspond to a pose in the canonical space.

8. The method of claim 7, wherein the training operations comprise: selecting a first training image depicting a first object corresponding to a first of the ground truth canonical space representations of the first object; predicting both an individual volumetric reconstruction tensor of the monocular image and an individual pose of the first object by applying the first machine learning model to the first training image; identifying an individual portion of the individual pose of the first object that corresponds to an individual point in the canonical space associated with an individual set of position encoding information; obtaining an individual point of the individual volumetric reconstruction tensor corresponding to the individual portion; classifying the individual point as being inside or outside of an individual canonical volume by applying the second machine learning model to the individual point of the volumetric reconstruction tensor together with the individual set of position encoding information; and generating occupancy loss based on a deviation between a classification resulting from classifying the individual point and a corresponding portion of the first ground truth canonical space representation of the object.

9. The method of claim 8, further comprising updating one or more parameters of the first and second machine learning models based on the occupancy loss.

10. The method of claim 8, further comprising: generating an individual 3D mesh representing the first object in the canonical space in response to classifying the individual point as being inside or outside of the canonical volume; posing the individual 3D mesh based on the individual pose of the first object predicted by the first machine learning model; and computing a posed loss based on a deviation between the posed individual 3D mesh and the 3D scan of the first object.

11. The method of claim 1, further comprising: cropping a portion of the monocular image corresponding to a head of the object; generating, based on the cropped portion of the monocular image, facial landmarks of the head of the object; predicting a head-specific canonical space reconstruction by applying a head-specific canonical network to the cropped portion based on the generated facial landmarks; and combining the head-specific canonical space reconstruction with the canonical volume corresponding to the object.

12. The method of claim 11, further comprising: selecting a facial landmark in a head-specific canonical space representation of the object; identifying a specific portion of the object depicted in the cropped portion of the monocular image that corresponds to the selected facial landmark; obtaining an individual point of the volumetric reconstruction tensor corresponding to the specific portion; classifying the obtained individual point as being inside or outside of the head-specific canonical space representation of the object by applying a third machine learning model to the obtained individual point; and generating the head-specific canonical space reconstruction in response to classifying the obtained individual point as being inside or outside of the head-specific canonical space representation.

13. The method of claim 12, further comprising: predicting both an additional volumetric reconstruction tensor of the cropped portion of the monocular image and a head pose of the object by applying a fourth machine learning model to the cropped portion of the monocular image, wherein the individual point is obtained from the additional volumetric reconstruction tensor.

14. The method of claim 11, further comprising: updating a pose of a head region in the canonical volume based on the facial landmarks, wherein the 3D mesh is generated using the canonical volume with the updated pose of the head region.

15. The method of claim 11, wherein the canonical volume represents a full body canonical space reconstruction of the object, further comprising: identifying a region in which the full body canonical space reconstruction of the object overlaps with the head-specific canonical space reconstruction; and smoothly deforming the region based on differences between the full body canonical space reconstruction of the object and the head-specific canonical space reconstruction.

16. The method of claim 15, further comprising generating the 3D mesh based on the full body canonical space reconstruction, the head-specific canonical space reconstruction, and the smoothly deformed region.

17. The method of claim 1, further comprising generating an extended reality (XR) object based on the 3D mesh.

18. The method of claim 17, further comprising animating the XR object in a view of a real-world environment captured in real time by a camer…
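
Claims 8 to 10 recite an occupancy loss and a posed loss but define each only as a "deviation" from ground truth. A minimal sketch, assuming binary cross-entropy for the occupancy loss and a symmetric Chamfer distance for the posed loss; both are common choices for this kind of training, and neither is specified by the claims. The function names `occupancy_loss` and `posed_loss` are hypothetical.

```python
import numpy as np


def occupancy_loss(pred_prob, gt_inside, eps=1e-7):
    """Assumed form of the claim 8 occupancy loss: binary cross-entropy.

    pred_prob: predicted inside-probabilities; gt_inside: boolean labels taken
    from the ground truth canonical space representation.
    """
    p = np.clip(pred_prob, eps, 1.0 - eps)  # avoid log(0)
    y = gt_inside.astype(float)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def posed_loss(posed_vertices, scan_points):
    """Assumed form of the claim 10 posed loss: symmetric Chamfer distance.

    posed_vertices: (N, 3) vertices of the reposed mesh; scan_points: (M, 3)
    points of the 3D scan. Uses squared distance to the nearest neighbour.
    """
    d2 = np.sum((posed_vertices[:, None, :] - scan_points[None, :, :]) ** 2, axis=-1)
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())


pred = np.array([0.9, 0.2, 0.6])
gt = np.array([True, False, True])
print(round(occupancy_loss(pred, gt), 4))  # → 0.2798
```

Under these assumptions, the parameter update of claim 9 would be an ordinary gradient step on the loss through both models; computing those gradients needs an autodiff framework, which this NumPy sketch leaves out.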

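Claim 15 smoothly deforms the region where the full body and head-specific reconstructions overlap, but does not say how. The sketch below is one hypothetical way to do it on meshes: body vertices inside a horizontal neck band are pulled toward their nearest head vertex with a smoothstep weight, so the body surface is unchanged at the bottom of the band and meets the head surface at the top. The band limits, the nearest-vertex offset, the smoothstep falloff and the name `blend_neck_region` are all assumptions.

```python
import numpy as np


def smoothstep(t):
    """Cubic ease from 0 to 1 with zero slope at both ends."""
    t = np.clip(t, 0.0, 1.0)
    return t * t * (3.0 - 2.0 * t)


def blend_neck_region(body_verts, head_verts, y_low, y_high):
    """Deform body vertices with y_low <= y <= y_high toward the head surface.

    Assumption: the overlap region is a band along the vertical (y) axis, and
    the "difference" between reconstructions is the offset to the nearest
    head vertex.
    """
    out = body_verts.copy()
    in_band = (body_verts[:, 1] >= y_low) & (body_verts[:, 1] <= y_high)
    band = body_verts[in_band]
    if band.size == 0:
        return out
    d2 = np.sum((band[:, None, :] - head_verts[None, :, :]) ** 2, axis=-1)
    nearest = head_verts[d2.argmin(axis=1)]                 # closest head vertex per band vertex
    w = smoothstep((band[:, 1] - y_low) / (y_high - y_low))  # 0 at bottom, 1 at top
    out[in_band] = band + w[:, None] * (nearest - band)
    return out


body = np.array([[0.0, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 1.0, 0.0], [0.0, 2.0, 0.0]])
head = np.array([[0.2, 0.5, 0.0], [0.2, 1.0, 0.0]])
blended = blend_neck_region(body, head, y_low=0.0, y_high=1.0)
```

In this toy input the vertex at the bottom of the band stays put, the middle one moves halfway to the head, the top one lands on the head, and the vertex above the band is untouched.
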
Assignees

Snap Inc

Classifications (CPC titles)

  • Mixed reality (object pose determination, tracking or camera calibration for mixed reality G06T7/00)

  • Face

  • Training; Learning

  • Image cropping

  • of characters, e.g. humans, animals or virtual beings

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12394154B2 cover?
Methods and systems for generating a 3D body mesh from a single monocular image. The full official text is in the Abstract section above.
Who is the assignee on this patent?
Snap Inc
What technology area does this patent fall under?
Primary CPC classification G06T7/73. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 19, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page: citations found in our corpus, or other publications sharing the same primary CPC.