Generative nonlinear human shape models

US12249030B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12249030-B2
Application number: US-202017922160-A
Country: US
Kind code: B2
Filing date: Apr 30, 2020
Priority date: Apr 30, 2020
Publication date: Mar 11, 2025
Grant date: Mar 11, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure provides a statistical, articulated 3D human shape modeling pipeline within a fully trainable, modular, deep learning framework. In particular, aspects of the present disclosure are directed to a machine-learned 3D human shape model with at least facial and body shape components that are jointly trained end-to-end on a set of training data. Joint training of the model components (e.g., including both facial, hands, and rest of body components) enables improved consistency of synthesis between the generated face and body shapes.
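
To make the modular pipeline more concrete, here is a minimal NumPy sketch of how decoded components could be combined into a posed mesh. It is an illustration only, not the patented implementation: the function names, the additive combination of offsets, and the toy data are all assumptions. The use of linear blend skinning follows the claims, which mention a linear blend skinning model with per-joint weights.

```python
import numpy as np

def linear_blend_skinning(rest_vertices, skin_weights, joint_transforms):
    """Pose a mesh by blending per-joint transforms for every vertex.

    rest_vertices:    (N, 3) vertices in the rest pose
    skin_weights:     (N, J) per-vertex weights, each row summing to 1
    joint_transforms: (J, 4, 4) homogeneous transform for each joint
    """
    n = rest_vertices.shape[0]
    homogeneous = np.concatenate([rest_vertices, np.ones((n, 1))], axis=1)
    # Weighted sum of joint transforms per vertex: (N, 4, 4).
    blended = np.einsum("nj,jab->nab", skin_weights, joint_transforms)
    # Apply each vertex's blended transform to that vertex.
    posed = np.einsum("nab,nb->na", blended, homogeneous)
    return posed[:, :3]

def compose_posed_mesh(rest_shape, face_offsets, pose_offsets,
                       skin_weights, joint_transforms):
    """Hypothetical composition: add offsets to the rest shape, then skin.

    The additive combination is an assumption made for this sketch.
    """
    unposed = rest_shape + face_offsets + pose_offsets
    return linear_blend_skinning(unposed, skin_weights, joint_transforms)

# Toy data: three vertices and two joints (identity, and a shift along z).
rest_shape = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
zeros = np.zeros_like(rest_shape)
skin_weights = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
shift_z = np.eye(4)
shift_z[2, 3] = 1.0
joint_transforms = np.stack([np.eye(4), shift_z])

posed = compose_posed_mesh(rest_shape, zeros, zeros, skin_weights, joint_transforms)
print(posed)
```

In a trained model the rest shape, facial offsets, pose-dependent offsets, and skinning weights would come from learned components rather than fixed arrays. Because all of them feed one posed mesh, a single loss on that mesh can update every component together.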

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method to jointly train a machine-learned three-dimensional human shape model in an end-to-end pipeline, the method comprising, for one or more training iterations: obtaining, by a computing system comprising one or more computing devices, one or more ground truth registered shape scans of a training body, wherein the one or more ground truth registered shape scans of the training body comprise at least a ground truth registered full body scan with an arbitrary pose and a ground truth registered facial detail scan, and wherein the ground truth registered full body scan and the ground truth registered facial detail scan are separate from one another; encoding, by the computing system using a shape encoder model, an estimated registered full body scan with a resting pose to obtain a rest shape embedding associated with the training body; decoding, by the computing system using a shape decoder model, the rest shape embedding to obtain identity-based rest shape data for the training body; encoding, by the computing system using a facial encoder model, data derived from the ground truth registered facial detail scan to obtain a facial expression embedding associated with the training body; decoding, by the computing system using a facial decoder model, the facial expression embedding to obtain facial expression data for the training body; generating, by the computing system, a training posed mesh for the training body based at least in part on the identity-based rest shape data, the facial expression data, and a set of pose parameters that correspond to the arbitrary pose; evaluating a reconstructive loss function that compares the training posed mesh generated for the training body with the ground truth registered full body scan with the arbitrary pose and the ground truth registered facial detail scan, wherein the reconstructive loss function comprises a filter that indicates which vertices of the training posed mesh are compared to the ground truth registered full body scan with the arbitrary pose and which vertices of the training posed mesh are compared to the ground truth registered facial detail scan which is separate from the ground truth registered full body scan with the arbitrary pose; jointly training the shape encoder model, the shape decoder model, the facial encoder model, and the facial decoder model based at least in part on the reconstructive loss; and providing the machine-learned three-dimensional human shape model comprising at least the shape decoder model and the facial decoder model.

2. The computer-implemented method of claim 1, wherein: generating, by the computing system, the training posed mesh for the training body comprises processing, by the computing system using a pose space deformation model, the set of pose parameters to generate pose-dependent shape adjustments for the training body; and the pose space deformation model is jointly trained with the shape encoder model, the shape decoder model, the facial encoder model, and the facial decoder model based at least in part on the reconstructive loss.

3. The computer-implemented method of claim 2, wherein: generating, by the computing system, the training posed mesh for the training body comprises: processing, by the computing system using a joint centers prediction model, the identity-based rest shape data to generate a plurality of predicted joint centers for a plurality of joints of a skeleton representation of the training body; and processing, by the computing system using a blend skinning model, the facial expression data, the pose-dependent shape adjustments, the identity-based rest shape data, and the one or more predicted joint centers to generate the training posed mesh for the training body; and the joint centers prediction model and the blend skinning model are jointly trained with the shape encoder model, the shape decoder model, the facial encoder model, the facial decoder model, and the pose space deformation model based at least in part on the reconstructive loss.

4. The computer-implemented method of claim 3, wherein the blend skinning model comprises a linear blend skinning model that has a plurality of learned weights respectively for the plurality of joints.

5. The computer-implemented method of claim 1, wherein the one or more ground truth registered shape scans of the training body further comprise a ground truth registered hand detail scan, and wherein the reconstructive loss function evaluates a difference between the training posed mesh and the ground truth registered hand detail scan.

6. The computer-implemented method of claim 1, wherein said jointly training comprises alternating between (1) estimation of the set of pose parameters and (2) updating parameters of the shape encoder model, the shape decoder model, the facial encoder model, and the facial decoder model with the set of pose parameters fixed.

7. The computer-implemented method of claim 1, wherein the reconstructive loss function evaluates a per-vertex Euclidean distance error with one to one correspondences.

8. A computing system featuring a machine-learned three-dimensional human shape model with at least facial and body shape components jointly trained in an end-to-end pipeline, the computing system comprising: one or more processors; and one or more non-transitory computer-readable media that collectively store a machine-learned three-dimensional human shape model comprising: a machine-learned facial expression decoder model trained to process a facial expression embedding associated with a human body to generate facial expression data for the human body; a machine-learned pose space deformation model trained to process a set of pose parameters to generate pose-dependent shape adjustments for the human body; and a machine-learned shape decoder model trained to process a rest shape embedding associated with the human body to generate identity-based rest shape data for the human body; wherein the machine-learned three-dimensional human shape model has been trained to generate a posed mesh for the human body based at least in part on the facial expression data, the pose-dependent shape adjustments, and the identity-based rest shape data; wherein all of the machine-learned facial expression decoder model, the machine-learned pose space deformation model, and the machine-learned shape decoder model have been jointly trained end-to-end based at least in part on a reconstructive loss function that compares a training posed mesh generated by the machine-learned three-dimensional human shape model for a training body with one or more ground truth registered shape scans of the training body; wherein the one or more ground truth registered shape scans of the training body comprise a ground truth registered full body scan and a ground truth registered facial detail scan that is separate from the ground truth registered full body scan; and wherein the reconstructive loss function comprises a filter that indicates which vertices of the training posed mesh are compared to the ground truth registered full body scan with the arbitrary pose and which vertices of the training posed mesh are compared to the ground truth registered facial detail scan which is separate from the ground truth registered full body scan with the arbitrary pose.

9. The computing system of claim 8, wherein the machine-learned three-dimensional human shape model further comprises: a machine-learned joint centers prediction model trained to process the identity-based rest shape data to generate a plurality of predicted joint centers for a plurality of joints of a skeleton representation of the human body; and a machine-learned blend skinning
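
The filtered reconstructive loss recited in claims 1 and 7 can be illustrated with a short NumPy sketch. This is one possible reading, not the patent's implementation. It assumes the filter is a boolean per-vertex mask, and that both scans are registered to the model's mesh topology so that every vertex has a one-to-one correspondence. The function name and the toy values are hypothetical.

```python
import numpy as np

def filtered_reconstruction_loss(posed_mesh, body_scan, face_scan, face_mask):
    """Mean per-vertex Euclidean error with a filter choosing the target scan.

    posed_mesh: (N, 3) vertices generated by the model
    body_scan:  (N, 3) registered full body scan (same vertex ordering)
    face_scan:  (N, 3) registered facial detail scan; only rows where
                face_mask is True are used (assumption of this sketch)
    face_mask:  (N,) boolean filter, True where a vertex is compared to
                the facial detail scan instead of the full body scan
    """
    body_error = np.linalg.norm(posed_mesh - body_scan, axis=1)
    face_error = np.linalg.norm(posed_mesh - face_scan, axis=1)
    # The filter selects, per vertex, which ground truth scan is used.
    per_vertex = np.where(face_mask, face_error, body_error)
    return per_vertex.mean()

# Toy example: four vertices, the first one belongs to the face region.
posed_mesh = np.zeros((4, 3))
body_scan = np.tile([3.0, 4.0, 0.0], (4, 1))  # each vertex is 5 units away
face_scan = np.zeros((4, 3))                  # the face vertex matches exactly
face_mask = np.array([True, False, False, False])

loss = filtered_reconstruction_loss(posed_mesh, body_scan, face_scan, face_mask)
print(loss)
```

In joint training as described in the claims, a loss of this kind would be backpropagated through all encoder and decoder models at once. An automatic differentiation framework would be needed for that step, which this sketch leaves out.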

Assignees

Google LLC

Inventors

Classifications

  • Shape modification · CPC title

  • Editing of three-dimensional [3D] images, e.g. changing shapes or colours, aligning objects or positioning parts · CPC title

  • Learning methods · CPC title

  • Auto-encoder networks; Encoder-decoder networks · CPC title

  • Training; Learning · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12249030B2 cover?
The present disclosure provides a statistical, articulated 3D human shape modeling pipeline within a fully trainable, modular, deep learning framework. In particular, aspects of the present disclosure are directed to a machine-learned 3D human shape model with at least facial and body shape components that are jointly trained end-to-end on a set of training data. Joint training of the model com…
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06T17/20. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 11, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).