3D hand shape and pose estimation

US11468636B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11468636-B2
Application number	US-202117222176-A
Country	US
Kind code	B2
Filing date	Apr 5, 2021
Priority date	Dec 5, 2018
Publication date	Oct 11, 2022
Grant date	Oct 11, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Aspects of the present disclosure involve a system comprising a computer-readable storage medium storing a program and a method for receiving a monocular image that includes a depiction of a hand and extracting features of the monocular image using a plurality of machine learning techniques. The program and method further include modeling, based on the extracted features, a pose of the hand depicted in the monocular image by adjusting skeletal joint positions of a three-dimensional (3D) hand mesh using a trained graph convolutional neural network (CNN); modeling, based on the extracted features, a shape of the hand in the monocular image by adjusting blend shape values of the 3D hand mesh representing surface features of the hand depicted in the monocular image using the trained graph CNN; and generating, for display, the 3D hand mesh adjusted to model the pose and shape of the hand depicted in the monocular image.
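The abstract says a trained graph CNN adjusts the 3D hand mesh. As a rough illustration of how a graph convolution passes information along mesh edges, here is a minimal sketch of one generic mean-aggregation step. It is not the network described in the patent: `graph_conv_step`, the fixed `self_weight`, and the toy triangle mesh are all assumptions made for illustration, and a trained network would use learned weight matrices instead of a single scalar.

```python
# Illustrative sketch only: a generic mean-aggregation graph convolution step,
# not the trained graph CNN described in the patent.
from typing import Dict, List

Vec3 = List[float]


def graph_conv_step(
    vertices: List[Vec3],
    neighbors: Dict[int, List[int]],
    self_weight: float = 0.5,
) -> List[Vec3]:
    """Blend each vertex with the mean of its mesh neighbors.

    `self_weight` is a hypothetical fixed scalar standing in for the
    learned weights a trained network would apply.
    """
    updated: List[Vec3] = []
    for i, v in enumerate(vertices):
        nbrs = neighbors.get(i, [])
        if not nbrs:
            # Isolated vertex: nothing to aggregate, keep it unchanged.
            updated.append(list(v))
            continue
        # Mean position of the neighboring vertices, per coordinate.
        mean = [sum(vertices[j][k] for j in nbrs) / len(nbrs) for k in range(3)]
        updated.append(
            [self_weight * v[k] + (1.0 - self_weight) * mean[k] for k in range(3)]
        )
    return updated


# Toy "mesh": one triangle whose three vertices are mutually connected.
triangle = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
smoothed = graph_conv_step(triangle, adjacency)
```

Stacking several such steps lets a change at one vertex influence vertices several edges away, which is the general reason graph convolutions suit mesh data.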

First claim

Opening claim text (preview).

What is claimed is:
1. A method comprising: obtaining a first plurality of images that include respective representations of a hand; training a first machine learning technique based on a first feature of the first plurality of images; training a second machine learning technique based on a second feature of the first plurality of images separately from the first machine learning technique; training the first and second machine learning techniques together with a graph convolutional neural network (CNN) based on the first plurality of images; and based on the first and second machine learning techniques, continuously changing an appearance of a three-dimensional (3D) hand mesh by continuously capturing new monocular images of the hand in different positions, wherein the appearance of the 3D hand mesh changes to resemble the different positions of the hand as the hand changes from one position to another position.
2. The method of claim 1, each representation of the hand comprising a synthetic representation of the hand, the synthetic representation of the hand comprising a graphical representation of the hand, further comprising: obtaining a second plurality of images that include real-world depictions of a hand and reference 3D depth maps; and generating a pseudo-ground truth mesh of each of the real-world depictions of the hand using the graph CNN that has been trained.
3. The method of claim 2, further comprising training the first and second machine learning techniques together with the graph CNN based on the pseudo-ground truth mesh of each of the real-world depictions of the hand, the second plurality of images of the real-world depictions of a hand and reference 3D depth maps.
4. The method of claim 1, further comprising: receiving a given monocular image that includes a depiction of a hand; modeling a pose of the hand depicted in the given monocular image by adjusting skeletal joint positions of the 3D hand mesh using the graph CNN, the graph CNN estimating 3D coordinates of vertices in the 3D hand mesh.
5. The method of claim 4, further comprising: linearly regressing the joint positions using a linear graph CNN; and generating, for display, the 3D hand mesh adjusted to model the pose of the hand depicted in the given monocular image.
6. The method of claim 1, further comprising: applying the first machine learning technique to a given monocular image to estimate a two-dimensional (2D) heat map of the hand in the given monocular image and to generate an image feature map; and encoding the 2D heat map and the image feature map using the second machine learning technique to generate a feature vector.
7. The method of claim 1, wherein the first machine learning technique comprises a stacked hourglass network, and wherein the second machine learning technique comprises a residual network.
8. The method of claim 1 further comprising: modeling based on one or more extracted features of a given monocular image, a shape of the hand in the given monocular image by adjusting blend shape values of the 3D hand mesh representing surface features of the hand depicted in the given monocular image using the graph CNN.
9. The method of claim 1, further comprising generating an image of the first plurality of images by: generating a 3D hand model by combining a plurality of hand joints with a plurality of surface textures; and combining the generated hand model with a background image.
10. The method of claim 9, further comprising: randomly selecting a hand pose from a plurality of hand poses; adjusting the plurality of hand joints based on the selected hand pose; and adjusting the plurality of surface textures by applying random weights to blend shapes and ratios.
11. The method of claim 9, wherein generating the 3D hand model comprises: obtaining a 3D hand model that includes a first level of coarseness having a first number of vertices; applying the graph CNN to the first level of coarseness; upsampling the 3D hand model to increase the level of coarseness to a second level of coarseness having a second number of vertices greater than the first number of vertices; generating a tree structure representing correspondences of vertices in the first and second levels of coarseness; and updating the graph CNN based on the upsampled 3D hand model and the generated tree structure.
12. The method of claim 1, further comprising training the first machine learning technique based on a heat map loss function and training the second machine learning technique based on a 3D pose loss function, and wherein training the first and second machine learning techniques together with the graph CNN comprises training the first and second machine learning techniques together based on the heat map loss function, the 3D pose loss function, and a mesh loss function.
13. The method of claim 1, further comprising: receiving a second plurality of images that include real-world depictions of a hand and reference 3D depth maps of the real-world depictions of the hand captured using a depth camera; generating a pseudo-ground truth mesh of the real-world depictions of the hand using the graph CNN; and training the first and second machine learning techniques and the graph CNN based on the generated pseudo-ground truth mesh, the real-world depictions of the hand, and the reference 3D depth maps of the real-world depictions of the hand.
14. A system comprising: a processor configured to perform operations comprising: obtaining a first plurality of images that include respective representations of a hand; training a first machine learning technique based on a first feature of the first plurality of images; training a second machine learning technique based on a second feature of the first plurality of images separately from the first machine learning technique; training the first and second machine learning techniques together with a graph convolutional neural network (CNN) based on the first plurality of images; and based on the first and second machine learning techniques, continuously changing an appearance of a three-dimensional (3D) hand mesh by continuously capturing new monocular images of the hand in different positions, wherein the appearance of the 3D hand mesh changes to resemble the different positions of the hand as the hand changes from one position to another position.
15. The system of claim 14, the operations further comprising modeling based on one or more extracted features of a given monocular image, a shape of the hand in the given monocular image by adjusting blend shape values of a 3D hand mesh representing surface features of the hand depicted in the given monocular image using the graph CNN.
16. The system of claim 14, wherein the operations further comprise: obtaining a second plurality of images that include real-world depictions of a hand and reference 3D depth maps; and generating a pseudo-ground truth mesh of each of the real-world depictions of the hand using the graph CNN that has been trained.
17. The system of claim 14, wherein the first machine learning technique comprises a stacked hourglass network, and wherein the second machine learning technique comprises a residual network.
18. A non-transitory machine-readable storage medium that includes instructions that, when executed by one or more processors of a machine, cause the machine to perform operations comprising: obtaining a first plurality of images that include respective representations of a hand; training a first machine learning techniq
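Claim 12 describes joint training driven by three loss terms: a heat map loss, a 3D pose loss, and a mesh loss. A minimal sketch of how such terms could be combined follows. The use of mean squared error for every term, the equal default weights, and the names `mse` and `joint_loss` are assumptions for illustration; the claim text shown here does not define the individual loss functions or how they are weighted.

```python
# Illustrative sketch only: the loss definitions and weights are assumptions,
# not the formulas used in the patent.
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length flat sequences."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def joint_loss(
    heat_pred: Sequence[float], heat_gt: Sequence[float],
    pose_pred: Sequence[float], pose_gt: Sequence[float],
    mesh_pred: Sequence[float], mesh_gt: Sequence[float],
    w_heat: float = 1.0, w_pose: float = 1.0, w_mesh: float = 1.0,
) -> float:
    """Weighted sum of heat map, 3D pose, and mesh terms (hypothetical weights)."""
    return (
        w_heat * mse(heat_pred, heat_gt)
        + w_pose * mse(pose_pred, pose_gt)
        + w_mesh * mse(mesh_pred, mesh_gt)
    )


# Tiny flattened examples: 2 heat map pixels, 1 joint (x, y, z), 2 mesh coordinates.
heat_term = mse([0.0, 1.0], [0.0, 0.0])            # separate stage, first technique
pose_term = mse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])  # separate stage, second technique
total = joint_loss(
    [0.0, 1.0], [0.0, 0.0],
    [1.0, 2.0, 3.0], [1.0, 2.0, 3.0],
    [0.0, 0.0], [2.0, 0.0],
)
```

Under this reading, the separate stages of claim 1 would each minimize a single term, and the joint stage would minimize the weighted sum.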

Assignees

Snap Inc

Classifications

G06T7/50 · Primary
Depth or shape recovery · CPC title
G06V40/107
Static hand or arm · CPC title
G06V10/774
Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
G06V10/82
using neural networks · CPC title
G06T17/20 · Primary
Finite element generation, e.g. wire-frame surface description, {tesselation} · CPC title
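The codes above follow the standard CPC symbol layout: one section letter, a two-digit class, a subclass letter, then a main group and a subgroup separated by a slash. Section G is titled Physics, which is why this page maps the patent to that technology area. A small sketch of splitting a symbol into those parts is shown below; the `CpcSymbol` type, the `parse_cpc` helper, and the regular expression are illustrative assumptions, not part of any official tool.

```python
# Illustrative sketch only: a hypothetical helper for splitting CPC symbols.
import re
from typing import NamedTuple

# Titles of CPC sections A to H.
CPC_SECTIONS = {
    "A": "Human Necessities",
    "B": "Performing Operations; Transporting",
    "C": "Chemistry; Metallurgy",
    "D": "Textiles; Paper",
    "E": "Fixed Constructions",
    "F": "Mechanical Engineering; Lighting; Heating; Weapons; Blasting",
    "G": "Physics",
    "H": "Electricity",
}

_CPC_PATTERN = re.compile(r"^([A-HY])(\d{2})([A-Z])(\d{1,4})/(\d{2,6})$")


class CpcSymbol(NamedTuple):
    section: str
    cpc_class: str
    subclass: str
    main_group: str
    subgroup: str


def parse_cpc(symbol: str) -> CpcSymbol:
    """Split a symbol such as 'G06T7/50' into its five parts."""
    match = _CPC_PATTERN.match(symbol.strip())
    if match is None:
        raise ValueError(f"not a recognizable CPC symbol: {symbol!r}")
    return CpcSymbol(*match.groups())


primary = parse_cpc("G06T7/50")
area = CPC_SECTIONS.get(primary.section, "unknown")
```

Here `primary.section` is `"G"`, so `area` resolves to `"Physics"`; the same helper splits `G06V40/107` into main group `40` and subgroup `107`.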

Patent family

Related publications grouped by family.

View patent family 70972046

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11468636B2 cover? Aspects of the present disclosure involve a system comprising a computer-readable storage medium storing a program and a method for receiving a monocular image that includes a depiction of a hand and extracting features of the monocular image using a plurality of machine learning techniques. The program and method further include modeling, based on the extracted features, a pose of the hand dep…
Who is the assignee on this patent? Snap Inc
What technology area does this patent fall under? Primary CPC classification G06T7/50. Mapped technology areas include Physics.
When was this patent published? Publication date Oct 11, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb? We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).