Generating animated three-dimensional models from captured images

US11068698B2 · US · B2

Patent metadata
Publication number: US-11068698-B2
Application number: US-201916586758-A
Country: US
Kind code: B2
Filing date: Sep 27, 2019
Priority date: Dec 7, 2017
Publication date: Jul 20, 2021
Grant date: Jul 20, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A three-dimensional model (e.g., motion capture model) of a user is generated from captured images or captured video of the user. A machine learning network may track poses and expressions of the user to generate and refine the three-dimensional model. Refinement of the three-dimensional model may provide more accurate tracking of the user's face. Refining of the three-dimensional model may include refining the determinations of poses and expressions at defined locations (e.g., eye corners and/or nose) in the three-dimensional model. The refining may occur in an iterative process. Tracking of the three-dimensional model over time (e.g., during video capture) may be used to generate an animated three-dimensional model (e.g., an animated puppet) of the user that simulates the user's poses and expressions.
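The abstract says the model is refined iteratively at defined locations such as the eye corners and nose. The toy sketch below illustrates that idea as a cascaded-regression-style loop. It is not Apple's implementation, and it rests on loud assumptions. The "model" is only a 2D translation applied to a hypothetical fixed template of landmark offsets. The "feature" at each location is simply the displacement between the observed landmark and the model's landmark. The "regressor" is a hypothetical fixed gain. Muscle activations are omitted entirely.

```python
from dataclasses import dataclass

# Hypothetical template: 2D offsets of a few locations of interest
# (left eye corner, right eye corner, nose tip) relative to the face centre.
TEMPLATE = [(-30.0, -20.0), (30.0, -20.0), (0.0, 10.0)]


@dataclass
class Pose:
    tx: float  # horizontal translation of the face centre, in pixels
    ty: float  # vertical translation of the face centre, in pixels


def place(pose):
    """Place the template landmarks in the image for a given pose."""
    return [(pose.tx + dx, pose.ty + dy) for dx, dy in TEMPLATE]


def local_features(pose, observed):
    """Toy stand-in for per-location feature vectors: the displacement
    between each observed landmark and the model's landmark."""
    return [(ox - px, oy - py)
            for (px, py), (ox, oy) in zip(place(pose), observed)]


def refine(pose, observed, iterations=3, gain=0.5):
    """Repeat a fixed number of refinement steps.
    The update rule (fixed gain times mean displacement) is a hypothetical
    placeholder for a learned regressor."""
    for _ in range(iterations):
        feats = local_features(pose, observed)
        mean_dx = sum(f[0] for f in feats) / len(feats)
        mean_dy = sum(f[1] for f in feats) / len(feats)
        pose = Pose(pose.tx + gain * mean_dx, pose.ty + gain * mean_dy)
    return pose


observed = place(Pose(100.0, 80.0))        # landmarks of the "true" face
estimate = refine(Pose(92.0, 88.0), observed)
print(estimate)                            # Pose(tx=99.0, ty=81.0)
```

With a gain of 0.5, each pass halves the remaining error, so three passes move the estimate from (92, 88) to (99, 81). In the method the abstract describes, the update would instead come from a trained network and would also adjust expressions.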

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: obtaining at least one image of a face of a user using a camera located on a device, the device comprising a computer processor, a memory, and a display; generating one or more first feature vectors from the at least one image, wherein the first feature vectors represent one or more facial features of the face in the at least one image; determining a pose of the face of the user and one or more muscle activations of the face in the at least one image based on the first feature vectors; generating a three-dimensional model of the user's face based on the pose and muscle activations of the face determined from the first feature vectors; defining one or more localized locations of interest on the three-dimensional model of the user's face; for each of the one or more localized locations of interest, generating one or more second feature vectors from the at least one image, wherein the second feature vectors are generated at locations in the at least one image that correspond to the localized locations of interest on the three-dimensional model of the user's face based on a projection of the three-dimensional model onto the at least one image; and refining, at least once, the generated three-dimensional model of the user's face by refining pose and muscle activations for the face using the second feature vectors.

2. The method of claim 1, further comprising displaying a representation of the three-dimensional model on the display.

3. The method of claim 1, further comprising: generating and refining three-dimensional models for a plurality of images of the face of the user obtained using the camera; generating an animated three-dimensional model of the user's face based on the refined three-dimensional models generated for the plurality of images; and displaying a representation of the animated three-dimensional model on the display.

4. The method of claim 3, wherein displaying the representation of the animated three-dimensional model on the display includes displaying a simulation of motion of the user's face in the plurality of images.

5. The method of claim 1, wherein determining the pose and muscle activations comprises performing regression on the feature vectors.

6. The method of claim 1, wherein the three-dimensional model is projected onto the at least one image based on parameters of the camera.

7. The method of claim 1, wherein the refining of the pose and muscle activations for the face using the second feature vectors is repeated a selected number of times.

8. The method of claim 1, further comprising: assessing a registration loss in the at least one image; determining one or more identity parameters for the face in the at least one image, wherein the identity parameters minimize the assessed registration loss; and wherein the three-dimensional model of the face is generated based on the determined pose and muscle activations from the first feature vectors in combination with the determined identity parameters.

9. The method of claim 8, further comprising refining the pose and muscle activations of the face determined from the first feature vectors by backpropagating the registration loss into the three-dimensional model.

10. A device, comprising: a camera; a display; and circuitry coupled to the camera and the display, wherein the circuitry is configured to: obtain a plurality of images of a face of a user using the camera; generate one or more first feature vectors from at least one image in the plurality of obtained images, wherein the first feature vectors represent one or more facial features of the face in the at least one image; determine a pose of the face of the user and one or more muscle activations of the face in the at least one image based on the first feature vectors; generate a three-dimensional model of the user's face based on the pose and muscle activations of the face determined from the first feature vectors; define one or more localized locations of interest on the three-dimensional model of the user's face; for each of the one or more localized locations of interest, generate one or more second feature vectors from the at least one image, wherein the second feature vectors are generated at locations in the at least one image that correspond to the localized locations of interest on the three-dimensional model of the user's face based on a projection of the three-dimensional model onto the at least one image; refine, at least once, the generated three-dimensional model of the user's face by refining pose and muscle activations for the face using the second feature vectors; and display a representation of the three-dimensional model on the display.

11. The device of claim 10, wherein the circuitry is configured to: generate and refine a three-dimensional model for at least one additional image in the plurality of obtained images; generate an animated three-dimensional model of the face of the user based on the refined three-dimensional models generated for the at least one image and the at least one additional image; and display a representation of the animated three-dimensional model on the display.

12. The device of claim 11, wherein the representation of the animated three-dimensional model displayed on the display includes a simulation of poses and facial movements of the user's face in the at least one image and the at least one additional image.

13. The device of claim 11, wherein the representation of the animated three-dimensional model displayed on the display includes an animated puppet generated from the animated three-dimensional model of the user's face.

14. The device of claim 10, wherein the circuitry is configured to obtain the plurality of images of the face of the user by capturing images from a video of the user's face obtained using the camera.

15. The device of claim 10, wherein the circuitry is configured to display the representation of the three-dimensional model on the display in response to an input by the user on the device.

16. A non-transitory computer-readable medium having instructions stored thereon that are executable by a computing device to perform operations, comprising: obtaining at least one image of a face of a user using a camera located on the computing device; generating one or more first feature vectors from the at least one image, wherein the first feature vectors represent one or more facial features of the face in the at least one image; determining a pose of the face of the user and one or more muscle activations of the face in the at least one image based on the first feature vectors; generating a three-dimensional model of the user's face based on the pose and muscle activations of the face determined from the first feature vectors; defining one or more localized locations of interest on the three-dimensional model of the user's face; for each of the one or more localized locations of interest, generating one or more second feature vectors from the at least one image, wherein the second feature vectors are generated at locations in the at least one image that correspond to the localized locations of interest on the three-dimensional model of the user's face based on a projection of the three-dimensional model onto the at least one image; and refining, at least once, the generated three-dimensional model of the user's face by refining pose and muscle activations for the face using the second feature vectors.

17. The non-transitory computer-readable medium of claim 16, further comprising displaying
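Claims 1 and 6 describe sampling the second feature vectors at image locations obtained by projecting the three-dimensional model onto the image using the camera's parameters. The sketch below shows one way such a step could look, under assumptions the patent does not state. It uses a standard pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`. It assumes the model's locations of interest are already expressed in camera coordinates. Its hypothetical "feature vector" is just the flattened grayscale patch around each projected point, where a real system would use a learned feature extractor.

```python
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def project(points: Sequence[Point3], fx: float, fy: float,
            cx: float, cy: float) -> List[Tuple[float, float]]:
    """Pinhole projection of camera-space 3D points to pixel coordinates."""
    out = []
    for x, y, z in points:
        if z <= 0:
            raise ValueError("point is behind the camera")
        out.append((fx * x / z + cx, fy * y / z + cy))
    return out


def sample_patch(image: List[List[float]], u: float, v: float,
                 radius: int = 1) -> List[float]:
    """Flatten the (2*radius+1)^2 patch centred on the rounded pixel (u, v),
    clamping coordinates at the image border. Uses Python's round(),
    which rounds exact halves to the nearest even integer."""
    h, w = len(image), len(image[0])
    cu, cv = int(round(u)), int(round(v))
    patch = []
    for dv in range(-radius, radius + 1):
        for du in range(-radius, radius + 1):
            row = min(max(cv + dv, 0), h - 1)
            col = min(max(cu + du, 0), w - 1)
            patch.append(image[row][col])
    return patch


def local_feature_vectors(image, landmarks_3d, fx, fy, cx, cy, radius=1):
    """One toy feature vector per localized location of interest."""
    return [sample_patch(image, u, v, radius)
            for u, v in project(landmarks_3d, fx, fy, cx, cy)]


# Synthetic 8x8 image whose pixel value encodes its position: 10*row + col.
image = [[10 * r + c for c in range(8)] for r in range(8)]
print(local_feature_vectors(image, [(0.0, 0.0, 1.0)], 8, 8, 4, 3))
```

In the claimed method, vectors like these would feed the regression that updates pose and muscle activations. Re-projecting after each update moves the sampling locations along with the model, which is what lets repeated refinement (claim 7) focus on the current locations of interest.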

Assignees

Apple Inc

Classifications

  • G06T17/20 · Primary

    Finite element generation, e.g. wire-frame surface description, {tesselation} · CPC title

  • G06V40/171 · Primary

    Local features and components; Facial parts (eye characteristics G06V40/18); Occluding parts, e.g. glasses; Geometrical relationships · CPC title

  • using comparisons between temporally consecutive images · CPC title

  • Dynamic expression · CPC title

  • Physics · mapped topic


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11068698B2 cover?
A three-dimensional model (e.g., motion capture model) of a user is generated from captured images or captured video of the user. A machine learning network may track poses and expressions of the user to generate and refine the three-dimensional model. Refinement of the three-dimensional model may provide more accurate tracking of the user's face. Refining of the three-dimensional model may inc…
Who is the assignee on this patent?
Apple Inc
What technology area does this patent fall under?
The primary CPC classification is G06T17/20. Mapped technology areas include Physics.
When was this patent published?
The B2 document was published on Jul 20, 2021. Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).