Method and system for implementing three-dimensional facial modeling and visual speech synthesis

US11145100B2 · US · B2

Patent metadata
Publication number: US-11145100-B2
Application number: US-201816477591-A
Country: US
Kind code: B2
Filing date: Jan 12, 2018
Priority date: Jan 12, 2017
Publication date: Oct 12, 2021
Grant date: Oct 12, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Novel tools and techniques are provided for implementing three-dimensional facial modeling and visual speech synthesis. In various embodiments, a computing system might determine an orientation, size, and location of a face in a received input image; retrieve a three-dimensional model template comprising a face and head; project the input image onto the model template to generate a three-dimensional model; define, on the model, a polygon mesh in a region of facial feature corresponding to feature in the input image; adjust parameters on the model; and display the model. The computing system might parse a text string into allophonic units; encode each allophonic unit into a point(s) in linguistic space corresponding to mouth movements; retrieve, from a codebook, indexed images/morphs corresponding to encoded points in the linguistic space; render the indexed images/morphs into an animation of the three-dimensional model; synchronize, for output, the animation with audio representations of the text string.
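The speech-synthesis half of the abstract (parse text into units, encode each unit as a point in linguistic space, look up a morph in a codebook, then lay the morphs on a timeline that audio can share) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and does not come from the patent: the `CODEBOOK` entries, the two-dimensional coordinates, the morph names, the one-letter-per-unit parser, and the fixed 80 ms duration per unit.

```python
from dataclasses import dataclass

# Hypothetical toy codebook: unit label -> (point in an illustrative
# 2-D "linguistic space", name of a mouth morph). Not from the patent.
CODEBOOK = {
    "p": ((0.0, 0.0), "lips_closed"),
    "m": ((0.0, 0.1), "lips_closed"),
    "a": ((1.0, 0.2), "jaw_open"),
    "o": ((0.7, 0.9), "lips_rounded"),
}


@dataclass
class Frame:
    unit: str        # the parsed unit
    point: tuple     # its encoded point in the toy linguistic space
    morph: str       # the indexed morph retrieved from the codebook
    start_ms: int    # start time on the shared audio/animation timeline
    end_ms: int      # end time on the shared timeline


def parse_units(text):
    """Toy stand-in for allophonic parsing: one unit per known letter."""
    return [ch for ch in text.lower() if ch in CODEBOOK]


def build_animation(text, unit_ms=80):
    """Encode each unit, look up its morph, and assign a time slot."""
    frames = []
    for i, unit in enumerate(parse_units(text)):
        point, morph = CODEBOOK[unit]
        frames.append(Frame(unit, point, morph, i * unit_ms, (i + 1) * unit_ms))
    return frames


frames = build_animation("Mama pop")
for f in frames:
    print(f.start_ms, f.end_ms, f.unit, f.morph)
```

In this sketch, synchronization is reduced to giving every frame a start and end time; a real system would presumably take those durations from the audio rather than from a constant.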

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: receiving, with a computing system, an input image comprising a face; determining, with the computing system, an orientation, a size, and a location of the face in the input image; retrieving, with the computing system, a three-dimensional model template comprising a face and a head; projecting, with the computing system, the input image onto the three-dimensional model template comprising the face and the head to generate a three-dimensional model corresponding to the input image; defining, with the computing system and on the three-dimensional model, a polygon mesh in a region of at least one facial feature; adjusting, with the computing system, parameters on the three-dimensional model, the region of the at least one facial feature corresponding to at least one facial feature in the input image, wherein scaling the three-dimensional model comprises: defining, with the computing system, a first box to frame the face in the input image; defining, with the computing system, a second box to frame the face of the three-dimensional model template; and scaling, with the computing system, the second box of the three-dimensional model template by performing at least one of scaling to fit the first box of the input image or scaling so that the second box is centered on the first box in the input image; and displaying, with the computing system, the three-dimensional model with the face of the input image projected onto the three-dimensional model.

2. The method of claim 1, wherein the computing system comprises at least one of a client computer, a host computer, a user device, a server computer over a network, a cloud-based computing system, or a distributed computing system.

3. The method of claim 1, wherein the input image is captured with an image sensor of a device.

4. The method of claim 1, wherein the input image is at least one of a photograph or a drawing.

5. The method of claim 1, wherein rotating the three-dimensional model comprises: determining, with the computing system, an eye alignment on the face of the input image; and rotating, with the computing system, the three-dimensional model template to align eyes of the three-dimensional model with the eyes of the input image.

6. The method of claim 1, wherein the three-dimensional model template comprises at least one facial feature, the method further comprising: determining, with the computing system, at least one facial feature on the face of the input image; determining, with the computing system, an orientation, a size, and a location of the at least one facial feature on the face in the input image; rotating, with the computing system, the three-dimensional model template to orient the at least one facial feature to the corresponding at least one facial feature in the input image; scaling, with the computing system, the three-dimensional model template to match the size of the at least one facial feature to the corresponding at least one facial feature in the input image; translating, with the computing system, the three-dimensional model template to match the location of the at least one facial feature to the corresponding at least one facial feature in the input image; and projecting, with the computing system, the at least one facial feature in the input image onto the corresponding at least one facial feature of the three-dimensional model template to represent the at least one facial feature on the three-dimensional model.

7. The method of claim 6, wherein the at least one facial feature of the input image comprises at least one of an eye, lip, eyebrow, nose, cheek, ear, forehead, chin, or neck, and wherein the at least one facial feature of the three-dimensional model comprises at least one of an eye, lip, eyebrow, nose, cheek, ear, forehead, chin, or neck.

8. The method of claim 1, further comprising: determining, with the computing system, a perspective of an input image; and applying, with the computing system, a perspective deformation to the three-dimensional model template.

9. The method of claim 1, wherein the display of the three-dimensional model is capable of being rotated in any direction.

10. The method of claim 1, further comprising: rotating, with the computing system, the three-dimensional model template comprising the face and the head to match the orientation of the face in the input image; scaling, with the computing system, the three-dimensional model template comprising the face and the head to match the size of the face in the input image; and translating, with the computing system, the three-dimensional model template comprising the face and the head to match the location of the face in the input image.

11. A device, comprising: a display; one or more processors in communication with an image sensor, an accelerometer, and the display; and a non-transitory computer readable medium in communication with the one or more processors, the non-transitory computer readable medium having encoded thereon a set of instructions executable by the one or more processors to cause the device to: receive an input image comprising a face; determine an orientation, a size, and a location of the face in the input image; retrieve a three-dimensional model template comprising a face and a head; project the input image onto the three-dimensional model template comprising the face and the head to generate a three-dimensional model corresponding to the input image; define, on the three-dimensional model, a polygon mesh in a region of at least one facial feature; adjust parameters on the three-dimensional model, the region of the at least one facial feature corresponding to at least one facial feature in the input image, wherein scaling the three-dimensional model comprises: defining a first box to frame the face in the input image; defining a second box to frame the face of the three-dimensional model template; and scaling the second box of the three-dimensional model template by performing at least one of scaling to fit the first box of the input image or scaling so that the second box is centered on the first box in the input image; and display the three-dimensional model with the face of the input image projected onto the three-dimensional model.

12. An apparatus, comprising: one or more processors; and a non-transitory computer readable medium having encoded thereon a set of instructions executable by the one or more processors to cause the apparatus to: receive an input image comprising a face; determine an orientation, a size, and a location of the face in the input image; retrieve a three-dimensional model template comprising a face and a head; project the input image onto the three-dimensional model template comprising the face and the head to generate a three-dimensional model corresponding to the input image; define, on the three-dimensional model, a polygon mesh in a region of at least one facial feature; adjust parameters on the three-dimensional model, the region of the at least one facial feature corresponding to at least one facial feature in the input image, wherein scaling the three-dimensional model comprises: defining a first box to frame the face in the input image; defining a second box to frame the face of the three-dimensional model template; and scaling the second box of the three-dimensional model template by performing at least one of scaling to fit the first box of the input image or scaling so that the second box is centered on the first box in the input image; and display the three-dimensional model with the face of the input image projected onto the three-dimensio
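The box-based scaling recited in claim 1 and the eye alignment recited in claim 5 can be pictured with a short Python sketch. It is only one plausible reading, not the patented implementation: the `Box` type, the function names, the choice of uniform scaling with the smaller of the two axis ratios, and the use of `atan2` on the line between the eyes are all assumptions introduced here.

```python
import math
from dataclasses import dataclass


@dataclass
class Box:
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

    @property
    def center(self):
        return (self.x + self.w / 2, self.y + self.h / 2)


def fit_template_box(image_box, template_box):
    """Return (scale, dx, dy) for the template's face box.

    Assumed convention: scale uniformly about the template box center so
    the box fits inside the image box, then translate by (dx, dy) so the
    two centers coincide.
    """
    scale = min(image_box.w / template_box.w, image_box.h / template_box.h)
    icx, icy = image_box.center
    tcx, tcy = template_box.center
    return scale, icx - tcx, icy - tcy


def eye_roll_degrees(left_eye, right_eye):
    """In-plane angle of the line through both eyes, in degrees."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))


first_box = Box(100, 50, 200, 300)   # frames the face in the input image
second_box = Box(0, 0, 100, 100)     # frames the face of the template
print(fit_template_box(first_box, second_box))
print(eye_roll_degrees((120, 140), (220, 140)))
```

Taking the smaller ratio keeps the template's proportions intact; the claim language ("scaling to fit" or "centered on") would also admit other conventions, such as independent scaling per axis.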

Assignees

Univ Colorado Regents

Inventors

Classifications

  • Local features and components; Facial parts (eye characteristics G06V40/18); Occluding parts, e.g. glasses; Geometrical relationships · CPC title

  • Speech synthesis; Text to speech systems · CPC title

  • G06T15/04 (Primary)

    Texture mapping · CPC title

  • driven by audio data · CPC title

  • Constructive solid geometry [CSG] using solid primitives, e.g. cylinders, cubes · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11145100B2 cover?
Novel tools and techniques are provided for implementing three-dimensional facial modeling and visual speech synthesis. In various embodiments, a computing system might determine an orientation, size, and location of a face in a received input image; retrieve a three-dimensional model template comprising a face and head; project the input image onto the model template to generate a three-dimens…
Who is the assignee on this patent?
Univ Colorado Regents
What technology area does this patent fall under?
Primary CPC classification G06T15/04. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 12, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).