Computer generated head

US9959657B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9959657-B2
Application number: US-201414167238-A
Country: US
Kind code: B2
Filing date: Jan 29, 2014
Priority date: Jan 29, 2013
Publication date: May 1, 2018
Grant date: May 1, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method of animating a computer generation of a head, the head having a mouth which moves in accordance with speech to be output by the head, said method comprising: providing an input related to the speech which is to be output by the movement of the lips; dividing said input into a sequence of acoustic units; selecting expression characteristics for the inputted text; converting said sequence of acoustic units to a sequence of image vectors using a statistical model, wherein said model has a plurality of model parameters describing probability distributions which relate an acoustic unit to an image vector, said image vector comprising a plurality of parameters which define a face of said head; and outputting said sequence of image vectors as video such that the mouth of said head moves to mime the speech associated with the input text with the selected expression, wherein a parameter of a predetermined type of each probability distribution in said selected expression is expressed as a weighted sum of parameters of the same type, and wherein the weighting used is expression dependent, such that converting said sequence of acoustic units to a sequence of image vectors comprises retrieving the expression dependent weights for said selected expression, wherein the parameters are provided in clusters, and each cluster comprises at least one sub-cluster, wherein said expression dependent weights are retrieved for each cluster such that there is one weight per sub-cluster.
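The weighted-sum step in the abstract can be illustrated with a minimal sketch. It assumes each sub-cluster contributes one mean vector and that an expression is stored as one weight per sub-cluster. The function name, the expression labels, and every number below are hypothetical and are not taken from the patent.

```python
# Illustrative sketch only: all names and values here are assumptions.

def expression_mean(cluster_means, weights):
    """Combine per-(sub-)cluster mean vectors into one mean vector.

    cluster_means: list of equal-length vectors, one per (sub-)cluster.
    weights: one expression-dependent weight per (sub-)cluster.
    """
    if len(cluster_means) != len(weights):
        raise ValueError("need exactly one weight per (sub-)cluster")
    dim = len(cluster_means[0])
    mean = [0.0] * dim
    for mu, lam in zip(cluster_means, weights):
        if len(mu) != dim:
            raise ValueError("all mean vectors must have the same length")
        for d in range(dim):
            mean[d] += lam * mu[d]
    return mean


# Toy data: three (sub-)cluster means in a two-dimensional parameter space.
CLUSTER_MEANS = [[1.0, 0.0], [0.0, 2.0], [4.0, 4.0]]

# Hypothetical pre-stored weight sets, keyed by expression label.
EXPRESSION_WEIGHTS = {
    "neutral": [1.0, 0.0, 0.0],
    "happy": [1.0, 0.5, 0.25],
}

neutral_mean = expression_mean(CLUSTER_MEANS, EXPRESSION_WEIGHTS["neutral"])
happy_mean = expression_mean(CLUSTER_MEANS, EXPRESSION_WEIGHTS["happy"])
print(neutral_mean)  # [1.0, 0.0]
print(happy_mean)    # [2.0, 2.0]
```

In this sketch, switching expression changes only the weight set that is retrieved. The cluster means themselves stay fixed, which appears to be the point of making the weighting expression dependent.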

First claim

Opening claim text (preview).

The invention claimed is:

1. A method of animating a computer generation of a face having a mouth, the method comprising: receiving a text input related to speech, which is to be output by movement of the mouth; dividing the text input into a sequence of acoustic units including one of at least phonemes, graphemes, and words or parts of words; analyzing the text input related to the speech to identify expression-dependent weightings related to a speech expression and a corresponding facial expression, to be input into a statistical model; converting the sequence of acoustic units into a sequence of image vectors and a sequence of speech vectors using the statistical model, wherein the model has a plurality of model parameters comprising mathematical means of probability distributions, which relate an acoustic unit in the sequence of acoustic units to an image vector in the sequence of image vectors and to a speech vector in the sequence of speech vectors, the image vector including a plurality of parameters that define the face; and outputting the sequence of image vectors and the sequence of speech vectors, wherein the sequence of image vectors are output as video such that the mouth moves to mime the speech expression associated with the corresponding facial expression, and wherein the sequence of speech vectors are output as audio, which is synchronized with lip movement of the mouth and is associated with the speech expression, wherein the mathematical means of each probability distribution of the probability distributions for the speech expression and the corresponding facial expression are expressed as a weighted sum of independent mathematical means, wherein weightings used in the weighted sum are the identified expression-dependent weightings, wherein the independent mathematical means are provided in clusters, and wherein there is one expression-dependent weighting per cluster.

2. The method according to claim 1, wherein each cluster includes at least one sub-cluster, and wherein the identified expression dependent weightings are retrieved for said each cluster such that there is one weight per sub-cluster.

3. The method according to claim 2, wherein the at least one sub-cluster includes at least one decision tree being based on questions relating to at least one of linguistic differences, phonetic differences, and prosodic differences.

4. The method according to claim 2, further comprising selecting the speech expression from at least one of different emotions, different accents, or different speaking styles, wherein the selecting includes randomly selecting a set of the identified expression dependent weightings from a plurality of pre-stored sets of the identified expression dependent weightings, and wherein each selected set of the identified expression dependent weightings includes weightings for the at least one sub-cluster.

5. The method according to claim 1, further comprising selecting the speech expression from at least one of different emotions, different accents, or different speaking styles.

6. The method according to claim 5, wherein the converting the sequence of the acoustic units into the sequence of image vectors and the sequence of speech vectors using the statistical model includes retrieving the identified expression dependent weightings for the selected speech expression.

7. The method according to claim 5, wherein the selecting includes providing an input to allow the identified expression-dependent weightings to be selected via the input by a user.

8. The method according to claim 5, wherein the selecting includes predicting the identified expression-dependent weightings to be used from external information about the speech to be output by the movement of the mouth.

9. The method according to claim 5, wherein the selecting includes receiving a video input containing the face and varying the identified expression-dependent weightings to simulate an expression on a face of the video input.

10. The method according to claim 5, wherein the selecting includes receiving an audio input containing the speech to be output by the movement of the mouth and obtaining the identified expression-dependent weightings from the audio input.

11. The method according to claim 1, further comprising constructing the face from the image vector including the plurality of parameters that define the face, said parameters permitting constructing the face from a weighted sum of modes representing reconstructions of the face or a part of the face.

12. The method according to claim 11, wherein the weighted sum of modes includes modes to represent a shape of the face and an appearance of the face.

13. The method according to claim 12, wherein a same weighting of the identified expression-dependent weightings is used for a shape mode and a corresponding appearance mode of the modes to represent the shape of the face and the appearance of the face.

14. The method according to claim 11, wherein at least one of the modes from the weighted sum of modes represents a pose of the face.

15. The method according to claim 11, wherein a plurality of the modes from the weighted sum of modes represents a deformation of regions of the face.

16. The method according to claim 11, wherein at least one of the modes from the weighted sum of modes represents blinking.

17. The method according to claim 11, wherein at least one of the modes from the weighted sum of modes represents blinking.

18. The method according to claim 11, wherein static features of the face are modelled with a fixed shape and a fixed texture.

19. A non-transitory computer-readable storage medium having computer-readable instructions stored thereon, which when executed by a computer cause the computer to perform the method of claim 1.

20. A method of adapting a system for rendering a computer generated face to a new expression, the method comprising: receiving text data related to speech, which is to be output by movement of the mouth; dividing the text data into a sequence of acoustic units including one of at least phonemes, graphemes, and words or parts of words; analyzing the text data related to the speech to identify expression-dependent weightings related to a speech expression and a corresponding facial expression, to be input into a statistical model; converting the sequence of acoustic units into a sequence of image vectors and a sequence of speech vectors using the statistical model, wherein the model has a plurality of model parameters comprising mathematical means of probability distributions, which relate an acoustic unit in the sequence of acoustic units to an image vector in the sequence of image vectors and to a speech vector in the sequence of speech vectors, the image vector including a plurality of parameters that define the face; outputting the sequence of image vectors and the sequence of speech vectors, wherein the sequence of image vectors are output as video such that the mouth moves to mime the speech expression associated with the corresponding facial expression, and wherein the sequence of speech vectors are output as audio, which is synchronized with lip movement of the mouth and is associated with the speech expression, wherein the mathematical means of each probability distribution of the probability distributions for the speech expression and the corresponding facial expression are expressed as a weighted sum of independent mathematical means, wherein weightings used in the weighted sum are
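The face construction in claims 11 and 12 (a face built from a weighted sum of shape and appearance modes) can be sketched as follows. This is a toy illustration under stated assumptions: the image vector is treated as a plain list of mode weights, each weight is shared between a shape mode and its paired appearance mode, and no mean face is added. The function names and all values are hypothetical, and the patent's actual model may differ.

```python
# Illustrative sketch only: names, mode layout, and values are assumptions.

def weighted_sum(modes, weights):
    """Return sum_i weights[i] * modes[i] for equal-length mode vectors."""
    if len(modes) != len(weights):
        raise ValueError("need exactly one weight per mode")
    total = [0.0] * len(modes[0])
    for mode, w in zip(modes, weights):
        for k, value in enumerate(mode):
            total[k] += w * value
    return total


def construct_face(image_vector, shape_modes, appearance_modes):
    """Build (shape, appearance) from one weight per shape/appearance mode pair."""
    shape = weighted_sum(shape_modes, image_vector)
    appearance = weighted_sum(appearance_modes, image_vector)
    return shape, appearance


# Two toy modes: shape as flattened (x, y) landmark offsets, appearance as pixel values.
SHAPE_MODES = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
APPEARANCE_MODES = [[0.5, 0.5], [1.0, 0.0]]

image_vector = [2.0, 0.5]  # one weight per mode pair
shape, appearance = construct_face(image_vector, SHAPE_MODES, APPEARANCE_MODES)
print(shape)       # [2.0, 0.5, 0.5, 2.0]
print(appearance)  # [1.5, 1.0]
```

Under these assumptions, each frame of video corresponds to one image vector, so a sequence of image vectors yields a sequence of reconstructed faces.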

Assignees

Toshiba Kk

Inventors

Classifications

  • G06T13/80 · Primary

    Two-dimensional [2D] animation, e.g. using sprites · CPC title

  • Synthesis of the lips movements from speech, e.g. for talking heads · CPC title

  • G06T13/205 · Primary

    driven by audio data · CPC title

  • Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination · CPC title

  • for estimating an emotional state · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9959657B2 cover?
A method of animating a computer generation of a head, the head having a mouth which moves in accordance with speech to be output by the head, said method comprising: providing an input related to the speech which is to be output by the movement of the lips; dividing said input into a sequence of acoustic units; selecting expression characteristics for the inputted text; conver…
Who is the assignee on this patent?
Toshiba Kk
What technology area does this patent fall under?
Primary CPC classification G06T13/80. Mapped technology areas include Physics.
When was this patent published?
Publication date May 1, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).