Artificial intelligence-based animation character drive method and related apparatus

US12112417B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12112417-B2
Application number  US-202218080655-A
Country             US
Kind code           B2
Filing date         Dec 13, 2022
Priority date       Sep 2, 2019
Publication date    Oct 8, 2024
Grant date          Oct 8, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

This application discloses an artificial intelligence (AI) based animation character drive method. A first expression base of a first animation character corresponding to a speaker is determined by acquiring media data including a facial expression change when the speaker says a speech, and the first expression base may reflect different expressions of the first animation character. After target text information is obtained, an acoustic feature and a target expression parameter corresponding to the target text information are determined according to the target text information, the foregoing acquired media data, and the first expression base. A second animation character having a second expression base may be driven according to the acoustic feature and the target expression parameter, so that the second animation character may simulate the speaker's sound and facial expression when saying the target text information, thereby improving experience of interaction between the user and the animation character.
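The role of an expression base and an expression parameter can be illustrated with a small sketch. It assumes the common linear blendshape convention, in which a face is the neutral mesh plus a weighted sum of offsets toward each base expression; the abstract does not state which model the patent uses, so this is an assumption. The name `blend`, the toy meshes, and the "mouth open" and "smile" labels are all hypothetical.

```python
from typing import List, Sequence


def blend(neutral: Sequence[float],
          expression_base: Sequence[Sequence[float]],
          params: Sequence[float]) -> List[float]:
    """Return neutral + sum_i params[i] * (expression_base[i] - neutral).

    Assumption: a linear blendshape model. Meshes are flattened lists of
    vertex coordinates, and `params` plays the role of the expression
    parameter (one weight per base expression).
    """
    if len(expression_base) != len(params):
        raise ValueError("one parameter is needed per base expression")
    mesh = list(neutral)
    for shape, weight in zip(expression_base, params):
        if len(shape) != len(neutral):
            raise ValueError("base shape and neutral mesh differ in size")
        for k, (s, n) in enumerate(zip(shape, neutral)):
            mesh[k] += weight * (s - n)
    return mesh


# Toy "second animation character": three coordinates, two base expressions.
neutral_2 = [0.0, 0.0, 0.0]
second_base = [
    [1.0, 0.0, 0.0],  # hypothetical "mouth open" shape
    [0.0, 2.0, 0.0],  # hypothetical "smile" shape
]

# Expression parameter for one frame, already expressed for the second base.
target_params = [0.5, 0.25]

frame = blend(neutral_2, second_base, target_params)
print(frame)
```

Under this assumption, driving a character for a whole utterance amounts to calling `blend` once per video frame with that frame's parameters while the synthesized audio plays alongside.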

First claim

Opening claim text (preview).

What is claimed is:

1. An animation character drive method performed by a computing device, the method comprising: obtaining simultaneously captured multi-modal media data including a facial expression and a corresponding speech of a speaker; determining a first expression base of a first animation character corresponding to the speaker according to the facial expression; determining, according to the media data and the first expression base, an acoustic feature and a target expression parameter of the first animation character speaking target text information, the acoustic feature characterizing a sound of the speaker when speaking the target text information, and the target expression parameter characterizing a facial expression of the speaker when speaking the target text information based on the first expression base; determining a mapping relationship between the first expression base of the first animation character and the second expression base of the second animation character, wherein an expression parameter of the first expression base is a function of an expression parameter of the second expression base; and driving a second animation character having a second expression base to simulate a sound of the speaker speaking the target text information based on the acoustic feature and a facial expression of the second animation character based on the mapping relationship, the target expression parameter and the second expression base simultaneously.

2. The method according to claim 1, wherein the determining a first expression base of a first animation character corresponding to the speaker according to the facial expression comprises: determining the first expression base of the first animation character and a face-to-parameter translation parameter of the first animation character according to the facial expression, the face-to-parameter translation parameter being used for identifying a change degree of a face shape of the first animation character relative to a face-to-parameter translation base corresponding to the first animation character.

3. The method according to claim 1, wherein the second expression base is generated according to a preset relationship between the second expression base and a phoneme, and the determining a mapping relationship between an expression parameter corresponding to the first expression base and an expression parameter corresponding to the second expression base comprises: determining, according to the media data, a phoneme identified by the speech, a time interval corresponding to the phoneme, and video frames in which the media data is in the time interval; determining a first expression parameter corresponding to the phoneme according to the video frames, the first expression parameter being used for identifying a change degree of a facial expression of the speaker when giving the phoneme relative to the first expression base; determining a second expression parameter corresponding to the phoneme according to the preset relationship and the second expression base; and determining the mapping relationship in a function form according to the first expression parameter and the second expression parameter.

4. The method according to claim 1, wherein the determining, according to the media data and the first expression base, an acoustic feature and a target expression parameter of the first animation character speaking target text information comprises: determining, according to the target text information and the media data, the acoustic feature and an expression feature corresponding to the target text information, the expression feature identifying a facial expression of the speaker when the speaker says the target text information; and determining the target expression parameter according to the first expression base and the expression feature.

5. A computing device comprising a processor and a memory coupled to the processor, wherein the memory stores a plurality of computer programs that, when executed by the processor, cause the computing device to perform a plurality of operations including: obtaining simultaneously captured multi-modal media data including a facial expression and a corresponding speech of a speaker; determining a first expression base of a first animation character corresponding to the speaker according to the facial expression; determining, according to the media data and the first expression base, an acoustic feature and a target expression parameter of the first animation character speaking target text information, the acoustic feature characterizing a sound of the speaker when speaking the target text information, and the target expression parameter characterizing a facial expression of the speaker when speaking the target text information based on the first expression base; determining a mapping relationship between the first expression base of the first animation character and the second expression base of the second animation character, wherein an expression parameter of the first expression base is a function of an expression parameter of the second expression base; and driving a second animation character having a second expression base to simulate a sound of the speaker speaking the target text information based on the acoustic feature and a facial expression of the second animation character based on the mapping relationship, the target expression parameter and the second expression base simultaneously.

6. The computing device according to claim 5, wherein the determining a first expression base of a first animation character corresponding to the speaker according to the facial expression comprises: determining the first expression base of the first animation character and a face-to-parameter translation parameter of the first animation character according to the facial expression, the face-to-parameter translation parameter being used for identifying a change degree of a face shape of the first animation character relative to a face-to-parameter translation base corresponding to the first animation character.

7. The computing device according to claim 6, wherein the second expression base is generated according to a preset relationship between the second expression base and a phoneme, and the determining a mapping relationship between an expression parameter corresponding to the first expression base and an expression parameter corresponding to the second expression base comprises: determining, according to the media data, a phoneme identified by the speech, a time interval corresponding to the phoneme, and video frames in which the media data is in the time interval; determining a first expression parameter corresponding to the phoneme according to the video frames, the first expression parameter being used for identifying a change degree of a facial expression of the speaker when giving the phoneme relative to the first expression base; determining a second expression parameter corresponding to the phoneme according to the preset relationship and the second expression base; and determining the mapping relationship in a function form according to the first expression parameter and the second expression parameter.

8. The computing device according to claim 5, wherein the determining, according to the media data and the first expression base, an acoustic feature and a target expression parameter of the first animation character speaking target text information comprises: determining, according to the target text information and the media data, the acoustic feature and an expression feature corresponding to the target text information, the expression feature identifying a facial expression of the speaker when the speaker says the target text information; an
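The phoneme-based steps recited in claim 3 can be sketched roughly as follows. This is a toy illustration under strong assumptions, not the patented procedure: each expression parameter is reduced to a single number (real parameters would be vectors), the "function form" is assumed to be a straight line fitted by least squares, and the alignment, frame times, and preset values are invented. The names `params_per_phoneme` and `fit_linear` are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple

# (phoneme, start_seconds, end_seconds): hypothetical alignment of the speech.
Alignment = List[Tuple[str, float, float]]


def params_per_phoneme(alignment: Alignment,
                       frame_times: Sequence[float],
                       frame_params: Sequence[float]) -> Dict[str, float]:
    """Average the per-frame parameter over the frames in each phoneme interval."""
    collected: Dict[str, List[float]] = {}
    for phoneme, start, end in alignment:
        for t, p in zip(frame_times, frame_params):
            if start <= t < end:
                collected.setdefault(phoneme, []).append(p)
    return {ph: mean(vals) for ph, vals in collected.items()}


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = a * x + b; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


# Invented data: three phonemes, six video frames, one scalar parameter each.
alignment = [("a", 0.0, 0.2), ("m", 0.2, 0.4), ("o", 0.4, 0.6)]
frame_times = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55]
frame_params = [0.8, 1.0, 0.0, 0.2, 0.5, 0.5]  # measured on the first base

first = params_per_phoneme(alignment, frame_times, frame_params)
second = {"a": 1.0, "m": 0.0, "o": 0.5}  # assumed preset phoneme relationship

# Claim 1 phrases the first-base parameter as a function of the second-base
# parameter, so the fit here is first = a * second + b.
phonemes = sorted(first)
a, b = fit_linear([second[ph] for ph in phonemes],
                  [first[ph] for ph in phonemes])


def to_second(first_param: float) -> float:
    """Invert the fitted line to obtain a second-base parameter."""
    return (first_param - b) / a


print(a, b, to_second(first["a"]))
```

With vector-valued parameters the same idea would typically become a matrix fit, which is outside the scope of this sketch.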

Assignees

Tencent Tech Shenzhen Co Ltd

Inventors

Classifications

  • Facial expression recognition · CPC title

  • Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames · CPC title

  • Phonemes, fenemes or fenones being the recognition units · CPC title

  • Feature extraction for speech recognition; Selection of recognition unit · CPC title

  • driven by audio data · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12112417B2 cover?
This application discloses an artificial intelligence (AI) based animation character drive method. A first expression base of a first animation character corresponding to a speaker is determined by acquiring media data including a facial expression change when the speaker says a speech, and the first expression base may reflect different expressions of the first animation character. After targe…
Who is the assignee on this patent?
Tencent Tech Shenzhen Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06T13/40. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 8, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).