What technology area does this patent fall under?

Primary CPC classification G06T13/40. Mapped technology areas include Physics.

When was this patent published?

Published Oct 8, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Artificial intelligence-based animation character drive method and related apparatus

US12112417B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12112417-B2
Application number	US-202218080655-A
Country	US
Kind code	B2
Filing date	Dec 13, 2022
Priority date	Sep 2, 2019
Publication date	Oct 8, 2024
Grant date	Oct 8, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

This application discloses an artificial intelligence (AI) based animation character drive method. A first expression base of a first animation character corresponding to a speaker is determined by acquiring media data including a facial expression change when the speaker says a speech, and the first expression base may reflect different expressions of the first animation character. After target text information is obtained, an acoustic feature and a target expression parameter corresponding to the target text information are determined according to the target text information, the foregoing acquired media data, and the first expression base. A second animation character having a second expression base may be driven according to the acoustic feature and the target expression parameter, so that the second animation character may simulate the speaker's sound and facial expression when saying the target text information, thereby improving experience of interaction between the user and the animation character.
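The abstract does not say how an expression base and an expression parameter combine into a face. One common reading, assumed here and not confirmed by this page, is a linear blendshape model: the base is a set of meshes, the parameter holds one weight per mesh, and the driven face is the neutral mesh plus the weighted offsets. The sketch below illustrates that assumption only; `blend_mesh`, the neutral mesh, and the toy data are hypothetical.

```python
from typing import List, Sequence


def blend_mesh(
    neutral: Sequence[float],
    expression_base: Sequence[Sequence[float]],
    expression_params: Sequence[float],
) -> List[float]:
    """Return neutral + sum_i params[i] * (base[i] - neutral), per coordinate.

    Assumption: a linear blendshape model. The patent page itself does not
    specify the formula.
    """
    if len(expression_base) != len(expression_params):
        raise ValueError("need exactly one parameter per base expression")
    mesh = list(neutral)
    for shape, weight in zip(expression_base, expression_params):
        if len(shape) != len(neutral):
            raise ValueError("every base mesh must match the neutral mesh size")
        for i, (target, rest) in enumerate(zip(shape, neutral)):
            # Add this expression's offset from neutral, scaled by its weight.
            mesh[i] += weight * (target - rest)
    return mesh


# Toy data: a "mesh" of two coordinates and a base of two expressions.
neutral = [0.0, 0.0]
base = [
    [1.0, 0.0],  # hypothetical "mouth open" shape
    [0.0, 2.0],  # hypothetical "smile" shape
]
blended = blend_mesh(neutral, base, [0.5, 0.25])
```

Under this reading, driving a character amounts to feeding a time series of parameter vectors through the same blend, which is why two characters with different bases need a mapping between their parameters.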

First claim

Opening claim text (preview).

What is claimed is:

1. An animation character drive method performed by a computing device, the method comprising:
obtaining simultaneously captured multi-modal media data including a facial expression and a corresponding speech of a speaker;
determining a first expression base of a first animation character corresponding to the speaker according to the facial expression;
determining, according to the media data and the first expression base, an acoustic feature and a target expression parameter of the first animation character speaking target text information, the acoustic feature characterizing a sound of the speaker when speaking the target text information, and the target expression parameter characterizing a facial expression of the speaker when speaking the target text information based on the first expression base;
determining a mapping relationship between the first expression base of the first animation character and the second expression base of the second animation character, wherein an expression parameter of the first expression base is a function of an expression parameter of the second expression base; and
driving a second animation character having a second expression base to simulate a sound of the speaker speaking the target text information based on the acoustic feature and a facial expression of the second animation character based on the mapping relationship, the target expression parameter and the second expression base simultaneously.

2. The method according to claim 1, wherein the determining a first expression base of a first animation character corresponding to the speaker according to the facial expression comprises:
determining the first expression base of the first animation character and a face-to-parameter translation parameter of the first animation character according to the facial expression, the face-to-parameter translation parameter being used for identifying a change degree of a face shape of the first animation character relative to a face-to-parameter translation base corresponding to the first animation character.

3. The method according to claim 1, wherein the second expression base is generated according to a preset relationship between the second expression base and a phoneme, and the determining a mapping relationship between an expression parameter corresponding to the first expression base and an expression parameter corresponding to the second expression base comprises:
determining, according to the media data, a phoneme identified by the speech, a time interval corresponding to the phoneme, and video frames in which the media data is in the time interval;
determining a first expression parameter corresponding to the phoneme according to the video frames, the first expression parameter being used for identifying a change degree of a facial expression of the speaker when giving the phoneme relative to the first expression base;
determining a second expression parameter corresponding to the phoneme according to the preset relationship and the second expression base; and
determining the mapping relationship in a function form according to the first expression parameter and the second expression parameter.

4. The method according to claim 1, wherein the determining, according to the media data and the first expression base, an acoustic feature and a target expression parameter of the first animation character speaking target text information comprises:
determining, according to the target text information and the media data, the acoustic feature and an expression feature corresponding to the target text information, the expression feature identifying a facial expression of the speaker when the speaker says the target text information; and
determining the target expression parameter according to the first expression base and the expression feature.

5. A computing device comprising a processor and a memory coupled to the processor, wherein the memory stores a plurality of computer programs that, when executed by the processor, cause the computing device to perform a plurality of operations including:
obtaining simultaneously captured multi-modal media data including a facial expression and a corresponding speech of a speaker;
determining a first expression base of a first animation character corresponding to the speaker according to the facial expression;
determining, according to the media data and the first expression base, an acoustic feature and a target expression parameter of the first animation character speaking target text information, the acoustic feature characterizing a sound of the speaker when speaking the target text information, and the target expression parameter characterizing a facial expression of the speaker when speaking the target text information based on the first expression base;
determining a mapping relationship between the first expression base of the first animation character and the second expression base of the second animation character, wherein an expression parameter of the first expression base is a function of an expression parameter of the second expression base; and
driving a second animation character having a second expression base to simulate a sound of the speaker speaking the target text information based on the acoustic feature and a facial expression of the second animation character based on the mapping relationship, the target expression parameter and the second expression base simultaneously.

6. The computing device according to claim 5, wherein the determining a first expression base of a first animation character corresponding to the speaker according to the facial expression comprises:
determining the first expression base of the first animation character and a face-to-parameter translation parameter of the first animation character according to the facial expression, the face-to-parameter translation parameter being used for identifying a change degree of a face shape of the first animation character relative to a face-to-parameter translation base corresponding to the first animation character.

7. The computing device according to claim 6, wherein the second expression base is generated according to a preset relationship between the second expression base and a phoneme, and the determining a mapping relationship between an expression parameter corresponding to the first expression base and an expression parameter corresponding to the second expression base comprises:
determining, according to the media data, a phoneme identified by the speech, a time interval corresponding to the phoneme, and video frames in which the media data is in the time interval;
determining a first expression parameter corresponding to the phoneme according to the video frames, the first expression parameter being used for identifying a change degree of a facial expression of the speaker when giving the phoneme relative to the first expression base;
determining a second expression parameter corresponding to the phoneme according to the preset relationship and the second expression base; and
determining the mapping relationship in a function form according to the first expression parameter and the second expression parameter.

8. The computing device according to claim 5, wherein the determining, according to the media data and the first expression base, an acoustic feature and a target expression parameter of the first animation character speaking target text information comprises:
determining, according to the target text information and the media data, the acoustic feature and an expression feature corresponding to the target text information, the expression feature identifying a facial expression of the speaker when the speaker says the target text information; an
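The phoneme-based steps of claims 3 and 7 (phoneme time interval, video frames in that interval, a first expression parameter per phoneme, a second expression parameter per phoneme, then a mapping in function form) can be illustrated with a small sketch. It rests on several assumptions that the claim text does not state: the first parameter for a phoneme is taken as the mean over its frames, the preset second parameter for each phoneme is a one-hot vector, and the mapping is linear. All function names and data are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# (phoneme label, start time in seconds, end time in seconds)
PhonemeInterval = Tuple[str, float, float]


def frames_in_interval(num_frames: int, fps: float, start: float, end: float) -> List[int]:
    """Indices of video frames whose timestamp falls inside [start, end)."""
    return [i for i in range(num_frames) if start <= i / fps < end]


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def first_params_per_phoneme(
    intervals: Sequence[PhonemeInterval],
    frame_params: Sequence[Sequence[float]],
    fps: float,
) -> Dict[str, List[float]]:
    """Average the per-frame first expression parameters over each phoneme."""
    collected: Dict[str, List[Sequence[float]]] = {}
    for phoneme, start, end in intervals:
        for idx in frames_in_interval(len(frame_params), fps, start, end):
            collected.setdefault(phoneme, []).append(frame_params[idx])
    return {phoneme: mean_vector(vecs) for phoneme, vecs in collected.items()}


def build_mapping(
    first_by_phoneme: Dict[str, List[float]],
    phoneme_order: Sequence[str],
) -> Callable[[Sequence[float]], List[float]]:
    """Return f such that first = f(second).

    Assumption: the second parameter of phoneme k is the one-hot vector e_k,
    so a linear map sends e_k to that phoneme's averaged first parameter.
    """
    columns = [first_by_phoneme[p] for p in phoneme_order]

    def first_from_second(second: Sequence[float]) -> List[float]:
        if len(second) != len(columns):
            raise ValueError("second parameter must have one weight per phoneme")
        dim = len(columns[0])
        return [sum(w * col[d] for w, col in zip(second, columns)) for d in range(dim)]

    return first_from_second


# Toy data: 4 frames at 4 fps, two phonemes of half a second each.
intervals = [("a", 0.0, 0.5), ("m", 0.5, 1.0)]
frame_params = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.0, 0.5]]
first_by_phoneme = first_params_per_phoneme(intervals, frame_params, fps=4.0)
mapping = build_mapping(first_by_phoneme, ["a", "m"])
halfway = mapping([0.5, 0.5])
```

The sketch returns the first-base parameter as a function of the second-base parameter, which is the direction stated in claims 1 and 5. Driving the second character from a target first-base parameter would require inverting or solving this relation, a step not shown here.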

Assignees

Tencent Tech Shenzhen Co Ltd

Classifications

G06V40/174
Facial expression recognition
G06V20/46
Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
G10L2015/025
Phonemes, fenemes or fenones being the recognition units
G10L15/02
Feature extraction for speech recognition; Selection of recognition unit
G06T13/205
driven by audio data
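The symbols above follow the usual CPC layout of section letter, two-digit class, subclass letter, main group, and subgroup after the slash. The sketch below splits a symbol into those parts and looks up the section title, which is how a symbol starting with G ends up under "Physics". The `parse_cpc` helper and the regular expression are illustrative assumptions, not this site's actual implementation.

```python
import re
from typing import NamedTuple


class CpcSymbol(NamedTuple):
    section: str      # e.g. "G"
    cpc_class: str    # e.g. "G06"
    subclass: str     # e.g. "G06T"
    main_group: str   # e.g. "13"
    subgroup: str     # e.g. "205"


# Top-level CPC sections and their titles.
SECTION_TITLES = {
    "A": "Human Necessities",
    "B": "Performing Operations; Transporting",
    "C": "Chemistry; Metallurgy",
    "D": "Textiles; Paper",
    "E": "Fixed Constructions",
    "F": "Mechanical Engineering; Lighting; Heating; Weapons; Blasting",
    "G": "Physics",
    "H": "Electricity",
    "Y": "General Tagging of New Technological Developments",
}

# Assumed pattern: section, 2-digit class, subclass letter, group "/" subgroup.
_CPC_PATTERN = re.compile(r"([A-HY])(\d{2})([A-Z])(\d{1,4})/(\d{2,6})")


def parse_cpc(symbol: str) -> CpcSymbol:
    """Split a CPC symbol such as 'G06T13/205' into its hierarchical parts."""
    match = _CPC_PATTERN.fullmatch(symbol.strip())
    if match is None:
        raise ValueError(f"not a recognizable CPC symbol: {symbol!r}")
    section, class_digits, subclass_letter, group, subgroup = match.groups()
    cpc_class = section + class_digits
    return CpcSymbol(section, cpc_class, cpc_class + subclass_letter, group, subgroup)


parsed = parse_cpc("G06T13/205")
area = SECTION_TITLES[parsed.section]
```

Grouping by `subclass` or `main_group` instead of the full symbol gives progressively broader sets of related filings.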

Patent family

Related publications grouped by family.

Patent family ID: 68666304

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12112417B2 cover?

This application discloses an artificial intelligence (AI) based animation character drive method. A first expression base of a first animation character corresponding to a speaker is determined by acquiring media data including a facial expression change when the speaker says a speech, and the first expression base may reflect different expressions of the first animation character. After targe…

Who is the assignee on this patent?

Tencent Tech Shenzhen Co Ltd

Related patents

Artificial intelligence-based animation character drive method and related apparatus

System Providing Expressive and Emotive Text-to-Speech

Method and apparatus for controlling mouth shape changes of three-dimensional virtual portrait

Generating videos with a character indicating a region of an image

Animated chat presence

Systems and methods for speech animation using visemes with phonetic boundary context
