Artificial intelligence-based animation character drive method and related apparatus
US-11605193-B2 · Mar 14, 2023 · US
US12112417B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12112417-B2 |
| Application number | US-202218080655-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 13, 2022 |
| Priority date | Sep 2, 2019 |
| Publication date | Oct 8, 2024 |
| Grant date | Oct 8, 2024 |
Abstract
This application discloses an artificial intelligence (AI) based animation character drive method. A first expression base of a first animation character corresponding to a speaker is determined by acquiring media data including a facial expression change when the speaker says a speech, and the first expression base may reflect different expressions of the first animation character. After target text information is obtained, an acoustic feature and a target expression parameter corresponding to the target text information are determined according to the target text information, the foregoing acquired media data, and the first expression base. A second animation character having a second expression base may be driven according to the acoustic feature and the target expression parameter, so that the second animation character may simulate the speaker's sound and facial expression when saying the target text information, thereby improving experience of interaction between the user and the animation character.
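The abstract does not say how an expression base is represented. This sketch assumes a common convention from facial animation, the linear blendshape model; the patent does not state this. In that model, each base stores per-vertex offsets from a neutral face, and an expression parameter is the vector of weights applied to those offsets. The Python below evaluates faces under that assumption. `blend`, `animate`, and the toy bases are hypothetical names used only for illustration.

```python
from typing import List, Sequence


def blend(neutral: Sequence[float], bases: Sequence[Sequence[float]],
          weights: Sequence[float]) -> List[float]:
    """Return neutral + sum_k weights[k] * bases[k] (assumed linear blendshape model).

    `neutral` is a flattened list of vertex coordinates (x0, y0, z0, x1, ...),
    each entry of `bases` holds per-coordinate offsets for one expression base,
    and `weights` is the expression parameter: one weight per base.
    """
    if len(bases) != len(weights):
        raise ValueError("need exactly one weight per expression base")
    face = list(neutral)
    for base, w in zip(bases, weights):
        if len(base) != len(face):
            raise ValueError("every base must match the neutral mesh size")
        for i, offset in enumerate(base):
            face[i] += w * offset
    return face


def animate(neutral: Sequence[float], bases: Sequence[Sequence[float]],
            weight_track: Sequence[Sequence[float]]) -> List[List[float]]:
    """Evaluate one face per video frame from a track of expression parameters."""
    return [blend(neutral, bases, w) for w in weight_track]


if __name__ == "__main__":
    # Toy mesh: two vertices in 3-D, flattened to six coordinates.
    neutral = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
    jaw_open = [0.0, -1.0, 0.0, 0.0, -1.0, 0.0]  # hypothetical base
    smile = [-0.5, 0.0, 0.0, 0.5, 0.0, 0.0]      # hypothetical base
    frames = animate(neutral, [jaw_open, smile], [[0.0, 0.0], [0.5, 1.0]])
    print(frames[1])  # [-0.5, -0.5, 0.0, 1.5, -0.5, 0.0]
```

Flattened coordinate lists keep the sketch dependency-free. A production pipeline would more likely store bases in an array library.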
Claims (preview)
What is claimed is:
1. An animation character drive method performed by a computing device, the method comprising: obtaining simultaneously captured multi-modal media data including a facial expression and a corresponding speech of a speaker; determining a first expression base of a first animation character corresponding to the speaker according to the facial expression; determining, according to the media data and the first expression base, an acoustic feature and a target expression parameter of the first animation character speaking target text information, the acoustic feature characterizing a sound of the speaker when speaking the target text information, and the target expression parameter characterizing a facial expression of the speaker when speaking the target text information based on the first expression base; determining a mapping relationship between the first expression base of the first animation character and the second expression base of the second animation character, wherein an expression parameter of the first expression base is a function of an expression parameter of the second expression base; and driving a second animation character having a second expression base to simulate a sound of the speaker speaking the target text information based on the acoustic feature and a facial expression of the second animation character based on the mapping relationship, the target expression parameter and the second expression base simultaneously.
2. The method according to claim 1, wherein the determining a first expression base of a first animation character corresponding to the speaker according to the facial expression comprises: determining the first expression base of the first animation character and a face-to-parameter translation parameter of the first animation character according to the facial expression, the face-to-parameter translation parameter being used for identifying a change degree of a face shape of the first animation character relative to a face-to-parameter translation base corresponding to the first animation character.
3. The method according to claim 1, wherein the second expression base is generated according to a preset relationship between the second expression base and a phoneme, and the determining a mapping relationship between an expression parameter corresponding to the first expression base and an expression parameter corresponding to the second expression base comprises: determining, according to the media data, a phoneme identified by the speech, a time interval corresponding to the phoneme, and video frames in which the media data is in the time interval; determining a first expression parameter corresponding to the phoneme according to the video frames, the first expression parameter being used for identifying a change degree of a facial expression of the speaker when giving the phoneme relative to the first expression base; determining a second expression parameter corresponding to the phoneme according to the preset relationship and the second expression base; and determining the mapping relationship in a function form according to the first expression parameter and the second expression parameter.
4. The method according to claim 1, wherein the determining, according to the media data and the first expression base, an acoustic feature and a target expression parameter of the first animation character speaking target text information comprises: determining, according to the target text information and the media data, the acoustic feature and an expression feature corresponding to the target text information, the expression feature identifying a facial expression of the speaker when the speaker says the target text information; and determining the target expression parameter according to the first expression base and the expression feature.
5. A computing device comprising a processor and a memory coupled to the processor, wherein the memory stores a plurality of computer programs that, when executed by the processor, cause the computing device to perform a plurality of operations including: obtaining simultaneously captured multi-modal media data including a facial expression and a corresponding speech of a speaker; determining a first expression base of a first animation character corresponding to the speaker according to the facial expression; determining, according to the media data and the first expression base, an acoustic feature and a target expression parameter of the first animation character speaking target text information, the acoustic feature characterizing a sound of the speaker when speaking the target text information, and the target expression parameter characterizing a facial expression of the speaker when speaking the target text information based on the first expression base; determining a mapping relationship between the first expression base of the first animation character and the second expression base of the second animation character, wherein an expression parameter of the first expression base is a function of an expression parameter of the second expression base; and driving a second animation character having a second expression base to simulate a sound of the speaker speaking the target text information based on the acoustic feature and a facial expression of the second animation character based on the mapping relationship, the target expression parameter and the second expression base simultaneously.
6. The computing device according to claim 5, wherein the determining a first expression base of a first animation character corresponding to the speaker according to the facial expression comprises: determining the first expression base of the first animation character and a face-to-parameter translation parameter of the first animation character according to the facial expression, the face-to-parameter translation parameter being used for identifying a change degree of a face shape of the first animation character relative to a face-to-parameter translation base corresponding to the first animation character.
7. The computing device according to claim 6, wherein the second expression base is generated according to a preset relationship between the second expression base and a phoneme, and the determining a mapping relationship between an expression parameter corresponding to the first expression base and an expression parameter corresponding to the second expression base comprises: determining, according to the media data, a phoneme identified by the speech, a time interval corresponding to the phoneme, and video frames in which the media data is in the time interval; determining a first expression parameter corresponding to the phoneme according to the video frames, the first expression parameter being used for identifying a change degree of a facial expression of the speaker when giving the phoneme relative to the first expression base; determining a second expression parameter corresponding to the phoneme according to the preset relationship and the second expression base; and determining the mapping relationship in a function form according to the first expression parameter and the second expression parameter.
8. The computing device according to claim 5, wherein the determining, according to the media data and the first expression base, an acoustic feature and a target expression parameter of the first animation character speaking target text information comprises: determining, according to the target text information and the media data, the acoustic feature and an expression feature corresponding to the target text information, the expression feature identifying a facial expression of the speaker when the speaker says the target text information; an
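Claims 3 and 7 build the mapping relationship from phoneme-aligned samples. For each phoneme, a first expression parameter is measured from the video frames inside that phoneme's time interval. It is then paired with a second expression parameter obtained from the second base's preset phoneme relationship.

The sketch below makes several assumptions that the claims do not specify:

- The "function form" is linear (beta1 ~= A @ beta2).
- Per-frame first-base parameters are averaged over each interval.
- A is fitted by ordinary least squares.
- The fit is inverted to turn a target first-base parameter into weights for the second base.

All identifiers (`frames_in_interval`, `phoneme_pairs`, `fit_linear_map`, `second_params`) and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]
Pair = Tuple[List[float], List[float]]  # (first-base params, second-base params)


def frames_in_interval(start: float, end: float, fps: float) -> List[int]:
    """Indices of frames whose timestamps i / fps fall in [start, end)."""
    return list(range(math.ceil(start * fps), math.ceil(end * fps)))


def mean_vector(rows: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def phoneme_pairs(intervals: Sequence[Tuple[str, float, float]],
                  frame_params: Sequence[Sequence[float]],
                  preset: Dict[str, Sequence[float]], fps: float) -> List[Pair]:
    """Pair each phoneme's averaged first-base parameter (from its frames)
    with the preset second-base parameter for that phoneme."""
    pairs = []
    for phoneme, start, end in intervals:
        idx = [i for i in frames_in_interval(start, end, fps) if i < len(frame_params)]
        if idx and phoneme in preset:
            beta1 = mean_vector([frame_params[i] for i in idx])
            pairs.append((beta1, list(preset[phoneme])))
    return pairs


def solve(m: Matrix, b: Matrix) -> Matrix:
    """Solve m @ x = b (m square) by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(m[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: too few distinct phoneme samples")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_linear_map(pairs: Sequence[Pair]) -> Matrix:
    """Least-squares A (n1 x n2) with beta1 ~= A @ beta2.

    Normal equations: (sum beta2 beta2^T) A^T = sum beta2 beta1^T.
    """
    n1, n2 = len(pairs[0][0]), len(pairs[0][1])
    s22 = [[sum(b2[i] * b2[j] for _, b2 in pairs) for j in range(n2)] for i in range(n2)]
    s21 = [[sum(b2[i] * b1[j] for b1, b2 in pairs) for j in range(n1)] for i in range(n2)]
    a_t = solve(s22, s21)  # this is A transposed
    return [[a_t[i][j] for i in range(n2)] for j in range(n1)]


def second_params(a: Matrix, beta1: Sequence[float]) -> List[float]:
    """Second-base weights whose mapped value best matches a target beta1,
    via the normal equations (A^T A) beta2 = A^T beta1."""
    n1, n2 = len(a), len(a[0])
    ata = [[sum(a[k][i] * a[k][j] for k in range(n1)) for j in range(n2)] for i in range(n2)]
    atb = [[sum(a[k][i] * beta1[k] for k in range(n1))] for i in range(n2)]
    return [row[0] for row in solve(ata, atb)]


if __name__ == "__main__":
    # Toy data: 10 fps video, three phonemes, two weights per base (all hypothetical).
    preset = {"a": [1.0, 0.0], "o": [0.0, 1.0], "m": [1.0, 1.0]}
    frame_params = [[2.0, 0.0]] * 5 + [[0.0, 3.0]] * 5 + [[2.0, 3.0]] * 5
    intervals = [("a", 0.0, 0.5), ("o", 0.5, 1.0), ("m", 1.0, 1.5)]
    a = fit_linear_map(phoneme_pairs(intervals, frame_params, preset, fps=10.0))
    print(a)                             # [[2.0, 0.0], [0.0, 3.0]]
    print(second_params(a, [1.0, 1.5]))  # [0.5, 0.5]
```

A linear map is the simplest function form, and the claims do not rule out nonlinear ones. With only a few phonemes the normal equations can be singular. The sketch reports that case as an error rather than silently regularizing.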
Facial expression recognition · CPC title
Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames · CPC title
Phonemes, fenemes or fenones being the recognition units · CPC title
Feature extraction for speech recognition; Selection of recognition unit · CPC title
driven by audio data · CPC title