Method and apparatus for controlling mouth shape changes of three-dimensional virtual portrait

US11308671B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11308671-B2
Application number: US-201916721772-A
Country: US
Kind code: B2
Filing date: Dec 19, 2019
Priority date: Jun 28, 2019
Publication date: Apr 19, 2022
Grant date: Apr 19, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of the present disclosure relate to a method and apparatus for controlling mouth shape changes of a three-dimensional virtual portrait, relating to the field of cloud computing. The method may include: acquiring a to-be-played speech; sliding a preset time window at a preset step length in the to-be-played speech to obtain at least one speech segment; generating, based on the at least one speech segment, a mouth shape control parameter sequence for the to-be-played speech; and controlling, in response to playing the to-be-played speech, a preset mouth shape of the three-dimensional virtual portrait to change based on the mouth shape control parameter sequence.
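The windowing step in the abstract can be illustrated with a minimal sketch. It assumes the speech is already available as a sequence of samples and that the window and step are given in samples; the function name `slide_window`, the toy values, and the handling of leftover samples at the end are illustrative assumptions, not details taken from the patent.

```python
from typing import List, Sequence


def slide_window(samples: Sequence[float], window: int, step: int) -> List[Sequence[float]]:
    """Return every full-length window of `samples`, advancing by `step` each time."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if len(samples) < window:
        # Assumption: a speech shorter than one window yields a single short segment,
        # so that "at least one speech segment" is always produced.
        return [samples[:]]
    # Assumption: samples left over after the last full window are dropped.
    return [samples[start:start + window]
            for start in range(0, len(samples) - window + 1, step)]


# Toy "speech" of 10 samples, window of 4 samples, step of 2 samples.
segments = slide_window(list(range(10)), window=4, step=2)
for segment in segments:
    print(segment)
```

Because the step (2) is smaller than the window (4), neighbouring segments share samples, which is the usual reason for choosing a step shorter than the window.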

First claim

Opening claim text (preview).

What is claimed is:

1. A method for controlling mouth shape changes of a three-dimensional virtual portrait, comprising: acquiring a to-be-played speech; sliding a preset time window at a preset step length in the to-be-played speech to obtain at least one speech segment; generating, based on the at least one speech segment, a mouth shape control parameter sequence for the to-be-played speech; and controlling, in response to playing the to-be-played speech, a preset mouth shape of the three-dimensional virtual portrait to change based on the mouth shape control parameter sequence, wherein the generating, based on the at least one speech segment, the mouth shape control parameter sequence for the to-be-played speech comprises: generating, for a speech segment of the at least one speech segment, a phoneme information sequence of the speech segment; inputting the phoneme information sequence composed of a plurality of pieces of phoneme information into a pre-established mouth shape key point predicting model to obtain a mouth shape key point information sequence composed of a plurality of pieces of mouth shape key point information, wherein the pre-established mouth shape key point predicting model is used to characterize a corresponding relationship between the phoneme information sequence and the mouth shape key point information sequence, wherein the mouth shape key point information indicates position information of a preset number of face key points related to a mouth shape, wherein inputting the phoneme information sequence composed of the plurality of pieces of phoneme information into the pre-established mouth shape key point predicting model to obtain the mouth shape key point information sequence composed of the plurality of pieces of mouth shape key point information comprises: outputting, by the pre-established mouth shape key point predicting model, a first piece of mouth shape key point information by using a first piece of phoneme information as a first input, and outputting, by the pre-established mouth shape key point predicting model, a second piece of mouth shape key point information by using a second piece of phoneme information and the first piece of mouth shape key point information as a second input; and generating, based on the mouth shape key point information sequence, the mouth shape control parameter sequence.

2. The method according to claim 1, wherein the generating, based on the at least one speech segment, the mouth shape control parameter sequence for the to-be-played speech comprises: generating, based on the at least one speech segment, a two-dimensional feature matrix sequence, and inputting the two-dimensional feature matrix sequence into a pre-established convolutional neural network to obtain the mouth shape control parameter sequence, wherein the pre-established convolutional neural network is used to characterize corresponding relationships between two-dimensional feature matrices and mouth shape control parameters, wherein the generating, based on the at least one speech segment, the two-dimensional feature matrix sequence comprises: generating, for the speech segment of the at least one speech segment, at least one two-dimensional feature matrix for the speech segment; and splicing, based on an order of the at least one speech segment in the to-be-played speech, the generated at least one two-dimensional feature matrix into the two-dimensional feature matrix sequence.

3. The method according to claim 2, wherein the generating, for the speech segment of the at least one speech segment, the at least one two-dimensional feature matrix for the speech segment comprises: dividing the speech segment into a preset number of speech sub-segments, wherein two adjacent speech sub-segments partially overlap; extracting, for a speech sub-segment in the preset number of speech sub-segments, a feature of the speech sub-segment to obtain a speech feature vector for the speech sub-segment; and generating, based on obtained preset number of speech feature vectors, the at least one two-dimensional feature matrix for the speech segment.

4. The method according to claim 1, wherein the generating, based on the mouth shape key point information sequence, the mouth shape control parameter sequence comprises: obtaining, for mouth shape key point information in the mouth shape key point information sequence, at least one mouth shape control parameter corresponding to the mouth shape key point information based on a pre-established corresponding relationship between sample mouth shape key point information and a sample mouth shape control parameter; and generating the mouth shape control parameter sequence based on the obtained at least one mouth shape control parameter.

5. The method according to claim 1, wherein the pre-established mouth shape key point predicting model is a recurrent neural network, and a loop body of the recurrent neural network is a long short-term memory.

6. The method according to claim 1, wherein the pre-established mouth shape key point predicting model is a table storing a plurality of corresponding relationship between phoneme information sequences and mouth shape key point information sequences, wherein the table is determined based on statistics of a large number of the phoneme information sequences and the mouth shape key point information sequences.

7. The method according to claim 1, wherein the pre-established mouth shape key point predicting model comprises a first sub-model and a second sub-model, wherein outputting, by the pre-established mouth shape key point predicting model, the first piece of mouth shape key point information by using the first piece of phoneme information as the first input, and outputting, by the pre-established mouth shape key point predicting model, the second piece of mouth shape key point information by using the second piece of phoneme information and the first piece of mouth shape key point information as the second input comprises: outputting, by the first sub-model, the first piece of mouth shape key point information by inputting the first piece of phoneme information into the first sub-model; and outputting, by the second sub-model, the second piece of mouth shape key point information by inputting the second piece of phoneme information and the first piece of mouth shape key point information into the second sub-model.

8. The method according to claim 1, wherein the first piece of phoneme information is generated from a first speech segment of the speech segment, and the second piece of phoneme information is generated from a second speech segment of the speech segment, wherein the first speech segment is acquired before a second speed segment is acquired, and a part of the first speech segment is identical to a part of the second speech segment.

9. An apparatus for controlling mouth shape changes of a three-dimensional virtual portrait, comprising: at least one processor; and a memory storing instructions, the instructions when executed by the at least one processor, cause the at least one processor to perform operations, the operations comprising: acquiring a to-be-played speech; sliding a preset time window at a preset step length in the to-be-played speech to obtain at least one speech segment; generating, based on the at least one speech segment, a mouth shape control parameter sequence for the to-be-played speech; and controlling, in response to playing the to-be-played speech, a preset mouth shape of the three-dimensional virtual portrait to change based on the mouth shape control parameter sequence, wherein the generating, based on the at least one speech segment, the mouth shape control parameter sequence for the to-be-
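The step-by-step prediction recited in claim 1, where the second piece of key point information is produced from the second piece of phoneme information together with the first piece of key point information, can be sketched as a feedback loop. This is only an illustration under stated assumptions: `predict_key_points` and `toy_model` are hypothetical names, phoneme information and key points are represented as plain lists of floats, and `toy_model` is a trivial stand-in rather than the recurrent network or table described in claims 5 and 6.

```python
from typing import Callable, List, Optional, Sequence

KeyPoints = List[float]
Model = Callable[[Sequence[float], Optional[KeyPoints]], KeyPoints]


def predict_key_points(phonemes: Sequence[Sequence[float]], model: Model) -> List[KeyPoints]:
    """Run the model one phoneme at a time, feeding each output into the next step."""
    outputs: List[KeyPoints] = []
    previous: Optional[KeyPoints] = None  # the first step has no earlier key points
    for phoneme in phonemes:
        previous = model(phoneme, previous)
        outputs.append(previous)
    return outputs


def toy_model(phoneme: Sequence[float], previous: Optional[KeyPoints]) -> KeyPoints:
    """Hypothetical stand-in: averages the phoneme vector with the previous key points."""
    if previous is None:
        return list(phoneme)
    return [0.5 * p + 0.5 * q for p, q in zip(phoneme, previous)]


# Three toy pieces of phoneme information, each a 2-element vector.
key_points = predict_key_points([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], toy_model)
print(key_points)
```

Feeding the previous output back in means each predicted mouth shape depends on the one before it, which tends to keep consecutive shapes from jumping abruptly.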

Assignees

Beijing Baidu Netcom Sci & Tech Co Ltd

Inventors

Classifications

  • G10L21/10 (Primary)

    Transforming into visible information · CPC title

  • Combinations of networks · CPC title

  • G06N3/044 (Primary)

    Recurrent networks, e.g. Hopfield networks · CPC title

  • Supervised learning · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11308671B2 cover?
Embodiments of the present disclosure relate to a method and apparatus for controlling mouth shape changes of a three-dimensional virtual portrait, relating to the field of cloud computing. The method may include: acquiring a to-be-played speech; sliding a preset time window at a preset step length in the to-be-played speech to obtain at least one speech segment; generating, based on the at lea…
Who is the assignee on this patent?
Beijing Baidu Netcom Sci & Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G10L21/10. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 19, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).