Who is the assignee on this patent?

Beijing Baidu Netcom Sci & Tech Co Ltd

What technology area does this patent fall under?

Primary CPC classification G10L21/10. Mapped technology areas include Physics.

When was this patent published?

Publication date Apr 19, 2022 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Method and apparatus for controlling mouth shape changes of three-dimensional virtual portrait

US11308671B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11308671-B2
Application number	US-201916721772-A
Country	US
Kind code	B2
Filing date	Dec 19, 2019
Priority date	Jun 28, 2019
Publication date	Apr 19, 2022
Grant date	Apr 19, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of the present disclosure relate to a method and apparatus for controlling mouth shape changes of a three-dimensional virtual portrait, relating to the field of cloud computing. The method may include: acquiring a to-be-played speech; sliding a preset time window at a preset step length in the to-be-played speech to obtain at least one speech segment; generating, based on the at least one speech segment, a mouth shape control parameter sequence for the to-be-played speech; and controlling, in response to playing the to-be-played speech, a preset mouth shape of the three-dimensional virtual portrait to change based on the mouth shape control parameter sequence.
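The windowing step in the abstract can be illustrated with a short sketch. It is only an illustration, not the patented implementation: the function name `slide_window`, the use of sample counts for the window and step, and the handling of clips shorter than one window are all assumptions that do not come from the patent text.

```python
from typing import List, Sequence


def slide_window(samples: Sequence[float], window: int, step: int) -> List[List[float]]:
    """Cut `samples` into segments of `window` samples, starting every `step` samples.

    Hypothetical helper: the patent only says a preset time window is slid at a
    preset step length; units and edge handling here are assumptions.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if len(samples) <= window:
        # Assumption: a clip no longer than one window yields a single segment,
        # so that at least one speech segment is always produced.
        return [list(samples)]
    last_start = len(samples) - window
    return [list(samples[i:i + window]) for i in range(0, last_start + 1, step)]


# Toy "speech" of 10 samples, window of 4 samples, step of 2 samples.
speech = list(range(10))
segments = slide_window(speech, window=4, step=2)
for segment in segments:
    print(segment)
```

With a step smaller than the window, consecutive segments overlap, which is consistent with claim 8's statement that part of the first speech segment is identical to part of the second.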

First claim

Opening claim text (preview).

What is claimed is:

1. A method for controlling mouth shape changes of a three-dimensional virtual portrait, comprising: acquiring a to-be-played speech; sliding a preset time window at a preset step length in the to-be-played speech to obtain at least one speech segment; generating, based on the at least one speech segment, a mouth shape control parameter sequence for the to-be-played speech; and controlling, in response to playing the to-be-played speech, a preset mouth shape of the three-dimensional virtual portrait to change based on the mouth shape control parameter sequence, wherein the generating, based on the at least one speech segment, the mouth shape control parameter sequence for the to-be-played speech comprises: generating, for a speech segment of the at least one speech segment, a phoneme information sequence of the speech segment; inputting the phoneme information sequence composed of a plurality of pieces of phoneme information into a pre-established mouth shape key point predicting model to obtain a mouth shape key point information sequence composed of a plurality of pieces of mouth shape key point information, wherein the pre-established mouth shape key point predicting model is used to characterize a corresponding relationship between the phoneme information sequence and the mouth shape key point information sequence, wherein the mouth shape key point information indicates position information of a preset number of face key points related to a mouth shape, wherein inputting the phoneme information sequence composed of the plurality of pieces of phoneme information into the pre-established mouth shape key point predicting model to obtain the mouth shape key point information sequence composed of the plurality of pieces of mouth shape key point information comprises: outputting, by the pre-established mouth shape key point predicting model, a first piece of mouth shape key point information by using a first piece of phoneme information as a first input, and outputting, by the pre-established mouth shape key point predicting model, a second piece of mouth shape key point information by using a second piece of phoneme information and the first piece of mouth shape key point information as a second input; and generating, based on the mouth shape key point information sequence, the mouth shape control parameter sequence.

2. The method according to claim 1, wherein the generating, based on the at least one speech segment, the mouth shape control parameter sequence for the to-be-played speech comprises: generating, based on the at least one speech segment, a two-dimensional feature matrix sequence, and inputting the two-dimensional feature matrix sequence into a pre-established convolutional neural network to obtain the mouth shape control parameter sequence, wherein the pre-established convolutional neural network is used to characterize corresponding relationships between two-dimensional feature matrices and mouth shape control parameters, wherein the generating, based on the at least one speech segment, the two-dimensional feature matrix sequence comprises: generating, for the speech segment of the at least one speech segment, at least one two-dimensional feature matrix for the speech segment; and splicing, based on an order of the at least one speech segment in the to-be-played speech, the generated at least one two-dimensional feature matrix into the two-dimensional feature matrix sequence.

3. The method according to claim 2, wherein the generating, for the speech segment of the at least one speech segment, the at least one two-dimensional feature matrix for the speech segment comprises: dividing the speech segment into a preset number of speech sub-segments, wherein two adjacent speech sub-segments partially overlap; extracting, for a speech sub-segment in the preset number of speech sub-segments, a feature of the speech sub-segment to obtain a speech feature vector for the speech sub-segment; and generating, based on obtained preset number of speech feature vectors, the at least one two-dimensional feature matrix for the speech segment.

4. The method according to claim 1, wherein the generating, based on the mouth shape key point information sequence, the mouth shape control parameter sequence comprises: obtaining, for mouth shape key point information in the mouth shape key point information sequence, at least one mouth shape control parameter corresponding to the mouth shape key point information based on a pre-established corresponding relationship between sample mouth shape key point information and a sample mouth shape control parameter; and generating the mouth shape control parameter sequence based on the obtained at least one mouth shape control parameter.

5. The method according to claim 1, wherein the pre-established mouth shape key point predicting model is a recurrent neural network, and a loop body of the recurrent neural network is a long short-term memory.

6. The method according to claim 1, wherein the pre-established mouth shape key point predicting model is a table storing a plurality of corresponding relationship between phoneme information sequences and mouth shape key point information sequences, wherein the table is determined based on statistics of a large number of the phoneme information sequences and the mouth shape key point information sequences.

7. The method according to claim 1, wherein the pre-established mouth shape key point predicting model comprises a first sub-model and a second sub-model, wherein outputting, by the pre-established mouth shape key point predicting model, the first piece of mouth shape key point information by using the first piece of phoneme information as the first input, and outputting, by the pre-established mouth shape key point predicting model, the second piece of mouth shape key point information by using the second piece of phoneme information and the first piece of mouth shape key point information as the second input comprises: outputting, by the first sub-model, the first piece of mouth shape key point information by inputting the first piece of phoneme information into the first sub-model; and outputting, by the second sub-model, the second piece of mouth shape key point information by inputting the second piece of phoneme information and the first piece of mouth shape key point information into the second sub-model.

8. The method according to claim 1, wherein the first piece of phoneme information is generated from a first speech segment of the speech segment, and the second piece of phoneme information is generated from a second speech segment of the speech segment, wherein the first speech segment is acquired before a second speed segment is acquired, and a part of the first speech segment is identical to a part of the second speech segment.

9. An apparatus for controlling mouth shape changes of a three-dimensional virtual portrait, comprising: at least one processor; and a memory storing instructions, the instructions when executed by the at least one processor, cause the at least one processor to perform operations, the operations comprising: acquiring a to-be-played speech; sliding a preset time window at a preset step length in the to-be-played speech to obtain at least one speech segment; generating, based on the at least one speech segment, a mouth shape control parameter sequence for the to-be-played speech; and controlling, in response to playing the to-be-played speech, a preset mouth shape of the three-dimensional virtual portrait to change based on the mouth shape control parameter sequence, wherein the generating, based on the at least one speech segment, the mouth shape control parameter sequence for the to-be-
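Claim 1 describes a prediction pattern in which the first mouth shape key point output depends only on the first piece of phoneme information, while each later output also receives the previous key point output as input. The sketch below shows that feedback loop in isolation. Everything in it is a hypothetical stand-in: `predict_key_points`, `toy_step`, the single-number "key point" vector, and the smoothing rule are illustrative assumptions, not the pre-established model the patent refers to (which claim 5 says may be an LSTM-based recurrent network).

```python
from typing import Callable, List, Optional, Sequence

KeyPoints = List[float]
# A step function maps (phoneme info, previous key points or None) to key points.
StepFn = Callable[[str, Optional[KeyPoints]], KeyPoints]


def predict_key_points(phonemes: Sequence[str], step_fn: StepFn) -> List[KeyPoints]:
    """Run `step_fn` over the phonemes, feeding each output into the next step."""
    outputs: List[KeyPoints] = []
    previous: Optional[KeyPoints] = None  # the first input carries no prior output
    for phoneme in phonemes:
        current = step_fn(phoneme, previous)
        outputs.append(current)
        previous = current
    return outputs


def toy_step(phoneme: str, previous: Optional[KeyPoints]) -> KeyPoints:
    """Hypothetical model: one 'mouth openness' value, smoothed with the last output."""
    target = 1.0 if phoneme in "aeiou" else 0.0
    if previous is None:
        return [target]
    return [0.5 * target + 0.5 * previous[0]]


sequence = predict_key_points(["m", "a", "a"], toy_step)
print(sequence)
```

In this toy run the openness value rises gradually across the two vowel steps rather than jumping, which is the kind of continuity that feeding the previous output back in can provide. A trained model would replace `toy_step`.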

Assignees

Beijing Baidu Netcom Sci & Tech Co Ltd

Inventors

Classifications

G10L21/10 · Primary
Transforming into visible information · CPC title
G06N3/045
Combinations of networks · CPC title
G06N3/044 · Primary
Recurrent networks, e.g. Hopfield networks · CPC title
G06N3/09
Supervised learning · CPC title
G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title

Patent family

Related publications grouped by family.

View patent family 68019899

External sources


Related patents

End-to-end speech recognition

Deployed end-to-end speech recognition

Display apparatus of front-of-the-eye mounted type