Data processing method, and storage medium and electronic device thereof
US-2024339107-A1 · Oct 10, 2024 · US
US2021090551A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2021090551-A1 |
| Application number | US-202017029960-A |
| Country | US |
| Kind code | A1 |
| Filing date | Sep 23, 2020 |
| Priority date | Sep 23, 2019 |
| Publication date | Mar 25, 2021 |
| Grant date | — |
Official abstract text for this publication.
An emotional speech generating method and apparatus capable of adjusting an emotional intensity is disclosed. The emotional speech generating method includes generating emotion groups by grouping weight vectors representing a same emotion into a same emotion group, determining an internal distance between weight vectors included in a same emotion group, determining an external distance between weight vectors included in a same emotion group and weight vectors included in another emotion group, determining a representative weight vector of each of the emotion groups based on the internal distance and the external distance, generating a style embedding by applying the representative weight vector of each of the emotion groups to a style token including prosodic information for expressing an emotion, and generating an emotional speech expressing the emotion using the style embedding.
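The selection of a representative weight vector and its application to style tokens can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration only: the publication text shown on this page does not specify the distance metric (Euclidean is used), how the internal and external distance sums are combined (external minus internal is used as a single score), or how the weight vector is applied to the style tokens (a weighted sum is used). The function names and toy vectors are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def representative_vector(group, other_groups):
    """Pick the vector in `group` that is close to its own group and far from others.

    The score (external sum minus internal sum) is an assumed way to combine
    the two criteria; the source only names both distances.
    """
    def score(v):
        internal = sum(euclidean(v, u) for u in group)
        external = sum(euclidean(v, u) for g in other_groups for u in g)
        return external - internal

    return max(group, key=score)


def style_embedding(weights, style_tokens):
    """Weighted sum of style tokens (assumed meaning of 'applying' the weights)."""
    dim = len(style_tokens[0])
    return [sum(w * tok[d] for w, tok in zip(weights, style_tokens)) for d in range(dim)]


# Toy data: one weight per style token, grouped by emotion label.
emotion_groups = {
    "happy": [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.4, 0.3, 0.3]],
    "sad": [[0.1, 0.8, 0.1], [0.2, 0.7, 0.1]],
}
style_tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # three 2-dimensional tokens

others = [g for name, g in emotion_groups.items() if name != "happy"]
rep = representative_vector(emotion_groups["happy"], others)
embedding = style_embedding(rep, style_tokens)
```

In this toy data the selected vector is the "happy" vector that lies farthest from the "sad" group while staying close to its own group. A real system would pass the resulting embedding to a speech synthesizer, which is outside the scope of this sketch.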
Opening claim text (preview).
What is claimed is:
1. An emotional speech generating method, comprising: generating emotion groups by grouping weight vectors representing a same emotion into a same emotion group; determining an internal distance which is a distance between weight vectors included in a same emotion group; determining an external distance which is a distance between weight vectors included in a same emotion group and weight vectors included in another emotion group; determining a representative weight vector of each of the emotion groups based on the internal distance and the external distance; generating a style embedding by applying the representative weight vector to a style token including prosodic information for expressing an emotion; and generating an emotional speech expressing the emotion using the style embedding.
2. The emotional speech generating method of claim 1, wherein the representative weight vector is a weight vector having a smallest sum of internal distances and a greatest sum of external distances among weight vectors included in each of the emotion groups.
3. The emotional speech generating method of claim 1, further comprising: receiving a text; and determining a text emotion which is an emotion corresponding to the text by analyzing the text, wherein the generating of the style embedding comprises: generating the style embedding using a representative weight vector of a text emotion group corresponding to the text emotion among the emotion groups.
4. An emotional speech generating method, comprising: generating emotion groups by grouping weight vectors representing a same emotion into a same emotion group; identifying, from among the emotion groups, a neutral emotion group corresponding to a neutral emotion and a target emotion group corresponding to an emotion to be expressed in an emotional speech; generating a new emotion group with an emotional intensity adjusted from the target emotion group by using a representative weight vector of the neutral emotion group and the target emotion group; determining a representative weight vector of the new emotion group based on an internal distance between weight vectors included in the new emotion group, and an external distance between the weight vectors included in the new emotion group and weight vectors included in the neutral emotion group or the target emotion group; generating a style embedding by applying the representative weight vector of the new emotion group to a style token including prosodic information for expressing an emotion; and generating the emotional speech expressing the emotion using the style embedding.
5. The emotional speech generating method of claim 4, wherein the generating of the new emotion group comprises: generating new weight vectors by interpolating, at a nonlinear interpolation ratio, the representative weight vector of the neutral emotion group and the weight vectors included in the target emotion group; and generating the new emotion group by grouping the generated new weight vectors.
6. The emotional speech generating method of claim 5, further comprising: receiving a text; and determining an emotional intensity corresponding to the text by analyzing the text, wherein the generating of the new emotion group comprises: determining the nonlinear interpolation ratio based on the emotional intensity.
7. The emotional speech generating method of claim 4, wherein the representative weight vector of the neutral emotion group is determined based on an internal distance between the weight vectors included in the neutral emotion group, and an external distance between the weight vectors included in the neutral emotion group and weight vectors included in another emotion group.
8. The emotional speech generating method of claim 7, wherein the representative weight vector of the neutral emotion group is a weight vector having a smallest sum of internal distances and a greatest sum of external distances among the weight vectors included in the neutral emotion group.
9. The emotional speech generating method of claim 4, further comprising: receiving a text; and determining a text emotion which is an emotion corresponding to the text by analyzing the text, wherein the identifying of the target emotion group comprises: identifying, as the target emotion group, an emotion group representing the text emotion from among the emotion groups.
10. The emotional speech generating method of claim 4, wherein the representative weight vector of the new emotion group is a weight vector having a smallest sum of internal distances and a greatest sum of external distances among the weight vectors included in the new emotion group.
11. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the emotional speech generating method of claim 1.
12. An emotional speech generating apparatus, comprising: an emotion vector generator; and an emotional speech generator, wherein the emotion vector generator is configured to: generate emotion groups by grouping weight vectors representing a same emotion into a same emotion group; identify, from among the emotion groups, a neutral emotion group corresponding to a neutral emotion and a target emotion group corresponding to an emotion to be expressed in an emotional speech; generate a new emotion group with an emotional intensity adjusted from the target emotion group by using a representative weight vector of the neutral emotion group and the target emotion group; determine a representative weight vector of the new emotion group based on an internal distance between weight vectors included in the new emotion group, and an external distance between the weight vectors included in the new emotion group and weight vectors included in the neutral emotion group or the target emotion group; and generate a style embedding by applying the representative weight vector of the new emotion group to a style token including prosodic information for expressing an emotion, and the emotional speech generator is configured to: generate an emotional speech expressing the emotion using the style embedding.
13. The emotional speech generating apparatus of claim 12, wherein the emotion vector generator is configured to: generate new weight vectors by interpolating the representative weight vector of the neutral emotion group and the weight vectors included in the target emotion group based on a nonlinear interpolation ratio; and generate the new emotion group by grouping the generated new weight vectors.
14. The emotional speech generating apparatus of claim 13, further comprising: an emotion identifier configured to receive a text, and determine an emotional intensity corresponding to the text by analyzing the text, wherein the emotion vector generator is configured to determine the nonlinear interpolation ratio based on the determined emotional intensity.
15. The emotional speech generating apparatus of claim 12, wherein the representative weight vector of the neutral emotion group is determined based on an internal distance between the weight vectors included in the neutral emotion group and an external distance between the weight vectors included in the neutral emotion group and weight vectors included in another emotion group.
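The intensity adjustment recited in the claims (interpolating the neutral group's representative weight vector with each weight vector of the target emotion group at a nonlinear ratio) might look like the sketch below. The claims do not state the form of the nonlinearity, so the power-law mapping `intensity ** gamma`, the `gamma` parameter, the 0-to-1 intensity scale, and all function names are hypothetical choices made only to give a runnable illustration.

```python
def interpolation_ratio(intensity, gamma=2.0):
    """Map an emotional intensity in [0, 1] to an interpolation ratio.

    The power-law shape is an assumption; the claims only say the ratio
    is nonlinear and is determined from the emotional intensity.
    """
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must be between 0 and 1")
    return intensity ** gamma


def adjust_intensity(neutral_rep, target_group, intensity, gamma=2.0):
    """Build a new emotion group by blending each target vector with the neutral one."""
    r = interpolation_ratio(intensity, gamma)
    return [
        [(1.0 - r) * n + r * t for n, t in zip(neutral_rep, vec)]
        for vec in target_group
    ]


# Toy data: a neutral representative vector and a two-vector target group.
neutral_rep = [0.2, 0.2, 0.6]
target_group = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1]]

# Half intensity with gamma=2 gives a ratio of 0.25, so the result stays
# closer to neutral than a linear blend would.
new_group = adjust_intensity(neutral_rep, target_group, intensity=0.5)
```

Under these assumptions, an intensity of 0 reproduces the neutral vector and an intensity of 1 reproduces the target group unchanged. A representative weight vector for the new group would then be chosen from `new_group` using the internal and external distances, as the claims describe.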
Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination · CPC title
Semantic analysis · CPC title
Concept to speech synthesisers; Generation of natural phrases from machine-based concepts (generation of parameters for speech synthesis out of text G10L13/08) · CPC title
Voice editing, e.g. manipulating the voice of the synthesiser · CPC title
for estimating an emotional state · CPC title