Emotional speech generating method and apparatus for controlling emotional intensity

US2021090551A1 · US · A1

Patent metadata
Publication number: US-2021090551-A1
Application number: US-202017029960-A
Country: US
Kind code: A1
Filing date: Sep 23, 2020
Priority date: Sep 23, 2019
Publication date: Mar 25, 2021
Grant date: none listed

Abstract

An emotional speech generating method and apparatus capable of adjusting an emotional intensity is disclosed. The emotional speech generating method includes generating emotion groups by grouping weight vectors representing a same emotion into a same emotion group, determining an internal distance between weight vectors included in a same emotion group, determining an external distance between weight vectors included in a same emotion group and weight vectors included in another emotion group, determining a representative weight vector of each of the emotion groups based on the internal distance and the external distance, generating a style embedding by applying the representative weight vector of each of the emotion groups to a style token including prosodic information for expressing an emotion, and generating an emotional speech expressing the emotion using the style embedding.
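The abstract does not fix a distance metric, a rule for combining the two distance criteria, or the exact meaning of "applying" a weight vector to a style token. The following is a minimal sketch under three stated assumptions. First, distances are Euclidean. Second, "small internal sum, large external sum" is merged into the single score `internal_sum - external_sum`, and the lowest score wins. Third, the style embedding is a weighted sum of style-token embeddings, one weight per token, in the spirit of global style tokens. All function names and toy data are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def euclidean(a: Vector, b: Vector) -> float:
    # Assumed distance metric; the abstract does not name one.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def representative_vector(groups: Dict[str, List[Vector]], emotion: str) -> Vector:
    """Return the member of groups[emotion] that is close to its own group
    (small internal-distance sum) and far from all other groups
    (large external-distance sum).

    Assumption: both criteria are merged into one score,
    internal_sum - external_sum, and the lowest score wins.
    """
    members = groups[emotion]
    others = [v for name, vs in groups.items() if name != emotion for v in vs]

    def score(candidate: Vector) -> float:
        internal = sum(euclidean(candidate, v) for v in members)
        external = sum(euclidean(candidate, v) for v in others)
        return internal - external

    return min(members, key=score)


def style_embedding(weights: Vector, style_tokens: List[Vector]) -> Vector:
    # Assumption: "applying" the weight vector means a weighted sum of
    # style-token embeddings, with one weight per token.
    if len(weights) != len(style_tokens):
        raise ValueError("need exactly one weight per style token")
    dim = len(style_tokens[0])
    return [sum(w * tok[d] for w, tok in zip(weights, style_tokens)) for d in range(dim)]


if __name__ == "__main__":
    # Toy 2-D weight vectors (one weight per style token) for two emotion groups.
    groups = {
        "neutral": [[0.9, 0.1], [0.8, 0.2]],
        "happy": [[0.1, 0.9], [0.2, 0.8], [0.5, 0.5]],
    }
    rep = representative_vector(groups, "happy")
    tokens = [[1.0, 0.0, 2.0], [0.0, 1.0, 2.0]]  # two hypothetical 3-D style tokens
    print(rep, style_embedding(rep, tokens))
```

A difference of sums is only one way to reconcile the two criteria. If the groups differ greatly in size, the sums are on different scales, and a ratio or per-group means may behave better.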

Claims

What is claimed is:

1. An emotional speech generating method, comprising: generating emotion groups by grouping weight vectors representing a same emotion into a same emotion group; determining an internal distance which is a distance between weight vectors included in a same emotion group; determining an external distance which is a distance between weight vectors included in a same emotion group and weight vectors included in another emotion group; determining a representative weight vector of each of the emotion groups based on the internal distance and the external distance; generating a style embedding by applying the representative weight vector to a style token including prosodic information for expressing an emotion; and generating an emotional speech expressing the emotion using the style embedding.

2. The emotional speech generating method of claim 1, wherein the representative weight vector is a weight vector having a smallest sum of internal distances and a greatest sum of external distances among weight vectors included in each of the emotion groups.

3. The emotional speech generating method of claim 1, further comprising: receiving a text; and determining a text emotion which is an emotion corresponding to the text by analyzing the text, wherein the generating of the style embedding comprises: generating the style embedding using a representative weight vector of a text emotion group corresponding to the text emotion among the emotion groups.

4. An emotional speech generating method, comprising: generating emotion groups by grouping weight vectors representing a same emotion into a same emotion group; identifying, from among the emotion groups, a neutral emotion group corresponding to a neutral emotion and a target emotion group corresponding to an emotion to be expressed in an emotional speech; generating a new emotion group with an emotional intensity adjusted from the target emotion group by using a representative weight vector of the neutral emotion group and the target emotion group; determining a representative weight vector of the new emotion group based on an internal distance between weight vectors included in the new emotion group, and an external distance between the weight vectors included in the new emotion group and weight vectors included in the neutral emotion group or the target emotion group; generating a style embedding by applying the representative weight vector of the new emotion group to a style token including prosodic information for expressing an emotion; and generating the emotional speech expressing the emotion using the style embedding.

5. The emotional speech generating method of claim 4, wherein the generating of the new emotion group comprises: generating new weight vectors by interpolating, at a nonlinear interpolation ratio, the representative weight vector of the neutral emotion group and the weight vectors included in the target emotion group; and generating the new emotion group by grouping the generated new weight vectors.

6. The emotional speech generating method of claim 5, further comprising: receiving a text; and determining an emotional intensity corresponding to the text by analyzing the text, wherein the generating of the new emotion group comprises: determining the nonlinear interpolation ratio based on the emotional intensity.

7. The emotional speech generating method of claim 4, wherein the representative weight vector of the neutral emotion group is determined based on an internal distance between the weight vectors included in the neutral emotion group, and an external distance between the weight vectors included in the neutral emotion group and weight vectors included in another emotion group.

8. The emotional speech generating method of claim 7, wherein the representative weight vector of the neutral emotion group is a weight vector having a smallest sum of internal distances and a greatest sum of external distances among the weight vectors included in the neutral emotion group.

9. The emotional speech generating method of claim 4, further comprising: receiving a text; and determining a text emotion which is an emotion corresponding to the text by analyzing the text, wherein the identifying of the target emotion group comprises: identifying, as the target emotion group, an emotion group representing the text emotion from among the emotion groups.

10. The emotional speech generating method of claim 4, wherein the representative weight vector of the new emotion group is a weight vector having a smallest sum of internal distances and a greatest sum of external distances among the weight vectors included in the new emotion group.

11. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the emotional speech generating method of claim 1.

12. An emotional speech generating apparatus, comprising: an emotion vector generator; and an emotional speech generator, wherein the emotion vector generator is configured to: generate emotion groups by grouping weight vectors representing a same emotion into a same emotion group; identify, from among the emotion groups, a neutral emotion group corresponding to a neutral emotion and a target emotion group corresponding to an emotion to be expressed in an emotional speech; generate a new emotion group with an emotional intensity adjusted from the target emotion group by using a representative weight vector of the neutral emotion group and the target emotion group; determine a representative weight vector of the new emotion group based on an internal distance between weight vectors included in the new emotion group, and an external distance between the weight vectors included in the new emotion group and weight vectors included in the neutral emotion group or the target emotion group; and generate a style embedding by applying the representative weight vector of the new emotion group to a style token including prosodic information for expressing an emotion, and the emotional speech generator is configured to: generate an emotional speech expressing the emotion using the style embedding.

13. The emotional speech generating apparatus of claim 12, wherein the emotion vector generator is configured to: generate new weight vectors by interpolating the representative weight vector of the neutral emotion group and the weight vectors included in the target emotion group based on a nonlinear interpolation ratio; and generate the new emotion group by grouping the generated new weight vectors.

14. The emotional speech generating apparatus of claim 13, further comprising: an emotion identifier configured to receive a text, and determine an emotional intensity corresponding to the text by analyzing the text, wherein the emotion vector generator is configured to determine the nonlinear interpolation ratio based on the determined emotional intensity.

15. The emotional speech generating apparatus of claim 12, wherein the representative weight vector of the neutral emotion group is determined based on an internal distance between the weight vectors included in the neutral emotion group and an external distance between the weight vectors included in the neutral emotion group and weight vectors included in another emotion group.
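Claims 4 to 6 describe intensity control by interpolating between the neutral representative and each target-group vector at a nonlinear ratio derived from an emotional intensity. The claims then select a representative of the resulting new group. The sketch below illustrates that flow under stated assumptions. The power curve `intensity ** gamma` with `gamma = 2.0` is a hypothetical choice of nonlinear mapping, since the claims do not specify one. Distances are assumed to be Euclidean. The external distance is assumed to be taken against the neutral and target groups combined, and the two distance criteria are merged as `internal_sum - external_sum`. All names are illustrative.

```python
import math
from typing import List

Vector = List[float]


def euclidean(a: Vector, b: Vector) -> float:
    # Assumed distance metric; the claims do not name one.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def interpolation_ratio(intensity: float, gamma: float = 2.0) -> float:
    # Hypothetical nonlinear mapping from an emotional intensity in [0, 1]
    # to an interpolation ratio. The claims only require the ratio to be
    # nonlinear; the power curve and gamma=2.0 are illustrative choices.
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must lie in [0, 1]")
    return intensity ** gamma


def adjust_intensity(neutral_rep: Vector, target_group: List[Vector],
                     intensity: float, gamma: float = 2.0) -> List[Vector]:
    # Interpolate each target-group vector with the neutral representative:
    # ratio 0 yields the neutral representative, ratio 1 the original vector.
    alpha = interpolation_ratio(intensity, gamma)
    return [[(1.0 - alpha) * n + alpha * t for n, t in zip(neutral_rep, v)]
            for v in target_group]


def representative_of_new_group(new_group: List[Vector],
                                neutral_group: List[Vector],
                                target_group: List[Vector]) -> Vector:
    # Assumption: external distances are summed over both the neutral and
    # the target groups, and the criteria are merged as internal - external.
    reference = neutral_group + target_group

    def score(candidate: Vector) -> float:
        internal = sum(euclidean(candidate, v) for v in new_group)
        external = sum(euclidean(candidate, v) for v in reference)
        return internal - external

    return min(new_group, key=score)


if __name__ == "__main__":
    neutral_group = [[1.0, 0.0]]
    target_group = [[0.0, 1.0], [0.2, 0.8]]
    neutral_rep = neutral_group[0]  # in practice chosen as in claims 7 and 8
    half = adjust_intensity(neutral_rep, target_group, intensity=0.5)
    print(half, representative_of_new_group(half, neutral_group, target_group))
```

With `gamma > 1`, low intensities stay close to the neutral style and the emotion strengthens quickly near full intensity. This is an illustrative design choice, not a curve taken from the patent.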

Assignees

Electronics & Telecommunications Res Inst, Univ Yonsei Iacf

Classifications

  • G10L13/08 (primary): Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination

  • Semantic analysis

  • G10L13/027 (primary): Concept to speech synthesisers; Generation of natural phrases from machine-based concepts (generation of parameters for speech synthesis out of text G10L13/08)

  • Voice editing, e.g. manipulating the voice of the synthesiser

  • for estimating an emotional state

Frequently asked questions


Who is the assignee on this patent?
Electronics & Telecommunications Res Inst, Univ Yonsei Iacf
What technology area does this patent fall under?
Primary CPC classification G10L13/08. Mapped technology areas include Physics.
When was this patent published?
Published Mar 25, 2021 (kind code A1).