Speech synthesis device, speech synthesis method, and speech synthesis program

US9520125B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9520125-B2
Application number: US-201214131409-A
Country: US
Kind code: B2
Filing date: Jun 8, 2012
Priority date: Jul 11, 2011
Publication date: Dec 13, 2016
Grant date: Dec 13, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

There are provided a speech synthesis device, a speech synthesis method and a speech synthesis program which can represent a phoneme as a duration shorter than a duration upon modeling according to a statistical method. A speech synthesis device 80 according to the present invention includes a phoneme boundary updating means 81 which, by using a voiced utterance likelihood index which is an index indicating a degree of voiced utterance likelihood of each state which represents a phoneme modeled by a statistical method, updates a phoneme boundary position which is a boundary with other phonemes neighboring to the phoneme.
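
The boundary update described in the abstract can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the patent: the function names, the threshold value of 0.5, the representation of a boundary as the index of the first state of the right-hand phoneme, and in particular the direction rule (a state whose voicing matches the neighboring phoneme is handed over to that phoneme). The publication only says the boundary moves "in a predetermined direction according to the state".

```python
from typing import List


def classify_states(indices: List[float], threshold: float = 0.5) -> List[bool]:
    """Mark each state as voiced (True) when its index exceeds the threshold.

    The threshold value 0.5 is a placeholder, not a value from the patent.
    """
    return [value > threshold for value in indices]


def update_boundary(voiced: List[bool], boundary: int, left_is_voiced: bool) -> int:
    """Move a phoneme boundary by at most one state.

    `boundary` is the index of the first state of the right-hand phoneme.
    The two neighboring phonemes are assumed to differ in voicing, and
    `left_is_voiced` says which of them is the voiced one.
    """
    if boundary <= 0 or boundary >= len(voiced):
        return boundary  # no state on one side of the boundary
    before, after = voiced[boundary - 1], voiced[boundary]
    if before != after:
        return boundary  # voicing already changes at the boundary
    # Both states have the same voicing. Assumed rule: hand the state
    # next to the boundary to the phoneme whose voicing it matches.
    if before == left_is_voiced:
        return boundary + 1  # left phoneme absorbs the state after the boundary
    return boundary - 1  # right phoneme absorbs the state before the boundary


def durations_in_states(boundaries: List[int], total_states: int) -> List[int]:
    """Phoneme durations, counted in states, from sorted boundary positions."""
    edges = [0] + boundaries + [total_states]
    return [end - start for start, end in zip(edges, edges[1:])]


# Unvoiced phoneme (3 states) followed by a voiced phoneme (3 states).
indices = [0.1, 0.2, 0.8, 0.9, 0.95, 0.9]
voiced = classify_states(indices)
new_boundary = update_boundary(voiced, boundary=3, left_is_voiced=False)
print(new_boundary, durations_in_states([new_boundary], len(indices)))
```

In this example the last state of the unvoiced phoneme is classified as voiced, so the boundary moves one state earlier and the unvoiced phoneme becomes shorter than its modeled length, which is the effect the abstract describes.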

First claim

Opening claim text (preview).

The invention claimed is:

1. A speech synthesis device comprising: hardware including a processor, wherein the processor is configured to: by using a voiced utterance likelihood index which is an index indicating a degree of voiced utterance likelihood of each state which represents a phoneme modeled by a statistical method, update a phoneme boundary position which is a boundary with other phonemes neighboring to the phoneme; and calculate a duration of each phoneme based on the updated phoneme boundary position, and generate synthesized speech based on the calculated duration of phoneme, wherein, when a phoneme before and after a phoneme boundary is an unvoiced sound and a voiced sound, the processor is configured to determine whether a state before and after the phoneme boundary indicated a voiced state or an unvoiced state by using the voiced utterance likelihood index, and wherein, when the state before and after the phoneme boundary are both determined as voiced state or an unvoiced state, the processor is configured to update the phoneme boundary position to move in a predetermined direction according to the state.

2. The speech synthesis device according to claim 1, wherein the processor is further configured to specify whether or not each state which represents the phoneme indicates a voiced state or an unvoiced state, and, when one of the neighboring phonemes indicates the unvoiced sound and other one of the phonemes indicates a voiced sound, determine a moving direction of a phoneme boundary position according to a rule set in advance based on a relationship between the voiced state and the unvoiced state.

3. The speech synthesis device according to claim 2, wherein the processor is further configured to specify as the voiced state a state which represents a phoneme when the voiced utterance likelihood index exceeds a threshold set in advance, and specify as the unvoiced state a state which represents a phoneme when the voiced utterance likelihood index is the threshold set in advance or less.

4. The speech synthesis device according to claim 1, wherein the processor is further configured to update the phoneme boundary position based on a difference between voiced utterance likelihood indices of neighboring states.

5. The speech synthesis device according to claim 4, wherein, when the difference between the voiced utterance likelihood index of one of the neighboring states and the voiced utterance likelihood index of the other state exceeds the threshold set in advance, the processor is further configured to determine as the phoneme boundary position a position between the one state and the other state.

6. The speech synthesis device according to claim 1, wherein the processor is further configured to calculate a duration of the phoneme based on the updated phoneme boundary position.

7. The speech synthesis device according to claim 1, wherein the processor is further configured to update the phoneme boundary position in units of a length corresponding to a width of a state.

8. The speech synthesis device according to claim 1, wherein the processor is further configured to determine whether or not the voiced utterance likelihood index of each state is adequate and change the voiced utterance likelihood index which is determined to be inadequate to an adequate value.

9. The speech synthesis device according to claim 8, wherein, when voiced utterance likelihood determination information which is a result of determining the voiced state or the unvoiced state based on the voiced utterance likelihood index is switched two or more times in one phoneme or when the voiced utterance likelihood determination information of a target phoneme indicates information different from phonetic piece information which is information set in advance as information indicating a property of the phoneme, the processor is further configured to determine that the voiced utterance likelihood index is inadequate.

10. A speech synthesis method comprising, by using a voiced utterance likelihood index which is an index indicating a degree of voiced utterance likelihood of each state which represents a phoneme modeled by a statistical method, updating a phoneme boundary position which is a boundary with other phonemes neighboring to the phoneme; and calculating a duration of each phoneme based on the updated phoneme boundary position, and generating synthesized speech based on the calculated duration of phoneme, wherein, when a phoneme before and after a phoneme boundary is an unvoiced sound and a voiced sound, determining whether a state before and after the phoneme boundary indicates a voiced state or an unvoiced state by using the voiced utterance likelihood index, and wherein, when the state before and after the phoneme boundary are both determined as voiced state or an unvoiced state, updating the phoneme boundary position to move in a predetermined direction according to the state.

11. The speech synthesis method according to claim 10, further comprising specifying whether or not each state which represents the phoneme indicates a voiced state or an unvoiced state, and, when one of the neighboring phonemes indicates the unvoiced sound and other one of the phonemes indicates a voiced sound, determining a moving direction of a phoneme boundary position according to a rule set in advance based on a relationship between the voiced state and the unvoiced state.

12. The speech synthesis method according to claim 10, further comprising updating the phoneme boundary position based on a difference between voiced utterance likelihood indices of neighboring states.

13. A non-transitory computer readable information recording medium storing a speech synthesis program that, when executed by a processor, performs a method for: by using a voiced utterance likelihood index which is an index indicating a degree of voiced utterance likelihood of each state which represents a phoneme modeled by a statistical method, updating a phoneme boundary position which is a boundary with other phonemes neighboring to the phoneme; and calculating a duration of each phoneme based on the updated phoneme boundary position, and generating synthesized speech based on the calculated duration of phoneme, wherein, when a phoneme before and after a phoneme boundary is an unvoiced sound and a voiced sound, determining whether a state before and after the phoneme boundary indicates a voiced state or an unvoiced state by using the voiced utterance likelihood index, and wherein, when the state before and after the phoneme boundary are both determined as voiced state or an unvoiced state, updating the phoneme boundary position to move in a predetermined direction according to the state.

14. The non-transitory computer readable information recording medium according to claim 13, specifying whether or not each state which represents the phoneme indicates a voiced state or an unvoiced state, and, when one of the neighboring phonemes indicates the unvoiced sound and other one of the phonemes indicates a voiced sound, determining a moving direction of a phoneme boundary position according to a rule set in advance based on a relationship between the voiced state and the unvoiced state.

15. The non-transitory computer readable information recording medium according to claim 13, further comprising updating the phoneme boundary position based on a difference between voiced utterance likelihood indices of neighboring states.
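
Two of the dependent-claim ideas lend themselves to a short illustration: placing the boundary where the index jumps between neighboring states (claims 4 and 5), and flagging an index as inadequate (claims 8 and 9). The sketch below is hypothetical throughout. The function names, the choice of the first qualifying jump, and the way a phoneme's voiced/unvoiced determination is compared with its preset phonetic piece information (here: no state matches the preset voicing) are assumptions, since the claim text does not fix these details. How an inadequate index is changed to an adequate value is not stated in the claims and is left out.

```python
from typing import List, Optional


def boundary_from_difference(indices: List[float], threshold: float) -> Optional[int]:
    """Find a boundary between two neighboring states whose index
    difference exceeds the threshold.

    Returns i when the boundary lies between state i-1 and state i,
    or None when no difference is large enough. Taking the first such
    position is an assumption of this sketch.
    """
    for i in range(1, len(indices)):
        if abs(indices[i] - indices[i - 1]) > threshold:
            return i
    return None


def count_switches(voiced: List[bool]) -> int:
    """Count how often the voiced/unvoiced determination flips between states."""
    return sum(1 for a, b in zip(voiced, voiced[1:]) if a != b)


def is_inadequate(voiced: List[bool], phoneme_is_voiced: bool) -> bool:
    """Flag the per-state determinations of one phoneme as inadequate.

    Condition 1: the determination switches two or more times within
    the phoneme. Condition 2 (assumed comparison): no state agrees
    with the voicing preset for the phoneme.
    """
    if count_switches(voiced) >= 2:
        return True
    return bool(voiced) and all(v != phoneme_is_voiced for v in voiced)


print(boundary_from_difference([0.1, 0.15, 0.9, 0.95], threshold=0.5))
print(is_inadequate([True, False, True], phoneme_is_voiced=True))
```

The first call reports a boundary between the second and third states, where the index rises from 0.15 to 0.9. The second call flags the phoneme because its determination flips twice.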

Assignees

Inventors

Classifications

  • Duration · CPC title

  • G10L13/08 · Primary

    Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination · CPC title

  • G10L15/08 · Primary

    Speech classification or search · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9520125B2 cover?
There are provided a speech synthesis device, a speech synthesis method and a speech synthesis program which can represent a phoneme as a duration shorter than a duration upon modeling according to a statistical method. A speech synthesis device 80 according to the present invention includes a phoneme boundary updating means 81 which, by using a voiced utterance likelihood index which is an…
Who is the assignee on this patent?
Mitsui Yasuyuki, Kato Masanori, Kondo Reishi, and 1 more
What technology area does this patent fall under?
Primary CPC classification G10L13/08. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 13, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).