Prosody generator, speech synthesizer, prosody generating method and prosody generating program

US9324316B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9324316-B2
Application number  US-201214004148-A
Country             US
Kind code           B2
Filing date         May 10, 2012
Priority date       May 30, 2011
Publication date    Apr 26, 2016
Grant date          Apr 26, 2016

Abstract

Official abstract text for this publication.

There is provided a prosody generator that generates prosody information for implementing highly natural speech synthesis without unnecessarily collecting large quantities of learning data. A data dividing means 81 divides into subspaces the data space of a learning database as an assembly of learning data indicative of the feature quantities of speech waveforms. A density information extracting means 82 extracts density information indicative of the density state in terms of information quantity of the learning data in each of the subspaces divided by the data dividing means 81. A prosody information generating method selecting means 83 selects either a first method or a second method as a prosody information generating method based on the density information, the first method involving generating the prosody information using a statistical technique, the second method involving generating the prosody information using rules based on heuristics.
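The selection flow in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes each distinct (mora count, accent position) pair is one subspace, that density is simply the number of learning samples in that subspace, and that a subspace counts as sparse below a hypothetical `min_samples` threshold. All function names, the sample data, and the threshold are illustrative assumptions.

```python
from collections import Counter
from typing import Hashable, Iterable

STATISTICAL = "statistical"  # stands in for the abstract's "first method"
RULE_BASED = "rule_based"    # stands in for the abstract's "second method"


def extract_density(samples: Iterable[Hashable]) -> Counter:
    """Count learning samples per subspace (assumption: one key = one subspace)."""
    return Counter(samples)


def select_method(density: Counter, subspace: Hashable, min_samples: int = 3) -> str:
    """Choose heuristic rules where data is sparse, statistics elsewhere.

    min_samples is a hypothetical threshold, not a value from the patent.
    """
    return STATISTICAL if density[subspace] >= min_samples else RULE_BASED


# Hypothetical learning data: (mora count, accent position) per accent phrase.
learning_data = [(4, 1), (4, 1), (4, 1), (4, 1), (5, 2), (5, 2), (7, 3)]
density = extract_density(learning_data)

for key in [(4, 1), (5, 2), (9, 4)]:
    # A Counter returns 0 for unseen keys, so unseen subspaces count as sparse.
    print(key, density[key], select_method(density, key))
```

A count per subspace is only one possible reading of "density state in terms of information quantity"; the claims also mention variances as density information.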

First claim

Opening claim text (preview).

The invention claimed is:

1. A prosody generator, comprising: a data dividing unit implemented at least by a hardware including a processor and which divides into subspaces the data space of a learning database as an assembly of learning data indicative of feature quantities of speech waveforms; a density information extracting unit implemented at least by a hardware including a processor and which extracts density information indicative of a density state in terms of information quantity of the learning data in each of the subspaces divided by the data dividing unit, a prosody information generating method selecting unit implemented at least by a hardware including a processor and which selects either a first method or a second method as a prosody information generating method based on the density information, the first method involving generating the prosody information using a statistical technique, the second method involving generating the prosody information using rules based on heuristics, wherein the prosody information generating method selecting unit selects the second method when the density information indicates the density state is sparse; and an output unit which outputs a generated synthetic speech based on the prosody information.

2. The prosody generator according to claim 1, further comprising: a prosody generation model preparing unit implemented at least by a hardware including a processor and which prepares a prosody generation model representative of relations between speech and the prosody information by use of a learning database used to generate the density information.

3. The prosody generator according to claim 1, wherein the prosody information generating method selecting unit selects either the first method or the second method in accordance with a condition prepared on a basis of the density information.

4. The prosody generator according to claim 1, wherein the density information extracting unit extracts the density information using as the feature quantities a number of morae or accent positions in accent phrases.

5. The prosody generator according to claim 1, wherein the density information extracting unit obtains variances of the feature quantities indicated by the learning data as the density information.

6. The prosody generator according to claim 1, wherein the prosody information includes information that designates a sound pitch and a tempo of a synthesized speech.

7. The prosody generator according to claim 1, wherein the prosody information includes a time change of a fundamental frequency as a feature quantity representative of prosody.

8. The prosody generator according to claim 1, wherein the density information extracting unit determines the density state based on linguistic information including at least one of mora counts of accent phrases, relative positions of accent nuclei, and distinction of whether a given sentence is an interrogative sentence.

9. The prosody generator according to claim 1, wherein the density information extracting unit determines the density state based on linguistic information including mora counts of accent phrases, relative positions of accent nuclei, and distinction of whether a given sentence is an interrogative sentence.

10. A speech synthesizer, comprising: a data dividing unit implemented at least by a hardware including a processor and which divides into subspaces the data space of a learning database as an assembly of learning data indicative of feature quantities of speech waveforms; a density information extracting unit implemented at least by a hardware including a processor and which extracts density information indicative of a density state in terms of information quantity of the learning data in each of the subspaces divided by the data dividing unit; a prosody information generating method selecting unit implemented at least by a hardware including a processor and which selects either a first method or a second method as a prosody information generating method based on the density information, the first method involving generating the prosody information using a statistical technique, the second method involving generating the prosody information using rules based on heuristics; a prosody generating unit implemented at least by a hardware including a processor and which generates the prosody information by the prosody information generating method selected by the prosody information generating method selecting unit; a waveform generating unit implemented at least by a hardware including a processor and which generates a speech waveform using the prosody information, wherein the prosody information generating method selecting unit selects the second method when the density information indicates the density state is sparse; and an output unit which outputs a generated synthetic speech based on the speech waveform using the prosody information.

11. The speech synthesizer according to claim 10, wherein the prosody information includes information that designates a sound pitch and a tempo of a synthesized speech.

12. The speech synthesizer according to claim 10, wherein the prosody information includes a time change of a fundamental frequency as a feature quantity representative of prosody.

13. The speech synthesizer according to claim 10, wherein the density information extracting unit determines the density state based on linguistic information including mora counts of accent phrases, relative positions of accent nuclei, and distinction of whether a given sentence is an interrogative sentence.

14. A prosody generating method, implemented by a processor, the method comprising: dividing into subspaces the data space of a learning database as an assembly of learning data indicative of feature quantities of speech waveforms; extracting density information indicative of a density state in terms of information quantity of the learning data in each of the subspaces obtained by the division selecting either a first method or a second method as a prosody information generating method based on the density information, the first method involving generating the prosody information using a statistical technique, the second method involving generating the prosody information using rules based on heuristics; in the selecting either the first method or the second method, selecting the second method when the density information indicates the density state is sparse; and outputting a generated synthetic speech based on the prosody information.

15. The prosody generating method according to claim 14, wherein the prosody information includes information that designates a sound pitch and a tempo of a synthesized speech.

16. The prosody generating method according to claim 14, wherein the prosody information includes a time change of a fundamental frequency as a feature quantity representative of prosody.

17. The prosody generating method according to claim 14, wherein, in the extracting density information, the density state is determined based on linguistic information including mora counts of accent phrases, relative positions of accent nuclei, and distinction of whether a given sentence is an interrogative sentence.
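Claim 5 names variances of the feature quantities as one form of density information. The sketch below shows how such a variance could be computed per subspace; it is an illustration, not the claimed method. It assumes subspaces are keyed by mora count, that the feature quantity is a hypothetical fundamental-frequency value in Hz, and that a subspace is treated as sparse when it has fewer than two samples or its variance exceeds a hypothetical `max_variance`. The claims do not state how variance maps to sparseness, so that direction is an assumption.

```python
from collections import defaultdict
from statistics import pvariance


def variance_by_subspace(samples):
    """Return the population variance of feature values for each subspace key.

    Subspaces with fewer than two samples are omitted, since a single
    value carries no spread information.
    """
    groups = defaultdict(list)
    for key, value in samples:
        groups[key].append(value)
    return {k: pvariance(v) for k, v in groups.items() if len(v) >= 2}


def is_sparse(variances, key, max_variance=100.0):
    # ASSUMPTION: a missing subspace, or one whose values are widely
    # spread, is treated as sparse. max_variance is not from the patent.
    return key not in variances or variances[key] > max_variance


# Hypothetical learning data: (mora count, F0 value in Hz).
samples = [(4, 200.0), (4, 202.0), (4, 198.0),
           (6, 150.0), (6, 250.0),
           (8, 180.0)]
variances = variance_by_subspace(samples)

for mora_count in (4, 6, 8):
    print(mora_count, variances.get(mora_count), is_sparse(variances, mora_count))
```

Under the claims, a sparse result would lead to the rule-based (second) method being selected for that subspace.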

Classifications

  • G10L13/10 (Primary)

    Prosody rules derived from text; Stress or intonation

  • G10L13/027 (Primary)

    Concept to speech synthesisers; Generation of natural phrases from machine-based concepts (generation of parameters for speech synthesis out of text G10L13/08)

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Mitsui Yasuyuki, Kondo Reishi, Kato Masanori, and 1 more
What technology area does this patent fall under?
Primary CPC classification G10L13/10. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 26, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).