US9324316B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9324316-B2 |
| Application number | US-201214004148-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 10, 2012 |
| Priority date | May 30, 2011 |
| Publication date | Apr 26, 2016 |
| Grant date | Apr 26, 2016 |
Abstract
There is provided a prosody generator that generates prosody information for implementing highly natural speech synthesis without unnecessarily collecting large quantities of learning data. A data dividing means 81 divides into subspaces the data space of a learning database as an assembly of learning data indicative of the feature quantities of speech waveforms. A density information extracting means 82 extracts density information indicative of the density state in terms of information quantity of the learning data in each of the subspaces divided by the data dividing means 81. A prosody information generating method selecting means 83 selects either a first method or a second method as a prosody information generating method based on the density information, the first method involving generating the prosody information using a statistical technique, the second method involving generating the prosody information using rules based on heuristics.
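The three-stage flow in the abstract (divide the data space, measure density per subspace, pick a generating method) can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: subspaces are keyed by a hypothetical tuple of linguistic features (mora count, accent-nucleus position, interrogative flag), borrowed from the features listed in claims 8 and 9; density is taken to be the plain sample count in each subspace; and the threshold `MIN_SAMPLES` is an arbitrary illustrative value. Following claim 1, a sparse subspace falls back to the heuristic (rule-based) method.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical threshold: subspaces with fewer samples than this are "sparse".
MIN_SAMPLES = 3


@dataclass(frozen=True)
class Sample:
    """One learning-data item (hypothetical schema for illustration)."""
    mora_count: int       # number of morae in the accent phrase
    accent_position: int  # position of the accent nucleus (0 = unaccented)
    interrogative: bool   # whether the sentence is a question
    f0_contour: tuple     # observed F0 values in Hz; payload, not part of the key


def divide_into_subspaces(samples):
    """Data dividing step: group learning data by a linguistic key."""
    subspaces = defaultdict(list)
    for s in samples:
        key = (s.mora_count, s.accent_position, s.interrogative)
        subspaces[key].append(s)
    return subspaces


def extract_density(subspaces):
    """Density extracting step: here simply the sample count per subspace."""
    return {key: len(members) for key, members in subspaces.items()}


def select_method(key, density):
    """Selecting step: statistical when dense, heuristic rules when sparse.
    A subspace never seen in the learning data has density 0 (sparse)."""
    return "statistical" if density.get(key, 0) >= MIN_SAMPLES else "rule_based"


if __name__ == "__main__":
    data = [Sample(4, 1, False, (220.0, 200.0, 180.0))] * 5
    data.append(Sample(7, 3, True, (210.0, 230.0)))
    density = extract_density(divide_into_subspaces(data))
    print(select_method((4, 1, False), density))  # statistical
    print(select_method((7, 3, True), density))   # rule_based
    print(select_method((2, 0, False), density))  # rule_based (unseen)
```

Counting samples is the simplest reading of "density in terms of information quantity"; the abstract leaves the exact measure open, and claim 5 names variance as one alternative.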
Claims
The invention claimed is:

1. A prosody generator, comprising: a data dividing unit implemented at least by a hardware including a processor and which divides into subspaces the data space of a learning database as an assembly of learning data indicative of feature quantities of speech waveforms; a density information extracting unit implemented at least by a hardware including a processor and which extracts density information indicative of a density state in terms of information quantity of the learning data in each of the subspaces divided by the data dividing unit, a prosody information generating method selecting unit implemented at least by a hardware including a processor and which selects either a first method or a second method as a prosody information generating method based on the density information, the first method involving generating the prosody information using a statistical technique, the second method involving generating the prosody information using rules based on heuristics, wherein the prosody information generating method selecting unit selects the second method when the density information indicates the density state is sparse; and an output unit which outputs a generated synthetic speech based on the prosody information.

2. The prosody generator according to claim 1, further comprising: a prosody generation model preparing unit implemented at least by a hardware including a processor and which prepares a prosody generation model representative of relations between speech and the prosody information by use of a learning database used to generate the density information.

3. The prosody generator according to claim 1, wherein the prosody information generating method selecting unit selects either the first method or the second method in accordance with a condition prepared on a basis of the density information.

4. The prosody generator according to claim 1, wherein the density information extracting unit extracts the density information using as the feature quantities a number of morae or accent positions in accent phrases.

5. The prosody generator according to claim 1, wherein the density information extracting unit obtains variances of the feature quantities indicated by the learning data as the density information.

6. The prosody generator according to claim 1, wherein the prosody information includes information that designates a sound pitch and a tempo of a synthesized speech.

7. The prosody generator according to claim 1, wherein the prosody information includes a time change of a fundamental frequency as a feature quantity representative of prosody.

8. The prosody generator according to claim 1, wherein the density information extracting unit determines the density state based on linguistic information including at least one of mora counts of accent phrases, relative positions of accent nuclei, and distinction of whether a given sentence is an interrogative sentence.

9. The prosody generator according to claim 1, wherein the density information extracting unit determines the density state based on linguistic information including mora counts of accent phrases, relative positions of accent nuclei, and distinction of whether a given sentence is an interrogative sentence.

10. A speech synthesizer, comprising: a data dividing unit implemented at least by a hardware including a processor and which divides into subspaces the data space of a learning database as an assembly of learning data indicative of feature quantities of speech waveforms; a density information extracting unit implemented at least by a hardware including a processor and which extracts density information indicative of a density state in terms of information quantity of the learning data in each of the subspaces divided by the data dividing unit; a prosody information generating method selecting unit implemented at least by a hardware including a processor and which selects either a first method or a second method as a prosody information generating method based on the density information, the first method involving generating the prosody information using a statistical technique, the second method involving generating the prosody information using rules based on heuristics; a prosody generating unit implemented at least by a hardware including a processor and which generates the prosody information by the prosody information generating method selected by the prosody information generating method selecting unit; a waveform generating unit implemented at least by a hardware including a processor and which generates a speech waveform using the prosody information, wherein the prosody information generating method selecting unit selects the second method when the density information indicates the density state is sparse; and an output unit which outputs a generated synthetic speech based on the speech waveform using the prosody information.

11. The speech synthesizer according to claim 10, wherein the prosody information includes information that designates a sound pitch and a tempo of a synthesized speech.

12. The speech synthesizer according to claim 10, wherein the prosody information includes a time change of a fundamental frequency as a feature quantity representative of prosody.

13. The speech synthesizer according to claim 10, wherein the density information extracting unit determines the density state based on linguistic information including mora counts of accent phrases, relative positions of accent nuclei, and distinction of whether a given sentence is an interrogative sentence.

14. A prosody generating method, implemented by a processor, the method comprising: dividing into subspaces the data space of a learning database as an assembly of learning data indicative of feature quantities of speech waveforms; extracting density information indicative of a density state in terms of information quantity of the learning data in each of the subspaces obtained by the division; selecting either a first method or a second method as a prosody information generating method based on the density information, the first method involving generating the prosody information using a statistical technique, the second method involving generating the prosody information using rules based on heuristics; in the selecting either the first method or the second method, selecting the second method when the density information indicates the density state is sparse; and outputting a generated synthetic speech based on the prosody information.

15. The prosody generating method according to claim 14, wherein the prosody information includes information that designates a sound pitch and a tempo of a synthesized speech.

16. The prosody generating method according to claim 14, wherein the prosody information includes a time change of a fundamental frequency as a feature quantity representative of prosody.

17. The prosody generating method according to claim 14, wherein, in the extracting density information, the density state is determined based on linguistic information including mora counts of accent phrases, relative positions of accent nuclei, and distinction of whether a given sentence is an interrogative sentence.
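Claim 5 names a different density measure: the variance of the feature quantities within each subspace, and claim 3 says the choice follows "a condition prepared on a basis of the density information". The sketch below illustrates one possible such condition, under assumptions that are not in the claims: each subspace is keyed by a hypothetical (mora count, accent position) pair as in claim 4, each learning sample contributes a single scalar feature (for example, the mean F0 of an accent phrase in Hz), and a subspace is treated as sparse when it has fewer than two samples (variance undefined) or when its variance exceeds an illustrative cutoff `MAX_VARIANCE`. The claims do not specify how variance maps onto "sparse", so the direction and size of the cutoff are assumptions.

```python
import statistics

# Hypothetical cutoff in Hz^2; larger variance is assumed to mean "sparse".
MAX_VARIANCE = 400.0


def variance_density(subspace_features):
    """Map each subspace key to the sample variance of its scalar features,
    or None when fewer than two samples leave the variance undefined."""
    result = {}
    for key, values in subspace_features.items():
        result[key] = statistics.variance(values) if len(values) >= 2 else None
    return result


def is_sparse(variance):
    """Assumed condition (cf. claim 3): undefined or large variance is sparse."""
    return variance is None or variance > MAX_VARIANCE


def choose(key, variances):
    """Heuristic rules for sparse (or unseen) subspaces, statistics otherwise."""
    return "rule_based" if is_sparse(variances.get(key)) else "statistical"


if __name__ == "__main__":
    features = {
        (3, 1): [200.0, 210.0, 205.0],  # tight cluster
        (5, 2): [150.0, 250.0],         # widely scattered
        (6, 0): [180.0],                # single sample
    }
    v = variance_density(features)
    for key in [(3, 1), (5, 2), (6, 0), (8, 4)]:
        print(key, v.get(key), choose(key, v))
```

Here `(3, 1)` has variance 25.0 and uses the statistical method, while `(5, 2)` (variance 5000.0), `(6, 0)` (one sample) and the unseen `(8, 4)` all fall back to rules. A real system would more likely combine sample count and spread than rely on either alone.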
Prosody rules derived from text; Stress or intonation · CPC title
Concept to speech synthesisers; Generation of natural phrases from machine-based concepts (generation of parameters for speech synthesis out of text G10L13/08) · CPC title