Providing text to speech from digital content on an electronic device
US-8990087-B1 · Mar 24, 2015 · US
US9798653B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9798653-B1 |
| Application number | US-77399810-A |
| Country | US |
| Kind code | B1 |
| Filing date | May 5, 2010 |
| Priority date | May 5, 2010 |
| Publication date | Oct 24, 2017 |
| Grant date | Oct 24, 2017 |
Official abstract text for this publication.
Adapted speech models produce fluent synthesized speech in a voice that sounds as if the speaker were fluent in a language in which the speaker is actually non-fluent. A full speech model is obtained based on fluent speech in the language spoken by a first person who is fluent in the language. A limited set of utterances is obtained in the language spoken by a second person who is non-fluent in the language but able to speak the limited set of utterances in the language. The full speech model of the first person is then processed with the limited set of utterances of the second person to produce an adapted speech model. The adapted speech model may be stored to a multi-lingual speech model as a child node of a root with an associated language selection question and branches pointed to the adapted speech model and other speech models, respectively.
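The multi-lingual speech model described in the abstract can be pictured as a small decision tree: a root carrying a language selection question, with one branch pointing to the adapted speech model and the other to the remaining speech models. The following is a minimal sketch of that structure only. The class names, the `select_model` helper, and the language codes are hypothetical illustrations and do not come from the patent text.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class SpeechModel:
    """Leaf of the tree: a speech model for one voice/language pair (hypothetical)."""
    name: str
    language: str


@dataclass
class LanguageQuestionNode:
    """Node holding a language selection question: 'is the text in `language`?'"""
    language: str
    yes_branch: Union["LanguageQuestionNode", SpeechModel]
    no_branch: Union["LanguageQuestionNode", SpeechModel]


def select_model(node: Union[LanguageQuestionNode, SpeechModel], language: str) -> SpeechModel:
    """Walk from the root, answering each language question, until a model is reached."""
    while isinstance(node, LanguageQuestionNode):
        node = node.yes_branch if language == node.language else node.no_branch
    return node


# Assumed example: the second person is fluent in "en" and non-fluent in "es",
# so the "es" branch points to the adapted model.
root = LanguageQuestionNode(
    language="es",
    yes_branch=SpeechModel(name="adapted-second-person-es", language="es"),
    no_branch=SpeechModel(name="full-second-person-en", language="en"),
)
print(select_model(root, "es").name)
```

Keeping the language question at the root means a text-to-speech engine could switch models per language segment of a multi-lingual phoneme stream while staying in one voice, which appears to be the purpose of the combined tree.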
Opening claim text (preview).
What is claimed is:
1. A system comprising: data storage for storing: a full speech model based on speech in a language spoken by a first person who is fluent in the language, a limited set of utterances in a fluent language of a second person based on speech spoken by the second person who is non-fluent in the language spoken by the first person, and a full speech model of the second person based on speech by the second person, and a processor configured to implement: a cross-language speech adapter that processes the full speech model based on speech in the language spoken by the first person and the limited set of utterances in the fluent language of the second person based on speech spoken by the second person who is non-fluent in the language spoken by the first person and outputs an adapted speech model, the processing including applying at least one transformation to the full speech model according to the limited set of utterances to produce the adapted speech model, and a tree combination unit the tree combination unit combining the full speech model of the second person based on speech by the second person and the adapted speech model with Text-to Speech (TTS) engine files of the adapted speech model and the full speech model of the second person, wherein the transformation includes a plurality of: (1) a constrained maximum likelihood linear regression (CMLLR) transformation, (2) a MLLR adaptation of the mean (MLLRMEAN) transformation, (3) a variance MLLR (MLLRVAR) transformation, and (4) a maximum a posteriori (MAP) linear regression transformation.
2. The system of claim 1 further comprising a text-to-speech (TTS) engine.
3. The system of claim 2 wherein the text-to-speech (TTS) engine outputs fluid synthesized speech.
4. The system of claim 3 wherein the text-to-speech (TTS) engine receives a multi-lingual phoneme stream.
5. The system of claim 4 wherein the multi-lingual phoneme stream was transformed from multi-lingual text by a text processor.
6. A method comprising: receiving at an input interface of a computer system having at least a processor and a memory in addition to the input and output interface, a full speech model based on speech in a language spoken by a first person who is fluent in the language; receiving at the input interface, a limited set of utterances in a fluent language of a second person based on speech spoken by the second person who is non-fluent in the language spoken by the first person; applying, in the computer system, a transformation technique with an adaptation module to the full speech model according to the limited set of utterances to produce a plurality of adapted speech models, wherein a cross-language speech adapter processes the full speech model based on speech in the language spoken by the first person and the limited set of utterances in the fluent language of the second person based on speech spoken by the second person who is non-fluent in the language spoken by the first person and outputs an adapted speech model, the processing including applying at least one transformation to the full speech model according to the limited set of utterances to produce the adapted speech model; and synthesizing, in the computer system, speech using each of the plurality of adapted speech models to generate a plurality of synthesized speech samples, wherein the transformation technique includes a plurality of: (1) a constrained maximum likelihood linear regression (CMLLR) transformation, (2) a MLLR adaptation of the mean (MLLRMEAN) transformation, (3) a variance MLLR (MLLRVAR) transformation, and (4) a maximum a posteriori (MAP) linear regression transformation.
7. The method of claim 6 wherein a plurality of speech samples are presented to the adaptation module for selection of one by the plurality of transformations that produced a synthesized speech sample having a voice that most closely resembles the voice of a second person and sounds as if the second person were fluent in the language.
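The claims name the transformation families (CMLLR, MLLRMEAN, MLLRVAR, MAP) but this excerpt does not show how they are estimated. As a rough illustration of the mean-adaptation idea only, the sketch below fits one shared affine transform (adapted mean = scale * mean + offset, per feature dimension) from the few model states covered by the limited utterances, then applies it to every state of the full model. This is a deliberately simplified stand-in: it uses a per-dimension least-squares fit rather than the maximum-likelihood estimation that MLLR uses, and every function name, the toy data, and the squared-error selection rule are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def fit_diagonal_mean_transform(
    source_means: List[Vector], target_means: List[Vector]
) -> Tuple[Vector, Vector]:
    """Least-squares fit of per-dimension scale/offset mapping source means to target means.

    Assumption: a simplified substitute for MLLR mean estimation, not the real algorithm.
    """
    n = len(source_means)
    scales, offsets = [], []
    for d in range(len(source_means[0])):
        xs = [m[d] for m in source_means]
        ys = [m[d] for m in target_means]
        mean_x, mean_y = sum(xs) / n, sum(ys) / n
        var_x = sum((x - mean_x) ** 2 for x in xs)
        cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        scale = cov_xy / var_x if var_x > 0 else 1.0  # fall back to a pure shift
        scales.append(scale)
        offsets.append(mean_y - scale * mean_x)
    return scales, offsets


def apply_mean_transform(means: List[Vector], scales: Vector, offsets: Vector) -> List[Vector]:
    """Apply the shared transform to every mean, including states never seen in adaptation."""
    return [[s * x + o for x, s, o in zip(m, scales, offsets)] for m in means]


def squared_error(adapted: List[Vector], reference: List[Vector]) -> float:
    """Total squared distance between two lists of mean vectors."""
    return sum((a - r) ** 2 for am, rm in zip(adapted, reference) for a, r in zip(am, rm))


def pick_closest(candidates: Dict[str, List[Vector]], reference: List[Vector]) -> str:
    """Hypothetical selection rule: keep the candidate nearest to reference statistics."""
    return min(candidates, key=lambda name: squared_error(candidates[name], reference))


# Toy data (invented): three states in the first person's full model.
full_model = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
# The second person's limited utterances only cover the first two states.
observed_targets = [[3.0, 5.0], [5.0, 9.0]]

scales, offsets = fit_diagonal_mean_transform(full_model[:2], observed_targets)
adapted_model = apply_mean_transform(full_model, scales, offsets)
print(adapted_model[2])  # -> [7.0, 13.0], an adapted mean for the unobserved state
```

The point of sharing one transform across states is that a small amount of adaptation data can still move the whole model toward the second person's voice. Claims 6 and 7 describe producing several adapted models and choosing among the resulting synthesized samples; `pick_closest` only gestures at that step, since the claims do not specify a numeric scoring rule.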
Handling natural language data (speech analysis or synthesis, speech recognition G10L) · CPC title
Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation · CPC title
Detection of language · CPC title
Accessing, addressing or allocating within memory systems or architectures (digital input from, or digital output to record carriers, e.g. to disk storage units, G06F3/06) · CPC title
Speech synthesis; Text to speech systems · CPC title