US9922641B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9922641-B1 |
| Application number | US-201213665390-A |
| Country | US |
| Kind code | B1 |
| Filing date | Oct 31, 2012 |
| Priority date | Oct 1, 2012 |
| Publication date | Mar 20, 2018 |
| Grant date | Mar 20, 2018 |
Official abstract text for this publication.
The subject matter of the disclosure is embodied in a method that includes receiving input speech data from a speaker in a first language, and estimating, based on a universal speech model, a speaker transform representing speaker characteristics associated with the input speech data. The method also includes accessing a speaker-independent speech model for generating speech data in a second language that is different from the first language. The method further includes modifying the speaker-independent speech model using the speaker transform to obtain a speaker-specific speech model, and generating speech data in the second language using the speaker-specific speech model.
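The adaptation step described in the abstract can be illustrated with a minimal sketch. It assumes the speaker transform is a per-dimension affine map (scale and shift) on cepstral mean vectors, estimated by matching the speaker's first and second moments to global statistics of the universal model. That estimation rule, the function names, and the toy data are all hypothetical simplifications chosen for illustration; the patent itself describes a universal speech model that may be a Gaussian mixture model and a transform that may be linear or non-linear.

```python
import numpy as np


def estimate_speaker_transform(frames, universal_mean, universal_std):
    """Estimate a per-dimension affine transform (scale, shift).

    Assumption: the transform maps the universal model's global mean and
    standard deviation onto those of the speaker's cepstral frames. This is
    a simplified stand-in, not the estimation procedure of the patent.
    """
    speaker_mean = frames.mean(axis=0)
    speaker_std = frames.std(axis=0)
    scale = speaker_std / universal_std
    shift = speaker_mean - scale * universal_mean
    return scale, shift


def adapt_model_means(si_means, scale, shift):
    """Apply the transform to each cepstral mean vector of the
    speaker-independent model, giving speaker-specific means."""
    return si_means * scale + shift


# Toy data: two frames of 2-dimensional cepstral features in the first language.
frames = np.array([[1.0, 2.0], [3.0, 6.0]])
universal_mean = np.zeros(2)
universal_std = np.ones(2)
scale, shift = estimate_speaker_transform(frames, universal_mean, universal_std)

# Hypothetical second-language speaker-independent model with two state means.
si_means = np.array([[0.0, 0.0], [1.0, 1.0]])
speaker_means = adapt_model_means(si_means, scale, shift)
```

In this sketch the transform is estimated only from first-language speech and then applied to a model trained for the second language, which mirrors the cross-lingual idea in the abstract. A real system would estimate the transform against the full universal model and would adapt more than the means.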
Opening claim text (preview).
What is claimed is:
1. A method comprising: receiving input speech data from a speaker in a first language; estimating, by a processor, based on a universal speech model, a set of speaker transform coefficients representing speaker characteristics associated with the input speech data; accessing a speaker-independent speech model for generating speech data in a second language that is different from the first language; modifying, by a processor, cepstral coefficients of the speaker-independent speech model using the estimated speaker transform coefficients to obtain cepstral coefficients of a speaker-specific speech model; and generating speech data in the second language using the speaker-specific speech model.
2. The method of claim 1, wherein the universal speech model includes a Gaussian mixture model that represents a plurality of speakers speaking one or more languages.
3. The method of claim 2, wherein the universal speech model includes a plurality of speech parameters estimated based on speech from the plurality of speakers.
4. The method of claim 1, wherein the speaker-independent speech model includes a plurality of hidden Markov models (HMMs).
5. The method of claim 4 further comprising training the plurality of HMMs by normalizing speech data from a second speaker speaking the second language, and by using a second speaker transform that represents speaker characteristics of the second speaker.
6. The method of claim 5, further comprising estimating the second speaker transform from the speech data of the second speaker.
7. The method of claim 1, wherein generating the speech in the second language comprises: generating transcription data from the input speech data; translating the transcription data from the first language to the second language; and generating the speech based on the translated data.
8. The method of claim 1, wherein generating the speech in the second language comprises: accessing text data in the second language; and generating the speech based on the accessed text data.
9. A system comprising: a speech synthesis engine including a processor, the speech synthesis engine configured to: receive input speech data from a speaker in a first language, estimate, based on a universal speech model, a speaker transform representing speaker characteristics associated with the input speech data, access a speaker-independent speech model for generating speech data in a second language that is different from the first language, modify the speaker-independent speech model using the speaker transform to obtain a speaker-specific speech model, and generate speech data in the second language using the speaker-specific speech model.
10. The system of claim 9, wherein the universal speech model includes a Gaussian mixture model that represents a plurality of speakers speaking one or more languages.
11. The system of claim 10, comprising a training engine configured to estimate a plurality of speech parameters of the universal speech model, based on speech from the plurality of speakers.
12. The system of claim 10, wherein the speaker-independent speech model includes a plurality of hidden Markov models (HMMs).
13. The system of claim 12 comprising a training engine configured to train the plurality of HMMs by normalizing speech data from a second speaker speaking the second language, and by using a second speaker transform that represents speaker characteristics of the second speaker.
14. The system of claim 13, wherein the training engine is configured to estimate the second speaker transform from the speech data of the second speaker.
15. The system of claim 9 comprising: a speech recognition engine configured to generate transcription data from the input speech data; and a translation engine configured to translate the transcription data from the first language to the second language, and provide the translated data to the speech synthesis engine for generating the speech data in the second language.
16. The system of claim 9 wherein the speech synthesis engine is configured to access text data in second language, and generate the speech based on the accessed speech data.
17. A computer program product comprising computer readable instructions encoded on a storage device, the instructions configured to cause one or more processors to: receive input speech data from a speaker in a first language, estimate, based on a universal speech model, a set of speaker transform coefficients representing speaker characteristics associated with the input speech data, access a speaker-independent speech model for generating speech data in a second language that is different from the first language, modify cepstral coefficients of the speaker-independent speech model using the estimated speaker transform coefficients to obtain cepstral coefficients of a speaker-specific speech model, and generate speech data in the second language using the speaker-specific speech model.
18. The computer program product of claim 17, wherein the universal speech model includes a Gaussian mixture model that represents a plurality of speakers speaking one or more languages.
19. The computer program product of claim 18, wherein the universal speech model includes a plurality of speech parameters estimated based on speech from the plurality of speakers.
20. The computer program product of claim 17, wherein the speaker-independent speech model includes a plurality of hidden Markov models (HMMs).
21. The computer program product of claim 20, wherein the computer readable instructions include instructions for training the plurality of HMMs by normalizing speech data from a second speaker speaking the second language, and by using a second speaker transform that represents speaker characteristics of the second speaker.
22. The computer program product of claim 21, wherein the computer readable instructions include instructions for estimating the second speaker transform from the speech data of the second speaker.
23. The computer program product of claim 17, wherein the computer readable instructions include instructions for: generating transcription data from the input speech data; translating the transcription data from the first language to the second language; and generating the speech based on the translated data.
24. The computer program product of claim 17, wherein the computer readable instructions includes instructions for: accessing text data in the second language; and generating the speech based on the accessed text data.
25. A method comprising: receiving input speech data from a speaker in a first language; estimating, by a processor, a set of speaker transform coefficients representing speaker characteristics associated with the input speech data, wherein the speaker transform is one of a linear transform and a non-linear transform; accessing a speaker-independent speech model for generating speech data in a second language that is different from the first language; modifying the speaker-independent speech model using the estimated speaker transform coefficients to obtain a speaker-specific speech model; and generating speech data in the second language using the speaker-specific speech model.
26. The method of claim 25, wherein the speaker specific speech model includes a set of adapted coefficients obtained by applying the speaker tra
to the speaker · CPC title
Phonemes, fenemes or fenones being the recognition units · CPC title
Training · CPC title
Feature extraction for speech recognition; Selection of recognition unit · CPC title
Voice conversion or morphing · CPC title