Cross-lingual speaker adaptation for multi-lingual speech synthesis

US9922641B1 · US · B1

Patent metadata
Publication number: US-9922641-B1
Application number: US-201213665390-A
Country: US
Kind code: B1
Filing date: Oct 31, 2012
Priority date: Oct 1, 2012
Publication date: Mar 20, 2018
Grant date: Mar 20, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The subject matter of the disclosure is embodied in a method that includes receiving input speech data from a speaker in a first language, and estimating, based on a universal speech model, a speaker transform representing speaker characteristics associated with the input speech data. The method also includes accessing a speaker-independent speech model for generating speech data in a second language that is different from the first language. The method further includes modifying the speaker-independent speech model using the speaker transform to obtain a speaker-specific speech model, and generating speech data in the second language using the speaker-specific speech model.
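The adaptation flow in the abstract can be illustrated with a small numerical sketch. Everything here is an assumption made for illustration: the page does not say how the transform is estimated, so the sketch uses a responsibility-weighted least-squares fit of a linear (affine) transform against the means of a toy universal Gaussian mixture model with unit spherical covariance. The function names, the toy 3-dimensional "cepstral" vectors, and the synthetic data are hypothetical and are not taken from the patent.

```python
import itertools
import numpy as np

def responsibilities(frames, means, weights):
    """Posterior of each mixture component per frame (unit spherical covariance assumed)."""
    # Squared distance between every frame and every component mean: shape (T, K).
    d2 = ((frames[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    log_p = np.log(weights)[None, :] - 0.5 * d2
    log_p -= log_p.max(axis=1, keepdims=True)  # subtract the row maximum for stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

def estimate_affine_transform(frames, means, weights):
    """Weighted least-squares fit of frame ~ A @ mean_k + b over all components k."""
    gamma = responsibilities(frames, means, weights)   # (T, K)
    n_comp, dim = means.shape
    ext = np.hstack([means, np.ones((n_comp, 1))])     # extended means [mu_k, 1]
    occ = gamma.sum(axis=0)                            # soft frame count per component
    G = (ext * occ[:, None]).T @ ext                   # sum_k occ_k * ext_k ext_k^T
    Z = (gamma.T @ frames).T @ ext                     # sum_k (sum_t gamma_tk x_t) ext_k^T
    W = Z @ np.linalg.pinv(G)                          # normal-equation solution, (D, D+1)
    return W[:, :dim], W[:, dim]

def adapt_means(si_means, A, b):
    """Apply the speaker transform to speaker-independent mean vectors."""
    return si_means @ A.T + b

# Toy universal model: 8 components at the corners of a cube, equal weights.
ubm_means = 10.0 * np.array(list(itertools.product([-1.0, 1.0], repeat=3)))
ubm_weights = np.full(8, 1.0 / 8.0)

# Synthetic "input speech" in the first language: universal-model means seen
# through a known speaker transform, so the estimate can be checked.
A_true = np.array([[1.05, 0.02, 0.00],
                   [-0.03, 0.95, 0.01],
                   [0.00, 0.04, 1.02]])
b_true = np.array([0.3, -0.2, 0.1])
labels = np.arange(400) % 8
frames = ubm_means[labels] @ A_true.T + b_true

A_hat, b_hat = estimate_affine_transform(frames, ubm_means, ubm_weights)

# Hypothetical speaker-independent means for the second-language model.
si_means = np.array([[1.0, 2.0, 3.0], [-4.0, 0.5, 2.5]])
speaker_specific_means = adapt_means(si_means, A_hat, b_hat)
```

The transform is estimated against the universal model rather than against a model of the second language, which is what lets it be reused on the second-language means. A production system would use full covariance statistics and real cepstral features.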

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving input speech data from a speaker in a first language; estimating, by a processor, based on a universal speech model, a set of speaker transform coefficients representing speaker characteristics associated with the input speech data; accessing a speaker-independent speech model for generating speech data in a second language that is different from the first language; modifying, by a processor, cepstral coefficients of the speaker-independent speech model using the estimated speaker transform coefficients to obtain cepstral coefficients of a speaker-specific speech model; and generating speech data in the second language using the speaker-specific speech model.

2. The method of claim 1, wherein the universal speech model includes a Gaussian mixture model that represents a plurality of speakers speaking one or more languages.

3. The method of claim 2, wherein the universal speech model includes a plurality of speech parameters estimated based on speech from the plurality of speakers.

4. The method of claim 1, wherein the speaker-independent speech model includes a plurality of hidden Markov models (HMMs).

5. The method of claim 4 further comprising training the plurality of HMMs by normalizing speech data from a second speaker speaking the second language, and by using a second speaker transform that represents speaker characteristics of the second speaker.

6. The method of claim 5, further comprising estimating the second speaker transform from the speech data of the second speaker.

7. The method of claim 1, wherein generating the speech in the second language comprises: generating transcription data from the input speech data; translating the transcription data from the first language to the second language; and generating the speech based on the translated data.

8. The method of claim 1, wherein generating the speech in the second language comprises: accessing text data in the second language; and generating the speech based on the accessed text data.

9. A system comprising: a speech synthesis engine including a processor, the speech synthesis engine configured to: receive input speech data from a speaker in a first language, estimate, based on a universal speech model, a speaker transform representing speaker characteristics associated with the input speech data, access a speaker-independent speech model for generating speech data in a second language that is different from the first language, modify the speaker-independent speech model using the speaker transform to obtain a speaker-specific speech model, and generate speech data in the second language using the speaker-specific speech model.

10. The system of claim 9, wherein the universal speech model includes a Gaussian mixture model that represents a plurality of speakers speaking one or more languages.

11. The system of claim 10, comprising a training engine configured to estimate a plurality of speech parameters of the universal speech model, based on speech from the plurality of speakers.

12. The system of claim 10, wherein the speaker-independent speech model includes a plurality of hidden Markov models (HMMs).

13. The system of claim 12 comprising a training engine configured to train the plurality of HMMs by normalizing speech data from a second speaker speaking the second language, and by using a second speaker transform that represents speaker characteristics of the second speaker.

14. The system of claim 13, wherein the training engine is configured to estimate the second speaker transform from the speech data of the second speaker.

15. The system of claim 9 comprising: a speech recognition engine configured to generate transcription data from the input speech data; and a translation engine configured to translate the transcription data from the first language to the second language, and provide the translated data to the speech synthesis engine for generating the speech data in the second language.

16. The system of claim 9 wherein the speech synthesis engine is configured to access text data in second language, and generate the speech based on the accessed speech data.

17. A computer program product comprising computer readable instructions encoded on a storage device, the instructions configured to cause one or more processors to: receive input speech data from a speaker in a first language, estimate, based on a universal speech model, a set of speaker transform coefficients representing speaker characteristics associated with the input speech data, access a speaker-independent speech model for generating speech data in a second language that is different from the first language, modify cepstral coefficients of the speaker-independent speech model using the estimated speaker transform coefficients to obtain cepstral coefficients of a speaker-specific speech model, and generate speech data in the second language using the speaker-specific speech model.

18. The computer program product of claim 17, wherein the universal speech model includes a Gaussian mixture model that represents a plurality of speakers speaking one or more languages.

19. The computer program product of claim 18, wherein the universal speech model includes a plurality of speech parameters estimated based on speech from the plurality of speakers.

20. The computer program product of claim 17, wherein the speaker-independent speech model includes a plurality of hidden Markov models (HMMs).

21. The computer program product of claim 20, wherein the computer readable instructions include instructions for training the plurality of HMMs by normalizing speech data from a second speaker speaking the second language, and by using a second speaker transform that represents speaker characteristics of the second speaker.

22. The computer program product of claim 21, wherein the computer readable instructions include instructions for estimating the second speaker transform from the speech data of the second speaker.

23. The computer program product of claim 17, wherein the computer readable instructions include instructions for: generating transcription data from the input speech data; translating the transcription data from the first language to the second language; and generating the speech based on the translated data.

24. The computer program product of claim 17, wherein the computer readable instructions includes instructions for: accessing text data in the second language; and generating the speech based on the accessed text data.

25. A method comprising: receiving input speech data from a speaker in a first language; estimating, by a processor, a set of speaker transform coefficients representing speaker characteristics associated with the input speech data, wherein the speaker transform is one of a linear transform and a non-linear transform; accessing a speaker-independent speech model for generating speech data in a second language that is different from the first language; modifying the speaker-independent speech model using the estimated speaker transform coefficients to obtain a speaker-specific speech model; and generating speech data in the second language using the speaker-specific speech model.

26. The method of claim 25, wherein the speaker specific speech model includes a set of adapted coefficients obtained by applying the speaker tra
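
The speech-to-speech variant in claims 7, 15 and 23 (transcribe the input, translate the transcription, synthesize in the second language) can be sketched as a three-stage pipeline. This is only a structural illustration: the class name, the stage signatures, and the stub stages are hypothetical, and a real system would plug in actual recognition, translation, and synthesis engines.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SpeechToSpeechPipeline:
    # All three stages are hypothetical callables supplied by the caller.
    transcribe: Callable[[bytes], str]    # first-language audio -> first-language text
    translate: Callable[[str], str]       # first-language text -> second-language text
    synthesize: Callable[[str], bytes]    # second-language text -> second-language audio

    def run(self, input_speech: bytes) -> bytes:
        transcription = self.transcribe(input_speech)
        translated = self.translate(transcription)
        # In the claimed method this stage would use the speaker-specific model.
        return self.synthesize(translated)

# Stub stages for illustration only: "audio" is just UTF-8 text here.
pipeline = SpeechToSpeechPipeline(
    transcribe=lambda audio: audio.decode("utf-8"),
    translate=lambda text: {"hello": "hola"}.get(text, text),
    synthesize=lambda text: text.encode("utf-8"),
)
output = pipeline.run(b"hello")
```

Claims 8, 16 and 24 describe the simpler path in which second-language text is accessed directly, which in this sketch corresponds to calling only the synthesis stage.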

Assignees

Inventors

Classifications

  • to the speaker · CPC title

  • Phonemes, fenemes or fenones being the recognition units · CPC title

  • G10L15/063 · Primary

    Training · CPC title

  • Feature extraction for speech recognition; Selection of recognition unit · CPC title

  • Voice conversion or morphing · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9922641B1 cover?
The subject matter of the disclosure is embodied in a method that includes receiving input speech data from a speaker in a first language, and estimating, based on a universal speech model, a speaker transform representing speaker characteristics associated with the input speech data. The method also includes accessing a speaker-independent speech model for generating speech data in a second la…
Who is the assignee on this patent?
Google Inc, Google LLC
What technology area does this patent fall under?
Primary CPC classification G10L15/063. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 20, 2018 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).