What technology area does this patent fall under?

Primary CPC classification: G10L15/063 (Training). Mapped technology areas include Physics.

When was this patent published?

Publication date: March 20, 2018 (kind code B1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Cross-lingual speaker adaptation for multi-lingual speech synthesis

US9922641B1 · US · B1

Patent metadata
Field	Value
Publication number	US-9922641-B1
Application number	US-201213665390-A
Country	US
Kind code	B1
Filing date	Oct 31, 2012
Priority date	Oct 1, 2012
Publication date	Mar 20, 2018
Grant date	Mar 20, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The subject matter of the disclosure is embodied in a method that includes receiving input speech data from a speaker in a first language, and estimating, based on a universal speech model, a speaker transform representing speaker characteristics associated with the input speech data. The method also includes accessing a speaker-independent speech model for generating speech data in a second language that is different from the first language. The method further includes modifying the speaker-independent speech model using the speaker transform to obtain a speaker-specific speech model, and generating speech data in the second language using the speaker-specific speech model.
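The adaptation step in the abstract can be illustrated with a toy sketch. It assumes the simplest possible speaker transform, a bias-only offset on cepstral mean vectors, estimated as the difference between the speaker's average cepstrum and the mean of a universal model. The function names, the bias-only form, and all numbers are illustrative assumptions; the patent covers linear and non-linear transforms and does not prescribe this estimator.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(frames: Sequence[Sequence[float]]) -> Vector:
    """Average equal-length cepstral frames, dimension by dimension."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def estimate_bias_transform(speaker_frames: Sequence[Sequence[float]],
                            universal_mean: Sequence[float]) -> Vector:
    """Toy estimator (assumption): offset between the speaker's average
    cepstrum and the universal model's mean."""
    speaker_mean = mean_vector(speaker_frames)
    return [s - u for s, u in zip(speaker_mean, universal_mean)]


def adapt_means(si_means: Sequence[Sequence[float]],
                bias: Sequence[float]) -> List[Vector]:
    """Shift every mean vector of the speaker-independent model by the bias
    to obtain speaker-specific means."""
    return [[m + b for m, b in zip(mean, bias)] for mean in si_means]


# Made-up two-dimensional data, for illustration only.
universal_mean = [0.0, 1.0]                  # mean of the universal speech model
speaker_frames = [[0.5, 1.5], [1.5, 0.5]]    # frames from first-language input speech
si_means = [[2.0, 2.0], [-1.0, 0.0]]         # second-language model state means

bias = estimate_bias_transform(speaker_frames, universal_mean)
adapted = adapt_means(si_means, bias)
print(bias)     # [1.0, 0.0]
print(adapted)  # [[3.0, 2.0], [0.0, 0.0]]
```

Because the transform is estimated against a language-independent reference and applied only to model parameters, the input speech and the synthesis model can be in different languages, which is the point of the cross-lingual setup described above.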

First claim

Opening claim text (preview).

What is claimed is:
1. A method comprising: receiving input speech data from a speaker in a first language; estimating, by a processor, based on a universal speech model, a set of speaker transform coefficients representing speaker characteristics associated with the input speech data; accessing a speaker-independent speech model for generating speech data in a second language that is different from the first language; modifying, by a processor, cepstral coefficients of the speaker-independent speech model using the estimated speaker transform coefficients to obtain cepstral coefficients of a speaker-specific speech model; and generating speech data in the second language using the speaker-specific speech model.
2. The method of claim 1, wherein the universal speech model includes a Gaussian mixture model that represents a plurality of speakers speaking one or more languages.
3. The method of claim 2, wherein the universal speech model includes a plurality of speech parameters estimated based on speech from the plurality of speakers.
4. The method of claim 1, wherein the speaker-independent speech model includes a plurality of hidden Markov models (HMMs).
5. The method of claim 4 further comprising training the plurality of HMMs by normalizing speech data from a second speaker speaking the second language, and by using a second speaker transform that represents speaker characteristics of the second speaker.
6. The method of claim 5, further comprising estimating the second speaker transform from the speech data of the second speaker.
7. The method of claim 1, wherein generating the speech in the second language comprises: generating transcription data from the input speech data; translating the transcription data from the first language to the second language; and generating the speech based on the translated data.
8. The method of claim 1, wherein generating the speech in the second language comprises: accessing text data in the second language; and generating the speech based on the accessed text data.
9. A system comprising: a speech synthesis engine including a processor, the speech synthesis engine configured to: receive input speech data from a speaker in a first language, estimate, based on a universal speech model, a speaker transform representing speaker characteristics associated with the input speech data, access a speaker-independent speech model for generating speech data in a second language that is different from the first language, modify the speaker-independent speech model using the speaker transform to obtain a speaker-specific speech model, and generate speech data in the second language using the speaker-specific speech model.
10. The system of claim 9, wherein the universal speech model includes a Gaussian mixture model that represents a plurality of speakers speaking one or more languages.
11. The system of claim 10, comprising a training engine configured to estimate a plurality of speech parameters of the universal speech model, based on speech from the plurality of speakers.
12. The system of claim 10, wherein the speaker-independent speech model includes a plurality of hidden Markov models (HMMs).
13. The system of claim 12 comprising a training engine configured to train the plurality of HMMs by normalizing speech data from a second speaker speaking the second language, and by using a second speaker transform that represents speaker characteristics of the second speaker.
14. The system of claim 13, wherein the training engine is configured to estimate the second speaker transform from the speech data of the second speaker.
15. The system of claim 9 comprising: a speech recognition engine configured to generate transcription data from the input speech data; and a translation engine configured to translate the transcription data from the first language to the second language, and provide the translated data to the speech synthesis engine for generating the speech data in the second language.
16. The system of claim 9 wherein the speech synthesis engine is configured to access text data in second language, and generate the speech based on the accessed speech data.
17. A computer program product comprising computer readable instructions encoded on a storage device, the instructions configured to cause one or more processors to: receive input speech data from a speaker in a first language, estimate, based on a universal speech model, a set of speaker transform coefficients representing speaker characteristics associated with the input speech data, access a speaker-independent speech model for generating speech data in a second language that is different from the first language, modify cepstral coefficients of the speaker-independent speech model using the estimated speaker transform coefficients to obtain cepstral coefficients of a speaker-specific speech model, and generate speech data in the second language using the speaker-specific speech model.
18. The computer program product of claim 17, wherein the universal speech model includes a Gaussian mixture model that represents a plurality of speakers speaking one or more languages.
19. The computer program product of claim 18, wherein the universal speech model includes a plurality of speech parameters estimated based on speech from the plurality of speakers.
20. The computer program product of claim 17, wherein the speaker-independent speech model includes a plurality of hidden Markov models (HMMs).
21. The computer program product of claim 20, wherein the computer readable instructions include instructions for training the plurality of HMMs by normalizing speech data from a second speaker speaking the second language, and by using a second speaker transform that represents speaker characteristics of the second speaker.
22. The computer program product of claim 21, wherein the computer readable instructions include instructions for estimating the second speaker transform from the speech data of the second speaker.
23. The computer program product of claim 17, wherein the computer readable instructions include instructions for: generating transcription data from the input speech data; translating the transcription data from the first language to the second language; and generating the speech based on the translated data.
24. The computer program product of claim 17, wherein the computer readable instructions includes instructions for: accessing text data in the second language; and generating the speech based on the accessed text data.
25. A method comprising: receiving input speech data from a speaker in a first language; estimating, by a processor, a set of speaker transform coefficients representing speaker characteristics associated with the input speech data, wherein the speaker transform is one of a linear transform and a non-linear transform; accessing a speaker-independent speech model for generating speech data in a second language that is different from the first language; modifying the speaker-independent speech model using the estimated speaker transform coefficients to obtain a speaker-specific speech model; and generating speech data in the second language using the speaker-specific speech model.
26. The method of claim 25, wherein the speaker specific speech model includes a set of adapted coefficients obtained by applying the speaker tra
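The three-stage route recited in claims 7, 15, and 23 (transcribe, translate, synthesize) can be pictured as a simple composition of stages. The sketch below is only a structural illustration: `speech_to_speech` and the three placeholder stages are hypothetical names, and the stand-in functions do no real recognition, translation, or synthesis.

```python
from typing import Callable


def speech_to_speech(audio: bytes,
                     recognize: Callable[[bytes], str],
                     translate: Callable[[str], str],
                     synthesize: Callable[[str], bytes]) -> bytes:
    """Chain the three stages: transcription, translation, synthesis."""
    transcription = recognize(audio)        # first-language text
    translated = translate(transcription)   # second-language text
    return synthesize(translated)           # second-language speech data


# Placeholder stages (assumptions, for illustration only).
def fake_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_translate(text: str) -> str:
    return {"hello": "hola"}.get(text, text)


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


result = speech_to_speech(b"hello", fake_recognize, fake_translate, fake_synthesize)
print(result)  # b'hola'
```

In a real system the synthesis stage would be the speaker-specific model obtained by adaptation, so the output keeps the original speaker's characteristics.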

Assignees

Google Inc, Google LLC

Inventors

Chun Byung Ha

Classifications

G10L15/07
to the speaker · CPC title
G10L2015/025
Phonemes, fenemes or fenones being the recognition units · CPC title
G10L15/063 (Primary)
Training · CPC title
G10L15/02
Feature extraction for speech recognition; Selection of recognition unit · CPC title
G10L2021/0135
Voice conversion or morphing · CPC title

Patent family

Related publications grouped by family.

View patent family 61600120

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9922641B1 cover?: The subject matter of the disclosure is embodied in a method that includes receiving input speech data from a speaker in a first language, and estimating, based on a universal speech model, a speaker transform representing speaker characteristics associated with the input speech data. The method also includes accessing a speaker-independent speech model for generating speech data in a second la…
Who is the assignee on this patent?: Google Inc, Google LLC