Voice conversion method and system

US8930183B2 · US · B2

Patent metadata
Field: Value
Publication number: US-8930183-B2
Application number: US-201113217628-A
Country: US
Kind code: B2
Filing date: Aug 25, 2011
Priority date: Mar 29, 2011
Publication date: Jan 6, 2015
Grant date: Jan 6, 2015

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method of converting speech from the characteristics of a first voice to the characteristics of a second voice, the method comprising: receiving a speech input from a first voice, dividing said speech input into a plurality of frames; mapping the speech from the first voice to a second voice; and outputting the speech in the second voice, wherein mapping the speech from the first voice to the second voice comprises, deriving kernels demonstrating the similarity between speech features derived from the frames of the speech input from the first voice and stored frames of training data for said first voice, the training data corresponding to different text to that of the speech input and wherein the mapping step uses a plurality of kernels derived for each frame of input speech with a plurality of stored frames of training data of the first voice.
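The framing and kernel steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the squared-exponential kernel, the `length_scale` parameter, the frame length and hop, and the use of raw samples as "speech features" are all assumptions made for illustration, and every function name is hypothetical.

```python
import math

def split_into_frames(samples, frame_len, hop):
    """Divide a sample sequence into overlapping fixed-length frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential similarity between two feature vectors.

    Assumed kernel choice; the abstract only says that kernels
    demonstrate similarity between speech features.
    """
    sq_dist = sum((u - v) ** 2 for u, v in zip(a, b))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))

def kernel_vectors(input_frames, training_frames, length_scale=1.0):
    """One kernel vector per input frame, one entry per stored training frame."""
    return [[rbf_kernel(f, t, length_scale) for t in training_frames]
            for f in input_frames]

# Toy data: raw samples stand in for speech features (an assumption).
samples = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
frames = split_into_frames(samples, frame_len=4, hop=2)

# Hypothetical stored training frames for the first voice.
stored = [[0.0, 0.1, 0.2, 0.3], [0.5, 0.5, 0.5, 0.5]]
K = kernel_vectors(frames, stored)
```

Each row of `K` holds the similarities between one input frame and every stored training frame, which mirrors the "plurality of kernels derived for each frame of input speech" wording.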

First claim

Opening claim text (preview).

The invention claimed is:

1. A method of converting speech from the characteristics of a first voice to the characteristics of a second voice, the method comprising: receiving a speech input from a first voice, dividing said speech input into a plurality of frames; in a processor, mapping the speech from the first voice to a second voice using a Gaussian process; and outputting the speech in the second voice, wherein mapping the speech from the first voice to the second voice comprises, deriving kernels demonstrating the similarity between speech features derived from the frames of the speech input from the first voice and stored frames of training data for said first voice, the training data corresponding to different text to that of the speech input and wherein the mapping step uses a plurality of kernels derived for each frame of input speech with a plurality of stored frames of training data of the first voice and using said plurality of kernels to define a non-parametric Gaussian process prior for said mapping.

2. A method according to claim 1, wherein kernels are derived for both static and dynamic speech features.

3. A method according to claim 1, wherein the speech to be output is determined according to a Gaussian Process predictive distribution:

$$p(y_t \mid x_t, x^*, y^*, \mathcal{M}) = \mathcal{N}(\mu(x_t), \Sigma(x_t)),$$

where $y_t$ is the speech vector for frame $t$ to be output, $x_t$ is the speech vector for the input speech for frame $t$, $x^*, y^*$ is $\{x_1^*, y_1^*\}, \ldots, \{x_N^*, y_N^*\}$, where $x_t^*$ is the $t$-th frame of training data for the first voice and $y_t^*$ is the $t$-th frame of training data for the second voice, $\mathcal{M}$ denotes the model, $\mu(x_t)$ and $\Sigma(x_t)$ are the mean and variance of the predictive distribution for given $x_t$.

4. A method according to claim 3, wherein

$$\mu(x_t) = m(x_t) + k_t^T \left[ K^* + \sigma^2 I \right]^{-1} (y^* - \mu^*),$$

$$\Sigma(x_t) = k(x_t, x_t) + \sigma^2 - k_t^T \left[ K^* + \sigma^2 I \right]^{-1} k_t,$$

where

$$\mu^* = \left[ m(x_1^*) \;\; m(x_2^*) \;\; \ldots \;\; m(x_N^*) \right]^T$$
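The predictive mean and variance in claims 3 and 4 can be sketched in a few lines of standard-library Python. This is an illustrative sketch only: it predicts a single scalar output rather than a full speech vector, and the squared-exponential kernel, the zero prior mean `mean_fn`, the `noise_var` value, and the toy training pairs are assumptions not taken from the patent.

```python
import math

def rbf(a, b, length_scale=1.0):
    """Squared-exponential kernel k(a, b); an assumed kernel choice."""
    sq = sum((u - v) ** 2 for u, v in zip(a, b))
    return math.exp(-sq / (2.0 * length_scale ** 2))

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z

def gp_predict(x_t, x_train, y_train, noise_var=1e-2, mean_fn=lambda x: 0.0):
    """Return (mu, var) of the GP predictive distribution at x_t."""
    n = len(x_train)
    # A = K* + sigma^2 I, built from the stored first-voice training frames.
    A = [[rbf(x_train[i], x_train[j]) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    # k_t: kernels between the input frame and every training frame.
    k_t = [rbf(x_t, x) for x in x_train]
    # y* - mu*: training targets minus the prior mean at the training inputs.
    resid = [y - mean_fn(x) for x, y in zip(x_train, y_train)]
    mu = mean_fn(x_t) + sum(k * a for k, a in zip(k_t, solve(A, resid)))
    var = rbf(x_t, x_t) + noise_var - sum(k * w for k, w in zip(k_t, solve(A, k_t)))
    return mu, var

# Hypothetical paired training frames (first voice -> second voice).
x_train = [[0.0], [1.0], [2.0]]
y_train = [0.0, 1.0, 4.0]

mu_near, var_near = gp_predict([1.0], x_train, y_train)
mu_far, var_far = gp_predict([100.0], x_train, y_train)
```

An input close to a stored training frame yields a prediction near the paired target with small variance, while an input far from all training frames falls back to the prior mean with variance close to `k(x_t, x_t) + noise_var`.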

Assignees

Inventors

Classifications

  • characterised by the process used · CPC title

  • Voice editing, e.g. manipulating the voice of the synthesiser · CPC title

  • Voice conversion or morphing · CPC title

  • Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility (G10L19/00 takes precedence) · CPC title

  • Training · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US8930183B2 cover?
A method of converting speech from the characteristics of a first voice to the characteristics of a second voice, the method comprising: receiving a speech input from a first voice, dividing said speech input into a plurality of frames; mapping the speech from the first voice to a second voice; and outputting the speech in the second voice, wherein mapping the speech from the fir…
Who is the assignee on this patent?
Chun Byung Ha, Gales Mark John Francis, Toshiba Kk
What technology area does this patent fall under?
Primary CPC classification G10L21/003. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 6, 2015 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).