Methods and Systems for Voice Conversion

US2016005403A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2016005403-A1
Application number: US-201514631464-A
Country: US
Kind code: A1
Filing date: Feb 25, 2015
Priority date: Jul 3, 2014
Publication date: Jan 7, 2016
Grant date: (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A device may receive data indicative of a plurality of speech sounds associated with first voice characteristics of a first voice. The device may receive an input indicative of speech associated with second voice characteristics of a second voice. The device may map at least one portion of the speech of the second voice to one or more speech sounds of the plurality of speech sounds of the first voice. The device may compare the first voice characteristics with the second voice characteristics based on the map. The comparison may include vocal tract characteristics, nasal cavity characteristics, and voicing characteristics. The device may determine a given representation configured to associate the first voice characteristics with the second voice characteristics. The device may provide an output indicative of pronunciations of the one or more speech sounds of the first voice according to the second voice characteristics based on the given representation.
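The first claim ties the voicing characteristics in this comparison to a glottal formant or a spectral tilt. As a rough illustration only, the sketch below estimates spectral tilt as the slope of a straight line fitted to a frame's log-magnitude spectrum and then takes the difference between two voices. The function names, the Hann window, and the least-squares line fit are all assumptions made for this example; the text shown on this page does not say how the tilt is measured.

```python
import numpy as np

def spectral_tilt_db_per_khz(frame, sample_rate):
    """Estimate spectral tilt as the slope (dB per kHz) of a straight line
    fitted to the log-magnitude spectrum of one speech frame.
    Assumption: a simple line fit; the patent text here gives no formula."""
    windowed = frame * np.hanning(len(frame))        # reduce edge leakage
    magnitude = np.abs(np.fft.rfft(windowed))
    freqs_khz = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate) / 1000.0
    level_db = 20.0 * np.log10(magnitude + 1e-12)    # small offset avoids log(0)
    slope, _intercept = np.polyfit(freqs_khz, level_db, 1)
    return slope

def tilt_difference(frame_first, frame_second, sample_rate):
    """Hypothetical comparison: tilt of the second voice minus tilt of the
    first voice, for two frames already mapped to the same speech sound."""
    return (spectral_tilt_db_per_khz(frame_second, sample_rate)
            - spectral_tilt_db_per_khz(frame_first, sample_rate))
```

A negative difference would indicate that the second voice loses energy faster toward high frequencies than the first. A per-sound difference of this kind is one plausible input to a frequency-weighting correction.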

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving, by a device that includes one or more processors, data indicative of a plurality of speech sounds associated with first voice characteristics of a first voice; receiving an input indicative of speech associated with second voice characteristics of a second voice; mapping at least one portion of the speech of the second voice to one or more speech sounds of the plurality of speech sounds of the first voice; comparing, based on the mapping, the first voice characteristics with the second voice characteristics, wherein the comparison includes vocal tract characteristics, nasal cavity characteristics, and voicing characteristics associated with a glottal formant or a spectral tilt between spectral features of a first speech sound of the first voice and corresponding spectral features of a second speech sound of the second voice; determining, based on the comparison, a given representation configured to associate the first voice characteristics with the second voice characteristics; and providing, based on the given representation, an output indicative of pronunciations of the one or more speech sounds of the first voice according to the second voice characteristics of the second voice.

2. The method of claim 1, wherein the vocal tract characteristics are associated with one or more of a vocal tract length or a vocal tract shape.

3. The method of claim 1, further comprising: determining acoustic feature representations for the one or more speech sounds of the first voice and the at least one portion of the speech of the second voice; and identifying, based on the acoustic feature representations, the first voice characteristics and the second voice characteristics.

4. The method of claim 3, wherein the glottal formant or the spectral tilt of the voicing characteristics are indicated by the acoustic feature representations, and wherein the nasal cavity characteristics are associated with spectral nulls indicated by the acoustic feature representations.

5. The method of claim 1, further comprising: determining one or more statistical models associated with the one or more speech sounds of the first voice, wherein the one or more statistical models are indicative of the first voice characteristics; and modifying, based on the given representation, the one or more statistical models such that the one or more statistical models are indicative of the second voice characteristics, wherein providing the output is based on the modification of the one or more statistical models.

6. The method of claim 5, wherein the one or more statistical models include a Hidden Markov Model (HMM) or a Deep Neural Network (DNN).

7. The method of claim 1, further comprising: receiving a speech corpus that includes the plurality of speech sounds associated with the first voice characteristics, wherein receiving the data includes receiving the speech corpus; and modifying, based on the given representation, the speech corpus to include the one or more speech sounds associated with the second voice characteristics, wherein providing the output is based on the modified speech corpus.

8. The method of claim 1, further comprising: determining, based on the comparison, a distortion representation that includes a frequency-warping component and a frequency-weighting component, wherein the frequency-warping component is configured to associate given vocal tract characteristics of the first voice with corresponding vocal tract characteristics of the second voice, and wherein the frequency-weighting component is configured to associate given voicing characteristics of the first voice with corresponding voicing characteristics of the second voice, and wherein the frequency-weighting component is also configured to associate given nasal cavity characteristics of the first voice with corresponding nasal cavity characteristics of the second voice, and wherein determining the given representation is based on the distortion representation.

9. The method of claim 8, further comprising: modifying the frequency-weighting component based on a smoothing modulation factor.

10. The method of claim 1, further comprising: determining the one or more speech sounds from within the plurality of speech sounds of the first voice based on an association between the one or more speech sounds and a linguistic term, wherein the linguistic term includes one or more of a phoneme or text; and determining the at least one portion of the speech of the second voice based on the at least one portion of the speech being associated also with the linguistic term, and wherein the mapping is based on the determination of the one or more speech sounds and the at least one portion of the speech, and wherein a given representation is associated with the linguistic term.

11. The method of claim 10, further comprising: determining a first vector that includes representations of the one or more speech sounds of the first voice, and a second vector that includes representations of the at least one portion of the speech of the second voice; and determining a third vector that includes association probabilities between the first vector and the second vector, wherein the mapping is based on the third vector.

12. The method of claim 11, wherein determining the mapping comprises a linear regression that includes repeating the determining of the third vector and the mapping based on the third vector until convergence of the linear regression.

13. A method comprising: receiving, by a device that includes one or more processors, data indicative of a plurality of speech sounds associated with first voice characteristics of a first voice; receiving a request for provision of speech content, wherein the request is indicative of the speech content having second voice characteristics of a second voice; determining, from within the plurality of speech sounds of the first voice, a sequence of speech sounds that corresponds to the speech content indicated by the request; receiving a plurality of representations configured to associate the first voice characteristics with the second voice characteristics, wherein the plurality of representations are indicative of a comparison between the first voice characteristics and the second voice characteristics, and wherein the comparison includes vocal tract characteristics, nasal cavity characteristics, and voicing characteristics associated with a glottal formant or a spectral tilt between spectral features of the first voice characteristics and corresponding spectral features of the second voice characteristics; modifying, based on the plurality of representations, the sequence of speech sounds of the first voice to have the second voice characteristics of the second voice; and providing, based on the modification, the speech content having the second voice characteristics of the second voice.

14. The method of claim 13, further comprising: determining, from within a plurality of linguistic terms, a sequence of linguistic terms that corresponds to the speech content indicated by the request, wherein the plurality of speech sounds of the first voice are associated with the plurality of linguistic terms, and wherein the plurality of linguistic terms include one or more of a phoneme or text, and wherein receiving the plurality of representations is based on an association between the plurality of linguistic terms and the plurality of representations.

15. The method of claim 13, wherein the vocal tract characteristics are associated with
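
Claims 8 and 9 describe a distortion representation with a frequency-warping component (vocal tract) and a frequency-weighting component (voicing and nasal cavity), the latter adjusted by a smoothing modulation factor. The sketch below shows one way such a pair could be applied to a log-magnitude spectral envelope. It is an illustration under stated assumptions, not the claimed method: the linear warp `warp_alpha`, the per-bin gain `weight_db`, and the treatment of the modulation factor as a plain scalar are all hypothetical choices that do not come from the text on this page.

```python
import numpy as np

def convert_envelope(log_env_db, freqs_hz, warp_alpha, weight_db, modulation=1.0):
    """Apply a frequency warp and a frequency weighting to a spectral envelope.

    log_env_db : envelope of the first voice in dB, one value per frequency bin
    freqs_hz   : increasing bin frequencies in Hz
    warp_alpha : hypothetical linear warp factor; the output at frequency f is
                 read from the input at warp_alpha * f, so warp_alpha > 1 moves
                 spectral peaks to lower frequencies
    weight_db  : hypothetical per-bin gain in dB (scalar or array)
    modulation : hypothetical scalar that scales the weighting
    """
    # Frequency warping: resample the envelope along a rescaled frequency axis,
    # clipping so lookups stay inside the analysed band.
    source_freqs = np.clip(warp_alpha * freqs_hz, freqs_hz[0], freqs_hz[-1])
    warped = np.interp(source_freqs, freqs_hz, log_env_db)
    # Frequency weighting: additive in the dB domain, scaled by the modulation.
    return warped + modulation * np.asarray(weight_db)
```

For instance, with `warp_alpha = 1.25` an envelope peak at 1000 Hz would land at 800 Hz in the output, which is the kind of shift a longer vocal tract would produce. Keeping the warp and the weighting as separate steps mirrors the split in claim 8 between vocal tract effects and voicing or nasal effects.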

Assignees

Google Inc

Inventors

Classifications

  • Voice conversion or morphing · CPC title

  • for modelling vocal tract parameters · CPC title

  • Decision making techniques; Pattern matching strategies · CPC title

  • G10L15/07 · Primary

    to the speaker · CPC title

  • G10L17/00 · Primary

    Speaker identification or verification techniques · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016005403A1 cover?
A device may receive data indicative of a plurality of speech sounds associated with first voice characteristics of a first voice. The device may receive an input indicative of speech associated with second voice characteristics of a second voice. The device may map at least one portion of the speech of the second voice to one or more speech sounds of the plurality of speech sounds of the first…
Who is the assignee on this patent?
Google Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/07. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 7, 2016 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).