Method and System for Non-Parametric Voice Conversion
US-2015127350-A1 · May 7, 2015 · US
US2016005403A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016005403-A1 |
| Application number | US-201514631464-A |
| Country | US |
| Kind code | A1 |
| Filing date | Feb 25, 2015 |
| Priority date | Jul 3, 2014 |
| Publication date | Jan 7, 2016 |
| Grant date | — |
Abstract
A device may receive data indicative of a plurality of speech sounds associated with first voice characteristics of a first voice. The device may receive an input indicative of speech associated with second voice characteristics of a second voice. The device may map at least one portion of the speech of the second voice to one or more speech sounds of the plurality of speech sounds of the first voice. The device may compare the first voice characteristics with the second voice characteristics based on the map. The comparison may include vocal tract characteristics, nasal cavity characteristics, and voicing characteristics. The device may determine a given representation configured to associate the first voice characteristics with the second voice characteristics. The device may provide an output indicative of pronunciations of the one or more speech sounds of the first voice according to the second voice characteristics based on the given representation.
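The abstract describes four steps: map, compare, derive a representation, and output. The sketch below walks through those steps under deliberately simple assumptions that do not come from the patent:

- Each speech sound is summarized as one log-magnitude spectrum, a list of dB values per frequency bin.
- The mapping is nearest-neighbour matching by Euclidean distance.
- The "representation" is the average per-bin offset between mapped pairs.

All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Spectrum = List[float]  # assumption: log-magnitude values in dB, one per frequency bin


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length spectra."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def map_frames(first_sounds: Dict[str, Spectrum],
               second_frames: List[Spectrum]) -> List[Tuple[int, str]]:
    """Pair each second-voice frame with its nearest first-voice speech sound.

    Nearest-neighbour matching is an illustrative choice, not the patent's method."""
    mapping = []
    for idx, frame in enumerate(second_frames):
        nearest = min(first_sounds,
                      key=lambda name: euclidean(first_sounds[name], frame))
        mapping.append((idx, nearest))
    return mapping


def difference_representation(first_sounds: Dict[str, Spectrum],
                              second_frames: List[Spectrum],
                              mapping: List[Tuple[int, str]]) -> Spectrum:
    """Average per-bin offset (second voice minus first voice) over mapped pairs."""
    n_bins = len(second_frames[0])
    totals = [0.0] * n_bins
    for idx, name in mapping:
        for k in range(n_bins):
            totals[k] += second_frames[idx][k] - first_sounds[name][k]
    return [t / len(mapping) for t in totals]


def render_in_second_voice(sound: Spectrum, representation: Spectrum) -> Spectrum:
    """Shift a first-voice spectrum by the offset so it takes on the second voice's envelope."""
    return [s + r for s, r in zip(sound, representation)]
```

Here is a small example. Take first-voice sounds `{"a": [0.0, -10.0, -20.0], "i": [-5.0, -5.0, -15.0]}` and second-voice frames `[[1.0, -9.0, -22.0], [-4.0, -4.0, -17.0]]`. The frames map to `"a"` and `"i"`, and the averaged offset is `[1.0, 1.0, -2.0]`. A single global offset ignores the vocal tract, nasal, and voicing distinctions the claims draw. It is only meant to make the data flow concrete.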
Claims (preview)
What is claimed is:
1. A method comprising: receiving, by a device that includes one or more processors, data indicative of a plurality of speech sounds associated with first voice characteristics of a first voice; receiving an input indicative of speech associated with second voice characteristics of a second voice; mapping at least one portion of the speech of the second voice to one or more speech sounds of the plurality of speech sounds of the first voice; comparing, based on the mapping, the first voice characteristics with the second voice characteristics, wherein the comparison includes vocal tract characteristics, nasal cavity characteristics, and voicing characteristics associated with a glottal formant or a spectral tilt between spectral features of a first speech sound of the first voice and corresponding spectral features of a second speech sound of the second voice; determining, based on the comparison, a given representation configured to associate the first voice characteristics with the second voice characteristics; and providing, based on the given representation, an output indicative of pronunciations of the one or more speech sounds of the first voice according to the second voice characteristics of the second voice.
2. The method of claim 1, wherein the vocal tract characteristics are associated with one or more of a vocal tract length or a vocal tract shape.
3. The method of claim 1, further comprising: determining acoustic feature representations for the one or more speech sounds of the first voice and the at least one portion of the speech of the second voice; and identifying, based on the acoustic feature representations, the first voice characteristics and the second voice characteristics.
4. The method of claim 3, wherein the glottal formant or the spectral tilt of the voicing characteristics are indicated by the acoustic feature representations, and wherein the nasal cavity characteristics are associated with spectral nulls indicated by the acoustic feature representations.
5. The method of claim 1, further comprising: determining one or more statistical models associated with the one or more speech sounds of the first voice, wherein the one or more statistical models are indicative of the first voice characteristics; and modifying, based on the given representation, the one or more statistical models such that the one or more statistical models are indicative of the second voice characteristics, wherein providing the output is based on the modification of the one or more statistical models.
6. The method of claim 5, wherein the one or more statistical models include a Hidden Markov Model (HMM) or a Deep Neural Network (DNN).
7. The method of claim 1, further comprising: receiving a speech corpus that includes the plurality of speech sounds associated with the first voice characteristics, wherein receiving the data includes receiving the speech corpus; and modifying, based on the given representation, the speech corpus to include the one or more speech sounds associated with the second voice characteristics, wherein providing the output is based on the modified speech corpus.
8. The method of claim 1, further comprising: determining, based on the comparison, a distortion representation that includes a frequency-warping component and a frequency-weighting component, wherein the frequency-warping component is configured to associate given vocal tract characteristics of the first voice with corresponding vocal tract characteristics of the second voice, and wherein the frequency-weighting component is configured to associate given voicing characteristics of the first voice with corresponding voicing characteristics of the second voice, and wherein the frequency-weighting component is also configured to associate given nasal cavity characteristics of the first voice with corresponding nasal cavity characteristics of the second voice, and wherein determining the given representation is based on the distortion representation.
9. The method of claim 8, further comprising: modifying the frequency-weighting component based on a smoothing modulation factor.
10. The method of claim 1, further comprising: determining the one or more speech sounds from within the plurality of speech sounds of the first voice based on an association between the one or more speech sounds and a linguistic term, wherein the linguistic term includes one or more of a phoneme or text; and determining the at least one portion of the speech of the second voice based on the at least one portion of the speech being associated also with the linguistic term, and wherein the mapping is based on the determination of the one or more speech sounds and the at least one portion of the speech, and wherein a given representation is associated with the linguistic term.
11. The method of claim 10, further comprising: determining a first vector that includes representations of the one or more speech sounds of the first voice, and a second vector that includes representations of the at least one portion of the speech of the second voice; and determining a third vector that includes association probabilities between the first vector and the second vector, wherein the mapping is based on the third vector.
12. The method of claim 11, wherein determining the mapping comprises a linear regression that includes repeating the determining of the third vector and the mapping based on the third vector until convergence of the linear regression.
13. A method comprising: receiving, by a device that includes one or more processors, data indicative of a plurality of speech sounds associated with first voice characteristics of a first voice; receiving a request for provision of speech content, wherein the request is indicative of the speech content having second voice characteristics of a second voice; determining, from within the plurality of speech sounds of the first voice, a sequence of speech sounds that corresponds to the speech content indicated by the request; receiving a plurality of representations configured to associate the first voice characteristics with the second voice characteristics, wherein the plurality of representations are indicative of a comparison between the first voice characteristics and the second voice characteristics, and wherein the comparison includes vocal tract characteristics, nasal cavity characteristics, and voicing characteristics associated with a glottal formant or a spectral tilt between spectral features of the first voice characteristics and corresponding spectral features of the second voice characteristics; modifying, based on the plurality of representations, the sequence of speech sounds of the first voice to have the second voice characteristics of the second voice; and providing, based on the modification, the speech content having the second voice characteristics of the second voice.
14. The method of claim 13, further comprising: determining, from within a plurality of linguistic terms, a sequence of linguistic terms that corresponds to the speech content indicated by the request, wherein the plurality of speech sounds of the first voice are associated with the plurality of linguistic terms, and wherein the plurality of linguistic terms include one or more of a phoneme or text, and wherein receiving the plurality of representations is based on an association between the plurality of linguistic terms and the plurality of representations.
15. The method of claim 13, wherein the vocal tract characteristics are associated with
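Claims 1 and 4 tie the voicing characteristics to a glottal formant or a spectral tilt, and the nasal cavity characteristics to spectral nulls. The claims do not say how these are measured. The sketch below assumes two common textbook definitions:

- Spectral tilt is the slope of a least-squares line through the dB spectrum.
- A spectral null is a bin that dips at least `depth_db` below both neighbours. The default of 6 dB is an arbitrary assumption.

The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def spectral_tilt(freqs_hz: Sequence[float], mags_db: Sequence[float]) -> float:
    """Slope of the least-squares line through the dB spectrum, in dB per kHz."""
    xs = [f / 1000.0 for f in freqs_hz]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(mags_db) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, mags_db))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def spectral_nulls(mags_db: Sequence[float], depth_db: float = 6.0) -> List[int]:
    """Bins at least depth_db below both neighbours (candidate nasal anti-resonances)."""
    return [k for k in range(1, len(mags_db) - 1)
            if mags_db[k] <= mags_db[k - 1] - depth_db
            and mags_db[k] <= mags_db[k + 1] - depth_db]


def compare_voicing_and_nasality(freqs_hz: Sequence[float],
                                 first_db: Sequence[float],
                                 second_db: Sequence[float],
                                 depth_db: float = 6.0) -> Tuple[float, List[int], List[int]]:
    """Tilt difference (second minus first voice) plus the null bins found in each voice."""
    tilt_delta = spectral_tilt(freqs_hz, second_db) - spectral_tilt(freqs_hz, first_db)
    return tilt_delta, spectral_nulls(first_db, depth_db), spectral_nulls(second_db, depth_db)
```

For example, `spectral_tilt([0, 1000, 2000, 3000], [0, -6, -12, -18])` returns `-6.0`, that is, a fall of 6 dB per kHz. A real system would smooth the spectrum and restrict the fit to a chosen band before measuring either quantity.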
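Claim 8 splits the distortion representation into two components. A frequency-warping component handles vocal tract differences. A frequency-weighting component handles voicing and nasal differences. Claim 9 then modifies the weighting with a "smoothing modulation factor". The sketch below makes several assumptions that the claims do not state:

- The warp is piecewise-linear, defined by `(source_hz, target_hz)` anchor pairs.
- The weighting is a per-bin dB offset.
- The modulation factor is simply a scalar multiplier on that offset. The claim does not define the factor, so this is only one possible reading.

All names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def interp(x: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Piecewise-linear interpolation; xs must be strictly increasing. Clamps at the ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect_right(xs, x)  # first index with xs[j] > x
    x0, x1, y0, y1 = xs[j - 1], xs[j], ys[j - 1], ys[j]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def warp(freqs_hz: List[float], mags_db: List[float],
         anchors: List[Tuple[float, float]]) -> List[float]:
    """Frequency-warping component: move spectral content from source_hz to target_hz.

    For each output frequency, invert the warp to find which source frequency
    lands there, then read the source spectrum at that frequency."""
    src = [s for s, _ in anchors]
    tgt = [t for _, t in anchors]
    return [interp(interp(f, tgt, src), freqs_hz, mags_db) for f in freqs_hz]


def apply_distortion(freqs_hz: List[float], mags_db: List[float],
                     anchors: List[Tuple[float, float]], weights_db: List[float],
                     modulation: float = 1.0) -> List[float]:
    """Warp first (vocal tract), then add the modulated weighting (voicing and nasality)."""
    warped = warp(freqs_hz, mags_db, anchors)
    return [m + modulation * w for m, w in zip(warped, weights_db)]
```

Consider the anchors `[(0, 0), (2000, 1000), (4000, 3000)]`. They move content at 2000 Hz down to 1000 Hz, which lowers the formants as a longer vocal tract would. Doing the warp before the weighting keeps the two components independent: the warp only relocates energy along the frequency axis, and the weighting only rescales it.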
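Claims 11 and 12 describe three vectors: a first vector for the first voice, a second vector for the second voice, and a third vector of association probabilities between them. They add a linear regression that repeats the probability step and the mapping step until convergence. The claims do not fix the probability model. The sketch below assumes a Gaussian soft assignment around the current affine mapping `y ≈ a*x + b`, followed by a weighted least-squares fit. This EM-style alternation is chosen for illustration, and features are scalars to keep it dependency-free. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def association_probabilities(xs: Sequence[float], ys: Sequence[float],
                              a: float, b: float, sigma: float) -> List[List[float]]:
    """Third 'vector' (as a matrix): P[j][i] = p(first-voice item i | second-voice item j).

    Assumes a Gaussian soft assignment around the current mapping y = a*x + b."""
    probs = []
    for y in ys:
        logs = [-((y - (a * x + b)) ** 2) / (2.0 * sigma ** 2) for x in xs]
        peak = max(logs)  # subtract the max before exp() to avoid underflow
        scores = [math.exp(v - peak) for v in logs]
        total = sum(scores)
        probs.append([s / total for s in scores])
    return probs


def weighted_linear_regression(xs: Sequence[float], ys: Sequence[float],
                               probs: List[List[float]]) -> Tuple[float, float]:
    """Weighted least squares for y = a*x + b, using P[j][i] as pair weights."""
    sw = swx = swy = swxx = swxy = 0.0
    for j, y in enumerate(ys):
        for i, x in enumerate(xs):
            w = probs[j][i]
            sw += w
            swx += w * x
            swy += w * y
            swxx += w * x * x
            swxy += w * x * y
    a = (sw * swxy - swx * swy) / (sw * swxx - swx * swx)  # needs non-constant xs
    b = (swy - a * swx) / sw
    return a, b


def fit_mapping(xs: Sequence[float], ys: Sequence[float], sigma: float = 1.0,
                tol: float = 1e-6, max_iter: int = 100):
    """Alternate probabilities and regression until the parameters stop moving."""
    a, b = 1.0, 0.0  # start from the identity mapping
    probs = association_probabilities(xs, ys, a, b, sigma)
    for _ in range(max_iter):
        new_a, new_b = weighted_linear_regression(xs, ys, probs)
        converged = abs(new_a - a) < tol and abs(new_b - b) < tol
        a, b = new_a, new_b
        probs = association_probabilities(xs, ys, a, b, sigma)
        if converged:
            break
    return a, b, probs
```

Take `xs = [0, 10, 20, 30]` and `ys = [1, 11, 21, 31]`. The loop settles on `a ≈ 1`, `b ≈ 1`, with each second-voice item assigned almost entirely to its counterpart. Like any EM-style scheme, the result depends on the starting mapping and on `sigma`: a poor initial guess can lock onto the wrong pairing.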