Synthesizing speech from text using neural networks
US-2020051583-A1 · Feb 13, 2020 · US
US11636283B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11636283-B2 |
| Application number | US-202016889125-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 1, 2020 |
| Priority date | Sep 27, 2018 |
| Publication date | Apr 25, 2023 |
| Grant date | Apr 25, 2023 |
Official abstract text for this publication.
A variational autoencoder (VAE) neural network system, comprising an encoder neural network to encode an input data item to define a posterior distribution for a set of latent variables, and a decoder neural network to generate an output data item representing values of a set of latent variables sampled from the posterior distribution. The system is configured for training with an objective function including a term dependent on a difference between the posterior distribution and a prior distribution. The prior and posterior distributions are arranged so that they cannot be matched to one another. The VAE system may be used for compressing and decompressing data.
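As a rough illustration of why a prior and posterior that "cannot be matched" matter, consider the Gaussian case described in claim 4, where the posterior variance differs from the prior variance by a factor α≠1. The sketch below is not taken from the patent. It assumes scalar Gaussians and a unit-variance prior, and the function names `gaussian_kl` and `kl_floor` are hypothetical. Under those assumptions, it shows that the divergence term of the objective has a strictly positive minimum, so the latent variables always carry some information.

```python
import math


def gaussian_kl(mu_q, var_q, mu_p=0.0, var_p=1.0):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for scalar Gaussians, in nats."""
    return 0.5 * (
        math.log(var_p / var_q)
        + (var_q + (mu_q - mu_p) ** 2) / var_p
        - 1.0
    )


def kl_floor(alpha):
    """Smallest KL reachable when var_q = alpha * var_p (means matched).

    Assumption for illustration: the variance ratio alpha is fixed by design.
    """
    return 0.5 * (alpha - 1.0 - math.log(alpha))


if __name__ == "__main__":
    for alpha in (0.25, 0.5, 1.0, 2.0):
        # With matched means the KL equals the floor; any mean offset adds to it.
        print(alpha, kl_floor(alpha), gaussian_kl(0.3, alpha))
```

The floor is zero only at α=1, which is the case the claim excludes. Moving the posterior mean away from the prior mean can only increase the divergence above this floor.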
Opening claim text (preview).
The invention claimed is:

1. A system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to implement a variational autoencoder neural network system, the variational autoencoder neural network system comprising: a subsystem configured to determine respective values of a set of latent variables for each of a plurality of time steps; and a decoder neural network configured to receive the respective values of the set of latent variables for the time steps and to generate an output data item representing the respective values of the set of latent variables, wherein the output data item comprises a sequence of output data item values that each correspond to a respective one of the plurality of time steps, and wherein the decoder neural network is an autoregressive neural network that is configured to generate each output data item value in the sequence conditional upon (i) any previously generated output data item values in the sequence and (ii) only the values of the set of latent variables for time steps that correspond to output data item values that have yet to be generated.
2. The system of claim 1, wherein the variational autoencoder neural network system further comprises: an encoder neural network configured to encode an input data item to determine a sequence comprising a respective set of parameters for each of a plurality of time steps that defines a respective posterior distribution for the set of latent variables for the time step, wherein determining respective values of a set of latent variables for each of the plurality of time steps comprises sampling from the respective posterior distributions for the sets of latent variables for the time steps.
3. The system of claim 2, wherein the variational autoencoder neural network system is configured for training with an objective function which has a first term dependent upon a difference between the input data item and the output data item and a second term dependent upon a difference between the respective posterior distributions and a set of second, prior distributions of the set of latent variables, and wherein a structure of the prior distributions is different to a structure of the posterior distributions such that the posterior distributions cannot be matched to the prior distributions.
4. The system of claim 3 wherein each posterior distribution and each prior distribution each comprise a multivariate Gaussian distribution and wherein a variance of the posterior distributions is a factor of α different to a variance of the prior distributions, where α≠1.
5. The system of claim 3 wherein the prior distribution comprises an autoregressive distribution such that at each time step the prior distribution depends on the prior distribution at a previous time step.
6. The system of claim 5 wherein the respective prior distribution at each time step t is defined by a sum of α times the values of the set of latent variables at a previous time step t−1 and a noise component, where |α|<1.
7. The system as claimed in claim 1 wherein determining respective values of a set of latent variables for each of a plurality of time steps comprises, for each time step, sampling the values of the set of latent variables for the time step from a prior distribution for the time step.
8. The system of claim 1, wherein determining respective values of a set of latent variables for each of a plurality of time steps comprises, for each time step, sampling the values of the set of latent variables for the time step from an auxiliary prior distribution for the time step generated by an auxiliary neural network.
9. The system of claim 8, wherein the auxiliary neural network has been trained jointly with the decoder neural network and an encoder neural network.
10. One or more non-transitory computer-readable storage media storing instructions that when executed by one or more computers cause the one or more computers to implement a variational autoencoder neural network system, the variational autoencoder neural network system comprising: a subsystem configured to determine respective values of a set of latent variables for each of a plurality of time steps; and a decoder neural network configured to receive the respective values of the set of latent variables for the time steps and to generate an output data item representing the values of the set of latent variables, wherein the output data item comprises a sequence of output data item values that each correspond to a respective one of the plurality of time steps, and wherein the decoder neural network is an autoregressive neural network that is configured to generate each output data item value in the sequence conditional upon (i) any previously generated output data item values in the sequence and (ii) only the values of the set of latent variables for time steps that correspond to output data item values that have yet to be generated.
11. The non-transitory computer-readable storage media of claim 10, wherein the variational autoencoder neural network system further comprises: an encoder neural network configured to encode an input data item to determine a sequence comprising a respective set of parameters for each of a plurality of time steps that defines a respective posterior distribution for the set of latent variables for the time step, wherein determining respective values of a set of latent variables for each of the plurality of time steps comprises sampling from the respective posterior distributions for the sets of latent variables for the time steps.
12. A method performed by one or more computers, the method comprising: determining respective values of a set of latent variables for each of a plurality of time steps; and processing the respective values of the set of latent variables for each of the plurality of time steps using a decoder neural network configured to receive the respective values of the set of latent variables for the time steps and to generate an output data item representing the values of the set of latent variables, wherein the output data item comprises a sequence of output data item values that each correspond to a respective one of the plurality of time steps, and wherein the decoder neural network is an autoregressive neural network that is configured to generate each output data item value in the sequence conditional upon (i) any previously generated output data item values in the sequence and (ii) only the values of the set of latent variables for time steps that correspond to output data item values that have yet to be generated.
13. The method of claim 12, wherein determining respective values of a set of latent variables for each of the plurality of time steps comprises: processing an input data item using an encoder neural network configured to encode the input data item to determine a sequence comprising a respective set of parameters for each of a plurality of time steps that defines a respective posterior distribution for the set of latent variables for the time step, and sampling from the respective posterior distributions for the sets of latent variables for the time steps to determine the respective values of the set of latent variables for the time steps.
14. The method of claim 13, further comprising training the encoder neural network and the decoder neural network to optimize an objective function which has a first term dependent upon a difference between the input data item and the output data item and a second term dependent upon a difference between the respective posterior
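
The claim preview ends here. For readers who want a concrete picture of claims 1 and 5 to 7, the following is a minimal sketch, not the patented implementation. It assumes one scalar latent per time step, a noise scale chosen so the prior keeps unit variance, and a toy arithmetic rule standing in for the decoder neural network. It also assumes that the latent for the step currently being generated counts as "yet to be generated". All function names are hypothetical.

```python
import math
import random


def sample_ar1_prior(num_steps, alpha, rng):
    """Sample z_t = alpha * z_{t-1} + noise with |alpha| < 1 (claims 5 to 7)."""
    if num_steps < 1 or not abs(alpha) < 1.0:
        raise ValueError("need num_steps >= 1 and |alpha| < 1")
    # Assumption: noise variance 1 - alpha^2 keeps each z_t at unit variance.
    noise_std = math.sqrt(1.0 - alpha ** 2)
    latents = [rng.gauss(0.0, 1.0)]
    for _ in range(num_steps - 1):
        latents.append(alpha * latents[-1] + rng.gauss(0.0, noise_std))
    return latents


def latents_visible_at(latents, t):
    """Latents for time steps whose outputs have not been generated yet.

    Assumption: step t itself is included, since output t is still pending.
    """
    return latents[t:]


def toy_autoregressive_decode(latents):
    """Toy stand-in for the decoder: each output depends on (i) the previous
    output and (ii) only the latents returned by latents_visible_at."""
    outputs = []
    for t in range(len(latents)):
        future = latents_visible_at(latents, t)
        previous = outputs[-1] if outputs else 0.0
        outputs.append(0.5 * previous + sum(future) / len(future))
    return outputs


if __name__ == "__main__":
    rng = random.Random(0)
    z = sample_ar1_prior(num_steps=6, alpha=0.9, rng=rng)
    print(toy_autoregressive_decode(z))
```

A real decoder would replace the arithmetic in `toy_autoregressive_decode` with a trained autoregressive network. The point of the sketch is only the conditioning pattern: past outputs plus latents for the remaining time steps.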
Weakly supervised learning, e.g. semi-supervised or self-supervised learning · CPC title
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Generative networks · CPC title