Factorized variational autoencoders
US-2019026631-A1 · Jan 24, 2019 · US
US10679129B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10679129-B2 |
| Application number | US-201816124977-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 7, 2018 |
| Priority date | Sep 28, 2017 |
| Publication date | Jun 9, 2020 |
| Grant date | Jun 9, 2020 |
Official abstract text for this publication.
Computer systems and methods generate a stochastic categorical autoencoder learning network (SCAN). The SCAN is trained to have an encoder network that outputs, subject to one or more constraints, parameters for parametric probability distributions of sample random variables from input data. The parameters comprise measures of central tendency and measures of dispersion. The one or more constraints comprise a first constraint that constrains a measure of a magnitude of a vector of the measures of central tendency as compared to a measure of a magnitude of a vector of the measures of dispersion. Thereafter, the sample random variables are generated from the parameters and a decoder is trained to output the input data from the sample random variables.
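The flow in the abstract (encoder outputs means and dispersions, a constraint limits the means relative to the dispersions, samples are drawn from the resulting distributions) can be illustrated with a minimal sketch. It is not the patented implementation. It assumes independent Gaussian latent variables, an L2 norm for both magnitudes, and a ratio bound as one possible reading of "constrains ... as compared to". The names `constrain_means`, `sample_latents`, and `max_ratio`, and the numeric values, are hypothetical, and the encoder and decoder networks are omitted.

```python
import math
import random


def l2_norm(vector):
    """Euclidean length of a list of floats."""
    return math.sqrt(sum(x * x for x in vector))


def constrain_means(means, stds, max_ratio=1.0):
    """Rescale means so that ||means|| <= max_ratio * ||stds||.

    Assumption: a simple rescaling stands in for whatever mechanism
    enforces the constraint during training.
    """
    limit = max_ratio * l2_norm(stds)
    mean_norm = l2_norm(means)
    if mean_norm <= limit or mean_norm == 0.0:
        return list(means)
    scale = limit / mean_norm
    return [m * scale for m in means]


def sample_latents(means, stds, rng):
    """Draw one value per latent variable from independent Gaussians."""
    return [rng.gauss(m, s) for m, s in zip(means, stds)]


# Made-up encoder output for a single input example.
raw_means = [3.0, -4.0]
stds = [0.6, 0.8]

means = constrain_means(raw_means, stds, max_ratio=2.0)
z = sample_latents(means, stds, random.Random(0))
```

In a full system the sampled `z` would be passed to the decoder, which is trained to reproduce the input data. Without the bound, nothing in this sketch would stop the means from growing without limit relative to the standard deviations.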
Opening claim text (preview).
What is claimed is:
1. A computer system for generating a stochastic categorical autoencoder network, the computer system comprising: a set of one or more processor cores; and computer memory in communication with the set of processor cores, wherein the computer memory stores software that when executed by the set of processor cores, causes the set of processor cores to train the stochastic categorical autoencoder network by performing steps that comprise: training an encoder network to output, subject to one or more constraints, parameters for parametric probability distributions of sample random variables from input data, wherein: the parameters comprise measures of central tendency and measures of dispersion; latent variables for the parametric probability distributions are unregularized such that the measures of central tendency tend to grow larger in magnitude relative to the measures of dispersion, subject to the one or more constraints; and the one or more constraints comprise a first constraint that constrains a measure of a magnitude of a vector of the measures of central tendency such that the measure of the magnitude of the vector of the measures of central tendency cannot grow arbitrarily large relative to a measure of a magnitude of a vector of the measures of dispersion; generating the sample random variables from the parameters; and training a decoder to output the input data from the sample random variables.
2. The computer system of claim 1, wherein: the encoder comprises a neural network; and the decoder comprises a neural network.
3. The computer system of claim 2, wherein the first constraint is that the measure of the magnitude of the vector of the measures of central tendency must be less than or equal to a first threshold value and the measure of the magnitude of the vector the measures of dispersion must be greater than or equal to a second threshold value.
4. The computer system of claim 3, wherein: the measures of central tendency comprise means; and the measures of dispersion comprise standard deviations.
5. The computer system of claim 3, wherein the first threshold value is equal to the second threshold value.
6. The computer system of claim 2, wherein: the measure of the magnitude of the vector the measure of dispersion is a pre-specified value; and the encoder is trained to generate the measures of central tendency based on the pre-specified value for the magnitude of the vector the measure of dispersion.
7. The computer system of claim 2, wherein: the measure of the magnitude of the vector of the measures of central tendency comprises a norm measure; and the measure of the magnitude of the vector of the measures of dispersion comprises a norm measure.
8. The computer system of claim 7, wherein: the measures of central tendency comprise means; and the measures of dispersion comprise standard deviations.
9. The computer system of claim 7, wherein the first constraint is that the measure of the magnitude of the vector of the measures of central tendency must be less than or equal to a first threshold value and the measure of the magnitude of the vector the measures of dispersion must be greater than or equal to a second threshold value.
10. The computer system of claim 7, wherein the norm measure for the measure of the magnitude of the vector of the measures of central tendency is different from the norm measure for the measure of the magnitude of the vector of the measures of dispersion.
11. The computer system of claim 7, wherein each of the norm measures for the measure of the magnitude of the vector of the measure of central tendency and the measure of the magnitude of the vector of the measures of dispersion comprises a norm measure selected from the group consisting of a sup norm, a L1 norm and a L2 norm.
12. The computer system of claim 2, wherein the measures of central tendency comprise means.
13. The computer system of claim 12, wherein the measures of dispersion comprise standard deviations.
14. The computer system of claim 1, wherein the probability distributions comprise independent Gaussian probability distributions.
15. The computer system of claim 1, wherein the probability distributions comprise Bernoulli distributions.
16. The computer system of claim 1, wherein the probability distributions comprise Poisson distributions.
17. The computer of claim 1, wherein the probability distributions comprise uniform distributions.
18. The computer system of claim 2, wherein the computer memory stores software that when executed by the set of processor cores, further causes the set of processor cores to augment a selected set of data by training, at least once, the stochastic categorical autoencoder network with the selected set of data to produce the augmented data, wherein training the stochastic categorical autoencoder network comprises training the stochastic categorical autoencoder network with a number of hyperparameters, including: a first hyperparameter that controls soft-tying nodes in the stochastic categorical autoencoder network; and a second hyperparameter that controls influence weights for data examples in the selected set of data.
19. The computer system of claim 18, wherein the computer memory stores software that when executed by the set of processor cores, causes the set of processor cores to augment the selected set of data by repetitively training the stochastic categorical autoencoder network with selected set of data, with each repetitive training after a first training using at least one different hyperparameter than an immediately prior training.
20. The computer system of claim 2, wherein the computer memory stores software that when executed by the set of processor cores, causes the set of processor cores to: implement a degradation regression system that is trained to estimate an amount of degradation in a pattern that is due to noise; and implement a denoising system that is trained to remove noise in the output of the decoder, wherein training the stochastic categorical autoencoder network comprises back-propagation through the degradation regression system and through the denoising system.
21. A method for generating a stochastic categorical autoencoder network, the method comprising training, with a computer system comprising one or more processor cores, the stochastic categorical autoencoder network, wherein training the stochastic categorical autoencoder network comprises: training an encoder network to output, subject to one or more constraints, parameters for parametric probability distributions of sample random variables from input data, wherein: the parameters comprise measures of central tendency and measures of dispersion; latent variables for the parametric probability distributions are unregularized such that the measures of central tendency tend to grow larger in magnitude relative to the measures of dispersion, subject to the one or more constraints; and the one or more constraints comprise a first constraint that constrains a measure of a magnitude of a vector of the measures of central tendency such that the measure of the magnitude of the vector of the measures of central tendency cannot grow arbitrarily large relative to a measure of a magnitude of a vector of the measures of dispersion; generating the sample random variables from the parameters; and training a decoder to output the input data from the sample random variables.
22. The
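The threshold form of the first constraint (claims 3, 5 and 9) and the choice of norm measure (claims 7, 10 and 11) can be made concrete with a small checker. This is an illustrative sketch only: the function name `satisfies_first_constraint`, its parameter names, and the default thresholds are hypothetical, and the claims do not say how the constraint is enforced during training.

```python
def sup_norm(vector):
    """Largest absolute component."""
    return max(abs(x) for x in vector)


def l1_norm(vector):
    """Sum of absolute components."""
    return sum(abs(x) for x in vector)


def l2_norm(vector):
    """Euclidean length."""
    return sum(x * x for x in vector) ** 0.5


# The three norm measures named in claim 11.
NORMS = {"sup": sup_norm, "l1": l1_norm, "l2": l2_norm}


def satisfies_first_constraint(means, stds, mean_norm="l2", std_norm="l2",
                               max_mean=1.0, min_std=1.0):
    """Check norm(means) <= max_mean and norm(stds) >= min_std.

    Assumptions: max_mean and min_std play the roles of the first and
    second threshold values; they default to the same value, as in claim 5.
    The two norms may differ, as in claim 10.
    """
    mean_magnitude = NORMS[mean_norm](means)
    std_magnitude = NORMS[std_norm](stds)
    return mean_magnitude <= max_mean and std_magnitude >= min_std


# Made-up parameter vectors for one input example.
means = [0.5, -0.5]
stds = [0.7, 0.7]

mixed_ok = satisfies_first_constraint(means, stds, mean_norm="sup", std_norm="l1")
swapped_ok = satisfies_first_constraint(means, stds, mean_norm="l1", std_norm="sup")
```

With these made-up vectors, the same parameters pass under one pairing of norms and fail under the other, which shows why the choice of norm measure is part of the constraint rather than a detail.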
Physics · mapped topic
Backpropagation, e.g. using gradient descent · CPC title
Non-supervised learning, e.g. competitive learning · CPC title