DOI: 10.5555/3044805.3045035
Article

Stochastic backpropagation and approximate inference in deep generative models

Published: 21 June 2014

Abstract

We marry ideas from deep neural networks and approximate Bayesian inference to derive a generalised class of deep, directed generative models, endowed with a new algorithm for scalable inference and learning. Our algorithm introduces a recognition model to represent an approximate posterior distribution and uses this for optimisation of a variational lower bound. We develop stochastic backpropagation - rules for gradient backpropagation through stochastic variables - and derive an algorithm that allows for joint optimisation of the parameters of both the generative and recognition models. We demonstrate on several real-world data sets that by using stochastic backpropagation and variational inference, we obtain models that are able to generate realistic samples of data, allow for accurate imputations of missing data, and provide a useful tool for high-dimensional data visualisation.
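The core idea of stochastic backpropagation can be illustrated with a toy sketch, assuming a Gaussian latent variable and the illustrative objective f(z) = z²: rewriting z = μ + σε with ε ~ N(0, 1) makes the sample a deterministic, differentiable function of the parameters, so an ordinary gradient can be pushed through it and estimated by Monte Carlo. All names and the objective below are hypothetical, chosen only to make the estimator checkable against a known analytic answer.

```python
import numpy as np

# Toy objective: F(mu) = E_{z ~ N(mu, sigma^2)}[z^2] = mu^2 + sigma^2,
# so the exact gradient is dF/dmu = 2*mu.
rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.8
n_samples = 200_000

# Reparameterise the stochastic variable: z = mu + sigma * eps, eps ~ N(0, 1).
eps = rng.standard_normal(n_samples)
z = mu + sigma * eps

# Backpropagate through z deterministically:
# d f(z) / d mu = f'(z) * dz/dmu = 2*z * 1, averaged over samples.
grad_mu_est = np.mean(2.0 * z)

print(grad_mu_est)  # close to the analytic value 2*mu = 3.0
```

With enough samples the estimate concentrates tightly around 2μ; in the full algorithm the same trick lets gradients of the variational lower bound flow jointly into the generative and recognition model parameters.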


Published In

ICML'14: Proceedings of the 31st International Conference on Machine Learning - Volume 32
June 2014, 2786 pages
Publisher: JMLR.org
