DOI: 10.5555/3044805.3045035
Article

Stochastic backpropagation and approximate inference in deep generative models

Published: 21 June 2014

Abstract

We marry ideas from deep neural networks and approximate Bayesian inference to derive a generalised class of deep, directed generative models, endowed with a new algorithm for scalable inference and learning. Our algorithm introduces a recognition model to represent an approximate posterior distribution and uses this for optimisation of a variational lower bound. We develop stochastic backpropagation - rules for gradient backpropagation through stochastic variables - and derive an algorithm that allows for joint optimisation of the parameters of both the generative and recognition models. We demonstrate on several real-world data sets that by using stochastic backpropagation and variational inference, we obtain models that are able to generate realistic samples of data, allow for accurate imputations of missing data, and provide a useful tool for high-dimensional data visualisation.
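The core idea of stochastic backpropagation can be illustrated with a toy sketch, assuming a Gaussian latent variable and the illustrative objective f(z) = z²: rewriting z = μ + σε with ε ~ N(0, 1) makes the sample a deterministic, differentiable function of the parameters, so an ordinary gradient can be pushed through it and estimated by Monte Carlo. All names and the objective below are hypothetical, chosen only to make the estimator checkable against a known analytic answer.

```python
import numpy as np

# Toy objective: F(mu) = E_{z ~ N(mu, sigma^2)}[z^2] = mu^2 + sigma^2,
# so the exact gradient is dF/dmu = 2*mu.
rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.8
n_samples = 200_000

# Reparameterise the stochastic variable: z = mu + sigma * eps, eps ~ N(0, 1).
eps = rng.standard_normal(n_samples)
z = mu + sigma * eps

# Backpropagate through z deterministically:
# d f(z) / d mu = f'(z) * dz/dmu = 2*z * 1, averaged over samples.
grad_mu_est = np.mean(2.0 * z)

print(grad_mu_est)  # close to the analytic value 2*mu = 3.0
```

With enough samples the estimate concentrates tightly around 2μ; in the full algorithm the same trick lets gradients of the variational lower bound flow jointly into the generative and recognition model parameters.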


Published In

ICML'14: Proceedings of the 31st International Conference on Machine Learning - Volume 32
June 2014, 2786 pages
Publisher: JMLR.org
