Article
DOI: 10.5555/3045390.3045502

Dropout as a Bayesian approximation: representing model uncertainty in deep learning
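
Yarin Gal, Zoubin Ghahramani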

Published: 19 June 2016

Abstract

Deep learning tools have gained tremendous attention in applied machine learning. However, such tools for regression and classification do not capture model uncertainty. In comparison, Bayesian models offer a mathematically grounded framework to reason about model uncertainty, but usually come with a prohibitive computational cost. In this paper we develop a new theoretical framework casting dropout training in deep neural networks (NNs) as approximate Bayesian inference in deep Gaussian processes. A direct result of this theory gives us tools to model uncertainty with dropout NNs, extracting information from existing models that until now has been thrown away. This mitigates the problem of representing uncertainty in deep learning without sacrificing either computational complexity or test accuracy. We perform an extensive study of the properties of dropout's uncertainty. Various network architectures and non-linearities are assessed on tasks of regression and classification, using MNIST as an example. We show a considerable improvement in predictive log-likelihood and RMSE compared to existing state-of-the-art methods, and finish by using dropout's uncertainty in deep reinforcement learning.
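
The procedure the abstract alludes to, often called Monte Carlo (MC) dropout, is simple: keep dropout switched on at test time, run T stochastic forward passes, and use the sample mean as the prediction and the sample variance as an estimate of model uncertainty (for regression the paper additionally adds an inverse-model-precision term for observation noise). The following NumPy sketch illustrates the idea; the toy one-hidden-layer network, its fixed random weights, and the 0.1 dropout rate are illustrative assumptions rather than the paper's experimental setup.

import numpy as np

rng = np.random.default_rng(0)

# Toy one-hidden-layer regression network with fixed random weights
# (assumed for illustration; any dropout-trained network is used the same way).
W1, b1 = rng.standard_normal((1, 50)), np.zeros(50)
W2, b2 = rng.standard_normal((50, 1)), np.zeros(1)
p_drop = 0.1  # dropout probability (assumed)

def stochastic_forward(x):
    """One forward pass with a fresh dropout mask: dropout stays ON at test time."""
    h = np.maximum(x @ W1 + b1, 0.0)         # ReLU hidden layer
    mask = rng.random(h.shape) > p_drop      # fresh Bernoulli mask per pass
    h = h * mask / (1.0 - p_drop)            # inverted-dropout scaling
    return h @ W2 + b2

def mc_dropout_predict(x, T=100):
    """Predictive mean and (epistemic) variance from T stochastic passes."""
    samples = np.stack([stochastic_forward(x) for _ in range(T)])
    return samples.mean(axis=0), samples.var(axis=0)

x_star = np.array([[0.5]])
mean, var = mc_dropout_predict(x_star)
print(f"prediction: {mean.item():.3f} +/- {np.sqrt(var.item()):.3f}")

In practice the same estimate falls out of an already-trained dropout network by running its dropout layers in training mode at prediction time, so the uncertainty costs nothing beyond the extra forward passes.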

Published In

ICML'16: Proceedings of the 33rd International Conference on Machine Learning - Volume 48
June 2016
3077 pages

Publisher

JMLR.org
