Article
DOI: 10.5555/3045390.3045502

Dropout as a Bayesian approximation: representing model uncertainty in deep learning
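
Yarin Gal, Zoubin Ghahramani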

Published: 19 June 2016

Abstract

Deep learning tools have gained tremendous attention in applied machine learning. However, such tools for regression and classification do not capture model uncertainty. In comparison, Bayesian models offer a mathematically grounded framework to reason about model uncertainty, but usually come with a prohibitive computational cost. In this paper we develop a new theoretical framework casting dropout training in deep neural networks (NNs) as approximate Bayesian inference in deep Gaussian processes. A direct result of this theory gives us tools to model uncertainty with dropout NNs, extracting information from existing models that until now has been thrown away. This mitigates the problem of representing uncertainty in deep learning without sacrificing either computational complexity or test accuracy. We perform an extensive study of the properties of dropout's uncertainty. Various network architectures and non-linearities are assessed on tasks of regression and classification, using MNIST as an example. We show a considerable improvement in predictive log-likelihood and RMSE compared to existing state-of-the-art methods, and finish by using dropout's uncertainty in deep reinforcement learning.
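
The procedure the abstract alludes to, often called Monte Carlo (MC) dropout, is simple: keep dropout switched on at test time, run T stochastic forward passes, and use the sample mean as the prediction and the sample variance as an estimate of model uncertainty (for regression the paper additionally adds an inverse-model-precision term for observation noise). The following NumPy sketch illustrates the idea; the toy one-hidden-layer network, its fixed random weights, and the 0.1 dropout rate are illustrative assumptions rather than the paper's experimental setup.

import numpy as np

rng = np.random.default_rng(0)

# Toy one-hidden-layer regression network with fixed random weights
# (assumed for illustration; any dropout-trained network is used the same way).
W1, b1 = rng.standard_normal((1, 50)), np.zeros(50)
W2, b2 = rng.standard_normal((50, 1)), np.zeros(1)
p_drop = 0.1  # dropout probability (assumed)

def stochastic_forward(x):
    """One forward pass with a fresh dropout mask: dropout stays ON at test time."""
    h = np.maximum(x @ W1 + b1, 0.0)         # ReLU hidden layer
    mask = rng.random(h.shape) > p_drop      # fresh Bernoulli mask per pass
    h = h * mask / (1.0 - p_drop)            # inverted-dropout scaling
    return h @ W2 + b2

def mc_dropout_predict(x, T=100):
    """Predictive mean and (epistemic) variance from T stochastic passes."""
    samples = np.stack([stochastic_forward(x) for _ in range(T)])
    return samples.mean(axis=0), samples.var(axis=0)

x_star = np.array([[0.5]])
mean, var = mc_dropout_predict(x_star)
print(f"prediction: {mean.item():.3f} +/- {np.sqrt(var.item()):.3f}")

In practice the same estimate falls out of an already-trained dropout network by running its dropout layers in training mode at prediction time, so the uncertainty costs nothing beyond the extra forward passes.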

Published In

ICML'16: Proceedings of the 33rd International Conference on Machine Learning - Volume 48
June 2016
3077 pages

Publisher

JMLR.org
