The Loss Surface of Deep and Wide Neural Networks

Quynh Nguyen, Matthias Hein
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:2603-2612, 2017.

Abstract

While the optimization problem behind deep neural networks is highly non-convex, it is frequently observed in practice that training deep networks seems possible without getting stuck in suboptimal points. It has been argued that this is the case because all local minima are close to being globally optimal. We show that this is (almost) true: in fact, almost all local minima are globally optimal for a fully connected network with squared loss and analytic activation function, provided that the number of hidden units of one layer of the network is larger than the number of training points and the network structure from this layer on is pyramidal.
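The width condition in the abstract can be illustrated with a toy check. This is our own sketch, not the authors' construction or proof. Assume a one-hidden-layer network whose hidden weights are held fixed and whose linear output weights are fitted. When the hidden layer has at least as many units as there are training points and its N × n feature matrix has full row rank, the output weights can be chosen to fit any targets exactly, so the squared loss reaches its global minimum of zero. The inputs, targets, hidden weights, the exp activation (which is analytic) and the helper names `features`, `solve` and `squared_loss` are all hypothetical choices made for this illustration. With hidden weights 0 to N - 1, the feature matrix has entries (e^{x_i})^j. This is a Vandermonde matrix with distinct nodes, so it is invertible.

```python
import math


def features(xs, ws):
    """Hypothetical hidden layer: exp(w_j * x_i), one row per training point, one column per hidden unit."""
    return [[math.exp(w * x) for w in ws] for x in xs]


def solve(a, b):
    """Solve the square system a v = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-14:
            raise ValueError("feature matrix is (numerically) singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * v[c] for c in range(r + 1, n))
        v[r] = s / m[r][r]
    return v


def squared_loss(f, v, ys):
    """Squared loss 0.5 * sum_i (f_i . v - y_i)^2 of the linear output layer v on features f."""
    return 0.5 * sum(
        (sum(fij * vj for fij, vj in zip(row, v)) - y) ** 2 for row, y in zip(f, ys)
    )


xs = [0.0, 0.25, 0.5, 0.75, 1.0]  # N = 5 distinct training inputs (hypothetical)
ys = [1.0, -2.0, 0.5, 3.0, -1.0]  # arbitrary targets (hypothetical)
ws = [0.0, 1.0, 2.0, 3.0, 4.0]    # hidden width n = N = 5, distinct hidden weights

F = features(xs, ws)
v = solve(F, ys)
print(squared_loss(F, v, ys))  # zero up to floating-point round-off
```

This only shows why exact interpolation becomes possible once the wide layer's features have full rank. It does not reproduce the paper's statement about local minima of deep, pyramidal networks trained over all layers.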

Cite this Paper


BibTeX
@InProceedings{pmlr-v70-nguyen17a,
  title = {The Loss Surface of Deep and Wide Neural Networks},
  author = {Quynh Nguyen and Matthias Hein},
  booktitle = {Proceedings of the 34th International Conference on Machine Learning},
  pages = {2603--2612},
  year = {2017},
  editor = {Precup, Doina and Teh, Yee Whye},
  volume = {70},
  series = {Proceedings of Machine Learning Research},
  month = {06--11 Aug},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v70/nguyen17a/nguyen17a.pdf},
  url = {https://proceedings.mlr.press/v70/nguyen17a.html},
  abstract = {While the optimization problem behind deep neural networks is highly non-convex, it is frequently observed in practice that training deep networks seems possible without getting stuck in suboptimal points. It has been argued that this is the case as all local minima are close to being globally optimal. We show that this is (almost) true, in fact almost all local minima are globally optimal, for a fully connected network with squared loss and analytic activation function given that the number of hidden units of one layer of the network is larger than the number of training points and the network structure from this layer on is pyramidal.}
}
Endnote
%0 Conference Paper
%T The Loss Surface of Deep and Wide Neural Networks
%A Quynh Nguyen
%A Matthias Hein
%B Proceedings of the 34th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Doina Precup
%E Yee Whye Teh
%F pmlr-v70-nguyen17a
%I PMLR
%P 2603--2612
%U https://proceedings.mlr.press/v70/nguyen17a.html
%V 70
%X While the optimization problem behind deep neural networks is highly non-convex, it is frequently observed in practice that training deep networks seems possible without getting stuck in suboptimal points. It has been argued that this is the case as all local minima are close to being globally optimal. We show that this is (almost) true, in fact almost all local minima are globally optimal, for a fully connected network with squared loss and analytic activation function given that the number of hidden units of one layer of the network is larger than the number of training points and the network structure from this layer on is pyramidal.
APA
Nguyen, Q., & Hein, M. (2017). The Loss Surface of Deep and Wide Neural Networks. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:2603-2612. Available from https://proceedings.mlr.press/v70/nguyen17a.html.
