
Model complexity of deep learning: a survey

Published: 01 October 2021

Abstract

Model complexity is a fundamental problem in deep learning. In this paper, we provide a systematic overview of the latest studies on model complexity in deep learning. The model complexity of deep learning can be categorized into expressive capacity and effective model complexity. We review the existing studies on these two categories along four important factors: model framework, model size, optimization process, and data complexity. We also discuss the applications of deep learning model complexity, including understanding model generalization, model optimization, and model selection and design. We conclude by proposing several promising future directions.



Published In

Knowledge and Information Systems, Volume 63, Issue 10, October 2021, 226 pages

Publisher

Springer-Verlag, Berlin, Heidelberg

Publication History

Published: 01 October 2021
Accepted: 09 August 2021
Revision received: 02 August 2021
Received: 08 March 2021

Author Tags

1. Deep learning
2. Deep neural network
3. Model complexity
4. Expressive capacity