
Model complexity of deep learning: a survey

Published: 01 October 2021

Abstract

Model complexity is a fundamental problem in deep learning. In this paper, we provide a systematic overview of the latest studies on model complexity in deep learning. The model complexity of deep learning can be categorized into expressive capacity and effective model complexity. We review the existing studies on these two categories along four important factors: model framework, model size, optimization process, and data complexity. We also discuss the applications of deep learning model complexity, including understanding model generalization, model optimization, and model selection and design. We conclude by proposing several promising future directions.



Published In

Knowledge and Information Systems, Volume 63, Issue 10, October 2021, 226 pages

Publisher

Springer-Verlag, Berlin, Heidelberg

Publication History

Published: 01 October 2021
Accepted: 09 August 2021
Revision received: 02 August 2021
Received: 08 March 2021

Author Tags

1. Deep learning
2. Deep neural network
3. Model complexity
4. Expressive capacity