DOI: 10.1145/3421558.3421559

Stochastic Model Pruning via Weight Dropping Away and Back

Published: 25 November 2020

Abstract

Deep neural networks (DNNs) have achieved dramatic success on a variety of challenging tasks. However, most successful DNNs have an extremely complex structure, which has prompted extensive research on model compression. As a significant line of progress in model compression, traditional gradual pruning approaches involve an iterative prune-retrain procedure and may suffer from two critical issues: local importance judgment, where the pruned weights are merely unimportant in the current model; and an irretrievable pruning process, where the pruned weights have no chance to come back. To address these two issues, this paper proposes the Drop Pruning approach, which brings stochastic optimization into the pruning process by introducing a drop strategy at each pruning step: drop away, which stochastically deletes some unimportant weights, and drop back, which stochastically recovers some pruned weights. A suitable choice of drop probabilities decreases the model size during pruning and guides it toward the target sparsity. In contrast to Bayesian approaches that stochastically train a compact model for pruning, we aim directly at stochastic gradual pruning. We provide a detailed analysis showing that drop away and drop back each make individual contributions. Moreover, Drop Pruning achieves competitive compression performance and accuracy on many benchmark tasks compared with state-of-the-art weight pruning and Bayesian training approaches.
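The abstract describes two stochastic moves applied at each pruning step. The following is a minimal NumPy sketch of one such step, assuming magnitude-based importance scores and fixed Bernoulli probabilities p_away and p_back; the function name, parameter names, and scoring rule are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of one "drop away / drop back" pruning step (illustrative only).
# Assumes magnitude-based importance and fixed drop probabilities; the paper's
# exact scoring and probability schedules may differ.
import numpy as np

def drop_pruning_step(weights, mask, p_away=0.5, p_back=0.1, prune_frac=0.2):
    """One stochastic pruning step on a flat weight vector.

    weights    : np.ndarray, current weight values
    mask       : np.ndarray of {0, 1}, 1 = weight is currently kept
    p_away     : probability of actually dropping a nominated weight
    p_back     : probability of recovering an already-pruned weight
    prune_frac : fraction of kept weights nominated for dropping this step
    """
    rng = np.random.default_rng()

    # Drop away: nominate the smallest-magnitude kept weights, then delete
    # each nominee only with probability p_away (stochastic, not greedy).
    kept = np.flatnonzero(mask)
    n_candidates = max(1, int(prune_frac * kept.size))
    order = kept[np.argsort(np.abs(weights[kept]))]   # least important first
    candidates = order[:n_candidates]
    dropped = candidates[rng.random(candidates.size) < p_away]
    mask[dropped] = 0

    # Drop back: give previously pruned weights a chance to return, so an
    # early pruning decision is not irretrievable.
    pruned = np.flatnonzero(mask == 0)
    revived = pruned[rng.random(pruned.size) < p_back]
    mask[revived] = 1

    return weights * mask, mask
```

In the gradual-pruning setting the abstract describes, each such step would be followed by retraining the surviving weights, with the drop probabilities chosen so that the overall sparsity flows toward its target; here they are held fixed purely for brevity.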

Published In

IPMV '20: Proceedings of the 2020 2nd International Conference on Image Processing and Machine Vision
August 2020
194 pages
ISBN:9781450388412
DOI:10.1145/3421558

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 November 2020

Author Tags

  1. Compression
  2. Deep learning
  3. Deep neural networks
  4. Neural networks pruning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

IPMV 2020
