DOI: 10.1145/3421558.3421559

Stochastic Model Pruning via Weight Dropping Away and Back

Published: 25 November 2020

Abstract

Deep neural networks (DNNs) have achieved dramatic success on a variety of challenging tasks. However, most successful DNNs have an extremely complex structure, which has prompted extensive research on model compression. As a significant line of progress in model compression, traditional gradual pruning approaches involve an iterative prune-retrain procedure and may suffer from two critical issues: local importance judgment, where the pruned weights are merely unimportant in the current model; and an irretrievable pruning process, where the pruned weights have no chance to come back. To address these two issues, this paper proposes the Drop Pruning approach, which brings stochastic optimization into the pruning process by introducing a drop strategy at each pruning step: drop away, which stochastically deletes some unimportant weights, and drop back, which stochastically recovers some pruned weights. A suitable choice of drop probabilities decreases the model size during pruning and guides it toward the target sparsity. In contrast to Bayesian approaches that stochastically train a compact model for pruning, we aim directly at stochastic gradual pruning. We provide a detailed analysis showing that drop away and drop back each make individual contributions. Moreover, Drop Pruning achieves competitive compression performance and accuracy on many benchmark tasks compared with state-of-the-art weight pruning and Bayesian training approaches.
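The abstract describes two stochastic moves applied at each pruning step. The following is a minimal NumPy sketch of one such step, assuming magnitude-based importance scores and fixed Bernoulli probabilities p_away and p_back; the function name, parameter names, and scoring rule are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of one "drop away / drop back" pruning step (illustrative only).
# Assumes magnitude-based importance and fixed drop probabilities; the paper's
# exact scoring and probability schedules may differ.
import numpy as np

def drop_pruning_step(weights, mask, p_away=0.5, p_back=0.1, prune_frac=0.2):
    """One stochastic pruning step on a flat weight vector.

    weights    : np.ndarray, current weight values
    mask       : np.ndarray of {0, 1}, 1 = weight is currently kept
    p_away     : probability of actually dropping a nominated weight
    p_back     : probability of recovering an already-pruned weight
    prune_frac : fraction of kept weights nominated for dropping this step
    """
    rng = np.random.default_rng()

    # Drop away: nominate the smallest-magnitude kept weights, then delete
    # each nominee only with probability p_away (stochastic, not greedy).
    kept = np.flatnonzero(mask)
    n_candidates = max(1, int(prune_frac * kept.size))
    order = kept[np.argsort(np.abs(weights[kept]))]   # least important first
    candidates = order[:n_candidates]
    dropped = candidates[rng.random(candidates.size) < p_away]
    mask[dropped] = 0

    # Drop back: give previously pruned weights a chance to return, so an
    # early pruning decision is not irretrievable.
    pruned = np.flatnonzero(mask == 0)
    revived = pruned[rng.random(pruned.size) < p_back]
    mask[revived] = 1

    return weights * mask, mask
```

In the gradual-pruning setting the abstract describes, each such step would be followed by retraining the surviving weights, with the drop probabilities chosen so that the overall sparsity flows toward its target; here they are held fixed purely for brevity.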

Published In

IPMV '20: Proceedings of the 2020 2nd International Conference on Image Processing and Machine Vision
August 2020
194 pages
ISBN:9781450388412
DOI:10.1145/3421558

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 November 2020

Author Tags

  1. Compression
  2. Deep learning
  3. Deep neural networks
  4. Neural networks pruning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

IPMV 2020
