DOI: 10.1145/3649153.3649183
Research article | Open access

Denseflex: A Low Rank Factorization Methodology for Adaptable Dense Layers in DNNs

Published: 02 July 2024

Abstract

Low-Rank Factorization (LRF) is a popular compression technique for Deep Neural Networks (DNNs). LRF reduces both the memory footprint and the arithmetic operations of a DNN layer by approximating a weight tensor/matrix with two or more smaller tensors/matrices. Applying LRF to a DNN is challenging for several reasons. First, the exploration space is massive, and different solutions provide different trade-offs among memory, FLOPs, inference time, and validation accuracy; second, multiple DNN layers and multiple LRF algorithms must be considered; third, every extracted solution must undergo a calibration phase, which makes the LRF process time-consuming. In this paper, we present a methodology, called Denseflex, that formulates the LRF problem as an inference time vs. FLOPs vs. memory vs. validation accuracy Design Space Exploration (DSE) problem. Moreover, to the best of our knowledge, this is the first work that proposes a methodology to efficiently combine two different LRF methods, Singular Value Decomposition (SVD) and Tensor Train Decomposition (TTD), in the same framework. Denseflex is formulated as a design tool in which the user provides specific memory, FLOPs, and/or execution time constraints, and the tool outputs a set of solutions that meet the given constraints while avoiding the time-consuming re-training phases. Our results indicate that our approach prunes the design space by 62% on average over related works for nine DNN models (up to 88% in AlexNet), while the extracted LRF solutions exhibit both lower memory footprints and lower execution times compared to the initial model.
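For illustration, the sketch below shows the core idea the abstract describes: replacing a dense layer's weight matrix W (m x n) with two smaller SVD factors, and filtering candidate ranks against user-supplied memory/FLOPs budgets in a DSE-style loop. This is a minimal NumPy sketch under assumed conventions, not the authors' implementation; the helper names (`svd_factorize`, `meets_constraints`) and the 50% budgets are hypothetical.

```python
# Minimal sketch (assumed, not the paper's code) of SVD-based low-rank
# factorization of a dense layer plus a DSE-style constraint filter.
import numpy as np

def svd_factorize(W, rank):
    """Approximate W (m x n) as A @ B with A (m x rank) and B (rank x n)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # absorb singular values into the left factor
    B = Vt[:rank, :]
    return A, B

def meets_constraints(m, n, rank, mem_budget, flops_budget):
    """Hypothetical filter: keep a rank only if the factorized layer fits the
    user-supplied memory and FLOPs budgets (as fractions of the dense layer).
    For a dense layer, parameter count and matvec FLOPs both shrink by the
    same ratio: rank * (m + n) / (m * n)."""
    ratio = rank * (m + n) / (m * n)
    return ratio <= mem_budget and ratio <= flops_budget

# Example: a 1024 x 512 dense layer under a 50% memory / FLOPs budget.
m, n = 1024, 512
W = np.random.default_rng(0).standard_normal((m, n))
ranks = [k for k in range(1, min(m, n)) if meets_constraints(m, n, k, 0.5, 0.5)]
k = ranks[-1]                    # largest rank still within budget
A, B = svd_factorize(W, k)
rel_err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
print(f"rank {k}: params {k * (m + n)} vs {m * n}, relative error {rel_err:.3f}")
```

Per the abstract, the actual Denseflex flow would also score the surviving candidates against measured inference time and validation accuracy, and would consider TTD alongside SVD for each layer; the sketch only captures the memory/FLOPs pruning step.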



Published In

CF '24: Proceedings of the 21st ACM International Conference on Computing Frontiers
May 2024
345 pages
ISBN: 9798400705977
DOI: 10.1145/3649153
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Compression
  2. Deep Neural Networks
  3. Design Space Exploration
  4. Singular Value Decomposition
  5. Tensor Train Decomposition

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

CF '24

Acceptance Rates

CF '24 Paper Acceptance Rate: 33 of 105 submissions (31%)
Overall Acceptance Rate: 273 of 785 submissions (35%)

