DOI: 10.1145/3649153.3649183
Research article | Open access

Denseflex: A Low Rank Factorization Methodology for Adaptable Dense Layers in DNNs

Published: 02 July 2024

Abstract

Low-Rank Factorization (LRF) is a popular compression technique for Deep Neural Networks (DNNs). LRF reduces both the memory footprint and the arithmetic operations of a DNN layer by approximating a weight tensor/matrix with two or more smaller tensors/matrices. Applying LRF to a DNN is challenging for several reasons. First, the exploration space is massive, and different solutions provide different trade-offs among memory, FLOPs, inference time, and validation accuracy; second, multiple DNN layers and multiple LRF algorithms must be considered; third, every extracted solution must undergo a calibration phase, which makes the LRF process time-consuming. In this paper, we present a methodology, called Denseflex, that formulates the LRF problem as an inference time vs. FLOPs vs. memory vs. validation accuracy Design Space Exploration (DSE) problem. Moreover, to the best of our knowledge, this is the first work that proposes a methodology to efficiently combine two different LRF methods, Singular Value Decomposition (SVD) and Tensor Train Decomposition (TTD), in the same framework. Denseflex is formulated as a design tool in which the user provides specific memory, FLOPs, and/or execution time constraints, and the tool outputs a set of solutions that meet the given constraints while avoiding the time-consuming re-training phases. Our results indicate that our approach prunes the design space by 62% on average over related works for nine DNN models (up to 88% in AlexNet), while the extracted LRF solutions exhibit both lower memory footprints and lower execution times compared to the initial model.
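For illustration, the sketch below shows the core idea the abstract describes: replacing a dense layer's weight matrix W (m x n) with two smaller SVD factors, and filtering candidate ranks against user-supplied memory/FLOPs budgets in a DSE-style loop. This is a minimal NumPy sketch under assumed conventions, not the authors' implementation; the helper names (`svd_factorize`, `meets_constraints`) and the 50% budgets are hypothetical.

```python
# Minimal sketch (assumed, not the paper's code) of SVD-based low-rank
# factorization of a dense layer plus a DSE-style constraint filter.
import numpy as np

def svd_factorize(W, rank):
    """Approximate W (m x n) as A @ B with A (m x rank) and B (rank x n)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # absorb singular values into the left factor
    B = Vt[:rank, :]
    return A, B

def meets_constraints(m, n, rank, mem_budget, flops_budget):
    """Hypothetical filter: keep a rank only if the factorized layer fits the
    user-supplied memory and FLOPs budgets (as fractions of the dense layer).
    For a dense layer, parameter count and matvec FLOPs both shrink by the
    same ratio: rank * (m + n) / (m * n)."""
    ratio = rank * (m + n) / (m * n)
    return ratio <= mem_budget and ratio <= flops_budget

# Example: a 1024 x 512 dense layer under a 50% memory / FLOPs budget.
m, n = 1024, 512
W = np.random.default_rng(0).standard_normal((m, n))
ranks = [k for k in range(1, min(m, n)) if meets_constraints(m, n, k, 0.5, 0.5)]
k = ranks[-1]                    # largest rank still within budget
A, B = svd_factorize(W, k)
rel_err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
print(f"rank {k}: params {k * (m + n)} vs {m * n}, relative error {rel_err:.3f}")
```

Per the abstract, the actual Denseflex flow would also score the surviving candidates against measured inference time and validation accuracy, and would consider TTD alongside SVD for each layer; the sketch only captures the memory/FLOPs pruning step.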



Published In

CF '24: Proceedings of the 21st ACM International Conference on Computing Frontiers
May 2024
345 pages
ISBN: 9798400705977
DOI: 10.1145/3649153
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Compression
  2. Deep Neural Networks
  3. Design Space Exploration
  4. Singular Value Decomposition
  5. Tensor Train Decomposition

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

CF '24

Acceptance Rates

CF '24 Paper Acceptance Rate: 33 of 105 submissions (31%)
Overall Acceptance Rate: 273 of 785 submissions (35%)

