
A Practical Approach for Employing Tensor Train Decomposition in Edge Devices

Published: 16 February 2024

Abstract

Deep Neural Networks (DNNs) have made significant advances in various fields, including speech recognition and image processing. Modern DNNs are typically both compute- and memory-intensive, so deploying them on low-end devices is challenging. A well-known technique to address this problem is Low-Rank Factorization (LRF), where a weight tensor is approximated by one or more lower-rank tensors, reducing both the memory size and the number of executed tensor operations. However, applying LRF is a multi-parametric optimization process over a huge design space, in which different design points represent different solutions that trade off the number of FLOPs, the memory size, and the prediction accuracy of the DNN model. As a result, extracting an efficient solution is a complex and time-consuming process. In this work, a new methodology is presented that formulates the LRF problem as a (FLOPs vs. memory vs. prediction accuracy) Design Space Exploration (DSE) problem. The DSE space is then drastically pruned by removing inefficient solutions. Our experimental results show that the design space can be pruned efficiently, leaving only a limited set of solutions with improved accuracy, memory, and FLOPs compared to the original (non-factorized) model. Our methodology has been developed as a stand-alone, parameterized module integrated into the T3F library for TensorFlow 2.X.
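To make the memory side of this trade-off concrete, the following is a minimal Python sketch, not the authors' module and not part of the T3F API. It assumes the standard TT-matrix layout, in which a weight matrix of size (m_1·…·m_d) × (n_1·…·n_d) is stored as d cores of shape (r_{k-1}, m_k, n_k, r_k) with r_0 = r_d = 1, and it uses a simple stand-in pruning rule that discards any rank choice that does not store fewer parameters than the dense layer. The function names, the mode shapes, and the candidate ranks are hypothetical; FLOPs and prediction accuracy are deliberately left out.

```python
from itertools import product
from math import prod


def tt_matrix_params(row_modes, col_modes, ranks):
    """Parameter count of a TT-matrix with cores of shape
    (ranks[k], row_modes[k], col_modes[k], ranks[k + 1])."""
    d = len(row_modes)
    if len(col_modes) != d or len(ranks) != d + 1:
        raise ValueError("need d row modes, d column modes and d + 1 ranks")
    if ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("boundary TT ranks must both be 1")
    return sum(
        ranks[k] * row_modes[k] * col_modes[k] * ranks[k + 1] for k in range(d)
    )


def dense_params(row_modes, col_modes):
    """Parameter count of the original (non-factorized) weight matrix."""
    return prod(row_modes) * prod(col_modes)


def prune_by_memory(row_modes, col_modes, candidate_ranks):
    """Enumerate every combination of internal TT ranks and keep only the
    design points that are strictly smaller than the dense layer.

    This is an illustrative filter (an assumption of this sketch), not the
    pruning procedure described in the paper.
    """
    d = len(row_modes)
    baseline = dense_params(row_modes, col_modes)
    kept = []
    for inner in product(candidate_ranks, repeat=d - 1):
        ranks = (1, *inner, 1)
        n_params = tt_matrix_params(row_modes, col_modes, ranks)
        if n_params < baseline:
            kept.append((ranks, n_params))
    # Smallest memory footprint first.
    return sorted(kept, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical 1024 x 1024 fully connected layer, factorized as 4*8*8*4.
    modes = (4, 8, 8, 4)
    print("dense:", dense_params(modes, modes))
    print("TT, all internal ranks 8:",
          tt_matrix_params(modes, modes, (1, 8, 8, 8, 1)))
    for ranks, n_params in prune_by_memory(modes, modes, (2, 4, 8, 16))[:5]:
        print(ranks, n_params)
```

Even this one-dimensional filter shows why the design space grows quickly: with d cores and c candidate ranks there are c^(d-1) rank combinations per layer, before the choice of mode shapes or of which layers to factorize is considered.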

Published In

International Journal of Parallel Programming, Volume 52, Issue 1-2
Apr 2024
123 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 16 February 2024
Accepted: 19 January 2024
Received: 23 January 2023

Author Tags

  1. Deep neural networks
  2. Model compression
  3. Low-rank factorization
  4. Tensor train decomposition
  5. Design space exploration

Qualifiers

  • Research-article

Funding Sources

  • H2020 Affordable5G EU Project
  • Aristotle University of Thessaloniki
