
CNN-Grinder: From Algorithmic to High-Level Synthesis descriptions of CNNs for Low-end-low-cost FPGA SoCs

Published: 01 March 2020

Abstract

Although High-Level Synthesis (HLS) tools have been on the scene for almost fifteen years, researchers have been reluctant to use them for accelerating their algorithms on FPGA SoCs. We present CNN-Grinder, a template-driven workflow for converting algorithmic descriptions of mobile-friendly convolutional neural networks (CNNs), such as SqueezeNet v1.1 and ZynqNet, into HLS code suitable for programming low-end-low-cost FPGA SoCs. In contrast to other works, which act as black boxes from the user's perspective, CNN-Grinder does not hide its inner workings behind an automated algorithm-to-HLS conversion; instead, it exposes every step in a clear and concise way. It gives developers the means to map a CNN onto an FPGA SoC through easy-to-follow steps and templates that are not constrained to specific CNN architectures or FPGA devices. The workflow is accompanied by the SqueezeJet-2 accelerator, which accelerates the convolutional and max-pooling layers of the SqueezeNet v1.1 and ZynqNet CNNs, making it possible to achieve more than 10 fps CNN inference at 100 MHz with a batch size of 1 on a low-end-low-cost FPGA SoC such as the Xilinx XC7Z020. Finally, an analytical model of the SqueezeJet-2 accelerator is developed and evaluated against results produced by the Xilinx Vivado HLS tool.




    Published In

    Microprocessors & Microsystems  Volume 73, Issue C
    Mar 2020
    392 pages

    Publisher

    Elsevier Science Publishers B. V.

    Netherlands


    Author Tags

    1. Algorithm-to-HLS Workflow
    2. High-Level Synthesis
    3. FPGA CNN accelerator
    4. Deep learning application
    5. Mobile embedded systems

    Qualifiers

    • Research-article
