
CNN-Grinder: From Algorithmic to High-Level Synthesis descriptions of CNNs for Low-end-low-cost FPGA SoCs

Published: 01 March 2020

Abstract

Although High-Level Synthesis (HLS) tools have been on the scene for almost fifteen years, researchers have been reluctant to use them for accelerating their algorithms on FPGA SoCs. We present CNN-Grinder, a template-driven workflow for converting algorithmic descriptions of mobile-friendly convolutional neural networks (CNNs), such as SqueezeNet v1.1 and ZynqNet, into HLS code suitable for programming low-end-low-cost FPGA SoCs. In contrast to other works, which act as black boxes from the user's perspective, CNN-Grinder does not hide its inner workings behind an automated algorithm-to-HLS conversion; instead, it exposes every step in a clear and concise way. It gives developers the means to map a CNN onto an FPGA SoC through easy-to-follow steps and templates that are not constrained to specific CNN architectures or FPGA devices. The workflow is accompanied by the SqueezeJet-2 accelerator, which accelerates the convolutional and max-pooling layers of the SqueezeNet v1.1 and ZynqNet CNNs, making it possible to achieve more than 10 fps CNN inference at 100 MHz with a batch size of 1 on a low-end-low-cost FPGA SoC such as the Xilinx XC7Z020. Finally, an analytical model of the SqueezeJet-2 accelerator is developed and evaluated against results produced by the Xilinx Vivado HLS tool.




    Published In

    Microprocessors & Microsystems  Volume 73, Issue C
    Mar 2020
    392 pages

    Publisher

    Elsevier Science Publishers B. V.

    Netherlands


    Author Tags

    1. Algorithm-to-HLS Workflow
    2. High-Level Synthesis
    3. FPGA CNN accelerator
    4. Deep learning application
    5. Mobile embedded systems

    Qualifiers

    • Research-article
