Research Article
DOI: 10.1145/3289602.3293925

Compute-Efficient Neural-Network Acceleration

Published: 20 February 2019

Abstract

To raise the performance of FPGA-based neural-network accelerators, both the operating clock rate and the compute efficiency must be maximized, and streamlined data movement between memory and compute is the key to both. To expose this latent performance in FPGA-based inference processors, we describe a convolutional neural network accelerator that operates at 92.9% of the peak FPGA clock rate. First, we map neural-network operators onto a minimalist hardware architecture that simplifies data movement between memory and compute, which allows the design to close timing at high clock rates. Second, we describe a schedule that keeps compute utilization high. We apply this architecture to classify the MNIST, CIFAR-10, and ImageNet datasets. The design achieves 95.5% compute efficiency on GoogLeNet, whose nested topology makes building an efficient accelerator especially challenging.
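The abstract's two headline numbers multiply: an accelerator that closes timing at 92.9% of the peak clock rate and keeps its compute array 95.5% busy delivers roughly 88.7% of the device's absolute peak throughput. The Python sketch below makes that accounting concrete. It is a minimal illustration under assumed definitions (compute efficiency taken as useful multiply-accumulates divided by peak MAC capacity over the cycles a run takes); the layer shape, 64x64 array, and 5% stall overhead are hypothetical values for illustration, not figures from the paper.

```python
# Minimal sketch (not from the paper) of how the two headline numbers
# compose. Compute efficiency is taken to be useful multiply-accumulate
# (MAC) work divided by the array's peak MAC capacity over the cycles a
# run actually takes; all concrete numbers below are illustrative
# placeholders, not measured values.

def conv_macs(out_h: int, out_w: int, out_ch: int,
              k_h: int, k_w: int, in_ch: int) -> int:
    """MACs for one dense convolution layer (stride/padding bookkeeping omitted)."""
    return out_h * out_w * out_ch * k_h * k_w * in_ch

def compute_efficiency(total_macs: int, macs_per_cycle: int, cycles: int) -> float:
    """Fraction of the array's peak MAC throughput that did useful work."""
    return total_macs / (macs_per_cycle * cycles)

# Hypothetical 3x3 convolution layer on a hypothetical 64x64 MAC array.
macs = conv_macs(out_h=56, out_w=56, out_ch=64, k_h=3, k_w=3, in_ch=64)
peak_per_cycle = 64 * 64                   # one MAC per processing element per cycle
ideal_cycles = -(-macs // peak_per_cycle)  # ceiling division: zero-stall lower bound
actual_cycles = int(ideal_cycles * 1.05)   # assume ~5% of cycles lost to stalls

eff = compute_efficiency(macs, peak_per_cycle, actual_cycles)
clock_fraction = 0.929                     # achieved / peak clock rate (from the abstract)
print(f"compute efficiency                : {eff:.3f}")
print(f"fraction of device peak throughput: {clock_fraction * eff:.3f}")
```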



Published In

FPGA '19: Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
February 2019, 360 pages
ISBN: 9781450361378
DOI: 10.1145/3289602

Publisher

Association for Computing Machinery
New York, NY, United States

Author Tags

1. accelerator
2. compute efficiency
3. convolutional neural networks
4. deep learning
5. fpga
6. googlenet
7. image classification
8. reconfigurable architecture
9. reduced precision
10. tensor processing

Conference

FPGA '19

Acceptance Rates

Overall Acceptance Rate: 125 of 627 submissions, 20%
