Research Article
DOI: 10.1145/3289602.3293925

Compute-Efficient Neural-Network Acceleration

Published: 20 February 2019

Abstract

To raise the performance of FPGA-based neural-network accelerators, both the operating clock rate and the compute efficiency must be maximized, and streamlined data movement between memory and compute is the key to both. To expose this latent performance in FPGA-based inference processors, we describe a convolutional neural network accelerator that operates at 92.9% of the peak FPGA clock rate. First, we map neural-network operators onto a minimalist hardware architecture that simplifies data movement between memory and compute, which allows the design to close timing at high clock rates. Second, we describe a schedule that keeps compute utilization high. We apply this architecture to classify the MNIST, CIFAR-10, and ImageNet datasets. The design achieves 95.5% compute efficiency on GoogLeNet, whose nested topology makes building an efficient accelerator especially challenging.
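The abstract's two headline numbers multiply: an accelerator that closes timing at 92.9% of the peak clock rate and keeps its compute array 95.5% busy delivers roughly 88.7% of the device's absolute peak throughput. The Python sketch below makes that accounting concrete. It is a minimal illustration under assumed definitions (compute efficiency taken as useful multiply-accumulates divided by peak MAC capacity over the cycles a run takes); the layer shape, 64x64 array, and 5% stall overhead are hypothetical values for illustration, not figures from the paper.

```python
# Minimal sketch (not from the paper) of how the two headline numbers
# compose. Compute efficiency is taken to be useful multiply-accumulate
# (MAC) work divided by the array's peak MAC capacity over the cycles a
# run actually takes; all concrete numbers below are illustrative
# placeholders, not measured values.

def conv_macs(out_h: int, out_w: int, out_ch: int,
              k_h: int, k_w: int, in_ch: int) -> int:
    """MACs for one dense convolution layer (stride/padding bookkeeping omitted)."""
    return out_h * out_w * out_ch * k_h * k_w * in_ch

def compute_efficiency(total_macs: int, macs_per_cycle: int, cycles: int) -> float:
    """Fraction of the array's peak MAC throughput that did useful work."""
    return total_macs / (macs_per_cycle * cycles)

# Hypothetical 3x3 convolution layer on a hypothetical 64x64 MAC array.
macs = conv_macs(out_h=56, out_w=56, out_ch=64, k_h=3, k_w=3, in_ch=64)
peak_per_cycle = 64 * 64                   # one MAC per processing element per cycle
ideal_cycles = -(-macs // peak_per_cycle)  # ceiling division: zero-stall lower bound
actual_cycles = int(ideal_cycles * 1.05)   # assume ~5% of cycles lost to stalls

eff = compute_efficiency(macs, peak_per_cycle, actual_cycles)
clock_fraction = 0.929                     # achieved / peak clock rate (from the abstract)
print(f"compute efficiency                : {eff:.3f}")
print(f"fraction of device peak throughput: {clock_fraction * eff:.3f}")
```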



Published In

FPGA '19: Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
February 2019, 360 pages
ISBN: 9781450361378
DOI: 10.1145/3289602

Publisher

Association for Computing Machinery
New York, NY, United States

Author Tags

1. accelerator
2. compute efficiency
3. convolutional neural networks
4. deep learning
5. fpga
6. googlenet
7. image classification
8. reconfigurable architecture
9. reduced precision
10. tensor processing

Conference

FPGA '19

Acceptance Rates

Overall Acceptance Rate: 125 of 627 submissions, 20%
