DOI: 10.1145/2847263.2847276

Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks

Published: 21 February 2016

Abstract

Convolutional Neural Networks (CNNs) have gained popularity in many computer vision applications, such as image classification, face detection, and video analysis, because of the high accuracy they achieve in training and classification. Because their many convolution and fully-connected layers are both compute- and memory-intensive, it is difficult to perform real-time classification with low power consumption on today's computing systems. FPGAs have been widely explored as hardware accelerators for CNNs because of their reconfigurability and energy efficiency, as well as their fast turnaround time, especially with high-level synthesis methodologies. Previous FPGA-based CNN accelerators, however, were typically generic designs agnostic to the CNN configuration, so the reconfigurable capabilities of FPGAs were not fully leveraged to maximize overall system throughput. In this work, we present a systematic design space exploration methodology to maximize the throughput of an OpenCL-based FPGA accelerator for a given CNN model, considering FPGA resource constraints such as on-chip memory, registers, computational resources, and external memory bandwidth. The proposed methodology is demonstrated by optimizing two representative large-scale CNNs, AlexNet and VGG, on two Altera Stratix-V FPGA platforms, the DE5-Net and P395-D8 boards, which have different hardware resources. We achieve a peak performance of 136.5 GOPS for the convolution operations, and 117.8 GOPS for the entire VGG network performing ImageNet classification on the P395-D8 board.
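
To make the abstract's optimization problem concrete, the sketch below shows the general shape of such a design space sweep: enumerate candidate parallelism factors, prune configurations that exceed the board's resource budgets, and keep the one with the highest estimated throughput. This is a minimal illustration, not the paper's actual model; the compute-unit/SIMD parameterization, the resource budgets, and the cost formulas are all hypothetical placeholders.

```python
# Hypothetical sketch of a throughput-oriented design-space sweep for an
# FPGA CNN accelerator. Budgets and cost formulas are illustrative
# placeholders, NOT the model used in the paper.
from itertools import product

# Illustrative per-board budgets (placeholder numbers).
DSP_BUDGET = 1963          # DSP blocks available
BRAM_BUDGET_KB = 6270      # on-chip memory in KB
BW_BUDGET_GBPS = 25.6      # external memory bandwidth

def resources(cu, simd):
    """Rough resource estimate for `cu` compute units, each `simd` lanes wide."""
    dsp = cu * simd                    # one multiplier per SIMD lane (placeholder)
    bram_kb = cu * 64 + simd * 16      # buffer cost grows with both factors (placeholder)
    return dsp, bram_kb

def throughput_gops(cu, simd, freq_mhz=200.0):
    """Peak GOPS if compute-bound; one MAC per lane per cycle, MAC = 2 ops."""
    return 2.0 * cu * simd * freq_mhz / 1000.0

best = None
for cu, simd in product([1, 2, 4, 8, 16], [1, 2, 4, 8, 16]):
    dsp, bram_kb = resources(cu, simd)
    if dsp > DSP_BUDGET or bram_kb > BRAM_BUDGET_KB:
        continue                       # violates an on-chip resource budget
    gops = throughput_gops(cu, simd)
    bw_needed = gops * 0.1             # placeholder bytes-per-op ratio (GB/s)
    if bw_needed > BW_BUDGET_GBPS:
        continue                       # external memory becomes the bottleneck
    if best is None or gops > best[0]:
        best = (gops, cu, simd)

print(f"best config: {best[1]} CUs x {best[2]}-wide SIMD -> {best[0]:.1f} GOPS")
```

In a real flow, the placeholder formulas would be replaced by per-layer estimates calibrated against synthesis reports, but the enumerate, prune, and rank structure of the sweep stays the same.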


    Published In

    FPGA '16: Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
    February 2016
    298 pages
    ISBN:9781450338561
    DOI:10.1145/2847263


    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

1. convolutional neural networks
2. FPGA
3. OpenCL
4. optimization

    Qualifiers

    • Research-article

    Conference

    FPGA'16

    Acceptance Rates

FPGA '16 Paper Acceptance Rate: 20 of 111 submissions, 18%
Overall Acceptance Rate: 125 of 627 submissions, 20%

