
FP-BNN: Binarized neural network on FPGA

Published: 31 January 2018

Abstract

Deep neural networks (DNNs) have attracted significant attention for their excellent accuracy, especially in areas such as computer vision and artificial intelligence. To enhance their performance, hardware acceleration technologies are being studied. FPGA technology is a promising choice for hardware acceleration, given its low power consumption and high flexibility, which make it particularly suitable for embedded systems. However, complex DNN models may need more computing and memory resources than many current FPGAs can provide. This paper presents FP-BNN, a binarized neural network (BNN) for FPGAs, which drastically cuts down hardware consumption while maintaining acceptable accuracy. We introduce a Resource-Aware Model Analysis (RAMA) method, remove the multiplier bottleneck with bit-level XNOR and shift operations, and remove the parameter-access bottleneck with data quantization and optimized on-chip storage. We evaluate FP-BNN accelerator designs for an MNIST multi-layer perceptron (MLP), a Cifar-10 ConvNet, and AlexNet on a Stratix-V FPGA system. Inference performance in the tera-operations-per-second range is obtained with acceptable accuracy loss, showing improvements in speed and energy efficiency over other computing platforms.
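The key idea behind the BNN approach described above is replacing multiply-accumulate operations with bitwise logic. As a minimal illustrative sketch in Python (not the paper's actual FPGA implementation), the following shows how a dot product over weights and activations constrained to {-1, +1} can be computed with XNOR and popcount once the values are packed into bit masks; the helper names are hypothetical:

    def binarize_to_bits(values):
        # Pack a list of +/-1 values into an integer bit mask (bit i set means +1).
        bits = 0
        for i, v in enumerate(values):
            if v > 0:
                bits |= 1 << i
        return bits

    def xnor_popcount_dot(a_bits, w_bits, n):
        # Dot product of two {-1, +1} vectors of length n using XNOR and popcount:
        # each matching bit pair contributes +1, each mismatching pair contributes -1.
        matches = bin(~(a_bits ^ w_bits) & ((1 << n) - 1)).count("1")
        return 2 * matches - n

    # Example: [+1, -1, +1, +1] . [+1, +1, -1, +1] = 1 - 1 - 1 + 1 = 0
    a = binarize_to_bits([+1, -1, +1, +1])
    w = binarize_to_bits([+1, +1, -1, +1])
    print(xnor_popcount_dot(a, w, 4))  # prints 0

On an FPGA, the packed XNOR and popcount operations map onto LUT logic rather than DSP multipliers, which is the source of the hardware savings the abstract refers to.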



Published In

Neurocomputing, Volume 275, Issue C
January 2018
2070 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands


Author Tags

  1. Binarized neural network
  2. FPGA
  3. Hardware accelerator

Qualifiers

  • Research-article
