DOI: 10.1145/3289602.3293964
Poster

A Fine-Grained Sparse Accelerator for Multi-Precision DNN

Published: 20 February 2019

Abstract

Neural networks (NNs) have made significant breakthroughs in many fields, but they also pose a great challenge to hardware platforms, since state-of-the-art networks are both communication- and computation-intensive. Researchers have proposed model compression algorithms based on sparsification and quantization, along with dedicated hardware architectures, to accelerate various applications. However, the irregular memory access caused by sparsity severely damages the regularity of the intensive computation loops, so the architecture design for sparse neural networks is crucial to better software-hardware co-design for neural network applications. To face these challenges, this paper first analyzes the computation patterns of different NN structures and unifies them into three forms: sparse matrix-vector multiplication, sparse matrix-matrix multiplication, and element-wise multiplication. Building on EIE, which supports only fully-connected networks and recurrent neural networks (RNNs), we extend the architecture to convolutional neural networks (CNNs) through an input vector transform unit. We also design a multi-precision multiplier with a supporting datapath, which gives the proposed architecture a better acceleration effect under low-bit quantization on the same hardware. The proposed accelerator achieves an equivalent performance and energy efficiency of up to 574.2 GOPS and 42.8 GOPS/W for CNNs, and 110.4 GOPS and 8.24 GOPS/W for RNNs, under 4-bit quantization on a Xilinx XCKU115 FPGA running at 200 MHz. It is also the state-of-the-art accelerator supporting CNN-RNN-based models such as the long-term recurrent convolutional network (LRCN), reaching 571.1 GOPS and 42.6 GOPS/W under the 4-bit data format.
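
The poster provides no source code, but the two ideas the abstract leans on, pruned layers reduced to sparse matrix operations and an input-vector transform that lowers convolution to matrix multiplication, can be sketched in a few lines of NumPy. This is a minimal illustration under assumed conventions (plain CSR weight storage and a stride-1, unpadded im2col lowering), not the paper's actual datapath; the function names `csr_spmv` and `im2col` are ours, and EIE itself uses a compressed-column variant with relative indexing rather than textbook CSR.

```python
import numpy as np

def csr_spmv(values, col_idx, row_ptr, x):
    """y = W @ x with W stored in CSR form: only nonzero weights are
    kept, so every multiply a pruned weight would contribute is
    skipped -- the arithmetic an EIE-style processing element performs."""
    y = np.zeros(len(row_ptr) - 1, dtype=x.dtype)
    for r in range(len(row_ptr) - 1):
        for k in range(row_ptr[r], row_ptr[r + 1]):
            y[r] += values[k] * x[col_idx[k]]
    return y

def im2col(fmap, kh, kw):
    """Unroll sliding conv-input patches into the columns of a dense
    matrix (stride 1, no padding), so a convolution layer becomes a
    sparse-matrix x dense-matrix product against the pruned weights."""
    c, h, w = fmap.shape
    oh, ow = h - kh + 1, w - kw + 1
    cols = np.empty((c * kh * kw, oh * ow), dtype=fmap.dtype)
    for i in range(oh):
        for j in range(ow):
            cols[:, i * ow + j] = fmap[:, i:i + kh, j:j + kw].ravel()
    return cols

# Toy check: a mostly-zero weight matrix applied to a vector.
W = np.array([[0, 2, 0, 0],
              [0, 0, 0, 1],
              [3, 0, 0, 0]], dtype=np.int8)      # pruned, low-bit-range weights
values  = np.array([2, 1, 3], dtype=np.int8)     # nonzeros, row-major order
col_idx = np.array([1, 3, 0])                    # column of each nonzero
row_ptr = np.array([0, 1, 2, 3])                 # row r spans row_ptr[r]:row_ptr[r+1]
x = np.arange(4, dtype=np.int8)
assert np.array_equal(csr_spmv(values, col_idx, row_ptr, x), W @ x)
```

With a transform unit producing the `im2col` matrix on the fly, the same sparse-multiply hardware serves fully-connected, recurrent, and convolutional layers, which is what makes a single unified accelerator possible.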

Cited By

  • Efficient Design of Low Bitwidth Convolutional Neural Networks on FPGA with Optimized Dot Product Units. ACM Transactions on Reconfigurable Technology and Systems 16(1), 1-36. Online publication date: 22 December 2022. DOI: 10.1145/3546182
  • Non-Structured DNN Weight Pruning—Is It Beneficial in Any Platform? IEEE Transactions on Neural Networks and Learning Systems 33(9), 4930-4944, September 2022. DOI: 10.1109/TNNLS.2021.3063265

Information

Published In

FPGA '19: Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
February 2019
360 pages
ISBN: 9781450361378
DOI: 10.1145/3289602
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. accelerator architecture
  2. hardware computation
  3. multiple-precision
  4. sparse neural network

Qualifiers

  • Poster

Funding Sources

  • National Natural Science Foundation of China
  • Beijing Innovation Center for Future Chip
  • National Key R&D Program of China
  • Beijing National Research Center for Information Science and Technology (BNRist)

Conference

FPGA '19
Sponsor: SIGDA

Acceptance Rates

Overall acceptance rate: 125 of 627 submissions (20%)
