research-article

A data locality-aware design framework for reconfigurable sparse matrix-vector multiplication kernel

Authors:

Hai Li

2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)

Pages 1 - 6

https://doi.org/10.1145/2966986.2966987

Published: 07 November 2016

Abstract

Sparse matrix-vector multiplication (SpMV) is an important computational kernel in many applications. To improve its performance, software libraries dedicated to SpMV computation have been introduced, e.g., the MKL library for CPUs and the cuSPARSE library for GPUs. However, the computational throughput of these libraries is far below the peak floating-point performance offered by the hardware platforms, because the efficiency of the SpMV kernel is greatly constrained by limited memory bandwidth and irregular data access patterns. In this work, we propose a data locality-aware design framework for FPGA-based SpMV acceleration. We first incorporate the hardware constraints into sparse matrix compression at the software level to regularize memory allocation and accesses. Moreover, a distributed architecture composed of processing elements is developed to improve computation parallelism. We implement the reconfigurable SpMV kernel on the Convey HC-2ex and evaluate it using the University of Florida sparse matrix collection. The experiments demonstrate an average computational efficiency of 48.2%, which is considerably higher than that of the CPU and GPU implementations. Our FPGA-based kernel has a runtime comparable to that of the GPU and achieves a 2.1× runtime reduction compared with the CPU. Moreover, our design obtains substantial savings in energy consumption: 9.3× and 5.6× better than the implementations on the CPU and GPU, respectively.
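To make the irregular access pattern mentioned in the abstract concrete, the following is a minimal sketch of SpMV over a matrix stored in compressed sparse row (CSR) form. CSR is an assumption chosen for illustration only; the abstract does not state which compression format the framework uses, and the function name `spmv_csr` and the example matrix are hypothetical.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form (illustrative only)."""
    num_rows = len(row_ptr) - 1
    y = [0.0] * num_rows
    for row in range(num_rows):
        acc = 0.0
        # Nonzeros of this row occupy values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # x is read at col_idx[k]: an indirect, data-dependent access,
            # which is the source of the irregular memory traffic.
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Hypothetical 3x3 example matrix:
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
x = [1.0, 2.0, 3.0]

y = spmv_csr(values, col_idx, row_ptr, x)
print(y)
```

Each nonzero contributes one multiply and one add but requires loading a value, a column index, and an element of `x` from a location that depends on the data, which is why the kernel tends to be limited by memory bandwidth rather than by arithmetic.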

Published In

2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)

Nov 2016

946 pages

Copyright © 2016.

Publisher

IEEE Press
