DOI: 10.1145/2966986.2966987
research-article

A data locality-aware design framework for reconfigurable sparse matrix-vector multiplication kernel

Published: 07 November 2016

Abstract

Sparse matrix-vector multiplication (SpMV) is an important computational kernel in many applications. To improve performance, software libraries designed for SpMV computation have been introduced, e.g., the MKL library for CPUs and the cuSPARSE library for GPUs. However, the computational throughput of these libraries is far below the peak floating-point performance offered by hardware platforms, because the efficiency of the SpMV kernel is greatly constrained by limited memory bandwidth and irregular data access patterns. In this work, we propose a data locality-aware design framework for FPGA-based SpMV acceleration. We first incorporate hardware constraints into sparse matrix compression at the software level to regularize memory allocation and accesses. Moreover, a distributed architecture composed of processing elements is developed to improve computational parallelism. We implement the reconfigurable SpMV kernel on Convey HC-2ex and evaluate it using the University of Florida sparse matrix collection. The experiments demonstrate an average computational efficiency of 48.2%, which is substantially higher than that of the CPU and GPU implementations. Our FPGA-based kernel has a runtime comparable to that of the GPU and achieves a 2.1× runtime reduction over the CPU. Moreover, our design obtains substantial energy savings: 9.3× and 5.6× better than the CPU and GPU implementations, respectively.
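To make the "irregular data access patterns" mentioned in the abstract concrete, the following is a minimal sketch of SpMV over the widely used compressed sparse row (CSR) layout. CSR is chosen here only as a familiar illustration; it is an assumption and not the hardware-aware compression format proposed in the paper, and the names `spmv_csr`, `values`, `col_idx`, and `row_ptr` are hypothetical.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form (illustrative only)."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        # Nonzeros of this row occupy values[row_ptr[row] : row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # x is read at col_idx[k]: an indirect, data-dependent access
            # whose address is unknown until the index itself has been loaded.
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# 3x3 example matrix: [[4, 0, 1], [0, 2, 0], [3, 0, 5]]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
x = [1.0, 2.0, 3.0]
y = spmv_csr(values, col_idx, row_ptr, x)
print(y)  # [7.0, 4.0, 18.0]
```

Each nonzero contributes one multiply and one add but requires loading a value, a column index, and an element of `x` from a location determined by the data, which is why a kernel of this shape tends to be limited by memory bandwidth rather than by floating-point throughput.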


Published In

2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Nov 2016, 946 pages

Publisher

IEEE Press
