Pattern-based sparse matrix representation for memory-efficient SMVM kernels
M Belgin, G Back, CJ Ribbens
Proceedings of the 23rd international conference on Supercomputing, 2009 - dl.acm.org
Pattern-based Representation (PBR) is a novel approach to improving the performance of Sparse Matrix-Vector Multiply (SMVM) numerical kernels. Motivated by our observation that many matrices can be divided into blocks that share a small number of distinct patterns, we generate custom multiplication kernels for frequently recurring block patterns. The resulting reduction in index overhead significantly reduces memory bandwidth requirements and improves performance. Unlike existing methods, PBR requires neither detection of dense blocks nor zero filling, making it particularly advantageous for matrices that lack dense nonzero concentrations. SMVM kernels for PBR can benefit from explicit prefetching and vectorization, and are amenable to parallelization. We present sequential and parallel performance results for PBR on two current multicore architectures, which show that PBR outperforms available alternatives for the matrices to which it is applicable.
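The core idea (split the matrix into small blocks, group blocks by their nonzero pattern, and run one specialised kernel per pattern) can be illustrated with a minimal Python sketch. It rests on several assumptions. The 2×2 block size is hypothetical, since the abstract does not state the block dimensions. The input is a COO-style dict. Python source generation with `exec` stands in for the compiled per-pattern kernels the paper describes. The names `to_pbr`, `make_kernel`, and `pbr_spmv` are illustrative only and are not the authors' code or data layout.

```python
from collections import defaultdict

# Hypothetical block size; the abstract does not say which r x c blocks PBR uses.
R, C = 2, 2


def to_pbr(entries, r=R, c=C):
    """Group nonzeros into r x c blocks and bucket the blocks by pattern.

    entries maps (row, col) -> value. Returns {pattern: [(bi, bj, vals), ...]},
    where pattern is the sorted tuple of (dr, dc) offsets present in the block
    and vals holds exactly those nonzeros in the same order (no zero filling).
    """
    blocks = defaultdict(dict)
    for (i, j), v in entries.items():
        blocks[(i // r, j // c)][(i % r, j % c)] = v
    groups = defaultdict(list)
    for (bi, bj), cells in blocks.items():
        pattern = tuple(sorted(cells))
        groups[pattern].append((bi, bj, [cells[p] for p in pattern]))
    return dict(groups)


def make_kernel(pattern, r=R, c=C):
    """Generate an unrolled multiply kernel specialised to one block pattern.

    The (dr, dc) offsets are baked in as constants, so the kernel needs no
    per-nonzero index data; only the block coordinates are passed in.
    """
    src = ["def kernel(bi, bj, vals, x, y):",
           f"    i0, j0 = bi * {r}, bj * {c}"]
    for k, (dr, dc) in enumerate(pattern):
        src.append(f"    y[i0 + {dr}] += vals[{k}] * x[j0 + {dc}]")
    namespace = {}
    exec("\n".join(src), namespace)  # define `kernel` inside namespace
    return namespace["kernel"]


def pbr_spmv(groups, n_rows, x, r=R, c=C):
    """Compute y = A x, running one generated kernel per distinct pattern."""
    y = [0.0] * n_rows
    for pattern, block_list in groups.items():
        kernel = make_kernel(pattern, r, c)  # built once, reused for every block
        for bi, bj, vals in block_list:
            kernel(bi, bj, vals, x, y)
    return y


if __name__ == "__main__":
    # 4 x 4 matrix whose four 2 x 2 blocks all share the diagonal pattern.
    A = {(0, 0): 1.0, (1, 1): 2.0, (0, 2): 3.0, (1, 3): 4.0,
         (2, 0): 5.0, (3, 1): 6.0, (2, 2): 7.0, (3, 3): 8.0}
    groups = to_pbr(A)
    print(len(groups))                                # 1
    print(pbr_spmv(groups, 4, [1.0, 2.0, 3.0, 4.0]))  # [10.0, 20.0, 26.0, 44.0]
```

In this sketch every block with the same pattern goes through the same specialised kernel, so the stored index data is one (bi, bj) pair per block rather than one column index per nonzero as in CSR. Blocks are never padded with zeros. A performance-oriented version would emit compiled code and could add the explicit prefetching and vectorization the abstract mentions, which this sketch does not attempt.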