Efficient Execution of SpGEMM on Long Vector Architectures

DOI: 10.1145/3588195.3593000

Published: 07 August 2023

Abstract

The Sparse GEneral Matrix-Matrix multiplication (SpGEMM) C = A × B is a fundamental routine used extensively in domains such as machine learning and graph analytics. Despite its relevance, the efficient execution of SpGEMM on vector architectures is a relatively unexplored topic. The most recent algorithm to run SpGEMM on these architectures is based on the SParse Accumulator (SPA) approach, and it is relatively efficient for sparse matrices featuring several tens of non-zero coefficients per column, as it computes the columns of C one by one. However, when dealing with matrices containing just a few non-zero coefficients per column, this state-of-the-art algorithm cannot fully exploit long vector architectures when computing the SpGEMM kernel.
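The column-by-column SPA formulation the abstract refers to goes back to Gustavson's algorithm. Below is an illustrative scalar C++ sketch of SpGEMM with a sparse accumulator over CSC matrices; it is not the paper's vectorized implementation, and the CSC struct and function names are hypothetical.

#include <cassert>
#include <cstdint>
#include <vector>

struct CSC {                          // compressed sparse column matrix
    int64_t rows = 0, cols = 0;
    std::vector<int64_t> colptr;      // size cols + 1
    std::vector<int64_t> rowidx;      // row index of each non-zero
    std::vector<double>  val;         // value of each non-zero
};

CSC spgemm_spa(const CSC& A, const CSC& B) {
    assert(A.cols == B.rows);
    CSC C;
    C.rows = A.rows;
    C.cols = B.cols;
    C.colptr.assign(B.cols + 1, 0);

    std::vector<double>  spa(A.rows, 0.0);   // dense accumulator, one slot per row of C
    std::vector<bool>    occupied(A.rows, false);
    std::vector<int64_t> pattern;            // rows touched while building the current column

    for (int64_t j = 0; j < B.cols; ++j) {
        // C(:,j) = sum over the non-zeros B(k,j) of A(:,k) * B(k,j)
        for (int64_t p = B.colptr[j]; p < B.colptr[j + 1]; ++p) {
            const int64_t k = B.rowidx[p];
            const double  b = B.val[p];
            for (int64_t q = A.colptr[k]; q < A.colptr[k + 1]; ++q) {
                const int64_t i = A.rowidx[q];
                if (!occupied[i]) { occupied[i] = true; pattern.push_back(i); }
                spa[i] += A.val[q] * b;
            }
        }
        // Gather the finished column into C and reset the accumulator.
        for (int64_t i : pattern) {
            C.rowidx.push_back(i);
            C.val.push_back(spa[i]);
            spa[i] = 0.0;
            occupied[i] = false;
        }
        pattern.clear();
        C.colptr[j + 1] = static_cast<int64_t>(C.rowidx.size());
    }
    return C;
}

The inner loops run over the non-zeros of single columns, so when A and B carry only a few non-zeros per column there is little work with which to fill a long vector register. This is the underutilization the abstract describes, and processing several columns of C at once, as SPARS does, is one way to recover vector-length parallelism.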
To overcome this issue we propose the SPA paRallel with Sorting (SPARS) algorithm, which, among other optimizations, computes several columns of C in parallel, and the HASH algorithm, which uses dynamically sized hash tables to store intermediate output values. To combine the efficiency of SPA on relatively dense matrix blocks with the high performance that SPARS and HASH deliver on very sparse matrix blocks, we propose H-SPA(t) and H-HASH(t), which dynamically switch between the different algorithms. H-SPA(t) and H-HASH(t) obtain 1.24× and 1.57× average speed-ups with respect to SPA, respectively, over a set of 40 sparse matrices obtained from the SuiteSparse Matrix Collection. For the 22 sparsest matrices, H-SPA(t) and H-HASH(t) deliver 1.42× and 1.99× average speed-ups, respectively.
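To make the hash-based alternative concrete, the following is a minimal C++ sketch of an open-addressing accumulator for one output column, in the spirit of the HASH algorithm's per-column tables. The class name, the up-front sizing from an expected non-zero count, and the lack of rehashing are simplifying assumptions of this sketch, not details taken from the paper.

#include <cstdint>
#include <utility>
#include <vector>

class HashAccumulator {
    // Open addressing with linear probing; key -1 marks an empty slot.
    std::vector<int64_t> keys;
    std::vector<double>  vals;
    size_t mask;

public:
    explicit HashAccumulator(size_t expected_nnz) {
        size_t cap = 16;
        while (cap < 2 * expected_nnz) cap <<= 1;  // power-of-two capacity, load factor <= 0.5
        keys.assign(cap, -1);
        vals.assign(cap, 0.0);
        mask = cap - 1;
    }

    // Accumulate one intermediate product A(row,k) * B(k,j).
    void add(int64_t row, double v) {
        size_t h = static_cast<size_t>(row) & mask;
        while (keys[h] != -1 && keys[h] != row) h = (h + 1) & mask;
        keys[h] = row;
        vals[h] += v;
    }

    // Extract the accumulated (row, value) pairs of the finished column.
    std::vector<std::pair<int64_t, double>> drain() const {
        std::vector<std::pair<int64_t, double>> out;
        for (size_t h = 0; h < keys.size(); ++h)
            if (keys[h] != -1) out.emplace_back(keys[h], vals[h]);
        return out;
    }
};

Compared with the dense SPA array, the table's footprint scales with the number of non-zeros in the column rather than with the number of rows of C, which is what makes it attractive for very sparse columns; a hybrid scheme in the style of H-SPA(t) or H-HASH(t) would then choose between the two kinds of accumulator per matrix block according to a sparsity threshold t.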

Published In
HPDC '23: Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing
August 2023
350 pages
ISBN:9798400701559
DOI:10.1145/3588195
  • General Chair: Ali R. Butt
  • Program Chairs: Ningfang Mi, Kyle Chard

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. RISC-V
  2. SpGEMM
  3. sparse matrix
  4. sparse multiplication
  5. vector processor

Qualifiers

  • Research-article

Conference

HPDC '23

Acceptance Rates

Overall Acceptance Rate 166 of 966 submissions, 17%
