- research-article, January 2025
Taking One for the Team: Trading Overhead and Blocking for Optimal Critical-Section Granularity with a Shared GPU
RTNS '24: Proceedings of the 32nd International Conference on Real-Time Networks and Systems, Pages 94–104
https://doi.org/10.1145/3696355.3696368
Conventional wisdom maintains that the interval of time mutually exclusive access is granted to a shared resource (i.e., in a critical section) should be as short as possible. However, the arbitration of shared-resource accesses introduces overhead. As a ...
- research-article, November 2024
Refining HPCToolkit for application performance analysis at exascale
- Laksono Adhianto,
- Jonathon Anderson,
- Robert Matthew Barnett,
- Dragana Grbic,
- Vladimir Indic,
- Mark Krentel,
- Yumeng Liu,
- Srđan Milaković,
- Wileam Phan,
- John Mellor-Crummey,
- Michael Heroux
International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 38, Issue 6, Pages 612–632
https://doi.org/10.1177/10943420241277839
As part of the US Department of Energy’s Exascale Computing Project (ECP), Rice University has been refining its HPCToolkit performance tools to better support measurement and analysis of applications executing on exascale supercomputers. To efficiently ...
- research-article, February 2024
Fast Kronecker Matrix-Matrix Multiplication on GPUs
PPoPP '24: Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, Pages 390–403
https://doi.org/10.1145/3627535.3638489
Kronecker Matrix-Matrix Multiplication (Kron-Matmul) is the multiplication of a matrix with the Kronecker Product of several smaller matrices. Kron-Matmul is a core operation for many scientific and machine learning computations. State-of-the-art Kron-...
- research-article, March 2023
Acceleration of a parallel BDDC solver by using graphics processing units on subdomains
International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 37, Issue 2, Pages 151–164
https://doi.org/10.1177/10943420221136873
An approach to accelerating a parallel domain decomposition (DD) solver by graphics processing units (GPUs) is investigated. The solver is based on the Balancing Domain Decomposition Method by Constraints (BDDC), which is a nonoverlapping DD technique. ...
- research-article, March 2023
PeleC: An adaptive mesh refinement solver for compressible reacting flows
- Marc T Henry de Frahan,
- Jon S Rood,
- Marc S Day,
- Hariswaran Sitaraman,
- Shashank Yellapantula,
- Bruce A Perry,
- Ray W Grout,
- Ann Almgren,
- Weiqun Zhang,
- John B Bell,
- Jacqueline H Chen
International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 37, Issue 2, Pages 115–131
https://doi.org/10.1177/10943420221121151
Reacting flow simulations for combustion applications require extensive computing capabilities. Leveraging the AMReX library, the Pele suite of combustion simulation tools targets the largest supercomputers available and future exascale machines. We ...
- research-article, March 2023
Compressed basis GMRES on high-performance graphics processing units
International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 37, Issue 2, Pages 82–100
https://doi.org/10.1177/10943420221115140
Krylov methods provide a fast and highly parallel numerical tool for the iterative solution of many large-scale sparse linear systems. To a large extent, the performance of practical realizations of these methods is constrained by the communication ...
- research-article, December 2023
Designing Virtual Memory System of MCM GPUs
MICRO '22: Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 404–422
https://doi.org/10.1109/MICRO56248.2022.00036
Multi-Chip Module (MCM) designs have emerged as a key technique to scale up a GPU's compute capabilities in the face of slowing transistor technology. However, the disaggregated nature of MCM GPUs with many chiplets connected via in-package ...
- research-article, June 2022
GCoM: a detailed GPU core model for accurate analytical modeling of modern GPUs
ISCA '22: Proceedings of the 49th Annual International Symposium on Computer Architecture, Pages 424–436
https://doi.org/10.1145/3470496.3527384
Analytical models can greatly help computer architects perform orders of magnitude faster early-stage design space exploration than using cycle-level simulators. To facilitate rapid design space exploration for graphics processing units (GPUs), prior ...
- research-article, March 2022
Development of a hardware-accelerated simulation kernel for ultra-high vacuum with Nvidia RTX GPUs
International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 36, Issue 2, Pages 141–152
https://doi.org/10.1177/10943420211056654
Molflow+ is a Monte Carlo (MC) simulation software for ultra-high vacuum, mainly used to simulate pressure in particle accelerators. In this article, we present and discuss the design choices arising in a new implementation of its ray-tracing–based ...
- survey, January 2022
Query Processing on Heterogeneous CPU/GPU Systems
ACM Computing Surveys (CSUR), Volume 55, Issue 1, Article No.: 11, Pages 1–38
https://doi.org/10.1145/3485126
Due to their high computational power and internal memory bandwidth, graphic processing units (GPUs) have been extensively studied by the database systems research community. A heterogeneous query processing system that employs CPUs and GPUs at the same ...
- research-article, January 2022
Optimisation of plagiarism detection using vector space model on CUDA architecture
International Journal of Innovative Computing and Applications (IJICA), Volume 13, Issue 4, Pages 232–244
https://doi.org/10.1504/ijica.2022.125675
Plagiarism is a rapidly rising issue among students during submission of assignments, reports and publications in universities and educational institutions, due to easy accessibility of abundant e-resources on the internet. Existing tools become ...
- research-article, October 2021
Multi‐stream adaptive spatial‐temporal attention graph convolutional network for skeleton‐based action recognition
IET Computer Vision (CVI2), Volume 16, Issue 2, Pages 143–158
https://doi.org/10.1049/cvi2.12075
Skeleton‐based action recognition algorithms have been widely applied to human action recognition. Graph convolutional networks (GCNs) generalize convolutional neural networks (CNNs) to non‐Euclidean graphs and achieve significant performance in ...
- research-article, October 2021
Pointer-Based Divergence Analysis for OpenCL 2.0 Programs
ACM Transactions on Parallel Computing (TOPC), Volume 8, Issue 4, Article No.: 20, Pages 1–23
https://doi.org/10.1145/3470644
A modern GPU is designed with many large thread groups to achieve a high throughput and performance. Within these groups, the threads are grouped into fixed-size SIMD batches in which the same instruction is applied to vectors of data in a lockstep. This ...
- research-article, November 2021
Dual-side sparse tensor core
ISCA '21: Proceedings of the 48th Annual International Symposium on Computer Architecture, Pages 1083–1095
https://doi.org/10.1109/ISCA52012.2021.00088
Leveraging sparsity in deep neural network (DNN) models is promising for accelerating model inference. Yet existing GPUs can only leverage the sparsity from weights but not activations, which are dynamic, unpredictable, and hence challenging to exploit. ...
- research-article, July 2021
Exploring AMD GPU Scheduling Details by Experimenting With “Worst Practices”
RTNS '21: Proceedings of the 29th International Conference on Real-Time Networks and Systems, Pages 24–34
https://doi.org/10.1145/3453417.3453432
Graphics processing units (GPUs) have been the target of a significant body of recent real-time research, but research is often hampered by the “black box” nature of GPU hardware and software. Now that one GPU manufacturer, AMD, has embraced an open-...
- research-article, March 2021
Implicit Hari–Zimmermann algorithm for the generalized SVD on the GPUs
International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 35, Issue 2, Pages 170–205
https://doi.org/10.1177/1094342020972772
A parallel, blocked, one-sided Hari–Zimmermann algorithm for the generalized singular value decomposition (GSVD) of a real or a complex matrix pair (F, G) is here proposed, where F and G have the same number of columns, and are both of the ...
- research-article, November 2020
Sparse GPU kernels for deep learning
SC '20: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Article No.: 17, Pages 1–14
Scientific workloads have traditionally exploited high levels of sparsity to accelerate computation and reduce memory requirements. While deep neural networks can be made sparse, achieving practical speedups on GPUs is difficult because these ...
- research-article, September 2020
Model-Based Warp Overlapped Tiling for Image Processing Programs on GPUs
PACT '20: Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, Pages 317–328
https://doi.org/10.1145/3410463.3414649
Domain-specific languages that execute image processing pipelines on GPUs, such as Halide and Forma, operate by 1) dividing the image into overlapped tiles, and 2) fusing loops to improve memory locality. However, current approaches have limitations: 1) ...
- research-article, September 2020
ScoRD: a scoped race detector for GPUs
ISCA '20: Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture, Pages 1036–1049
https://doi.org/10.1109/ISCA45697.2020.00088
GPUs have emerged as a key computing platform for an ever-growing range of applications. Unlike traditional bulk-synchronous GPU programs, many emerging GPU-accelerated applications, such as graph processing, have irregular interaction among the ...