- research-article, November 2024
Acceleration of Tensor-Product Operations with Tensor Cores
ACM Transactions on Parallel Computing (TOPC), Volume 11, Issue 4, Article No.: 15, Pages 1–24
https://doi.org/10.1145/3695466
In this article, we explore the acceleration of tensor product operations in finite element methods, leveraging the computational power of the NVIDIA A100 GPU Tensor Cores. We provide an accessible overview of the necessary mathematical background and ...
- introduction, December 2023
- research-article, March 2023
Non-overlapping High-accuracy Parallel Closure for Compact Schemes: Application in Multiphysics and Complex Geometry
ACM Transactions on Parallel Computing (TOPC), Volume 10, Issue 1, Article No.: 1, Pages 1–28
https://doi.org/10.1145/3580005
Compact schemes are often preferred in performing scientific computing for their superior spectral resolution. Error-free parallelization of a compact scheme is a challenging task due to the requirement of additional closures at the inter-processor ...
- research-article, November 2020
Optimizing the Linear Fascicle Evaluation Algorithm for Multi-core and Many-core Systems
ACM Transactions on Parallel Computing (TOPC), Volume 7, Issue 4, Article No.: 22, Pages 1–45
https://doi.org/10.1145/3418075
Sparse matrix-vector multiplication (SpMV) operations are commonly used in various scientific and engineering applications. The performance of the SpMV operation often depends on exploiting regularity patterns in the matrix. Various representations and ...
- research-article, October 2020
A High Accuracy Preserving Parallel Algorithm for Compact Schemes for DNS
ACM Transactions on Parallel Computing (TOPC), Volume 7, Issue 4, Article No.: 21, Pages 1–32
https://doi.org/10.1145/3418073
A new accuracy-preserving parallel algorithm employing compact schemes is presented for direct numerical simulation of the Navier-Stokes equations. Here the connotation of accuracy preservation is having the same level of accuracy obtained by the ...
- research-article, June 2020
Algorithms and Data Structures for Matrix-Free Finite Element Operators with MPI-Parallel Sparse Multi-Vectors
ACM Transactions on Parallel Computing (TOPC), Volume 7, Issue 3, Article No.: 20, Pages 1–30
https://doi.org/10.1145/3399736
Traditional solution approaches for problems in quantum mechanics scale as O(M^3), where M is the number of electrons. Various methods have been proposed to address this issue and obtain a linear scaling O(M). One promising formulation is the direct ...
- research-article, March 2020
Load-balancing Sparse Matrix Vector Product Kernels on GPUs
Hartwig Anzt, Terry Cojean, Chen Yen-Chen, Jack Dongarra, Goran Flegar, Pratik Nayak, Stanimire Tomov, Yuhsiang M. Tsai, Weichung Wang
ACM Transactions on Parallel Computing (TOPC), Volume 7, Issue 1, Article No.: 2, Pages 1–26
https://doi.org/10.1145/3380930
Efficient processing of Irregular Matrices on Single Instruction, Multiple Data (SIMD)-type architectures is a persistent challenge. Resolving it requires innovations in the development of data formats, computational techniques, and implementations that ...
- research-article, March 2020
Acceleration of PageRank with Customized Precision Based on Mantissa Segmentation
ACM Transactions on Parallel Computing (TOPC), Volume 7, Issue 1, Article No.: 4, Pages 1–19
https://doi.org/10.1145/3380934
We describe the application of a communication-reduction technique for the PageRank algorithm that dynamically adapts the precision of the data access to the numerical requirements of the algorithm as the iteration converges. Our variable-precision ...
- research-article, May 2019
Multigrid for Matrix-Free High-Order Finite Element Computations on Graphics Processors
ACM Transactions on Parallel Computing (TOPC), Volume 6, Issue 1, Article No.: 2, Pages 1–32
https://doi.org/10.1145/3322813
This article presents matrix-free finite-element techniques for efficiently solving partial differential equations on modern many-core processors, such as graphics cards. We develop a GPU parallelization of a matrix-free geometric multigrid iterative ...
- research-article, January 2018
Partitioning Models for Scaling Parallel Sparse Matrix-Matrix Multiplication
ACM Transactions on Parallel Computing (TOPC), Volume 4, Issue 3, Article No.: 13, Pages 1–34
https://doi.org/10.1145/3155292
We investigate outer-product–parallel, inner-product–parallel, and row-by-row-product–parallel formulations of sparse matrix-matrix multiplication (SpGEMM) on distributed memory architectures. For each of these three formulations, we propose a ...
- research-article, January 2017
Trade-Offs Between Synchronization, Communication, and Computation in Parallel Linear Algebra Computations
ACM Transactions on Parallel Computing (TOPC), Volume 3, Issue 1, Article No.: 3, Pages 1–47
https://doi.org/10.1145/2897188
This article derives trade-offs between three basic costs of a parallel algorithm: synchronization, data movement, and computational cost. These trade-offs are lower bounds on the execution time of the algorithm that are independent of the number of ...
- research-article, December 2016
Hypergraph Partitioning for Sparse Matrix-Matrix Multiplication
ACM Transactions on Parallel Computing (TOPC), Volume 3, Issue 3, Article No.: 18, Pages 1–34
https://doi.org/10.1145/3015144
We propose a fine-grained hypergraph model for sparse matrix-matrix multiplication (SpGEMM), a key computational kernel in scientific computing and data analysis whose performance is often communication bound. This model correctly describes both the ...
- research-article, September 2015
Profitable Scheduling on Multiple Speed-Scalable Processors
ACM Transactions on Parallel Computing (TOPC), Volume 2, Issue 3, Article No.: 19, Pages 1–19
https://doi.org/10.1145/2809872
We present a new online algorithm for profit-oriented scheduling on multiple speed-scalable processors and provide a tight analysis of the algorithm’s competitiveness. Our results generalize and improve upon work by Chan et al. [2010], which considers a ...
- research-article, September 2015
Work-Efficient Matrix Inversion in Polylogarithmic Time
ACM Transactions on Parallel Computing (TOPC), Volume 2, Issue 3, Article No.: 15, Pages 1–29
https://doi.org/10.1145/2809812
We present an algorithm for inversion of symmetric positive definite matrices that combines the practical requirement of an optimal number of arithmetic operations and the theoretical goal of a polylogarithmic critical path length. The algorithm reduces ...
- research-article, April 2015
Noise-Tolerant Explicit Stencil Computations for Nonuniform Process Execution Rates
ACM Transactions on Parallel Computing (TOPC), Volume 2, Issue 1, Article No.: 7, Pages 1–33
https://doi.org/10.1145/2742351
Next-generation HPC computing platforms are likely to be characterized by significant, unpredictable nonuniformities in execution time among compute nodes and cores. The resulting load imbalances from this nonuniformity are expected to arise from a ...
- research-article, February 2015
Avoiding Communication in Successive Band Reduction
ACM Transactions on Parallel Computing (TOPC), Volume 1, Issue 2, Article No.: 11, Pages 1–37
https://doi.org/10.1145/2686877
The running time of an algorithm depends on both arithmetic and communication (i.e., data movement) costs, and the relative costs of communication are growing over time. In this work, we present sequential and distributed-memory parallel algorithms for ...