- short-paper, July 2024
Performance of Molecular Dynamics Acceleration Strategies on Composable Cyberinfrastructure
- Richard Lawrence,
- Dhruva K. Chakravorty,
- Lisa M. Perez,
- Honggao Liu,
- Zhenhua He,
- Wesley Brashear,
- Joshua Winchell,
- James X. Mao,
- Chun-Yaung Lu
PEARC '24: Practice and Experience in Advanced Research Computing 2024: Human Powered Computing, Article No.: 46, Pages 1–5, https://doi.org/10.1145/3626203.3670631
Modern powerful accelerators and composable infrastructures put our simulation frameworks to the test. We will show that the acceleration of a simulation framework is absolutely critical for good performance and scaling. Building on our previous work ...
- research-article, November 2023
Moment Representation of Regularized Lattice Boltzmann Methods on NVIDIA and AMD GPUs
SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Pages 1697–1704, https://doi.org/10.1145/3624062.3624250
The lattice Boltzmann method is a highly scalable Navier-Stokes solver that has been applied to flow problems in a wide array of domains. However, the method is bandwidth-bound on modern GPU accelerators and has a large memory footprint. In this paper, ...
- research-article, November 2023
GPUscout: Locating Data Movement-related Bottlenecks on GPUs
SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Pages 1392–1402, https://doi.org/10.1145/3624062.3624208
GPUs pose an attractive opportunity for delivering high-performance applications. However, GPU codes are often limited due to memory contention, resulting in overall performance degradation. Since GPU scheduling is transparent to the user, and GPU ...
- research-article, November 2023
Many Cores, Many Models: GPU Programming Model vs. Vendor Compatibility Overview
SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Pages 1019–1026, https://doi.org/10.1145/3624062.3624178
In recent history, GPUs became a key driver of compute performance in HPC. With the installation of the Frontier supercomputer, they became the enablers of the Exascale era; further large-scale installations are in progress (Aurora, El Capitan, ...
- research-article, July 2022
Ginkgo—A math library designed for platform portability
Parallel Computing (PACO), Volume 111, Issue C, https://doi.org/10.1016/j.parco.2022.102902
Abstract: In an era of increasing computer system diversity, the portability of software from one system to another plays a central role. Software portability is important for software developers, as many software projects have a lifetime ...
Highlights: We discuss the Ginkgo design separating the numerical core from the architecture-specific backends written in the architecture-specific language to allow for ...
- Article, June 2022
Identifying, Evaluating, and Addressing Nondeterminism in Mask R-CNNs
Pattern Recognition and Artificial Intelligence, Pages 3–14, https://doi.org/10.1007/978-3-031-09037-0_1
Abstract: Convolutional Neural Networks, and many other machine learning algorithms, use Graphics Processing Units (GPUs) instead of Central Processing Units (CPUs) to improve the training time of very large modeling computations. This work evaluates the ...
- research-article, November 2021
In-depth analyses of unified virtual memory system for GPU accelerated computing
SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Article No.: 64, Pages 1–15, https://doi.org/10.1145/3458817.3480855
The abstraction of a shared memory space over separate CPU and GPU memory domains has eased the burden of portability for many HPC codebases. However, users pay for the ease of use provided by systems-managed memory space with a moderate-to-high ...
- Article, November 2020
Performance Assessment of OpenMP Compilers Targeting NVIDIA V100 GPUs
- Joshua Hoke Davis,
- Christopher Daley,
- Swaroop Pophale,
- Thomas Huber,
- Sunita Chandrasekaran,
- Nicholas J. Wright
Accelerator Programming Using Directives, Pages 25–44, https://doi.org/10.1007/978-3-030-74224-9_2
Abstract: Heterogeneous systems are becoming increasingly prevalent. In order to exploit the rich compute resources of such systems, robust programming models are needed for application developers to seamlessly migrate legacy code from today’s systems to ...
- research-article, November 2020
GPU lifetimes on titan supercomputer: survival analysis and reliability
- George Ostrouchov,
- Don Maxwell,
- Rizwan A. Ashraf,
- Christian Engelmann,
- Mallikarjun Shankar,
- James H. Rogers
SC '20: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Article No.: 41, Pages 1–14
The Cray XK7 Titan was the top supercomputer system in the world for a long time and remained critically important throughout its nearly seven-year life. It was an interesting machine from a reliability viewpoint, as most of its power came from 18,688 ...
- Article, September 2020
A Case Study of Porting HPGMG from CUDA to OpenMP Target Offload
OpenMP: Portable Multi-Level Parallelism on Modern Systems, Pages 37–51, https://doi.org/10.1007/978-3-030-58144-2_3
Abstract: The HPGMG benchmark is a non-trivial Multigrid benchmark used to evaluate system performance. We ported this benchmark from CUDA to OpenMP target offload and added the capability to use explicit data management rather than managed memory. Our ...
- Article, June 2020
Sparse Linear Algebra on AMD and NVIDIA GPUs – The Race Is On
High Performance Computing, Pages 309–327, https://doi.org/10.1007/978-3-030-50743-5_16
Abstract: Efficiently processing sparse matrices is a central and performance-critical part of many scientific simulation codes. Recognizing the adoption of manycore accelerators in HPC, we evaluate in this paper the performance of the currently best ...
- research-article, May 2018
Stochastic first passage time accelerated with CUDA
Journal of Computational Physics (JOCP), Volume 361, Issue C, Pages 136–149, https://doi.org/10.1016/j.jcp.2018.01.039
Highlights: Parallelization with GPU is proposed for fast computation of first passage time.
The time to pass a threshold, estimated by numerical integration of stochastic trajectories, is an interesting physical quantity, for instance in Josephson junctions and atomic force microscopy, where the full trajectory is not ...
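The quantity in the entry above, the mean first passage time across a threshold, is naturally estimated by averaging many independent stochastic trajectories. A minimal serial Euler-Maruyama sketch (illustrative only, not the paper's CUDA implementation; the drift, noise strength, and threshold values are assumptions):

```python
import random

def first_passage_time(threshold=1.0, drift=0.05, sigma=0.3, dt=0.01,
                       max_steps=100_000, rng=random):
    """Integrate dx = drift*dt + sigma*dW until x crosses the threshold.

    Returns the crossing time, or None if it never crosses within max_steps.
    """
    x, t = 0.0, 0.0
    for _ in range(max_steps):
        x += drift * dt + sigma * rng.gauss(0.0, 1.0) * dt ** 0.5
        t += dt
        if x >= threshold:
            return t
    return None

def mean_first_passage_time(n_trajectories=200, seed=42, **kwargs):
    """Monte Carlo estimate: average the crossing time over many trajectories."""
    rng = random.Random(seed)
    times = [first_passage_time(rng=rng, **kwargs) for _ in range(n_trajectories)]
    crossed = [t for t in times if t is not None]
    return sum(crossed) / len(crossed)
```

Because each trajectory is independent, the per-trajectory loop maps directly onto one GPU thread, which is the kind of parallelization a CUDA implementation exploits.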
- research-article, June 2017
Fast segmented sort on GPUs
ICS '17: Proceedings of the International Conference on Supercomputing, Article No.: 12, Pages 1–10, https://doi.org/10.1145/3079079.3079105
Segmented sort, as a generalization of classical sort, orders a batch of independent segments in a whole array. Along with the wider adoption of manycore processors for HPC and big data applications, segmented sort plays an increasingly important role ...
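Segmented sort, as described in the entry above, sorts each independent segment of one flat array. A minimal serial reference sketch (not the paper's GPU kernels; the CSR-style offsets convention for segment boundaries is an assumption):

```python
def segmented_sort(values, offsets):
    """Sort each segment of `values` in place and return the list.

    `offsets` holds CSR-style segment boundaries: segment i spans
    values[offsets[i]:offsets[i+1]]. Segments are independent, which is
    what GPU implementations exploit by assigning segments (or warps per
    segment) to parallel workers.
    """
    for start, end in zip(offsets, offsets[1:]):
        values[start:end] = sorted(values[start:end])
    return values

# Three segments: [3, 1, 2], [9, 7], [5]
print(segmented_sort([3, 1, 2, 9, 7, 5], [0, 3, 5, 6]))
# → [1, 2, 3, 7, 9, 5]
```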
- research-article, May 2017
GPU-UniCache: Automatic Code Generation of Spatial Blocking for Stencils on GPUs
CF'17: Proceedings of the Computing Frontiers Conference, Pages 107–116, https://doi.org/10.1145/3075564.3075583
Spatial blocking is a critical memory-access optimization to efficiently exploit the computing resources of parallel processors, such as many-core GPUs. By reusing cache-loaded data over multiple spatial iterations, spatial blocking can significantly ...
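The idea behind the entry above, reusing staged data across neighboring stencil points, can be illustrated with a serial sketch. This is a hypothetical 1D 3-point stencil in plain Python, not code generated by GPU-UniCache; the tile width is an arbitrary assumption standing in for shared-memory capacity.

```python
def stencil_blocked(inp, tile=4):
    """3-point average stencil computed tile by tile.

    Each tile stages its interior points plus one halo point on each side
    (on a GPU this staging would target shared memory or registers), then
    every output in the tile reuses those staged values instead of
    re-reading global memory.
    """
    n = len(inp)
    out = [0.0] * n                              # boundary points stay 0.0
    for start in range(1, n - 1, tile):
        end = min(start + tile, n - 1)
        staged = inp[start - 1:end + 1]          # tile + halo, staged once
        for i in range(start, end):
            j = i - (start - 1)                  # index into the staged tile
            out[i] = (staged[j - 1] + staged[j] + staged[j + 1]) / 3.0
    return out
```

Each interior input element is staged once per tile but read three times, which is the data reuse that spatial blocking converts from global-memory traffic into on-chip accesses.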
- article, October 2016
Towards implementation of residual-feedback GMDH neural network on parallel GPU memory guided by a regression curve
The Journal of Supercomputing (JSCO), Volume 72, Issue 10, Pages 3993–4020, https://doi.org/10.1007/s11227-016-1740-9
GMDH, which stands for Group Method of Data Handling, is an evolutionary type of neural network. It has received much attention in the supercomputing research community because of its ability to optimize its internal structure for maximum prediction ...
- article, October 2016
GPU-enabled back-propagation artificial neural network for digit recognition in parallel
The Journal of Supercomputing (JSCO), Volume 72, Issue 10, Pages 3868–3886, https://doi.org/10.1007/s11227-016-1633-y
In this paper, we show that the GPU (graphics processing unit) can be used not only for processing graphics, but also for high-speed computing. We provide a comparison between the times taken on the CPU and GPU to perform the training and testing of a ...
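The training loop being timed in the entry above, forward pass, backward pass, and weight update, is identical on CPU and GPU; only the multiply-accumulate arithmetic is offloaded. A minimal pure-Python sketch of one-hidden-layer backpropagation (illustrative only, not the paper's digit-recognition network; the layer sizes, learning rate, and XOR toy task are assumptions):

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_xor(epochs=5000, lr=0.5, hidden=4, seed=0):
    """Train a one-hidden-layer network on XOR with plain backpropagation.

    Returns (predict, mse): the trained predictor and its final mean
    squared error on the four training points.
    """
    rng = random.Random(seed)
    data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
            ([1.0, 0.0], 1.0), ([1.1 - 0.1, 1.0], 0.0)]
    w1 = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [sigmoid(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j])
             for j in range(hidden)]
        o = sigmoid(sum(w2[j] * h[j] for j in range(hidden)) + b2)
        return h, o

    for _ in range(epochs):
        for x, y in data:
            h, o = forward(x)                      # forward pass
            d_o = (o - y) * o * (1.0 - o)          # output-layer delta
            d_h = [d_o * w2[j] * h[j] * (1.0 - h[j]) for j in range(hidden)]
            for j in range(hidden):                # weight update
                w2[j] -= lr * d_o * h[j]
                b1[j] -= lr * d_h[j]
                w1[j][0] -= lr * d_h[j] * x[0]
                w1[j][1] -= lr * d_h[j] * x[1]
            b2 -= lr * d_o

    predict = lambda x: forward(x)[1]
    mse = sum((predict(x) - y) ** 2 for x, y in data) / len(data)
    return predict, mse
```

The per-sample loops over weights are exactly the operations that become batched matrix products on a GPU, which is where the reported CPU-versus-GPU timing gap comes from for larger networks.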
- research-article, September 2015
Collision detection of convex polyhedra on the NVIDIA GPU architecture for the discrete element method
Applied Mathematics and Computation (APMC), Volume 267, Issue C, Pages 810–829, https://doi.org/10.1016/j.amc.2014.10.013
Convex polyhedra represent granular media well. This geometric representation may be critical in obtaining realistic simulations of many industrial processes using the discrete element method (DEM). However, detecting collisions between the polyhedra and ...
- research-article, July 2015
A Performance Model for GPUs with Caches
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 26, Issue 7, Pages 1800–1813, https://doi.org/10.1109/TPDS.2014.2333526
To exploit the abundant computational power of the world's fastest supercomputers, an even workload distribution to the typically heterogeneous compute devices is necessary. While relatively accurate performance models exist for conventional CPUs, ...
- research-article, August 2014
Galactica: A GPU Parallelized Database Accelerator
BigDataScience '14: Proceedings of the 2014 International Conference on Big Data Science and Computing, Article No.: 29, Pages 1–2, https://doi.org/10.1145/2640087.2644180
The amount of business data generated and collected is increasing exponentially every year. There are studies pointing out that using GPU as a general-purpose computing device has limitations. In order to exploit current GPU computing capabilities for ...
- research-article, August 2014
Galactica: A GPU Parallelized Database Accelerator
BigDataScience '14: Proceedings of the 2014 International Conference on Big Data Science and Computing, Article No.: 10, Pages 1–4, https://doi.org/10.1145/2640087.2644166
The amount of business data generated and collected is increasing exponentially every year. A Graphics Processing Unit (GPU) is used not only for optimizing image filtering and video processing, but is also widely adopted for accelerating big data ...