- short-paper, July 2024
Performance of Molecular Dynamics Acceleration Strategies on Composable Cyberinfrastructure
- Richard Lawrence,
- Dhruva K. Chakravorty,
- Lisa M. Perez,
- Honggao Liu,
- Zhenhua He,
- Wesley Brashear,
- Joshua Winchell,
- James X. Mao,
- Chun-Yaung Lu
PEARC '24: Practice and Experience in Advanced Research Computing 2024: Human Powered Computing, Article No.: 46, Pages 1–5, https://doi.org/10.1145/3626203.3670631
Modern powerful accelerators and composable infrastructures put our simulation frameworks to the test. We will show that the acceleration of a simulation framework is absolutely critical for good performance and scaling. Building on our previous work ...
- research-article, November 2023
Moment Representation of Regularized Lattice Boltzmann Methods on NVIDIA and AMD GPUs
SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Pages 1697–1704, https://doi.org/10.1145/3624062.3624250
The lattice Boltzmann method is a highly scalable Navier-Stokes solver that has been applied to flow problems in a wide array of domains. However, the method is bandwidth-bound on modern GPU accelerators and has a large memory footprint. In this paper, ...
- research-article, November 2023
GPUscout: Locating Data Movement-related Bottlenecks on GPUs
SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Pages 1392–1402, https://doi.org/10.1145/3624062.3624208
GPUs pose an attractive opportunity for delivering high-performance applications. However, GPU codes are often limited due to memory contention, resulting in overall performance degradation. Since GPU scheduling is transparent to the user, and GPU ...
- research-article, November 2023
Many Cores, Many Models: GPU Programming Model vs. Vendor Compatibility Overview
SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Pages 1019–1026, https://doi.org/10.1145/3624062.3624178
In recent history, GPUs became a key driver of compute performance in HPC. With the installation of the Frontier supercomputer, they became the enablers of the Exascale era; further large-scale installations are in progress (Aurora, El Capitan, ...
- research-article, July 2022
Ginkgo—A math library designed for platform portability
Parallel Computing (PACO), Volume 111, Issue C, https://doi.org/10.1016/j.parco.2022.102902
Abstract: In an era of increasing computer system diversity, the portability of software from one system to another plays a central role. Software portability is important for software developers, as many software projects have a lifetime ...
Highlights: We discuss the Ginkgo design separating the numerical core from the architecture-specific backends written in the architecture-specific language to allow for ...
- Article, June 2022
Identifying, Evaluating, and Addressing Nondeterminism in Mask R-CNNs
Pattern Recognition and Artificial Intelligence, Pages 3–14, https://doi.org/10.1007/978-3-031-09037-0_1
Abstract: Convolutional Neural Networks, and many other machine learning algorithms, use Graphics Processing Units (GPUs) instead of Central Processing Units (CPUs) to improve the training time of very large modeling computations. This work evaluates the ...
- research-article, November 2021
In-depth analyses of unified virtual memory system for GPU accelerated computing
SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Article No.: 64, Pages 1–15, https://doi.org/10.1145/3458817.3480855
The abstraction of a shared memory space over separate CPU and GPU memory domains has eased the burden of portability for many HPC codebases. However, users pay for the ease of use provided by systems-managed memory space with a moderate-to-high ...
- Article, November 2020
Performance Assessment of OpenMP Compilers Targeting NVIDIA V100 GPUs
- Joshua Hoke Davis,
- Christopher Daley,
- Swaroop Pophale,
- Thomas Huber,
- Sunita Chandrasekaran,
- Nicholas J. Wright
Accelerator Programming Using Directives, Pages 25–44, https://doi.org/10.1007/978-3-030-74224-9_2
Abstract: Heterogeneous systems are becoming increasingly prevalent. In order to exploit the rich compute resources of such systems, robust programming models are needed for application developers to seamlessly migrate legacy code from today’s systems to ...
- research-article, November 2020
GPU lifetimes on titan supercomputer: survival analysis and reliability
- George Ostrouchov,
- Don Maxwell,
- Rizwan A. Ashraf,
- Christian Engelmann,
- Mallikarjun Shankar,
- James H. Rogers
SC '20: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Article No.: 41, Pages 1–14
The Cray XK7 Titan was the top supercomputer system in the world for a long time and remained critically important throughout its nearly seven-year life. It was an interesting machine from a reliability viewpoint, as most of its power came from 18,688 ...
- Article, September 2020
A Case Study of Porting HPGMG from CUDA to OpenMP Target Offload
OpenMP: Portable Multi-Level Parallelism on Modern Systems, Pages 37–51, https://doi.org/10.1007/978-3-030-58144-2_3
Abstract: The HPGMG benchmark is a non-trivial Multigrid benchmark used to evaluate system performance. We ported this benchmark from CUDA to OpenMP target offload and added the capability to use explicit data management rather than managed memory. Our ...
- Article, June 2020
Sparse Linear Algebra on AMD and NVIDIA GPUs – The Race Is On
High Performance Computing, Pages 309–327, https://doi.org/10.1007/978-3-030-50743-5_16
Abstract: Efficiently processing sparse matrices is a central and performance-critical part of many scientific simulation codes. Recognizing the adoption of manycore accelerators in HPC, we evaluate in this paper the performance of the currently best ...
- research-article, May 2018
Stochastic first passage time accelerated with CUDA
Journal of Computational Physics (JOCP), Volume 361, Issue C, Pages 136–149, https://doi.org/10.1016/j.jcp.2018.01.039
Highlights: Parallelization with GPU is proposed for fast computation of first passage time.
The time to pass a threshold, estimated by numerical integration of stochastic trajectories, is an interesting physical quantity, for instance in Josephson junctions and atomic force microscopy, where the full trajectory is not ...
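The quantity in the entry above, the mean first passage time across a threshold, is naturally estimated by averaging many independent stochastic trajectories. A minimal serial Euler-Maruyama sketch (illustrative only, not the paper's CUDA implementation; the drift, noise strength, and threshold values are assumptions):

```python
import random

def first_passage_time(threshold=1.0, drift=0.05, sigma=0.3, dt=0.01,
                       max_steps=100_000, rng=random):
    """Integrate dx = drift*dt + sigma*dW until x crosses the threshold.

    Returns the crossing time, or None if it never crosses within max_steps.
    """
    x, t = 0.0, 0.0
    for _ in range(max_steps):
        x += drift * dt + sigma * rng.gauss(0.0, 1.0) * dt ** 0.5
        t += dt
        if x >= threshold:
            return t
    return None

def mean_first_passage_time(n_trajectories=200, seed=42, **kwargs):
    """Monte Carlo estimate: average the crossing time over many trajectories."""
    rng = random.Random(seed)
    times = [first_passage_time(rng=rng, **kwargs) for _ in range(n_trajectories)]
    crossed = [t for t in times if t is not None]
    return sum(crossed) / len(crossed)
```

Because each trajectory is independent, the per-trajectory loop maps directly onto one GPU thread, which is the kind of parallelization a CUDA implementation exploits.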
- research-article, June 2017
Fast segmented sort on GPUs
ICS '17: Proceedings of the International Conference on Supercomputing, Article No.: 12, Pages 1–10, https://doi.org/10.1145/3079079.3079105
Segmented sort, as a generalization of classical sort, orders a batch of independent segments in a whole array. Along with the wider adoption of manycore processors for HPC and big data applications, segmented sort plays an increasingly important role ...
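Segmented sort, as described in the entry above, sorts each independent segment of one flat array. A minimal serial reference sketch (not the paper's GPU kernels; the CSR-style offsets convention for segment boundaries is an assumption):

```python
def segmented_sort(values, offsets):
    """Sort each segment of `values` in place and return the list.

    `offsets` holds CSR-style segment boundaries: segment i spans
    values[offsets[i]:offsets[i+1]]. Segments are independent, which is
    what GPU implementations exploit by assigning segments (or warps per
    segment) to parallel workers.
    """
    for start, end in zip(offsets, offsets[1:]):
        values[start:end] = sorted(values[start:end])
    return values

# Three segments: [3, 1, 2], [9, 7], [5]
print(segmented_sort([3, 1, 2, 9, 7, 5], [0, 3, 5, 6]))
# → [1, 2, 3, 7, 9, 5]
```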
- research-article, May 2017
GPU-UniCache: Automatic Code Generation of Spatial Blocking for Stencils on GPUs
CF'17: Proceedings of the Computing Frontiers Conference, Pages 107–116, https://doi.org/10.1145/3075564.3075583
Spatial blocking is a critical memory-access optimization to efficiently exploit the computing resources of parallel processors, such as many-core GPUs. By reusing cache-loaded data over multiple spatial iterations, spatial blocking can significantly ...
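The idea behind the entry above, reusing staged data across neighboring stencil points, can be illustrated with a serial sketch. This is a hypothetical 1D 3-point stencil in plain Python, not code generated by GPU-UniCache; the tile width is an arbitrary assumption standing in for shared-memory capacity.

```python
def stencil_blocked(inp, tile=4):
    """3-point average stencil computed tile by tile.

    Each tile stages its interior points plus one halo point on each side
    (on a GPU this staging would target shared memory or registers), then
    every output in the tile reuses those staged values instead of
    re-reading global memory.
    """
    n = len(inp)
    out = [0.0] * n                              # boundary points stay 0.0
    for start in range(1, n - 1, tile):
        end = min(start + tile, n - 1)
        staged = inp[start - 1:end + 1]          # tile + halo, staged once
        for i in range(start, end):
            j = i - (start - 1)                  # index into the staged tile
            out[i] = (staged[j - 1] + staged[j] + staged[j + 1]) / 3.0
    return out
```

Each interior input element is staged once per tile but read three times, which is the data reuse that spatial blocking converts from global-memory traffic into on-chip accesses.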
- article, October 2016
Towards implementation of residual-feedback GMDH neural network on parallel GPU memory guided by a regression curve
The Journal of Supercomputing (JSCO), Volume 72, Issue 10, Pages 3993–4020, https://doi.org/10.1007/s11227-016-1740-9
GMDH, which stands for Group Method of Data Handling, is an evolutionary type of neural network. It has received much attention in the supercomputing research community because of its ability to optimize its internal structure for maximum prediction ...
- article, October 2016
GPU-enabled back-propagation artificial neural network for digit recognition in parallel
The Journal of Supercomputing (JSCO), Volume 72, Issue 10, Pages 3868–3886, https://doi.org/10.1007/s11227-016-1633-y
In this paper, we show that the GPU (graphics processing unit) can be used not only for processing graphics, but also for high-speed computing. We provide a comparison between the times taken on the CPU and GPU to perform the training and testing of a ...
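The training loop being timed in the entry above, forward pass, backward pass, and weight update, is identical on CPU and GPU; only the multiply-accumulate arithmetic is offloaded. A minimal pure-Python sketch of one-hidden-layer backpropagation (illustrative only, not the paper's digit-recognition network; the layer sizes, learning rate, and XOR toy task are assumptions):

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_xor(epochs=5000, lr=0.5, hidden=4, seed=0):
    """Train a one-hidden-layer network on XOR with plain backpropagation.

    Returns (predict, mse): the trained predictor and its final mean
    squared error on the four training points.
    """
    rng = random.Random(seed)
    data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
            ([1.0, 0.0], 1.0), ([1.1 - 0.1, 1.0], 0.0)]
    w1 = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [sigmoid(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j])
             for j in range(hidden)]
        o = sigmoid(sum(w2[j] * h[j] for j in range(hidden)) + b2)
        return h, o

    for _ in range(epochs):
        for x, y in data:
            h, o = forward(x)                      # forward pass
            d_o = (o - y) * o * (1.0 - o)          # output-layer delta
            d_h = [d_o * w2[j] * h[j] * (1.0 - h[j]) for j in range(hidden)]
            for j in range(hidden):                # weight update
                w2[j] -= lr * d_o * h[j]
                b1[j] -= lr * d_h[j]
                w1[j][0] -= lr * d_h[j] * x[0]
                w1[j][1] -= lr * d_h[j] * x[1]
            b2 -= lr * d_o

    predict = lambda x: forward(x)[1]
    mse = sum((predict(x) - y) ** 2 for x, y in data) / len(data)
    return predict, mse
```

The per-sample loops over weights are exactly the operations that become batched matrix products on a GPU, which is where the reported CPU-versus-GPU timing gap comes from for larger networks.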
- research-article, September 2015
Collision detection of convex polyhedra on the NVIDIA GPU architecture for the discrete element method
Applied Mathematics and Computation (APMC), Volume 267, Issue C, Pages 810–829, https://doi.org/10.1016/j.amc.2014.10.013
Convex polyhedra represent granular media well. This geometric representation may be critical in obtaining realistic simulations of many industrial processes using the discrete element method (DEM). However, detecting collisions between the polyhedra and ...
- research-article, July 2015
A Performance Model for GPUs with Caches
IEEE Transactions on Parallel and Distributed Systems (TPDS), Volume 26, Issue 7, Pages 1800–1813, https://doi.org/10.1109/TPDS.2014.2333526
To exploit the abundant computational power of the world's fastest supercomputers, an even workload distribution to the typically heterogeneous compute devices is necessary. While relatively accurate performance models exist for conventional CPUs, ...
- research-article, August 2014
Galactica: A GPU Parallelized Database Accelerator
BigDataScience '14: Proceedings of the 2014 International Conference on Big Data Science and Computing, Article No.: 29, Pages 1–2, https://doi.org/10.1145/2640087.2644180
The amount of business data generated and collected is increasing exponentially every year. There are studies pointing out that using GPU as a general-purpose computing device has limitations. In order to exploit current GPU computing capabilities for ...
- research-article, August 2014
Galactica: A GPU Parallelized Database Accelerator
BigDataScience '14: Proceedings of the 2014 International Conference on Big Data Science and Computing, Article No.: 10, Pages 1–4, https://doi.org/10.1145/2640087.2644166
The amount of business data generated and collected is increasing exponentially every year. A Graphics Processing Unit (GPU) is used not only for optimizing image filtering and video processing, but is also widely adopted for accelerating big data ...