- Research article, June 2015
A Nested Partitioning Algorithm for Adaptive Meshes on Heterogeneous Clusters
ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing, pages 319–328. https://doi.org/10.1145/2751205.2751246
In the era of the accelerator, load balancing strategies that are well-understood for traditional homogeneous supercomputers must be re-worked in order to address the problem of distributing work across heterogeneous hardware such that neither the CPU ...
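The paper's nested partitioning algorithm is more involved, but the underlying problem is easy to sketch. The following minimal illustration (all names hypothetical, not taken from the paper) splits a range of mesh cells across devices in proportion to each device's measured throughput, so that neither the CPU nor the accelerator is left idle:

```cpp
#include <cstddef>
#include <vector>

struct Partition { std::size_t begin, end; };

// Hypothetical sketch: assign contiguous cell ranges to devices in
// proportion to their measured throughput (cells per second).
std::vector<Partition> partitionByThroughput(std::size_t nCells,
                                             const std::vector<double>& throughput) {
    double total = 0.0;
    for (double t : throughput) total += t;

    std::vector<Partition> parts;
    std::size_t start = 0;
    for (std::size_t i = 0; i < throughput.size(); ++i) {
        // The last device takes the remainder to avoid rounding gaps.
        std::size_t count = (i + 1 == throughput.size())
            ? nCells - start
            : static_cast<std::size_t>(nCells * (throughput[i] / total));
        parts.push_back({start, start + count});
        start += count;
    }
    return parts;
}
```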
- Research article, June 2015
Automatic Energy Efficient Parallelization of Uniform Dependence Computations
ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing, pages 373–382. https://doi.org/10.1145/2751205.2751245
Energy is now a critical concern in all aspects of computing. We address a class of programs that includes the so-called "stencil computations" that have already been optimized for speed. We target the energy expended in dynamic memory accesses, since ...
- Research article, June 2015
PeerWave: Exploiting Wavefront Parallelism on GPUs with Peer-SM Synchronization
ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing, pages 25–35. https://doi.org/10.1145/2751205.2751243
Nested loops with regular iteration dependencies span a large class of applications ranging from string matching to linear system solvers. Wavefront parallelism is a well-known technique to enable concurrent processing of such applications and is widely ...
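Wavefront parallelism itself is straightforward to sketch. Assuming a toy 2D update where each cell depends only on its left and upper neighbors (this shows the classic technique, not PeerWave's peer-SM scheme), all cells on one anti-diagonal are independent and can be processed concurrently:

```cpp
#include <algorithm>
#include <vector>

// Illustrative wavefront traversal: cells with the same anti-diagonal
// index d = i + j have no mutual dependencies, so each diagonal is a
// parallel step; the implicit barrier after the omp for separates
// successive diagonals. Compile with -fopenmp.
void wavefront(std::vector<std::vector<double>>& a, int n) {
    for (int d = 2; d <= 2 * (n - 1); ++d) {   // sweep diagonals in order
        int lo = std::max(1, d - (n - 1));
        int hi = std::min(n - 1, d - 1);
        #pragma omp parallel for
        for (int i = lo; i <= hi; ++i) {
            int j = d - i;
            a[i][j] = 0.5 * (a[i - 1][j] + a[i][j - 1]);  // toy update
        }
    }
}
```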
- Research article, June 2015
Composing Algorithmic Skeletons to Express High-Performance Scientific Applications
ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing, pages 415–424. https://doi.org/10.1145/2751205.2751241
Algorithmic skeletons are high-level representations for parallel programs that hide the underlying parallelism details from program specification. These skeletons are defined in terms of higher-order functions that can be composed to build larger ...
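As a loose illustration of the idea (not the paper's skeleton library), map and reduce skeletons can be written as higher-order functions and composed into a larger computation, with the iteration strategy hidden behind the skeleton:

```cpp
#include <iostream>
#include <numeric>
#include <vector>

// Hypothetical skeletons: the caller specifies only the per-element and
// combining functions; how the traversal is executed stays encapsulated.
template <typename T, typename F>
std::vector<T> mapSkel(const std::vector<T>& in, F f) {
    std::vector<T> out;
    out.reserve(in.size());
    for (const T& x : in) out.push_back(f(x));
    return out;
}

template <typename T, typename F>
T reduceSkel(const std::vector<T>& in, T init, F f) {
    return std::accumulate(in.begin(), in.end(), init, f);
}

int main() {
    std::vector<double> v{1, 2, 3, 4};
    // Composition: sum of squares, expressed as reduce(map(...)).
    double s = reduceSkel(mapSkel(v, [](double x) { return x * x; }),
                          0.0, [](double a, double b) { return a + b; });
    std::cout << s << "\n";  // prints 30
}
```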
- Research article, June 2015
DaCache: Memory Divergence-Aware GPU Cache Management
ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing, pages 89–98. https://doi.org/10.1145/2751205.2751239
The lock-step execution model of GPUs requires a warp to have the data blocks for all its threads before execution. However, there is a lack of salient cache mechanisms that can recognize the need to manage GPU cache blocks at the warp level for ...
- Research article, June 2015
Unique Worker model for OpenMP
ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing, pages 47–56. https://doi.org/10.1145/2751205.2751238
In OpenMP, because of the underlying efficient 'team of workers' model, each worker is given a chunk of tasks (iterations of a parallel-for-loop, or sections in a parallel-sections block), and a barrier construct is used to synchronize the workers (not ...
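For context, the conventional 'team of workers' model the abstract refers to looks roughly like the standard OpenMP loop below (this shows the baseline model, not the paper's Unique Worker variant):

```cpp
#include <cstdio>
#include <omp.h>  // compile with -fopenmp

int main() {
    const int n = 16;
    #pragma omp parallel
    {
        // Each worker in the team receives a chunk of the iterations.
        #pragma omp for schedule(static)
        for (int i = 0; i < n; ++i)
            std::printf("iter %d on worker %d\n", i, omp_get_thread_num());
        // Implicit barrier here: no worker proceeds until all chunks finish.
    }
}
```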
- Research article, June 2015
Criticality-Aware Dynamic Task Scheduling for Heterogeneous Architectures
ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing, pages 329–338. https://doi.org/10.1145/2751205.2751235
Current and future parallel programming models need to be portable and efficient when moving to heterogeneous multi-core systems. OmpSs is a task-based programming model with dependency tracking and dynamic scheduling. This paper describes the OmpSs ...
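OmpSs uses its own in/out clauses, but the dependency-tracking idea can be approximated with standard OpenMP 4.0 `depend` clauses; the minimal sketch below is an illustration of that style, not the paper's scheduler:

```cpp
#include <cstdio>

// The runtime tracks data dependencies between tasks and dynamically
// schedules whichever tasks are ready. Compile with -fopenmp.
int main() {
    int a = 0, b = 0;
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task depend(out: a)
        a = 1;                      // producer task

        #pragma omp task depend(in: a) depend(out: b)
        b = a + 1;                  // runs only after 'a' has been written

        #pragma omp taskwait
        std::printf("b = %d\n", b); // prints 2
    }
}
```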
- Research article, June 2015
STAPL-RTS: An Application Driven Runtime System
ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing, pages 425–434. https://doi.org/10.1145/2751205.2751233
Modern HPC systems are growing in complexity, as they move towards deeper memory hierarchies and increasing use of computational heterogeneity via GPUs or other accelerators. When developing applications for these platforms, programmers are faced with ...
- Research article, June 2015
Fine-Grained Synchronizations and Dataflow Programming on GPUs
ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing, pages 109–118. https://doi.org/10.1145/2751205.2751232
The last decade has witnessed the rapid emergence of many-core platforms, especially graphics processing units (GPUs). With the exponential growth of cores in GPUs, utilizing them efficiently becomes a challenge. The data-parallel programming ...
- Research article, June 2015
Parameterized Diamond Tiling for Stencil Computations with Chapel parallel iterators
- Ian J. Bertolacci,
- Catherine Olschanowsky,
- Ben Harshbarger,
- Bradford L. Chamberlain,
- David G. Wonnacott,
- Michelle Mills Strout
ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing, pages 197–206. https://doi.org/10.1145/2751205.2751226
Stencil computations figure prominently in the core kernels of many scientific computations, such as partial differential equation solvers. Parallel scaling of stencil computations can be significantly improved on multicore processors using advanced ...
- Research article, June 2015
PaCMap: Topology Mapping of Unstructured Communication Patterns onto Non-contiguous Allocations
ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing, pages 37–46. https://doi.org/10.1145/2751205.2751225
In high performance computing (HPC), applications usually have many parallel tasks running on multiple machine nodes. As these tasks intensively communicate with each other, the communication overhead has a significant impact on an application's ...
- Research article, June 2015
Active Access: A Mechanism for High-Performance Distributed Data-Centric Computations
ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing, pages 155–164. https://doi.org/10.1145/2751205.2751219
Remote memory access (RMA) is an emerging high-performance programming model that uses RDMA hardware directly. Yet, accessing remote memories cannot invoke activities at the target, which complicates implementation and limits performance of data-centric ...
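The limitation the abstract describes is visible in plain MPI one-sided communication, sketched below for illustration (run with two ranks): MPI_Put deposits data into remote memory through RDMA-style hardware, but no code executes at the target.

```cpp
#include <cstdio>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Each rank exposes one int through an RMA window.
    int buf = 0;
    MPI_Win win;
    MPI_Win_create(&buf, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    if (rank == 0) {
        int val = 42;
        // Write into rank 1's window; rank 1's CPU is not involved and
        // cannot react to the access (no "activity at the target").
        MPI_Put(&val, 1, MPI_INT, /*target=*/1, /*disp=*/0, 1, MPI_INT, win);
    }
    MPI_Win_fence(0, win);

    if (rank == 1) std::printf("received %d\n", buf);
    MPI_Win_free(&win);
    MPI_Finalize();
}
```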
- Research article, June 2015
Automatic Parallelization of Kernels in Shared-Memory Multi-GPU Nodes
ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing, pages 3–13. https://doi.org/10.1145/2751205.2751218
In this paper we present AMGE, a programming framework and runtime system that transparently decomposes GPU kernels and executes them on multiple GPUs in parallel. AMGE exploits the remote memory access capability in modern GPUs to ensure that data can ...
- Research article, June 2015
FAST: A Fast Stencil Autotuning Framework Based On An Optimal-solution Space Model
ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing, pages 187–196. https://doi.org/10.1145/2751205.2751214
Stencil computations comprise an important class of kernels in many scientific computing applications. As the diversity of both architectures and programming models grows, autotuning is emerging as a critical strategy for achieving portable performance ...
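FAST's contribution is the model that prunes the search; the naive baseline such autotuners improve on is easy to sketch: time the kernel once per candidate tile size and keep the fastest (the kernel callable and candidate list here are hypothetical):

```cpp
#include <chrono>
#include <functional>
#include <vector>

// Exhaustive-search baseline: run the (assumed) tile-parameterized kernel
// for each candidate and return the tile size with the lowest runtime.
int bruteForceTune(const std::vector<int>& candidates,
                   const std::function<void(int)>& kernel) {
    int best = candidates.front();
    double bestMs = 1e30;
    for (int tile : candidates) {
        auto t0 = std::chrono::steady_clock::now();
        kernel(tile);                              // one timed trial run
        auto t1 = std::chrono::steady_clock::now();
        double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
        if (ms < bestMs) { bestMs = ms; best = tile; }
    }
    return best;
}
```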
- Research article, June 2015
SemCache++: Semantics-Aware Caching for Efficient Multi-GPU Offloading
ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing, pages 79–88. https://doi.org/10.1145/2751205.2751210
Offloading computations to multiple GPUs is not an easy task. It requires decomposing data, distributing computations and handling communication manually. GPU drop-in libraries (which require no program rewrite) have made it easy to offload computations ...
- Invited talk, June 2015
Streaming Task Parallelism
ICS '15: Proceedings of the 29th ACM on International Conference on Supercomputing, page 1. https://doi.org/10.1145/2751205.2751208
Stream computing is often associated with regular, data-intensive applications, and more specifically with the family of cyclo-static data-flow models. The term also refers to bulk-synchronous data parallelism on SIMD architectures. Both interpretations ...