DOI: 10.1145/2884045
GPGPU '16: Proceedings of the 9th Annual Workshop on General Purpose Processing using Graphics Processing Unit
ACM 2016 Proceedings
Publisher: Association for Computing Machinery, New York, NY, United States
Conference: PPoPP '16: 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Barcelona, Spain, 12 March 2016
ISBN: 978-1-4503-4195-0
Published: 12 March 2016

Abstract

No abstract available.

SESSION: Algorithm
abstract
Runtime aware architectures

In recent years, the traditional ways of keeping hardware performance growing at the rate predicted by Moore's Law have vanished. When uni-cores were the norm, hardware design was decoupled from the software stack thanks to a well-defined ...

research-article
GPU centric extensions for parallel strongly connected components computation

Finding Strongly Connected Components (SCC) of a directed graph is a fundamental graph problem. Many state-of-the-art sequential algorithms use depth-first search (DFS) to find SCCs. Since DFS is, in general, hard to parallelize, researchers rely ...
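As context for the abstract above, the following is a minimal sequential SCC baseline (Kosaraju's two-pass algorithm) of the kind GPU-oriented work tries to move away from; it is an illustrative sketch, not the paper's method.

```python
from collections import defaultdict

def kosaraju_scc(edges, n):
    """Label the SCCs of a directed graph with vertices 0..n-1.

    Sequential baseline using two DFS passes (Kosaraju); GPU SCC
    algorithms avoid this because DFS is hard to parallelize.
    """
    g, rg = defaultdict(list), defaultdict(list)
    for u, v in edges:
        g[u].append(v)
        rg[v].append(u)  # reversed graph for the second pass

    # Pass 1: order vertices by DFS finish time (iterative DFS).
    seen, order = [False] * n, []
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, iter(g[s]))]
        while stack:
            u, it = stack[-1]
            for v in it:
                if not seen[v]:
                    seen[v] = True
                    stack.append((v, iter(g[v])))
                    break
            else:
                order.append(u)  # u is finished
                stack.pop()

    # Pass 2: flood-fill the reversed graph in reverse finish order;
    # each fill discovers exactly one SCC.
    comp, c = [-1] * n, 0
    for s in reversed(order):
        if comp[s] != -1:
            continue
        comp[s] = c
        stack = [s]
        while stack:
            u = stack.pop()
            for v in rg[u]:
                if comp[v] == -1:
                    comp[v] = c
                    stack.append(v)
        c += 1
    return comp

# Two SCCs: the cycle {0, 1, 2} and the lone vertex {3}.
print(kosaraju_scc([(0, 1), (1, 2), (2, 0), (2, 3)], 4))  # [0, 0, 0, 1]
```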

research-article
General-purpose join algorithms for large graph triangle listing on heterogeneous systems

We investigate applying general-purpose join algorithms to the triangle listing problem on heterogeneous systems that feature a multi-core CPU and multiple GPUs. In particular, we consider an out-of-core context where graph data are available on ...
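To illustrate the join-based formulation the abstract refers to, here is a small CPU sketch (not the paper's heterogeneous implementation): edges E(u, v) are hash-joined with E(v, w) on the shared vertex, and each resulting wedge is closed by a membership check against E.

```python
def triangles_by_join(edges):
    """List triangles u < v < w via a self-join over the edge table."""
    # Orient each undirected edge from lower to higher vertex id,
    # so every triangle is listed exactly once.
    E = {(min(u, v), max(u, v)) for u, v in edges}
    # Hash-join build side: index oriented edges by source vertex.
    by_src = {}
    for u, v in E:
        by_src.setdefault(u, []).append(v)
    out = []
    for u, v in E:                   # probe side: each edge (u, v)
        for w in by_src.get(v, ()):  # join E(u, v) with E(v, w)
            if (u, w) in E:          # closing-edge check (semi-join)
                out.append((u, v, w))
    return sorted(out)

# The complete graph K4 contains exactly 4 triangles.
print(triangles_by_join([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]))
```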

SESSION: Heterogeneous languages, extensions and runtimes
research-article
Performance portable GPU code generation for matrix multiplication

Parallel accelerators such as GPUs are notoriously hard to program; exploiting their full performance potential is a job best left for ninja programmers. High-level programming languages coupled with optimizing compilers have been proposed to attempt to ...

research-article
Multi-stage programming for GPUs in C++ using PACXX

Writing and optimizing programs for high performance on systems with Graphics Processing Units (GPUs) remains a challenging task even for expert programmers. A promising optimization technique is multi-stage programming -- evaluating parts of the ...
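The core idea of multi-stage programming can be sketched in a few lines of Python (the paper itself works in C++ with PACXX): values known at the first stage are evaluated early and baked into generated second-stage code as constants.

```python
def stage_saxpy(a):
    """First stage: 'a' is known now, so embed it as a constant
    in generated second-stage code.

    A Python stand-in for multi-stage programming, not PACXX's API.
    """
    src = (f"def saxpy(x, y):\n"
           f"    return [{a} * xi + yi for xi, yi in zip(x, y)]")
    ns = {}
    exec(src, ns)      # compile the specialized second stage
    return ns["saxpy"]

saxpy2 = stage_saxpy(2.0)  # specialize once for a = 2.0
print(saxpy2([1.0, 2.0], [10.0, 20.0]))  # [12.0, 24.0]
```

A GPU compiler can exploit such staged constants (e.g. loop bounds, scaling factors) for optimizations that are impossible when the values arrive only at kernel launch.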

research-article
Simplifying programming and load balancing of data parallel applications on heterogeneous systems

Heterogeneous architectures have experienced a great development thanks to their excellent cost/performance ratio and low power consumption. But heterogeneity significantly complicates both programming and efficient use of the resources. As a result, ...
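A toy example of the load-balancing problem the abstract raises: statically partitioning work items across devices in proportion to their measured throughput. The numbers and the proportional policy are illustrative assumptions, not the paper's scheduler.

```python
def split_work(n_items, throughputs):
    """Partition n_items among devices proportionally to throughput.

    Illustrative static load balancing: faster devices get
    proportionally more work; rounding leftovers go to device 0.
    """
    total = sum(throughputs)
    shares = [n_items * t // total for t in throughputs]
    shares[0] += n_items - sum(shares)  # hand the remainder to device 0
    return shares

# Hypothetical CPU:GPU throughput ratio of 1:3.
print(split_work(1000, [1, 3]))  # [250, 750]
```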

SESSION: Tasking and scheduling
abstract
Working together to build the heterogeneous processing ecosystem

We can now say that almost all future performance improvements will come from heterogeneous acceleration. But the reality of building successful software and platforms is that no one company or individual can create everything. That means we need to ...

research-article
Implementing directed acyclic graphs with the heterogeneous system architecture

Achieving optimal performance on heterogeneous computing systems requires a programming model that supports the execution of asynchronous, multi-stream, and out-of-order tasks in a shared memory environment. Asynchronous dependency-driven tasking is one ...
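The dependency-driven tasking model described above can be sketched with Kahn's algorithm: a task becomes ready only once all of its predecessors have finished. This is a minimal host-side illustration, not the HSA runtime's actual API.

```python
from collections import deque

def run_dag(tasks, deps):
    """Execute a dict of named tasks in dependency order.

    deps is a list of (before, after) pairs; returns the
    completion order. Sketch of dependency-driven tasking.
    """
    indeg = {t: 0 for t in tasks}
    succ = {t: [] for t in tasks}
    for before, after in deps:
        indeg[after] += 1
        succ[before].append(after)
    ready = deque(t for t in tasks if indeg[t] == 0)
    finished = []
    while ready:
        t = ready.popleft()
        tasks[t]()              # run the task body
        finished.append(t)
        for s in succ[t]:       # retiring t signals its dependents
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return finished

# Diamond DAG: a -> {b, c} -> d.
log = []
tasks = {n: (lambda n=n: log.append(n)) for n in "abcd"}
order = run_dag(tasks, [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")])
print(order)  # ['a', 'b', 'c', 'd']
```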

research-article
GPUpIO: the case for I/O-driven preemption on GPUs

As GPUs become general purpose, they are outgrowing the coprocessor model and require convenient I/O abstractions such as files and network sockets. Recent studies have shown the benefits of native GPU I/O layers, in terms of both programmability and ...

SESSION: Stencil optimization
research-article
A systems perspective on GPU computing: a tribute to Karsten Schwan

Over a distinguished career, Regents Professor Karsten Schwan has made significant contributions across a diverse array of topics in computer systems, including operating systems for multi-core platforms, virtualization technologies, enterprise ...

research-article
Public Access
Designing high performance communication runtime for GPU managed memory: early experiences

Graphics Processing Units (GPUs) have gained the position of a mainstream accelerator due to their low power footprint and massive parallelism. Starting with CUDA 6.0, NVIDIA has introduced the Managed Memory capability, which unifies the host and device memory ...

research-article
Public Access
Effective resource management for enhancing performance of 2D and 3D stencils on GPUs

GPUs are an attractive target for the data-parallel stencil computations prevalent in scientific computing and image processing applications. Many tiling schemes, such as overlapped tiling and split tiling, have been proposed in the past to improve the ...
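For reference, here is the kind of computation the abstract is about: one 5-point Jacobi sweep over a 2D grid, with a simple tiled loop nest. The tile size and CPU formulation are illustrative; the paper's tiling schemes concern GPU data movement, not this arithmetic.

```python
def jacobi_step(grid):
    """One 5-point Jacobi stencil sweep over the interior of a 2D grid.

    CPU reference sketch; interior points are traversed tile by tile,
    mimicking how a GPU maps tiles to thread blocks.
    """
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # boundary values are preserved
    T = 2                           # tile edge length (illustrative)
    for ti in range(1, n - 1, T):   # iterate over tiles
        for tj in range(1, m - 1, T):
            for i in range(ti, min(ti + T, n - 1)):
                for j in range(tj, min(tj + T, m - 1)):
                    out[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] +
                                        grid[i][j-1] + grid[i][j+1])
    return out

g = [[0.0] * 4 for _ in range(4)]
g[0] = [4.0] * 4                # hot top boundary
print(jacobi_step(g)[1][1])     # 0.25 * (4 + 0 + 0 + 0) = 1.0
```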

Contributors
  • Northeastern University
  • University of Delaware


Acceptance Rates

GPGPU '16 Paper Acceptance Rate: 9 of 23 submissions, 39%
Overall Acceptance Rate: 57 of 129 submissions, 44%

Year        Submitted  Accepted  Rate
GPGPU '20          12         7   58%
GPGPU '19          15         6   40%
GPGPU-10           15         8   53%
GPGPU '16          23         9   39%
GPGPU-7            27        12   44%
GPGPU-6            37        15   41%
Overall           129        57   44%