Runtime aware architectures
In recent years, the traditional ways of keeping hardware performance growing at the rate predicted by Moore's Law have vanished. When uni-cores were the norm, hardware design was decoupled from the software stack thanks to a well defined ...
GPU centric extensions for parallel strongly connected components computation
Finding Strongly Connected Components (SCC) of a directed graph is a fundamental graph problem. Many of the state-of-the-art sequential algorithms use depth-first search (DFS) to find SCCs. Since DFS is, in general, hard to parallelize, researchers rely ...
General-purpose join algorithms for large graph triangle listing on heterogeneous systems
We investigate applying general-purpose join algorithms to the triangle listing problem on heterogeneous systems that feature a multi-core CPU and multiple GPUs. In particular, we consider an out-of-core context where graph data are available on ...
Performance portable GPU code generation for matrix multiplication
Parallel accelerators such as GPUs are notoriously hard to program; exploiting their full performance potential is a job best left for ninja programmers. High-level programming languages coupled with optimizing compilers have been proposed to attempt to ...
Multi-stage programming for GPUs in C++ using PACXX
Writing and optimizing programs for high performance on systems with Graphics Processing Units (GPUs) remains a challenging task even for expert programmers. A promising optimization technique is multi-stage programming -- evaluating parts of the ...
Simplifying programming and load balancing of data parallel applications on heterogeneous systems
Heterogeneous architectures have developed rapidly thanks to their excellent cost/performance ratio and low power consumption. But heterogeneity significantly complicates both programming and efficient use of the resources. As a result, ...
Working together to build the heterogeneous processing ecosystem
We can now say that almost all future performance improvements will come from heterogeneous acceleration. But the reality of building successful software and platforms is that no one company or individual can create everything. That means we need to ...
Implementing directed acyclic graphs with the heterogeneous system architecture
Achieving optimal performance on heterogeneous computing systems requires a programming model that supports the execution of asynchronous, multi-stream, and out-of-order tasks in a shared memory environment. Asynchronous dependency-driven tasking is one ...
GPUpIO: the case for I/O-driven preemption on GPUs
As GPUs become general purpose, they are outgrowing the coprocessor model and require convenient I/O abstractions such as files and network sockets. Recent studies have shown the benefits of native GPU I/O layers, in terms of both programmability and ...
A systems perspective on GPU computing: a tribute to Karsten Schwan
Over a distinguished career, Regents Professor Karsten Schwan has made significant contributions across a diverse array of topics in computer systems, including operating systems for multi-core platforms, virtualization technologies, enterprise ...
Designing high performance communication runtime for GPU managed memory: early experiences
Graphics Processing Units (GPUs) have become mainstream accelerators due to their low power footprint and massive parallelism. Starting with CUDA 6.0, NVIDIA introduced the Managed Memory capability, which unifies the host and device memory ...
Effective resource management for enhancing performance of 2D and 3D stencils on GPUs
GPUs are an attractive target for the data-parallel stencil computations prevalent in scientific computing and image processing applications. Many tiling schemes, such as overlapped tiling and split tiling, have been proposed in the past to improve the ...
Index Terms
- Proceedings of the 9th Annual Workshop on General Purpose Processing using Graphics Processing Unit