Using Task-Based Parallelism Directly on the GPU for Automated Asynchronous Data Transfer

Chalk, Aidan B.G.; Gonnet, Pedro; Schaller, Matthieu

doi:10.3233/978-1-61499-621-7-683

Abstract

We present a framework, based on the QuickSched [1] library, that implements priority-aware task-based parallelism directly on CUDA GPUs. This allows large computations with complex data dependencies to be executed in a single GPU kernel call, removing any synchronization points that might otherwise be required between kernel calls. Using this paradigm, data transfers to and from the GPU are modelled as load and unload tasks. These tasks are automatically generated and executed alongside the rest of the computational tasks, allowing fully asynchronous and concurrent data transfers. We implemented a tiled-QR decomposition, and a Barnes-Hut gravity calculation, both of which show significant improvement when utilising the task-based setup, effectively eliminating any latencies due to data transfers between the GPU and the CPU. This shows that task-based parallelism is a valid alternative programming paradigm on GPUs, and can provide significant gains from both a data transfer and ease-of-use perspective.

Contact

IOS Press Copyright 2025

Contact

IOS Press Copyright 2025

This website uses cookies

This website uses cookies