- research-article, November 2013
Load-balanced pipeline parallelism
SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, Article No.: 14, Pages 1–12
https://doi.org/10.1145/2503210.2503295
Accelerating a single thread in current parallel systems remains a challenging problem, because sequential threads do not naturally take advantage of the additional cores. Recent work shows that automatic extraction of pipeline parallelism is an ...
- research-article, November 2013
Compiling affine loop nests for distributed-memory parallel architectures
SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, Article No.: 33, Pages 1–12
https://doi.org/10.1145/2503210.2503289
We present new techniques for compilation of arbitrarily nested loops with affine dependences for distributed-memory parallel architectures. Our framework is implemented as a source-level transformer that uses the polyhedral model, and generates ...
- research-article, November 2013
A large-scale cross-architecture evaluation of thread-coarsening
SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, Article No.: 11, Pages 1–11
https://doi.org/10.1145/2503210.2503268
OpenCL has become the de-facto data parallel programming model for parallel devices in today's high-performance supercomputers. OpenCL was designed with the goal of guaranteeing program portability across hardware from different vendors. However, ...
- research-article, November 2013
Deterministic scale-free pipeline parallelism with hyperqueues
SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, Article No.: 32, Pages 1–12
https://doi.org/10.1145/2503210.2503233
Ubiquitous parallel computing aims to make parallel programming accessible to a wide variety of programming areas using deterministic and scale-free programming models built on a task abstraction. However, it remains hard to reconcile these attributes ...
- research-article, November 2013
General transformations for GPU execution of tree traversals
SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, Article No.: 10, Pages 1–12
https://doi.org/10.1145/2503210.2503223
With the advent of programmer-friendly GPU computing environments, there has been much interest in offloading workloads that can exploit the high degree of parallelism available on modern GPUs. Exploiting this parallelism and optimizing for the GPU ...