Compilers

5 Results for: Book/Issue: SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

research-article
November 2013
Load-balanced pipeline parallelism
SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, Article No.: 14, Pages 1–12, https://doi.org/10.1145/2503210.2503295

Accelerating a single thread in current parallel systems remains a challenging problem, because sequential threads do not naturally take advantage of the additional cores. Recent work shows that automatic extraction of pipeline parallelism is an ...
Metrics
Total Citations: 8
Total Downloads: 497
Last 12 Months: 21
Last 6 weeks: 4

research-article
November 2013
Compiling affine loop nests for distributed-memory parallel architectures
- Uday Bondhugula
SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, Article No.: 33, Pages 1–12, https://doi.org/10.1145/2503210.2503289

We present new techniques for compilation of arbitrarily nested loops with affine dependences for distributed-memory parallel architectures. Our framework is implemented as a source-level transformer that uses the polyhedral model, and generates ...
Metrics
Total Citations: 41
Total Downloads: 418
Last 12 Months: 29
Last 6 weeks: 4

research-article
November 2013
A large-scale cross-architecture evaluation of thread-coarsening
SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, Article No.: 11, Pages 1–11, https://doi.org/10.1145/2503210.2503268

OpenCL has become the de-facto data parallel programming model for parallel devices in today's high-performance supercomputers. OpenCL was designed with the goal of guaranteeing program portability across hardware from different vendors. However, ...
Metrics
Total Citations: 74
Total Downloads: 695
Last 12 Months: 18
Last 6 weeks: 3

research-article
November 2013
Deterministic scale-free pipeline parallelism with hyperqueues
SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, Article No.: 32, Pages 1–12, https://doi.org/10.1145/2503210.2503233

Ubiquitous parallel computing aims to make parallel programming accessible to a wide variety of programming areas using deterministic and scale-free programming models built on a task abstraction. However, it remains hard to reconcile these attributes ...
Metrics
Total Citations: 6
Total Downloads: 251
Last 12 Months: 8
Last 6 weeks: 0

research-article
November 2013
General transformations for GPU execution of tree traversals
SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, Article No.: 10, Pages 1–12, https://doi.org/10.1145/2503210.2503223

With the advent of programmer-friendly GPU computing environments, there has been much interest in offloading workloads that can exploit the high degree of parallelism available on modern GPUs. Exploiting this parallelism and optimizing for the GPU ...
Metrics
Total Citations: 43
Total Downloads: 535
Last 12 Months: 24
Last 6 weeks: 2
