General programming languages

research-article

Automatic analysis of speedup of MPI applications

ICS '08: Proceedings of the 22nd annual international conference on Supercomputing, Pages 349–358, https://doi.org/10.1145/1375527.1375578

The intricacy of high-performance computing applications has grown very fast in recent years. Only skilled analysts are able to determine the factors that are undermining the performance of up-to-date applications. Analyst time is a very ...

research-article

Efficient computation of sum-products on GPUs through software-managed cache

ICS '08: Proceedings of the 22nd annual international conference on Supercomputing, Pages 309–318, https://doi.org/10.1145/1375527.1375572

We present a technique for designing memory-bound algorithms with high data reuse on Graphics Processing Units (GPUs) equipped with close-to-ALU software-managed memory. The approach is based on the efficient use of this memory through the ...

research-article

Phasers: a unified deadlock-free construct for collective and point-to-point synchronization

ICS '08: Proceedings of the 22nd annual international conference on Supercomputing, Pages 277–288, https://doi.org/10.1145/1375527.1375568

Coordination and synchronization of parallel tasks are a major source of complexity in parallel programming. These constructs take many forms in practice, including mutual exclusion in accesses to shared resources, termination detection of child tasks, ...
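The abstract describes phasers as one construct for both collective and point-to-point synchronization. As a rough illustration of the collective side only, the Python sketch below implements a hypothetical `SimplePhaser`: a reusable barrier whose set of registered tasks may change between phases, loosely modeled on `java.util.concurrent.Phaser`. It is an assumption-laden sketch, not the paper's construct, and it does not implement point-to-point signal/wait modes.

```python
import threading


class SimplePhaser:
    """Hypothetical, simplified phaser: a reusable barrier with dynamic
    registration. Only the collective (all-tasks) mode is modeled here;
    point-to-point signal/wait modes are deliberately omitted."""

    def __init__(self, parties: int = 0) -> None:
        self._cond = threading.Condition()
        self._parties = parties   # number of currently registered tasks
        self._arrived = 0         # tasks that have arrived in this phase
        self._phase = 0           # current phase number

    def register(self) -> int:
        """Add one task to the phaser; returns the current phase."""
        with self._cond:
            self._parties += 1
            return self._phase

    def arrive_and_deregister(self) -> None:
        """Leave the phaser; may release tasks already waiting."""
        with self._cond:
            self._parties -= 1
            self._maybe_advance()

    def arrive_and_await(self) -> int:
        """Signal arrival, then block until every registered task arrives."""
        with self._cond:
            phase = self._phase
            self._arrived += 1
            self._maybe_advance()
            while self._phase == phase:
                self._cond.wait()
            return self._phase

    def _maybe_advance(self) -> None:
        # Caller must hold the lock. Advance once all parties have arrived.
        if self._arrived >= self._parties:
            self._arrived = 0
            self._phase += 1
            self._cond.notify_all()
```

Because tasks can register or deregister between phases, the same object serves as a barrier for a changing group of tasks, which is the property that distinguishes a phaser from a fixed-size barrier.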

research-article

Performance portable optimizations for loops containing communication operations

ICS '08: Proceedings of the 22nd annual international conference on Supercomputing, Pages 266–276, https://doi.org/10.1145/1375527.1375567

Effective use of communication networks is critical to the performance and scalability of parallel applications. Partitioned Global Address Space languages like UPC bring the promise of performance and programmer productivity. Studies of well-tuned ...

research-article

Three-dimensional delaunay refinement for multi-core processors

ICS '08: Proceedings of the 22nd annual international conference on Supercomputing, Pages 214–224, https://doi.org/10.1145/1375527.1375560

We develop the first-ever fully functional, three-dimensional, guaranteed-quality, parallel graded Delaunay mesh generator. First, we prove a criterion and a sufficient condition for the Delaunay-independence of Steiner points in three dimensions. Based on ...

research-article

Fast scan algorithms on graphics processors

ICS '08: Proceedings of the 22nd annual international conference on Supercomputing, Pages 205–213, https://doi.org/10.1145/1375527.1375559

Scan and segmented scan are important data-parallel primitives for a wide range of applications. We present fast, work-efficient algorithms for these primitives on graphics processing units (GPUs). We use novel data representations that map well to the ...
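For readers unfamiliar with the two primitives, the Python sketch below shows a classic work-efficient exclusive scan (the Blelloch up-sweep/down-sweep scheme) and a segmented scan expressed as an ordinary scan over (flag, value) pairs. These are textbook formulations chosen for illustration; they are not the paper's GPU algorithms or data representations. Each inner `for` loop corresponds to one parallel step on a GPU but runs sequentially here, and the function names are hypothetical.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def blelloch_exclusive_scan(xs: Sequence[int]) -> List[int]:
    """Work-efficient exclusive prefix sum: O(n) additions in O(log n) steps."""
    n = len(xs)
    if n == 0:
        return []
    size = 1
    while size < n:          # pad to a power of two for the balanced tree
        size *= 2
    a = list(xs) + [0] * (size - n)

    # Up-sweep (reduce): each node accumulates the sum of its subtree.
    d = 1
    while d < size:
        for i in range(2 * d - 1, size, 2 * d):
            a[i] += a[i - d]
        d *= 2

    # Down-sweep: clear the root, then push prefixes back down the tree.
    a[size - 1] = 0
    d = size // 2
    while d >= 1:
        for i in range(2 * d - 1, size, 2 * d):
            left = a[i - d]
            a[i - d] = a[i]
            a[i] += left
        d //= 2
    return a[:n]


def seg_op(a: Tuple[int, int], b: Tuple[int, int]) -> Tuple[int, int]:
    """Associative operator on (flag, value): restart the sum at a flag."""
    fa, va = a
    fb, vb = b
    return (fa or fb, vb if fb else va + vb)


def segmented_inclusive_scan(xs: Sequence[int], flags: Sequence[int]) -> List[int]:
    """Inclusive sum that restarts wherever flags[i] is nonzero."""
    return [v for _, v in accumulate(zip(flags, xs), seg_op)]
```

For example, `blelloch_exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3])` returns `[0, 3, 4, 11, 11, 15, 16, 22]`, and `segmented_inclusive_scan([1, 2, 3, 4, 5, 6], [1, 0, 1, 0, 0, 1])` returns `[1, 3, 3, 7, 12, 6]`. Because `seg_op` is associative, any parallel scan scheme (including the tree above) can evaluate the segmented version.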

research-article

Advanced collective communication in aspen

ICS '08: Proceedings of the 22nd annual international conference on Supercomputing, Pages 83–93, https://doi.org/10.1145/1375527.1375543

Aspen is a programming language that relies on high-level messaging to support communication among different program tasks executing in parallel. Unlike MPI, the computational logic of Aspen tasks is specified and developed independently of the global ...

research-article

Preserving time in large-scale communication traces

ICS '08: Proceedings of the 22nd annual international conference on Supercomputing, Pages 46–55, https://doi.org/10.1145/1375527.1375537

Analyzing the performance of large-scale scientific applications is becoming increasingly difficult due to the sheer size of performance data gathered. Recent work on scalable communication tracing applies online interprocess compression to address this ...

research-article

Implementing Wilson-Dirac operator on the cell broadband engine

ICS '08: Proceedings of the 22nd annual international conference on Supercomputing, Pages 4–14, https://doi.org/10.1145/1375527.1375532

Computing the actions of the Wilson-Dirac operator accounts for most of the CPU time in the grand challenge problem of simulating Lattice Quantum Chromodynamics (Lattice QCD). This routine poses many implementation challenges on most computational ...

keynote

Many-core GPU computing with NVIDIA CUDA

Mark Harris

ICS '08: Proceedings of the 22nd annual international conference on Supercomputing, Page 1, https://doi.org/10.1145/1375527.1375528

In the past, graphics processors were special-purpose hardwired application accelerators, suitable only for conventional graphics applications. Modern GPUs are fully programmable, massively parallel floating point processors. In this talk I will ...
