- research-article, October 2021
Pointer-Based Divergence Analysis for OpenCL 2.0 Programs
ACM Transactions on Parallel Computing (TOPC), Volume 8, Issue 4, Article No.: 20, Pages 1–23, https://doi.org/10.1145/3470644
A modern GPU is designed with many large thread groups to achieve a high throughput and performance. Within these groups, the threads are grouped into fixed-size SIMD batches in which the same instruction is applied to vectors of data in a lockstep. This ...
- research-article, September 2020
Automated Bug Detection for High-level Synthesis of Multi-threaded Irregular Applications
ACM Transactions on Parallel Computing (TOPC), Volume 7, Issue 4, Article No.: 27, Pages 1–26, https://doi.org/10.1145/3418086
Field Programmable Gate Arrays (FPGAs) are becoming an appealing technology in datacenters and High Performance Computing. High-Level Synthesis (HLS) of multi-threaded parallel programs is increasingly used to extract parallelism. Despite great leaps ...
- research-article, September 2020
A Modern Fortran Interface in OpenSHMEM Need for Interoperability with Parallel Fortran Using Coarrays
ACM Transactions on Parallel Computing (TOPC), Volume 7, Issue 4, Article No.: 24, Pages 1–25, https://doi.org/10.1145/3418084
Languages and libraries based on Partitioned Global Address Space (PGAS) programming models are convenient for exploiting scalable parallelism on large applications across different domains with irregular memory access patterns. OpenSHMEM is a PGAS-...
- research-article, December 2019
Extracting SIMD Parallelism from Recursive Task-Parallel Programs
ACM Transactions on Parallel Computing (TOPC), Volume 6, Issue 4, Article No.: 24, Pages 1–37, https://doi.org/10.1145/3365663
The pursuit of computational efficiency has led to the proliferation of throughput-oriented hardware, from GPUs to increasingly wide vector units on commodity processors and accelerators. This hardware is designed to execute data-parallel computations ...
- research-article, December 2019
Processor-Oblivious Record and Replay
ACM Transactions on Parallel Computing (TOPC), Volume 6, Issue 4, Article No.: 20, Pages 1–28, https://doi.org/10.1145/3365659
Record-and-replay systems are useful tools for debugging non-deterministic parallel programs by first recording an execution and then replaying that execution to produce the same access pattern. Existing record-and-replay systems generally target thread-...
- research-article, December 2019
Tapir: Embedding Recursive Fork-join Parallelism into LLVM’s Intermediate Representation
ACM Transactions on Parallel Computing (TOPC), Volume 6, Issue 4, Article No.: 19, Pages 1–33, https://doi.org/10.1145/3365655
Tapir (pronounced TAY-per) is a compiler intermediate representation (IR) that embeds recursive fork-join parallelism, as supported by task-parallel programming platforms such as Cilk and OpenMP, into a mainstream compiler’s IR. Mainstream compilers ...
- research-article, November 2019
Hyperqueues: Design and Implementation of Deterministic Concurrent Queues
ACM Transactions on Parallel Computing (TOPC), Volume 6, Issue 4, Article No.: 23, Pages 1–35, https://doi.org/10.1145/3365660
The hyperqueue is a programming abstraction for queues that results in deterministic and scale-free parallel programs. Hyperqueues extend the concept of Cilk++ hyperobjects to provide thread-local views on a shared data structure. While hyperobjects are ...
- research-article, January 2019
An Autotuning Protocol to Rapidly Build Autotuners
ACM Transactions on Parallel Computing (TOPC), Volume 5, Issue 2, Article No.: 9, Pages 1–25, https://doi.org/10.1145/3291527
Automatic performance tuning (Autotuning) is an increasingly critical tuning technique for the high portable performance of Exascale applications. However, constructing an autotuner from scratch remains a challenge, even for domain experts. In this work,...
- research-article, December 2018
New High Performance GPGPU Code Transformation Framework Applied to Large Production Weather Prediction Code
ACM Transactions on Parallel Computing (TOPC), Volume 5, Issue 2, Article No.: 7, Pages 1–42, https://doi.org/10.1145/3291523
We introduce “Hybrid Fortran,” a new approach that allows a high-performance GPGPU port for structured grid Fortran codes. This technique only requires minimal changes for a CPU targeted codebase, which is a significant advancement in terms of ...
- research-article, September 2018
Race Detection in Two Dimensions
ACM Transactions on Parallel Computing (TOPC), Volume 4, Issue 4, Article No.: 19, Pages 1–22, https://doi.org/10.1145/3264618
Dynamic race detection is a program analysis technique for detecting errors caused by undesired interleavings of concurrent tasks. A primary challenge when designing efficient race detection algorithms is to achieve manageable space requirements. State-...
- research-article, April 2018
C-Stream: A Co-routine-Based Elastic Stream Processing Engine
ACM Transactions on Parallel Computing (TOPC), Volume 4, Issue 3, Article No.: 15, Pages 1–27, https://doi.org/10.1145/3184120
Stream processing is a computational paradigm for on-the-fly processing of live data. This paradigm lends itself to implementations that can provide high throughput and low latency by taking advantage of various forms of parallelism that are naturally ...
- editorial, September 2015
- research-article, July 2015
Supporting Time-Based QoS Requirements in Software Transactional Memory
ACM Transactions on Parallel Computing (TOPC), Volume 2, Issue 2, Article No.: 10, Pages 1–30, https://doi.org/10.1145/2779621
Software transactional memory (STM) is an optimistic concurrency control mechanism that simplifies parallel programming. However, there has been little interest in its applicability to reactive applications in which there is a required response time for ...
- research-article, June 2015
Remote Memory Access Programming in MPI-3
ACM Transactions on Parallel Computing (TOPC), Volume 2, Issue 2, Article No.: 9, Pages 1–26, https://doi.org/10.1145/2780584
The Message Passing Interface (MPI) 3.0 standard, introduced in September 2012, includes a significant update to the one-sided communication interface, also known as remote memory access (RMA). In particular, the interface has been extended to better ...
- research-article, February 2015
SciPAL: Expression Templates and Composition Closure Objects for High Performance Computational Physics with CUDA and OpenMP
ACM Transactions on Parallel Computing (TOPC), Volume 1, Issue 2, Article No.: 15, Pages 1–31, https://doi.org/10.1145/2686886
We present SciPAL (scientific parallel algorithms library), a C++-based, hardware-independent open-source library. Its core is a domain-specific embedded language for numerical linear algebra. The main fields of application are finite element ...
- research-article, October 2014
A methodology for automatic generation of executable communication specifications from parallel MPI applications
ACM Transactions on Parallel Computing (TOPC), Volume 1, Issue 1, Article No.: 6, Pages 1–30, https://doi.org/10.1145/2660249
Portable parallel benchmarks are widely used for performance evaluation of HPC systems. However, because these are manually produced, they generally represent a greatly simplified view of application behavior, missing the subtle but important-to-...