Parallel programming languages

Showing 1 - 16 of 16 Results

research-article
October 2021
Pointer-Based Divergence Analysis for OpenCL 2.0 Programs
ACM Transactions on Parallel Computing (TOPC), Volume 8, Issue 4, Article No.: 20, Pages 1–23, https://doi.org/10.1145/3470644
A modern GPU is designed with many large thread groups to achieve a high throughput and performance. Within these groups, the threads are grouped into fixed-size SIMD batches in which the same instruction is applied to vectors of data in a lockstep. This ...
research-article
September 2020
Automated Bug Detection for High-level Synthesis of Multi-threaded Irregular Applications
- Pietro Fezzardi,
- Fabrizio Ferrandi
ACM Transactions on Parallel Computing (TOPC), Volume 7, Issue 4, Article No.: 27, Pages 1–26, https://doi.org/10.1145/3418086

Field Programmable Gate Arrays (FPGAs) are becoming an appealing technology in datacenters and High Performance Computing. High-Level Synthesis (HLS) of multi-threaded parallel programs is increasingly used to extract parallelism. Despite great leaps ...
research-article
September 2020
A Modern Fortran Interface in OpenSHMEM Need for Interoperability with Parallel Fortran Using Coarrays
ACM Transactions on Parallel Computing (TOPC), Volume 7, Issue 4, Article No.: 24, Pages 1–25, https://doi.org/10.1145/3418084

Languages and libraries based on Partitioned Global Address Space (PGAS) programming models are convenient for exploiting scalable parallelism on large applications across different domains with irregular memory access patterns. OpenSHMEM is a PGAS-...
research-article
December 2019
Extracting SIMD Parallelism from Recursive Task-Parallel Programs
ACM Transactions on Parallel Computing (TOPC), Volume 6, Issue 4, Article No.: 24, Pages 1–37, https://doi.org/10.1145/3365663

The pursuit of computational efficiency has led to the proliferation of throughput-oriented hardware, from GPUs to increasingly wide vector units on commodity processors and accelerators. This hardware is designed to execute data-parallel computations ...
research-article
December 2019
Processor-Oblivious Record and Replay
ACM Transactions on Parallel Computing (TOPC), Volume 6, Issue 4, Article No.: 20, Pages 1–28, https://doi.org/10.1145/3365659

Record-and-replay systems are useful tools for debugging non-deterministic parallel programs by first recording an execution and then replaying that execution to produce the same access pattern. Existing record-and-replay systems generally target thread-...
research-article
December 2019
Tapir: Embedding Recursive Fork-join Parallelism into LLVM’s Intermediate Representation
ACM Transactions on Parallel Computing (TOPC), Volume 6, Issue 4, Article No.: 19, Pages 1–33, https://doi.org/10.1145/3365655

Tapir (pronounced TAY-per) is a compiler intermediate representation (IR) that embeds recursive fork-join parallelism, as supported by task-parallel programming platforms such as Cilk and OpenMP, into a mainstream compiler’s IR. Mainstream compilers ...
research-article
November 2019
Hyperqueues: Design and Implementation of Deterministic Concurrent Queues
- Hans Vandierendonck,
- Dimitrios S. Nikolopoulos
ACM Transactions on Parallel Computing (TOPC), Volume 6, Issue 4, Article No.: 23, Pages 1–35, https://doi.org/10.1145/3365660

The hyperqueue is a programming abstraction for queues that results in deterministic and scale-free parallel programs. Hyperqueues extend the concept of Cilk++ hyperobjects to provide thread-local views on a shared data structure. While hyperobjects are ...
research-article
January 2019
An Autotuning Protocol to Rapidly Build Autotuners
ACM Transactions on Parallel Computing (TOPC), Volume 5, Issue 2, Article No.: 9, Pages 1–25, https://doi.org/10.1145/3291527

Automatic performance tuning (Autotuning) is an increasingly critical tuning technique for the high portable performance of Exascale applications. However, constructing an autotuner from scratch remains a challenge, even for domain experts. In this work,...
research-article
December 2018
New High Performance GPGPU Code Transformation Framework Applied to Large Production Weather Prediction Code
- Michel Müller,
- Takayuki Aoki
ACM Transactions on Parallel Computing (TOPC), Volume 5, Issue 2, Article No.: 7, Pages 1–42, https://doi.org/10.1145/3291523

We introduce “Hybrid Fortran,” a new approach that allows a high-performance GPGPU port for structured grid Fortran codes. This technique only requires minimal changes for a CPU targeted codebase, which is a significant advancement in terms of ...
research-article
September 2018
Race Detection in Two Dimensions
ACM Transactions on Parallel Computing (TOPC), Volume 4, Issue 4, Article No.: 19, Pages 1–22, https://doi.org/10.1145/3264618

Dynamic race detection is a program analysis technique for detecting errors caused by undesired interleavings of concurrent tasks. A primary challenge when designing efficient race detection algorithms is to achieve manageable space requirements.

State-...
research-article
April 2018
C-Stream: A Co-routine-Based Elastic Stream Processing Engine
- Semih Şahin,
- Buğra Gedik
ACM Transactions on Parallel Computing (TOPC), Volume 4, Issue 3, Article No.: 15, Pages 1–27, https://doi.org/10.1145/3184120

Stream processing is a computational paradigm for on-the-fly processing of live data. This paradigm lends itself to implementations that can provide high throughput and low latency by taking advantage of various forms of parallelism that are naturally ...
editorial
September 2015
Introduction to the Special Issue on SPAA 2013
- Michael Dinitz,
- Torsten Hoefler
ACM Transactions on Parallel Computing (TOPC), Volume 2, Issue 3, Article No.: 14e, Pages 1–2, https://doi.org/10.1145/2809923
research-article
July 2015
Supporting Time-Based QoS Requirements in Software Transactional Memory
ACM Transactions on Parallel Computing (TOPC), Volume 2, Issue 2, Article No.: 10, Pages 1–30, https://doi.org/10.1145/2779621

Software transactional memory (STM) is an optimistic concurrency control mechanism that simplifies parallel programming. However, there has been little interest in its applicability to reactive applications in which there is a required response time for ...
research-article
June 2015
Remote Memory Access Programming in MPI-3
ACM Transactions on Parallel Computing (TOPC), Volume 2, Issue 2, Article No.: 9, Pages 1–26, https://doi.org/10.1145/2780584

The Message Passing Interface (MPI) 3.0 standard, introduced in September 2012, includes a significant update to the one-sided communication interface, also known as remote memory access (RMA). In particular, the interface has been extended to better ...
research-article
February 2015
SciPAL: Expression Templates and Composition Closure Objects for High Performance Computational Physics with CUDA and OpenMP
- Stephan C. Kramer,
- Johannes Hagemann
ACM Transactions on Parallel Computing (TOPC), Volume 1, Issue 2, Article No.: 15, Pages 1–31, https://doi.org/10.1145/2686886

We present SciPAL (scientific parallel algorithms library), a C++-based, hardware-independent open-source library. Its core is a domain-specific embedded language for numerical linear algebra. The main fields of application are finite element ...
research-article
October 2014
A methodology for automatic generation of executable communication specifications from parallel MPI applications
ACM Transactions on Parallel Computing (TOPC), Volume 1, Issue 1, Article No.: 6, Pages 1–30, https://doi.org/10.1145/2660249

Portable parallel benchmarks are widely used for performance evaluation of HPC systems. However, because these are manually produced, they generally represent a greatly simplified view of application behavior, missing the subtle but important-to-...
