Keyword: OpenMP : Search

research-article

Accelerating Fortran codes: A method for integrating Coarray Fortran with CUDA Fortran and OpenMP

Journal of Parallel and Distributed Computing (JPDC), Volume 195, Issue C, https://doi.org/10.1016/j.jpdc.2024.104977

Abstract

Fortran's prominence in scientific computing requires strategies to ensure both that legacy codes are efficient on high-performance computing systems, and that the language remains attractive for the development of new high-performance codes. ...

Highlights

Intel Coarray Fortran with Nvidia CUDA Fortran and OpenMP allows parallel computing without extensive code redesign.
Coarray Fortran offers comparable performance to the Message Passing Interface (MPI) for distributed memory ...

research-article

INLA$^{+}$: approximate Bayesian inference for non-sparse models using HPC

Statistics and Computing (KLU-STCO), Volume 35, Issue 1, https://doi.org/10.1007/s11222-024-10545-y

Abstract

The integrated nested Laplace approximations (INLA) method has become a widely utilized tool for researchers and practitioners seeking to perform approximate Bayesian inference across various fields of application. To address the growing demand ...

research-article

StarPlat: A versatile DSL for graph analytics

Journal of Parallel and Distributed Computing (JPDC), Volume 194, Issue C, https://doi.org/10.1016/j.jpdc.2024.104967

Abstract

Graphs model several real-world phenomena. With the growth of unstructured and semi-structured data, parallelization of graph algorithms is inevitable. Unfortunately, due to inherent irregularity of computation, memory access, and communication, ...

Highlights

Domain-specific language for graph algorithms.
Targets multiple backends (CPU, GPU, distributed systems).
Performance close to hand-tuned codes.

research-article

Open Access

A NUMA-Aware Version of an Adaptive Self-Scheduling Loop Scheduler

ACM Transactions on Architecture and Code Optimization (TACO), Volume 21, Issue 4, Article No.: 75, Pages 1–22, https://doi.org/10.1145/3680549

Parallelizing code in a shared-memory environment is commonly done utilizing loop scheduling (LS) in a fork-join manner as in OpenMP. This manner of parallelization is popular due to its ease to code, but the choice of the LS method is important when the ...

research-article

Static Generation of Efficient OpenMP Offload Data Mappings

SC '24: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, Article No.: 35, Pages 1–15, https://doi.org/10.1109/SC41406.2024.00041

Increasing heterogeneity in HPC architectures and compiler advancements have led to OpenMP being frequently used to enable computations on heterogeneous devices. However, the efficient movement of data on heterogeneous computing platforms is crucial for ...

research-article

Open Access

(De/Re)-Composition of Data-Parallel Computations via Multi-Dimensional Homomorphisms

Ari Rasch

ACM Transactions on Programming Languages and Systems (TOPLAS), Volume 46, Issue 3, Article No.: 10, Pages 1–74, https://doi.org/10.1145/3665643

Data-parallel computations, such as linear algebra routines and stencil computations, constitute one of the most relevant classes in parallel computing, e.g., due to their importance for deep learning. Efficiently de-composing such computations for the ...

research-article

Parallelized plastic coupling of non-ordinary state-based peridynamics and finite element method

Advances in Engineering Software (ADES), Volume 196, Issue C, https://doi.org/10.1016/j.advengsoft.2024.103718

Abstract

Parallel computing is essential for enhancing computational efficiency and advancing computational mechanics. To reduce the computational cost, peridynamics, a nonlocal numerical method, has been coupled with the finite element method (FEM). ...

Highlights

A plastic coupling of the finite element method and peridynamics is developed.
The plastic coupling model is parallelized using OpenMP.
A framework for the coupled code implementation using OpenMP is described.
The coupling ...

Article

Evaluation of Directive-Based Programming Models for Stencil Computation on Current GPGPU Architectures

Advancing OpenMP for Future Accelerators, Pages 126–140, https://doi.org/10.1007/978-3-031-72567-8_9

Abstract

Stencil calculations are a widely-used computing pattern, and tracking the performance of such computing pattern on modern GPGPUs is of interest to the computational community. In this document we focus on how directive-based programming models ...

Article

CI/CD Efforts for Validation, Verification and Benchmarking OpenMP Implementations

Advancing OpenMP for Future Accelerators, Pages 111–125, https://doi.org/10.1007/978-3-031-72567-8_8

Abstract

Software developers must adapt to keep up with the changing capabilities of platforms so that they can utilize the power of High-Performance Computers (HPC), including exascale systems. OpenMP, a directive-based parallel programming model, allows ...

Article

Survey of OpenMP Practice in General Open Source Software

Advancing OpenMP for Future Accelerators, Pages 97–110, https://doi.org/10.1007/978-3-031-72567-8_7

Abstract

OpenMP, a widely adopted standard for shared memory parallel programming, is known for its simplicity and portability, making it accessible to programmers across various domains, not just HPC experts. This study aims at providing an overview of ...

Article

Multilayer Multipurpose Caches for OpenMP Target Regions on FPGAs

Advancing OpenMP for Future Accelerators, Pages 79–93, https://doi.org/10.1007/978-3-031-72567-8_6

Abstract

Multipurpose caches can improve the throughput between the FPGA’s memory and the hardware that is generated when offloading OpenMP target regions. We discuss and evaluate the weaknesses (and also advantages) of different caching techniques in ...

Article

Towards a Scalable and Efficient PGAS-Based Distributed OpenMP

Advancing OpenMP for Future Accelerators, Pages 64–78, https://doi.org/10.1007/978-3-031-72567-8_5

Abstract

MPI+X has been the de facto standard for distributed memory parallel programming. It is widely used primarily as an explicit two-sided communication model, which often leads to complex and error-prone code. Alternatively, PGAS model utilizes ...

Article

Open Access

Event-Based OpenMP Tasks for Time-Sensitive GPU-Accelerated Systems

Advancing OpenMP for Future Accelerators, Pages 31–45, https://doi.org/10.1007/978-3-031-72567-8_3

Abstract

The throughput-centric design of GPUs poses challenges when integrating them into time-sensitive applications. Nevertheless, modern GPU architectures and software have recently evolved, making it possible to minimize overheads and interference ...

Article

Detrimental Task Execution Patterns in Mainstream OpenMP® Runtimes

Advancing OpenMP for Future Accelerators, Pages 210–224, https://doi.org/10.1007/978-3-031-72567-8_14

Abstract

The OpenMP® API offers both task-based and data-parallel concepts to scientific computing. While it provides descriptive and prescriptive annotations, it is in many places deliberately unspecific how to implement its annotations. As the ...

Article

Automatic Parallelization and OpenMP Offloading of Fortran Array Notation

Advancing OpenMP for Future Accelerators, Pages 197–209, https://doi.org/10.1007/978-3-031-72567-8_13

Abstract

The Fortran programming language is prevalent in the scientific computing community with a wealth of existing software written in it. It is still being developed with the latest standard released in 2023. However, due to its long history, many old ...

Article

Developing an Interactive OpenMP Programming Book with Large Language Models

Advancing OpenMP for Future Accelerators, Pages 176–194, https://doi.org/10.1007/978-3-031-72567-8_12

Abstract

This paper presents an approach to authoring a textbook titled Interactive OpenMP Programming with the assistance of Large Language Models (LLMs). The writing process utilized state-of-the-art LLMs, including Gemini Pro 1.5, Claude 3, and ChatGPT-...

Article

Finding Equivalent OpenMP Fortran and C/C++ Code Snippets Using Large Language Models

Advancing OpenMP for Future Accelerators, Pages 143–160, https://doi.org/10.1007/978-3-031-72567-8_10

Abstract

This paper investigates the feasibility of using Large Language Models (LLMs) to identify semantically equivalent code snippets across different programming languages. Motivated by the need for cross-language translation datasets between OpenMP ...

research-article

An automated OpenMP mutation testing framework for performance optimization

Parallel Computing (PACO), Volume 121, Issue C, https://doi.org/10.1016/j.parco.2024.103097

Abstract

Performance optimization continues to be a challenge in modern HPC software. Existing performance optimization techniques, including profiling-based and auto-tuning techniques, fail to indicate program modifications at the source level thus ...

research-article

Clacc: OpenACC for C/C++ in Clang

International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 38, Issue 5, Pages 427–446, https://doi.org/10.1177/10943420241261976

The Clacc project has developed OpenACC compiler, runtime, and profiling interface support for C/C++ by extending Clang and LLVM. A key Clacc design feature is that it translates OpenACC to OpenMP to leverage the OpenMP offloading support that is ...

Article

OMPGPT: A Generative Pre-trained Transformer Model for OpenMP

Euro-Par 2024: Parallel Processing, Pages 121–134, https://doi.org/10.1007/978-3-031-69577-3_9

Abstract

Large language models (LLMs) such as ChatGPT have significantly advanced the field of Natural Language Processing (NLP). This trend led to the development of code-based large language models such as StarCoder, WizardCoder, and CodeLlama, which are ...
