- research-article, January 2025
Accelerating Fortran codes: A method for integrating Coarray Fortran with CUDA Fortran and OpenMP
Journal of Parallel and Distributed Computing (JPDC), Volume 195, Issue C
https://doi.org/10.1016/j.jpdc.2024.104977
Abstract: Fortran's prominence in scientific computing requires strategies to ensure both that legacy codes are efficient on high-performance computing systems, and that the language remains attractive for the development of new high-performance codes. ...
Highlights
- Intel Coarray Fortran with Nvidia CUDA Fortran and OpenMP allows parallel computing without extensive code redesign.
- Coarray Fortran offers comparable performance to the Message Passing Interface (MPI) for distributed memory ...
- research-article, December 2024
INLA: approximate Bayesian inference for non-sparse models using HPC
- research-article, January 2025
StarPlat: A versatile DSL for graph analytics
Journal of Parallel and Distributed Computing (JPDC), Volume 194, Issue C
https://doi.org/10.1016/j.jpdc.2024.104967
Abstract: Graphs model several real-world phenomena. With the growth of unstructured and semi-structured data, parallelization of graph algorithms is inevitable. Unfortunately, due to inherent irregularity of computation, memory access, and communication, ...
Highlights
- Domain-specific language for graph algorithms.
- Targets multiple backends (CPU, GPU, distributed systems).
- Performance close to hand-tuned codes.
- research-article, November 2024
A NUMA-Aware Version of an Adaptive Self-Scheduling Loop Scheduler
ACM Transactions on Architecture and Code Optimization (TACO), Volume 21, Issue 4, Article No.: 75, Pages 1–22
https://doi.org/10.1145/3680549
Parallelizing code in a shared-memory environment is commonly done utilizing loop scheduling (LS) in a fork-join manner as in OpenMP. This manner of parallelization is popular due to its ease to code, but the choice of the LS method is important when the ...
- research-article, November 2024
Static Generation of Efficient OpenMP Offload Data Mappings
SC '24: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, Article No.: 35, Pages 1–15
https://doi.org/10.1109/SC41406.2024.00041
Increasing heterogeneity in HPC architectures and compiler advancements have led to OpenMP being frequently used to enable computations on heterogeneous devices. However, the efficient movement of data on heterogeneous computing platforms is crucial for ...
- research-article, October 2024
(De/Re)-Composition of Data-Parallel Computations via Multi-Dimensional Homomorphisms
ACM Transactions on Programming Languages and Systems (TOPLAS), Volume 46, Issue 3, Article No.: 10, Pages 1–74
https://doi.org/10.1145/3665643
Data-parallel computations, such as linear algebra routines and stencil computations, constitute one of the most relevant classes in parallel computing, e.g., due to their importance for deep learning. Efficiently de-composing such computations for the ...
- research-article, October 2024
Parallelized plastic coupling of non-ordinary state-based peridynamics and finite element method
Advances in Engineering Software (ADES), Volume 196, Issue C
https://doi.org/10.1016/j.advengsoft.2024.103718
Abstract: Parallel computing is essential for enhancing computational efficiency and advancing computational mechanics. To reduce the computational cost, peridynamics, a nonlocal numerical method, has been coupled with the finite element method (FEM). ...
Highlights
- A plastic coupling of the finite element method and peridynamics is developed.
- The plastic coupling model is parallelized using OpenMP.
- A framework for the coupled code implementation using OpenMP is described.
- The coupling ...
- Article, September 2024
Evaluation of Directive-Based Programming Models for Stencil Computation on Current GPGPU Architectures
Advancing OpenMP for Future Accelerators, Pages 126–140
https://doi.org/10.1007/978-3-031-72567-8_9
Abstract: Stencil calculations are a widely-used computing pattern, and tracking the performance of such computing pattern on modern GPGPUs is of interest to the computational community. In this document we focus on how directive-based programming models ...
- Article, September 2024
CI/CD Efforts for Validation, Verification and Benchmarking OpenMP Implementations
- Aaron Jarmusch,
- Felipe Cabarcas,
- Swaroop Pophale,
- Andrew Kallai,
- Johannes Doerfert,
- Luke Peyralans,
- Seyong Lee,
- Joel Denny,
- Sunita Chandrasekaran
Advancing OpenMP for Future Accelerators, Pages 111–125
https://doi.org/10.1007/978-3-031-72567-8_8
Abstract: Software developers must adapt to keep up with the changing capabilities of platforms so that they can utilize the power of High-Performance Computers (HPC), including exascale systems. OpenMP, a directive-based parallel programming model, allows ...
- Article, September 2024
Multilayer Multipurpose Caches for OpenMP Target Regions on FPGAs
Advancing OpenMP for Future Accelerators, Pages 79–93
https://doi.org/10.1007/978-3-031-72567-8_6
Abstract: Multipurpose caches can improve the throughput between the FPGA’s memory and the hardware that is generated when offloading OpenMP target regions. We discuss and evaluate the weaknesses (and also advantages) of different cacheing techniques in ...
- Article, September 2024
Detrimental Task Execution Patterns in Mainstream OpenMP® Runtimes
Advancing OpenMP for Future Accelerators, Pages 210–224
https://doi.org/10.1007/978-3-031-72567-8_14
Abstract: The OpenMP® API offers both task-based and data-parallel concepts to scientific computing. While it provides descriptive and prescriptive annotations, it is in many places deliberately unspecific how to implement its annotations. As the ...
- Article, September 2024
Automatic Parallelization and OpenMP Offloading of Fortran Array Notation
Advancing OpenMP for Future Accelerators, Pages 197–209
https://doi.org/10.1007/978-3-031-72567-8_13
Abstract: The Fortran programming language is prevalent in the scientific computing community with a wealth of existing software written in it. It is still being developed with the latest standard released in 2023. However, due to its long history, many old ...
- Article, September 2024
Developing an Interactive OpenMP Programming Book with Large Language Models
Advancing OpenMP for Future Accelerators, Pages 176–194
https://doi.org/10.1007/978-3-031-72567-8_12
Abstract: This paper presents an approach to authoring a textbook titled Interactive OpenMP Programming with the assistance of Large Language Models (LLMs). The writing process utilized state-of-the-art LLMs, including Gemini Pro 1.5, Claude 3, and ChatGPT-...
- Article, September 2024
Finding Equivalent OpenMP Fortran and C/C++ Code Snippets Using Large Language Models
Advancing OpenMP for Future Accelerators, Pages 143–160
https://doi.org/10.1007/978-3-031-72567-8_10
Abstract: This paper investigates the feasibility of using Large Language Models (LLMs) to identify semantically equivalent code snippets across different programming languages. Motivated by the need for cross-language translation datasets between OpenMP ...
- research-article, November 2024
An automated OpenMP mutation testing framework for performance optimization
Parallel Computing (PACO), Volume 121, Issue C
https://doi.org/10.1016/j.parco.2024.103097
Abstract: Performance optimization continues to be a challenge in modern HPC software. Existing performance optimization techniques, including profiling-based and auto-tuning techniques, fail to indicate program modifications at the source level thus ...
- research-article, October 2024
Clacc: OpenACC for C/C++ in Clang
- Joel E Denny,
- Seyong Lee,
- Pedro Valero-Lara,
- Marc Gonzalez-Tallada,
- Keita Teranishi,
- Jeffrey S Vetter,
- Michael Heroux
International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 38, Issue 5, Pages 427–446
https://doi.org/10.1177/10943420241261976
The Clacc project has developed OpenACC compiler, runtime, and profiling interface support for C/C++ by extending Clang and LLVM. A key Clacc design feature is that it translates OpenACC to OpenMP to leverage the OpenMP offloading support that is ...
- Article, August 2024
OMPGPT: A Generative Pre-trained Transformer Model for OpenMP
Euro-Par 2024: Parallel Processing, Pages 121–134
https://doi.org/10.1007/978-3-031-69577-3_9
Abstract: Large language models (LLMs) such as ChatGPT have significantly advanced the field of Natural Language Processing (NLP). This trend led to the development of code-based large language models such as StarCoder, WizardCoder, and CodeLlama, which are ...