Keyword: Code optimization : Search

research-article

Abstractions for C++ code optimizations in parallel high-performance applications

Parallel Computing (PACO), Volume 121, Issue C, https://doi.org/10.1016/j.parco.2024.103096

Abstract

Many computational problems consider memory throughput a performance bottleneck, especially in the domain of parallel computing. Software needs to be attuned to hardware features like cache architectures or concurrent memory banks to reach a ...

Highlights

Proposing novel abstraction for flexible traversals of regular data structures.
Designed for traversal-agnostic algorithms in HPC parallel computing.
Reduces traversal code complexity, improving separation of concerns and ...

research-article

Gas-expensive patterns detection to optimize smart contracts

Applied Soft Computing (APSC), Volume 145, Issue C, https://doi.org/10.1016/j.asoc.2023.110542

Abstract

Smart contracts are programmable protocols that run on Ethereum and require gas to be deployed and used. Gas-expensive operations in some smart contracts can cause users to consume extra gas in transactions. There are already several ...

Highlights

Detection of waste gas problems in smart contract loops.
Introducing pre-...

research-article

Java Vector API: Benchmarking and Performance Analysis

CC 2023: Proceedings of the 32nd ACM SIGPLAN International Conference on Compiler Construction, Pages 1–12, https://doi.org/10.1145/3578360.3580265

The Java Vector API is a new module introduced in Java 16, allowing developers to concisely express vector computations. The API promises both high performance, achieved via the runtime compilation of vector operations to hardware vector instructions, ...

research-article

Open Access

Verifying optimizations of concurrent programs in the promising semantics

PLDI 2022: Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, Pages 903–917, https://doi.org/10.1145/3519939.3523734

Weak memory models for concurrent programming languages are expected to admit standard compiler optimizations. However, prior works on verifying optimizations in weak memory models are mostly focused on simple optimizations on small code snippets ...

research-article

Open Access

Not so fast: understanding and mitigating negative impacts of compiler optimizations on code reuse gadget sets

Proceedings of the ACM on Programming Languages (PACMPL), Volume 5, Issue OOPSLA, Article No.: 154, Pages 1–30, https://doi.org/10.1145/3485531

Despite extensive testing and correctness certification of their functional semantics, a number of compiler optimizations have been shown to violate security guarantees implemented in source code. While prior work has shed light on how such optimizations ...

Article

iCetus: A Semi-automatic Parallel Programming Assistant

Languages and Compilers for Parallel Computing, Pages 18–32, https://doi.org/10.1007/978-3-030-99372-6_2

Abstract

The iCetus tool is a new interactive parallelizer, providing users with a range of capabilities for the source-to-source transformation of C programs using OpenMP directives in shared memory machines. While the tool can parallelize code fully ...

other

Efficient code development for improving execution performance in high-performance computing centers

The Journal of Supercomputing (JSCO), Volume 77, Issue 4, Pages 3261–3288, https://doi.org/10.1007/s11227-020-03382-z

Abstract

Thanks to high-performance computing (HPC), it is possible to solve all kinds of highly complex projects from multiple scientific disciplines that require computationally intensive tasks to be undertaken and which otherwise could not be addressed. ...

research-article

Porting and optimization of solidification application for CPU-MIC hybrid platforms

International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 32, Issue 4, Pages 523–539, https://doi.org/10.1177/1094342016677740

Modern heterogeneous computing platforms have become powerful HPC solutions, which could be applied to a wide range of real-life applications. In particular, the hybrid platforms equipped with Intel Xeon Phi coprocessors offer the advantages of ...

research-article

On optimization of wireless XOR erasure codes

Physical Communication (PHYCOM), Volume 27, Issue C, Pages 74–85, https://doi.org/10.1016/j.phycom.2018.01.007

In this paper we study the problem of optimizing linear erasure codes over GF(2) for single-hop wireless multicasting. We first present an algorithmic optimization technique which minimizes the number of transmissions given the knowledge of packets ...

article

High-level synthesis for FPGAs: code optimization strategies for real-time image processing

Journal of Real-Time Image Processing (SPJRTIP), Volume 14, Issue 3, Pages 701–712

High-level synthesis (HLS) is a potential solution to increase the productivity of FPGA-based real-time image processing development. It allows designers to reap the benefits of hardware implementation directly from the algorithm behaviors specified ...

research-article

A scalable interface-resolved simulation of particle-laden flow using the lattice Boltzmann method

Parallel Computing (PACO), Volume 67, Issue C, Pages 20–37, https://doi.org/10.1016/j.parco.2017.07.005

We examine the scalable implementation of the lattice Boltzmann method (LBM) in the context of interface-resolved direct numerical simulation of wall-bounded turbulent particle-laden flows. Three distinct aspects relevant to performance optimization of ...

article

Panda: A Compiler Framework for Concurrent CPU+GPU Execution of 3D Stencil Computations on GPU-accelerated Supercomputers

International Journal of Parallel Programming (IJPP), Volume 45, Issue 3, Pages 711–729, https://doi.org/10.1007/s10766-016-0454-1

We present a new compiler framework for truly heterogeneous 3D stencil computation on GPU clusters. Our framework consists of a simple directive-based programming model and a tightly integrated source-to-source compiler. Annotated with a small number of ...

article

Improving Code Density with Variable Length Encoding Aware Instruction Scheduling

Journal of Signal Processing Systems (JSPS), Volume 84, Issue 3, Pages 435–446, https://doi.org/10.1007/s11265-015-1081-6

Variable length encoding can considerably decrease code size in VLIW processors by reducing the number of bits wasted on encoding No Operations (NOPs). A processor may have different instruction templates where different execution slots are implicitly ...

research-article

Acceleration of the Geostatistical Software Library (GSLIB) by code optimization and hybrid parallel programming

Computers & Geosciences (CGEO), Volume 85, Issue PA, Pages 210–233, https://doi.org/10.1016/j.cageo.2015.09.016

The Geostatistical Software Library (GSLIB) has been used in the geostatistical community for more than thirty years. It was designed as a bundle of sequential Fortran codes, and today it is still in use by many practitioners and researchers. Despite ...

article

Fully Optimized Code Block Segmentation Algorithm for LTE-Advanced

International Journal of Parallel Programming (IJPP), Volume 43, Issue 6, Pages 988–1003, https://doi.org/10.1007/s10766-014-0324-7

In our previous work, we presented a brief analysis of the performance of the code block segmentation procedure adopted by the 3GPP LTE Advanced (LTE-A) Standard as part of its physical layer channel coding scheme. Here, a detailed analysis of its ...

article

A multiple-ISA reconfigurable architecture

Design Automation for Embedded Systems (DAES), Volume 19, Issue 4, Pages 329–344, https://doi.org/10.1007/s10617-015-9159-8

In these days, every newly added hardware feature must not change the underlying instruction set architecture (ISA), in order to avoid adaptation or recompilation of existing code. Nevertheless, this need for compatibility imposes a great number of ...

research-article

Efficient implementation of Galerkin meshfree methods for large-scale problems with an emphasis on maximum entropy approximants

Computers and Structures (CSTR), Volume 150, Issue C, Pages 52–62, https://doi.org/10.1016/j.compstruc.2014.12.005

We propose a simple method to implement matrix assembly in Galerkin meshfree methods. By looping over groups of quadrature points, performance is significantly improved. We propose a method to efficiently store the maximum entropy basis functions. It ...

research-article

FALCON or how to compute measures time efficiently on dynamically evolving dense complex networks?

Journal of Biomedical Informatics (JOBI), Volume 47, Issue C, Pages 62–70

A new time efficient complex network analysis library for dense networks is proposed. It is suited for the analysis, e.g. of several thousands of distributed brain sources. High performance is achieved by combined hard- and software ...

Article

An Effective Way to Generate Complex Instructions for Media Processors

CSA '13: Proceedings of the 2013 International Conference on Computer Sciences and Applications, Pages 503–507, https://doi.org/10.1109/CSA.2013.122

With the development of digital signal processing (DSP) processors, the design of a retargetable C compiler is necessary. But the C compiler for media processors which is built by porting GCC can't generate effective complex instructions in its way of ...

article

Code reordering using local random extraction and insertion (LREI) operator for GPGPU-based track-before-detect systems

Przemysław Mazurek

Soft Computing - A Fusion of Foundations, Methodologies and Applications (SOFC), Volume 17, Issue 6, Pages 1095–1106, https://doi.org/10.1007/s00500-012-0956-8

Track-before-detect (TBD) algorithms are used for tracking systems, where the object's signal is below the noise floor (low-SNR objects). A lot of computations and memory transfers for real-time signal processing are necessary. GPGPU in parallel ...
