- research-article, September 2024
Abstractions for C++ code optimizations in parallel high-performance applications
Parallel Computing (PACO), Volume 121, Issue C, https://doi.org/10.1016/j.parco.2024.103096
Abstract: Many computational problems consider memory throughput a performance bottleneck, especially in the domain of parallel computing. Software needs to be attuned to hardware features like cache architectures or concurrent memory banks to reach a ...
Highlights:
  - Proposing novel abstraction for flexible traversals of regular data structures.
  - Designed for traversal-agnostic algorithms in HPC parallel computing.
  - Reduces traversal code complexity, improving separation of concerns and ...
- research-article, September 2023
Gas-expensive patterns detection to optimize smart contracts
Applied Soft Computing (APSC), Volume 145, Issue C, https://doi.org/10.1016/j.asoc.2023.110542
Abstract: Smart contracts are programmable protocols that run on Ethereum and require gas to be deployed and used. Gas-expensive operations in some smart contracts can cause users to consume extra gas in transactions. There are already several ...
Highlights:
  - Detection of waste gas problems in smart contract loops.
  - Introducing pre-...
- research-article, February 2023
Java Vector API: Benchmarking and Performance Analysis
CC 2023: Proceedings of the 32nd ACM SIGPLAN International Conference on Compiler Construction, Pages 1–12, https://doi.org/10.1145/3578360.3580265
The Java Vector API is a new module introduced in Java 16, allowing developers to concisely express vector computations. The API promises both high performance, achieved via the runtime compilation of vector operations to hardware vector instructions, ...
Verifying optimizations of concurrent programs in the promising semantics
PLDI 2022: Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, Pages 903–917, https://doi.org/10.1145/3519939.3523734
Weak memory models for concurrent programming languages are expected to admit standard compiler optimizations. However, prior works on verifying optimizations in weak memory models are mostly focused on simple optimizations on small code snippets ...
Not so fast: understanding and mitigating negative impacts of compiler optimizations on code reuse gadget sets
Proceedings of the ACM on Programming Languages (PACMPL), Volume 5, Issue OOPSLA, Article No.: 154, Pages 1–30, https://doi.org/10.1145/3485531
Despite extensive testing and correctness certification of their functional semantics, a number of compiler optimizations have been shown to violate security guarantees implemented in source code. While prior work has shed light on how such optimizations ...
- Article, October 2021
iCetus: A Semi-automatic Parallel Programming Assistant
Languages and Compilers for Parallel Computing, Pages 18–32, https://doi.org/10.1007/978-3-030-99372-6_2
Abstract: The iCetus tool is a new interactive parallelizer, providing users with a range of capabilities for the source-to-source transformation of C programs using OpenMP directives in shared memory machines. While the tool can parallelize code fully ...
- other, April 2021
Efficient code development for improving execution performance in high-performance computing centers
The Journal of Supercomputing (JSCO), Volume 77, Issue 4, Pages 3261–3288, https://doi.org/10.1007/s11227-020-03382-z
Abstract: Thanks to high-performance computing (HPC), it is possible to solve all kinds of highly complex projects from multiple scientific disciplines that require computationally intensive tasks to be undertaken and which otherwise could not be addressed. ...
- research-article, July 2018
Porting and optimization of solidification application for CPU-MIC hybrid platforms
International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 32, Issue 4, Pages 523–539, https://doi.org/10.1177/1094342016677740
Modern heterogeneous computing platforms have become powerful HPC solutions, which could be applied to a wide range of real-life applications. In particular, the hybrid platforms equipped with Intel Xeon Phi coprocessors offer the advantages of ...
- research-article, April 2018
On optimization of wireless XOR erasure codes
Physical Communication (PHYCOM), Volume 27, Issue C, Pages 74–85, https://doi.org/10.1016/j.phycom.2018.01.007
In this paper we study the problem of optimizing linear erasure codes over GF(2) for single-hop wireless multicasting. We first present an algorithmic optimization technique which minimizes the number of transmissions given the knowledge of packets ...
- article, March 2018
High-level synthesis for FPGAs: code optimization strategies for real-time image processing
High-level synthesis (HLS) is a potential solution to increase the productivity of FPGA-based real-time image processing development. It allows designers to reap the benefits of hardware implementation directly from the algorithm behaviors specified ...
- research-article, September 2017
A scalable interface-resolved simulation of particle-laden flow using the lattice Boltzmann method
Parallel Computing (PACO), Volume 67, Issue C, Pages 20–37, https://doi.org/10.1016/j.parco.2017.07.005
We examine the scalable implementation of the lattice Boltzmann method (LBM) in the context of interface-resolved direct numerical simulation of wall-bounded turbulent particle-laden flows. Three distinct aspects relevant to performance optimization of ...
- article, June 2017
Panda: A Compiler Framework for Concurrent CPU+GPU Execution of 3D Stencil Computations on GPU-accelerated Supercomputers
International Journal of Parallel Programming (IJPP), Volume 45, Issue 3, Pages 711–729, https://doi.org/10.1007/s10766-016-0454-1
We present a new compiler framework for truly heterogeneous 3D stencil computation on GPU clusters. Our framework consists of a simple directive-based programming model and a tightly integrated source-to-source compiler. Annotated with a small number of ...
- article, September 2016
Improving Code Density with Variable Length Encoding Aware Instruction Scheduling
Journal of Signal Processing Systems (JSPS), Volume 84, Issue 3, Pages 435–446, https://doi.org/10.1007/s11265-015-1081-6
Variable length encoding can considerably decrease code size in VLIW processors by reducing the number of bits wasted on encoding No Operations (NOPs). A processor may have different instruction templates where different execution slots are implicitly ...
- research-article, December 2015
Acceleration of the Geostatistical Software Library (GSLIB) by code optimization and hybrid parallel programming
Computers & Geosciences (CGEO), Volume 85, Issue PA, Pages 210–233, https://doi.org/10.1016/j.cageo.2015.09.016
The Geostatistical Software Library (GSLIB) has been used in the geostatistical community for more than thirty years. It was designed as a bundle of sequential Fortran codes, and today it is still in use by many practitioners and researchers. Despite ...
- article, December 2015
Fully Optimized Code Block Segmentation Algorithm for LTE-Advanced
International Journal of Parallel Programming (IJPP), Volume 43, Issue 6, Pages 988–1003, https://doi.org/10.1007/s10766-014-0324-7
In our previous work, we presented a brief analysis of the performance of the code block segmentation procedure adopted by the 3GPP LTE Advanced (LTE-A) Standard as part of its physical layer channel coding scheme. Here, a detailed analysis of its ...
- article, December 2015
A multiple-ISA reconfigurable architecture
Design Automation for Embedded Systems (DAES), Volume 19, Issue 4, Pages 329–344, https://doi.org/10.1007/s10617-015-9159-8
In these days, every newly added hardware feature must not change the underlying instruction set architecture (ISA), in order to avoid adaptation or recompilation of existing code. Nevertheless, this need for compatibility imposes a great number of ...
- research-article, April 2015
Efficient implementation of Galerkin meshfree methods for large-scale problems with an emphasis on maximum entropy approximants
Computers and Structures (CSTR), Volume 150, Issue C, Pages 52–62, https://doi.org/10.1016/j.compstruc.2014.12.005
We propose a simple method to implement matrix assembly in Galerkin meshfree methods. By looping over groups of quadrature points, performance is significantly improved. We propose a method to efficiently store the maximum entropy basis functions. It ...
- research-article, February 2014
FALCON or how to compute measures time efficiently on dynamically evolving dense complex networks?
A new time efficient complex network analysis library for dense networks is proposed. It is suited for the analysis, e.g. of several thousands of distributed brain sources. High performance is achieved by combined hard- and software ...
- Article, December 2013
An Effective Way to Generate Complex Instructions for Media Processors
CSA '13: Proceedings of the 2013 International Conference on Computer Sciences and Applications, Pages 503–507, https://doi.org/10.1109/CSA.2013.122
With the development of digital signal processing (DSP) processors, the design of a retargetable C compiler is necessary. But the C compiler for media processors which is built by porting GCC can't generate effective complex instructions in its way of ...
- article, June 2013
Code reordering using local random extraction and insertion (LREI) operator for GPGPU-based track-before-detect systems
Soft Computing - A Fusion of Foundations, Methodologies and Applications (SOFC), Volume 17, Issue 6, Pages 1095–1106, https://doi.org/10.1007/s00500-012-0956-8
Track-before-detect (TBD) algorithms are used for tracking systems, where the object's signal is below the noise floor (low-SNR objects). A lot of computations and memory transfers for real-time signal processing are necessary. GPGPU in parallel ...