Volume 121, Issue C, Sep 2024 (Current Issue)
Regular paper
research-article
WBSP: Addressing stragglers in distributed machine learning with worker-busy synchronous parallel
Abstract

Parameter server is widely used in distributed machine learning to accelerate training. However, the increasing heterogeneity of workers’ computing capabilities leads to the issue of stragglers, making parameter synchronization challenging. To ...

research-article
Multi-GPU 3D k-nearest neighbors computation with application to ICP, point cloud smoothing and normals computation
Abstract

The k-Nearest Neighbors algorithm is a fundamental algorithm that finds applications in many fields like Machine Learning, Computer Graphics, Computer Vision, and others. The algorithm determines the closest points (d-dimensional) of a reference ...

research-article
NxtSPR: A deadlock-free shortest path routing dedicated to relaying for Triplet-Based many-core Architecture
Abstract

Deadlock-free routing is a significant challenge in Network-on-Chip (NoC) design as it affects the network’s latency, power consumption, and load balance, impacting the performance of multi-processor systems-on-chip. However, achieving deadlock-...

Highlights

  • The topology-related characteristics of Triplet-Based many-core Architecture are defined systematically using graph and group theory, and its correctness is verified through formal verification (proof-based) methods.
  • A novel and high-...

research-article
Mobilizing underutilized storage nodes via job path: A job-aware file striping approach
Abstract

Users’ limited understanding of the storage system architecture prevents them from fully utilizing the parallel I/O capability of the storage system, leading to a negative impact on the overall performance of supercomputers. Therefore, exploring ...

Special issue on The 15th International Workshop on Programming Models and Applications for Multicores and Manycores
research-article
Abstractions for C++ code optimizations in parallel high-performance applications
Abstract

Many computational problems consider memory throughput a performance bottleneck, especially in the domain of parallel computing. Software needs to be attuned to hardware features like cache architectures or concurrent memory banks to reach a ...

Highlights

  • Proposing novel abstraction for flexible traversals of regular data structures.
  • Designed for traversal-agnostic algorithms in HPC parallel computing.
  • Reduces traversal code complexity, improving separation of concerns and ...

research-article
An automated OpenMP mutation testing framework for performance optimization
Abstract

Performance optimization continues to be a challenge in modern HPC software. Existing performance optimization techniques, including profiling-based and auto-tuning techniques, fail to indicate program modifications at the source level thus ...
