Understanding Parallelization Tradeoffs for Linear Pipelines
Pipelining techniques execute some loops with cross-iteration dependences in parallel, by partitioning the loop body into a sequence of stages such that the data dependences are not violated. Obtaining good performance for all kinds of loops is ...
Combining PREM compilation and ILP scheduling for high-performance and predictable MPSoC execution
Many applications require both high performance and predictable timing. High performance can be provided by COTS Multi-Core Systems on Chip (MPSoCs); however, as cores in these systems share the memory bandwidth, they are susceptible to interference from ...
An Evaluation of Vectorization and Cache Reuse Tradeoffs on Modern CPUs
Emerging high-performance processor architectures show two key trends: longer vector units and deeper memory hierarchies. It is not always possible to exploit both vectorization and locality. Prior optimization techniques have focused on either ...
Fast and Accurate Performance Analysis of Synchronization
Understanding parallel program bottlenecks is critical to designing more efficient and performant parallel architectures. Synchronization is a prime example of a potential bottleneck, but is a necessary evil when writing parallel programs; we must ...
Supporting Fine-grained Dataflow Parallelism in Big Data Systems
Big data systems scale with the number of cores in a cluster for the parts of an application that can be executed in data parallel fashion. It has been recently reported, however, that these systems fail to translate hardware improvements, such as ...
Reduction to Band Form for the Singular Value Decomposition on Graphics Accelerators
In this paper we show that two-stage algorithms for the singular value decomposition (SVD) significantly benefit from an alternative reduction to an intermediate by-product after the first stage that consists of a band matrix with the same upper and ...
Intra-Task Parallelism in Automotive Real-Time Systems
- Remko van Wagensveld
- Tobias Wägemann
- Niklas Hehenkamp
- Ramin Tavakoli Kolagari
- Ulrich Margull
- Ralph Mader
Many recent Engine Management Systems (EMSs) have multicore processors. This results in new challenges for the developers of those systems, as most of them are not familiar with multicore programming. Additionally, many of the EMSs have real-time ...
Extending ILUPACK with a Task-Parallel Version of BiCG for Dual-GPU Servers
We target the solution of sparse linear systems via iterative Krylov subspace-based methods enhanced with the ILUPACK preconditioner on graphics processing units (GPUs). Concretely, in this work we extend ILUPACK with an implementation of the BiCG ...
VAIL: A Victim-Aware Cache Policy for Improving Lifetime of Hybrid Memory
Nowadays, emerging Non-Volatile Memory (NVM) technologies are being introduced to remedy the shortcomings of the current DRAM-based memory system. However, NVM has limited write endurance, which would severely restrict the performance of the memory system. In order ...
Index Terms
- Proceedings of the 9th International Workshop on Programming Models and Applications for Multicores and Manycores