A compiler infrastructure for embedded heterogeneous MPSoCs
- Weihua Sheng,
- Stefan Schürmans,
- Maximilian Odendahl,
- Mark Bertsch,
- Vitaliy Volevach,
- Rainer Leupers,
- Gerd Ascheid
Programming heterogeneous MPSoCs (Multi-Processor Systems on Chip) is a grand challenge for embedded SoC providers and users today. In this paper, we argue for the need and significance of positioning the language and tool design from the perspective of ...
X10-FT: transparent fault tolerance for APGAS language and runtime
The emergence of multicore machines has made exploiting parallelism a necessity to harness the abundant computing resources in both a single machine and clusters. This, however, may hinder programming productivity, as threaded and distributed ...
Bulk synchronous visualization
Many visual analytics applications require computationally expensive high resolution visualizations. Large desktop displays and display walls may provide the required resolution, and current multi- and many-core processors often have the required ...
The JStar language philosophy
This paper introduces the JStar parallel programming language, which is a Java-based declarative language aimed at discouraging sequential programming, encouraging massively parallel programming, and giving the compiler and runtime maximum freedom to ...
Pyjama: OpenMP-like implementation for Java, with GUI extensions
Incremental parallelism is an uncomplicated and expressive parallelisation practice and has led to wide adoption of OpenMP. However, the OpenMP specification does not define a binding for the Java language, and the OpenMP threading model finds limited ...
A pattern-supported parallelization approach
In the embedded systems domain, a trend towards multi- and many-core processors is evident. For the exploitation of these additional processing elements, parallel software is inevitable. The pattern-supported parallelization approach, which is introduced ...
Parallel time-space processing model based fast N-body simulation on GPUs
The N-body problem simulates the evolution of a system of N bodies, where the force exerted on each body arises from its interaction with all the other bodies in the system. In this paper, we present a novel parallel implementation of N-body ...
A generate-test-aggregate parallel programming library: systematic parallel programming for MapReduce
Generate-Test-Aggregate (GTA for short) is a novel programming model for MapReduce, dramatically simplifying the development of efficient parallel algorithms. Under the GTA model, a parallel computation is encoded into a simple pattern: generate all ...
libEOMP: a portable OpenMP runtime library based on MCA APIs for embedded systems
In recent years, the rapid evolution of Multiprocessor Systems-on-Chip (MPSoCs) has posed new challenges for programming such architectures efficiently. In order to exploit potential hardware concurrency, software developers are still expected to ...
Low power cache architectures with hybrid approach of filtering unnecessary way accesses
Power has been a major issue in processor design for several years. As caches account for an increasing share of CPU die area and power, this paper proposes filtering unnecessary way accesses to reduce the dynamic power consumption of a unified L2 cache shared by ...
Empirical measurement of instruction level parallelism for four generations of ARM CPUs
Parallel computing at all levels is becoming important in all devices, not least in mobile and embedded systems. Many wireless, mobile and deployable devices make use of the ARM CPU and its variants. We report on investigations into measuring ...
CAP: co-scheduling based on asymptotic profiling in CPU+GPU hybrid systems
Hybrid systems with CPUs and GPUs have become the new standard in high performance computing. Workloads are split into two parts and distributed to different devices to utilize both the CPU and the GPU for data parallelism in hybrid systems. But it is challenging ...
Scheduling directives for shared-memory many-core processor systems
We consider many-core processors with task-oriented programming, whereby scheduling constraints among tasks are decided offline, and are then enforced by the runtime system. Here, exposing and beneficially exploiting fine grain data and control ...
Auto-tuning methodology to represent landform attributes on multicore and multi-GPU systems
Auto-tuning techniques have been used in the design of routines in recent years. The goal is to develop routines that automatically adapt to the conditions of the computational system, so that efficient executions are obtained independently ...
Index Terms
- Proceedings of the 2013 International Workshop on Programming Models and Applications for Multicores and Manycores