The influence of embedded systems is constantly growing. Increasingly powerful and versatile devices are developed and put on the market at a fast pace. Their functionality and number of features are increasing, and so are the constraints on the systems concerning size, performance, energy dissipation and timing predictability. To meet all these constraints, multiprocessor systems-on-chip (MPSoCs) are becoming popular in embedded systems. To meet the performance and energy constraints of embedded applications, heterogeneous architectures incorporating functional units optimized for specific functions are commonly employed. This technological trend has dramatic consequences for the parallelization, mapping, compiler and design technology used to develop these systems. The SCOPES workshop focuses on the software generation process for these modern embedded systems. Topics of interest include all aspects of the compilation and mapping process for embedded single-processor and multiprocessor systems.
Proceeding Downloads
Data-layout optimization based on memory-access-pattern analysis for source-code performance improvement
With the rising impact of the memory wall, selecting the adequate data-structure implementation for a given kernel has become a performance-critical issue. This paper presents a new methodology to solve the data-layout decision problem by adapting an ...
Scheduling of moldable fork-join tasks with inter- and intra-task communications
This paper proposes scheduling techniques for moldable fork-join tasks on multicore architectures. The proposed techniques decide the number of cores and execution start time for each task during scheduling and mapping, taking into account inter- ...
On the implementation and execution of adaptive streaming applications modeled as MADF
It has been shown that mode-aware dataflow (MADF) is an advantageous analysis model for adaptive streaming applications. However, no attention has been paid to how to implement and execute an application, modeled and analyzed with the MADF model, on ...
Compiling synchronous languages to optimal move code for exposed datapath architectures
Conventional processor architectures are limited in exploiting instruction level parallelism (ILP). One of the reasons for this limitation is their relatively low number of registers. Thus, recent processor architectures expose their datapaths so that ...
Design space exploration for layer-parallel execution of convolutional neural networks on CGRAs
In this work, we systematically explore the design space of throughput, energy, and hardware costs for layer-parallel mappings of Convolutional Neural Networks (CNNs) onto coarse-grained reconfigurable arrays (CGRAs). We derive an analytical model that ...
Compiler-based WCET prediction performing function specialization
The Worst-Case Execution Time (WCET) is one of the most important criteria of hard real-time systems. Many optimizations have been proposed to improve the WCET of an embedded application at compile time. Moreover, since modern embedded systems must also ...
Programming tensor cores from an image processing DSL
Tensor Cores (TCUs) are specialized units first introduced by NVIDIA in the Volta microarchitecture in order to accelerate matrix multiplications for deep learning and linear algebra workloads. While these units have proved to be capable of providing ...
OpenMP to CUDA graphs: a compiler-based transformation to enhance the programmability of NVIDIA devices
Heterogeneous computing is increasingly being used in a diversity of computing systems, ranging from HPC to the real-time embedded domain, to cope with the performance requirements. Due to the variety of accelerators, e.g., FPGAs, GPUs, the use of high-...
Reviewing inference performance of state-of-the-art deep learning frameworks
Deep learning models have replaced conventional methods for machine learning tasks. Efficient inference on edge devices with limited resources is key for broader deployment. In this work, we focus on the tool selection challenge for inference ...
Analog implementation of arithmetic operations on real memristors
The emerging field of in-memory computing aims to support CPUs by taking over simple calculations that can be done in memory. This leads to less performance drain caused by those simple calculations as well as lower energy consumption for the whole ...
Efficient parallel reduction on GPUs with Hipacc
Hipacc is a domain-specific language that eases the programming of image processing applications on hardware accelerators such as GPUs. It relieves developers of the burden of manually porting algorithms to hardware with the help of domain- and architecture-...
A secure hardware-software solution based on RISC-V, logic locking and microkernel
- Dominik Šišejković,
- Farhad Merchant,
- Lennart M. Reimann,
- Rainer Leupers,
- Massimiliano Giacometti,
- Sascha Kegreiß
In this paper we present the first generation of a secure platform developed by following a security-by-design approach. The security of the platform is built on top of two pillars: a secured hardware design flow and a secure microkernel. The hardware ...
Exploration of GPU sharing policies under GEMM workloads
Lately, cloud computing has seen explosive growth, due to the flexibility and scalability it offers. The ever-increasing computational demands, especially from the machine learning domain, have forced cloud operators to enhance their infrastructure with ...
Configuring loosely time-triggered wireless control software
In many wireless control networks, sensor data and controller data are exchanged periodically, which requires periodic packet transmissions between the physical plant and the controller. As an alternative, event-triggered control paradigms imply that ...
Portable exploitation of parallel and heterogeneous HPC architectures in neural simulation using SkePU
- Sotirios Panagiotou,
- August Ernstsson,
- Johan Ahlqvist,
- Lazaros Papadopoulos,
- Christoph Kessler,
- Dimitrios Soudris
The complexity of modern HPC systems requires the use of new tools that support advanced programming models and offer portability and programmability of parallel and heterogeneous architectures. In this work we evaluate the use of the SkePU framework in an ...
Cross-layer approaches for improving the dependability of deep learning systems
Deep Neural Networks (DNNs) - the state-of-the-art computational models for many Artificial Intelligence (AI) applications - are inherently compute and resource-intensive and, hence, cannot exploit traditional redundancy-based fault mitigation ...
Real-time audio processing for hearing aids using a model-based Bayesian inference framework
Development of hearing aid (HA) signal processing algorithms entails an iterative process between two design steps, namely algorithm development and the embedded implementation. Algorithm designers favor high-level programming languages for several ...
Index Terms
- Proceedings of the 23rd International Workshop on Software and Compilers for Embedded Systems
Acceptance Rates
| Year | Submitted | Accepted | Rate |
|---|---|---|---|
| SCOPES '21 | 15 | 7 | 47% |
| SCOPES '20 | 13 | 8 | 62% |
| SCOPES '17 | 9 | 6 | 67% |
| M-SCOPES '13 | 16 | 9 | 56% |
| SCOPES '09 | 26 | 8 | 31% |
| Overall | 79 | 38 | 48% |