Memory-Aware Functional IR for Higher-Level Synthesis of Accelerators
Specialized accelerators deliver orders of magnitude higher performance than general-purpose processors. The ever-changing nature of modern workloads is pushing the adoption of Field Programmable Gate Arrays (FPGAs) as the substrate of choice. ...
The Forward Slice Core: A High-Performance, Yet Low-Complexity Microarchitecture
Superscalar out-of-order cores deliver high performance at the cost of increased complexity and power budget. In-order cores, in contrast, are less complex and have a smaller power budget, but offer low performance. A processor architecture should ideally ...
Low-power Near-data Instruction Execution Leveraging Opcode-based Timing Analysis
Traditional processor architectures utilize an external DRAM for data storage, while they also operate under worst-case timing constraints. Such designs are heavily constrained by the delay costs of the data transfer between the core pipeline and the DRAM,...
GiantVM: A Novel Distributed Hypervisor for Resource Aggregation with DSM-aware Optimizations
We present GiantVM, an open-source distributed hypervisor that provides many-to-one virtualization to aggregate resources from multiple physical machines. We propose techniques to enable distributed CPU and I/O virtualization and distributed shared ...
Cooperative Slack Management: Saving Energy of Multicore Processors by Trading Performance Slack Between QoS-Constrained Applications
Processor resources can be adapted at runtime according to the dynamic behavior of applications to reduce the energy consumption of multicore processors without affecting the Quality-of-Service (QoS). To achieve this, an online resource management scheme ...
Weaving Synchronous Reactions into the Fabric of SSA-form Compilers
We investigate the programming of reactive systems combining closed-loop control with performance-intensive components such as Machine Learning (ML). Reactive control systems are often safety-critical and associated with real-time execution requirements, ...
Register-Pressure-Aware Instruction Scheduling Using Ant Colony Optimization
This paper describes a new approach to register-pressure-aware instruction scheduling, using Ant Colony Optimization (ACO). ACO is a nature-inspired optimization technique that researchers have successfully applied to NP-hard sequencing problems like the ...
Dependence-aware Slice Execution to Boost MLP in Slice-out-of-order Cores
Exploiting memory-level parallelism (MLP) is crucial to hide long memory and last-level cache access latencies. While out-of-order (OoO) cores, and techniques building on them, are effective at exploiting MLP, they deliver poor energy efficiency due to ...
MetaSys: A Practical Open-source Metadata Management System to Implement and Evaluate Cross-layer Optimizations
- Nandita Vijaykumar,
- Ataberk Olgun,
- Konstantinos Kanellopoulos,
- F. Nisa Bostanci,
- Hasan Hassan,
- Mehrshad Lotfi,
- Phillip B. Gibbons,
- Onur Mutlu
This article introduces the first open-source FPGA-based infrastructure, MetaSys, with a prototype in a RISC-V system, to enable the rapid implementation and evaluation of a wide range of cross-layer techniques in real hardware. Hardware-software ...
ERASE: Energy Efficient Task Mapping and Resource Management for Work Stealing Runtimes
Parallel applications often rely on work stealing schedulers in combination with fine-grained tasking to achieve high performance and scalability. However, reducing the total energy consumption in the context of work stealing runtimes is still challenging,...
Preserving Addressability Upon GC-Triggered Data Movements on Non-Volatile Memory
This article points out an important threat that application-level Garbage Collection (GC) poses to the use of non-volatile memory (NVM). Data movements incurred by GC may invalidate the pointers to objects on NVM and, hence, harm the reusability of ...
A Case For Intra-rack Resource Disaggregation in HPC
- George Michelogiannakis,
- Benjamin Klenk,
- Brandon Cook,
- Min Yee Teh,
- Madeleine Glick,
- Larry Dennison,
- Keren Bergman,
- John Shalf
The expected halt of traditional technology scaling is motivating increased heterogeneity in high-performance computing (HPC) systems with the emergence of numerous specialized accelerators. As heterogeneity increases, so does the risk of underutilizing ...