TACO: Vol 19, No 2

Volume 19, Issue 2, June 2022

Editor:

David Kaeli
Northeastern University, USA

Publisher:

Association for Computing Machinery
New York, NY, United States

ISSN: 1544-3566

EISSN: 1544-3973


research-article

Open Access

Memory-Aware Functional IR for Higher-Level Synthesis of Accelerators

Article No.: 16, Pages 1–26, https://doi.org/10.1145/3501768

Specialized accelerators deliver orders of magnitude higher performance than general-purpose processors. The ever-changing nature of modern workloads is pushing the adoption of Field Programmable Gate Arrays (FPGAs) as the substrate of choice. ...

research-article

Open Access

The Forward Slice Core: A High-Performance, Yet Low-Complexity Microarchitecture

Article No.: 17, Pages 1–25, https://doi.org/10.1145/3499424

Superscalar out-of-order cores deliver high performance at the cost of increased complexity and power budget. In-order cores, in contrast, are less complex and have a smaller power budget, but offer low performance. A processor architecture should ideally ...

research-article

Open Access

MAPPER: Managing Application Performance via Parallel Efficiency Regulation

Article No.: 18, Pages 1–26, https://doi.org/10.1145/3501767

State-of-the-art systems, whether in servers or desktops, provide ample computational and storage resources to allow multiple simultaneously executing, potentially parallel applications. However, performance tends to be unpredictable, being a function of ...

research-article

Open Access

Low-power Near-data Instruction Execution Leveraging Opcode-based Timing Analysis

Article No.: 19, Pages 1–26, https://doi.org/10.1145/3504005

Traditional processor architectures utilize an external DRAM for data storage, while they also operate under worst-case timing constraints. Such designs are heavily constrained by the delay costs of the data transfer between the core pipeline and the DRAM,...

research-article

Open Access

GiantVM: A Novel Distributed Hypervisor for Resource Aggregation with DSM-aware Optimizations

Article No.: 20, Pages 1–27, https://doi.org/10.1145/3505251

We present GiantVM, an open-source distributed hypervisor that provides many-to-one virtualization to aggregate resources from multiple physical machines. We propose techniques to enable distributed CPU and I/O virtualization and distributed shared ...

research-article

Open Access

Cooperative Slack Management: Saving Energy of Multicore Processors by Trading Performance Slack Between QoS-Constrained Applications

Article No.: 21, Pages 1–27, https://doi.org/10.1145/3505559

Processor resources can be adapted at runtime according to the dynamic behavior of applications to reduce the energy consumption of multicore processors without affecting the Quality-of-Service (QoS). To achieve this, an online resource management scheme ...

research-article

Open Access

Weaving Synchronous Reactions into the Fabric of SSA-form Compilers

Article No.: 22, Pages 1–25, https://doi.org/10.1145/3506706

We investigate the programming of reactive systems combining closed-loop control with performance-intensive components such as Machine Learning (ML). Reactive control systems are often safety-critical and associated with real-time execution requirements, ...

research-article

Open Access

Register-Pressure-Aware Instruction Scheduling Using Ant Colony Optimization

Article No.: 23, Pages 1–23, https://doi.org/10.1145/3505558

This paper describes a new approach to register-pressure-aware instruction scheduling, using Ant Colony Optimization (ACO). ACO is a nature-inspired optimization technique that researchers have successfully applied to NP-hard sequencing problems like the ...

research-article

Open Access

MemHC: An Optimized GPU Memory Management Framework for Accelerating Many-body Correlation

Article No.: 24, Pages 1–26, https://doi.org/10.1145/3506705

The many-body correlation function is a fundamental computation kernel in modern physics computing applications, e.g., Hadron Contractions in Lattice quantum chromodynamics (QCD). This kernel is both computation and memory intensive, involving a series of ...

research-article

Open Access

Dependence-aware Slice Execution to Boost MLP in Slice-out-of-order Cores

Article No.: 25, Pages 1–28, https://doi.org/10.1145/3506704

Exploiting memory-level parallelism (MLP) is crucial to hide long memory and last-level cache access latencies. While out-of-order (OoO) cores, and techniques building on them, are effective at exploiting MLP, they deliver poor energy efficiency due to ...

research-article

Open Access

MetaSys: A Practical Open-source Metadata Management System to Implement and Evaluate Cross-layer Optimizations

Article No.: 26, Pages 1–29, https://doi.org/10.1145/3505250

This article introduces the first open-source FPGA-based infrastructure, MetaSys, with a prototype in a RISC-V system, to enable the rapid implementation and evaluation of a wide range of cross-layer techniques in real hardware. Hardware-software ...

research-article

Open Access

ERASE: Energy Efficient Task Mapping and Resource Management for Work Stealing Runtimes

Article No.: 27, Pages 1–29, https://doi.org/10.1145/3510422

Parallel applications often rely on work stealing schedulers in combination with fine-grained tasking to achieve high performance and scalability. However, reducing the total energy consumption in the context of work stealing runtimes is still challenging,...

research-article

Open Access

Preserving Addressability Upon GC-Triggered Data Movements on Non-Volatile Memory

Article No.: 28, Pages 1–26, https://doi.org/10.1145/3511706

This article points out an important threat that application-level Garbage Collection (GC) poses to the use of non-volatile memory (NVM). Data movements incurred by GC may invalidate the pointers to objects on NVM and, hence, harm the reusability of ...

research-article

Open Access

A Case For Intra-rack Resource Disaggregation in HPC

Article No.: 29, Pages 1–26, https://doi.org/10.1145/3514245

The expected halt of traditional technology scaling is motivating increased heterogeneity in high-performance computing (HPC) systems with the emergence of numerous specialized accelerators. As heterogeneity increases, so does the risk of underutilizing ...

ACM Transactions on Architecture and Code Optimization
