- Sponsor: SIGARCH
It is our great pleasure and honor to welcome you to the program of the 49th IEEE/ACM International Symposium on Computer Architecture (ISCA), the flagship conference of the computer architecture community. After two years in a virtual format, the conference will be held in person in New York. An excellent program has been selected for the conference.
NvMR: non-volatile memory renaming for intermittent computing
To make forward progress, intermittent systems on energy-harvesting devices have to back up data frequently because their energy supply is unreliable. These devices come with on-board non-volatile memories such as Flash or FRAM that are used to back up the system ...
Free atomics: hardware atomic operations without fences
Atomic Read-Modify-Write (RMW) instructions are primitive synchronization operations implemented in hardware that provide the building blocks for higher-abstraction synchronization mechanisms to programmers. According to publicly available documentation,...
Securing GPU via region-based bounds checking
Graphics processing units (GPUs) have become essential general-purpose computing platforms to accelerate a wide range of workloads, such as deep learning, scientific, and high-performance computing (HPC) applications. However, recent memory corruption ...
täkō: a polymorphic cache hierarchy for general-purpose optimization of data movement
Current systems hide data movement from software behind the load-store interface. Software's inability to observe and respond to data movement is the root cause of many inefficiencies, including the growing fraction of execution time and energy devoted ...
EQC: ensembled quantum computing for variational quantum algorithms
The variational quantum algorithm (VQA), which comprises a classical optimizer and a parameterized quantum circuit, has emerged as one of the most promising approaches for harvesting the power of quantum computers in the noisy intermediate-scale quantum (...
Axiomatic hardware-software contracts for security
We propose leakage containment models (LCMs), novel axiomatic security contracts that support formal reasoning about the security guarantees of programs when they run on particular microarchitectures. Our core contribution is an axiomatic vocabulary ...
PPMLAC: high performance chipset architecture for secure multi-party computation
Privacy is a main concern restricting data sharing and cross-organization collaboration. While Privacy-Preserving Machine Learning techniques such as Multi-Party Computation (MPC), Homomorphic Encryption, and Federated Learning are proposed to ...
INSPIRE: in-storage private information retrieval via protocol and architecture co-design
Private Information Retrieval (PIR) plays a vital role in secure, database-centric applications. However, existing PIR protocols explore a massive working space containing hundreds of GiBs of query and database data. As a consequence, PIR performance is ...
TDGraph: a topology-driven accelerator for high-performance streaming graph processing
- Jin Zhao,
- Yun Yang,
- Yu Zhang,
- Xiaofei Liao,
- Lin Gu,
- Ligang He,
- Bingsheng He,
- Hai Jin,
- Haikun Liu,
- Xinyu Jiang,
- Hui Yu
Many solutions have been recently proposed to support the processing of streaming graphs. However, for the processing of each graph snapshot of a streaming graph, the new states of the vertices affected by the graph updates are propagated irregularly ...
DIMMining: pruning-efficient and parallel graph mining on near-memory-computing
Graph mining, which finds specific patterns in the graph, is becoming increasingly important in various domains. We point out that accelerating graph mining suffers from the following challenges: (1) Heavy comparison for pruning: Pruning technique is ...
NDMiner: accelerating graph pattern mining using near data processing
- Nishil Talati,
- Haojie Ye,
- Yichen Yang,
- Leul Belayneh,
- Kuan-Yu Chen,
- David Blaauw,
- Trevor Mudge,
- Ronald Dreslinski
Graph Pattern Mining (GPM) algorithms mine structural patterns in graphs. The performance of GPM workloads is bottlenecked by control flow and memory stalls. This is because of data-dependent branches used in set intersection and difference operations ...
SoftVN: efficient memory protection via software-provided version numbers
Trusted execution environments (TEEs) in processors protect off-chip memory (DRAM), and ensure its confidentiality and integrity using memory encryption and integrity verification. However, such memory protection can incur significant performance ...
CraterLake: a hardware accelerator for efficient unbounded computation on encrypted data
- Nikola Samardzic,
- Axel Feldmann,
- Aleksandar Krastev,
- Nathan Manohar,
- Nicholas Genise,
- Srinivas Devadas,
- Karim Eldefrawy,
- Chris Peikert,
- Daniel Sanchez
Fully Homomorphic Encryption (FHE) enables offloading computation to untrusted servers with cryptographic privacy. Despite its attractive security, FHE is not yet widely adopted due to its prohibitive overheads, about 10,000X over unencrypted ...
PS-ORAM: efficient crash consistency support for oblivious RAM on NVM
Oblivious RAM (ORAM) is a provably secure primitive to prevent access pattern leakage on the memory bus. By randomly remapping the data blocks and accessing redundant blocks, ORAM prevents access pattern leakage through obfuscation. Byte-addressable ...
There's always a bigger fish: a clarifying analysis of a machine-learning-assisted side-channel attack
Machine learning has made it possible to mount powerful attacks through side channels that have traditionally been seen as challenging to exploit. However, due to the black-box nature of machine learning models, these attacks are often difficult to ...
Gearbox: a case for supporting accumulation dispatching and hybrid partitioning in PIM-based accelerators
Processing-in-memory (PIM) minimizes data movement overheads by placing processing units near each memory segment. Recent PIMs employ processing units with a SIMD architecture. However, kernels with random accesses, such as sparse-matrix-dense-vector (...
To PIM or not for emerging general purpose processing in DDR memory systems
As Processing-In-Memory (PIM) hardware matures and starts making its way into normal compute platforms, software has an important role to play in determining what to perform where, and when, on such heterogeneous systems. Taking an emerging class of PIM ...
MeNDA: a near-memory multi-way merge solution for sparse transposition and dataflows
Near-memory processing has been extensively studied to optimize memory intensive workloads. However, none of the proposed designs address sparse matrix transposition, an important building block in sparse linear algebra applications. Prior work shows ...
CaSMap: agile mapper for reconfigurable spatial architectures by automatically clustering intermediate representations and scattering mapping process
Today, reconfigurable spatial architectures (RSAs) have sprung up as accelerators for compute- and data-intensive domains because they deliver energy and area efficiency close to ASICs and still retain sufficient programmability to keep the development ...
FFCCD: fence-free crash-consistent concurrent defragmentation for persistent memory
Persistent Memory (PM) is increasingly supplementing or substituting DRAM as main memory. Prior work has focused on reusability and memory leaks of persistent memory but has not addressed a problem amplified by persistence, persistent memory ...
LightPC: hardware and software co-design for energy-efficient full system persistence
We propose LightPC, a lightweight persistence-centric platform to make the system robust against power loss. LightPC consists of hardware and software subsystems, referred to as open-channel PMEM (OC-PMEM) and persistence-centric OS (PecOS), respectively. ...
ASAP: architecture support for asynchronous persistence
Supporting atomic durability of updates for persistent memories is typically achieved with Write-Ahead Logging (WAL). WAL flushes log entries to persistent memory before making the actual data persistent to ensure that a consistent state can be ...
Sibyl: adaptive and extensible data placement in hybrid storage systems using online reinforcement learning
- Gagandeep Singh,
- Rakesh Nadig,
- Jisung Park,
- Rahul Bera,
- Nastaran Hajinazar,
- David Novo,
- Juan Gómez-Luna,
- Sander Stuijk,
- Henk Corporaal,
- Onur Mutlu
Hybrid storage systems (HSS) use multiple different storage devices to provide high and scalable storage capacity at high performance. Data placement across different devices is critical to maximize the benefits of such a hybrid system. Recent research ...
A synthesis framework for stitching surface code with superconducting quantum devices
Quantum error correction (QEC) is the central building block of fault-tolerant quantum computation but the design of QEC codes may not always match the underlying hardware. To tackle the discrepancy between the quantum hardware and QEC codes, we propose ...
2QAN: a quantum compiler for 2-local qubit hamiltonian simulation algorithms
Simulating quantum systems is one of the most important potential applications of quantum computers. The high-level circuit defining the simulation needs to be compiled into one that complies with hardware limitations such as qubit architecture (...
XQsim: modeling cross-technology control processors for 10+K qubit quantum computers
- Ilkwon Byun,
- Junpyo Kim,
- Dongmoon Min,
- Ikki Nagaoka,
- Kosuke Fukumitsu,
- Iori Ishikawa,
- Teruo Tanimoto,
- Masamitsu Tanaka,
- Koji Inoue,
- Jangwoo Kim
A 10+K qubit quantum computer is essential to achieving quantum supremacy in a true sense. With recent efforts toward large-scale quantum computers, architects have revealed various scalability issues including the constraints in a quantum control ...
Geyser: a compilation framework for quantum computing with neutral atoms
Compared to widely-used superconducting qubits, neutral-atom quantum computing technology promises potentially better scalability and flexible arrangement of qubits to allow higher operation parallelism and more relaxed cooling requirements. The high ...
X-cache: a modular architecture for domain-specific caches
With Dennard scaling ending, architects are turning to domain-specific accelerators (DSAs). State-of-the-art DSAs work with sparse data [37] and indirectly-indexed data structures [18, 30]. They introduce non-affine and dynamic memory accesses [7, 35], ...
Register file prefetching
The memory wall continues to limit the performance of modern out-of-order (OOO) processors, despite the expensive provisioning of large multi-level caches and advancements in memory prefetching. In this paper, we put forth an important observation that ...
GCoM: a detailed GPU core model for accurate analytical modeling of modern GPUs
Analytical models can greatly help computer architects perform early-stage design space exploration orders of magnitude faster than with cycle-level simulators. To facilitate rapid design space exploration for graphics processing units (GPUs), prior ...