- Research article, December 2024
SPID-Join: A Skew-resistant Processing-in-DIMM Join Algorithm Exploiting the Bank- and Rank-level Parallelisms of DIMMs
- Suhyun Lee,
- Chaemin Lim,
- Jinwoo Choi,
- Heelim Choi,
- Chan Lee,
- Yongjun Park,
- Kwanghyun Park,
- Hanjun Kim,
- Youngsok Kim
Proceedings of the ACM on Management of Data (PACMMOD), Volume 2, Issue 6, Article No. 251, Pages 1–27. https://doi.org/10.1145/3698827
Recent advances in Dual In-line Memory Modules (DIMMs) allow DIMMs to support Processing-In-DIMM (PID) by placing In-DIMM Processors (IDPs) near their memory banks. Prior studies have shown that in-memory joins can benefit from PID by offloading their ...
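As a rough illustration of the offload pattern described in the SPID-Join abstract, the sketch below hash-partitions both relations across a set of hypothetical in-DIMM processors (modeled as plain partitions) and runs an independent hash join per partition. The IDP count and the skewed key distribution are assumptions chosen to show how skew concentrates work on a few IDPs; this is not SPID-Join's actual algorithm.

```python
# Minimal sketch (not SPID-Join's actual algorithm): a hash join whose build and
# probe phases are partitioned across hypothetical in-DIMM processors (IDPs).
# A skewed key distribution overloads a few partitions, which is the load-imbalance
# problem that skew-resistant PID joins are designed to avoid.
from collections import Counter, defaultdict

NUM_IDPS = 8  # assumption: one partition per IDP

def partition(tuples, key=lambda t: t[0]):
    parts = defaultdict(list)
    for t in tuples:
        parts[hash(key(t)) % NUM_IDPS].append(t)
    return parts

def idp_join(build_part, probe_part):
    # Work that would run on one IDP: build a local hash table, then probe it.
    table = defaultdict(list)
    for k, v in build_part:
        table[k].append(v)
    return [(k, v, w) for k, w in probe_part for v in table.get(k, [])]

def pid_join(R, S):
    pr, ps = partition(R), partition(S)
    results, per_idp_work = [], Counter()
    for idp in range(NUM_IDPS):
        out = idp_join(pr.get(idp, []), ps.get(idp, []))
        per_idp_work[idp] = len(pr.get(idp, [])) + len(ps.get(idp, []))
        results.extend(out)
    return results, per_idp_work

R = [(i % 4, f"r{i}") for i in range(1000)]      # skewed: only 4 distinct keys
S = [(i % 4, f"s{i}") for i in range(1000)]
_, work = pid_join(R, S)
print("tuples per IDP:", dict(work))             # a few IDPs receive all the work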
- Research article, December 2024
PyGim: An Efficient Graph Neural Network Library for Real Processing-In-Memory Architectures
- Christina Giannoula,
- Peiming Yang,
- Ivan Fernandez,
- Jiacheng Yang,
- Sankeerth Durvasula,
- Yu Xin Li,
- Mohammad Sadrosadati,
- Juan Gomez Luna,
- Onur Mutlu,
- Gennady Pekhimenko
Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 8, Issue 3, Article No. 43, Pages 1–36. https://doi.org/10.1145/3700434
Graph Neural Networks (GNNs) are emerging models for analyzing graph-structured data. GNN execution involves both compute-intensive and memory-intensive kernels. The latter kernels dominate execution time, because they are significantly bottlenecked by data ...
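The memory-intensive kernel the PyGim abstract refers to is typically sparse neighbor aggregation. The sketch below is an illustrative CSR-based aggregation, not PyGim's API; the toy graph and feature dimensions are made up.

```python
# Illustrative sketch only (not PyGim's API): the neighbor-aggregation step of a
# GNN layer, i.e. a sparse-matrix / dense-matrix product. Each output row touches
# scattered rows of the feature matrix, so the kernel performs little arithmetic
# per byte moved -- the memory-bound behavior that motivates running it on PIM.
import numpy as np

def aggregate(indptr, indices, features):
    """CSR adjacency (indptr, indices) times dense node features."""
    num_nodes, dim = features.shape
    out = np.zeros((num_nodes, dim), dtype=features.dtype)
    for v in range(num_nodes):
        neigh = indices[indptr[v]:indptr[v + 1]]   # irregular, data-dependent reads
        if len(neigh):
            out[v] = features[neigh].sum(axis=0)   # gather + reduce
    return out

# Tiny example graph: edges 0-1, 0-2, 1-2 (undirected), 16-dim features.
indptr  = np.array([0, 2, 4, 6])
indices = np.array([1, 2, 0, 2, 0, 1])
feats   = np.random.rand(3, 16).astype(np.float32)
print(aggregate(indptr, indices, feats).shape)     # (3, 16)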
- Research article, November 2024 (Just Accepted)
An Efficient ReRAM-based Accelerator for Asynchronous Iterative Graph Processing
- Jin Zhao,
- Yu Zhang,
- Donghao He,
- Qikun Li,
- Weihang Yin,
- Hui Yu,
- Hao Qi,
- Xiaofei Liao,
- Hai Jin,
- Haikun Liu,
- Linchen Yu,
- Zhang Zhan
ACM Transactions on Architecture and Code Optimization (TACO), Just Accepted. https://doi.org/10.1145/3689335
Graph processing has become a central concern for many real-world applications and is well-known for its low compute-to-communication ratios and poor data locality. By integrating computing logic into memory, resistive random access memory (ReRAM) tackles ...
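For context on the synchronous-versus-asynchronous distinction in the title, the sketch below contrasts the two iteration styles on a toy PageRank. It is a generic software illustration with assumed damping and sweep counts, not the paper's ReRAM design.

```python
# Generic contrast of synchronous vs asynchronous iteration (not the paper's ReRAM
# accelerator): the asynchronous variant lets each vertex read its neighbors'
# freshest values instead of last sweep's snapshot, which typically converges in
# fewer sweeps and maps naturally onto independent in-memory compute units.
def pagerank(adj, n, async_mode, d=0.85, sweeps=50):
    rank = [1.0 / n] * n
    out_deg = [len(adj[v]) or 1 for v in range(n)]
    for _ in range(sweeps):
        new = rank if async_mode else rank[:]    # async: update in place
        for v in range(n):
            incoming = sum(rank[u] / out_deg[u] for u in range(n) if v in adj[u])
            new[v] = (1 - d) / n + d * incoming
        rank = new
    return rank

adj = {0: [1], 1: [2], 2: [0]}                    # 3-cycle: ranks converge to 1/3
print(pagerank(adj, 3, async_mode=True))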
- Research article, November 2024
OPTIMA: Design-Space Exploration of Discharge-Based In-SRAM Computing: Quantifying Energy-Accuracy Trade-offs
DAC '24: Proceedings of the 61st ACM/IEEE Design Automation Conference, Article No. 331, Pages 1–6. https://doi.org/10.1145/3649329.3661852
In-SRAM computing promises energy efficiency, but circuit nonlinearities and PVT variations pose major challenges in designing robust accelerators. To address this, we introduce OPTIMA, a modeling framework that aids in analyzing bitline discharge and ...
- Research article, November 2024
Improving the Efficiency of In-Memory-Computing Macro with a Hybrid Analog-Digital Computing Mode for Lossless Neural Network Inference
DAC '24: Proceedings of the 61st ACM/IEEE Design Automation Conference, Article No. 313, Pages 1–6. https://doi.org/10.1145/3649329.3658472
Analog in-memory computing (IMC) is an attractive technique that offers higher energy efficiency for processing machine learning workloads. However, the analog computing scheme suffers from large interface circuit overhead. In this work, we propose a macro with a ...
- Research article, November 2024
ReCG: ReRAM-Accelerated Sparse Conjugate Gradient
DAC '24: Proceedings of the 61st ACM/IEEE Design Automation Conference, Article No. 269, Pages 1–6. https://doi.org/10.1145/3649329.3656515
Solving sparse linear systems is crucial in scientific computing. Sparse Conjugate Gradient (CG) is one of the most well-known iterative solvers with high efficiency and low storage requirements. However, the performance of sparse CG solvers implemented ...
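For reference, the sparse CG iteration the ReCG abstract builds on is the standard algorithm below. The ReRAM crossbar mapping that is the paper's contribution is not modeled, and the test matrix is a made-up 2x2 SPD system.

```python
# Minimal conjugate gradient (CG) solver sketch. The dominant cost per iteration
# is the (sparse) matrix-vector product A @ p, which is the kernel a ReRAM MVM
# engine would accelerate.
import numpy as np

def cg(A, b, tol=1e-8, max_iter=1000):
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p                      # SpMV: the memory-heavy step
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Small symmetric positive-definite test system.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(cg(A, b))                         # ~[0.0909, 0.6364]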
- Research article, November 2024
A Combined Content Addressable Memory and In-Memory Processing Approach for k-Clique Counting Acceleration
DAC '24: Proceedings of the 61st ACM/IEEE Design Automation Conference, Article No. 257, Pages 1–6. https://doi.org/10.1145/3649329.3656513
The k-clique counting problem plays an important role in graph mining, which has seen a growing number of applications. However, current k-clique counting accelerators cannot meet the performance requirement mainly because they struggle with high data ...
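A plain-software baseline for k-clique counting is the ordered neighborhood-intersection recursion below (no pivoting and no CAM or in-memory acceleration); it only illustrates why the workload is dominated by repeated set intersections over neighbor lists.

```python
# Baseline sketch of k-clique counting by recursive neighborhood intersection.
# The repeated set intersections are the irregular, data-heavy step that
# CAM/in-memory approaches target.
def count_k_cliques(adj, k):
    """adj: dict mapping vertex -> set of neighbors; counts cliques of size k."""
    order = {v: i for i, v in enumerate(sorted(adj))}   # fixed order avoids double counting

    def expand(cand, depth):
        if depth == k:
            return 1
        total = 0
        for v in cand:
            higher = {u for u in cand & adj[v] if order[u] > order[v]}
            total += expand(higher, depth + 1)
        return total

    return expand(set(adj), 0)

# K4 (complete graph on 4 vertices) contains 4 triangles and 1 four-clique.
K4 = {v: {u for u in range(4) if u != v} for v in range(4)}
print(count_k_cliques(K4, 3), count_k_cliques(K4, 4))   # 4 1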
- Research article, November 2024
Accelerating Regular Path Queries over Graph Database with Processing-in-Memory
DAC '24: Proceedings of the 61st ACM/IEEE Design Automation Conference, Article No. 89, Pages 1–6. https://doi.org/10.1145/3649329.3656235
Regular path queries (RPQs) in graph databases are bottlenecked by the memory wall. Emerging processing-in-memory (PIM) technologies offer a promising solution to dispatch and execute path matching tasks in parallel within PIM modules. We present ...
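Conceptually, an RPQ can be answered by a BFS over the product of the data graph and the query automaton, as in the sketch below. The edge labels, automaton, and node sets are illustrative, and the PIM task-dispatch mechanism the paper proposes is not modeled.

```python
# Conceptual RPQ evaluation: BFS over (graph node, automaton state) pairs.
from collections import deque

def rpq(edges, start_nodes, automaton, q0, accepting):
    """edges: set of (u, label, v); automaton: dict (state, label) -> state."""
    out = {}
    for u, lbl, v in edges:
        out.setdefault(u, []).append((lbl, v))

    reached = set()
    frontier = deque((s, q0) for s in start_nodes)
    results = set()
    while frontier:
        node, state = frontier.popleft()
        if (node, state) in reached:
            continue
        reached.add((node, state))
        if state in accepting:
            results.add(node)
        for lbl, nxt in out.get(node, []):
            if (state, lbl) in automaton:
                frontier.append((nxt, automaton[(state, lbl)]))
    return results

# Query: nodes reachable from node 0 by a path matching the regex a b*.
edges = {(0, "a", 1), (1, "b", 2), (2, "b", 3)}
automaton = {("s0", "a"): "s1", ("s1", "b"): "s1"}
print(rpq(edges, {0}, automaton, "s0", {"s1"}))   # {1, 2, 3}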
- Research article, June 2024
Load Balanced PIM-Based Graph Processing
ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 29, Issue 4, Article No. 61, Pages 1–22. https://doi.org/10.1145/3659951
Graph processing is widely used for many modern applications, such as social networks, recommendation systems, and knowledge graphs. However, processing large-scale graphs on traditional Von Neumann architectures is challenging due to the irregular graph ...
- Abstract, June 2024
Scalability Limitations of Processing-in-Memory using Real System Evaluations
- Gilbert Jonatan,
- Haeyoon Cho,
- Hyojun Son,
- Xiangyu Wu,
- Neal Livesay,
- Evelio Mora,
- Kaustubh Shivdikar,
- José L. Abellán,
- Ajay Joshi,
- David Kaeli,
- John Kim
SIGMETRICS/PERFORMANCE '24: Abstracts of the 2024 ACM SIGMETRICS/IFIP PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems, Pages 63–64. https://doi.org/10.1145/3652963.3655079
Processing-in-memory (PIM) has been widely explored in academia and industry to accelerate numerous workloads. By reducing the data movement and increasing parallelism, PIM offers great performance and energy efficiency. A large number of cores or nodes ...
Also Published in: ACM SIGMETRICS Performance Evaluation Review, Volume 52, Issue 1
- Research article, April 2024
IANUS: Integrated Accelerator based on NPU-PIM Unified Memory System
- Minseok Seo,
- Xuan Truong Nguyen,
- Seok Joong Hwang,
- Yongkee Kwon,
- Guhyun Kim,
- Chanwook Park,
- Ilkon Kim,
- Jaehan Park,
- Jeongbin Kim,
- Woojae Shin,
- Jongsoon Won,
- Haerang Choi,
- Kyuyoung Kim,
- Daehan Kwon,
- Chunseok Jeong,
- Sangheon Lee,
- Yongseok Choi,
- Wooseok Byun,
- Seungcheol Baek,
- Hyuk-Jae Lee,
- John Kim
ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, Pages 545–560. https://doi.org/10.1145/3620666.3651324
Accelerating end-to-end inference of transformer-based large language models (LLMs) is a critical component of AI services in datacenters. However, the diverse compute characteristics of LLMs' end-to-end inference present challenges as previously ...
- Research article, April 2024
PIM-STM: Software Transactional Memory for Processing-In-Memory Systems
ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, Pages 897–911. https://doi.org/10.1145/3620665.3640428
Processing-In-Memory (PIM) is a novel approach that augments existing DRAM memory chips with lightweight logic. By allowing computations to be offloaded to the PIM system, this architecture circumvents the data-bottleneck problem that affects ...
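To make the STM terminology concrete, the sketch below is a toy word-based software transactional memory with a read-set/write-buffer commit protocol. It is an assumed, simplified design for illustration and does not reflect PIM-STM's actual implementation.

```python
# Toy word-based STM sketch (not PIM-STM's design): reads record observed versions,
# writes are buffered, and commit validates the read set before publishing.
import threading

class TVar:
    def __init__(self, value):
        self.value, self.version = value, 0

_commit_lock = threading.Lock()

class Transaction:
    def __init__(self):
        self.reads, self.writes = {}, {}    # tvar -> observed version / buffered value

    def read(self, tvar):
        if tvar in self.writes:
            return self.writes[tvar]
        self.reads.setdefault(tvar, tvar.version)
        return tvar.value

    def write(self, tvar, value):
        self.writes[tvar] = value

    def commit(self):
        with _commit_lock:
            if any(tvar.version != v for tvar, v in self.reads.items()):
                return False                # conflict: a read location was overwritten
            for tvar, value in self.writes.items():
                tvar.value, tvar.version = value, tvar.version + 1
            return True

def atomically(fn):
    while True:                              # retry loop on conflict
        tx = Transaction()
        result = fn(tx)
        if tx.commit():
            return result

acct = TVar(100)
atomically(lambda tx: tx.write(acct, tx.read(acct) - 30))
print(acct.value)                            # 70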
- Research article, April 2024
AttAcc! Unleashing the Power of PIM for Batched Transformer-based Generative Model Inference
ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, Pages 103–119. https://doi.org/10.1145/3620665.3640422
The Transformer-based generative model (TbGM), comprising summarization (Sum) and generation (Gen) stages, has demonstrated unprecedented generative performance across a wide range of applications. However, it also demands immense amounts of compute and ...
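A back-of-the-envelope calculation helps explain why the generation (Gen) stage is a natural PIM target: per generated token, the decode step performs roughly two FLOPs per weight while streaming every weight from memory. The numbers below (7B parameters, FP16, batch 1) are assumptions for illustration, not figures from the paper.

```python
# Illustrative arithmetic-intensity estimate for single-request token generation.
params = 7e9                 # assumed model size (7B parameters)
bytes_per_param = 2          # FP16
batch = 1                    # one request, one token step

flops = 2 * params * batch           # GEMV: one multiply + one add per weight
bytes_moved = params * bytes_per_param
intensity = flops / bytes_moved
print(f"arithmetic intensity ~ {intensity:.1f} FLOP/byte")   # ~1 FLOP/byte

# An accelerator typically needs hundreds of FLOPs per byte to stay compute-bound,
# which is why batching -- or moving the GEMVs into memory (PIM) -- is attractive.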
- Research article, February 2024
Scalability Limitations of Processing-in-Memory using Real System Evaluations
- Gilbert Jonatan,
- Haeyoon Cho,
- Hyojun Son,
- Xiangyu Wu,
- Neal Livesay,
- Evelio Mora,
- Kaustubh Shivdikar,
- José L. Abellán,
- Ajay Joshi,
- David Kaeli,
- John Kim
Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 8, Issue 1, Article No. 5, Pages 1–28. https://doi.org/10.1145/3639046
Processing-in-memory (PIM), where the compute is moved closer to the memory or the data, has been widely explored to accelerate emerging workloads. Recently, different PIM-based systems have been announced by memory vendors to minimize data movement and ...
- Research article, February 2024
Extension VM: Interleaved Data Layout in Vector Memory
ACM Transactions on Architecture and Code Optimization (TACO), Volume 21, Issue 1, Article No. 18, Pages 1–23. https://doi.org/10.1145/3631528
While vector architecture is widely employed in processors for neural networks, signal processing, and high-performance computing, its performance is limited by inefficient column-major memory access. The column-major access limitation originates ...
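To make the column-major limitation concrete, the sketch below counts the cache lines touched by a row walk versus a column walk of a row-major matrix. The matrix and cache-line sizes are assumed, and the interleaved layout proposed by Extension VM is not modeled.

```python
# Simple illustration: walking a column of a row-major matrix touches a new cache
# line on nearly every element, while walking a row is unit-stride.
ROWS, COLS, ELEM, LINE = 128, 128, 4, 64     # float32 elements, 64-byte lines

def addr(r, c):                               # row-major address of element (r, c)
    return (r * COLS + c) * ELEM

row_lines = {addr(0, c) // LINE for c in range(COLS)}
col_lines = {addr(r, 0) // LINE for r in range(ROWS)}
print(len(row_lines), len(col_lines))         # 8 vs 128 cache lines touched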
- Research article, December 2023
ARCHER: a ReRAM-based accelerator for compressed recommendation systems
Frontiers of Computer Science: Selected Publications from Chinese Universities (FCS), Volume 18, Issue 5. https://doi.org/10.1007/s11704-023-3397-x
Modern recommendation systems are widely used in modern data centers. The random and sparse embedding lookup operations are the main performance bottleneck for processing recommendation systems on traditional platforms as they induce abundant data ...
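The embedding-lookup bottleneck mentioned in the ARCHER abstract boils down to the gather-and-pool pattern below. The table size, feature width, and pooling choice are assumptions for illustration, and the compression scheme ARCHER accelerates is not modeled.

```python
# Minimal embedding-lookup sketch: each sparse feature triggers a random gather
# from a large table followed by a small reduction, so the operation is dominated
# by scattered memory reads rather than arithmetic.
import numpy as np

table = np.random.rand(100_000, 64).astype(np.float32)      # assumed embedding table

def embedding_lookup(indices):
    return table[indices].sum(axis=0)                        # random gather + pooling

ids = np.random.randint(0, len(table), size=40)              # one sample's sparse features
print(embedding_lookup(ids).shape)                           # (64,)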
- Research article, November 2023
ReFloat: Low-Cost Floating-Point Processing in ReRAM for Accelerating Iterative Linear Solvers
SC '23: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Article No. 75, Pages 1–15. https://doi.org/10.1145/3581784.3607077
Resistive random access memory (ReRAM) is a promising technology that can perform low-cost and in-situ matrix-vector multiplication (MVM) in the analog domain. Scientific computing requires high-precision floating-point (FP) processing. However, performing ...
- Poster, July 2023
PIM-tree: A Skew-resistant Index for Processing-in-Memory (Abstract)
HOPC '23: Proceedings of the 2023 ACM Workshop on Highlights of Parallel Computing, Pages 13–14. https://doi.org/10.1145/3597635.3598029
Processing-in-memory (PIM) is an emerging technology to alleviate the high cost of data movement by pushing computation into/near memory modules. There is an inherent tension, however, between minimizing communication (data movement) and achieving load ...
- Research article, June 2023
Design and Analysis of a Processing-in-DIMM Join Algorithm: A Case Study with UPMEM DIMMs
Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 2, Article No. 113, Pages 1–27. https://doi.org/10.1145/3589258
Modern dual in-line memory modules (DIMMs) support processing-in-memory (PIM) by implementing in-DIMM processors (IDPs) located near memory banks. PIM can greatly accelerate in-memory join, whose performance is frequently bounded by main-memory accesses, ...
- Research article, June 2023
Orinoco: Ordered Issue and Unordered Commit with Non-Collapsible Queues
- Dibei Chen,
- Tairan Zhang,
- Yi Huang,
- Jianfeng Zhu,
- Yang Liu,
- Pengfei Gou,
- Chunyang Feng,
- Binghua Li,
- Shaojun Wei,
- Leibo Liu
ISCA '23: Proceedings of the 50th Annual International Symposium on Computer Architecture, Article No. 11, Pages 1–14. https://doi.org/10.1145/3579371.3589046
Modern out-of-order processors call for more aggressive scheduling techniques such as priority scheduling and out-of-order commit to make use of increasing core resources. Since these approaches prioritize the issue or commit of certain instructions, ...