- research-article, October 2024, JUST ACCEPTED
DTAP: Accelerating Strongly-Typed Programs with Data Type-Aware Hardware Prefetching
- Yingshuai Dong,
- Chencheng Ye,
- Haikun Liu,
- Liting Tang,
- Xiaofei Liao,
- Hai Jin,
- Cheng Chen,
- Yanjiang Li,
- Yi Wang
ACM Transactions on Architecture and Code Optimization (TACO), Just Accepted, https://doi.org/10.1145/3701994
Queries on linked data structures, such as trees and graphs, often suffer from frequent cache misses and significant performance loss due to dependent and random pointer-chasing memory accesses. In this paper, we propose a software-hardware co-designed ...
- research-article, August 2024
DLHT: A Non-blocking Resizable Hashtable with Fast Deletes and Memory-awareness
HPDC '24: Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, Pages 186–199, https://doi.org/10.1145/3625549.3658682
This paper presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables, that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing ...
- research-article, November 2023
FetchBench: Systematic Identification and Characterization of Proprietary Prefetchers
- Till Schlüter,
- Amit Choudhari,
- Lorenz Hetterich,
- Leon Trampert,
- Hamed Nemati,
- Ahmad Ibrahim,
- Michael Schwarz,
- Christian Rossow,
- Nils Ole Tippenhauer
CCS '23: Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, Pages 975–989, https://doi.org/10.1145/3576915.3623124
Prefetchers speculatively fetch memory using predictions on future memory use by applications. Different CPUs may use different prefetcher types, and two implementations of the same prefetcher can differ in details of their characteristics, leading to ...
- research-article, December 2023
Treelet Prefetching For Ray Tracing
MICRO '23: Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 742–755, https://doi.org/10.1145/3613424.3614288
Ray tracing is traditionally only used in offline rendering to produce images of high fidelity because it is computationally expensive. Recent Graphics Processing Units (GPUs) have included dedicated accelerators to bring ray tracing to real-time ...
- research-article, December 2023
Decoupled Vector Runahead
MICRO '23: Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 17–31, https://doi.org/10.1145/3613424.3614255
We present Decoupled Vector Runahead (DVR), an in-core prefetching technique, executing separately to the main application thread, that exploits massive amounts of memory-level parallelism to improve the performance of applications featuring indirect ...
- research-article, August 2023
GPU-Enabled Asynchronous Multi-level Checkpoint Caching and Prefetching
- Avinash Maurya,
- M. Mustafa Rafique,
- Thierry Tonellot,
- Hussain J. AlSalem,
- Franck Cappello,
- Bogdan Nicolae
HPDC '23: Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, Pages 73–85, https://doi.org/10.1145/3588195.3592987
Checkpointing is an I/O intensive operation increasingly used by High-Performance Computing (HPC) applications to revisit previous intermediate datasets at scale. Unlike the case of resilience, where only the last checkpoint is needed for application ...
- research-article, June 2023
Towards Reproducible Execution of Closed-Source Applications from Internet Archives
ACM REP '23: Proceedings of the 2023 ACM Conference on Reproducibility and Replicability, Pages 15–26, https://doi.org/10.1145/3589806.3600035
Olive enables execution of closed-source applications decades after their creation. With appropriate authentication and authorization, anyone on the Internet can execute any archived application with no more effort than a mouse click. User experience is ...
- research-article, June 2023
Prefetching Using Principles of Hippocampal-Neocortical Interaction
- Michael Wu,
- Ketaki Joshi,
- Andrew Sheinberg,
- Guilherme Cox,
- Anurag Khandelwal,
- Raghavendra Pradyumna Pothukuchi,
- Abhishek Bhattacharjee
HOTOS '23: Proceedings of the 19th Workshop on Hot Topics in Operating Systems, Pages 53–60, https://doi.org/10.1145/3593856.3595901
Memory prefetching improves performance across many systems layers. However, achieving high prefetch accuracy with low overhead is challenging, as memory hierarchies and application memory access patterns become more complicated. Furthermore, a ...
- research-article, June 2023
Optimizing CPU Performance for Recommendation Systems At-Scale
- Rishabh Jain,
- Scott Cheng,
- Vishwas Kalagi,
- Vrushabh Sanghavi,
- Samvit Kaul,
- Meena Arunachalam,
- Kiwan Maeng,
- Adwait Jog,
- Anand Sivasubramaniam,
- Mahmut Taylan Kandemir,
- Chita R. Das
ISCA '23: Proceedings of the 50th Annual International Symposium on Computer Architecture, Article No.: 77, Pages 1–15, https://doi.org/10.1145/3579371.3589112
Deep Learning Recommendation Models (DLRMs) are very popular in personalized recommendation systems and are a major contributor to the data-center AI cycles. Due to the high computational and memory bandwidth needs of DLRMs, specifically the embedding ...
- research-article, April 2023
Adaptive Selection and Clustering of Partial Reconfiguration Modules for Modern FPGA Design Flow
ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 16, Issue 2, Article No.: 27, Pages 1–24, https://doi.org/10.1145/3567427
Dynamic Partial Reconfiguration (DPR) on FPGA has attracted significant research interest in recent years since it provides benefits such as reduced area and flexible functionality. However, due to the lack of supporting synthesis tools in the current ...
- research-article, June 2023
Spidermine: Low Overhead User-Level Prefetching
SAC '23: Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing, Pages 1332–1341, https://doi.org/10.1145/3555776.3577754
Spidermine monitors the rate at which read requests are issued by an application, and thus detects bursts of disk reads. It then determines an address at which to insert a breakpoint into the application code or a library before each burst, and logs each ...
ReSemble: reinforced ensemble framework for data prefetching
SC '22: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, Article No.: 81, Pages 1–14
Data prefetching hides memory latency by predicting and loading necessary data into cache beforehand. Most prefetchers in the literature are efficient for specific memory address patterns thereby restricting their utility to specialized applications-...
- research-article, October 2022
Bandwidth-Efficient Multi-video Prefetching for Short Video Streaming
- Xutong Zuo,
- Yishu Li,
- Mohan Xu,
- Wei Tsang Ooi,
- Jiangchuan Liu,
- Junchen Jiang,
- Xinggong Zhang,
- Kai Zheng,
- Yong Cui
MM '22: Proceedings of the 30th ACM International Conference on Multimedia, Pages 7084–7088, https://doi.org/10.1145/3503161.3551584
Applications that allow sharing of user-created short videos exploded in popularity in recent years. A typical short video application allows a user to swipe away the current video being watched and start watching the next video in a video queue. Such ...
- research-article, December 2023
Page Size Aware Cache Prefetching
MICRO '22: Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 956–974, https://doi.org/10.1109/MICRO56248.2022.00070
The increase in working set sizes of contemporary applications outpaces the growth in cache sizes, resulting in frequent main memory accesses that deteriorate system performance due to the disparity between processor and memory speeds. Prefetching ...
- research-article, May 2022
Fine-grained address segmentation for attention-based variable-degree prefetching
CF '22: Proceedings of the 19th ACM International Conference on Computing Frontiers, Pages 103–112, https://doi.org/10.1145/3528416.3530236
Machine learning algorithms have shown potential to improve prefetching performance by accurately predicting future memory accesses. Existing approaches are based on the modeling of text prediction, considering prefetching as a classification problem ...
- research-article, February 2022
CRISP: critical slice prefetching
ASPLOS '22: Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Pages 300–313, https://doi.org/10.1145/3503222.3507745
The high access latency of DRAM continues to be a performance challenge for contemporary microprocessor systems. Prefetching is a well-established technique to address this problem, however, existing implemented designs fail to provide any performance ...
- research-article, January 2022
Pattern-Based Prefetching with Adaptive Cache Management Inside of Solid-State Drives
ACM Transactions on Storage (TOS), Volume 18, Issue 1, Article No.: 7, Pages 1–25, https://doi.org/10.1145/3474393
This article proposes a pattern-based prefetching scheme with the support of adaptive cache management, at the flash translation layer of solid-state drives (SSDs). It works inside of SSDs and has features of OS dependence and uses transparency. ...
- research-article, October 2021
Post-Fabrication Microarchitecture
MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 1270–1281, https://doi.org/10.1145/3466752.3480119
Microarchitectural enhancements that improve performance generally, across many workloads, are favored in superscalar processor design. Targeting general performance is necessary but it also constrains some microarchitecture innovation. We explore ...
- research-article, May 2022
MAPCP: Memory Access Pattern Classifying Prefetcher
MEMSYS '21: Proceedings of the International Symposium on Memory Systems, Article No.: 8, Pages 1–12, https://doi.org/10.1145/3488423.3519328
Prefetching is a technique used to improve system performance by bringing data or instructions in the cache before it is demanded by the core. Several prefetching techniques have been proposed, in both hardware and software, to predict the data to be ...
- research-article, September 2021
cDLRM: Look Ahead Caching for Scalable Training of Recommendation Models
RecSys '21: Proceedings of the 15th ACM Conference on Recommender Systems, Pages 263–272, https://doi.org/10.1145/3460231.3474246
Deep learning recommendation models (DLRMs) are typically composed of two sets of parameters: large embedding tables to handle sparse categorical inputs, and neural networks such as multi-layer perceptrons (MLPs) to handle dense non-categorical inputs. ...