Keyword: prefetching : Search

research-article

Open Access

JUST ACCEPTED

DTAP: Accelerating Strongly-Typed Programs with Data Type-Aware Hardware Prefetching

ACM Transactions on Architecture and Code Optimization (TACO), Just Accepted https://doi.org/10.1145/3701994

Queries on linked data structures, such as trees and graphs, often suffer from frequent cache misses and significant performance loss due to dependent and random pointer-chasing memory accesses. In this paper, we propose a software-hardware co-designed ...

research-article

DLHT: A Non-blocking Resizable Hashtable with Fast Deletes and Memory-awareness

HPDC '24: Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, Pages 186–199, https://doi.org/10.1145/3625549.3658682

This paper presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing ...

research-article

FetchBench: Systematic Identification and Characterization of Proprietary Prefetchers

CCS '23: Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, Pages 975–989, https://doi.org/10.1145/3576915.3623124

Prefetchers speculatively fetch memory using predictions on future memory use by applications. Different CPUs may use different prefetcher types, and two implementations of the same prefetcher can differ in details of their characteristics, leading to ...

research-article

Treelet Prefetching For Ray Tracing

MICRO '23: Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 742–755, https://doi.org/10.1145/3613424.3614288

Ray tracing is traditionally only used in offline rendering to produce images of high fidelity because it is computationally expensive. Recent Graphics Processing Units (GPUs) have included dedicated accelerators to bring ray tracing to real-time ...

research-article

Decoupled Vector Runahead

MICRO '23: Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 17–31, https://doi.org/10.1145/3613424.3614255

We present Decoupled Vector Runahead (DVR), an in-core prefetching technique, executing separately to the main application thread, that exploits massive amounts of memory-level parallelism to improve the performance of applications featuring indirect ...

research-article

Public Access

GPU-Enabled Asynchronous Multi-level Checkpoint Caching and Prefetching

HPDC '23: Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, Pages 73–85, https://doi.org/10.1145/3588195.3592987

Checkpointing is an I/O intensive operation increasingly used by High-Performance Computing (HPC) applications to revisit previous intermediate datasets at scale. Unlike the case of resilience, where only the last checkpoint is needed for application ...

research-article

Open Access

Towards Reproducible Execution of Closed-Source Applications from Internet Archives

ACM REP '23: Proceedings of the 2023 ACM Conference on Reproducibility and Replicability, Pages 15–26, https://doi.org/10.1145/3589806.3600035

Olive enables execution of closed-source applications decades after their creation. With appropriate authentication and authorization, anyone on the Internet can execute any archived application with no more effort than a mouse click. User experience is ...

research-article

Open Access

Prefetching Using Principles of Hippocampal-Neocortical Interaction

HOTOS '23: Proceedings of the 19th Workshop on Hot Topics in Operating Systems, Pages 53–60, https://doi.org/10.1145/3593856.3595901

Memory prefetching improves performance across many systems layers. However, achieving high prefetch accuracy with low overhead is challenging, as memory hierarchies and application memory access patterns become more complicated. Furthermore, a ...

research-article

Public Access

Optimizing CPU Performance for Recommendation Systems At-Scale

ISCA '23: Proceedings of the 50th Annual International Symposium on Computer Architecture, Article No.: 77, Pages 1–15, https://doi.org/10.1145/3579371.3589112

Deep Learning Recommendation Models (DLRMs) are very popular in personalized recommendation systems and are a major contributor to the data-center AI cycles. Due to the high computational and memory bandwidth needs of DLRMs, specifically the embedding ...

research-article

Adaptive Selection and Clustering of Partial Reconfiguration Modules for Modern FPGA Design Flow

ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 16, Issue 2, Article No.: 27, Pages 1–24, https://doi.org/10.1145/3567427

Dynamic Partial Reconfiguration (DPR) on FPGAs has attracted significant research interest in recent years since it provides benefits such as reduced area and flexible functionality. However, due to the lack of supporting synthesis tools in the current ...

research-article

Spidermine: Low Overhead User-Level Prefetching

SAC '23: Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing, Pages 1332–1341, https://doi.org/10.1145/3555776.3577754

Spidermine monitors the rate at which read requests are issued by an application, and thus detects bursts of disk reads. It then determines an address at which to insert a breakpoint into the application code or a library before each burst, and logs each ...

research-article

ReSemble: reinforced ensemble framework for data prefetching

SC '22: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, Article No.: 81, Pages 1–14

Data prefetching hides memory latency by predicting and loading necessary data into cache beforehand. Most prefetchers in the literature are efficient for specific memory address patterns thereby restricting their utility to specialized applications-...

research-article

Open Access

Bandwidth-Efficient Multi-video Prefetching for Short Video Streaming

MM '22: Proceedings of the 30th ACM International Conference on Multimedia, Pages 7084–7088, https://doi.org/10.1145/3503161.3551584

Applications that allow sharing of user-created short videos exploded in popularity in recent years. A typical short video application allows a user to swipe away the current video being watched and start watching the next video in a video queue. Such ...

research-article

Page Size Aware Cache Prefetching

MICRO '22: Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 956–974, https://doi.org/10.1109/MICRO56248.2022.00070

The increase in working set sizes of contemporary applications outpaces the growth in cache sizes, resulting in frequent main memory accesses that deteriorate system performance due to the disparity between processor and memory speeds. Prefetching ...

research-article

Open Access

Fine-grained address segmentation for attention-based variable-degree prefetching

CF '22: Proceedings of the 19th ACM International Conference on Computing Frontiers, Pages 103–112, https://doi.org/10.1145/3528416.3530236

Machine learning algorithms have shown potential to improve prefetching performance by accurately predicting future memory accesses. Existing approaches are based on the modeling of text prediction, considering prefetching as a classification problem ...

research-article

Open Access

CRISP: critical slice prefetching

ASPLOS '22: Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Pages 300–313, https://doi.org/10.1145/3503222.3507745

The high access latency of DRAM continues to be a performance challenge for contemporary microprocessor systems. Prefetching is a well-established technique to address this problem; however, existing implemented designs fail to provide any performance ...

research-article

Pattern-Based Prefetching with Adaptive Cache Management Inside of Solid-State Drives

ACM Transactions on Storage (TOS), Volume 18, Issue 1, Article No.: 7, Pages 1–25, https://doi.org/10.1145/3474393

This article proposes a pattern-based prefetching scheme with the support of adaptive cache management, at the flash translation layer of solid-state drives (SSDs). It works inside of SSDs and has features of OS dependence and uses transparency. ...

research-article

Public Access

Post-Fabrication Microarchitecture

MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 1270–1281, https://doi.org/10.1145/3466752.3480119

Microarchitectural enhancements that improve performance generally, across many workloads, are favored in superscalar processor design. Targeting general performance is necessary but it also constrains some microarchitecture innovation. We explore ...

research-article

MAPCP: Memory Access Pattern Classifying Prefetcher

MEMSYS '21: Proceedings of the International Symposium on Memory Systems, Article No.: 8, Pages 1–12, https://doi.org/10.1145/3488423.3519328

Prefetching is a technique used to improve system performance by bringing data or instructions into the cache before they are demanded by the core. Several prefetching techniques have been proposed, in both hardware and software, to predict the data to be ...

research-article

Open Access

cDLRM: Look Ahead Caching for Scalable Training of Recommendation Models

RecSys '21: Proceedings of the 15th ACM Conference on Recommender Systems, Pages 263–272, https://doi.org/10.1145/3460231.3474246

Deep learning recommendation models (DLRMs) are typically composed of two sets of parameters: large embedding tables to handle sparse categorical inputs, and neural networks such as multi-layer perceptrons (MLPs) to handle dense non-categorical inputs. ...
