Page Size Aware Cache Prefetching

Authors:

Georgios Vavouliotis,

Marc Casas

MICRO '22: Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture

Pages 956 - 974

https://doi.org/10.1109/MICRO56248.2022.00070

Published: 18 December 2023

Abstract

The increase in working set sizes of contemporary applications outpaces the growth in cache sizes, resulting in frequent main memory accesses that degrade system performance due to the disparity between processor and memory speeds. Prefetching data blocks into the cache hierarchy ahead of demand accesses has proven successful at attenuating this bottleneck. However, spatial cache prefetchers operating in the physical address space leave significant performance on the table by confining their pattern detection to 4KB physical page boundaries, even though modern systems use page sizes larger than 4KB to mitigate address translation overheads.

This paper exploits the high usage of large pages in modern systems to increase the effectiveness of spatial cache prefetching. We design and propose the Page-size Propagation Module (PPM), a μarchitectural scheme that propagates page size information to the lower-level cache prefetchers, enabling safe prefetching beyond 4KB physical page boundaries when the accessed blocks reside in large pages, at the cost of augmenting the first-level caches' Miss Status Holding Register (MSHR) entries with one additional bit. PPM is compatible with any cache prefetcher and requires no modifications to the prefetcher's design. We capitalize on PPM's benefits by designing a composite module that consists of two page size aware prefetchers, each of which inherently uses a different page size to drive prefetching. The composite module uses adaptive logic to dynamically enable the most appropriate page size aware prefetcher. Finally, we show that the proposed designs are agnostic to which cache prefetcher is used.
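
The boundary check that a page size bit makes possible can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 64-byte block size, and the 2MB large page size are assumptions chosen for illustration (the abstract only states that pages larger than 4KB are used and that one extra bit is propagated).

```python
# Illustrative sketch only; names and sizes below are assumptions,
# not taken from the paper.
BLOCK_SIZE = 64                 # bytes per cache block (assumed)
SMALL_PAGE = 4 * 1024           # 4KB base page
LARGE_PAGE = 2 * 1024 * 1024    # assumed large page size (2MB)


def page_size_for(large_page_bit: bool) -> int:
    """Map the propagated one-bit page size flag to a page size in bytes."""
    return LARGE_PAGE if large_page_bit else SMALL_PAGE


def is_safe_prefetch(trigger_paddr: int, delta_blocks: int,
                     large_page_bit: bool) -> bool:
    """Return True if the prefetch candidate lies in the same physical
    page as the triggering access, given the propagated page size bit."""
    page = page_size_for(large_page_bit)
    candidate = trigger_paddr + delta_blocks * BLOCK_SIZE
    if candidate < 0:
        return False
    # Same page number means the candidate does not cross the boundary.
    return candidate // page == trigger_paddr // page


# Last block of a 4KB region that sits inside a 2MB-aligned region.
trigger = 0x200000 + 0xFC0
crosses_4kb = is_safe_prefetch(trigger, 1, large_page_bit=False)
allowed_in_2mb = is_safe_prefetch(trigger, 1, large_page_bit=True)
```

In this sketch, a prefetcher unaware of the page size must discard the next-block candidate because it crosses a 4KB boundary, whereas the same candidate is accepted once the flag indicates that the block resides in a large page.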

We apply the proposed page size exploitation techniques to four state-of-the-art spatial cache prefetchers. Our evaluation shows that our proposals improve single-core geomean performance by up to 8.1% (and by at least 2.1%) over the original implementations of the considered prefetchers, across 80 memory-intensive workloads. In multi-core contexts, we report geomean speedups of up to 7.7% across different cache prefetchers and core configurations.
