DOI: 10.1109/ISCA.2018.00019

Criticality aware tiered cache hierarchy: a fundamental relook at multi-level cache hierarchies

Published: 02 June 2018

Abstract

On-die caches are a popular method to help hide main memory latency. However, it is difficult to build large caches without substantially increasing their access latency, which in turn hurts performance. To overcome this difficulty, on-die caches are typically built as a multi-level cache hierarchy. One such popular hierarchy, adopted by modern microprocessors, is the three-level cache hierarchy. Building a three-level cache hierarchy enables a low average hit latency, since most requests are serviced from the faster inner-level caches. This has motivated recent microprocessors to deploy large level-2 (L2) caches that can help further reduce the average hit latency.
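To make the average-hit-latency argument concrete, here is a back-of-the-envelope average memory access time (AMAT) calculation for a serial three-level lookup; a minimal sketch in which all hit rates and cycle latencies are illustrative assumptions, not figures from the paper.

    # AMAT sketch for a three-level hierarchy (illustrative numbers only).
    levels = [
        ("L1",  0.90,  4),    # (name, local hit rate, access latency in cycles) -- assumed
        ("L2",  0.60, 14),
        ("LLC", 0.50, 40),
    ]
    MEMORY_LATENCY = 200      # cycles, assumed

    def amat(levels, memory_latency):
        # Each level's latency is paid by the fraction of requests that reach it;
        # misses fall through to the next level, and the remainder go to DRAM.
        total, reach = 0.0, 1.0
        for _name, hit_rate, latency in levels:
            total += reach * latency
            reach *= 1.0 - hit_rate
        return total + reach * memory_latency

    print(f"AMAT ~ {amat(levels, MEMORY_LATENCY):.1f} cycles")

Because most requests are filtered by the faster inner levels, enlarging the L2 mainly shaves latency off the minority of accesses that miss the L1; that trade-off is what the paper re-examines.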
In this paper, we perform a fundamental analysis of the popular three-level cache hierarchy and examine its performance through the lens of program criticality. Our detailed analysis shows that the current trend of increasing L2 cache sizes to reduce average hit latency is, in fact, an inefficient design choice. We instead propose the Criticality Aware Tiered Cache Hierarchy (CATCH), which uses an accurate hardware detection of program criticality and a novel set of inter-cache prefetchers to ensure that on-die data accesses lying on the critical path of execution are served at the latency of the fastest level-1 (L1) cache. The last-level cache (LLC) serves to reduce slow memory accesses, making the large L2 cache redundant for most applications. The area saved by eliminating the L2 cache can then be used to create more efficient processor configurations. Our simulation results show that CATCH outperforms the three-level cache hierarchy with a large 1 MB L2 and an exclusive LLC by an average of 8.4%, and a baseline with a 256 KB L2 and an inclusive LLC by 10.3%. We also show that CATCH provides a powerful framework for exploring broad chip-level area, performance, and power tradeoffs in cache hierarchy design. Supported by CATCH, we evaluate radical architecture directions such as eliminating the L2 altogether and show that such architectures can yield a 4.5% performance gain over the baseline at nearly 30% less area, or improve performance by 7.3% at the same area while reducing energy consumption by 11%.
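As a rough illustration of the criticality-aware idea described above, the following minimal sketch flags load PCs that are repeatedly observed on the critical path and promotes their data into the L1 on an outer-level hit. The table organization, counter width, threshold, and the l1.prefetch interface are hypothetical choices made for this sketch; they are not CATCH's actual criticality detector or inter-cache prefetchers, which the full paper specifies.

    from collections import defaultdict

    class CriticalityTable:
        # Per-load-PC saturating counter (0..3) tracking how often the load was
        # seen on the critical path; the width and threshold are assumed values.
        def __init__(self, threshold=2):
            self.counters = defaultdict(int)
            self.threshold = threshold

        def train(self, pc, on_critical_path):
            c = self.counters[pc]
            self.counters[pc] = min(c + 1, 3) if on_critical_path else max(c - 1, 0)

        def is_critical(self, pc):
            return self.counters[pc] >= self.threshold

    class StubL1:
        # Stand-in for an L1 cache model with an assumed prefetch interface.
        def prefetch(self, line_addr):
            print(f"promote line 0x{line_addr:x} into L1")

    def on_outer_level_hit(pc, line_addr, table, l1):
        # When a demand load from a predicted-critical PC hits in an outer cache,
        # promote the line into the L1 so later critical-path accesses are served
        # at L1 latency; non-critical lines are left where they are.
        if table.is_critical(pc):
            l1.prefetch(line_addr)

    table = CriticalityTable()
    for _ in range(3):
        table.train(pc=0x400123, on_critical_path=True)
    on_outer_level_hit(pc=0x400123, line_addr=0x7f00, table=table, l1=StubL1())

The point of the sketch is only the policy shape: criticality detection feeds a per-PC prediction, and that prediction gates where data is placed in the hierarchy.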

        Published In

        ISCA '18: Proceedings of the 45th Annual International Symposium on Computer Architecture
        June 2018
        884 pages
        ISBN: 9781538659847

        Publisher

        IEEE Press

        Publication History

        Published: 02 June 2018

        Author Tags

        1. caching
        2. criticality
        3. prefetching

        Qualifiers

        • Research-article

        Conference

        ISCA '18

        Acceptance Rates

        Overall Acceptance Rate 543 of 3,203 submissions, 17%
