DOI:10.1007/978-3-031-29927-8_35
Article

A Profiling-Based Approach to Cache Partitioning of Program Data

Published: 08 April 2023

Abstract

Cache efficiency is important to avoid unnecessary data transfers and to keep processors active. Cache partitioning, a technique to virtually divide a cache into multiple partitions, has become available in recent hardware. Cache partitioning can improve efficiency by isolating data with high temporal locality so that it is not evicted before it is reused. However, deciding on the partitioning is challenging, because it depends on the locality of reference. To facilitate the decision-making, we propose a profiling-based approach that measures locality, providing knowledge for cache partitioning without requiring manual code analysis. We present a profiling tool and confirm its benefits through experiments on Fujitsu’s A64FX processor, which supports the cache partitioning mechanism called sector cache. Our results show ways to optimize program code to improve cache efficiency.
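
To make the idea of measuring locality concrete, the following is a minimal sketch of reuse distance, the metric listed among the author tags. It is an illustration only and not the authors' profiling tool. It assumes a fully associative LRU cache, models each partition as an independent LRU cache, and uses a made-up toy trace; the function names `reuse_distances`, `lru_hits`, and `partitioned_hits` are hypothetical.

```python
from collections import OrderedDict


def reuse_distances(trace):
    """Return the reuse distance of each access in `trace`.

    The reuse distance is the number of distinct addresses touched since the
    previous access to the same address. None marks a first-time access.
    """
    stack = OrderedDict()  # keys ordered from least to most recently used
    distances = []
    for addr in trace:
        if addr in stack:
            keys = list(stack)
            # Count the distinct addresses used after the previous use of addr.
            distances.append(len(keys) - 1 - keys.index(addr))
            stack.move_to_end(addr)
        else:
            distances.append(None)
            stack[addr] = True
    return distances


def lru_hits(trace, capacity):
    """Count hits in a fully associative LRU cache of `capacity` lines.

    Under LRU, an access hits exactly when its reuse distance is smaller
    than the cache capacity.
    """
    return sum(1 for d in reuse_distances(trace) if d is not None and d < capacity)


def partitioned_hits(trace, capacities):
    """Count hits when each array label gets its own LRU partition.

    `capacities` maps an array label to the number of lines in its partition
    (an assumed, simplified model of a partitioned cache).
    """
    total = 0
    for label, cap in capacities.items():
        sub_trace = [a for a in trace if a[0] == label]
        total += lru_hits(sub_trace, cap)
    return total


# Toy trace: a small reused array "x" interleaved with a streaming array "a".
trace = []
for i in range(3):
    trace += [("x", 0), ("x", 1)]
    trace += [("a", 4 * i + j) for j in range(4)]

shared = lru_hits(trace, 4)                             # one shared 4-line cache
split = partitioned_hits(trace, {"x": 2, "a": 2})       # 2 lines per array
print(shared, split)
```

In this toy trace the streaming array pushes the reused lines of `x` out of the shared cache before they are accessed again, while giving `x` its own small partition keeps them resident with the same total capacity.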

Published In

Parallel and Distributed Computing, Applications and Technologies: 23rd International Conference, PDCAT 2022, Sendai, Japan, December 7–9, 2022, Proceedings
Dec 2022
525 pages
ISBN:978-3-031-29926-1
DOI:10.1007/978-3-031-29927-8

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. Cache partitioning
  2. Reuse distance
  3. A64FX
  4. Sector cache
