research-article

High Performance and Predictable Shared Last-level Cache for Safety-Critical Systems

Published: 11 September 2024

Abstract

We propose ZeroCost-LLC (ZCLLC), a novel shared inclusive last-level cache (LLC) design for timing-predictable multi-core platforms that offers lower worst-case latency (WCL) than a traditional shared inclusive LLC design. ZCLLC achieves low WCL by eliminating the cache line invalidations (back-invalidations) that a traditional inclusive LLC must propagate across the cache hierarchy when a core's memory request misses in the entire cache hierarchy and the LLC has no vacant entry to accommodate the fetched data. In addition to low WCL, ZCLLC offers performance benefits in the form of additional caching capacity, and, unlike state-of-the-art approaches, it imposes no constraints on how the LLC is used across multiple cores. In this work, we describe the impact of LLC cache line invalidations on the WCL and systematically build solutions that eliminate these invalidations, resulting in ZCLLC. We also present ZCLLC-OPT, an optimized variant of ZCLLC that offers lower WCL and improved average-case performance over ZCLLC. We apply optimizations to the shared bus arbitration mechanism and extend the micro-architecture of ZCLLC to allow overlapping requests to main memory. Our analysis reveals that the analytical WCL of a memory request under ZCLLC-OPT is 87.0%, 93.8%, and 97.1% lower than under state-of-the-art LLC partition sharing techniques for 2, 4, and 8 cores, respectively. ZCLLC-OPT also shows average-case speedups of 1.89×, 3.36×, and 6.24× over the state-of-the-art LLC partition sharing techniques for 2, 4, and 8 cores, respectively. Compared with the original, unoptimized ZCLLC (ZCLLC-NORMAL), ZCLLC-OPT's analytical WCLs are 76.5%, 82.6%, and 86.2% lower for 2, 4, and 8 cores, respectively.
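To make the source of these invalidations concrete, the following Python sketch models one set of a conventional shared inclusive LLC with LRU replacement. It models the baseline design, not ZCLLC, and it is only an illustration under stated assumptions: the classes `PrivateCache` and `InclusiveLLCSet` are hypothetical, private caches are treated as unbounded, and dirty-line writebacks are ignored. It shows how a request that misses everywhere and finds the LLC set full forces the evicted victim to be back-invalidated from every private cache holding it.

```python
from __future__ import annotations

from collections import OrderedDict


class PrivateCache:
    """Hypothetical per-core private cache: only the set of resident line addresses.

    Capacity is assumed unbounded, so lines leave only via LLC back-invalidation.
    """

    def __init__(self, core_id: int):
        self.core_id = core_id
        self.lines: set[int] = set()


class InclusiveLLCSet:
    """One set of a hypothetical shared inclusive LLC with LRU replacement.

    Inclusion invariant: any line held by a private cache is also held by the LLC.
    """

    def __init__(self, ways: int, private_caches: list[PrivateCache]):
        self.ways = ways
        self.private_caches = private_caches
        self.lru: OrderedDict[int, None] = OrderedDict()  # least recently used first
        self.back_invalidations = 0

    def access(self, core: PrivateCache, addr: int) -> str:
        if addr in core.lines:
            return "private hit"
        if addr in self.lru:
            self.lru.move_to_end(addr)  # mark as most recently used
            core.lines.add(addr)
            return "LLC hit"
        # Miss in the whole hierarchy: the line is fetched from main memory.
        if len(self.lru) == self.ways:
            # No vacant entry: evict the LRU victim from the LLC.
            victim, _ = self.lru.popitem(last=False)
            # Inclusion then requires back-invalidating every private copy of the victim.
            for pc in self.private_caches:
                if victim in pc.lines:
                    pc.lines.discard(victim)
                    self.back_invalidations += 1
        self.lru[addr] = None
        core.lines.add(addr)
        return "memory fetch"


cores = [PrivateCache(0), PrivateCache(1)]
llc = InclusiveLLCSet(ways=2, private_caches=cores)
llc.access(cores[0], 0xA)  # memory fetch; LLC set holds A
llc.access(cores[0], 0xB)  # memory fetch; LLC set holds A, B (full)
llc.access(cores[1], 0xC)  # memory fetch; A is evicted and core 0's copy is invalidated
print(0xA in cores[0].lines, llc.back_invalidations)  # prints: False 1
```

In the usage lines, core 1's miss on `0xC` removes core 0's private copy of `0xA` even though core 0 did nothing; this cross-core side effect is the kind of invalidation whose impact on WCL the abstract describes. The sketch deliberately does not model how ZCLLC itself avoids these invalidations.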

      Published In

      ACM Transactions on Embedded Computing Systems, Volume 23, Issue 6
      November 2024
      505 pages
      EISSN: 1558-3465
      DOI: 10.1145/3613645
      Editor: Tulika Mitra

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 11 September 2024
      Online AM: 08 August 2024
      Accepted: 12 July 2024
      Revised: 06 July 2024
      Received: 31 December 2023
      Published in TECS Volume 23, Issue 6

      Author Tags

      1. Last-level cache
      2. inclusive cache
      3. safety-critical systems
      4. worst-case latency analysis
      5. back invalidation

      Article Metrics

      • Total Citations: 0
      • Total Downloads: 157
      • Downloads (last 12 months): 157
      • Downloads (last 6 weeks): 42
      Reflects downloads up to 27 Dec 2024.
