Research article · DOI: 10.1145/3698783.3699377

Lupin: Tolerating Partial Failures in a CXL Pod

Published: 03 November 2024

Abstract

A Compute Express Link (CXL) pod is a collection of hosts attached to a CXL memory module. It provides an opportunity to port single-host shared-memory programs to run on multiple hosts in a CXL pod, where the ported application achieves higher performance than a distributed application that uses the network for coordination. The cost of scaling performance on a CXL pod is that applications must tolerate partial failures, in which one process or operating system fails or reboots. Lupin is system software, comprising kernel modifications and user-level libraries, that helps applications remain available while they recover from partial failures using the contents of CXL memory.

Published In

DIMES '24: Proceedings of the 2nd Workshop on Disruptive Memory Systems
November 2024
70 pages
ISBN: 9798400713033
DOI: 10.1145/3698783
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. CXL
  2. Partial failure tolerance

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

SOSP '24

Acceptance Rates

Overall Acceptance Rate 8 of 17 submissions, 47%
