Open access

Yggdrasil: Reducing Network I/O Tax with (CXL-Based) Distributed Shared Memory

Authors:

Bin Yu,

Xin Yang

ICPP '24: Proceedings of the 53rd International Conference on Parallel Processing

Pages 597 - 606

https://doi.org/10.1145/3673038.3673138

Published: 12 August 2024

Abstract

Communication-intensive applications running on hosts with high-speed network hardware place a significant burden on the OS's native socket system. Researchers have devoted considerable effort to optimizing the kernel networking stack and to moving the TCP/IP stack into user space. In this paper, we describe Yggdrasil, a novel CXL-based, high-performance user-space socket system. Yggdrasil is fully compatible with Linux sockets, making it a drop-in replacement for existing applications with no code modifications. To optimize performance, Yggdrasil uses CXL-based distributed shared memory (DSM) for inter-host communication whenever it is available; when DSM is not accessible, it transparently falls back to Linux sockets. Isolation in Yggdrasil relies on a trusted user-space monitoring daemon that manages control-plane operations such as connection setup and access control. The data plane adopts a peer-to-peer model for communication between processes. To bridge the semantic gap between sockets and DSM, we exploit several techniques that ensure compatibility and performance: (1) transparent dynamic fast/slow data path navigation, (2) decentralized CXL memory management, (3) lock-free queue based QoS-aware dynamic data polling, and (4) semantics-aware memory page migration. By evaluating Yggdrasil on both emulated and real CXL hardware, we show that it outperforms Linux sockets in Memcached throughput by 8.2× and reduces latency by 24× to 320× in a microbenchmark across different message sizes.
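As a rough illustration of the fast/slow data path idea described in the abstract, the sketch below routes each send over a shared-memory channel when the peer is known to be reachable through DSM and over an ordinary socket-style channel otherwise. It is a toy model under stated assumptions, not the authors' implementation: `PathSelector`, `SharedMemoryChannel`, `SocketChannel`, and the `dsm_peers` set are hypothetical names, and both channels are in-process stand-ins rather than real CXL memory or kernel sockets.

```python
from collections import deque


class SharedMemoryChannel:
    """Hypothetical stand-in for a DSM-backed queue shared by two hosts."""

    def __init__(self):
        self.queue = deque()

    def send(self, payload: bytes) -> str:
        self.queue.append(payload)
        return "fast"


class SocketChannel:
    """Hypothetical stand-in for the ordinary kernel socket path."""

    def __init__(self):
        self.sent = []

    def send(self, payload: bytes) -> str:
        self.sent.append(payload)
        return "slow"


class PathSelector:
    """Chooses a data path per peer; the caller sees one send() either way."""

    def __init__(self, dsm_peers):
        # Assumption: some control-plane component tells us which peers
        # share DSM with this host. Here it is just a static set.
        self.dsm_peers = set(dsm_peers)
        self.fast = {}
        self.slow = {}

    def send(self, peer: str, payload: bytes) -> str:
        if peer in self.dsm_peers:
            if peer not in self.fast:
                self.fast[peer] = SharedMemoryChannel()
            return self.fast[peer].send(payload)
        # Fallback: DSM is not available for this peer.
        if peer not in self.slow:
            self.slow[peer] = SocketChannel()
        return self.slow[peer].send(payload)
```

With `PathSelector(dsm_peers={"host-a"})`, a send to `"host-a"` takes the fast path and a send to any other peer takes the slow one; the application-facing call is the same in both cases, which is the sense in which the switch is transparent.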

Supplemental Material

PDF File - Appendix: Artifact Description/Artifact Evaluation

This artifact includes a qcow2 virtual machine (VM) image designed to run in an x86_64 environment, together with startup instructions. The image contains the source code, the compiled libnav.so, the executable binaries needed to run the experiments, and shell scripts for testing. These resources are provided to enable straightforward replication of the experiments.


