DOI: 10.1145/3663408.3663413

Rethinking Intra-host Congestion Control in RDMA Networks

Published: 03 August 2024

Abstract

RDMA has been widely deployed in production datacenters. Conventional wisdom holds that the intra-host network delivers stable, high performance. However, intra-host resources have stagnated relative to the rapidly evolving RDMA NIC (RNIC), so RNIC traffic may not receive sufficient intra-host resources when it contends with other intra-host traffic. A line of recent work from large-scale production datacenter operators demonstrates the emergence of intra-host congestion and the associated performance collapse, forcing us to rethink the practice of intra-host congestion control. However, mechanisms for controlling RDMA intra-host networks are far less mature than their inter-host counterparts, which creates challenges in congestion monitoring, intra-host resource allocation, and RNIC traffic adjustment. In this paper, we propose RDMA intra-Host Congestion Control (RHCC), which combines sub-RTT-granularity intra-host congestion avoidance with proactive RNIC traffic adjustment. We implement RHCC on commodity servers and RNICs and evaluate it experimentally. The results show that RHCC increases network throughput by up to 2× and reduces latency by up to 1.4×.
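The abstract only names RHCC's two mechanisms. As a rough illustration of the monitor-and-throttle pattern it implies, the sketch below polls mlx5-style PCIe back-pressure counters from `ethtool -S` and applies an AIMD-style rate adjustment. This is a minimal sketch, not the paper's design: the counter names (`outbound_pci_stalled_rd`/`_wr`), the interface name, the thresholds, and the `set_rnic_rate_gbps` hook are all assumptions, and the actual system monitors and reacts at sub-RTT granularity, which a user-space polling loop cannot match.

```python
#!/usr/bin/env python3
"""Illustrative sketch (not the paper's implementation): detect intra-host
congestion via NIC-reported PCIe stall counters and throttle RNIC traffic."""

import re
import subprocess
import time

IFACE = "eth0"  # assumed RNIC netdev name; adjust for your system
# Assumed mlx5-style ethtool counters; names vary by NIC and driver.
STALL_COUNTERS = ("outbound_pci_stalled_rd", "outbound_pci_stalled_wr")
POLL_INTERVAL_S = 0.01  # far coarser than RHCC's sub-RTT granularity


def read_stall_counters(iface: str) -> int:
    """Sum the PCIe-stall counters reported by `ethtool -S <iface>`."""
    out = subprocess.run(["ethtool", "-S", iface],
                         capture_output=True, text=True, check=True).stdout
    total = 0
    for name in STALL_COUNTERS:
        m = re.search(rf"{name}:\s+(\d+)", out)
        if m:
            total += int(m.group(1))
    return total


def set_rnic_rate_gbps(rate: float) -> None:
    """Hypothetical hook: in a real system this would apply a rate limit
    to RNIC traffic (e.g., via per-QP rate limiting); placeholder only."""
    print(f"[rhcc-sketch] rate limit -> {rate:.1f} Gbps")


def control_loop(max_rate: float = 100.0, min_rate: float = 10.0) -> None:
    """AIMD-style loop: halve the rate when PCIe stalls grow between polls
    (a sign of intra-host congestion), otherwise recover additively."""
    rate, prev = max_rate, read_stall_counters(IFACE)
    while True:
        time.sleep(POLL_INTERVAL_S)
        cur = read_stall_counters(IFACE)
        if cur > prev:                        # stalls grew: back off
            rate = max(min_rate, rate / 2)
        else:                                 # no new stalls: recover
            rate = min(max_rate, rate + 1.0)
        set_rnic_rate_gbps(rate)
        prev = cur


if __name__ == "__main__":
    control_loop()
```

The AIMD policy here is a stand-in; the paper's proactive adjustment scheme is not specified in the abstract, and counter polling over ethtool merely shows where an intra-host congestion signal can be observed on commodity RNICs.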



Published In
APNet '24: Proceedings of the 8th Asia-Pacific Workshop on Networking
August 2024
230 pages
ISBN: 9798400717581
DOI: 10.1145/3663408

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Congestion control
  2. RDMA datacenter transport

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

APNet 2024

Acceptance Rates

APNet '24 Paper Acceptance Rate: 50 of 118 submissions, 42%
Overall Acceptance Rate: 50 of 118 submissions, 42%

