Research article · Open access
DOI: 10.1145/3563766.3564110

Understanding host interconnect congestion

Published: 14 November 2022

Abstract

We present evidence and characterization of host congestion in production clusters: the adoption of high-bandwidth access links is leading to the emergence of bottlenecks within the host interconnect (the NIC-to-CPU data path). We demonstrate that contention on existing IO memory management units (IOMMUs) and/or the memory subsystem can significantly reduce the available NIC-to-CPU bandwidth, resulting in hundreds of microseconds of queueing delay and eventual packet drops at hosts, even when running a state-of-the-art congestion control protocol that accounts for CPU-induced host congestion. We also discuss the implications of host interconnect congestion for the design of future host architectures, network stacks, and network protocols.
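The scale of the queueing the abstract describes can be illustrated with a back-of-the-envelope model. All numbers below (link speed, contended NIC-to-CPU bandwidth, NIC buffer size) are illustrative assumptions, not measurements from the paper: when traffic arrives faster than the host interconnect can drain it, the NIC buffer fills at the rate difference, and a buffer of a few megabytes translates into hundreds of microseconds of queueing delay before drops begin.

```python
# Back-of-the-envelope model of NIC buffer buildup when host
# interconnect contention reduces the available NIC-to-CPU bandwidth.
# All numbers are illustrative assumptions.

def time_to_fill_us(arrival_gbps: float, drain_gbps: float,
                    buffer_mb: float) -> float:
    """Microseconds until the NIC buffer overflows and drops begin."""
    excess_gbps = arrival_gbps - drain_gbps          # net fill rate
    if excess_gbps <= 0:
        return float("inf")                          # no buildup
    buffer_bits = buffer_mb * 8e6                    # MB -> bits
    return buffer_bits / (excess_gbps * 1e9) * 1e6   # seconds -> us

def max_queueing_delay_us(drain_gbps: float, buffer_mb: float) -> float:
    """Worst-case delay of a packet behind a full buffer (Little's law)."""
    return buffer_mb * 8e6 / (drain_gbps * 1e9) * 1e6

# Suppose a 100 Gbps NIC receives line-rate traffic, but IOMMU/memory
# contention leaves only 80 Gbps of NIC-to-CPU bandwidth, and the NIC
# has a 4 MB packet buffer:
fill = time_to_fill_us(100, 80, 4)       # 1600 us until drops start
delay = max_queueing_delay_us(80, 4)     # 400 us of queueing delay
```

Even a modest 20% bandwidth reduction at the host interconnect thus yields queueing delays in the hundreds of microseconds, consistent with the abstract's characterization.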



Published In

HotNets '22: Proceedings of the 21st ACM Workshop on Hot Topics in Networks
November 2022, 252 pages
ISBN: 9781450398992
DOI: 10.1145/3563766
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. congestion control
  2. datacenter transport
  3. network hardware


Funding Sources

  • NSF

Conference

HotNets '22

Acceptance Rates

Overall acceptance rate: 110 of 460 submissions, 24%

