DOI: 10.1145/3673038.3673085
research-article

Coupling Congestion Control and Flow Pausing in Data Center Network

Published: 12 August 2024

Abstract

Two broad lines of work aim to achieve high throughput and low latency for data center applications: end-to-end congestion control algorithms and flow pausing mechanisms. Without complex signals, end-to-end congestion control algorithms struggle to converge quickly to a stable equilibrium state while also handling transient congestion effectively. Flow pausing mechanisms, in turn, are decoupled from congestion control, which leads to long convergence times after a transient state and incomplete queue elimination in the equilibrium state. We propose FAR, a transport protocol that combines the advantages of flow pausing and congestion control. The key idea is to couple bandwidth-estimation-based congestion control with an end-to-end flow pausing mechanism. FAR quickly explores the available bandwidth with a binary-search-based packet train probe to achieve high throughput and low latency. Extensive evaluation results demonstrate that our protocol achieves accurate bandwidth estimation and reduces the tail flow completion time (FCT) by up to 67% compared with state-of-the-art designs.
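The abstract mentions a binary-search-based packet train probe but does not specify it. The following is a minimal sketch of the general idea only, not FAR's actual algorithm: it assumes a hypothetical callback `probe_is_congested(rate)` that sends one packet train at the given rate and reports whether the train experienced queuing, and it halves the search interval on each probe. The function names, the Gbps units, the tolerance parameter, and the simulated probe are all illustrative assumptions.

```python
from typing import Callable


def binary_search_bandwidth(
    probe_is_congested: Callable[[float], bool],
    low_gbps: float,
    high_gbps: float,
    tolerance_gbps: float = 0.1,
) -> float:
    """Estimate the highest probing rate that does not cause queuing.

    Assumes (hypothetically) that rates at or below `low_gbps` are
    uncongested and rates at or above `high_gbps` are congested.
    """
    if low_gbps > high_gbps:
        raise ValueError("low_gbps must not exceed high_gbps")
    while high_gbps - low_gbps > tolerance_gbps:
        mid = (low_gbps + high_gbps) / 2.0
        if probe_is_congested(mid):
            # The train at rate `mid` queued: available bandwidth is below mid.
            high_gbps = mid
        else:
            # The train passed without queuing: available bandwidth is at least mid.
            low_gbps = mid
    return low_gbps


def make_simulated_probe(available_gbps: float) -> Callable[[float], bool]:
    """Stand-in for a real packet train: congested iff rate exceeds capacity."""
    def probe(rate_gbps: float) -> bool:
        return rate_gbps > available_gbps
    return probe


if __name__ == "__main__":
    probe = make_simulated_probe(available_gbps=37.0)
    estimate = binary_search_bandwidth(probe, low_gbps=0.0, high_gbps=100.0)
    print(f"estimated available bandwidth: {estimate:.2f} Gbps")
```

Under these assumptions the number of probes grows logarithmically with the ratio of the initial search range to the tolerance, which is the usual motivation for a binary search over a linear rate sweep. A real probe would have to infer congestion from measurements such as inter-packet dispersion or delay, which this sketch does not model.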

    Published In

    ICPP '24: Proceedings of the 53rd International Conference on Parallel Processing
    August 2024
    1279 pages
    ISBN:9798400717932
    DOI:10.1145/3673038
    This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Bandwidth Estimation
    2. Data Center
    3. Transport Protocol

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICPP '24

    Acceptance Rates

    Overall Acceptance Rate 91 of 313 submissions, 29%
