Fastpass: a centralized "zero-queue" datacenter network

Authors:

Jonathan Perry,

Amy Ousterhout,

Hari Balakrishnan,

Hans Fugal

ACM SIGCOMM Computer Communication Review, Volume 44, Issue 4

Pages 307 - 318

https://doi.org/10.1145/2740070.2626309

Published: 17 August 2014

Abstract

An ideal datacenter network should provide several properties, including low median and tail latency, high utilization (throughput), fair allocation of network resources between users or applications, deadline-aware scheduling, and congestion (loss) avoidance. Current datacenter networks inherit the principles that went into the design of the Internet, where packet transmission and path selection decisions are distributed among the endpoints and routers. Instead, we propose that each sender delegate to a centralized arbiter the control of when each packet should be transmitted and what path it should follow.

This paper describes Fastpass, a datacenter network architecture built using this principle. Fastpass incorporates two fast algorithms: the first determines the time at which each packet should be transmitted, while the second determines the path to use for that packet. In addition, Fastpass uses an efficient protocol between the endpoints and the arbiter and an arbiter replication strategy for fault-tolerant failover. We deployed and evaluated Fastpass in a portion of Facebook's datacenter network. Our results show that Fastpass achieves high throughput comparable to current networks with a 240x reduction in queue lengths (from 4.35 Mbytes to 18 Kbytes); achieves much fairer and more consistent flow throughputs than the baseline TCP (a 5200x reduction in the standard deviation of per-flow throughput with five concurrent connections); scales from 1 to 8 cores in the arbiter implementation, with the ability to schedule 2.21 Terabits/s of traffic in software on eight cores; and yields a 2.5x reduction in the number of TCP retransmissions in a latency-sensitive service at Facebook.
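The abstract does not spell out how the arbiter picks transmission times, so the following is only a minimal sketch of one plausible approach, not the paper's algorithm. It assumes time is divided into fixed timeslots, that each endpoint may send at most one packet and receive at most one packet per timeslot, and that a greedy maximal matching is acceptable. The function name `allocate_timeslots` and the `(src, dst, packets)` demand format are hypothetical.

```python
from collections import deque


def allocate_timeslots(demands, num_slots):
    """Greedily assign packets to timeslots (illustrative sketch only).

    demands: list of (src, dst, packets) tuples (hypothetical input format).
    Returns (schedule, unserved) where schedule[t] is the list of (src, dst)
    pairs allowed to transmit in timeslot t, and unserved holds the demands
    that did not fit into num_slots.
    """
    pending = deque((s, d, n) for s, d, n in demands if n > 0)
    schedule = []
    for _ in range(num_slots):
        busy_src, busy_dst = set(), set()
        slot = []
        leftover = deque()
        while pending:
            src, dst, n = pending.popleft()
            # Assumed constraint: one packet per sender and per receiver
            # in any given timeslot.
            if src in busy_src or dst in busy_dst:
                leftover.append((src, dst, n))
                continue
            busy_src.add(src)
            busy_dst.add(dst)
            slot.append((src, dst))
            if n > 1:
                leftover.append((src, dst, n - 1))
        pending = leftover
        schedule.append(slot)
    return schedule, list(pending)


if __name__ == "__main__":
    demo = [("A", "C", 2), ("B", "C", 1), ("A", "D", 1)]
    slots, unserved = allocate_timeslots(demo, num_slots=3)
    for t, slot in enumerate(slots):
        print(t, slot)
    print("unserved:", unserved)
```

Because no receiver is ever assigned two packets in the same timeslot, senders that follow such a schedule would not contend for the receiver's link in that slot, which is the intuition behind keeping queues short. A greedy pass yields a maximal matching rather than a maximum one, trading some per-slot utilization for speed.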

Index Terms

1. Networks
  1. Network protocols
  2. Network types
    1. Packet-switching networks

Published In

ACM SIGCOMM Computer Communication Review Volume 44, Issue 4

SIGCOMM'14

October 2014

672 pages

ISSN:0146-4833

DOI:10.1145/2740070

Editors:
Konstantina Papagiannaki (Telefonica Research, Barcelona, Spain),
Katerina Argyraki (EPFL, Switzerland),
Hitesh Ballani (Microsoft Research Cambridge, UK),
Fabián Bustamante (Northwestern University, USA),
Joseph Camp (SMU, USA),
Augustin Chaintreau (Columbia University, USA),
Phillipa Gill (Stony Brook University, USA),
Marco Mellia (Politecnico di Torino, Italy),
Bhaskaran Raman (IIT Bombay, India),
Joel Sommers (Colgate University, USA),
Aline Carneiro Viana (INRIA, France)

SIGCOMM '14: Proceedings of the 2014 ACM conference on SIGCOMM
August 2014
662 pages
ISBN:9781450328364
DOI:10.1145/2619239
General Chairs:
Fabián E. Bustamante (Northwestern University, USA),
Y. Charlie Hu (Purdue University, USA)
Program Chairs:
Arvind Krishnamurthy (University of Washington, USA),
Sylvia Ratnasamy (University of California, Berkeley, USA)

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 August 2014

Published in SIGCOMM-CCR Volume 44, Issue 4

Qualifiers

Research-article

Funding Sources

Division of Information and Intelligent Systems
