Fastpass: a centralized "zero-queue" datacenter network

Authors:

Jonathan Perry,

Amy Ousterhout,

Hari Balakrishnan,

Hans Fugal

SIGCOMM '14: Proceedings of the 2014 ACM conference on SIGCOMM

Pages 307 - 318

https://doi.org/10.1145/2619239.2626309

Published: 17 August 2014

Abstract

An ideal datacenter network should provide several properties, including low median and tail latency, high utilization (throughput), fair allocation of network resources between users or applications, deadline-aware scheduling, and congestion (loss) avoidance. Current datacenter networks inherit the principles that went into the design of the Internet, where packet transmission and path selection decisions are distributed among the endpoints and routers. Instead, we propose that each sender should delegate to a centralized arbiter the control of when each packet should be transmitted and what path it should follow.

This paper describes Fastpass, a datacenter network architecture built using this principle. Fastpass incorporates two fast algorithms: the first determines the time at which each packet should be transmitted, while the second determines the path to use for that packet. In addition, Fastpass uses an efficient protocol between the endpoints and the arbiter and an arbiter replication strategy for fault-tolerant failover. We deployed and evaluated Fastpass in a portion of Facebook's datacenter network. Our results show that Fastpass achieves high throughput comparable to current networks with a 240x reduction in queue lengths (from 4.35 Mbytes to 18 Kbytes); achieves much fairer and more consistent flow throughputs than the baseline TCP (a 5200x reduction in the standard deviation of per-flow throughput with five concurrent connections); scales from 1 to 8 cores in the arbiter implementation, with the ability to schedule 2.21 Terabits/s of traffic in software on eight cores; and yields a 2.5x reduction in the number of TCP retransmissions in a latency-sensitive service at Facebook.
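The abstract does not spell out how the first algorithm picks transmission times. As a rough illustration only, the sketch below assumes that each endpoint can send at most one packet and receive at most one packet per timeslot, and that the arbiter serves pending demands greedily, oldest first. The function names `allocate_timeslot` and `schedule`, and the `(src, dst, packets)` demand format, are hypothetical and not taken from the paper.

```python
from collections import deque


def allocate_timeslot(demands):
    """Grant one timeslot by greedy maximal matching (illustrative assumption).

    demands: deque of (src, dst, packets_remaining), oldest demand first.
    Returns the (src, dst) pairs granted in this timeslot and leaves the
    unfinished demands in `demands`, preserving their relative order.
    """
    busy_src, busy_dst = set(), set()
    granted = []
    leftover = deque()
    while demands:
        src, dst, remaining = demands.popleft()
        if src in busy_src or dst in busy_dst:
            # An endpoint is already used in this timeslot: defer the demand.
            leftover.append((src, dst, remaining))
            continue
        busy_src.add(src)
        busy_dst.add(dst)
        granted.append((src, dst))
        if remaining > 1:
            # More packets to send: keep the rest queued for later timeslots.
            leftover.append((src, dst, remaining - 1))
    demands.extend(leftover)
    return granted


def schedule(demands):
    """Allocate consecutive timeslots until every demand is served."""
    pending = deque(demands)
    timeslots = []
    while pending:
        timeslots.append(allocate_timeslot(pending))
    return timeslots


if __name__ == "__main__":
    # A->B and C->D share no endpoint, so they fit in the same timeslot;
    # A->D must wait because A is already sending.
    for slot, pairs in enumerate(schedule([("A", "B", 1), ("C", "D", 1), ("A", "D", 1)])):
        print(slot, pairs)
```

Because no source or destination appears twice in a timeslot under these assumptions, packets granted in the same slot never contend for an endpoint link, which is the intuition behind the "zero-queue" goal. Path selection, the second algorithm, is not modeled here.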

Cited By

Rajasekaran S, Narang S, Zabreyko A, Ghobadi M (2024). MLTCP: A Distributed Technique to Approximate Centralized Flow Scheduling For Machine Learning. Proceedings of the 23rd ACM Workshop on Hot Topics in Networks, pp. 167-176. Online publication date: 18-Nov-2024.
https://dl.acm.org/doi/10.1145/3696348.3696878
Bothra R, Arun V, Godfrey B, Narayan A, Saeed A (2024). Lightweight Automated Reasoning for Network Architectures. Proceedings of the 23rd ACM Workshop on Hot Topics in Networks, pp. 237-245. Online publication date: 18-Nov-2024.
https://dl.acm.org/doi/10.1145/3696348.3696865
Seyhani A, Zhao J, Gupta A, Walker D, Arashloo M (2024). Buffy: A Formal Language-Based Framework for Network Performance Analysis. Proceedings of the 23rd ACM Workshop on Hot Topics in Networks, pp. 95-102. Online publication date: 18-Nov-2024.
https://dl.acm.org/doi/10.1145/3696348.3696854

Index Terms

Fastpass: a centralized "zero-queue" datacenter network
1. Networks
  1. Network protocols
  2. Network types
    1. Packet-switching networks

Published In

SIGCOMM '14: Proceedings of the 2014 ACM conference on SIGCOMM

August 2014

662 pages

ISBN:9781450328364

DOI:10.1145/2619239

General Chairs:
Fabián E. Bustamante (Northwestern University, USA)
Y. Charlie Hu (Purdue University, USA)

Program Chairs:
Arvind Krishnamurthy (University of Washington, USA)
Sylvia Ratnasamy (University of California, Berkeley, USA)

ACM SIGCOMM Computer Communication Review Volume 44, Issue 4
SIGCOMM'14
October 2014
672 pages
ISSN:0146-4833
DOI:10.1145/2740070
Editors:
Konstantina Papagiannaki (Telefonica Research, Barcelona, Spain)
Katerina Argyraki (EPFL, Switzerland)
Hitesh Ballani (Microsoft Research Cambridge, UK)
Fabián Bustamante (Northwestern University, USA)
Joseph Camp (SMU, USA)
Augustin Chaintreau (Columbia University, USA)
Phillipa Gill (Stony Brook University, USA)
Marco Mellia (Politecnico di Torino, Italy)
Bhaskaran Raman (IIT Bombay, India)
Joel Sommers (Colgate University, USA)
Aline Carneiro Viana (INRIA, France)

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGCOMM: ACM Special Interest Group on Data Communication

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Division of Information and Intelligent Systems

Conference

SIGCOMM'14

Sponsor:

SIGCOMM

SIGCOMM'14: ACM SIGCOMM 2014 Conference

August 17 - 22, 2014

Chicago, Illinois, USA

Acceptance Rates

SIGCOMM '14 Paper Acceptance Rate 45 of 242 submissions, 19%;

Overall Acceptance Rate 462 of 3,389 submissions, 14%
