BBQ: a fast and scalable integer priority queue for hardware packet scheduling
Article No.: 26, Pages 455 - 475
Abstract
The need for fairness, strong isolation, and fine-grained control over network traffic in multi-tenant cloud settings has engendered a rich literature on packet scheduling in switches and programmable hardware. Recent proposals for hardware scheduling primitives (e.g., PIFO, PIEO, BMW-Tree) have enabled run-time programmable packet schedulers, considerably expanding the suite of scheduling policies that can be applied to network traffic. However, no existing solution can be practically deployed on modern switches and NICs: each either does not scale to the number of elements these devices require or fails to deliver good throughput, and would thus require an impractical number of replicas.
In this work, we ask: is it possible to achieve priority packet scheduling at line rate while supporting a large number of flows? Our key insight is to take a scheduling primitive previously used in software, called Hierarchical Find First Set, and port it to a highly pipeline-parallel hardware design. We present the architecture and implementation of the Bitmapped Bucket Queue (BBQ), a hardware-based integer priority queue that supports a wide range of scheduling policies (via a PIFO-like abstraction). BBQ, for the first time, supports hundreds of thousands of concurrent flows while guaranteeing 100 Gbps line rate (148.8 Mpps) on FPGAs and 1 Tbps line rate (1,488 Mpps) on ASICs. We demonstrate this by implementing BBQ on a commodity FPGA, where it supports over 100K flows and 32K priorities at 300 MHz, 3× the packet rate of similar hardware priority queue designs. On ASIC, we can synthesize 100K elements at 3.1 GHz using a 7nm process.
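To make the Hierarchical Find First Set idea concrete, the following is a minimal software sketch of a bitmap-indexed bucket queue. It is not BBQ's hardware design: the class name `BitmapBucketQueue`, its methods, and the three-level layout of 32-bit words are assumptions chosen for illustration (32^3 = 32,768 priorities), and lower integers are assumed to mean higher priority. Each priority has a FIFO bucket; a tree of bitmaps records which buckets are non-empty, so the minimum is found with one find-first-set per level rather than a scan.

```python
from collections import deque


class BitmapBucketQueue:
    """Illustrative integer priority queue (hypothetical names, not the paper's RTL)."""

    def __init__(self, num_levels=3, word_bits=32):
        self.num_levels = num_levels
        self.word_bits = word_bits
        self.num_priorities = word_bits ** num_levels
        # levels[0] is a single root word; levels[k] holds word_bits**k words.
        # Bit b of word w at level k is set iff the subtree below it is non-empty.
        self.levels = [[0] * (word_bits ** k) for k in range(num_levels)]
        self.buckets = {}  # priority -> FIFO of items (stored sparsely)

    def push(self, priority, item):
        if not 0 <= priority < self.num_priorities:
            raise ValueError("priority out of range")
        self.buckets.setdefault(priority, deque()).append(item)
        # Set the corresponding bit at every level, from the leaves up to the root.
        idx = priority
        for level in range(self.num_levels - 1, -1, -1):
            idx, bit = divmod(idx, self.word_bits)
            self.levels[level][idx] |= 1 << bit

    def pop_min(self):
        if self.levels[0][0] == 0:
            raise IndexError("pop from empty queue")
        # Walk down the tree: one find-first-set (lowest set bit) per level.
        idx = 0
        for level in range(self.num_levels):
            word = self.levels[level][idx]
            bit = (word & -word).bit_length() - 1
            idx = idx * self.word_bits + bit
        priority = idx
        bucket = self.buckets[priority]
        item = bucket.popleft()
        if not bucket:
            del self.buckets[priority]
            # Clear bits bottom-up; stop once a word still has other bits set.
            idx = priority
            for level in range(self.num_levels - 1, -1, -1):
                idx, bit = divmod(idx, self.word_bits)
                self.levels[level][idx] &= ~(1 << bit)
                if self.levels[level][idx]:
                    break
        return priority, item


q = BitmapBucketQueue()
for prio, name in [(5, "a"), (3, "b"), (5, "c")]:
    q.push(prio, name)
print(q.pop_min())  # (3, 'b')
print(q.pop_min())  # (5, 'a')
```

In this sketch, both operations touch one word per level, so their cost depends on the number of levels rather than on how many elements are queued. That fixed, level-by-level access pattern is what makes the structure a natural candidate for pipelining, although how BBQ actually maps it to hardware is not described in the abstract.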
Information
Published In
Copyright © 2024 The USENIX Association.
Sponsors
- Meta
- FUTUREWEI
- NSF
- Microsoft
- Google Inc.
Publisher
USENIX Association
United States
Publication History
Published: 16 April 2024
Qualifiers
- Research-article
- Research
- Refereed limited