research-article

Open access

High Performance Packet Processing with FlexNIC

Authors:

Antoine Kaufmann,

Naveen Kr. Sharma,

Thomas Anderson,

Arvind Krishnamurthy

ACM SIGARCH Computer Architecture News, Volume 44, Issue 2

Pages 67 - 81

https://doi.org/10.1145/2980024.2872367

Published: 25 March 2016

Abstract

The recent surge of network I/O performance has put enormous pressure on memory and software I/O processing subsystems. We argue that the primary reason for high memory and processing overheads is the inefficient use of these resources by current commodity network interface cards (NICs). We propose FlexNIC, a flexible network DMA interface that can be used by operating systems and applications alike to reduce packet processing overheads. FlexNIC allows services to install packet processing rules into the NIC, which then executes simple operations on packets while exchanging them with host memory. Thus, our proposal moves some of the packet processing traditionally done in software to the NIC, where it can be done flexibly and at high speed.

We quantify the potential benefits of FlexNIC by emulating the proposed FlexNIC functionality with existing hardware or in software. We show that significant gains in application performance are possible, in terms of both latency and throughput, for several widely used applications, including a key-value store, a stream processing system, and an intrusion detection system.
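To make the rule-installation model in the abstract concrete, the following is a minimal software sketch of a match-action table: a service installs rules, and each rule applies a simple operation to a packet before it would be handed to host memory. It is an illustration only and not FlexNIC's actual interface. The names `Rule`, `RuleTable`, and `steer_by_key`, the dictionary packet representation, and the choice of a CRC32 hash over a request key are all assumptions made for this example.

```python
import zlib
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical parsed-packet representation: header fields by name.
Packet = Dict[str, Any]


@dataclass
class Rule:
    """One match-action rule: a predicate over header fields plus an action."""
    match: Callable[[Packet], bool]
    action: Callable[[Packet], Packet]


class RuleTable:
    """Software stand-in for a NIC rule table (illustrative, not a real API)."""

    def __init__(self) -> None:
        self.rules: List[Rule] = []

    def install(self, rule: Rule) -> None:
        """Append a rule; earlier rules take priority over later ones."""
        self.rules.append(rule)

    def process(self, pkt: Packet) -> Packet:
        """Apply the first matching rule to a copy of the packet.

        Packets that match no rule are assigned to default queue 0.
        """
        for rule in self.rules:
            if rule.match(pkt):
                return rule.action(dict(pkt))
        out = dict(pkt)
        out["queue"] = 0
        return out


def steer_by_key(num_queues: int) -> Callable[[Packet], Packet]:
    """Build an action that picks a receive queue from a hash of the key."""
    def action(pkt: Packet) -> Packet:
        # CRC32 is deterministic across runs, unlike Python's built-in hash().
        pkt["queue"] = zlib.crc32(pkt["key"]) % num_queues
        return pkt
    return action


if __name__ == "__main__":
    table = RuleTable()
    # Assumed example rule: steer key-value requests on port 11211 by key.
    table.install(Rule(match=lambda p: p.get("dst_port") == 11211,
                       action=steer_by_key(4)))
    print(table.process({"dst_port": 11211, "key": b"user:42"}))
    print(table.process({"dst_port": 80}))
```

In this sketch, requests for the same key always land on the same queue, which is the kind of simple per-packet operation the abstract describes moving out of host software. A real NIC would express the match and action in hardware tables rather than Python callables.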

Index Terms

High Performance Packet Processing with FlexNIC
1. Hardware
  1. Communication hardware, interfaces and storage
    1. Networking hardware
2. Networks
  1. Network architectures
  2. Network components
    1. End nodes
      1. Network adapters

Published In

ACM SIGARCH Computer Architecture News Volume 44, Issue 2

ASPLOS'16

May 2016

774 pages

ISSN:0163-5964

DOI:10.1145/2980024

Editor:
Doug DeGroot
acm dot org

ASPLOS '16: Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems
March 2016
824 pages
ISBN:9781450340915
DOI:10.1145/2872362
General Chair: Tom Conte, Georgia Tech, USA
Program Chair: Yuanyuan Zhou, University of California, San Diego, USA

Copyright © 2016 Owner/Author.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Science Foundation
