research-article
Open access

High Performance Packet Processing with FlexNIC

Published: 25 March 2016

Abstract

The recent surge in network I/O performance has put enormous pressure on memory and software I/O processing subsystems. We argue that the primary reason for high memory and processing overheads is the inefficient use of these resources by current commodity network interface cards (NICs). We propose FlexNIC, a flexible network DMA interface that can be used by operating systems and applications alike to reduce packet processing overheads. FlexNIC allows services to install packet processing rules into the NIC, which then executes simple operations on packets while exchanging them with host memory. Thus, our proposal moves some of the packet processing traditionally done in software to the NIC, where it can be done flexibly and at high speed.
We quantify the potential benefits of FlexNIC by emulating the proposed FlexNIC functionality with existing hardware or in software. We show that significant gains in application performance are possible, in terms of both latency and throughput, for several widely used applications, including a key-value store, a stream processing system, and an intrusion detection system.
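To make the rule-installation idea concrete, the following is a minimal software sketch of match-and-action processing, loosely in the spirit of the software emulation the abstract mentions. Everything in it is an assumption made for illustration: the names `Rule`, `EmulatedNic`, `install_rule`, and `receive`, the dictionary packet representation, and the example of steering key-value requests to a receive queue by key hash. The abstract does not specify a rule format or interface, so this should not be read as FlexNIC's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical packet model: already-parsed header fields as integers.
Packet = Dict[str, int]


@dataclass
class Rule:
    """A match-and-action rule (illustrative, not the paper's format)."""
    match: Dict[str, int]            # every listed field must equal this value
    action: Callable[[Packet], int]  # returns a receive-queue number


@dataclass
class EmulatedNic:
    """Software stand-in for a NIC that applies installed rules on receive."""
    num_queues: int
    rules: List[Rule] = field(default_factory=list)

    def install_rule(self, rule: Rule) -> None:
        # Rules are checked in installation order; the first match wins.
        self.rules.append(rule)

    def receive(self, pkt: Packet) -> int:
        """Return the index of the queue this packet would be delivered to."""
        for rule in self.rules:
            if all(pkt.get(name) == value for name, value in rule.match.items()):
                return rule.action(pkt) % self.num_queues
        return 0  # no rule matched: fall back to the default queue


def steer_by_key_hash(pkt: Packet) -> int:
    # Assumed action: pick the queue from a precomputed hash of the request key.
    return pkt["key_hash"]


nic = EmulatedNic(num_queues=4)
# Assumed rule: traffic to port 11211 (memcached's default) is steered by key hash.
nic.install_rule(Rule(match={"dst_port": 11211}, action=steer_by_key_hash))

kv_queue = nic.receive({"dst_port": 11211, "key_hash": 6})
other_queue = nic.receive({"dst_port": 80, "key_hash": 6})
print(kv_queue, other_queue)
```

In this sketch the host only consumes packets from the queue the rule selected, which is the kind of work the abstract proposes moving out of software; a real device would evaluate such rules in hardware while transferring packets to host memory.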


Published In

ACM SIGARCH Computer Architecture News, Volume 44, Issue 2
ASPLOS'16
May 2016
774 pages
ISSN:0163-5964
DOI:10.1145/2980024

ASPLOS '16: Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems
March 2016
824 pages
ISBN:9781450340915
DOI:10.1145/2872362
General Chair: Tom Conte
Program Chair: Yuanyuan Zhou

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 March 2016
Published in SIGARCH Volume 44, Issue 2

Author Tags

  1. DMA
  2. flexible network processing
  3. match-and-action processing
  4. network interface card

Qualifiers

  • Research-article
