Research article
DOI: 10.1145/2534695.2534697

Characterizing the impact of end-system affinities on the end-to-end performance of high-speed flows

Published: 17 November 2013

Abstract

Multi-core end-systems use Receive Side Scaling (RSS) to parallelize protocol processing. RSS applies a hash function to the standard flow descriptors and uses an indirection table to assign incoming packets to receive queues, which are pinned to specific cores. This ensures flow affinity: the interrupt processing of all packets belonging to a given flow is handled by the same core. A key limitation of standard RSS is that it does not consider the application process that consumes the incoming data when determining the flow affinity. In this paper, we carry out a detailed experimental analysis of the performance impact of application affinity in a 40 Gbps testbed network with a dual hexa-core end-system. We show, contrary to conventional wisdom, that when the application process and the flow are affinitized to the same core, the performance (measured as end-to-end TCP throughput) is significantly lower than the line rate. Near line-rate performance is observed when the flow and the application process are affinitized to different cores on the same socket, whereas affinitizing them to cores on different sockets again results in throughput significantly below the line rate. These results arise from a memory bottleneck, which we demonstrate using preliminary correlational data on the cache hit rate of the core that services the application process.
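The mechanisms discussed above can be made concrete with a short sketch. The following C program is not from the paper; it is a minimal illustration, first, of the indirection-table lookup that RSS performs in the NIC (with a stand-in hash in place of the Toeplitz hash real NICs compute over the flow's 4-tuple) and, second, of how an experimenter on Linux might affinitize a receive-queue interrupt and the application process to chosen cores. The IRQ number, core IDs, and queue count are hypothetical placeholders.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* --- RSS in a nutshell: queue = indirection_table[hash(flow) % size] --- */
#define RSS_TABLE_SIZE 128
#define NUM_RX_QUEUES  12   /* e.g. one queue per core on a dual hexa-core box */
static uint8_t indirection_table[RSS_TABLE_SIZE];

/* Stand-in for the Toeplitz hash that real NICs compute over the 4-tuple. */
static uint32_t flow_hash(uint32_t saddr, uint32_t daddr,
                          uint16_t sport, uint16_t dport)
{
    return saddr ^ daddr ^ (((uint32_t)sport << 16) | dport);
}

static int rx_queue_for_flow(uint32_t saddr, uint32_t daddr,
                             uint16_t sport, uint16_t dport)
{
    return indirection_table[flow_hash(saddr, daddr, sport, dport)
                             % RSS_TABLE_SIZE];
}

/* Pin the calling process (the data-consuming application) to one core. */
static void pin_self_to_core(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        exit(EXIT_FAILURE);
    }
}

/* Pin a receive-queue IRQ to one core by writing a CPU bitmask to procfs.
 * Requires root; "77" below is a hypothetical IRQ number -- the real one
 * is listed per NIC queue in /proc/interrupts. */
static void pin_irq_to_core(const char *irq, int core)
{
    char path[64];
    snprintf(path, sizeof(path), "/proc/irq/%s/smp_affinity", irq);
    FILE *f = fopen(path, "w");
    if (!f) { perror(path); exit(EXIT_FAILURE); }
    fprintf(f, "%x\n", 1u << core);
    fclose(f);
}

int main(void)
{
    /* Spread flows over the receive queues round-robin, as RSS does by default. */
    for (int i = 0; i < RSS_TABLE_SIZE; i++)
        indirection_table[i] = (uint8_t)(i % NUM_RX_QUEUES);

    /* Example lookup for one hypothetical flow (addresses in host order). */
    printf("flow -> rx queue %d\n",
           rx_queue_for_flow(0x0a000001, 0x0a000002, 49152, 5001));

    /* Best case reported above: flow and application on different cores
     * of the same socket (core IDs here are hypothetical). */
    pin_irq_to_core("77", 4);   /* flow affinity: rx-queue IRQ -> core 4 */
    pin_self_to_core(2);        /* application affinity -> core 2        */

    /* ... run the TCP receive loop here ... */
    return 0;
}
```

Varying only the two core IDs in main() reproduces the three configurations compared in the paper: same core, different cores on one socket, and cores on different sockets.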



Published In

NDM '13: Proceedings of the Third International Workshop on Network-Aware Data Management
November 2013, 84 pages
ISBN: 9781450325226
DOI: 10.1145/2534695
General Chairs: Mehmet Balman, Surendra Byna, Brian L. Tierney

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

1. 40 Gbps network
2. ESnet
3. RFS
4. RPS
5. application affinity
6. end-system performance
7. flow affinity
8. high-speed network
9. multi-core affinitization


Conference

SC13

Acceptance Rates

NDM '13 paper acceptance rate: 9 of 14 submissions (64%)
Overall acceptance rate: 14 of 23 submissions (61%)
