Research article
DOI: 10.1145/2534695.2534697

Characterizing the impact of end-system affinities on the end-to-end performance of high-speed flows

Published: 17 November 2013

Abstract

Multi-core end-systems use Receive Side Scaling (RSS) to parallelize protocol processing. RSS applies a hash function to the standard flow descriptors and uses an indirection table to assign incoming packets to receive queues, which are pinned to specific cores. This ensures flow affinity: the interrupt processing of all packets belonging to a given flow is handled by the same core. A key limitation of standard RSS is that it does not consider the application process that consumes the incoming data when determining the flow affinity. In this paper, we carry out a detailed experimental analysis of the performance impact of application affinity in a 40 Gbps testbed network with a dual hexa-core end-system. We show, contrary to conventional wisdom, that when the application process and the flow are affinitized to the same core, the performance (measured as end-to-end TCP throughput) is significantly lower than the line rate. Near line-rate performance is observed when the flow and the application process are affinitized to different cores on the same socket, whereas affinitizing them to cores on different sockets again results in throughput significantly below the line rate. These results arise from a memory bottleneck, which we demonstrate using preliminary correlational data on the cache hit rate of the core that services the application process.
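The mechanisms discussed above can be made concrete with a short sketch. The following C program is not from the paper; it is a minimal illustration, first, of the indirection-table lookup that RSS performs in the NIC (with a stand-in hash in place of the Toeplitz hash real NICs compute over the flow's 4-tuple) and, second, of how an experimenter on Linux might affinitize a receive-queue interrupt and the application process to chosen cores. The IRQ number, core IDs, and queue count are hypothetical placeholders.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* --- RSS in a nutshell: queue = indirection_table[hash(flow) % size] --- */
#define RSS_TABLE_SIZE 128
#define NUM_RX_QUEUES  12   /* e.g. one queue per core on a dual hexa-core box */
static uint8_t indirection_table[RSS_TABLE_SIZE];

/* Stand-in for the Toeplitz hash that real NICs compute over the 4-tuple. */
static uint32_t flow_hash(uint32_t saddr, uint32_t daddr,
                          uint16_t sport, uint16_t dport)
{
    return saddr ^ daddr ^ (((uint32_t)sport << 16) | dport);
}

static int rx_queue_for_flow(uint32_t saddr, uint32_t daddr,
                             uint16_t sport, uint16_t dport)
{
    return indirection_table[flow_hash(saddr, daddr, sport, dport)
                             % RSS_TABLE_SIZE];
}

/* Pin the calling process (the data-consuming application) to one core. */
static void pin_self_to_core(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        exit(EXIT_FAILURE);
    }
}

/* Pin a receive-queue IRQ to one core by writing a CPU bitmask to procfs.
 * Requires root; "77" below is a hypothetical IRQ number -- the real one
 * is listed per NIC queue in /proc/interrupts. */
static void pin_irq_to_core(const char *irq, int core)
{
    char path[64];
    snprintf(path, sizeof(path), "/proc/irq/%s/smp_affinity", irq);
    FILE *f = fopen(path, "w");
    if (!f) { perror(path); exit(EXIT_FAILURE); }
    fprintf(f, "%x\n", 1u << core);
    fclose(f);
}

int main(void)
{
    /* Spread flows over the receive queues round-robin, as RSS does by default. */
    for (int i = 0; i < RSS_TABLE_SIZE; i++)
        indirection_table[i] = (uint8_t)(i % NUM_RX_QUEUES);

    /* Example lookup for one hypothetical flow (addresses in host order). */
    printf("flow -> rx queue %d\n",
           rx_queue_for_flow(0x0a000001, 0x0a000002, 49152, 5001));

    /* Best case reported above: flow and application on different cores
     * of the same socket (core IDs here are hypothetical). */
    pin_irq_to_core("77", 4);   /* flow affinity: rx-queue IRQ -> core 4 */
    pin_self_to_core(2);        /* application affinity -> core 2        */

    /* ... run the TCP receive loop here ... */
    return 0;
}
```

Varying only the two core IDs in main() reproduces the three configurations compared in the paper: same core, different cores on one socket, and cores on different sockets.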



Published In

NDM '13: Proceedings of the Third International Workshop on Network-Aware Data Management
November 2013, 84 pages
ISBN: 9781450325226
DOI: 10.1145/2534695
General Chairs: Mehmet Balman, Surendra Byna, Brian L. Tierney

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

1. 40 Gbps network
2. ESnet
3. RFS
4. RPS
5. application affinity
6. end-system performance
7. flow affinity
8. high-speed network
9. multi-core affinitization


Conference

SC13

Acceptance Rates

NDM '13 paper acceptance rate: 9 of 14 submissions (64%)
Overall acceptance rate: 14 of 23 submissions (61%)
