DOI: 10.1145/3064176.3064196

An interface to implement NUMA policies in the Xen hypervisor

Published: 23 April 2017

Abstract

While virtualization introduces only a small overhead on machines with few cores, this is not the case on larger ones. On these larger machines, most of the overhead is caused by their Non-Uniform Memory Access (NUMA) architecture. To reduce this overhead, this paper shows how NUMA placement heuristics can be implemented inside Xen. In an evaluation of 29 applications on a 48-core machine, we show that the NUMA placement heuristics can more than double the performance of 9 applications.

Published In

EuroSys '17: Proceedings of the Twelfth European Conference on Computer Systems
April 2017
648 pages
ISBN:9781450349383
DOI:10.1145/3064176
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

EuroSys '17
EuroSys '17: Twelfth EuroSys Conference 2017
April 23 - 26, 2017
Belgrade, Serbia

Acceptance Rates

Overall Acceptance Rate 241 of 1,308 submissions, 18%
