DOI: 10.1145/3591195.3595270
research-article
Open access

Scaling Up Performance of Managed Applications on NUMA Systems

Published: 06 June 2023

Abstract

Scaling up the performance of managed applications on Non-Uniform Memory Access (NUMA) architectures is challenging, as it requires a good understanding of both the underlying architecture and the managed runtime environment (MRE). Prior work has approached this problem through specific components of managed runtimes, such as garbage collectors, as a means of increasing the NUMA awareness of MREs.
In this paper, we follow a different approach that complements prior work by studying the behavior of managed applications on NUMA architectures during mutation time. First, we perform a characterization study that classifies several DaCapo and Renaissance applications according to their scalability-critical properties. Based on this study, we propose a novel lightweight mechanism in MREs for optimizing the scalability of managed applications on NUMA systems in an application-agnostic way. Our experimental results show that the proposed mechanism yields relative performance ranging from 0.66x to 3.29x, with a geometric mean of 1.11x, against a NUMA-agnostic execution.
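
To make the summary statistic concrete, the following is a minimal sketch of how a geometric mean of per-benchmark relative performance can be computed. It assumes relative performance is defined as baseline execution time divided by optimized execution time, so values below 1.0 are slowdowns and values above 1.0 are speedups; the abstract does not state the definition, so this is an assumption. The class name, method names, and sample ratios are hypothetical and are not taken from the paper.

```java
public class RelativePerformance {

    // Assumed definition: ratio > 1.0 means the optimized run is faster.
    static double relativePerformance(double baselineSeconds, double optimizedSeconds) {
        if (baselineSeconds <= 0.0 || optimizedSeconds <= 0.0) {
            throw new IllegalArgumentException("execution times must be positive");
        }
        return baselineSeconds / optimizedSeconds;
    }

    // Geometric mean computed in log space to avoid overflow or underflow
    // when multiplying many ratios together.
    static double geometricMean(double[] ratios) {
        if (ratios.length == 0) {
            throw new IllegalArgumentException("at least one ratio is required");
        }
        double logSum = 0.0;
        for (double r : ratios) {
            if (r <= 0.0) {
                throw new IllegalArgumentException("ratios must be positive");
            }
            logSum += Math.log(r);
        }
        return Math.exp(logSum / ratios.length);
    }

    public static void main(String[] args) {
        // Illustrative values only; these are NOT measurements from the paper.
        double[] ratios = {
            relativePerformance(10.0, 20.0), // a slowdown (0.5x)
            relativePerformance(10.0, 10.0), // no change (1.0x)
            relativePerformance(10.0, 5.0)   // a speedup (2.0x)
        };
        System.out.println("geometric mean = " + geometricMean(ratios));
    }
}
```

The geometric mean is the usual choice for averaging ratios because a 0.5x slowdown and a 2.0x speedup cancel out to 1.0x, whereas an arithmetic mean would report 1.25x for the same pair.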


Published In

ISMM 2023: Proceedings of the 2023 ACM SIGPLAN International Symposium on Memory Management
June 2023
175 pages
ISBN:9798400701795
DOI:10.1145/3591195
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Dacapo
  2. JVM
  3. Managed Runtimes
  4. MaxineVM
  5. NUMA
  6. Optimization
  7. Renaissance
  8. Scalability

Qualifiers

  • Research-article

Conference

ISMM '23
Acceptance Rates

Overall Acceptance Rate 72 of 156 submissions, 46%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 392
  • Downloads (last 12 months): 252
  • Downloads (last 6 weeks): 27
Reflects downloads up to 31 Dec 2024
