Exploring multi-threaded Java application performance on multicore hardware

Authors:

Jennifer B. Sartor,

Lieven Eeckhout

OOPSLA '12: Proceedings of the ACM international conference on Object oriented programming systems languages and applications

Pages 281 - 296

https://doi.org/10.1145/2384616.2384638

Published: 19 October 2012

Abstract

While there have been many studies of how to schedule applications to take advantage of increasing numbers of cores in modern-day multicore processors, few have focused on multi-threaded managed language applications, which are prevalent from the embedded to the server domain. Managed languages complicate performance studies because they have additional virtual machine threads that collect garbage and dynamically compile, closely interacting with application threads. Further complexity is introduced as modern multicore machines have multiple sockets and dynamic frequency scaling options, broadening opportunities to reduce both power and running time.

In this paper, we explore the performance of Java applications, studying how best to map application and virtual machine (JVM) threads to a multicore, multi-socket environment. We explore both the cost of separating JVM threads from application threads, and the opportunity to speed up or slow down the clock frequency of isolated threads. We perform experiments with the multi-threaded DaCapo benchmarks and pseudojbb2005 running on the Jikes Research Virtual Machine, on a dual-socket, 8-core Intel Nehalem machine to reveal several novel, and sometimes counter-intuitive, findings. We believe these insights are a first but important step towards understanding and optimizing managed language performance on modern hardware.

Index Terms

Exploring multi-threaded Java application performance on multicore hardware
1. Software and its engineering
  1. Software notations and tools
    1. Compilers
      1. Runtime environments
  2. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Memory management
        Garbage collection

Published In

OOPSLA '12: Proceedings of the ACM international conference on Object oriented programming systems languages and applications

October 2012

1052 pages

ISBN:9781450315616

DOI:10.1145/2384616

General Chair: Gary T. Leavens (University of Central Florida)

Program Chair: Matthew B. Dwyer (University of Nebraska - Lincoln)

ACM SIGPLAN Notices Volume 47, Issue 10
OOPSLA '12
October 2012
1011 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2398857

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SPLASH '12: Conference on Systems, Programming, and Applications: Software for Humanity

October 19 - 26, 2012

Tucson, Arizona, USA

Acceptance Rates

Overall Acceptance Rate 268 of 1,244 submissions, 22%
