Research article
DOI: 10.5555/3014904.3014967

Caliper: performance introspection for HPC software stacks

Published: 13 November 2016

Abstract

Many performance engineering tasks, from long-term performance monitoring to post-mortem analysis and online tuning, require efficient runtime methods for introspection and performance data collection. To understand interactions between components in increasingly modular HPC software, performance introspection hooks must be integrated into runtime systems, libraries, and application codes across the software stack. This requires an interoperable, cross-stack, general-purpose approach to performance data collection, which neither application-specific performance measurement nor traditional profile or trace analysis tools provide. With Caliper, we have developed a general abstraction layer to provide performance data collection as a service to applications, runtime systems, libraries, and tools. Individual software components connect to Caliper in independent data producer, data consumer, and measurement control roles, which allows them to share performance data across software stack boundaries. We demonstrate Caliper's performance analysis capabilities with two case studies of production scenarios.
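
The three roles named in the abstract (data producer, data consumer, measurement control) can be illustrated with a minimal sketch. This is a toy model under stated assumptions, not Caliper's actual API: the `Blackboard` class, its `set`, `unset`, `subscribe`, and `snapshot` methods, and the attribute names `app.phase` and `lib.kernel` are all hypothetical and chosen only for illustration.

```python
import time
from typing import Callable, Dict, List

Record = Dict[str, object]


class Blackboard:
    """Toy shared performance-data layer (hypothetical, not Caliper's API)."""

    def __init__(self) -> None:
        self._attributes: Record = {}
        self._consumers: List[Callable[[Record], None]] = []

    # Data producer role: any component publishes named attributes.
    def set(self, key: str, value: object) -> None:
        self._attributes[key] = value

    def unset(self, key: str) -> None:
        self._attributes.pop(key, None)

    # Data consumer role: a tool registers a callback to receive snapshots.
    def subscribe(self, callback: Callable[[Record], None]) -> None:
        self._consumers.append(callback)

    # Measurement control role: decides when a snapshot is taken.
    def snapshot(self) -> None:
        record = dict(self._attributes)  # copy the current state
        record["timestamp"] = time.perf_counter()
        for consumer in self._consumers:
            consumer(record)


def run_demo() -> List[Record]:
    board = Blackboard()
    collected: List[Record] = []
    board.subscribe(collected.append)   # consumer: store every snapshot

    board.set("app.phase", "solve")     # producer: application code
    board.set("lib.kernel", "matvec")   # producer: a library lower in the stack
    board.snapshot()                    # control: trigger a measurement

    board.unset("lib.kernel")           # the library leaves its kernel
    board.snapshot()
    return collected
```

In this sketch, the application and the library never reference each other, yet each snapshot combines the attributes both have published. That is the kind of cross-stack data sharing the abstract describes, reduced to a single process and a dictionary.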

Published In

SC '16: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2016
1034 pages
ISBN: 9781467388153
Conference Chair: John West

Publisher

IEEE Press

Author Tags

  1. computer performance
  2. high performance computing
  3. parallel processing
  4. performance analysis
  5. software performance
  6. software reusability
  7. software tools

Acceptance Rates

SC '16 Paper Acceptance Rate 81 of 442 submissions, 18%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
