DOI: 10.1109/CCGRID.2017.43
tutorial

Automatic Adaption of the Sampling Frequency for Detailed Performance Analysis

Published: 14 May 2017

Abstract

One of the most urgent challenges in event-based performance analysis is the enormous amount of collected data. Combining event tracing with periodic sampling has proven a successful way to record MPI communication in event-level detail while covering the rest of the application more coarsely through periodic samples. In this paper, we present a novel approach that automatically adapts the sampling frequency at runtime to the available buffer space, relieving users of the need to find an appropriate sampling frequency themselves. This way, the entire measurement can be kept within a single memory buffer, which avoids disruptive intermediate memory buffer flushes, excessive data volumes, and measurement delays due to slow file system interaction. We describe our approach to sorting and storing samples, based on their order of occurrence, in a hierarchical array organized by powers of two. Furthermore, we evaluate the feasibility and the overhead of the approach with the prototype implementation OTFX, which is based on the Open Trace Format 2, a state-of-the-art open-source event trace library used by the performance analysis tools Vampir, Scalasca, and TAU.
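
As a rough illustration of the idea described in the abstract, the following Python sketch keeps samples in a fixed-size buffer and doubles the sampling interval whenever the buffer fills, discarding stored samples that no longer fall on the coarser power-of-two grid. It is not the OTFX implementation and makes no claim to match the paper's data structure; the class name `AdaptiveSampleBuffer`, the method `offer`, and the integer tick counter are hypothetical and chosen only for this example. It also assumes that ticks arrive in strictly increasing order.

```python
class AdaptiveSampleBuffer:
    """Fixed-capacity sample store that halves its sampling rate when full.

    Hypothetical sketch: a sample taken at integer tick t is kept only while
    t is a multiple of the current stride, which is always a power of two.
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.stride = 1       # keep every `stride`-th tick
        self.samples = []     # (tick, payload) pairs in order of occurrence
        self.last_tick = -1   # used to enforce strictly increasing ticks

    def offer(self, tick, payload):
        """Store the sample taken at `tick`; return True if it was kept."""
        if tick <= self.last_tick:
            raise ValueError("ticks must be strictly increasing")
        self.last_tick = tick
        if tick % self.stride != 0:
            return False      # falls between points of the current grid
        while len(self.samples) >= self.capacity:
            # Buffer is full: move to the next coarser power-of-two grid
            # and drop every stored sample that is not on it.
            self.stride *= 2
            self.samples = [s for s in self.samples if s[0] % self.stride == 0]
            if tick % self.stride != 0:
                return False  # the new sample is not on the coarser grid
        self.samples.append((tick, payload))
        return True


buf = AdaptiveSampleBuffer(capacity=4)
for tick in range(16):
    buf.offer(tick, payload=f"sample-{tick}")
print(buf.stride, [t for t, _ in buf.samples])  # 4 [0, 4, 8, 12]
```

In this sketch the buffer never grows beyond its capacity, so no intermediate flush is needed, and the surviving samples stay evenly spaced over the whole run. A real implementation would avoid the linear filtering pass on each overflow, for example by grouping samples by the largest power of two that divides their tick.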

Published In

CCGrid '17: Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing
May 2017
1167 pages
ISBN:9781509066100

Publisher

IEEE Press

Qualifiers

  • Tutorial
  • Research
  • Refereed limited

Conference

CCGrid '17