Automatic Adaption of the Sampling Frequency for Detailed Performance Analysis

Authors: Michael Wagner, Andreas Knüpfer

CCGrid '17: Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing

Pages 973 - 981

https://doi.org/10.1109/CCGRID.2017.43

Published: 14 May 2017

Abstract

One of the most urgent challenges in event-based performance analysis is the enormous amount of collected data. Combining event tracing with periodic sampling has proven a successful way to record MPI communication in event-level detail while capturing the rest of the application more coarsely through periodic sampling. In this paper, we present a novel approach that automatically adapts the sampling frequency at runtime to the available buffer space, relieving users of the need to find an appropriate sampling frequency themselves. This way, the entire measurement can be kept within a single memory buffer, which avoids disruptive intermediate memory buffer flushes, excessive data volumes, and measurement delays due to slow file system interaction. We describe our approach to sorting and storing samples, based on their order of occurrence, in a hierarchical array based on powers of two. Furthermore, we evaluate the feasibility and the overhead of the approach with the prototype implementation OTFX, which is based on the Open Trace Format 2, a state-of-the-art open-source event trace library used by the performance analysis tools Vampir, Scalasca, and TAU.
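
The abstract does not spell out the layout of the hierarchical array, so the following Python sketch is only one plausible reading of the idea, not the OTFX implementation. It assumes that each sample receives a 1-based index in order of occurrence, that its level is the number of trailing zero bits of that index, and that the lowest remaining level is discarded whenever the buffer overflows, which halves the effective sampling frequency. The names `level_of` and `AdaptiveSampleBuffer` are hypothetical and chosen for illustration.

```python
def level_of(index):
    """Number of trailing zero bits of a 1-based sample index (assumed level rule)."""
    return (index & -index).bit_length() - 1


class AdaptiveSampleBuffer:
    """Hypothetical fixed-capacity sample store organized by power-of-two levels."""

    def __init__(self, capacity):
        self.capacity = capacity   # maximum number of retained samples
        self.min_level = 0         # samples below this level are no longer kept
        self.next_index = 1        # index of the next sample at the base frequency
        self.levels = {}           # level -> list of (index, sample)
        self.size = 0

    def record(self, sample):
        index = self.next_index
        self.next_index += 1
        lvl = level_of(index)
        if lvl < self.min_level:
            return                 # frequency was already reduced: skip this sample
        self.levels.setdefault(lvl, []).append((index, sample))
        self.size += 1
        while self.size > self.capacity:
            self._drop_lowest_level()

    def _drop_lowest_level(self):
        # Discarding level k removes every other retained sample,
        # so the remaining samples stay evenly spaced.
        dropped = self.levels.pop(self.min_level, [])
        self.size -= len(dropped)
        self.min_level += 1

    def samples(self):
        """Retained samples in order of occurrence."""
        return sorted(pair for level in self.levels.values() for pair in level)


buf = AdaptiveSampleBuffer(capacity=4)
for i in range(1, 17):
    buf.record(f"sample-{i}")
print([index for index, _ in buf.samples()])  # [4, 8, 12, 16]
print(buf.min_level)                          # 2
```

In this sketch, 16 samples recorded into a buffer with room for 4 leave every fourth sample in place. The buffer never needs to be flushed, and the retained samples remain uniformly distributed over the run. A real sampler would presumably also lower the timer frequency itself rather than only skipping samples.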

Published In

CCGrid '17: Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing
May 2017, 1167 pages
ISBN: 9781509066100
Publisher: IEEE Press
Qualifiers: Tutorial, Research, Refereed limited
Conference: CCGrid '17, May 14 - 17, 2017, Madrid, Spain
Sponsor: SIGARCH
