research-article

The ghost in the machine: observing the effects of kernel operation on parallel application performance

Published: 10 November 2007

Abstract

The performance of a parallel application on a scalable HPC system is determined by user-level execution of the application code and system-level (OS kernel) operations. To understand the influences of system-level factors on application performance, the measurement of OS kernel activities is key. We describe a technology to observe kernel actions and make this information available to application-level performance measurement tools. The benefits of merged application and OS performance information and its use in parallel performance analysis are demonstrated, both for profiling and tracing methodologies. In particular, we focus on the problem of kernel noise assessment as a stress test of the approach. We show new results for characterizing noise and introduce new techniques for evaluating noise interference and its effects on application execution. Our kernel measurement and noise analysis technologies are being developed as part of Linux OS environments for scalable parallel systems.
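As a rough illustration of what "noise assessment" means at user level, the sketch below times repeated runs of an identical busy loop and flags runs that take much longer than the typical one. It is not the measurement technology described in the abstract: the function names, the median baseline, and the `factor` threshold are all assumptions chosen for illustration. A user-level probe like this can only show that some iterations were delayed; it cannot say which kernel activity caused the delay, which is the gap that merged application and kernel measurements are meant to close.

```python
import time
from statistics import median


def fixed_work_samples(n_samples=200, work=20000):
    """Time n_samples repetitions of an identical busy loop (nanoseconds).

    Hypothetical helper: every repetition does the same amount of work, so
    variation in elapsed time comes from outside the loop itself.
    """
    samples = []
    for _ in range(n_samples):
        start = time.perf_counter_ns()
        acc = 0
        for i in range(work):
            acc += i
        samples.append(time.perf_counter_ns() - start)
    return samples


def estimate_noise(samples, factor=1.5):
    """Return (baseline, detours) for a list of fixed-work timings.

    baseline is the median timing; detours lists (index, excess) pairs for
    samples longer than factor * baseline. Both choices are assumptions.
    """
    baseline = median(samples)
    detours = [
        (i, s - baseline)
        for i, s in enumerate(samples)
        if s > factor * baseline
    ]
    return baseline, detours
```

Calling `estimate_noise(fixed_work_samples())` on an otherwise idle node gives a first impression of how often, and by how much, fixed work is stretched by outside interference.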

Published In

SC '07: Proceedings of the 2007 ACM/IEEE conference on Supercomputing
November 2007
723 pages
ISBN:9781595937643
DOI:10.1145/1362622
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

SC '07 Paper Acceptance Rate 54 of 268 submissions, 20%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
