DOI: 10.1145/3332466.3374505

Reflector: a fine-grained I/O tracker for HPC systems

Published: 19 February 2020

Abstract

We present Reflector, a fine-grained I/O tracker that supports both high-level and low-level I/O monitoring through user-defined interfaces such as HDF5 and NetCDF, in addition to POSIX I/O and MPI-IO. We evaluate Reflector on both an on-premises 500-core HPC cluster and a leadership-class supercomputer at the Lawrence Berkeley National Laboratory. Preliminary results are promising: the system prototype incurs negligible performance overhead and clearly illustrates the I/O patterns and bottlenecks of multiple applications.
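The abstract does not describe how Reflector intercepts I/O calls, so the following is only a generic illustration of call-level I/O tracing, not Reflector's implementation. It is a minimal Python sketch in which a hypothetical `TracedFile` wrapper records the operation, byte count, and duration of each read and write on a file object; the names `IORecord`, `TracedFile`, and `summarize` are assumptions introduced for this example.

```python
import io
import time
from dataclasses import dataclass


@dataclass
class IORecord:
    op: str          # "read" or "write"
    nbytes: int      # bytes transferred by this call
    seconds: float   # wall-clock duration of this call


class TracedFile:
    """Hypothetical wrapper that logs each read/write on a binary file object."""

    def __init__(self, raw, log):
        self._raw = raw
        self._log = log

    def write(self, data):
        start = time.perf_counter()
        n = self._raw.write(data)
        self._log.append(IORecord("write", n, time.perf_counter() - start))
        return n

    def read(self, size=-1):
        start = time.perf_counter()
        data = self._raw.read(size)
        self._log.append(IORecord("read", len(data), time.perf_counter() - start))
        return data

    def __getattr__(self, name):
        # Delegate everything else (seek, close, ...) to the underlying file.
        return getattr(self._raw, name)


def summarize(log):
    """Return the total number of bytes per operation type."""
    totals = {}
    for rec in log:
        totals[rec.op] = totals.get(rec.op, 0) + rec.nbytes
    return totals


# Usage: trace a few calls against an in-memory file.
log = []
f = TracedFile(io.BytesIO(), log)
f.write(b"hello")
f.write(b"world!")
f.seek(0)
f.read(4)
print(summarize(log))  # {'write': 11, 'read': 4}
```

A per-call log like this is the raw material from which access patterns (sizes, ordering, time spent per call) can be derived. A production tracer for HPC codes would typically intercept calls at the library or linker level rather than through an explicit wrapper object.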

Published In

PPoPP '20: Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
February 2020
454 pages
ISBN: 9781450368186
DOI: 10.1145/3332466
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Poster

Conference

PPoPP '20

Acceptance Rates

PPoPP '20 Paper Acceptance Rate 28 of 121 submissions, 23%;
Overall Acceptance Rate 230 of 1,014 submissions, 23%
