DOI: 10.1145/2488551.2488570
research-article

Runtime MPI collective checking with tree-based overlay networks

Published: 15 September 2013

Abstract

Runtime error detection tools detect many classes of MPI usage errors, including errors in collective communication calls. However, they often face scalability challenges. We present runtime checks for MPI collective operations that use a Tree-Based Overlay Network (TBON) for scalability and that provide full datatype matching. While we can use transitive correctness properties for most checks, some collective operations, e.g., MPI_Alltoallv, impose non-transitive correctness properties; for these we use intralayer communication within the TBON to distribute datatype matching information. An overhead study with stress tests and two benchmark suites demonstrates applicability and scalability at 4,096, 2,048, and 16,384 processes, respectively.
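To illustrate why transitive correctness properties suit a tree-based overlay, here is a minimal Python sketch. It is not the paper's implementation and does not use the API of any real tool. `CallRecord`, `matches`, `check_transitive`, and the `fanout` parameter are hypothetical names introduced for illustration only. The sketch assumes that every rank must supply the same operation, root, and type signature, as for a broadcast. Because equality is transitive, each tree node only needs to compare the records of its children against one representative and forward that representative upward. Non-transitive checks such as those for MPI_Alltoallv are not covered here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CallRecord:
    """Hypothetical summary of one rank's collective call."""
    rank: int
    op: str                      # e.g. "MPI_Bcast"
    root: int
    signature: Tuple[str, ...]   # flattened type signature


def matches(a: CallRecord, b: CallRecord) -> bool:
    """Two records agree if operation, root, and signature are equal."""
    return (a.op, a.root, a.signature) == (b.op, b.root, b.signature)


def check_transitive(records: List[CallRecord],
                     fanout: int = 2) -> List[Tuple[int, int]]:
    """Reduce records level by level, as a tree of tool nodes would.

    Each node compares its children against the first child's record
    and forwards only that representative, so the data per node stays
    bounded by the fanout. Returns pairs of ranks whose records differ.
    """
    errors: List[Tuple[int, int]] = []
    level = list(records)
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            representative = group[0]
            for other in group[1:]:
                if not matches(representative, other):
                    errors.append((representative.rank, other.rank))
            next_level.append(representative)
        level = next_level
    return errors


# Eight ranks call MPI_Bcast; rank 5 passes a different root.
sig = ("MPI_INT",) * 4
calls = [CallRecord(r, "MPI_Bcast", 3 if r == 5 else 0, sig) for r in range(8)]
mismatches = check_transitive(calls)
```

In this sketch a node forwards the first child's record even when a mismatch was found, so a faulty representative would be reported again at higher levels. A real tool would need a policy for that case.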

Published In

EuroMPI '13: Proceedings of the 20th European MPI Users' Group Meeting
September 2013
289 pages
ISBN: 9781450319034
DOI: 10.1145/2488551
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

  • ARCOS: Computer Architecture and Technology Area, Universidad Carlos III de Madrid

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. MPI collectives
  2. correctness
  3. tree-based overlay networks

Qualifiers

  • Research-article

Conference

EuroMPI '13
Sponsor:
  • ARCOS
EuroMPI '13: 20th European MPI Users' Group Meeting
September 15 - 18, 2013
Madrid, Spain

Acceptance Rates

EuroMPI '13 Paper Acceptance Rate 22 of 47 submissions, 47%;
Overall Acceptance Rate 66 of 139 submissions, 47%
