DOI: 10.1145/2966884.2966906
research-article

Runtime Correctness Analysis of MPI-3 Nonblocking Collectives

Published: 25 September 2016

Abstract

The Message Passing Interface (MPI) includes nonblocking collective operations that support additional overlap between computation and communication. These new operations enable complex data movement between large numbers of processes. However, their asynchronous behavior hides and complicates the detection of defects in their use. We highlight a lack of correctness tool support for these operations and extend the MUST runtime MPI correctness tool to alleviate this complexity. We introduce a classification to summarize the types of correctness analyses that are applicable to MPI's nonblocking collectives. We identify complex wait-for dependencies in deadlock situations and incorrect use of communication buffers as the most challenging types of usage errors. We devise, demonstrate, and evaluate the applicability of correctness analyses for these errors. A scalable analysis mechanism allows our runtime approach to scale with the application. Benchmark measurements highlight the scalability and applicability of our approach at up to 4,096 application processes and with low overhead.

Published In

EuroMPI '16: Proceedings of the 23rd European MPI Users' Group Meeting
September 2016
225 pages
ISBN: 9781450342346
DOI: 10.1145/2966884

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

EuroMPI 2016
EuroMPI 2016: The 23rd European MPI Users' Group Meeting
September 25 - 28, 2016
Edinburgh, United Kingdom

Acceptance Rates

Overall Acceptance Rate 66 of 139 submissions, 47%
