Research article
DOI: 10.1145/3343211.3343221

Mixing ranks, tasks, progress and nonblocking collectives

Published: 11 September 2019

Abstract

Since its inception, MPI has defined the rank as an implicit attribute of the MPI process's environment. In particular, each MPI process generally runs inside a given UNIX process and is associated with a fixed identifier in its WORLD communicator. However, this is about to change with the rise of new abstractions such as MPI Sessions. In this paper, we outline how this evolution could enable optimizations that were previously tied to specific MPI runtimes executing MPI processes in shared memory (e.g., thread-based MPI). By implementing runtime-level work-sharing through what we define as MPI tasks, thereby enabling progress regardless of the stream context, we show that there is potential for improved asynchronous progress. In the absence of a Sessions implementation, this assumption is validated in the context of a thread-based MPI in which nonblocking collectives (NBC) were implemented on top of Extended Generic Requests that can be progressed by any rank on the node, thanks to an MPI extension enabling threads to dynamically share their MPI context.
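The idea of a nonblocking collective progressed by any rank on the node can be illustrated with a small model. The following is a minimal sketch, not the paper's implementation and not an MPI API: `GenericRequest`, `NodeProgressQueue`, `make_schedule`, and `demo` are hypothetical names, the collective is reduced to a fixed number of steps, and the model is single-threaded, so it omits the locking a real shared-memory runtime would need.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List


@dataclass
class GenericRequest:
    """Hypothetical stand-in for an extended generic request."""
    poll: Callable[[], bool]  # runs one step; returns True once the operation is done
    owner: int                # rank that posted the operation
    complete: bool = False
    progressed_by: List[int] = field(default_factory=list)


class NodeProgressQueue:
    """Pending requests shared by all ranks living in the same address space."""

    def __init__(self) -> None:
        self.pending: Deque[GenericRequest] = deque()

    def post(self, req: GenericRequest) -> None:
        self.pending.append(req)

    def progress(self, caller_rank: int) -> None:
        """Advance the oldest pending request by one step, whoever owns it."""
        if not self.pending:
            return
        req = self.pending.popleft()
        req.progressed_by.append(caller_rank)
        if req.poll():
            req.complete = True
        else:
            self.pending.append(req)  # not finished yet: requeue for a later call


def make_schedule(steps: int) -> Callable[[], bool]:
    """Build a poll callback modelling a collective as a fixed number of steps."""
    remaining = [steps]

    def poll() -> bool:
        remaining[0] -= 1
        return remaining[0] == 0

    return poll


def demo() -> GenericRequest:
    queue = NodeProgressQueue()
    req = GenericRequest(poll=make_schedule(3), owner=0)
    queue.post(req)
    # Rank 0 is assumed to be busy computing; idle ranks 1 and 2 drive its request.
    for rank in (1, 2, 1):
        queue.progress(rank)
    return req
```

In this model the request posted by rank 0 completes without rank 0 ever calling `progress`, which is the property the abstract attributes to sharing the MPI context among threads on a node.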

Published In

EuroMPI '19: Proceedings of the 26th European MPI Users' Group Meeting
September 2019
134 pages
ISBN:9781450371759
DOI:10.1145/3343211

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. MPI sessions
  2. hybrid MPI
  3. nonblocking collectives
  4. progress
  5. thread-based MPI

Qualifiers

  • Research-article

Conference

EuroMPI 2019: 26th European MPI Users' Group Meeting
September 11 - 13, 2019
Zürich, Switzerland

Acceptance Rates

EuroMPI '19 paper acceptance rate: 13 of 26 submissions (50%)
Overall acceptance rate: 66 of 139 submissions (47%)
