
Introducing Task-Containers as an Alternative to Runtime-Stacking

Published: 25 September 2016

Abstract

The advent of many-core architectures poses new challenges to the MPI programming model, which was designed for distributed-memory message passing. It is now clear that MPI will have to evolve in order to exploit shared-memory parallelism, either by collaborating with other programming models (MPI+X) or by introducing new shared-memory approaches. This paper considers extensions to C and C++ that make it possible for MPI processes to run inside threads. More generally, a thread-local storage (TLS) library is developed to simplify the collocation of arbitrary tasks and services in a shared-memory context called a task-container. The paper discusses how such containers simplify model and service mixing at the OS-process level, ultimately easing the collocation of arbitrary tasks with MPI processes in a runtime-agnostic fashion and opening alternatives to runtime stacking.
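
To make the privatization idea concrete, the sketch below is a minimal illustration, not the paper's actual TLS library; the names task_main, NTASKS and rank are assumptions introduced here. It shows the basic transformation that lets several MPI-process-like tasks coexist as threads of a single OS process: a formerly global variable is declared thread-local, so each task keeps its own copy instead of racing on shared state.

    /* Minimal sketch (illustrative names, not the paper's library). */
    #include <pthread.h>
    #include <stdio.h>

    /* Originally a plain global shared by the whole process; declaring it
     * _Thread_local (C11; __thread with GCC) gives every thread its own copy. */
    static _Thread_local int rank = -1;

    static void *task_main(void *arg)
    {
        rank = (int)(long)arg;   /* each "task" writes only its private rank */
        printf("task running with private rank=%d\n", rank);
        return NULL;
    }

    int main(void)
    {
        enum { NTASKS = 4 };
        pthread_t threads[NTASKS];

        for (long i = 0; i < NTASKS; i++)
            pthread_create(&threads[i], NULL, task_main, (void *)i);
        for (int i = 0; i < NTASKS; i++)
            pthread_join(threads[i], NULL);
        return 0;
    }

Built with cc -pthread, each thread observes its own rank even though rank started out as a single global; the C and C++ extensions discussed in the paper aim to avoid having to apply this transformation by hand to every global.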

Published In

EuroMPI '16: Proceedings of the 23rd European MPI Users' Group Meeting
September 2016
225 pages
ISBN:9781450342346
DOI:10.1145/2966884
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. In-Situ
  2. MPI+X
  3. Privatization
  4. Thread-Based MPI

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

EuroMPI 2016
EuroMPI 2016: The 23rd European MPI Users' Group Meeting
September 25 - 28, 2016
Edinburgh, United Kingdom

Acceptance Rates

Overall Acceptance Rate 66 of 139 submissions, 47%
