
Introducing Task-Containers as an Alternative to Runtime-Stacking

Published: 25 September 2016

Abstract

The advent of many-core architectures poses new challenges to the MPI programming model, which was designed for distributed-memory message passing. It is now clear that MPI will have to evolve in order to exploit shared-memory parallelism, either by collaborating with other programming models (MPI+X) or by introducing new shared-memory approaches. This paper considers extensions to C and C++ that make it possible for MPI processes to run inside threads. More generally, a thread-local storage (TLS) library is developed to simplify the collocation of arbitrary tasks and services in a shared-memory context called a task-container. The paper discusses how such containers simplify model and service mixing at the OS-process level, ultimately easing the collocation of arbitrary tasks with MPI processes in a runtime-agnostic fashion and opening alternatives to runtime stacking.
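
To make the privatization idea concrete, the sketch below is a minimal illustration, not the paper's actual TLS library; the names task_main, NTASKS and rank are assumptions introduced here. It shows the basic transformation that lets several MPI-process-like tasks coexist as threads of a single OS process: a formerly global variable is declared thread-local, so each task keeps its own copy instead of racing on shared state.

    /* Minimal sketch (illustrative names, not the paper's library). */
    #include <pthread.h>
    #include <stdio.h>

    /* Originally a plain global shared by the whole process; declaring it
     * _Thread_local (C11; __thread with GCC) gives every thread its own copy. */
    static _Thread_local int rank = -1;

    static void *task_main(void *arg)
    {
        rank = (int)(long)arg;   /* each "task" writes only its private rank */
        printf("task running with private rank=%d\n", rank);
        return NULL;
    }

    int main(void)
    {
        enum { NTASKS = 4 };
        pthread_t threads[NTASKS];

        for (long i = 0; i < NTASKS; i++)
            pthread_create(&threads[i], NULL, task_main, (void *)i);
        for (int i = 0; i < NTASKS; i++)
            pthread_join(threads[i], NULL);
        return 0;
    }

Built with cc -pthread, each thread observes its own rank even though rank started out as a single global; the C and C++ extensions discussed in the paper aim to avoid having to apply this transformation by hand to every global.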

Published In

EuroMPI '16: Proceedings of the 23rd European MPI Users' Group Meeting
September 2016
225 pages
ISBN:9781450342346
DOI:10.1145/2966884
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. In-Situ
  2. MPI+X
  3. Privatization
  4. Thread-Based MPI

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

EuroMPI 2016
EuroMPI 2016: The 23rd European MPI Users' Group Meeting
September 25 - 28, 2016
Edinburgh, United Kingdom

Acceptance Rates

Overall Acceptance Rate 66 of 139 submissions, 47%
