DOI: 10.1145/2802658.2802669

An MPI Halo-Cell Implementation for Zero-Copy Abstraction

Published: 21 September 2015

Abstract

In the race for Exascale, the advent of many-core processors will shift parallel computing architectures toward systems of much higher concurrency but relatively less memory per thread. This shift raises concerns about the adaptability of current-generation HPC software to this brave new world. In this paper, we study domain splitting over an increasing number of memory areas as an example problem where computation performance could suffer. We identify the specific parameters that drive scalability for this problem, and then model the halo-cell ratio on common mesh topologies to study the memory and communication implications. This analysis argues for the use of shared-memory parallelism, such as OpenMP, to address the performance problems that could occur. Instead, we propose an original solution based entirely on MPI programming semantics that still provides the performance advantages of hybrid parallel programming. Our solution transparently replaces halo-cell transfers with pointer exchanges when MPI tasks run on the same node, effectively removing memory copies. Our results demonstrate gains in both memory footprint and computation time on the Xeon Phi (compared to OpenMP-only and MPI-only versions) using a representative domain-decomposition benchmark.
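The zero-copy idea described above, neighbouring ranks on the same node reading each other's boundary cells through pointers instead of exchanging copies, can be sketched with standard MPI-3 shared-memory windows. The code below is not the paper's implementation; the window-based mechanism, the 1-D decomposition, the left-neighbour-only exchange, and the NCELLS size are assumptions used purely for illustration.

/*
 * Minimal sketch (assumed setup, not the paper's MPI_Halo mechanism):
 * replaces an intra-node halo-cell copy with a direct pointer read,
 * using standard MPI-3 shared-memory windows.
 */
#include <mpi.h>
#include <stdio.h>

#define NCELLS 1024  /* cells owned by each rank (illustrative size) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Group the ranks that share a node into one communicator. */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);

    int node_rank;
    MPI_Comm_rank(node_comm, &node_rank);

    /* Each rank allocates its subdomain inside a node-wide shared window. */
    double *local;
    MPI_Win win;
    MPI_Win_allocate_shared((MPI_Aint)(NCELLS * sizeof(double)),
                            sizeof(double), MPI_INFO_NULL, node_comm,
                            &local, &win);

    MPI_Win_lock_all(MPI_MODE_NOCHECK, win);
    for (int i = 0; i < NCELLS; i++)
        local[i] = (double)node_rank;
    MPI_Win_sync(win);          /* make local writes visible */
    MPI_Barrier(node_comm);     /* wait until every rank has initialized */
    MPI_Win_sync(win);          /* see the other ranks' writes */

    /* Instead of packing, sending and unpacking a halo cell, read the
     * left neighbour's boundary cell directly through a pointer into
     * its memory: no intermediate copy is made. */
    double left_halo = 0.0;
    if (node_rank > 0) {
        MPI_Aint size;
        int disp_unit;
        double *left;
        MPI_Win_shared_query(win, node_rank - 1, &size, &disp_unit,
                             (void *)&left);
        left_halo = left[NCELLS - 1];   /* zero-copy halo read */
    }
    MPI_Win_unlock_all(win);

    printf("node rank %d sees left halo value %.1f\n", node_rank, left_halo);

    MPI_Win_free(&win);
    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}

In this sketch the zero-copy path only applies within a node; ranks on different nodes would still exchange halo cells with regular MPI sends and receives, which matches the intra-node behaviour the abstract describes.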



Published In

EuroMPI '15: Proceedings of the 22nd European MPI Users' Group Meeting
September 2015
149 pages
ISBN:9781450337953
DOI:10.1145/2802658
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • Conseil Régional d'Aquitaine
  • Communauté Urbaine de Bordeaux
  • INRIA Rhône-Alpes

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 September 2015


Author Tags

  1. Ghost-Cells
  2. MPI
  3. MPI_Halo
  4. Zero-Copy
  5. memory

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

EuroMPI '15
EuroMPI '15: The 22nd European MPI Users' Group Meeting
September 21 - 23, 2015
Bordeaux, France

Acceptance Rates

EuroMPI '15 paper acceptance rate: 14 of 29 submissions, 48%
Overall acceptance rate: 66 of 139 submissions, 47%


Article Metrics

  • Downloads (last 12 months): 11
  • Downloads (last 6 weeks): 2
Reflects downloads up to 30 Dec 2024


Cited By

  • (2024) To Share or Not to Share: A Case for MPI in Shared-Memory. Recent Advances in the Message Passing Interface, 10.1007/978-3-031-73370-3_6, pp. 89-102. Online publication date: 25-Sep-2024
  • (2020) Hardware Locality-Aware Partitioning and Dynamic Load-Balancing of Unstructured Meshes for Large-Scale Scientific Applications. Proceedings of the Platform for Advanced Scientific Computing Conference, 10.1145/3394277.3401851, pp. 1-10. Online publication date: 29-Jun-2020
  • (2019) Interoperability strategies for GASPI and MPI in large-scale scientific applications. International Journal of High Performance Computing Applications, 33(3), pp. 554-568, 10.1177/1094342018808359. Online publication date: 1-May-2019
  • (2019) Mixing ranks, tasks, progress and nonblocking collectives. Proceedings of the 26th European MPI Users' Group Meeting, 10.1145/3343211.3343221, pp. 1-10. Online publication date: 11-Sep-2019
  • (2019) Unifying the Analysis of Performance Event Streams at the Consumer Interface Level. Tools for High Performance Computing 2017, 10.1007/978-3-030-11987-4_4, pp. 57-71. Online publication date: 15-Feb-2019
  • (2016) Efficient parallelization of MATLAB stencil applications for multi-core clusters. Proceedings of the Sixth International Workshop on Domain-Specific Languages and High-Level Frameworks for HPC, 10.5555/3019129.3019132, pp. 20-29. Online publication date: 13-Nov-2016
  • (2016) Introducing Task-Containers as an Alternative to Runtime-Stacking. Proceedings of the 23rd European MPI Users' Group Meeting, 10.1145/2966884.2966910, pp. 51-63. Online publication date: 25-Sep-2016
  • (2016) Efficient Parallelization of MATLAB Stencil Applications for Multi-core Clusters. 2016 Sixth International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC), 10.1109/WOLFHPC.2016.07, pp. 20-29. Online publication date: Nov-2016
