High performance computing using MPI and OpenMP on multi-core parallel systems

Published: 01 September 2011

Abstract

The rapidly increasing number of cores in modern microprocessors is pushing current high performance computing (HPC) systems into the petascale and exascale era. The hybrid nature of these systems, with distributed memory across nodes and shared memory with non-uniform memory access within each node, poses a challenge to application developers. In this paper, we study a hybrid approach to programming such systems: a combination of two traditional programming models, MPI and OpenMP. We present the performance of standard benchmarks from the multi-zone NAS Parallel Benchmarks and of two full applications using this approach on several multi-core-based systems, including an SGI Altix 4700, an IBM p575+, and an SGI Altix ICE 8200EX. We also present new data locality extensions to OpenMP that better match the hierarchical memory structure of multi-core architectures.
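As a concrete illustration of the programming model studied here, the following minimal C sketch shows the typical structure of a hybrid MPI+OpenMP program: one MPI process per node handling communication across distributed memory, with an OpenMP thread team sharing memory inside each process. This is an illustrative example only; it is not code from the paper and does not use the data locality extensions the paper proposes.

    /* Minimal hybrid MPI+OpenMP sketch: MPI ranks across nodes,
       OpenMP threads within each rank's shared address space. */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided, rank, nranks;

        /* MPI_THREAD_FUNNELED is sufficient when only the master
           thread makes MPI calls, a common hybrid configuration. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        #pragma omp parallel
        {
            /* Each thread shares the memory of its MPI rank. */
            printf("MPI rank %d of %d, OpenMP thread %d of %d\n",
                   rank, nranks, omp_get_thread_num(), omp_get_num_threads());
        }

        MPI_Finalize();
        return 0;
    }

Such a program would typically be compiled with an MPI compiler wrapper with OpenMP enabled (for example, mpicc -fopenmp) and launched with one MPI rank per node and OMP_NUM_THREADS set to the number of cores per node, mirroring the hybrid configurations evaluated in the paper.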

Published In

Parallel Computing, Volume 37, Issue 9
September 2011
155 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands

Publication History

Published: 01 September 2011

Author Tags

  1. Data locality
  2. Hybrid MPI+OpenMP programming
  3. Multi-core systems
  4. OpenMP extensions

Qualifiers

  • Article
