High performance computing using MPI and OpenMP on multi-core parallel systems

Published: 01 September 2011

Abstract

The rapidly increasing number of cores in modern microprocessors is pushing current high performance computing (HPC) systems into the petascale and exascale era. The hybrid nature of these systems, with distributed memory across nodes and shared memory with non-uniform memory access within each node, poses a challenge to application developers. In this paper, we study a hybrid approach to programming such systems: a combination of two traditional programming models, MPI and OpenMP. We present the performance of standard benchmarks from the multi-zone NAS Parallel Benchmarks and of two full applications using this approach on several multi-core-based systems, including an SGI Altix 4700, an IBM p575+, and an SGI Altix ICE 8200EX. We also present new data locality extensions to OpenMP that better match the hierarchical memory structure of multi-core architectures.
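As a concrete illustration of the programming model studied here, the following minimal C sketch shows the typical structure of a hybrid MPI+OpenMP program: one MPI process per node handling communication across distributed memory, with an OpenMP thread team sharing memory inside each process. This is an illustrative example only; it is not code from the paper and does not use the data locality extensions the paper proposes.

    /* Minimal hybrid MPI+OpenMP sketch: MPI ranks across nodes,
       OpenMP threads within each rank's shared address space. */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided, rank, nranks;

        /* MPI_THREAD_FUNNELED is sufficient when only the master
           thread makes MPI calls, a common hybrid configuration. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        #pragma omp parallel
        {
            /* Each thread shares the memory of its MPI rank. */
            printf("MPI rank %d of %d, OpenMP thread %d of %d\n",
                   rank, nranks, omp_get_thread_num(), omp_get_num_threads());
        }

        MPI_Finalize();
        return 0;
    }

Such a program would typically be compiled with an MPI compiler wrapper with OpenMP enabled (for example, mpicc -fopenmp) and launched with one MPI rank per node and OMP_NUM_THREADS set to the number of cores per node, mirroring the hybrid configurations evaluated in the paper.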

Published In

Parallel Computing, Volume 37, Issue 9
September 2011
155 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands

Publication History

Published: 01 September 2011

Author Tags

  1. Data locality
  2. Hybrid MPI+OpenMP programming
  3. Multi-core systems
  4. OpenMP extensions

Qualifiers

  • Article
