Optimization of MPI_Allreduce on the Blue Gene/Q Supercomputer

Authors:

Sameer Kumar,

Daniel Faraj

EuroMPI '13: Proceedings of the 20th European MPI Users' Group Meeting

Pages 97 - 103

https://doi.org/10.1145/2488551.2488557

Published: 15 September 2013

Abstract

The IBM Blue Gene/Q supercomputer has a 5D torus network in which each node has ten bidirectional links. In this paper we present techniques that optimize the MPI_Allreduce collective operation by building ten edge-disjoint spanning trees over the ten torus links. We accelerate the summation of incoming network packets into local buffers by using the Quad Processing SIMD unit (QPX) in the BG/Q cores and by executing the sums on multiple communication threads created by the PAMI library. As a result, we achieve a peak throughput of 6.3 GB/s for a double-precision floating-point sum allreduce, a speedup of 3.75x over the collective-network-based algorithm in the product MPI stack on BG/Q.
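Two of the ideas in the abstract can be illustrated with a minimal C sketch, under stated assumptions: the allreduce buffer is divided into ten near-equal contiguous slices, one per edge-disjoint spanning tree, and each arriving packet is summed into a local buffer with a loop unrolled four doubles wide, mirroring the 4-wide double-precision SIMD unit. The names `NUM_TREES`, `tree_slice`, and `sum_into` are hypothetical and are not PAMI or MPI APIs; the even contiguous split and the plain-C unrolling (in place of QPX intrinsics) are assumptions, not the paper's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: one edge-disjoint spanning tree per torus link direction
 * (+/-A, +/-B, +/-C, +/-D, +/-E), i.e. ten trees. */
#define NUM_TREES 10

/* Compute the contiguous slice [*offset, *offset + *len) of a
 * count-element buffer that tree t (0 <= t < ntrees) reduces.
 * The first (count % ntrees) trees receive one extra element, so
 * slice sizes differ by at most one. */
static void tree_slice(size_t count, size_t ntrees, size_t t,
                       size_t *offset, size_t *len)
{
    assert(ntrees > 0 && t < ntrees);
    size_t base = count / ntrees;
    size_t extra = count % ntrees;
    *len = base + (t < extra ? 1 : 0);
    *offset = t * base + (t < extra ? t : extra);
}

/* Sum an incoming packet into a local accumulation buffer (acc += pkt).
 * The main loop is unrolled four doubles at a time to mirror a 4-wide
 * SIMD unit; this is portable C that relies on the compiler to
 * vectorize it, not QPX intrinsics. */
static void sum_into(double *restrict acc, const double *restrict pkt,
                     size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        acc[i]     += pkt[i];
        acc[i + 1] += pkt[i + 1];
        acc[i + 2] += pkt[i + 2];
        acc[i + 3] += pkt[i + 3];
    }
    for (; i < n; i++)   /* tail that does not fill a 4-wide vector */
        acc[i] += pkt[i];
}
```

For example, a 1003-element buffer splits into three slices of 101 elements followed by seven slices of 100, so each tree carries roughly a tenth of the data. As a further assumption, the same slicing helper could divide one large packet sum among several communication threads.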

Information

Published In

EuroMPI '13: Proceedings of the 20th European MPI Users' Group Meeting

September 2013

289 pages

ISBN:9781450319034

DOI:10.1145/2488551

General Chair: Jack Dongarra, University of Tennessee

Program Chairs: Javier Garcia Blas, University Carlos III, Spain; Jesus Carretero, University Carlos III, Spain

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

SIGHPC: ACM Special Interest Group on High Performance Computing

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

US Government

Conference

EuroMPI '13

Sponsor:

ARCOS

EuroMPI '13: 20th European MPI Users' Group Meeting

September 15 - 18, 2013

Madrid, Spain

Acceptance Rates

EuroMPI '13 Paper Acceptance Rate 22 of 47 submissions, 47%;

Overall Acceptance Rate 66 of 139 submissions, 47%

Bibliometrics

Article Metrics

Total Citations: 8
Total Downloads: 291
Downloads (Last 12 months): 6
Downloads (Last 6 weeks): 0

Reflects downloads up to 06 Nov 2024

Citations

Cited By

Luczynski P, Gianinazzi L, Iff P, Wilson L, De Sensi D, Hoefler T, Mencagli G, Dazzi P, Lowenthal D, Badia R (2024). Near-Optimal Wafer-Scale Reduce. Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, pp. 334-347. doi:10.1145/3625549.3658693. Online publication date: 3-Jun-2024.
https://dl.acm.org/doi/10.1145/3625549.3658693

Sreedhar D, Saxena V, Sabharwal Y, Verma A, Kumar S (2018). Efficient Training of Convolutional Neural Nets on Large Distributed Systems. 2018 IEEE International Conference on Cluster Computing (CLUSTER), pp. 392-401. doi:10.1109/CLUSTER.2018.00057. Online publication date: Sep-2018.
https://doi.org/10.1109/CLUSTER.2018.00057

Di S, Cappello F (2016). Adaptive Impact-Driven Detection of Silent Data Corruption for HPC Applications. IEEE Transactions on Parallel and Distributed Systems 27:10, pp. 2809-2823. doi:10.1109/TPDS.2016.2517639. Online publication date: 1-Oct-2016.
https://dl.acm.org/doi/10.1109/TPDS.2016.2517639

Kumar S, Sharkawi S, Jan K (2016). Optimization and Analysis of MPI Collective Communication on Fat-Tree Networks. 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 1031-1040. doi:10.1109/IPDPS.2016.85. Online publication date: May-2016.
https://doi.org/10.1109/IPDPS.2016.85

Bui H, Malakar P, Vishwanath V, Munson T, Jung E, Johnson A, Papka M, Leigh J (2015). Improving Communication Throughput by Multipath Load Balancing on Blue Gene/Q. Proceedings of the 2015 IEEE 22nd International Conference on High Performance Computing (HiPC), pp. 115-124. doi:10.1109/HiPC.2015.44. Online publication date: 16-Dec-2015.
https://dl.acm.org/doi/10.1109/HiPC.2015.44

Bui H, Jacob R, Malakar P, Viswanath V, Johnson A, Papka M, Leigh J (2015). Multipath Load Balancing for M × N Communication Patterns on the Blue Gene/Q Supercomputer Interconnection Network. Proceedings of the 2015 IEEE International Conference on Cluster Computing, pp. 833-840. doi:10.1109/CLUSTER.2015.140. Online publication date: 8-Sep-2015.
https://dl.acm.org/doi/10.1109/CLUSTER.2015.140

Kumar S, Mamidala A, Heidelberger P, Chen D, Faraj D (2014). Optimization of MPI collective operations on the IBM Blue Gene/Q supercomputer. The International Journal of High Performance Computing Applications 28:4, pp. 450-464. doi:10.1177/1094342014552086. Online publication date: 7-Nov-2014.
https://doi.org/10.1177/1094342014552086

Bui H, Leigh J, Jung E, Vishwanath V, Papka M (2014). Improving Data Movement Performance for Sparse Data Patterns on the Blue Gene/Q Supercomputer. Proceedings of the 2014 43rd International Conference on Parallel Processing Workshops, pp. 302-311. doi:10.1109/ICPPW.2014.47. Online publication date: 9-Sep-2014.
https://dl.acm.org/doi/10.1109/ICPPW.2014.47