
MPI windows on storage for HPC applications

Published: 25 September 2017
DOI: 10.1145/3127024.3127034

Abstract

Upcoming HPC clusters will feature hybrid memories and storage devices per compute node. In this work, we propose to use the MPI one-sided communication model and MPI windows as a single interface for programming both memory and storage. We describe the design and implementation of MPI windows on storage and present their benefits for out-of-core execution, parallel I/O, and fault tolerance. Using a modified STREAM micro-benchmark, we measure the sustained bandwidth of MPI windows on storage against MPI memory windows and observe a performance penalty of only 10%. When a parallel file system such as Lustre is used, performance becomes asymmetric, with a 10% penalty on read operations and a 90% penalty on write operations. Nonetheless, experimental results with a Distributed Hash Table and the HACC I/O kernel mini-application show that the overall penalty of MPI windows on storage can be negligible in most real-world applications.
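
As a concrete illustration of the proposed interface, the sketch below shows what a storage-backed window could look like from application code. It is a minimal example under stated assumptions, not the authors' verified implementation: the info keys "alloc_type" and "storage_alloc_filename" are hypothetical hints standing in for whatever the MPI library actually accepts, and a stock MPI implementation would simply ignore them and return an ordinary memory window. The surrounding calls (MPI_Win_allocate, MPI_Win_fence, MPI_Put) are standard MPI-3 one-sided operations, which is the point of the approach: code written against memory windows needs no changes when the window is mapped to storage.

    /* Sketch: allocate a window whose backing store may live on a storage
     * device instead of DRAM.  The two info hints below are illustrative
     * placeholders (assumed, not taken from the paper); a stock MPI library
     * ignores unknown hints and returns a normal memory window. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, nranks;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        const MPI_Aint count = 1 << 20;          /* doubles per rank */
        double *base = NULL;
        MPI_Win win;

        MPI_Info info;
        MPI_Info_create(&info);
        MPI_Info_set(info, "alloc_type", "storage");              /* hypothetical hint */
        MPI_Info_set(info, "storage_alloc_filename", "/tmp/win"); /* hypothetical hint */

        MPI_Win_allocate(count * sizeof(double), sizeof(double),
                         info, MPI_COMM_WORLD, &base, &win);
        MPI_Info_free(&info);

        /* Used exactly like a memory window: a one-sided put to the next
         * rank inside a fence epoch. */
        double value = (double)rank;
        MPI_Win_fence(0, win);
        MPI_Put(&value, 1, MPI_DOUBLE, (rank + 1) % nranks,
                0, 1, MPI_DOUBLE, win);
        MPI_Win_fence(0, win);

        if (rank == 0)
            printf("rank 0 received %.1f from rank %d\n", base[0], nranks - 1);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }

Under this model, the out-of-core, parallel I/O, and fault tolerance use cases discussed in the paper reduce to choosing where a window is mapped; the communication and synchronization calls remain the same as for memory windows.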

    Published In

    EuroMPI '17: Proceedings of the 24th European MPI Users' Group Meeting
    September 2017
    169 pages
    ISBN: 9781450348492
    DOI: 10.1145/3127024

    Publisher

    Association for Computing Machinery
    New York, NY, United States

    Author Tags

    1. MPI windows on storage
    2. out-of-core computation
    3. parallel I/O

    Qualifiers

    • Research-article

    Conference

    EuroMPI/USA '17: 24th European MPI Users' Group Meeting
    September 25 - 28, 2017
    Chicago, Illinois
    Sponsors:
    • Mellanox
    • Intel

    Acceptance Rates

    EuroMPI '17 Paper Acceptance Rate: 17 of 37 submissions, 46%
    Overall Acceptance Rate: 66 of 139 submissions, 47%

    Cited By

    • (2022) HDF5 Cache VOL: Efficient and Scalable Parallel I/O through Caching Data on Node-local Storage. 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 61-70. DOI: 10.1109/CCGrid54584.2022.00015
    • (2020) Factors Influencing Social Knowledge Management in Social Society: A Systematic Literature Review. Advances in Science, Technology and Engineering Systems Journal 5(3), 198-206. DOI: 10.25046/aj050326
    • (2019) Exploring Scientific Application Performance Using Large Scale Object Storage. High Performance Computing, 117-130. DOI: 10.1007/978-3-030-02465-9_8
    • (2018) Interoperability strategies for GASPI and MPI in large-scale scientific applications. The International Journal of High Performance Computing Applications 33(3), 554-568. DOI: 10.1177/1094342018808359
    • (2018) The SAGE project: a storage centric approach for exascale computing. Proceedings of the 15th ACM International Conference on Computing Frontiers, 287-292. DOI: 10.1145/3203217.3205341
    • (2018) Decoupled Strategy for Imbalanced Workloads in MapReduce Frameworks. 2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), 921-927. DOI: 10.1109/HPCC/SmartCity/DSS.2018.00153
