
MPI windows on storage for HPC applications

Published: 25 September 2017
DOI: 10.1145/3127024.3127034

Abstract

Upcoming HPC clusters will feature hybrid memories and storage devices per compute node. In this work, we propose to use the MPI one-sided communication model and MPI windows as a single interface for programming both memory and storage. We describe the design and implementation of MPI windows on storage and present their benefits for out-of-core execution, parallel I/O, and fault tolerance. Using a modified STREAM micro-benchmark, we measure the sustained bandwidth of MPI windows on storage against MPI memory windows and observe a performance penalty of only 10%. When a parallel file system such as Lustre is used, performance becomes asymmetric, with a 10% penalty on read operations and a 90% penalty on write operations. Nonetheless, experimental results with a Distributed Hash Table and the HACC I/O kernel mini-application show that the overall penalty of MPI windows on storage can be negligible in most real-world applications.
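
As a concrete illustration of the proposed interface, the sketch below shows what a storage-backed window could look like from application code. It is a minimal example under stated assumptions, not the authors' verified implementation: the info keys "alloc_type" and "storage_alloc_filename" are hypothetical hints standing in for whatever the MPI library actually accepts, and a stock MPI implementation would simply ignore them and return an ordinary memory window. The surrounding calls (MPI_Win_allocate, MPI_Win_fence, MPI_Put) are standard MPI-3 one-sided operations, which is the point of the approach: code written against memory windows needs no changes when the window is mapped to storage.

    /* Sketch: allocate a window whose backing store may live on a storage
     * device instead of DRAM.  The two info hints below are illustrative
     * placeholders (assumed, not taken from the paper); a stock MPI library
     * ignores unknown hints and returns a normal memory window. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, nranks;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        const MPI_Aint count = 1 << 20;          /* doubles per rank */
        double *base = NULL;
        MPI_Win win;

        MPI_Info info;
        MPI_Info_create(&info);
        MPI_Info_set(info, "alloc_type", "storage");              /* hypothetical hint */
        MPI_Info_set(info, "storage_alloc_filename", "/tmp/win"); /* hypothetical hint */

        MPI_Win_allocate(count * sizeof(double), sizeof(double),
                         info, MPI_COMM_WORLD, &base, &win);
        MPI_Info_free(&info);

        /* Used exactly like a memory window: a one-sided put to the next
         * rank inside a fence epoch. */
        double value = (double)rank;
        MPI_Win_fence(0, win);
        MPI_Put(&value, 1, MPI_DOUBLE, (rank + 1) % nranks,
                0, 1, MPI_DOUBLE, win);
        MPI_Win_fence(0, win);

        if (rank == 0)
            printf("rank 0 received %.1f from rank %d\n", base[0], nranks - 1);

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }

Under this model, the out-of-core, parallel I/O, and fault tolerance use cases discussed in the paper reduce to choosing where a window is mapped; the communication and synchronization calls remain the same as for memory windows.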

    Published In

    EuroMPI '17: Proceedings of the 24th European MPI Users' Group Meeting
    September 2017
    169 pages
    ISBN: 9781450348492
    DOI: 10.1145/3127024

    Publisher

    Association for Computing Machinery
    New York, NY, United States

    Author Tags

    1. MPI windows on storage
    2. out-of-core computation
    3. parallel I/O

    Qualifiers

    • Research-article

    Conference

    EuroMPI/USA '17: 24th European MPI Users' Group Meeting
    September 25 - 28, 2017
    Chicago, Illinois
    Sponsors:
    • Mellanox
    • Intel

    Acceptance Rates

    EuroMPI '17 Paper Acceptance Rate: 17 of 37 submissions, 46%
    Overall Acceptance Rate: 66 of 139 submissions, 47%

    Cited By

    • (2022) HDF5 Cache VOL: Efficient and Scalable Parallel I/O through Caching Data on Node-local Storage. 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 61-70. DOI: 10.1109/CCGrid54584.2022.00015
    • (2020) Factors Influencing Social Knowledge Management in Social Society: A Systematic Literature Review. Advances in Science, Technology and Engineering Systems Journal 5(3), 198-206. DOI: 10.25046/aj050326
    • (2019) Exploring Scientific Application Performance Using Large Scale Object Storage. High Performance Computing, 117-130. DOI: 10.1007/978-3-030-02465-9_8
    • (2018) Interoperability strategies for GASPI and MPI in large-scale scientific applications. The International Journal of High Performance Computing Applications 33(3), 554-568. DOI: 10.1177/1094342018808359
    • (2018) The SAGE project: a storage centric approach for exascale computing. Proceedings of the 15th ACM International Conference on Computing Frontiers, 287-292. DOI: 10.1145/3203217.3205341
    • (2018) Decoupled Strategy for Imbalanced Workloads in MapReduce Frameworks. 2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), 921-927. DOI: 10.1109/HPCC/SmartCity/DSS.2018.00153
