skip to main content
10.1145/2503210.2503239acmconferencesArticle/Chapter ViewAbstractPublication PagesscConference Proceedingsconference-collections
research-article

Using simulation to explore distributed key-value stores for extreme-scale system services

Published: 17 November 2013 Publication History

Abstract

Owing to the significant high rate of component failures at extreme scales, system services will need to be failure-resistant, adaptive and self-healing. A majority of HPC services are still designed around a centralized paradigm and hence are susceptible to scaling issues. Peer-to-peer services have proved themselves at scale for wide-area internet workloads. Distributed key-value stores (KVS) are widely used as a building block for these services, but are not prevalent in HPC services. In this paper, we simulate KVS for various service architectures and examine the design trade-offs as applied to HPC service workloads to support extreme-scale systems. The simulator is validated against existing distributed KVS-based services. Via simulation, we demonstrate how failure, replication, and consistency models affect performance at scale. Finally, we emphasize the general use of KVS to HPC services by feeding real HPC service workloads into the simulator and presenting a KVS-based distributed job launch prototype.

References

[1]
Overview of the ibm blue gene/p project. IBM Journal of Research and Development, 52(1.2):199--220, jan. 2008. ISSN: 0018--8646.
[2]
Nawab Ali, Philip Carns, Kamil Iskra, et al. Scalable I/O Forwarding Framework for High-Performance Computing Systems.
[3]
I. Baumgart, B. Heep, and S. Krause. Oversim: A flexible overlay network simulation framework. In IEEE Global Internet Symposium, 2007, pages 79--84, may 2007.
[4]
Robert D. Blumofe and Charles E. Leiserson. Scheduling multithreaded computations by work stealing. Journal of the ACM (JACM), 46(5):720--748, Sept., 1999.
[5]
Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, et al. Dynamo: Amazon's highly available key-value store. In Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles, SOSP '07, pages 205--220, Stevenson, Washington, USA, 2007. ACM. Available from http://doi.acm.org/10.1145/1294261.1294281.
[6]
Ibrahima Diane, Ibrahima Niang, and Bamba Gueye. A Hierarchical DHT for Fault Tolerant Management in P2P-SIP Networks. In Proceedings of the 2010 IEEE 16th International Conference on Parallel and Distributed Systems, ICPADS '10, pages 788--793, Washington, DC, USA, 2010. IEEE Computer Society. Available from http://dx.doi.org/10.1109/ICPADS.2010.43.
[7]
Tien Tuan Anh Dinh, Georgios Theodoropoulos, and Rob Minson. Evaluating Large Scale Distributed Simulation of P2P Networks. In Proceedings of the 2008 12th IEEE/ACM International Symposium on Distributed Simulation and Real-Time Applications, DS-RT '08, pages 51--58, Washington, DC, USA, 2008. IEEE Computer Society. Available from http://dx.doi.org/10.1109/DS-RT.2008.36.
[8]
A. Feinberg. Project Voldemort: Reliable Distributed Storage. ICDE, 2011.
[9]
Brad Fitzpatrick. Distributed caching with memcached. Linux J., 2004(124):5--, August 2004. ISSN: 1075--3583. Available from http://dl.acm.org/citation.cfm?id=1012889.1012894.
[10]
Bogdan Ghit, Florin Pop, and Valentin Cristea. Epidemic-Style Global Load Monitoring in Large-Scale Overlay Networks. In Proceedings of the 2010 International Conference on P2P, Parallel, Grid, Cloud and Internet Computing, 3PGCIC '10, pages 393--398, Washington, DC, USA, 2010. IEEE Computer Society. Available from http://dx.doi.org/10.1109/3PGCIC.2010.62.
[11]
Gary Grider. Parallel Reconfigurable Observational Environment (PRObE), October 2012. Available from http://www.nmc-probe.org.
[12]
Krishna P. Gummadi, Richard J. Dunn, Stefan Saroiu, et al. Measurement, modeling, and analysis of a peer-to-peer file-sharing workload. In Proceedings of the nineteenth ACM symposium on Operating systems principles, SOSP '03, pages 314--329, Bolton Landing, NY, USA, 2003. ACM. Available from http://doi.acm.org/10.1145/945445.945475.
[13]
Maurice Herlihy. Wait-free synchronization. ACM Transactions on Programming Languages and Systems (TOPLAS), 13(1):124--149, Jan., 1991.
[14]
Michael A. Heroux. Toward Resilient Algorithms and Applications, April 2013. Available from http://www.sandia.gov/~maherou/docs/HerouxTowardResilientAlgsAndApps.pdf.
[15]
Morris A. Jette, Andy B. Yoo, and Mark Grondona. SLURM: Simple Linux utility for resource management. In Dror Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, editors, 9th International Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP 2003), volume 2862 of Lecture Notes in Computer Science, pages 44--60, Seattle, Washington, USA, June 24, 2003. Springer-Verlag.
[16]
David Karger, Eric Lehman, Tom Leighton, et al. Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the world wide web. In Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, STOC '97, pages 654--663, El Paso, Texas, United States, 1997. ACM. Available from http://doi.acm.org/10.1145/258533.258660.
[17]
Tony Vignaux Klaus Muller. Simpy:documentation, May 2010. Available from http://simpy.sourceforge.net/SimPyDocs/index.html.
[18]
Avinash Lakshman and Prashant Malik. Cassandra: a decentralized structured storage system. SIGOPS Oper. Syst. Rev., 44:35--40, April 2010. ISSN: 0163--5980. Available from http://doi.acm.org/10.1145/1773912.1773922.
[19]
Tonglin Li, Xiaobing Zhou, Kevin Brandstatter, et al. ZHT: A Light-weight Reliable Persistent Dynamic Scalable Zero-hop Distributed Hash Table. In Proceedings of the 27th IEEE International Parallel and Distributed Processing Symposium (IPDPS'13), IPDPS '13, Boston, MA, USA, 2013. IEEE Computer Society.
[20]
Eng Keong Lua, Jon Crowcroft, Marcelo Pias, Ravi Sharma, and Steven Lim. A survey and comparison of peer-to-peer overlay network schemes. IEEE Communications Surveys and Tutorials, 7(2):72--93, 2005.
[21]
Alberto Montresor and Márk Jelasity. PeerSim: A scalable P2P simulator. In Proc. of the 9th Int. Conference on Peer-to-Peer (P2P'09), pages 99--100, Seattle, WA, September 2009.
[22]
Ioan Raicu, Ian T. Foster, and Pete Beckman. Making a case for distributed file systems at exascale. In Proceedings of the third international workshop on Large-scale system and application performance, LSAP '11, pages 11--18, San Jose, California, USA, 2011. ACM. Available from http://doi.acm.org/10.1145/1996029.1996034.
[23]
Ioan Raicu, Ian T Foster, and Yong Zhao. Many-task computing for grids and supercomputers. In Many-Task Computing on Grids and Supercomputers, 2008. MTAGS 2008. Workshop on, pages 1--11. IEEE, 2008.
[24]
M. Raihan Rahman, W. Golab, A. AuYoung, K. Keeton, and J. J. Wylie. Toward a Principled Framework for Benchmarking Consistency. 2012.
[25]
Philip C. Roth, Dorian C. Arnold, and Barton P. Miller. MRNet: A software-based multicast/reduction network for scalable tools. In in: Proc. IEEE/ACM Supercomputing '03, 2003.
[26]
Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan. Chord: Ascalable peer-to-peer lookup service for internet applications. SIGCOMM Comput. Commun. Rev., 31(4):149--160, August 2001. ISSN: 0146--4833.
[27]
András Varga and Rudolf Hornig. An overview of the omnet++ simulation environment. In Proceedings of the 1st international conference on Simulation tools and techniques for communications, networks and systems & workshops, Simutools '08, pages 60:1--60:10, Marseille, France, 2008. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering). Available from http://dl.acm.org/citation.cfm?id=1416222.1416290.
[28]
Abhinav Vishnu, Amith R. Mamidala, Hyun-Wook Jin, and Dhabaleswar K. Panda. Performance Modeling of Subnet Management on Fat Tree InfiniBand Networks using OpenSM. In Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 18 - Volume 19, IPDPS '05, pages 296.2--, Washington, DC, USA, 2005. IEEE Computer Society. Available from http://dx.doi.org/10.1109/IPDPS.2005.339.
[29]
Werner Vogels. Eventually consistent. Queue, 6(6):14--19, October 2008. ISSN: 1542--7730. Available from http://doi.acm.org/10.1145/1466443.1466448.
[30]
Ke Wang, Kevin Brandstatter, and Ioan Raicu. Simmatrix: Simulator for many-task computing execution fabric at exascale. In 21st High Performance Computing Symposia (HPC'13), Part of the SCS Spring Simulation Multiconference (SpringSim'13) in cooperation with ACM/SIGSIM, San Diego, CA, USA, Apr., 2013.
[31]
KeWang, Anupam Rajendranl, and Ioan Raicu. "matrix: Many-task computing execution fabric at exascale". 2013. Available from http://datasys.cs.iit.edu/projects/MATRIX/index.html.
[32]
Jia Yu and Rajkumar Buyya. A taxonomy of workflow management systems for grid computing. Journal of Grid Computing, 3(3--4):171--200, 2005. ISSN: 1570--7873. Available from http://dx.doi.org/10.1007/s10723-005-9010-8.
[33]
Dongfang Zhao and Ioan Raicu. Distributed file systems for exascale computing. In Doctoral Showcase, SC'12: Proceedings of the 2012 ACM/IEEE Conference on Supercomputing, Salt Lake City, UT, November 2012.

Cited By

View all
  1. Using simulation to explore distributed key-value stores for extreme-scale system services

      Recommendations

      Comments

      Information & Contributors

      Information

      Published In

      cover image ACM Conferences
      SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
      November 2013
      1123 pages
      ISBN:9781450323789
      DOI:10.1145/2503210
      • General Chair:
      • William Gropp,
      • Program Chair:
      • Satoshi Matsuoka
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Sponsors

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 17 November 2013

      Permissions

      Request permissions for this article.

      Check for updates

      Author Tags

      1. discrete event simulation
      2. extreme scales
      3. key-value store
      4. system services

      Qualifiers

      • Research-article

      Funding Sources

      Conference

      SC13
      Sponsor:

      Acceptance Rates

      SC '13 Paper Acceptance Rate 91 of 449 submissions, 20%;
      Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

      Upcoming Conference

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)10
      • Downloads (Last 6 weeks)2
      Reflects downloads up to 06 Nov 2024

      Other Metrics

      Citations

      Cited By

      View all

      View Options

      Get Access

      Login options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Media

      Figures

      Other

      Tables

      Share

      Share

      Share this Publication link

      Share on social media