research-article

Using simulation to explore distributed key-value stores for extreme-scale system services

Authors:

Abhishek Kulkarni,

Ioan RaicuAuthors Info & Claims

SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Article No.: 9, Pages 1 - 12

https://doi.org/10.1145/2503210.2503239

Published: 17 November 2013 Publication History

Abstract

Owing to the significant high rate of component failures at extreme scales, system services will need to be failure-resistant, adaptive and self-healing. A majority of HPC services are still designed around a centralized paradigm and hence are susceptible to scaling issues. Peer-to-peer services have proved themselves at scale for wide-area internet workloads. Distributed key-value stores (KVS) are widely used as a building block for these services, but are not prevalent in HPC services. In this paper, we simulate KVS for various service architectures and examine the design trade-offs as applied to HPC service workloads to support extreme-scale systems. The simulator is validated against existing distributed KVS-based services. Via simulation, we demonstrate how failure, replication, and consistency models affect performance at scale. Finally, we emphasize the general use of KVS to HPC services by feeding real HPC service workloads into the simulator and presenting a KVS-based distributed job launch prototype.

References

[1]

Overview of the ibm blue gene/p project. IBM Journal of Research and Development, 52(1.2):199--220, jan. 2008. ISSN: 0018--8646.

[2]

Nawab Ali, Philip Carns, Kamil Iskra, et al. Scalable I/O Forwarding Framework for High-Performance Computing Systems.

[3]

I. Baumgart, B. Heep, and S. Krause. Oversim: A flexible overlay network simulation framework. In IEEE Global Internet Symposium, 2007, pages 79--84, may 2007.

[4]

Robert D. Blumofe and Charles E. Leiserson. Scheduling multithreaded computations by work stealing. Journal of the ACM (JACM), 46(5):720--748, Sept., 1999.

Digital Library

[5]

Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, et al. Dynamo: Amazon's highly available key-value store. In Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles, SOSP '07, pages 205--220, Stevenson, Washington, USA, 2007. ACM. Available from http://doi.acm.org/10.1145/1294261.1294281.

Digital Library

[6]

Ibrahima Diane, Ibrahima Niang, and Bamba Gueye. A Hierarchical DHT for Fault Tolerant Management in P2P-SIP Networks. In Proceedings of the 2010 IEEE 16th International Conference on Parallel and Distributed Systems, ICPADS '10, pages 788--793, Washington, DC, USA, 2010. IEEE Computer Society. Available from http://dx.doi.org/10.1109/ICPADS.2010.43.

Digital Library

[7]

Tien Tuan Anh Dinh, Georgios Theodoropoulos, and Rob Minson. Evaluating Large Scale Distributed Simulation of P2P Networks. In Proceedings of the 2008 12th IEEE/ACM International Symposium on Distributed Simulation and Real-Time Applications, DS-RT '08, pages 51--58, Washington, DC, USA, 2008. IEEE Computer Society. Available from http://dx.doi.org/10.1109/DS-RT.2008.36.

Digital Library

[8]

A. Feinberg. Project Voldemort: Reliable Distributed Storage. ICDE, 2011.

[9]

Brad Fitzpatrick. Distributed caching with memcached. Linux J., 2004(124):5--, August 2004. ISSN: 1075--3583. Available from http://dl.acm.org/citation.cfm?id=1012889.1012894.

Digital Library

[10]

Bogdan Ghit, Florin Pop, and Valentin Cristea. Epidemic-Style Global Load Monitoring in Large-Scale Overlay Networks. In Proceedings of the 2010 International Conference on P2P, Parallel, Grid, Cloud and Internet Computing, 3PGCIC '10, pages 393--398, Washington, DC, USA, 2010. IEEE Computer Society. Available from http://dx.doi.org/10.1109/3PGCIC.2010.62.

Digital Library

[11]

Gary Grider. Parallel Reconfigurable Observational Environment (PRObE), October 2012. Available from http://www.nmc-probe.org.

[12]

Krishna P. Gummadi, Richard J. Dunn, Stefan Saroiu, et al. Measurement, modeling, and analysis of a peer-to-peer file-sharing workload. In Proceedings of the nineteenth ACM symposium on Operating systems principles, SOSP '03, pages 314--329, Bolton Landing, NY, USA, 2003. ACM. Available from http://doi.acm.org/10.1145/945445.945475.

Digital Library

[13]

Maurice Herlihy. Wait-free synchronization. ACM Transactions on Programming Languages and Systems (TOPLAS), 13(1):124--149, Jan., 1991.

Digital Library

[14]

Michael A. Heroux. Toward Resilient Algorithms and Applications, April 2013. Available from http://www.sandia.gov/~maherou/docs/HerouxTowardResilientAlgsAndApps.pdf.

[15]

Morris A. Jette, Andy B. Yoo, and Mark Grondona. SLURM: Simple Linux utility for resource management. In Dror Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, editors, 9th International Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP 2003), volume 2862 of Lecture Notes in Computer Science, pages 44--60, Seattle, Washington, USA, June 24, 2003. Springer-Verlag.

[16]

David Karger, Eric Lehman, Tom Leighton, et al. Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the world wide web. In Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, STOC '97, pages 654--663, El Paso, Texas, United States, 1997. ACM. Available from http://doi.acm.org/10.1145/258533.258660.

Digital Library

[17]

Tony Vignaux Klaus Muller. Simpy:documentation, May 2010. Available from http://simpy.sourceforge.net/SimPyDocs/index.html.

[18]

Avinash Lakshman and Prashant Malik. Cassandra: a decentralized structured storage system. SIGOPS Oper. Syst. Rev., 44:35--40, April 2010. ISSN: 0163--5980. Available from http://doi.acm.org/10.1145/1773912.1773922.

Digital Library

[19]

Tonglin Li, Xiaobing Zhou, Kevin Brandstatter, et al. ZHT: A Light-weight Reliable Persistent Dynamic Scalable Zero-hop Distributed Hash Table. In Proceedings of the 27th IEEE International Parallel and Distributed Processing Symposium (IPDPS'13), IPDPS '13, Boston, MA, USA, 2013. IEEE Computer Society.

Digital Library

[20]

Eng Keong Lua, Jon Crowcroft, Marcelo Pias, Ravi Sharma, and Steven Lim. A survey and comparison of peer-to-peer overlay network schemes. IEEE Communications Surveys and Tutorials, 7(2):72--93, 2005.

Digital Library

[21]

Alberto Montresor and Márk Jelasity. PeerSim: A scalable P2P simulator. In Proc. of the 9th Int. Conference on Peer-to-Peer (P2P'09), pages 99--100, Seattle, WA, September 2009.

[22]

Ioan Raicu, Ian T. Foster, and Pete Beckman. Making a case for distributed file systems at exascale. In Proceedings of the third international workshop on Large-scale system and application performance, LSAP '11, pages 11--18, San Jose, California, USA, 2011. ACM. Available from http://doi.acm.org/10.1145/1996029.1996034.

Digital Library

[23]

Ioan Raicu, Ian T Foster, and Yong Zhao. Many-task computing for grids and supercomputers. In Many-Task Computing on Grids and Supercomputers, 2008. MTAGS 2008. Workshop on, pages 1--11. IEEE, 2008.

[24]

M. Raihan Rahman, W. Golab, A. AuYoung, K. Keeton, and J. J. Wylie. Toward a Principled Framework for Benchmarking Consistency. 2012.

[25]

Philip C. Roth, Dorian C. Arnold, and Barton P. Miller. MRNet: A software-based multicast/reduction network for scalable tools. In in: Proc. IEEE/ACM Supercomputing '03, 2003.

Digital Library

[26]

Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan. Chord: Ascalable peer-to-peer lookup service for internet applications. SIGCOMM Comput. Commun. Rev., 31(4):149--160, August 2001. ISSN: 0146--4833.

Digital Library

[27]

András Varga and Rudolf Hornig. An overview of the omnet++ simulation environment. In Proceedings of the 1st international conference on Simulation tools and techniques for communications, networks and systems & workshops, Simutools '08, pages 60:1--60:10, Marseille, France, 2008. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering). Available from http://dl.acm.org/citation.cfm?id=1416222.1416290.

Digital Library

[28]

Abhinav Vishnu, Amith R. Mamidala, Hyun-Wook Jin, and Dhabaleswar K. Panda. Performance Modeling of Subnet Management on Fat Tree InfiniBand Networks using OpenSM. In Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 18 - Volume 19, IPDPS '05, pages 296.2--, Washington, DC, USA, 2005. IEEE Computer Society. Available from http://dx.doi.org/10.1109/IPDPS.2005.339.

Digital Library

[29]

Werner Vogels. Eventually consistent. Queue, 6(6):14--19, October 2008. ISSN: 1542--7730. Available from http://doi.acm.org/10.1145/1466443.1466448.

Digital Library

[30]

Ke Wang, Kevin Brandstatter, and Ioan Raicu. Simmatrix: Simulator for many-task computing execution fabric at exascale. In 21st High Performance Computing Symposia (HPC'13), Part of the SCS Spring Simulation Multiconference (SpringSim'13) in cooperation with ACM/SIGSIM, San Diego, CA, USA, Apr., 2013.

Digital Library

[31]

KeWang, Anupam Rajendranl, and Ioan Raicu. "matrix: Many-task computing execution fabric at exascale". 2013. Available from http://datasys.cs.iit.edu/projects/MATRIX/index.html.

[32]

Jia Yu and Rajkumar Buyya. A taxonomy of workflow management systems for grid computing. Journal of Grid Computing, 3(3--4):171--200, 2005. ISSN: 1570--7873. Available from http://dx.doi.org/10.1007/s10723-005-9010-8.

[33]

Dongfang Zhao and Ioan Raicu. Distributed file systems for exascale computing. In Doctoral Showcase, SC'12: Proceedings of the 2012 ACM/IEEE Conference on Supercomputing, Salt Lake City, UT, November 2012.

Cited By

Anwar ACheng YHuang HHan JSim HLee DDouglis FButt A(2020)Customizable Scale-Out Key-Value StoresIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2020.298264031:9(2081-2096)Online publication date: 1-Sep-2020
https://doi.org/10.1109/TPDS.2020.2982640
Cotroneo DNatella RRosiello S(2020)Dependability Evaluation of Middleware Technology for Large-scale Distributed Caching2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE)10.1109/ISSRE5003.2020.00029(218-228)Online publication date: Oct-2020
https://doi.org/10.1109/ISSRE5003.2020.00029
Dongarra JTourancheau BKim JVetter J(2019)Implementing efficient data compression and encryption in a persistent key-value store for HPCInternational Journal of High Performance Computing Applications10.1177/109434201984726433:6(1098-1112)Online publication date: 1-Nov-2019
https://dl.acm.org/doi/10.1177/1094342019847264
Show More Cited By

Using simulation to explore distributed key-value stores for extreme-scale system services
1. Computer systems organization
  1. Architectures
2. Software and its engineering
  1. Software organization and properties
    1. Software system structures

Recommendations

LSM-tree managed storage for large-scale key-value store
SoCC '17: Proceedings of the 2017 Symposium on Cloud Computing

Key-value stores are increasingly adopting LSM-trees as their enabling data structure in the backend storage, and persisting their clustered data through a file system. A file system is expected to not only provide file/directory abstraction to organize ...
Towards Read-Intensive Key-Value Stores with Tidal Structure Based on LSM-Tree
ASPDAC '20: Proceedings of the 25th Asia and South Pacific Design Automation Conference

Key-value store has played a critical role in many large-scale data storage applications. The log-structured merge-tree (LSM-tree) based key-value store achieves excellent performance on write-intensive workloads which is mainly benefited from the ...
MTDB: an LSM-tree-based key-value store using a multi-tree structure to improve read performance
Abstract
Traditional LSM-tree-based key-value storage systems face significant read amplification issues due to the multi-level structure of LSM-tree, the unordered SSTable files in Level 0, and the lack of an in-memory index structure for key-value pairs. ...

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences

SC '13: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

November 2013

1123 pages

ISBN:9781450323789

DOI:10.1145/2503210

General Chair:
William Gropp
University of Illinois at Urbana-Champaign, Urbana, Illinois
,
Program Chair:
Satoshi Matsuoka
Tokyo Institute of Technology, Tokyo, Japan

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 November 2013

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

National Science Foundation

Conference

SC13

Sponsor:

SIGHPC
SIGARCH
IEEE-CS

SC13: International Conference for High Performance Computing, Networking, Storage and Analysis

November 17 - 21, 2013

Colorado, Denver

Acceptance Rates

SC '13 Paper Acceptance Rate 91 of 449 submissions, 20%;

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Upcoming Conference

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

25
Total Citations
View Citations
315
Total Downloads

Downloads (Last 12 months)10
Downloads (Last 6 weeks)2

Reflects downloads up to 06 Nov 2024

Other Metrics

View Author Metrics

Citations

Cited By

Anwar ACheng YHuang HHan JSim HLee DDouglis FButt A(2020)Customizable Scale-Out Key-Value StoresIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2020.298264031:9(2081-2096)Online publication date: 1-Sep-2020
https://doi.org/10.1109/TPDS.2020.2982640
Cotroneo DNatella RRosiello S(2020)Dependability Evaluation of Middleware Technology for Large-scale Distributed Caching2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE)10.1109/ISSRE5003.2020.00029(218-228)Online publication date: Oct-2020
https://doi.org/10.1109/ISSRE5003.2020.00029
Dongarra JTourancheau BKim JVetter J(2019)Implementing efficient data compression and encryption in a persistent key-value store for HPCInternational Journal of High Performance Computing Applications10.1177/109434201984726433:6(1098-1112)Online publication date: 1-Nov-2019
https://dl.acm.org/doi/10.1177/1094342019847264
Anwar ACheng YHuang HHan JSim HLee DDouglis FButt A(2018)bespoKVProceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis10.5555/3291656.3291659(1-16)Online publication date: 11-Nov-2018
https://dl.acm.org/doi/10.5555/3291656.3291659
Anwar ACheng YHuang HHan JSim HLee DDouglis FButt A(2018)bespoKVProceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis10.1109/SC.2018.00005(1-16)Online publication date: 11-Nov-2018
https://dl.acm.org/doi/10.1109/SC.2018.00005
Kim JLee SVetter JMohr BRaghavan P(2017)PapyrusKVProceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis10.1145/3126908.3126943(1-14)Online publication date: 12-Nov-2017
https://dl.acm.org/doi/10.1145/3126908.3126943
Zhao DQiao KZhou ZLi TLu ZXu X(2017)Toward Efficient and Flexible Metadata Indexing of Big Data SystemsIEEE Transactions on Big Data10.1109/TBDATA.2016.26403013:1(107-117)Online publication date: 1-Mar-2017
https://doi.org/10.1109/TBDATA.2016.2640301
Huang XWang JQiao JZheng LZhang JWong R(2017)Performance and Replica Consistency Simulation for Quorum-Based NoSQL System CassandraApplication and Theory of Petri Nets and Concurrency10.1007/978-3-319-57861-3_6(78-98)Online publication date: 5-May-2017
https://doi.org/10.1007/978-3-319-57861-3_6
Wang KKulkarni ALang MArnold DRaicu I(2016)Exploring the Design Tradeoffs for Extreme-Scale High-Performance Computing System SoftwareIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2015.243085227:4(1070-1084)Online publication date: 1-Apr-2016
https://dl.acm.org/doi/10.1109/TPDS.2015.2430852
(2016)Toward high-performance key-value stores through GPU encoding and locality-aware encodingJournal of Parallel and Distributed Computing10.1016/j.jpdc.2016.04.01596:C(27-37)Online publication date: 1-Oct-2016
https://dl.acm.org/doi/10.1016/j.jpdc.2016.04.015
Show More Cited By

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents