DOI: 10.1145/2755644.2755648

High-Performance Storage Support for Scientific Applications on the Cloud

Published: 16 June 2015

Abstract

Although cloud computing has become one of the most popular paradigms for executing data-intensive applications (for example, Hadoop workloads), its storage subsystem is not optimized for scientific applications. We believe that, when executing scientific applications in the cloud, a node-local distributed storage architecture is key to overcoming the limitations of conventional shared/parallel storage systems. We analyze and evaluate four representative file systems (S3FS, HDFS, Ceph, and FusionFS) on three platforms (the Kodiak cluster, Amazon EC2, and FermiCloud) with a variety of benchmarks to explore how well these storage systems handle metadata-intensive, write-intensive, and read-intensive workloads.
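The paper's own benchmark harness is not reproduced on this page. As an illustration of the three workload classes named in the abstract, the following Python sketch times metadata-intensive, write-intensive, and read-intensive operations against any mounted file system (a local directory, or a FUSE mount point for systems such as S3FS or FusionFS). The function name, file count, and file size are illustrative assumptions, not details from the paper.

```python
import os
import tempfile
import time

def run_microbenchmark(root, num_files=64, file_size=4096):
    """Time three workload classes against a mounted file system:
    write-intensive, read-intensive, and metadata-intensive (stat/unlink).
    `root` must be a writable directory on the file system under test."""
    payload = b"x" * file_size
    paths = [os.path.join(root, f"bench_{i}.dat") for i in range(num_files)]

    # Write-intensive phase: fsync forces data through to the storage backend,
    # so caching in the client does not hide the true write path.
    t0 = time.perf_counter()
    for p in paths:
        with open(p, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())
    write_s = time.perf_counter() - t0

    # Read-intensive phase: read every file back in full.
    t0 = time.perf_counter()
    for p in paths:
        with open(p, "rb") as f:
            f.read()
    read_s = time.perf_counter() - t0

    # Metadata-intensive phase: one lookup (stat) and one mutation (unlink)
    # per file, which stresses the metadata service rather than data transfer.
    t0 = time.perf_counter()
    for p in paths:
        os.stat(p)
        os.unlink(p)
    meta_s = time.perf_counter() - t0

    mb = num_files * file_size / 1e6
    return {
        "write_MBps": mb / write_s,
        "read_MBps": mb / read_s,
        "metadata_ops_per_s": 2 * num_files / meta_s,
    }

if __name__ == "__main__":
    # By default, exercise a local temporary directory; point `root` at a
    # FUSE mount to benchmark a remote or distributed backend instead.
    with tempfile.TemporaryDirectory() as d:
        print(run_microbenchmark(d))
```

Pointing `root` at mounts backed by different storage systems gives a rough, like-for-like comparison of the three workload classes, though a serious evaluation would also vary concurrency and file-size distributions as the paper's platforms demand.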

Published In

cover image ACM Conferences
ScienceCloud '15: Proceedings of the 6th Workshop on Scientific Cloud Computing
June 2015
46 pages
ISBN:9781450335706
DOI:10.1145/2755644

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  • cloud computing
  • high-performance storage systems
  • scientific computing

Qualifiers

  • Short-paper

Funding Sources

  • KISTI
  • NSF CAREER
  • DOE

Conference

HPDC'15

Acceptance Rates

ScienceCloud '15 Paper Acceptance Rate: 3 of 6 submissions, 50%
Overall Acceptance Rate: 44 of 151 submissions, 29%
