DOI: 10.1145/1851476.1851537
research-article

AzureBlast: a case study of developing science applications on the cloud

Published: 21 June 2010

Abstract

Cloud computing has emerged as a new approach to large-scale computing and is attracting considerable attention from the scientific and research computing communities. Despite its growing popularity, it is still unclear how well the cloud model of computation will serve scientific applications. In this paper we analyze the applicability of the cloud to the sciences by investigating an implementation of BLAST, a well-known and computationally intensive algorithm. BLAST is a popular life sciences algorithm used widely in bioinformatics research. It makes an excellent case study because it is crucial to many life science applications and its characteristics are representative of many applications important to data-intensive scientific research. We introduce a methodology for studying the applicability of cloud platforms to scientific computing and analyze the results of our study. In particular, we examine best practices for handling large-scale parallelism and large volumes of data. Although we carry out our performance evaluation on Microsoft's Windows Azure, the results readily generalize to other cloud platforms.
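To make the idea of handling large-scale parallelism concrete, here is a minimal sketch of one common way to parallelize a BLAST workload: split the input query sequences into batches, let a pool of workers pull batches from a shared task queue, and merge the per-batch outputs in order. This is an illustrative assumption, not the implementation described in the paper. The in-process `queue.Queue` stands in for a cloud message queue, and `partition`, `fake_blast`, and `run_bag_of_tasks` are hypothetical names. `fake_blast` does not run BLAST; it only returns sequence lengths so the sketch runs as written.

```python
import threading
from queue import Queue, Empty


def partition(queries, batch_size):
    """Split (name, sequence) pairs into contiguous batches; each batch is one task."""
    return [queries[i:i + batch_size] for i in range(0, len(queries), batch_size)]


def fake_blast(batch):
    """Hypothetical stand-in for a BLAST run on one batch of queries.
    It only reports each query's length so the sketch runs without BLAST installed."""
    return [(name, len(seq)) for name, seq in batch]


def run_bag_of_tasks(queries, batch_size, n_workers):
    # Enqueue every batch up front; the Queue plays the role of a cloud message queue.
    tasks = Queue()
    for task_id, batch in enumerate(partition(queries, batch_size)):
        tasks.put((task_id, batch))

    results = {}
    lock = threading.Lock()

    def worker():
        # Each worker keeps pulling tasks until the queue is empty, then exits.
        while True:
            try:
                task_id, batch = tasks.get_nowait()
            except Empty:
                return
            out = fake_blast(batch)
            with lock:
                results[task_id] = out

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Merge step: concatenate per-batch results in original task order.
    merged = []
    for task_id in sorted(results):
        merged.extend(results[task_id])
    return merged


if __name__ == "__main__":
    queries = [("q1", "ACGT"), ("q2", "AC"), ("q3", "ACGTA")]
    print(run_bag_of_tasks(queries, batch_size=2, n_workers=2))
    # prints [('q1', 4), ('q2', 2), ('q3', 5)]
```

The batch size trades per-task overhead against load balance: larger batches mean fewer queue operations, while smaller batches let faster workers absorb more of the work.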

Information

Published In

HPDC '10: Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
June 2010
911 pages
ISBN:9781605589428
DOI:10.1145/1851476
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. BLAST
  2. Windows Azure
  3. cloud computing

Conference

HPDC '10

Acceptance Rates

Overall Acceptance Rate 166 of 966 submissions, 17%

Article Metrics

  • Downloads (last 12 months): 18
  • Downloads (last 6 weeks): 1
Reflects downloads up to 03 Jan 2025
