DOI: 10.1145/1272366.1272383
Article

Data driven workflow planning in cluster management systems

Published: 25 June 2007

Abstract

Traditional scientific computing has been associated with harnessing computation cycles within and across clusters of machines. In recent years, scientific applications have become increasingly data-intensive. This is especially true in the fields of astronomy and high energy physics. Furthermore, the lowered cost of disks and commodity machines has led to a dramatic increase in the amount of free disk space spread across machines in a cluster. This space is not being exploited by traditional distributed computing tools. In this paper we have evaluated ways to improve the data management capabilities of Condor, a popular distributed computing system. We have augmented the Condor system by providing the capability to store data used and produced by workflows on the disks of machines in the cluster. We have also replaced the Condor matchmaker with a new workflow planning framework that is cognizant of dependencies between jobs in a workflow and exploits these new data storage capabilities to produce workflow schedules. We show that our data caching and workflow planning framework can significantly reduce response times for data-intensive workflows by reducing data transfer over the network in a cluster. We also consider ways in which this planning framework can be made adaptive in a dynamic, multi-user, failure-prone environment.
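As an illustration of the approach summarized above (placing jobs on machines that already hold their input data and caching workflow outputs on local disks), the following Python sketch shows a greedy, data-locality-aware planner. It is not the system described in the paper: the Job and Machine structures, the file sizes, and the simple "most cached input bytes" heuristic are assumptions made only for this example.

    # Illustrative sketch (not the authors' implementation): a greedy,
    # data-locality-aware planner for a workflow of jobs with file dependencies.
    # Job/Machine fields, file sizes, and the cost model are assumed for the example.
    from dataclasses import dataclass, field

    @dataclass
    class Job:
        name: str
        inputs: set                               # files this job reads
        outputs: set                              # files this job writes
        deps: set = field(default_factory=set)    # jobs that must finish first

    @dataclass
    class Machine:
        name: str
        cached: set                               # files already on this machine's local disk

    def plan(jobs, machines, file_sizes):
        """Assign each ready job to the machine caching the most input bytes,
        so less data moves over the network; outputs are cached where produced."""
        done, schedule = set(), []
        remaining = {j.name: j for j in jobs}
        while remaining:
            ready = [j for j in remaining.values() if j.deps <= done]
            for job in ready:
                # Prefer the machine that already holds the largest share of the inputs.
                best = max(machines,
                           key=lambda m: sum(file_sizes[f] for f in job.inputs & m.cached))
                best.cached |= job.inputs | job.outputs   # cache inputs and new outputs locally
                schedule.append((job.name, best.name))
                done.add(job.name)
                del remaining[job.name]
        return schedule

    # Example: a two-stage pipeline where stage2 reuses stage1's output.
    sizes = {"raw.dat": 500, "stage1.out": 200}
    jobs = [Job("stage1", {"raw.dat"}, {"stage1.out"}),
            Job("stage2", {"stage1.out"}, {"final.out"}, deps={"stage1"})]
    machines = [Machine("node-a", {"raw.dat"}), Machine("node-b", set())]
    print(plan(jobs, machines, sizes))   # stage2 follows its input to node-a

Weighing machine load or whole dependency chains in the placement choice, as the paper's adaptive planning discussion suggests, would move this sketch closer to a full workflow planner.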




Published In

HPDC '07: Proceedings of the 16th International Symposium on High Performance Distributed Computing
June 2007
256 pages
ISBN:9781595936738
DOI:10.1145/1272366
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 June 2007


Author Tags

  1. cluster management
  2. condor
  3. data management
  4. planning
  5. scheduling
  6. scientific computing
  7. workflow management

Qualifiers

  • Article

Conference

HPDC07

Acceptance Rates

Overall Acceptance Rate 166 of 966 submissions, 17%
