Adaptive Process Migrations in Coupled Applications for Exchanging Data in Local File Cache

Published: 31 July 2018

Abstract

Many problems in science and engineering are simulated as a set of mutually interacting models, resulting in a coupled or multiphysics application. These component models pose challenges stemming from their interdisciplinary nature and from their computational and algorithmic complexity. Because such models are generally developed and maintained independently, they commonly rely on the global file system to exchange their data within the coupled application.
To make effective use of the local file cache on the compute node for exchanging data among the processes of such applications, and consequently to boost I/O performance, this article presents a novel mechanism that migrates a process from one compute node to another on the basis of block I/O dependency. In the proposed mechanism, the block I/O dependency between two processes running on different nodes is profiled as block access similarity using Cohen's kappa statistic. A process is then dynamically migrated from its source node to a destination node hosting another process with which it has a heavy block I/O dependency. As a result, the two processes can exchange their data through the local file cache instead of the global file system, reducing I/O time. The experimental results demonstrate that I/O performance is significantly improved and that application execution time decreases accordingly.
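For illustration, the following sketch shows how the block access similarity between two processes might be computed with Cohen's kappa over per-block access bitmaps collected during a profiling window, and how a migration decision could hinge on a similarity threshold. This is a minimal sketch, not the article's implementation: the trace representation, the threshold value, and all function names are assumptions made here.

```python
from typing import Sequence

def cohens_kappa(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Cohen's kappa between two binary block-access vectors.

    a[i] / b[i] indicate whether process A / process B accessed
    block i during the profiling window.
    """
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and equal-length")
    n = len(a)
    # Contingency counts: both accessed, only A, only B, neither.
    n11 = sum(x and y for x, y in zip(a, b))
    n10 = sum(x and not y for x, y in zip(a, b))
    n01 = sum(not x and y for x, y in zip(a, b))
    n00 = n - n11 - n10 - n01
    p_o = (n11 + n00) / n        # observed agreement
    p_a1 = (n11 + n10) / n       # fraction of blocks A accessed
    p_b1 = (n11 + n01) / n       # fraction of blocks B accessed
    p_e = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)  # chance agreement
    if p_e == 1.0:               # degenerate: both vectors are constant
        return 1.0
    return (p_o - p_e) / (1 - p_e)

# Hypothetical migration policy: migrate a process to the node of a
# remote process whose block access similarity exceeds a threshold.
KAPPA_THRESHOLD = 0.6  # illustrative value, not taken from the article

def should_migrate(local_trace: Sequence[bool],
                   remote_trace: Sequence[bool]) -> bool:
    return cohens_kappa(local_trace, remote_trace) >= KAPPA_THRESHOLD
```

A kappa close to 1 means the two processes touch largely the same blocks, so co-locating them lets their data exchange be served from the node-local file cache rather than the global file system.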


Cited By

  • (2020) Toward Efficient Block Replication Management in Distributed Storage. ACM Transactions on Modeling and Performance Evaluation of Computing Systems 5, 3 (Oct. 2020), 1--27. DOI: 10.1145/3412450


    Information & Contributors

    Information

    Published In

    ACM Transactions on Autonomous and Adaptive Systems, Volume 13, Issue 2
    June 2018, 113 pages
    ISSN: 1556-4665
    EISSN: 1556-4703
    DOI: 10.1145/3243657

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 31 July 2018
    Accepted: 01 April 2018
    Revised: 01 February 2018
    Received: 01 July 2016
    Published in TAAS Volume 13, Issue 2


    Author Tags

    1. Coupled application
    2. I/O performance
    3. block I/O dependency
    4. computing process migration
    5. distributed file systems

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • the Fundamental Research Funds for the Central Universities
    • Hunan Provincial Natural Science Foundation of China
    • the Opening Project of State Key Laboratory for Novel Software Technology
