A framework for elastic execution of existing mpi programs
A Raveendran, T Bicer… - 2011 IEEE International …, 2011 - ieeexplore.ieee.org
2011 IEEE International Symposium on Parallel and Distributed …, 2011 - ieeexplore.ieee.org
There is a clear trend towards using cloud resources in the scientific and HPC communities, with a key attraction of the cloud being the elasticity it offers. When executing HPC applications in a cloud environment, it is clearly desirable to exploit this elasticity, increasing or decreasing the number of instances an application runs on during its execution to meet time and/or cost constraints. Unfortunately, HPC applications have almost always been designed to use a fixed number of resources. This paper describes our initial work towards making existing MPI applications elastic in a cloud framework. Given the limitations of currently available MPI implementations, we support adaptation by terminating one execution and restarting the program on a different number of instances. The components of our envisioned system include a decision layer that considers time and cost constraints, a framework for modifying MPI programs, and cloud-based runtime support that enables redistribution of saved data and supports automated resource allocation and application restart on a different number of nodes. Using two MPI applications, we demonstrate the feasibility of our approach and show that outputting, redistributing, and reading back data can be a reasonable way to make existing MPI applications elastic.
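The "output, redistribute, read back" step can be illustrated with a minimal sketch. It assumes (hypothetically; the abstract does not specify the data layout) that the application state is a single 1-D array split into contiguous blocks, one block saved per old process. Given the saved blocks from N processes, the sketch computes the balanced blocks each of M restarted processes should read. The names `block_range` and `redistribute` are illustrative, not part of the authors' framework.

```python
from typing import List, Sequence, Tuple


def block_range(total: int, nprocs: int, rank: int) -> Tuple[int, int]:
    """Half-open [start, end) of rank's block in a balanced block distribution.

    The first `total % nprocs` ranks get one extra element.
    """
    base, extra = divmod(total, nprocs)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return start, end


def redistribute(old_chunks: Sequence[Sequence[int]], new_nprocs: int) -> List[List[int]]:
    """Rebuild blocks for new_nprocs processes from chunks saved by the old run.

    Assumption: old chunks are contiguous and stored in rank order, so their
    global offsets follow from their actual lengths.
    """
    offsets = [0]
    for chunk in old_chunks:
        offsets.append(offsets[-1] + len(chunk))
    total = offsets[-1]

    new_chunks: List[List[int]] = []
    for rank in range(new_nprocs):
        start, end = block_range(total, new_nprocs, rank)
        block: List[int] = []
        # Copy the overlap of [start, end) with every saved chunk.
        for q, chunk in enumerate(old_chunks):
            lo, hi = max(start, offsets[q]), min(end, offsets[q + 1])
            if lo < hi:
                block.extend(chunk[lo - offsets[q]:hi - offsets[q]])
        new_chunks.append(block)
    return new_chunks


if __name__ == "__main__":
    data = list(range(10))
    saved = [data[slice(*block_range(10, 4, r))] for r in range(4)]
    print(saved)                   # blocks written by 4 old processes
    print(redistribute(saved, 3))  # blocks read by 3 restarted processes
```

In a real MPI setting each new rank would compute only its own overlap list and read those byte ranges from the saved files. The single-process loop here just makes the index arithmetic easy to check.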
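A decision layer that weighs time against cost can be sketched as a search over candidate instance counts. The performance and pricing model below is entirely hypothetical and is not taken from the paper: remaining work is measured in single-instance hours, speedup is linear with a fixed parallel efficiency, each restart adds a fixed overhead, and cost is instances × hourly price × hours. Under these assumptions the layer picks the cheapest instance count that still meets the deadline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Estimate:
    instances: int
    hours: float
    cost: float


def estimate(n: int, remaining_work: float, efficiency: float,
             restart_hours: float, price_per_hour: float) -> Estimate:
    """Hypothetical model: restart overhead plus linearly scaled remaining work."""
    hours = restart_hours + remaining_work / (n * efficiency)
    return Estimate(n, hours, n * price_per_hour * hours)


def choose_instances(remaining_work: float, deadline_hours: float,
                     max_instances: int, price_per_hour: float = 1.0,
                     efficiency: float = 0.8,
                     restart_hours: float = 0.1) -> Optional[Estimate]:
    """Return the cheapest estimate that meets the deadline, or None if none does."""
    candidates = (estimate(n, remaining_work, efficiency, restart_hours, price_per_hour)
                  for n in range(1, max_instances + 1))
    feasible = [e for e in candidates if e.hours <= deadline_hours]
    return min(feasible, key=lambda e: e.cost) if feasible else None


if __name__ == "__main__":
    # 8 single-instance hours of work left, 3-hour deadline, up to 16 instances.
    print(choose_instances(remaining_work=8.0, deadline_hours=3.0, max_instances=16))
```

With a constant efficiency, cost grows with the instance count, so this model simply selects the fewest instances that meet the deadline. A more realistic model (efficiency that drops as instances are added, or billing rounded up to whole hours) would make the trade-off less trivial.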