skip to main content
10.5555/1387589.1387601guideproceedingsArticle/Chapter ViewAbstractPublication PagesnsdiConference Proceedingsconference-collections
Article

Remus: high availability via asynchronous virtual machine replication

Published: 16 April 2008 Publication History

Abstract

Allowing applications to survive hardware failure is an expensive undertaking, which generally involves reengineering software to include complicated recovery logic as well as deploying special-purpose hardware; this represents a severe barrier to improving the dependability of large or legacy applications. We describe the construction of a general and transparent high availability service that allows existing, unmodified software to be protected from the failure of the physical machine on which it runs. Remus provides an extremely high degree of fault tolerance, to the point that a running system can transparently continue execution on an alternate physical host in the face of failure with only seconds of downtime, while completely preserving host state such as active network connections. Our approach encapsulates protected software in a virtual machine, asynchronously propagates changed state to a backup host at frequencies as high as forty times a second, and uses speculative execution to concurrently run the active VM slightly ahead of the replicated system state.

References

[1]
BARAK, A., AND WHEELER, R. Mosix: an integrated multiprocessor unix. 41-53.
[2]
BARHAM, P., DRAGOVIC, B., FRASER, K., HAND, S., HARRIS, T., HO, A., NEUGEBAUER, R., PRATT, I., AND WARFIELD, A. Xen and the art of virtualization. In SOSP '03: Proceedings of the nineteenth ACM symposium on Operating systems principles (New York, NY, USA, 2003), ACM Press, pp. 164-177.
[3]
BRADFORD, R., KOTSOVINOS, E., FELDMANN, A., AND SCHIÖBERG, H. Live wide-area migration of virtual machines including local persistent state. In VEE '07: Proceedings of the 3rd international conference on Virtual execution environments (New York, NY, USA, 2007), ACM Press, pp. 169-179.
[4]
BRESSOUD, T. C., AND SCHNEIDER, F. B. Hypervisor-based fault-tolerance. In Proceedings of the Fifteenth ACM Symposium on Operating System Principles (December 1995), pp. 1-11.
[5]
CHANDRA, S., AND CHEN, P. M. The impact of recovery mechanisms on the likelihood of saving corrupted state. In ISSRE '02: Proceedings of the 13th International Symposium on Software Reliability Engineering (ISSRE'02) (Washington, DC, USA, 2002), IEEE Computer Society, p. 91.
[6]
CLARK, C., FRASER, K., HAND, S., HANSEN, J. G., JUL, E., LIMPACH, C., PRATT, I., AND WARFIELD, A. Live migration of virtual machines. In Proceedings of the 2nd conference on Symposium on Networked Systems Design & Implementation (Berkeley, CA, USA, 2005), USENIX Association.
[7]
CULLY, B., AND WARFIELD, A. Secondsite: disaster protection for the common server. In HOTDEP'06: Proceedings of the 2nd conference on Hot Topics in System Dependability (Berkeley, CA, USA, 2006), USENIX Association.
[8]
DUNLAP, G. Execution Replay for Intrusion Analysis. PhD thesis, University of Michigan, 2006.
[9]
DUNLAP, G. W., KING, S. T., CINAR, S., BASRAI, M. A., AND CHEN, P. M. Revirt: Enabling intrusion analysis through virtual-machine logging and replay. In Proceedings of the 5th Symposium on Operating Systems Design & Implementation (OSDI 2002) (2002).
[10]
GUPTA, D., YOCUM, K., MCNETT, M., SNOEREN, A. C., VAHDAT, A., AND VOELKER, G. M. To infinity and beyond: time warped network emulation. In SOSP '05: Proceedings of the twentieth ACM symposium on Operating systems principles (2005).
[11]
HOWARD, J. H., KAZAR, M. L., MENEES, S. G., NICHOLS, D. A., SATYANARAYANAN, M., SIDEBOTHAM, R. N., AND WEST, M. J. Scale and performance in a distributed file system. ACM Transactions on Computer Systems 6, 1 (1988), 51-81.
[12]
HP. NonStop Computing. http://h20223.www2.hp.com/non-stopcomputing/cache/76385-0-0-0-121.aspx.
[13]
KING, S. T., DUNLAP, G. W., AND CHEN, P. M. Debugging operating systems with time-traveling virtual machines. In ATEC'05: Proceedings of the USENIX Annual Technical Conference 2005 on USENIX Annual Technical Conference (Berkeley, CA, USA, 2005), USENIX Association.
[14]
Lvm2. http://sources.redhat.com/lvm2/.
[15]
MARQUES, D., BRONEVETSKY, G., FERNANDES, R., PINGALI, K., AND STODGHILL, P. Optimizing checkpoint sizes in the c3 system. In 19th International Parallel and Distributed Processing Symposium (IPDPS 2005) (April 2005).
[16]
MCHARDY, P. Linux imq. http://www.linuximq.net/.
[17]
MEYER, D., AGGARWAL, G., CULLY, B., LEFEBVRE, G., HUTCHINSON, N., FEELEY, M., AND WARFIELD, A. Parallax: Virtual disks for virtual machines. In EuroSys '08: Proceedings of the ACM SIGOPS/EuroSys European Conference on Computer Systems 2008 (New York, NY, USA, 2008), ACM.
[18]
MULLENDER, S. J., VAN ROSSUM, G., TANENBAUM, A. S., VAN RENESSE, R., AND VAN STAVEREN, H. Amoeba: A distributed operating system for the 1990s. Computer 23, 5 (1990), 44-53.
[19]
netem. http://linux-net.osdl.org/index.php/Netem.
[20]
NIGHTINGALE, E. B., CHEN, P. M., AND FLINN, J. Speculative execution in a distributed file system. In SOSP '05: Proceedings of the twentieth ACM symposium on Operating systems principles (New York, NY, USA, 2005), ACM Press, pp. 191-205.
[21]
NIGHTINGALE, E. B., VEERARAGHAVAN, K., CHEN, P. M., AND FLINN, J. Rethink the sync. In USENIX'06: Proceedings of the 7th conference on USENIX Symposium on Operating Systems Design and Implementation (Berkeley, CA, USA, 2006), USENIX Association.
[22]
OSMAN, S., SUBHRAVETI, D., SU, G., AND NIEH, J. The design and implementation of zap: a system for migrating computing environments. SIGOPS Oper. Syst. Rev. 36, SI (2002), 361-376.
[23]
OUSTERHOUT, J. K., CHERENSON, A. R., DOUGLIS, F., NELSON, M. N., AND WELCH, B. B. The sprite network operating system. Computer 21, 2 (1988), 23-36.
[24]
PENG, G. Distributed checkpointing. Master's thesis, University of British Columbia, 2007.
[25]
RASHID, R. F., AND ROBERTSON, G. G. Accent: A communication oriented network operating system kernel. In SOSP '81: Proceedings of the eighth ACM symposium on Operating systems principles (New York, NY, USA, 1981), ACM Press, pp. 64-75.
[26]
REISNER, P., AND ELLENBERG, L. Drbd v8 -replicated storage with shared disk semantics. In Proceedings of the 12th International Linux System Technology Conference (October 2005).
[27]
RUSSELL, R. Netfilter. http://www.netfilter.org/.
[28]
SCHINDLER, J., AND GANGER, G. Automated disk drive characterization. Tech. Rep. CMU SCS Technical Report CMU-CS-99-176, Carnegie Mellon University, December 1999.
[29]
STELLNER, G. CoCheck: Checkpointing and Process Migration for MPI. In Proceedings of the 10th International Parallel Processing Symposium (IPPS '96) (Honolulu, Hawaii, 1996).
[30]
SYMANTEC CORPORATION. Veritas Cluster Server for VMware ESX. http://eval.symantec.com/mktginfo/products/Datasheets /High_Availability/vcs22vmware_datasheet.pdf, 2006.
[31]
VMWARE, INC. Vmware high availability (ha). http://www.vmware.com/products/vi/vc/ha.html, 2007.
[32]
WARFIELD, A. Virtual Devices for Virtual Machines. PhD thesis, University of Cambridge, 2006.
[33]
WARFIELD, A., ROSS, R., FRASER, K., LIMPACH, C., AND HAND, S. Parallax: managing storage for a million machines. In HOTOS'05: Proceedings of the 10th conference on Hot Topics in Operating Systems (Berkeley, CA, USA, 2005), USENIX Association.
[34]
XU, M., BODIK, R., AND HILL, M. D. A "flight data recorder" for enabling full-system multiprocessor deterministic replay. In ISCA '03: Proceedings of the 30th annual international symposium on Computer architecture (New York, NY, USA, 2003), ACM Press, pp. 122-135.
[35]
YANG, Q., XIAO, W., AND REN, J. Trap-array: A disk array architecture providing timely recovery to any point-in-time. In ISCA '06: Proceedings of the 33rd annual international symposium on Computer Architecture (Washington, DC, USA, 2006), IEEE Computer Society, pp. 289-301.

Cited By

View all

Recommendations

Comments

Information & Contributors

Information

Published In

cover image Guide Proceedings
NSDI'08: Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation
April 2008
437 pages
ISBN:1119995555221

Sponsors

  • USENIX Assoc: USENIX Assoc

Publisher

USENIX Association

United States

Publication History

Published: 16 April 2008

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 15 Sep 2024

Other Metrics

Citations

Cited By

View all

View Options

View options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media