Remus: high availability via asynchronous virtual machine replication

Authors: Geoffrey Lefebvre, Norm Hutchinson, Andrew Warfield

NSDI'08: Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation

Pages 161 - 174

Published: 16 April 2008

Abstract

Allowing applications to survive hardware failure is an expensive undertaking, which generally involves reengineering software to include complicated recovery logic as well as deploying special-purpose hardware; this represents a severe barrier to improving the dependability of large or legacy applications. We describe the construction of a general and transparent high availability service that allows existing, unmodified software to be protected from the failure of the physical machine on which it runs. Remus provides an extremely high degree of fault tolerance, to the point that a running system can transparently continue execution on an alternate physical host in the face of failure with only seconds of downtime, while completely preserving host state such as active network connections. Our approach encapsulates protected software in a virtual machine, asynchronously propagates changed state to a backup host at frequencies as high as forty times a second, and uses speculative execution to concurrently run the active VM slightly ahead of the replicated system state.
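
The checkpoint-and-replicate cycle in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the authors' implementation: the class names `Primary` and `Backup`, the page dictionary, and the inline (rather than truly asynchronous) transfer are all hypothetical. Holding outbound packets until the backup acknowledges a checkpoint is also an assumption here; the abstract itself only says that the active VM runs slightly ahead of the replicated state.

```python
from dataclasses import dataclass, field

# 40 checkpoints per second (the upper rate quoted in the abstract)
# corresponds to a 25 ms epoch.
EPOCH_MS = 1000 / 40


@dataclass
class Backup:
    """Hypothetical backup host: holds the last complete checkpoint."""
    pages: dict = field(default_factory=dict)
    epoch: int = 0

    def apply(self, epoch, dirty_pages):
        # Commit the received checkpoint, then acknowledge its epoch.
        self.pages.update(dirty_pages)
        self.epoch = epoch
        return epoch


@dataclass
class Primary:
    """Hypothetical active host: runs ahead of the replicated state."""
    pages: dict = field(default_factory=dict)
    dirty: dict = field(default_factory=dict)
    held_output: list = field(default_factory=list)
    released: list = field(default_factory=list)
    epoch: int = 0

    def write(self, page, value):
        self.pages[page] = value
        self.dirty[page] = value  # remember state changed in this epoch

    def send(self, packet):
        # Assumption: output is buffered until the state that produced
        # it is safely held by the backup.
        self.held_output.append(packet)

    def checkpoint(self, backup):
        self.epoch += 1
        snapshot, self.dirty = self.dirty, {}
        pending, self.held_output = self.held_output, []
        # A real system would resume the VM here and transmit the
        # snapshot in the background; this sketch transfers it inline.
        acked = backup.apply(self.epoch, snapshot)
        if acked == self.epoch:
            self.released.extend(pending)


backup = Backup()
vm = Primary()
vm.write(0, "a")
vm.send("reply-1")       # held: the backup has not seen page 0 yet
vm.checkpoint(backup)    # backup now holds page 0; "reply-1" is released
```

If the primary failed before `checkpoint` completed, the backup would resume from its last committed pages, and because the held packet was never released, no external client would have observed state that the backup lacks.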

Published In

NSDI'08: Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation

April 2008

437 pages

ISBN: 1119995555221

Editors: Jon Crowcroft (University of Cambridge), Mike Dahlin (University of Texas at Austin)

Sponsors

USENIX Association

Publisher

USENIX Association, United States
