Devirtualizable virtual machines enabling general, single-node, online maintenance

Published: 07 October 2004

Abstract

Maintenance is the dominant source of downtime at high availability sites. Unfortunately, the dominant mechanism for reducing this downtime, cluster rolling upgrade, has two shortcomings that have prevented its broad acceptance. First, cluster-style maintenance over many nodes is typically performed a few nodes at a time, making maintenance slow and often impractical. Second, cluster-style maintenance does not work on single-node systems, despite the fact that their unavailability during maintenance can be painful for organizations. In this paper, we propose a novel technique for online maintenance that uses virtual machines to provide maintenance on single nodes, allowing parallel maintenance over multiple nodes, and online maintenance for standalone servers. We present the Microvisor, our prototype virtual machine system that is custom tailored to the needs of online maintenance. Unlike general purpose virtual machine environments that induce continual 10-20% overhead, the Microvisor virtualizes the hardware only during periods of active maintenance, letting the guest OS run at full speed most of the time. Unlike past attempts at virtual machine optimization, we do not compromise OS transparency. We instead give up generality and tailor our virtual machine system to the minimum needs of online maintenance, eschewing features, such as I/O and memory virtualization, that it does not strictly require. The result is a very thin virtual machine system that induces only 5.6% CPU overhead when virtualizing the hardware, and zero CPU overhead when devirtualized. Using the Microvisor, we demonstrate an online OS upgrade on a live, single-node web server, reducing downtime from one hour to less than one minute.
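To make the overhead claim concrete, the following is a minimal sketch of the time-weighted average CPU overhead of a system that is virtualized only during maintenance. The 5.6% and zero overhead figures come from the abstract; the function name `average_cpu_overhead` and the one-hour-per-week maintenance window are assumptions chosen purely for illustration, not values or code from the paper.

```python
def average_cpu_overhead(virtualized_fraction: float,
                         overhead_virtualized: float = 0.056,
                         overhead_devirtualized: float = 0.0) -> float:
    """Time-weighted average CPU overhead.

    virtualized_fraction: share of wall-clock time spent virtualized (0..1).
    overhead_virtualized: overhead while virtualized (5.6% per the abstract).
    overhead_devirtualized: overhead while devirtualized (zero per the abstract).
    """
    if not 0.0 <= virtualized_fraction <= 1.0:
        raise ValueError("virtualized_fraction must be between 0 and 1")
    return (virtualized_fraction * overhead_virtualized
            + (1.0 - virtualized_fraction) * overhead_devirtualized)


HOURS_PER_WEEK = 7 * 24
# Hypothetical maintenance window; the abstract does not state one.
maintenance_hours_per_week = 1.0
fraction = maintenance_hours_per_week / HOURS_PER_WEEK

print(f"Average overhead: {average_cpu_overhead(fraction):.4%}")
```

Under these assumed numbers the average overhead is a small fraction of the 5.6% paid while virtualized, which illustrates why virtualizing only during active maintenance lets the guest OS run at full speed most of the time.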

Published In

ACM Conferences
ASPLOS XI: Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systems
October 2004
296 pages
ISBN:1581138040
DOI:10.1145/1024393
  • ACM SIGPLAN Notices, Volume 39, Issue 11
    ASPLOS '04
    November 2004
    283 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/1037187
  • ACM SIGOPS Operating Systems Review, Volume 38, Issue 5
    ASPLOS '04
    December 2004
    283 pages
    ISSN:0163-5980
    DOI:10.1145/1037949
  • ACM SIGARCH Computer Architecture News, Volume 32, Issue 5
    ASPLOS 2004
    December 2004
    283 pages
    ISSN:0163-5964
    DOI:10.1145/1037947
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. availability
  2. online maintenance
  3. planned downtime
  4. virtual machines

Qualifiers

  • Article

Conference

ASPLOS04

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%


Bibliometrics

Article Metrics

  • Downloads (last 12 months): 5
  • Downloads (last 6 weeks): 1
Reflects downloads up to 15 Jan 2025
