Kernel-level single system image for petascale computing

Published: 01 April 2006

Abstract

Scientific computing users typically prefer UNIX or UNIX-like operating systems as the runtime environment for managing software and hardware resources. These systems were originally designed for a single processor and for a broad range of programming and usage models. Although UNIX-like systems have been successfully modified to work in SMP and NUMA configurations, their internal structures have remained largely unchanged over the years. As we move toward the era of petascale computing, these UNIX-like systems are no longer suitable. For instance, the relative cost of supporting generic usage and system services will increase by an order of magnitude and thus degrade overall system performance; system services for globally managing parallelism, processes, and resources are insufficient; and users may see the petascale system not as a single powerful machine but as a set of independent servers. A single system image (SSI) operating system is essential for efficiently managing parallelism, resources, and processes, as well as for providing parallel-processing transparency on a system possibly equipped with hundreds of thousands of processors. However, the success of a petascale SSI operating system depends on more than meeting technical challenges. In particular, it must look very much like ordinary UNIX, run unmodified software, scale incrementally, and come equipped with built-in high-availability support. This position paper focuses on these issues and discusses the development of a petascale SSI based on an existing kernel-level SSI system, OpenSSI.


Published In

ACM SIGOPS Operating Systems Review, Volume 40, Issue 2
April 2006
107 pages
ISSN:0163-5980
DOI:10.1145/1131322

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. SSI
  2. availability
  3. kernel
  4. scalability

Qualifiers

  • Article
