
Scheduler activations: effective kernel support for the user-level management of parallelism

Published: 01 February 1992

Abstract

Threads are the vehicle for concurrency in many approaches to parallel programming. Threads can be supported either by the operating system kernel or by user-level library code in the application address space, but neither approach has been fully satisfactory.
This paper addresses this dilemma. First, we argue that the performance of kernel threads is inherently worse than that of user-level threads, rather than this being an artifact of existing implementations; managing parallelism at the user level is essential to high-performance parallel computing. Next, we argue that the problems encountered in integrating user-level threads with other system services are a consequence of the lack of kernel support for user-level threads provided by contemporary multiprocessor operating systems; kernel threads are the wrong abstraction on which to support user-level management of parallelism. Finally, we describe the design, implementation, and performance of a new kernel interface and user-level thread package that together provide the same functionality as kernel threads without compromising the performance and flexibility advantages of user-level management of parallelism.



Reviews

Teodor Rus

Managing parallelism at the user level is essential to high-performance parallel computing on multiprocessor systems. The operating systems of most high-performance computer systems in use today, however, were developed to handle parallel computation on uniprocessor systems, known as multiprogramming. Multiprogramming exploits the difference in speed between I/O operations and central processor operations: an I/O processor executes the I/O operations of one program in parallel with the computation performed by the central processor for another program. The operating system manages this by manipulating heavyweight processes, where a process is a tuple P = (Processor, Program, Status); a UNIX process is a typical example. Although the overhead of creating, controlling, and terminating processes is high in a multiprogramming environment, it is tolerable on a uniprocessor machine because the granularity of the parallel streams of control is very coarse. The processes thus controlled by the operating system are largely independent and do not really communicate with one another.

This situation changes for multiprocessor computers, where multiple streams of control belonging to the same program must run in parallel. For such streams, the overhead of creating, controlling, and terminating full processes becomes intolerable. So far, two solutions have been proposed to cope with this situation. Both use a lighter notion of a process, usually called a thread, as the computation unit in terms of which the parallel execution of a program can be expressed. Although the concept of a thread is imprecise, the two approaches can be formulated as follows:

  1. Develop runtime library routines that can be linked into each application and that provide operations allowing the application to handle its own parallel threads.
  2. Extend the kernel of the operating system to support multiple threads of control per program.

The first solution provides user-level thread management: a user can create a thread environment in which a fixed number of processors run threads in parallel. In the second solution, the operating system kernel allows the programmer to run her or his program as parallel threads on processors managed by the system. Experience with thread management shows that while user-level threads can be designed to achieve high performance, thread management at the user level may lead to system integration problems that can sometimes produce incorrect behavior. Kernel threads, in contrast, can be shown to have the benefit of correct functionality, but they have lower performance than user-level threads. This paper addresses this dilemma, describing a kernel interface and a user-level thread package that together combine the functionality of kernel threads with the performance and flexibility of user-level threads.

Readers with hands-on experience using the various parallel libraries provided on multiprocessor machines running versions of UNIX will find that this paper helps explain the strange behavior, limitations, and poor performance of parallel processes created by fork() under UNIX. The conclusions drawn by this paper are experimental and do not cover all parallel libraries developed and implemented so far on multiprocessor systems. Some parallel packages, particularly under the Umax version of UNIX, have not been supported and exercised long enough and are not examined in this paper; one example is the tasking package implemented on the Encore multiprocessor in 1989-1990. Since these packages were not examined, the experience reported in this paper is incomplete.


Published In

ACM Transactions on Computer Systems, Volume 10, Issue 1 (Feb. 1992), 77 pages
ISSN: 0734-2071
EISSN: 1557-7333
DOI: 10.1145/146941

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. multiprocessor
  2. thread
