Scheduler activations: effective kernel support for the user-level management of parallelism

Published: 01 February 1992


Threads are the vehicle for concurrency in many approaches to parallel programming. Threads can be supported either by the operating system kernel or by user-level library code in the application address space, but neither approach has been fully satisfactory.
This paper addresses this dilemma. First, we argue that the performance of kernel threads is inherently worse than that of user-level threads, rather than this being an artifact of existing implementations; managing parallelism at the user level is essential to high-performance parallel computing. Next, we argue that the problems encountered in integrating user-level threads with other system services are a consequence of the lack of kernel support for user-level threads provided by contemporary multiprocessor operating systems; kernel threads are the wrong abstraction on which to support user-level management of parallelism. Finally, we describe the design, implementation, and performance of a new kernel interface and user-level thread package that together provide the same functionality as kernel threads without compromising the performance and flexibility advantages of user-level management of parallelism.


Teodor Rus

Managing parallelism at the user level is essential to high-performance parallel computing on multiprocessor systems. The operating systems of most high-performance computer systems in use today, however, were developed to handle parallel computation on uniprocessor systems, a technique called multiprogramming. Multiprogramming exploits the difference in speed between IO operations and control processor operations: an IO processor executes the IO operations of one program in parallel with the computations that the control processor performs for another program. The operating system manages this by manipulating heavyweight processes, where a process is a tuple P = (Processor, Program, Status). A UNIX process is a typical example. Although the overhead of creating, controlling, and terminating processes is high in a multiprogramming environment, it is tolerable on a uniprocessor machine because the granularity of the parallel streams of control is very coarse. The processes controlled by the operating system in this way are largely independent and rarely communicate with each other.
This situation changes for multiprocessor computers, where multiple streams of control that belong to the same program must run in parallel. The overhead of creating, controlling, and terminating such processes within a single program becomes intolerable. So far, two solutions have been proposed to cope with this situation. Both use a lighter concept of a process, usually called a thread, as the unit of computation in terms of which the parallel execution of a program can be expressed. Although the concept of a thread is imprecise, the two solutions can be formulated as follows:
1. Develop runtime library routines that can be linked into each application and that provide operations allowing the application to handle its own parallel threads.
2. Extend the kernel of the operating system to support multiple threads of control per program.
The first solution provides user-level thread management, where a user can create a thread environment in which a fixed number of processors run threads in parallel. In the second solution, the operating system kernel allows the programmer to run her or his program as parallel threads on processors managed by the system. Experience with thread management shows that while user-level threads can be designed to obtain high performance, thread management at the user level may cause system integration problems that can sometimes lead to incorrect solutions. Kernel threads, by contrast, offer correct functionality but have lower performance than user-level threads. This paper addresses this dilemma, discussing a kernel interface and a user-level thread package that together combine the functionality of kernel threads with the performance and flexibility of user-level threads.
If the reader has hands-on experience with the parallel libraries provided on various multiprocessor machines controlled by versions of UNIX, reading this paper can help him or her understand the strange behavior, limitations, and poor performance of parallel processes created by fork() under UNIX. The conclusions drawn by this paper are experimental and do not apply to all parallel libraries developed and implemented so far on multiprocessor systems. Some parallel packages, particularly under the Umax version of UNIX, have not been supported or in use long enough and are not examined in this paper; one example is the tasking package implemented on the Encore multiprocessor in 1989-1990. Since these packages were not examined, the experience reported in this paper is incomplete.

