Fault-Tolerance Through Scheduling of Aperiodic Tasks in Hard Real-Time Multiprocessor Systems

Published: 01 March 1997

Abstract

Real-time systems are increasingly used in time-critical applications. Fault-tolerance is an important requirement of such systems because failing to tolerate a fault can have catastrophic consequences. In this paper, we study a scheme that provides fault-tolerance through scheduling in real-time multiprocessor systems. We schedule multiple copies of dynamic, aperiodic, nonpreemptive tasks in the system and use two techniques, which we call deallocation and overloading, to achieve a high acceptance ratio (the percentage of arriving tasks scheduled by the system). This paper compares the performance of our scheme with that of other fault-tolerant scheduling schemes and determines how much deallocation and overloading each affect the acceptance ratio. The paper also provides a technique that can help real-time system designers determine the number of processors required to provide fault-tolerance in dynamic systems. Lastly, a formal model is developed for the analysis of systems with uniform tasks.
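
The abstract names deallocation and overloading without defining them. The sketch below is one possible reading, not the paper's algorithm. It assumes the common primary/backup interpretation: each task gets a primary copy and a later backup copy on a different processor, two backups may overlap in time on one processor when their primaries run on different processors (overloading, under an assumed single-fault model), and a backup slot is freed once its primary finishes without a fault (deallocation). The names `Slot`, `fits`, `try_schedule`, `deallocate_backup`, and `acceptance_ratio`, the first-fit search, and the integer time model are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    task: str
    start: int
    end: int           # exclusive
    is_backup: bool
    primary_proc: int  # index of the processor holding the task's primary


def fits(slots, start, end, is_backup, primary_proc, overloading):
    """Return True if [start, end) can be placed among one processor's slots."""
    for s in slots:
        if start >= s.end or s.start >= end:
            continue  # no overlap in time
        # Assumed overloading rule: two backups may share time when their
        # primaries sit on different processors (at most one fault at a time).
        if (overloading and is_backup and s.is_backup
                and s.primary_proc != primary_proc):
            continue
        return False
    return True


def try_schedule(procs, task, arrival, cost, deadline, overloading=True):
    """First-fit placement of a primary and a later backup on distinct processors."""
    for p, p_slots in enumerate(procs):
        for ps in range(arrival, deadline - 2 * cost + 1):
            if not fits(p_slots, ps, ps + cost, False, p, overloading):
                continue
            for q, q_slots in enumerate(procs):
                if q == p:
                    continue  # the backup must not share the primary's processor
                for bs in range(ps + cost, deadline - cost + 1):
                    if fits(q_slots, bs, bs + cost, True, p, overloading):
                        p_slots.append(Slot(task, ps, ps + cost, False, p))
                        q_slots.append(Slot(task, bs, bs + cost, True, p))
                        return True
    return False  # task rejected: no feasible primary/backup pair


def deallocate_backup(procs, task):
    """Free the backup slot of a task whose primary finished without a fault."""
    for slots in procs:
        slots[:] = [s for s in slots if not (s.task == task and s.is_backup)]


def acceptance_ratio(num_procs, tasks, overloading):
    """Fraction of tasks (name, arrival, cost, deadline) that get scheduled."""
    procs = [[] for _ in range(num_procs)]
    accepted = sum(try_schedule(procs, *t, overloading=overloading) for t in tasks)
    return accepted / len(tasks)


if __name__ == "__main__":
    # Four identical hypothetical tasks on three processors.
    tasks = [(name, 0, 2, 4) for name in "ABCD"]
    print(acceptance_ratio(3, tasks, overloading=True))
    print(acceptance_ratio(3, tasks, overloading=False))
```

In this toy workload, allowing backups to overlap lets a third task be accepted that would otherwise be rejected. Calling `deallocate_backup` after a primary completes frees that task's backup slot, so later arrivals can use it.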

Published In

IEEE Transactions on Parallel and Distributed Systems, Volume 8, Issue 3
March 1997
95 pages
ISSN: 1045-9219

Publisher

IEEE Press

Author Tags

1. Real-time scheduling
2. fault-tolerance
3. operating systems
4. primary/backup
5. redundancy
6. reliability
