5 May CmpE 516 Fault Tolerant Scheduling in Multiprocessor Systems Betül Demiröz

Outline
  - General concepts about tasks and scheduling
  - Real-time systems
  - Fault-tolerant scheduling
  - Basic approaches used in fault-tolerant scheduling
  - Algorithms and their execution details

Task
  - Deadline: the time by which the task should be finished
  - Preemptive: the task can be stopped during execution and restarted later
  - Nonpreemptive: the task cannot be interrupted during execution

Task Properties
  - Periodic
  - Aperiodic
      - activated only when certain events occur
      - arrival times are not known
      - scheduled dynamically
  - Dependent
  - Independent

Task Scheduling
  - Distribution of tasks to the processors according to a given policy
  - Major goals of task scheduling:
      - distribute the system load
      - reduce total execution time

Static & Dynamic Scheduling
  - Static scheduling
      - scheduling at compile time
      - an accurate weight estimation is needed
      - schedules of all tasks are predetermined
  - Dynamic scheduling
      - scheduling at run time
      - uses actual values of the execution times of processes and of communication times

Real Time Systems
  - Hard real time
      - correctness depends on both the logical results and the time at which the results are produced
      - missing a deadline may be catastrophic
      - mission-critical or life-critical applications
      - fault tolerance is extremely important
  - Soft real time

Processors in the System
  - Uniprocessor: there is a single processor
  - Multiprocessor: there are n processors in the system
      - can be identical (homogeneous)
      - can have different properties (heterogeneous)

Hard Real Time Systems
  - Use multiprocessors
  - Advantages
      - more reliable, unless a single processor failure causes the whole system to fail (this can happen if no fault-tolerant capability is provided)
      - with fault tolerance, one processor failure does not cause the whole system to fail
      - more computational power
  - Disadvantage
      - the probability of a processor failure is higher

Fault Tolerant System
  - The system should produce correct results even in the presence of faults
  - Important for most real-time applications
  - Tasks can have deadlines and should be finished before the deadline
  - Fault tolerance is required in hard real-time systems

Error Detection in Fault Tolerant Scheduling
  - Fail-signal: notify other processors of a detected fault
  - Alarms or watchdogs: detection of timing failures
  - Signatures: detection of HW/SW faults
  - Acceptance tests: test results for HW/SW faults

Fault Tolerance in Multiprocessor Systems
  - Multiple copies of tasks are scheduled on different processors
  - Aim: the task completes before its deadline

Fault Tolerant Scheduling in Multiprocessor Systems (Cont.)
  - Multiple copies of tasks are scheduled on different processors
  - One or more copies can run to ensure task completion before the deadline
      - PB (Primary/Backup Approach)
      - TMR (Triple Modular Redundancy)
  - Error checking is done by comparing results
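
Error checking by comparing results can be illustrated for TMR with a small majority voter. This is only a sketch: the function name `tmr_vote` and the choice to return `None` when all three copies disagree are assumptions made for illustration, not details taken from the slides.

```python
from collections import Counter

def tmr_vote(results):
    """Return the value produced by at least two of the three copies,
    or None if all three results differ (hypothetical convention)."""
    if len(results) != 3:
        raise ValueError("TMR expects exactly three results")
    value, count = Counter(results).most_common(1)[0]
    return value if count >= 2 else None

# One faulty copy is outvoted by the two correct ones.
print(tmr_vote([42, 42, 17]))  # 42
```

With three copies a single faulty result is masked without re-execution, whereas the PB approach only runs the backup after the primary is found to be faulty.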

PB (Primary/Backup Approach)
  - If the primary processor generates incorrect results, the backup processor is activated
  - Small HW resource requirements
  - Tasks are nonpreemptive, aperiodic, real-time

An Algorithm for Real Time Fault Tolerant Scheduling in Multiprocessor Systems
  - N periodic tasks are scheduled on a number of processors
  - For each task i, there is a primary copy P_i and a backup copy B_i
  - If the primary copy fails, the backup copy is activated
  - Enough time must be available to execute the backup copies
  - Static scheduling of tasks

Scheduling Requirements
  - Each task is executed by one processor at a time
  - All tasks should meet their deadlines
  - Maximize the number of processor failures that can be tolerated
  - P_i and B_i are each assigned to exactly one processor, and the two processors are different
  - Tasks are preemptive
  - The number of processors used should be minimized

Scheduling Algorithm
  - Primary tasks are arranged in order of decreasing computation time
  - Primary copies are scheduled (m processors are used)
      - each copy is assigned to one of the existing processors
  - The primary schedule is duplicated for the backup copies (m processors are used)
  - No pair of primary and backup copies should overlap

An Example Distribution
  S = {T_1, T_2, T_3, T_4, T_5}
  C = {5, 4, 4, 3, 2}
  T_1 -> P_1
  T_2 -> P_2
  T_3 -> P_1
  T_4 -> P_2
  T_5 -> P_2
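
The example distribution can be checked with a few lines of Python. The computation times and the task-to-processor mapping below are copied from the slide (here P_1 and P_2 denote processors). The placement of the backup copies is an assumption: the slides only say that the primary schedule is duplicated on m processors, so this sketch maps the backup of a task on processor p to a hypothetical processor m + p.

```python
# Computation times and primary assignment, as given in the example.
computation = {"T1": 5, "T2": 4, "T3": 4, "T4": 3, "T5": 2}
primary = {"T1": 1, "T2": 2, "T3": 1, "T4": 2, "T5": 2}

def processor_loads(assignment, computation):
    """Sum the computation times of the tasks placed on each processor."""
    loads = {}
    for task, proc in assignment.items():
        loads[proc] = loads.get(proc, 0) + computation[task]
    return loads

m = len(set(primary.values()))  # number of processors used by the primaries

# Assumption: the duplicated schedule runs on a second group of m processors.
backup = {task: proc + m for task, proc in primary.items()}

primary_loads = processor_loads(primary, computation)
backup_loads = processor_loads(backup, computation)
print(primary_loads)  # {1: 9, 2: 9}
print(backup_loads)   # {3: 9, 4: 9}

# Requirement from the slides: P_i and B_i are on different processors.
assert all(primary[t] != backup[t] for t in primary)
```

Both primary processors end up with a load of 9 time units, so the example assignment splits the total computation time of 18 evenly.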

Example (Cont.)

Another Algorithm
  - The two copies of a task are allowed to start execution at different times
  - Improves schedulability of tasks
  - N identical processors and a scheduling processor are used
  - Dynamic scheduling

System Model
  - A task is scheduled if the previously scheduled tasks and the newly arrived task all meet their deadlines
  - Otherwise, the task is rejected, because its deadline cannot be guaranteed in the presence of a fault

Techniques Used
  - Backup copies are activated only when a fault occurs on the processor executing the primary copy
  - Backup overloading: overlapping the time slots of multiple backups
  - Backup de-allocation: releasing the slot reserved for a backup copy when its primary copy completes successfully
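
The two techniques can be sketched as follows. The `Backup` record and the function names are hypothetical. The overloading rule used here (two backup slots may share time on one processor only if their primaries run on different processors, so that a single processor fault activates at most one of them) is an assumption commonly made in primary/backup scheduling; the slides do not state the exact condition.

```python
from dataclasses import dataclass

@dataclass
class Backup:
    task: str
    primary_proc: int  # processor running the primary copy
    proc: int          # processor holding the backup slot
    start: int
    end: int           # the slot covers [start, end)

def overlaps(a, b):
    """True if two backup slots share time on the same processor."""
    return a.proc == b.proc and a.start < b.end and b.start < a.end

def may_overload(a, b):
    """Assumed rule: overlapping is allowed only when the primaries
    are on different processors (single-fault assumption)."""
    return overlaps(a, b) and a.primary_proc != b.primary_proc

def deallocate(slots, task):
    """Release the backup slot of a task whose primary completed successfully."""
    return [s for s in slots if s.task != task]

b1 = Backup("T1", primary_proc=1, proc=3, start=4, end=8)
b2 = Backup("T2", primary_proc=2, proc=3, start=6, end=10)
b3 = Backup("T3", primary_proc=1, proc=3, start=7, end=9)

print(may_overload(b1, b2))  # True: primaries on processors 1 and 2
print(may_overload(b1, b3))  # False: both primaries on processor 1
print(deallocate([b1, b2], "T1") == [b2])  # True
```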

Backup Overloading

Backup De-allocation

Proposed Technique
  - The primary copy and the backup copy are scheduled and executed in parallel
  - The backup copy is divided into
      - a preceding part executed together with the primary copy (redundant part)
      - a remaining part executed after the primary copy is completed (backup part)
  - Backup overloading and backup de-allocation are used
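
The division of the backup copy can be written as a small calculation. This sketch assumes that the backup copy runs contiguously from its start time and that the split point is the finish time of the primary copy; the function and parameter names are hypothetical.

```python
def split_backup(primary_finish, backup_start, backup_length):
    """Return (redundant_part, backup_part) for one backup copy.

    The redundant part is the portion that runs while the primary copy
    is still executing; the backup part is whatever remains afterwards.
    """
    redundant = min(backup_length, max(0, primary_finish - backup_start))
    return redundant, backup_length - redundant

# Primary finishes at t=10, backup starts at t=7 and needs 5 time units.
print(split_backup(primary_finish=10, backup_start=7, backup_length=5))  # (3, 2)
```

If the primary copy completes successfully, only the backup part (2 units in this example) remains to be released through de-allocation.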

Proposed Technique (Cont.)

Scheduling Algorithm
  - Schedule the primary copy
      - try to find a free slot between the arrival time and the deadline
  - Schedule the backup copy
      - schedule both the redundant and the backup parts
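
Finding a free slot between the arrival time and the deadline can be sketched for one processor as below. The representation of the schedule as a list of busy `(start, end)` intervals and the earliest-fit policy are assumptions for illustration; the slides do not say which slot is chosen when several are free.

```python
def find_free_slot(busy, arrival, deadline, length):
    """Return the earliest start time of a free slot of `length` time units
    inside [arrival, deadline] on one processor, or None if there is none.

    `busy` is a list of (start, end) intervals already reserved,
    each covering [start, end).
    """
    t = arrival
    for start, end in sorted(busy):
        if start - t >= length:
            break  # the gap before this interval is large enough
        t = max(t, end)  # otherwise skip past the busy interval
    return t if t + length <= deadline else None

busy = [(0, 3), (5, 9)]
print(find_free_slot(busy, arrival=2, deadline=12, length=2))  # 3
print(find_free_slot(busy, arrival=2, deadline=12, length=3))  # 9
print(find_free_slot(busy, arrival=2, deadline=11, length=3))  # None
```

A result of `None` on every processor corresponds to the rejection case in the system model: the task cannot be placed without missing its deadline.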

System Overview

Experiments
  - Basic parameters used in experiments
      - system load
      - number of processors and tasks used
      - computation time
      - window size
  - Analysing results
      - rejection rate

Experimental Results

Thank You
  Any questions?