1
Mechanisms for Detecting and Handling Timing Errors
David B. Stewart, Pradeep K. Khosla
Shadab Ambat 4/5/2006
2
Contents
Detection and Handling of Timing Errors
  Detection Mechanism
  Handling Mechanism
Timing Error Handling Policies
  Aperiodic Servers
  Soft Real-Time Threads
  Imprecise Computations
  Adaptive Real-Time Scheduling
Summary
3
Introduction
“High-assurance software systems are often implemented with the dangerous assumption that timing errors will never occur.”
One of the main factors that determine the schedulability of a system is the worst-case execution time (WCET) of its tasks.
However, the following factors prevent accurate WCET measurements:
  Interrupts with varying lengths and arrival rates, which preempt even the highest-priority tasks.
  Pipelining, caching, and bus arbitration, which cause variations in execution speed.
  The lack of an easy method for accurately measuring WCET in embedded code.
Inaccurate WCETs lead to undetected timing errors, which in turn cause failures.
The authors have created low-overhead, policy-independent RTOS mechanisms that detect and handle these timing errors.
The mechanisms can also be extended to aperiodic servers, soft real-time threads, imprecise computations, and adaptive real-time scheduling.
They are currently implemented in the Chimera RTOS.
4
Introduction (contd.)
Although there has been research on scheduling algorithms that can guarantee timing errors will not occur, there is no research on actually detecting and handling them.
The problem arises in hard real-time systems: because they depend heavily on accurate WCETs and on the guarantee that timing requirements are always met, missed deadlines might go undetected.
Real-Time Euclid and RTC++ have timing error handling mechanisms as part of the language. However, they still depend on the kernel to actually report the error. The mechanisms presented here can be used to do this.
Flex (an extension of C++) does implement a detection and handling mechanism independent of the OS, but at the cost of significant overhead. Typical C++ code is 2-10 times slower.
5
Detection and Handling of Timing Errors
[Figure: Chimera’s C framework for a periodic thread]
pause(float restart, float exectime, float deadline) is a system call that programs one of Chimera’s virtual timers to start the next thread cycle at time restart (in seconds).
pause() is policy independent, so it can be used with several static and dynamic scheduling algorithms.
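The original figure is not available here, so the following is only a minimal sketch of what a periodic thread built around a pause()-style call might look like. chimera_pause() is a hypothetical stand-in that merely records its arguments (the real system call blocks until the restart time), do_cycle_work() is a placeholder, and the choice of "deadline = start of the next cycle" is an assumption made for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for Chimera's pause() system call. The real call
   blocks until `restart`; this stub only records what was requested. */
typedef struct {
    float restart;   /* time (s) at which the next cycle should start */
    float exectime;  /* maximum execution time (s) allowed per cycle  */
    float deadline;  /* time (s) by which the next cycle must finish  */
    int   calls;     /* number of times the stub has been called      */
} pause_request;

static pause_request last_request;

static void chimera_pause(float restart, float exectime, float deadline)
{
    assert(exectime > 0.0f && deadline >= restart);
    last_request.restart  = restart;
    last_request.exectime = exectime;
    last_request.deadline = deadline;
    last_request.calls++;
}

/* Placeholder for the application work done in each cycle. */
static void do_cycle_work(int cycle)
{
    printf("cycle %d\n", cycle);
}

/* Periodic thread body. A real thread would loop forever; the cycle count
   is bounded here so the sketch terminates. The deadline of each cycle is
   assumed to be the start of the next one. */
static void periodic_thread(float period, float wcet, int cycles)
{
    float start = 0.0f;

    chimera_pause(start, wcet, start + period);     /* wait for the first cycle */
    for (int i = 0; i < cycles; i++) {
        do_cycle_work(i);
        start += period;
        chimera_pause(start, wcet, start + period); /* cycle done, schedule next */
    }
}
```

Calling periodic_thread(0.5f, 0.1f, 3) runs three cycles and leaves the request for the fourth cycle (restart at 1.5 s) in last_request.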
6
Detection Mechanism
Both the kernel and the scheduler can detect timing errors.
The following describes the method for Chimera, which uses three thread queues:
  READY queue: contains the currently running thread.
  BLOCKED queue: threads waiting for a semaphore or message.
  PAUSE queue: threads waiting for their next restart; they have not yet been assigned a deadline.
The scheduler programs the earliest deadline into a dedicated microkernel virtual timer.
When this time is reached, the kernel calls the corresponding scheduler policy to determine the faulty thread and then calls the respective handler.
The kernel monitors each thread's maximum execution time (ET) in a similar manner, by checking it against a software down-counter.
If the task is blocked or preempted, the remaining ET is saved and restored later.
A thread cycle completes with a call to pause(). The time values are then renewed, and the scheduler checks for the next earliest deadline or, under non-EDF policies, allows the task to continue running.
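The two checks described above (a deadline timer and an execution-time down-counter that only runs while the thread runs) can be sketched as a small simulation. All names here are hypothetical and time is passed in explicitly; this is not Chimera's kernel code, only an illustration of the bookkeeping under those assumptions.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { NO_ERROR, DEADLINE, MAXEXEC } timing_error;

typedef struct {
    long exec_left_us;  /* down-counter: execution budget left this cycle */
    long deadline_us;   /* absolute deadline of the current cycle         */
    bool running;       /* false while blocked or preempted               */
} thread_state;

/* Start a new cycle: renew the execution budget and the deadline. */
static void begin_cycle(thread_state *t, long now_us,
                        long budget_us, long rel_deadline_us)
{
    assert(budget_us > 0 && rel_deadline_us > 0);
    t->exec_left_us = budget_us;
    t->deadline_us  = now_us + rel_deadline_us;
    t->running      = false;
}

/* Account for `elapsed_us` of time ending at `now_us`. The down-counter is
   only decremented while the thread is running, so its value is preserved
   across blocking and preemption. A counter that reaches zero is treated
   as a MAXEXEC error; reaching the deadline is a DEADLINE error. */
static timing_error tick(thread_state *t, long now_us, long elapsed_us)
{
    if (t->running) {
        t->exec_left_us -= elapsed_us;
        if (t->exec_left_us <= 0)
            return MAXEXEC;
    }
    if (now_us >= t->deadline_us)
        return DEADLINE;
    return NO_ERROR;
}
```

In this sketch a thread with a 300 µs budget that runs for 200 µs, is preempted for 500 µs, and then runs another 100 µs raises MAXEXEC, because the preempted interval does not consume budget.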
7
Handling Mechanism
A detected timing error triggers a user-defined timing failure handler (TFH).
The TFH can abort or restart the thread, force it to miss a period, shut down gracefully, sound an alarm, or return an approximate value (iterative algorithms).
The TFH must be re-entrant. As C does not have the necessary provisions, the entry code is written in assembly: in Chimera the TFH framework is in assembly, and it in turn calls a C subroutine through a look-up table.
tfhInstall(funcptr handler, int hpriority) is used to install a TFH.
hpriority can be used to modify the priorities of critical sections of a task (to distinguish between hard and soft deadlines).
The example TFH implementation (a figure on the next slide) handles two failure types, DEADLINE and MAXEXEC, which lead to a restart and a graceful shutdown respectively.
8
Handling Mechanism (contd.)
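The figure for this slide was not recovered. As a rough sketch of the idea (not the original listing), the code below dispatches a failure type to a C handler through a look-up table. tfh_install() is a simplified, hypothetical analogue of tfhInstall() that takes an explicit thread id, and the handler returns the chosen action instead of performing it.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { DEADLINE, MAXEXEC } failure_type;
typedef enum { ACTION_RESTART, ACTION_SHUTDOWN } tfh_action;
typedef tfh_action (*funcptr)(failure_type);

#define MAX_THREADS 8

typedef struct {
    funcptr handler;    /* C subroutine called by the generic framework */
    int     hpriority;  /* priority at which the handler should run     */
} tfh_entry;

/* Look-up table indexed by thread id (hypothetical layout). */
static tfh_entry tfh_table[MAX_THREADS];

/* Simplified analogue of tfhInstall(): the thread id is passed explicitly
   here, whereas the real call installs a handler for the calling thread. */
static void tfh_install(int thread_id, funcptr handler, int hpriority)
{
    assert(thread_id >= 0 && thread_id < MAX_THREADS && handler != NULL);
    tfh_table[thread_id].handler   = handler;
    tfh_table[thread_id].hpriority = hpriority;
}

/* Example TFH: a missed deadline restarts the cycle, an execution-time
   overrun leads to a graceful shutdown. */
static tfh_action example_tfh(failure_type type)
{
    switch (type) {
    case DEADLINE: return ACTION_RESTART;
    case MAXEXEC:  return ACTION_SHUTDOWN;
    }
    return ACTION_SHUTDOWN;  /* unknown failure type: fail safe */
}

/* Stand-in for the assembly framework that calls the C subroutine. */
static tfh_action tfh_dispatch(int thread_id, failure_type type)
{
    assert(thread_id >= 0 && thread_id < MAX_THREADS);
    assert(tfh_table[thread_id].handler != NULL);
    return tfh_table[thread_id].handler(type);
}
```

After tfh_install(2, example_tfh, 5), a DEADLINE failure on thread 2 yields ACTION_RESTART and a MAXEXEC failure yields ACTION_SHUTDOWN.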
9
Handling Mechanism (contd.)
Interrupt handlers run in the kernel, so a TFH run as one would not have access to all of a failing thread's data.
In Chimera, the kernel therefore modifies the stack and program counter (PC) of the stored thread: the current PC is pushed onto the stack and replaced with the address of the TFH, the thread's priority is changed to hpriority, and the TFH is then scheduled like a thread.
Thread contexts can be modified directly, except for running threads, which need to be switched out first.
After a TFH, a thread can:
  Restart: setjmp(), longjmp()
  Continue: return
  Exit
Provisions exist for critical sections between the thread and its TFH, along with priority inheritance.
The latency to call a TFH is the length of one context switch plus 12 µs (on a 25 MHz MC68030) for updating the stored context and executing the generic assembly-language framework for the TFH.
This latency is incurred only when a timing error occurs.
The actual monitoring of deadlines and ETs using the counters takes less than 1 µs.
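The restart option can be illustrated with standard C setjmp()/longjmp(). In this sketch the handler is called directly from the thread code to simulate a timing error; in Chimera the kernel would redirect the thread into its TFH instead. The function names and the overrun_times parameter are hypothetical.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf cycle_start;  /* restart point saved by the thread      */
static int restarts;         /* how many times the handler has run     */

/* Simulated TFH: abandon the current attempt and restart the cycle. */
static void timing_failure_handler(void)
{
    restarts++;
    longjmp(cycle_start, 1);
}

/* Run one cycle that "overruns" its deadline `overrun_times` times before
   it finally completes. Returns the number of attempts that were needed. */
static int run_cycle(int overrun_times)
{
    /* volatile: the value is changed between setjmp() and longjmp() and
       must still be valid after the jump. */
    volatile int attempts = 0;

    (void)setjmp(cycle_start);   /* restart point: the TFH jumps back here */
    attempts++;
    if (attempts <= overrun_times)
        timing_failure_handler();   /* simulated DEADLINE error */
    return attempts;
}
```

With run_cycle(2), the first two attempts are abandoned by the handler and the third completes, so the function returns 3 and restarts is 2.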
10
Timing Error Handling Policies
Systems should be designed to expect timing errors and handle them accordingly, instead of optimistically assuming they will not occur.
The mechanisms can be extended to advanced scheduling policies (aperiodic servers, adaptive threads, etc.) and are compatible with rate monotonic (RM), deadline monotonic (DM), and maximum-urgency-first (MUF) scheduling (static priority) and with EDF (dynamic priority).
Aperiodic Servers
The Real-Time Mach OS required modifications to its scheduler to implement the deferrable and sporadic servers.
The mechanisms described above remove the need for such modifications, as long as the system supports them.
Deferrable Server (DS): a periodic thread with the highest priority Ph, period Tds, and maximum capacity (budget) Cds is created.
Cds = Uds × Tds is the maximum CPU time the server can use in one period. If more time is required, the server's priority is lowered.
Uds, the server size, is the largest share of each cycle's execution time that the server can use without making the periodic tasks miss their deadlines.
An example DS outline using a TFH is shown on the next slide.
11
Aperiodic Servers (contd.)
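The figure for this slide was not recovered. The following is a minimal sketch of how a deferrable server's TFH could manage priority and capacity, assuming the capacity is the server size (a utilization between 0 and 1) times the period. The struct layout, ds_init(), and the explicit server argument to dshandler() are assumptions made for illustration, not Chimera's actual interface.

```c
#include <assert.h>

typedef enum { DEADLINE, MAXEXEC } failure_type;

typedef struct {
    double period;         /* Tds, in seconds                             */
    double capacity;       /* Cds, CPU time available per period          */
    double capacity_left;  /* budget remaining in the current period      */
    double deadline;       /* absolute time of the next replenishment     */
    int    p_high;         /* Ph, normal (highest) server priority        */
    int    p_low;          /* Pl, priority below the critical task set    */
    int    priority;       /* current priority                            */
} deferrable_server;

/* Create a server of size `utilization` (assumed: Cds = Uds * Tds). */
static void ds_init(deferrable_server *ds, double utilization, double period,
                    int p_high, int p_low)
{
    assert(utilization > 0.0 && utilization < 1.0 && period > 0.0);
    ds->period        = period;
    ds->capacity      = utilization * period;
    ds->capacity_left = ds->capacity;
    ds->deadline      = period;
    ds->p_high        = p_high;
    ds->p_low         = p_low;
    ds->priority      = p_high;
}

/* TFH for the server: MAXEXEC means the capacity is used up, DEADLINE
   means the period has ended and the server is replenished. */
static void dshandler(deferrable_server *ds, failure_type type)
{
    if (type == MAXEXEC) {
        ds->capacity_left = 0.0;
        ds->priority      = ds->p_low;    /* drop below critical tasks */
    } else {
        ds->deadline     += ds->period;   /* set the new deadline      */
        ds->priority      = ds->p_high;   /* restore priority          */
        ds->capacity_left = ds->capacity; /* replenish to Cds          */
    }
}
```

For example, a server of size 0.25 with a 0.5 s period gets a capacity of 0.125 s; a MAXEXEC error lowers it to Pl until the next DEADLINE error restores Ph and the full capacity.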
12
Aperiodic Servers (contd.)
The TFH dshandler() is called when Cds has expired or when the server needs to be replenished.
If the DS has used up its entire allotted capacity, a MAXEXEC error is raised and the TFH reduces the server's priority from Ph to Pl, below the critical periodic task set.
When the DS deadline arrives (DEADLINE error), the TFH sets the new deadline and resets the priority to Ph and the capacity to Cds.
With MUF, the criticality (hybrid priority) is used instead of the priority.
Sporadic Server (SS): similar to the DS except for its replenishment mechanism, which requires additional information such as execution start times and the amount of execution used.
A Chimera implementation of the SS is shown on the next slide.
Initially, the maximum execution time is Css and the deadline is ∞.
Replenishment occurs when the server exhausts its execution time budget; the amount depends on the time taken to use it up (execstart(), execend(), execleft()).
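The Chimera listing for the sporadic server is not available here, and the signatures of execstart(), execend(), and execleft() are not given, so the sketch below does not use them. It illustrates one simplified, commonly described form of the replenishment rule (the consumed amount is returned one period after the server started executing), with a single pending replenishment. Every name in it is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    double period;            /* Tss, in seconds                          */
    double capacity;          /* Css, maximum budget                      */
    double budget_left;       /* budget currently available               */
    double exec_start;        /* when the server began executing, or < 0  */
    double replenish_at;      /* time of the pending replenishment        */
    double replenish_amount;  /* budget to give back at that time         */
} sporadic_server;

static void ss_init(sporadic_server *ss, double capacity, double period)
{
    assert(capacity > 0.0 && period > capacity);
    ss->period           = period;
    ss->capacity         = capacity;
    ss->budget_left      = capacity;
    ss->exec_start       = -1.0;   /* idle */
    ss->replenish_at     = 0.0;
    ss->replenish_amount = 0.0;
}

/* Record the time at which the server starts serving aperiodic work. */
static void ss_exec_start(sporadic_server *ss, double now)
{
    if (ss->exec_start < 0.0)
        ss->exec_start = now;
}

/* Record that `used` seconds of budget were consumed; schedule their
   return one period after execution started. */
static void ss_exec_end(sporadic_server *ss, double used)
{
    assert(ss->exec_start >= 0.0 && used >= 0.0 && used <= ss->budget_left);
    ss->budget_left      -= used;
    ss->replenish_amount += used;
    ss->replenish_at      = ss->exec_start + ss->period;
}

/* Apply the pending replenishment if its time has come. */
static bool ss_replenish(sporadic_server *ss, double now)
{
    if (ss->replenish_amount <= 0.0 || now < ss->replenish_at)
        return false;
    ss->budget_left     += ss->replenish_amount;
    ss->replenish_amount = 0.0;
    ss->exec_start       = -1.0;
    return true;
}
```

Unlike the deferrable server, the budget here is not restored at fixed period boundaries: a server that starts at 0.5 s and uses its whole 0.25 s budget gets it back at 1.5 s, one period after it started.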
13
14
Soft Real-Time Threads
Threads with soft deadlines are implemented in Chimera using TFHs, similar to a DS.
When a thread exceeds its execution time, its priority is lowered below the critical task set for that cycle until a DEADLINE error occurs.
Imprecise Computations
Threads using approximate computations have two parts: essential and optional.
A missed deadline during the optional part causes the optional computations to be discarded and only the results of the essential part to be used.
Adaptive Real-Time Scheduling
Streich developed TaskPair scheduling, an adaptive scheduling policy for soft real-time threads that is based on optimistic-case execution times along with the WCET.
Thread ETs are constantly monitored, and a TFH is called when the optimistic ET is consumed.
This can be implemented in Chimera with the mechanisms discussed.
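The essential/optional split used by imprecise computations can be sketched as follows. The deadline is simulated by a hypothetical step budget (optional_budget) rather than a real timer or TFH, and the choice of "coarse mean versus exact mean" is only an example workload.

```c
#include <assert.h>
#include <stddef.h>

/* Estimate the mean of x[0..n-1].
   Essential part: a coarse estimate from every 4th sample, always completed.
   Optional part: the exact mean, one sample per step. If the step budget
   runs out first (simulating a DEADLINE error), the partial optional result
   is discarded and the essential result is returned.
   *precise is set to 1 if the optional part completed, otherwise 0. */
static double imprecise_mean(const double *x, int n,
                             int optional_budget, int *precise)
{
    assert(x != NULL && n > 0 && precise != NULL);

    /* Essential part. */
    double coarse = 0.0;
    int used = 0;
    for (int i = 0; i < n; i += 4) {
        coarse += x[i];
        used++;
    }
    coarse /= used;

    /* Optional part. */
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        if (i >= optional_budget) {   /* simulated missed deadline */
            *precise = 0;
            return coarse;            /* discard the optional work */
        }
        sum += x[i];
    }
    *precise = 1;
    return sum / n;
}
```

For the samples 1 through 8, a budget of 3 steps returns the coarse estimate 3.0 (the mean of samples 1 and 5), while a budget of 8 steps returns the exact mean 4.5.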