1 CENG334 Introduction to Operating Systems Erol Sahin Dept of Computer Eng. Middle East Technical University Ankara, TURKEY URL:


1 1 CENG334 Introduction to Operating Systems Erol Sahin Dept of Computer Eng. Middle East Technical University Ankara, TURKEY URL: http://kovan.ceng.metu.edu.tr/ceng334 Monitors, Condition Variables Topics: Monitors Condition Variables

2 2 Peterson’s Algorithm
int flag[2] = {0, 0};
int turn;
P0:
do {
    flag[0] = 1;
    turn = 1;
    while (flag[1] == 1 && turn == 1) {
        // busy wait
    }
    // critical section
    flag[0] = 0;
    // remainder section
} while (1);
P1:
do {
    flag[1] = 1;
    turn = 0;
    while (flag[0] == 1 && turn == 0) {
        // busy wait
    }
    // critical section
    flag[1] = 0;
    // remainder section
} while (1);
turn: indicates whose turn it is to enter the critical section. If turn == i, process Pi is allowed to enter.
flag[2]: indicates whether each process is ready to enter the critical section. If flag[i] is set, then Pi is ready to enter.

3 3 Peterson’s Algorithm Mutual Exclusion: Only one process is in the critical section at a time. When both processes are contending, turn holds a single value, so only the process Pi with turn == i gets past its while loop; the process that wrote turn last is the one that waits.

4 4 Peterson’s Algorithm Progress: If process P1 is neither in nor trying to enter its critical section, then flag[1] == 0. Therefore the while loop of P0 exits immediately and P0 can enter its critical section, and vice versa. Bounded waiting: Process Pi spins only while the other process is in its critical section. If the other process leaves and tries to enter again, it sets turn in favor of Pi, so Pi waits for at most one entry by the other process.

5 5 Peterson’s Algorithm Uses spinlocking (busy waiting) for waiting. No strict alternation is required between processes. That is, the entry order P0, P0, P0, P1, P1 is possible. Requires that processes alternate between critical and remainder sections. Can be extended to n processes only if n is known a priori (in advance). HOW?

6 6 Peterson’s Algorithm Prone to priority inversion: Assume that P0 has a higher priority than P1. While P1 is in its critical section, P0 may get scheduled and start spinning. Under strict priority scheduling, P1 then never gets scheduled to finish its critical section, and both processes end up waiting.
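The two-process algorithm above can be tried out with a minimal Python sketch. It assumes a standard CPython interpreter, where these reads and writes happen in program order; on real hardware with a weaker memory model, memory fences would be needed. The names worker, run, and counter are illustrative, not from the slides.

```python
import threading
import time

flag = [False, False]   # flag[i]: thread i wants to enter its critical section
turn = 0                # which thread may enter when both are contending
counter = 0             # shared data protected by the algorithm


def worker(i, iterations):
    global turn, counter
    other = 1 - i
    for _ in range(iterations):
        flag[i] = True              # entry section: announce interest
        turn = other                # give the other thread priority
        while flag[other] and turn == other:
            time.sleep(0)           # busy wait, yielding the interpreter
        counter += 1                # critical section
        flag[i] = False             # exit section


def run(iterations=200):
    global counter
    counter = 0
    threads = [threading.Thread(target=worker, args=(i, iterations))
               for i in (0, 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

If mutual exclusion holds, no increment is lost and run(200) returns 400.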

7 7 Semaphore Implementation
struct semaphore {
    int val;
    mutex mtx;        // makes sure that down and up are atomic
    threadlist L;     // list of threads waiting for the semaphore
};
down(semaphore S) {   // wait until val > 0, then decrement
    while (1) {
        acquire(S.mtx);
        if (S.val <= 0) {
            add_this_thread(S.L);
            release_mutex_and_block(S.mtx);   // this should be atomic
        } else {
            S.val = S.val - 1;
            release(S.mtx);
            break;
        }
    }
}
up(semaphore S) {     // increment value and wake up next thread
    acquire(S.mtx);
    S.val = S.val + 1;
    remove_one_thread_and_wakeup(S.L);
    release(S.mtx);
}
Adapted from Matt Welsh’s (Harvard University) slides.
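The same structure can be sketched in Python. This is only an illustration, with threading.Condition standing in for the slide's mutex plus thread list, because its wait() releases the lock and blocks atomically. The class name CountingSemaphore is an assumption made for this example.

```python
import threading


class CountingSemaphore:
    """Sketch of a semaphore built from a lock and a wait queue."""

    def __init__(self, value=0):
        self._value = value
        self._cond = threading.Condition()   # mutex plus queue of waiters

    def down(self):
        # Wait until the value is positive, then decrement it.
        with self._cond:
            while self._value <= 0:
                self._cond.wait()   # atomically releases the lock and blocks
            self._value -= 1

    def up(self):
        # Increment the value and wake up one waiting thread, if any.
        with self._cond:
            self._value += 1
            self._cond.notify()
```

The while loop in down() plays the role of the slide's while(1): a woken thread rechecks the value before decrementing it.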

8 8 Issues with Semaphores Much of the power of semaphores derives from calls to down() and up() that are unmatched See previous example! Unlike locks, acquire() and release() are not always paired. This means it is a lot easier to get into trouble with semaphores. “More rope” Would be nice if we had some clean, well-defined language support for synchronization... Java does! Adapted from Matt Welsh’s (Harvard University) slides.

9 9 Monitors A monitor is an object intended to be used safely by more than one thread. The defining characteristic of a monitor is that its methods are executed with mutual exclusion. That is, at each point in time, at most one thread may be executing any of its methods. Monitors also provide Condition Variables (CVs), which let threads temporarily give up exclusive access in order to wait for some condition to be met, before regaining exclusive access and resuming their task. CVs are also used for signaling other threads that such conditions have been met.

10 10 Condition Variables Conceptually, a condition variable (CV) is a queue of threads, associated with a monitor, upon which a thread may wait for some assertion to become true. Threads can use CVs to temporarily give up exclusive access in order to wait for some condition to be met, before regaining exclusive access and resuming their task, and to signal other threads that such conditions have been met.

11 11 Monitors This style of using locks and CV's to protect access to a shared object is often called a monitor Think of a monitor as a lock protecting an object, plus a queue of waiting threads. Shared data Methods accessing shared data Waiting threads At most one thread in the monitor at a time How is this different than a lock??? Adapted from Matt Welsh’s (Harvard University) slides.

12 12 Monitors Shared data Methods accessing shared data unlocked Adapted from Matt Welsh’s (Harvard University) slides.

13 13 Monitors Shared data Methods accessing shared data locked zzzz... Sleeping thread no longer “in” the monitor. (But not on the waiting queue either! Why?) Adapted from Matt Welsh’s (Harvard University) slides.

14 14 Monitors Shared data Methods accessing shared data locked Monitor stays locked! (Lock now owned by different thread...) zzzz... notify() Adapted from Matt Welsh’s (Harvard University) slides.

15 15 Monitors Shared data Methods accessing shared data locked notify() Adapted from Matt Welsh’s (Harvard University) slides.

16 16 Monitors Shared data Methods accessing shared data locked No guarantee which order threads get into the monitor. (Not necessarily FIFO!) Adapted from Matt Welsh’s (Harvard University) slides.

17 17 Bank Example
monitor Bank {
    int TL = 1000;
    condition haveTL;
    void withdraw(int amount) {
        if (amount > TL)
            wait(haveTL);
        TL -= amount;
    }
    void deposit(int amount) {
        TL += amount;
        notify(haveTL);
    }
}

18 18 Bank Example
monitor Bank {
    int TL = 1000;
    condition haveTL;
    void withdraw(int amount) {
        while (amount > TL)
            wait(haveTL);
        TL -= amount;
    }
    void deposit(int amount) {
        TL += amount;
        notifyAll(haveTL);
    }
}
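The while / notifyAll version of the Bank monitor can be sketched in Python, where threading.Condition provides the monitor lock together with a condition variable. The balance() helper is an addition for inspection only and is not part of the slide.

```python
import threading


class Bank:
    """Monitor-style bank account: every method runs under one lock."""

    def __init__(self, balance=1000):
        self._balance = balance
        self._have_tl = threading.Condition()   # monitor lock + condition

    def withdraw(self, amount):
        with self._have_tl:
            # Recheck after every wakeup: a deposit may still be too small,
            # or another withdrawer may have taken the money first.
            while amount > self._balance:
                self._have_tl.wait()
            self._balance -= amount

    def deposit(self, amount):
        with self._have_tl:
            self._balance += amount
            self._have_tl.notify_all()   # every waiter rechecks its own amount

    def balance(self):
        with self._have_tl:
            return self._balance
```

With an initial balance of 1000, a withdrawal of 1500 keeps waiting after a deposit of 300 and completes only after a second deposit of 300, leaving 100.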

19 19 Hoare vs. Mesa Monitor Semantics The monitor notify() operation can have two different meanings: Hoare monitors (1974) notify(CV) means to run the waiting thread immediately Causes notifying thread to block Mesa monitors (Xerox PARC, 1980) notify(CV) puts waiting thread back onto the “ready queue” for the monitor But, notifying thread keeps running Adapted from Matt Welsh’s (Harvard University) slides.

20 20 Hoare vs. Mesa Monitor Semantics The monitor notify() operation can have two different meanings: Hoare monitors (1974) notify(CV) means to run the waiting thread immediately Causes notifying thread to block Mesa monitors (Xerox PARC, 1980) notify(CV) puts waiting thread back onto the “ready queue” for the monitor But, notifying thread keeps running What's the practical difference? In Hoare-style semantics, the “condition” that triggered the notify() will always be true when the awoken thread runs For example, that the buffer is now no longer empty In Mesa-style semantics, awoken thread has to recheck the condition Since another thread might have beaten it to the punch Adapted from Matt Welsh’s (Harvard University) slides.

21 21 Hoare Monitor Semantics Hoare monitors (1974) notify(CV) means to run the waiting thread immediately Causes notifying thread to block The signaling thread must wait outside the monitor (at least) until the signaled thread relinquishes occupancy of the monitor by either returning or by again waiting on a condition.

22 22 Mesa Monitor Semantics Mesa monitors (Xerox PARC, 1980) notify(CV) puts waiting thread back onto the “ready queue” for the monitor But, notifying thread keeps running Signaling does not cause the signaling thread to lose occupancy of the monitor. Instead, the signaled threads are moved to the monitor's entry queue (the “e” queue).

23 23 Hoare vs. Mesa monitors
Need to be careful about precise definition of signal and wait.
while (n==0) {
    wait(not_empty);   // If nothing, sleep
}
item = getItemFromArray();   // Get next item
Why didn’t we do this?
if (n==0) {
    wait(not_empty);   // If nothing, sleep
}
removeItemFromArray(val);   // Get next item
Answer: depends on the type of scheduling
Hoare-style (most textbooks): Signaler gives lock, CPU to waiter; waiter runs immediately. Waiter gives up lock, processor back to signaler when it exits critical section or if it waits again.
Mesa-style (Java, most real operating systems): Signaler keeps lock and processor. Waiter placed on ready queue with no special priority. Practically, need to check condition again after wait.
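Python's threading.Condition follows the Mesa convention, so the while form is the one to use there. Below is a minimal bounded-buffer sketch under that assumption. The class BoundedBuffer and the not_full condition are illustrative additions; only not_empty appears on the slide.

```python
import threading
from collections import deque


class BoundedBuffer:
    """Bounded buffer as a monitor with Mesa-style condition variables."""

    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity
        lock = threading.Lock()                       # the monitor lock
        self._not_empty = threading.Condition(lock)   # both CVs share it
        self._not_full = threading.Condition(lock)

    def put(self, item):
        with self._not_full:
            while len(self._items) >= self._capacity:   # recheck after wakeup
                self._not_full.wait()
            self._items.append(item)
            self._not_empty.notify()

    def get(self):
        with self._not_empty:
            while not self._items:                      # recheck after wakeup
                self._not_empty.wait()
            item = self._items.popleft()
            self._not_full.notify()
            return item
```

Replacing either while with if would let a woken thread continue after another thread has already consumed the item or filled the slot.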

24 24 Revisit: Readers/Writers Problem Correctness Constraints: Readers can access database when no writers Writers can access database when no readers or writers Only one thread manipulates state variables at a time State variables (Protected by a lock called “lock”): int NReaders: Number of active readers; initially = 0 int WaitingReaders: Number of waiting readers; initially = 0 int NWriters: Number of active writers; initially = 0 int WaitingWriters: Number of waiting writers; initially = 0 Condition canRead = NIL Condition canWrite = NIL

25 25 Readers and Writers
Monitor ReadersNWriters {
    int WaitingWriters, WaitingReaders, NReaders, NWriters;
    Condition CanRead, CanWrite;
    void BeginWrite() {
        if (NWriters == 1 || NReaders > 0) {
            ++WaitingWriters;
            wait(CanWrite);
            --WaitingWriters;
        }
        NWriters = 1;
    }
    void EndWrite() {
        NWriters = 0;
        if (WaitingReaders) signal(CanRead);
        else signal(CanWrite);
    }
    void BeginRead() {
        if (NWriters == 1 || WaitingWriters > 0) {
            ++WaitingReaders;
            wait(CanRead);
            --WaitingReaders;
        }
        ++NReaders;
        signal(CanRead);
    }
    void EndRead() {
        if (--NReaders == 0) signal(CanWrite);
    }
}



28 28 Readers and Writers
Monitor ReadersNWriters {
    int WaitingWriters, WaitingReaders, NReaders, NWriters;
    Condition CanRead, CanWrite;
    void BeginWrite() {
        if (NWriters == 1 || NReaders > 0) {
            ++WaitingWriters;
            wait(CanWrite);
            --WaitingWriters;
        }
        NWriters = 1;
    }
    void EndWrite() {
        NWriters = 0;
        if (WaitingReaders) notify(CanRead);
        else notify(CanWrite);
    }
    void BeginRead() {
        if (NWriters == 1 || WaitingWriters > 0) {
            ++WaitingReaders;
            wait(CanRead);
            --WaitingReaders;
        }
        ++NReaders;
        notify(CanRead);
    }
    void EndRead() {
        if (--NReaders == 0) notify(CanWrite);
    }
}

29 29 Understanding the Solution A writer can enter if there are no other active writers and no active readers


31 31 Understanding the Solution A reader can enter if There are no writers active or waiting So we can have many readers active all at once Otherwise, a reader waits (maybe many do)


33 33 Understanding the Solution When a writer finishes, it checks to see if any readers are waiting If so, it lets one of them enter That one will let the next one enter, etc… Similarly, when a reader finishes, if it was the last reader, it lets a writer in (if any is there)


35 35 Understanding the Solution It wants to be fair If a writer is waiting, readers queue up If a reader (or another writer) is active or waiting, writers queue up … this is mostly fair, although once it lets a reader in, it lets ALL waiting readers in all at once, even if some showed up “after” other waiting writers
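The monitor on the slides relies on a signaled thread running immediately (Hoare style). Under Mesa-style condition variables, such as Python's threading.Condition, each wait has to sit inside a while loop. The sketch below is a simplified variant, not the slides' solution: it swaps the two condition variables for a single one with notify_all, and it prefers writers, so it does not reproduce the fairness policy described above. The snake_case method names are illustrative.

```python
import threading


class ReadersWriters:
    """Writer-preferring readers/writers monitor, Mesa style."""

    def __init__(self):
        self._cond = threading.Condition()   # monitor lock + one CV
        self.n_readers = 0                    # active readers
        self.n_writers = 0                    # active writers (0 or 1)
        self.waiting_writers = 0              # writers that want to enter

    def begin_read(self):
        with self._cond:
            # Readers hold back while a writer is active or waiting.
            while self.n_writers == 1 or self.waiting_writers > 0:
                self._cond.wait()
            self.n_readers += 1

    def end_read(self):
        with self._cond:
            self.n_readers -= 1
            if self.n_readers == 0:
                self._cond.notify_all()       # a waiting writer may enter

    def begin_write(self):
        with self._cond:
            self.waiting_writers += 1
            while self.n_writers == 1 or self.n_readers > 0:
                self._cond.wait()
            self.waiting_writers -= 1
            self.n_writers = 1

    def end_write(self):
        with self._cond:
            self.n_writers = 0
            self._cond.notify_all()           # everyone rechecks its condition
```

Waking every waiter costs some extra context switches, but each thread rechecks its own condition, so a missed or misdirected wakeup cannot leave the monitor stuck.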

36 36 The Big Picture The point here is that getting synchronization right is hard How to pick between locks, semaphores, condvars, monitors??? Locks are very simple for many cases. Issues: Maybe not the most efficient solution For example, can't allow multiple readers but one writer inside a standard lock. Condition variables allow threads to sleep inside a critical section; the lock is released while the thread waits and reacquired before it resumes Just be sure you understand whether they use Mesa or Hoare semantics! Semaphores provide pretty general functionality But also make it really easy to botch things up. Adapted from Matt Welsh’s (Harvard University) slides.

37 37 CENG334 Introduction to Operating Systems Erol Sahin Dept of Computer Eng. Middle East Technical University Ankara, TURKEY URL: http://kovan.ceng.metu.edu.tr/ceng334 Synchronization patterns Topics Signalling Rendezvous Barrier

38 38 Signalling Possibly the simplest use for a semaphore is signaling, which means that one thread sends a signal to another thread to indicate that something has happened. Signaling makes it possible to guarantee that a section of code in one thread will run before a section of code in another thread; in other words, it solves the serialization problem. Adapted from The Little Book of Semaphores.

39 39 Signalling Imagine that a1 reads a line from a file, and b1 displays the line on the screen. The semaphore in this program guarantees that Thread A has completed a1 before Thread B begins b1. Here’s how it works: if thread B gets to the wait statement first, it will find the initial value, zero, and it will block. Then when Thread A signals, Thread B proceeds. Similarly, if Thread A gets to the signal first then the value of the semaphore will be incremented, and when Thread B gets to the wait, it will proceed immediately. Either way, the order of a1 and b1 is guaranteed. Thread A statement a1; sem.up(); Thread B sem.down(); statement b1; semaphore sem=0; Adapted from The Little Book of Semaphores.
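A minimal Python sketch of the signalling pattern, assuming threading.Semaphore, whose release() and acquire() correspond to up() and down(). The log list and the run() helper are illustrative additions used to observe the order.

```python
import threading

sem = threading.Semaphore(0)   # semaphore sem = 0
log = []


def thread_a():
    log.append("a1")           # statement a1
    sem.release()              # sem.up()


def thread_b():
    sem.acquire()              # sem.down(): blocks until A has signalled
    log.append("b1")           # statement b1


def run():
    log.clear()
    b = threading.Thread(target=thread_b)
    a = threading.Thread(target=thread_a)
    b.start()                  # start B first to show that it still waits
    a.start()
    a.join()
    b.join()
    return list(log)
```

Whichever thread is scheduled first, run() returns a1 before b1.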

40 40 Rendezvous Generalize the signal pattern so that it works both ways. Thread A has to wait for Thread B and vice versa. In other words, given this code we want to guarantee that a1 happens before b2 and b1 happens before a2. Your solution should not enforce too many constraints. For example, we don’t care about the order of a1 and b1. In your solution, either order should be possible. Two threads rendezvous at a point of execution, and neither is allowed to proceed until both have arrived. Thread A statement a1; statement a2; Thread B statement b1; statement b2; Adapted from The Little Book of Semaphores.

41 41 Rendezvous - Hint Generalize the signal pattern so that it works both ways. Thread A has to wait for Thread B and vice versa. In other words, given this code we want to guarantee that a1 happens before b2 and b1 happens before a2. Your solution should not enforce too many constraints. For example, we don’t care about the order of a1 and b1. In your solution, either order should be possible. Two threads rendezvous at a point of execution, and neither is allowed to proceed until both have arrived. Hint: Create two semaphores, named aArrived and bArrived, and initialize them both to zero. aArrived indicates whether Thread A has arrived at the rendezvous, and bArrived likewise. Thread A statement a1; statement a2; Thread B statement b1; statement b2; semaphore aArrived=0; semaphore bArrived=0; Adapted from The Little Book of Semaphores.

42 42 Rendezvous - Solution Generalize the signal pattern so that it works both ways. Thread A has to wait for Thread B and vice versa. In other words, given this code we want to guarantee that a1 happens before b2 and b1 happens before a2. Your solution should not enforce too many constraints. For example, we don’t care about the order of a1 and b1. In your solution, either order should be possible. Two threads rendezvous at a point of execution, and neither is allowed to proceed until both have arrived. Hint: Create two semaphores, named aArrived and bArrived, and initialize them both to zero. aArrived indicates whether Thread A has arrived at the rendezvous, and bArrived likewise. Thread A statement a1; aArrived.up(); bArrived.down(); statement a2; Thread B statement b1; bArrived.up(); aArrived.down(); statement b2; semaphore aArrived=0; semaphore bArrived=0; Adapted from The Little Book of Semaphores.
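The rendezvous solution can be sketched in Python, again using threading.Semaphore with release() for up() and acquire() for down(). The function run_rendezvous and the log list are illustrative.

```python
import threading


def run_rendezvous():
    a_arrived = threading.Semaphore(0)
    b_arrived = threading.Semaphore(0)
    log = []

    def thread_a():
        log.append("a1")       # statement a1
        a_arrived.release()    # aArrived.up()
        b_arrived.acquire()    # bArrived.down()
        log.append("a2")       # statement a2

    def thread_b():
        log.append("b1")       # statement b1
        b_arrived.release()    # bArrived.up()
        a_arrived.acquire()    # aArrived.down()
        log.append("b2")       # statement b2

    threads = [threading.Thread(target=thread_a),
               threading.Thread(target=thread_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

In every run a1 precedes b2 and b1 precedes a2, while the relative order of a1 and b1 is left to the scheduler.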

43 43 Rendezvous – A less efficient solution This solution also works, although it is probably less efficient, since it might have to switch between A and B one time more than necessary. If A arrives first, it waits for B. When B arrives, it wakes A and might proceed immediately to its wait in which case it blocks, allowing A to reach its signal, after which both threads can proceed. Thread A statement a1; bArrived.down(); aArrived.up(); statement a2; Thread B statement b1; bArrived.up(); aArrived.down(); statement b2; semaphore aArrived=0; semaphore bArrived=0; Adapted from The Little Book of Semaphores.

44 44 Rendezvous – How about? Thread A statement a1; bArrived.down(); aArrived.up(); statement a2; Thread B statement b1; aArrived.down(); bArrived.up(); statement b2; semaphore aArrived=0; semaphore bArrived=0; Adapted from The Little Book of Semaphores.
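In this variant both threads call down() before either calls up(), so each waits for a signal the other never sends: a deadlock. The Python sketch below makes that visible by using acquire with a timeout, which is an addition for demonstration only; the slide's down() would block forever.

```python
import threading


def run_how_about(timeout=0.2):
    a_arrived = threading.Semaphore(0)
    b_arrived = threading.Semaphore(0)
    outcome = {}

    def thread_a():
        # statement a1 would run here
        if not b_arrived.acquire(timeout=timeout):   # bArrived.down()
            outcome["A"] = "stuck"
            return
        a_arrived.release()                          # aArrived.up()
        outcome["A"] = "passed"

    def thread_b():
        # statement b1 would run here
        if not a_arrived.acquire(timeout=timeout):   # aArrived.down()
            outcome["B"] = "stuck"
            return
        b_arrived.release()                          # bArrived.up()
        outcome["B"] = "passed"

    threads = [threading.Thread(target=thread_a),
               threading.Thread(target=thread_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome
```

Both threads report "stuck", because every up() sits behind a down() that can never succeed.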

45 45 Barrier Rendezvous solution does not work with more than two threads. Puzzle: Generalize the rendezvous solution. Every thread should run the following code: rendezvous(); criticalpoint(); The synchronization requirement is that no thread executes criticalpoint() until after all threads have executed rendezvous(). You can assume that there are n threads and that this value is stored in a variable, n, that is accessible from all threads. When the first n − 1 threads arrive they should block until the nth thread arrives, at which point all the threads may proceed. Adapted from The Little Book of Semaphores.

46 46 Barrier - Hint n = thenumberofthreads; count = 0; Semaphore mutex=1, barrier=0; count keeps track of how many threads have arrived. mutex provides exclusive access to count so that threads can increment it safely. barrier is locked (zero or negative) until all threads arrive; then it should be unlocked (1 or more). Adapted from The Little Book of Semaphores.

47 47 Barrier – Solution? n = thenumberofthreads; count = 0; Semaphore mutex=1, barrier=0; rendezvous(); mutex.down(); count = count + 1; mutex.up(); if (count == n) barrier.up(); else barrier.down(); Criticalpoint(); Since count is protected by a mutex, it counts the number of threads that pass. The first n−1 threads wait when they get to the barrier, which is initially locked. When the nth thread arrives, it unlocks the barrier. What is wrong with this solution? Adapted from The Little Book of Semaphores.

48 48 Barrier – Solution? n = thenumberofthreads; count = 0; Semaphore mutex=1, barrier=0; rendezvous(); mutex.down(); count = count + 1; mutex.up(); if (count == n) barrier.up(); else barrier.down(); Criticalpoint(); Imagine that n = 5 and that 4 threads are waiting at the barrier. The value of the semaphore is the number of threads in queue, negated, which is -4. When the 5th thread signals the barrier, one of the waiting threads is allowed to proceed, and the semaphore is incremented to -3. But then no one signals the semaphore again and none of the other threads can pass the barrier. Adapted from The Little Book of Semaphores.

49 49 Barrier – Solution n = thenumberofthreads; count = 0; Semaphore mutex=1, barrier=0; rendezvous(); mutex.down(); count = count + 1; mutex.up(); if (count == n) barrier.up(); else{ barrier.down(); barrier.up(); } Criticalpoint(); The only change is another signal after waiting at the barrier. Now as each thread passes, it signals the semaphore so that the next thread can pass. Adapted from The Little Book of Semaphores.
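A Python sketch of this barrier, assuming threading.Semaphore for both mutex and barrier. One deliberate difference from the slide is labeled in the code: the count is copied into a local variable while the mutex is held, so the comparison with n does not read shared state outside the mutex. The name run_barrier and the log are illustrative.

```python
import threading


def run_barrier(n=5):
    count = 0
    mutex = threading.Semaphore(1)
    barrier = threading.Semaphore(0)
    log = []

    def worker(i):
        nonlocal count
        log.append(("rendezvous", i))    # rendezvous()
        mutex.acquire()
        count += 1
        arrived = count                  # assumption: snapshot under the mutex
        mutex.release()
        if arrived == n:
            barrier.release()            # last thread unlocks the barrier
        else:
            barrier.acquire()            # wait at the barrier
            barrier.release()            # let the next thread through
        log.append(("critical", i))      # criticalpoint()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

In the returned log, all n rendezvous entries appear before the first critical entry.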

50 50 Barrier – Bad Solution n = thenumberofthreads; count = 0; Semaphore mutex=1, barrier=0; rendezvous(); mutex.down(); count = count + 1; if (count == n) barrier.up(); barrier.down(); barrier.up(); mutex.up(); Criticalpoint(); Imagine that the first thread enters the mutex and then blocks. Since the mutex is locked, no other threads can enter, so the condition, count==n, will never be true and no one will ever unlock. Adapted from The Little Book of Semaphores.

51 51 CENG334 Introduction to Operating Systems Erol Sahin Dept of Computer Eng. Middle East Technical University Ankara, TURKEY URL: http://kovan.ceng.metu.edu.tr/ceng334 Real-world cases Topics: Race conditions Priority Inversion

52 52 Therac-25 Computer-controlled radiation therapy machine In operation between 1983 and 1987, 11 installations Adapted from Matt Welsh’s (Harvard University) slides.

53 53 Therac-25 Capable of delivering electron and photon (X-Ray) treatments Completely computer controlled No hardware interlocks to prevent misconfigurations or overdoses! All software written in PDP-11 assembly language Cryptic error messages delivered to operator console “Malfunction 23” No documentation of these error codes No indication of which errors are potentially life-threatening Lots of smoke and mirrors by the manufacturer Claimed a 10^-11 chance of delivering the wrong dose to a patient No justification for this claim in the safety analysis documents Adapted from Matt Welsh’s (Harvard University) slides.

54 54 Accidents On several occasions between June '85 and Jan '87 Massive overdoses to six people Some of these were lethal Typical therapeutic doses in the 200 rad range Several overdoses delivered 15,000 – 20,000 rads Various lawsuits, all settled out of court Initially, manufacturer claimed that overdoses were impossible Adapted from Matt Welsh’s (Harvard University) slides.

55 55 The problem Therac-25 operator console layout. The lethal computer error occurs when the operator accidentally sets the field (here in red) to "X", notices her mistake, then changes it to "E". Adapted from Matt Welsh’s (Harvard University) slides.

56 56 Race Condition #1 After some trial and error, it was discovered that overdose could be caused by operator editing the dosage on the console too quickly Operator would enter dosage on console Move cursor to bottom of screen, then move cursor back up to edit dosage “Treat” task Periodically checks “entry done” flag If flag is set, call subroutine to configure the magnets Configuring magnets takes about 8 sec “Magnet” task Called periodically to check if magnets are ready Checks if edits have been made to dosage If so, exits back to calling subroutine to restart the process Critical bug: Only checks if edits made on the first call! How this led to overdose: Operator enters dosage: Triggers magnet setting routine Operator edits dosage while the magnets are being configured Magnet routine does not notice edits have been made after first call Adapted from Matt Welsh’s (Harvard University) slides.

57 57 Race Condition #2 Second bug – totally different causes from the first THERAC-25 has a “turntable” aperture that moves certain elements into the path of the beam Field light mode used to position beam on patient No electron beam expected, instead, a light simulates the beam position Problem: Unfiltered beam exposed to patients on several occasions! Electron scan magnet Field light position (no electron beam) X-Ray field flattener Beam Computer controls position of turntable Adapted from Matt Welsh’s (Harvard University) slides.

58 58 Race Condition #2 1) Prescription entered on console 2) Operator must press “set” button to configure turntable 3) “Set up test” task runs periodically to check position of turntable Increments a variable “Class3” on each iteration If “Class3 == 0”, everything is ready and the dosage can begin Otherwise, a series of interlock checks are performed to ensure turntable in the correct position These checks will set Class3 to 0 when they are complete Can you spot the bug? Adapted from Matt Welsh’s (Harvard University) slides.

59 59 Race Condition #2 The bug: “Class3” variable is 8 bits wide After 256 iterations of “set up test” routine, overflows and becomes zero! So, interlocking checks will not be performed Operator must press “set” button during the short interval that Class3 overflows Fix: Set “Class3” to some nonzero value, rather than incrementing it Why was this done? Probably because “inc” instruction was easy enough... Adapted from Matt Welsh’s (Harvard University) slides.
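The wraparound is easy to reproduce. The sketch below models the 8-bit flag with a mask. The function names are hypothetical and the real code was PDP-11 assembly, so this illustrates the arithmetic only.

```python
def increment_class3(class3):
    """Buggy update: increment an 8-bit value, wrapping from 255 to 0."""
    return (class3 + 1) & 0xFF


def set_class3():
    """Fixed update: store a constant nonzero value instead of incrementing."""
    return 1


def passes_until_zero(start=1):
    """Count buggy passes until the flag reads 0 (checks skipped)."""
    class3 = start
    passes = 0
    while True:
        class3 = increment_class3(class3)
        passes += 1
        if class3 == 0:
            return passes
```

Starting from 0, the flag returns to 0 on the 256th increment, which is the short window in which the interlock checks are skipped. With set_class3() the flag can only become 0 when the checks themselves clear it.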

60 60 Mars Pathfinder July 4, 1997 landing on Martian surface, followed by expeditions by Sojourner rover Series of software glitches started a few days after landing Eventually debugged and patched remotely from Earth! Read the full story at: http://www.ddj.com/184411097 Adapted from Matt Welsh’s (Harvard University) slides.

61 61 VxWorks Operating System Developed by Wind River Systems – premier real time OS Multiple tasks, each with an associated priority Higher priority tasks get to run before lower-priority tasks Information bus – shared memory area used by various tasks Thread must obtain mutex to write data to the info bus – a monitor Information Bus Mutex Weather Data Thread Communication Thread Information Bus Thread Obtain mutex; write data Wait for mutex to read data Adapted from Matt Welsh’s (Harvard University) slides.

62 62 VxWorks Operating System Developed by Wind River Systems – premier real time OS Multiple tasks, each with an associated priority Higher priority tasks get to run before lower-priority tasks Information bus – shared memory area used by various tasks Thread must obtain mutex to write data to the info bus – a monitor Information Bus Mutex Weather Data Thread Communication Thread Information Bus Thread Free mutex Adapted from Matt Welsh’s (Harvard University) slides.

63 63 VxWorks Operating System Developed by Wind River Systems – premier real time OS Multiple tasks, each with an associated priority Higher priority tasks get to run before lower-priority tasks Information bus – shared memory area used by various tasks Thread must obtain mutex to write data to the info bus – a monitor Information Bus Mutex Weather Data Thread Communication Thread Information Bus Thread Lock mutex and read data Adapted from Matt Welsh’s (Harvard University) slides.

64 64 Priority Inversion What happens when threads have different priorities? Information Bus Mutex Weather Data Thread (low priority) Communication Thread (medium priority) Information Bus Thread (high priority) Adapted from Matt Welsh’s (Harvard University) slides.

65 65 Priority Inversion What happens when threads have different priorities? Information Bus Mutex Weather Data Thread (low priority) Communication Thread (medium priority) Information Bus Thread (high priority) Interrupt! Schedule comm thread... long running operation Adapted from Matt Welsh’s (Harvard University) slides.

66 66 Priority Inversion What happens when threads have different priorities? Comm thread runs for a long time Comm thread has higher priority than weather data thread But... the high priority info bus thread is stuck waiting! This is called priority inversion Information Bus Mutex Weather Data Thread (low priority) Communication Thread (medium priority) Information Bus Thread (high priority) Adapted from Matt Welsh’s (Harvard University) slides.

67 67 What is the fix? Problem with priority inversion: A high priority thread is stuck waiting for a low priority thread to finish its work In this case, the (medium priority) thread was holding up the low-prio thread General solution: Priority inheritance If waiting for a low priority thread, allow that thread to inherit the higher priority High priority thread “donates” its priority to the low priority thread Why does this fix the problem? Medium priority comm task cannot preempt weather task Weather task inherits high priority while it is being waited on Adapted from Matt Welsh’s (Harvard University) slides.
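A toy Python model of a priority scheduler shows why inheritance helps. It is only a sketch: Task, effective_priority, and pick_next are hypothetical names, larger numbers mean higher priority, and no real VxWorks API is modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    name: str
    priority: int                          # larger number = higher priority
    waiting_for: Optional["Task"] = None   # task holding the mutex we need


def effective_priority(task, tasks, inherit):
    """Own priority, raised to that of any task blocked on this one."""
    prio = task.priority
    if inherit:
        for other in tasks:
            if other.waiting_for is task:
                prio = max(prio, effective_priority(other, tasks, inherit))
    return prio


def pick_next(tasks, inherit):
    """Run the runnable task with the highest effective priority."""
    runnable = [t for t in tasks if t.waiting_for is None]
    return max(runnable, key=lambda t: effective_priority(t, tasks, inherit))


weather = Task("weather", 1)                  # low priority, holds the mutex
comm = Task("comm", 2)                        # medium priority, long running
bus = Task("bus", 3, waiting_for=weather)     # high priority, blocked
tasks = [weather, comm, bus]
```

Without inheritance, pick_next(tasks, inherit=False) selects comm, so the weather task cannot release the mutex and the bus task stays blocked. With inherit=True the weather task runs at the bus task's priority and is selected first.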

68 68 How was this problem fixed? JPL had a replica of the Pathfinder system on the ground Special tracing mode maintains logs of all interesting system events e.g., context switches, mutex lock/unlock, interrupts After much testing, engineers were able to replicate the problem in the lab VxWorks mutex objects have an optional priority inheritance flag Engineers were able to upload a patch to set this flag on the info bus mutex After the fix, no more system resets occurred Lessons: Automatically reset system to “known good” state if things run amok Far better than hanging or crashing Ability to trace execution of complex multithreaded code is useful Think through all possible thread interactions carefully!! Adapted from Matt Welsh’s (Harvard University) slides.

