1
CENG334 Introduction to Operating Systems
18/02/08 CENG334 Introduction to Operating Systems Deadlocks Topics: Dining philosopher problem Erol Sahin Dept of Computer Eng. Middle East Technical University Ankara, TURKEY URL:
2
18/02/08 What’s a deadlock?
3
Deadlock
18/02/08 Deadlock A deadlock happens when two (or more) threads are waiting for each other. None of the deadlocked threads ever makes progress. [Figure: Thread 1 holds Mutex 1 and waits for Mutex 2; Thread 2 holds Mutex 2 and waits for Mutex 1.] Adapted from Matt Welsh’s (Harvard University) slides.
4
Deadlock Definition
18/02/08 Deadlock Definition Two kinds of resources: Preemptible: Can take away from a thread e.g., the CPU Non-preemptible: Can't take away from a thread e.g., mutex, lock, virtual memory region, etc. Why isn't it safe to forcibly take a lock away from a thread? Starvation A thread never makes progress because other threads are using a resource it needs Deadlock A circular waiting for resources Thread A waits for Thread B Thread B waits for Thread A Starvation ≠ Deadlock Adapted from Matt Welsh’s (Harvard University) slides.
5
Dining Philosophers
18/02/08 Dining Philosophers Classic deadlock problem. Multiple philosophers are trying to eat lunch. There is one chopstick to the left and one to the right of each philosopher. Each one needs two chopsticks to eat. Adapted from Matt Welsh’s (Harvard University) slides.
6
18/02/08 Dining Philosophers What happens if everyone grabs the chopstick to their right? Everyone gets one chopstick and waits forever for the one on the left All of the philosophers starve!!! Adapted from Matt Welsh’s (Harvard University) slides.
7
Deadlock Characterization
18/02/08 Deadlock Characterization Deadlock can arise if four conditions hold simultaneously. Mutual exclusion: only one process at a time can use a resource. Hold and wait: a process holding at least one resource is waiting to acquire additional resources held by other processes. No preemption: a resource can be released only voluntarily by the process holding it, after that process has completed its task. Circular wait: there exists a set {P0, P1, …, Pn} of waiting processes such that P0 is waiting for a resource that is held by P1, P1 is waiting for a resource that is held by P2, …, Pn–1 is waiting for a resource that is held by Pn, and Pn is waiting for a resource that is held by P0. Adapted from Operating System Concepts (Silberschatz, Galvin, Gagne) slides.
8
18/02/08 Deadlock Prevention Restrain the ways requests can be made to ensure that at least one of the four conditions does not hold. Mutual Exclusion: not required for sharable resources; must hold for non-sharable resources, such as a printer. Hold and Wait: must guarantee that whenever a process requests a resource, it does not hold any other resources. Require each process to request and be allocated all its resources before it begins execution, or allow a process to request resources only when it has none. Drawbacks: low resource utilization; starvation possible.
9
Deadlock Prevention (Cont.)
18/02/08 Deadlock Prevention (Cont.) No Preemption If a process that is holding some resources requests another resource that cannot be immediately allocated to it, then all resources currently being held are released. Preempted resources are added to the list of resources for which the process is waiting. Process will be restarted only when it can regain its old resources, as well as the new ones that it is requesting. Can be applied to resources whose state can be saved such as CPU, and memory. Not applicable to resources such as printer and tape drives. Circular Wait impose a total ordering of all resource types, and require that each process requests resources in an increasing order of enumeration.
10
Circular Wait - 1
18/02/08 Circular Wait - 1 Each resource is given an ordering: F(tape drive) = 1 F(disk drive) = 2 F(printer) = 3 F(mutex1) = 4 F(mutex2) = 5 ……. Each process can request resources only in increasing order of enumeration. A process that decides to request an instance of Rj must first release every resource Ri it holds for which F(Ri) >= F(Rj).
11
18/02/08 Circular Wait - 2 For instance, an application program may use an ordering among all of its synchronization primitives: F(semaphore1) = 1 F(semaphore2) = 2 F(semaphore3) = 3 ……. After this, all requests to synchronization primitives should be made only in increasing order. Correct use: down(semaphore1); down(semaphore2); Incorrect use: down(semaphore3); followed by a down on a lower-numbered semaphore. Keep in mind that it’s the application programmer’s responsibility to obey this order.
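The ordering rule above can be sketched in Python as follows. This is a minimal illustration, not code from the course: the `RANK` table and the helpers `acquire_in_order` and `release_all` are hypothetical names, and `threading.Semaphore(1)` stands in for the slide's semaphores, with `acquire()` playing the role of `down()`.

```python
import threading

# Three binary semaphores with a fixed global ranking F(semaphore_k) = k.
sem1, sem2, sem3 = (threading.Semaphore(1) for _ in range(3))
RANK = {id(sem1): 1, id(sem2): 2, id(sem3): 3}  # hypothetical ordering table

def acquire_in_order(*sems):
    """down() the given semaphores in increasing rank, whatever order the caller lists them in."""
    ordered = sorted(sems, key=lambda s: RANK[id(s)])
    for s in ordered:
        s.acquire()
    return ordered

def release_all(ordered):
    """up() the semaphores in the reverse of the acquisition order."""
    for s in reversed(ordered):
        s.release()

completed = []

def worker(name, first, second):
    for _ in range(1000):
        held = acquire_in_order(first, second)
        release_all(held)
    completed.append(name)

# The two workers name the semaphores in opposite orders, which would risk
# deadlock if each simply called acquire() in the order given.
t1 = threading.Thread(target=worker, args=("A", sem1, sem3))
t2 = threading.Thread(target=worker, args=("B", sem3, sem1))
t1.start(); t2.start()
t1.join(timeout=10); t2.join(timeout=10)
```

Because both workers end up taking `sem1` before `sem3`, no circular wait can form between them.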
12
Methods for Handling Deadlocks
18/02/08 Methods for Handling Deadlocks How should we handle deadlocks Ensure that the system will never enter a deadlock state. Allow the system to enter a deadlock state and then recover. Ignore the problem and pretend that deadlocks never occur in the system; used by most operating systems, including UNIX. Adapted from Operating System Concepts (Silberschatz, Galvin, Gagne) slides.
13
Dining Philosophers
18/02/08 Dining Philosophers How do we solve this problem?? (Apart from letting them eat with forks.) Adapted from Matt Welsh’s (Harvard University) slides.
14
How to solve this problem?
18/02/08 How to solve this problem? Solution 1: Don't wait for chopsticks Grab the chopstick on your right Try to grab chopstick on your left If you can't grab it, put the other one back down Breaks “no preemption” condition – no waiting! Solution 2: Grab both chopsticks at once Requires some kind of extra synchronization to make it atomic Breaks “multiple independent requests” condition! Solution 3: Grab chopsticks in a globally defined order Number chopsticks 0, 1, 2, 3, 4 Grab lower-numbered chopstick first Means one person grabs left hand rather than right hand first! Breaks “circular dependency” condition Solution 4: Detect the deadlock condition and break out of it Scan the waiting graph and look for cycles Shoot one of the threads to break the cycle Adapted from Matt Welsh’s (Harvard University) slides.
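Solution 3 can be sketched with Python threads. This is only an illustration under stated assumptions: five philosophers, `threading.Lock` objects standing in for chopsticks, and the names `philosopher`, `chopsticks` and `meals` invented for the example.

```python
import threading

N = 5
chopsticks = [threading.Lock() for _ in range(N)]  # chopstick k is lock k
meals = [0] * N                                     # meals[i] is written only by philosopher i

def philosopher(i, rounds=200):
    left, right = i, (i + 1) % N
    # Global order: always grab the lower-numbered chopstick first.
    first, second = min(left, right), max(left, right)
    for _ in range(rounds):
        with chopsticks[first]:
            with chopsticks[second]:
                meals[i] += 1  # "eat"

threads = [threading.Thread(target=philosopher, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=30)
```

Philosopher 4 sits between chopsticks 4 and 0 and therefore reaches for chopstick 0 first, unlike everyone else. That one reversed grab is what breaks the cycle.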
15
18/02/08 Deadlock Avoidance Requires that the system has some additional a priori information available. Simplest and most useful model requires that each process declare the maximum number of resources of each type that it may need. Is this possible at all? The deadlock-avoidance algorithm dynamically examines the resource-allocation state to ensure that there can never be a circular-wait condition. When should the algorithm be called? Resource-allocation state is defined by the number of available and allocated resources, and the maximum demands of the processes.
16
System Model
18/02/08 System Model Resource types R1, R2, . . ., Rm CPU, memory, I/O devices disk network Each resource type Ri has Wi instances. For instance a quad-core processor has 4 CPUs Each process utilizes a resource as follows: request use release Adapted from Operating System Concepts (Silberschatz, Galvin, Gagne) slides.
17
Resource-Allocation Graph
18/02/08 Resource-Allocation Graph A set of vertices V and a set of edges E. V is partitioned into two types: P = {P1, P2, …, Pn}, the set consisting of all the processes in the system. R = {R1, R2, …, Rm}, the set consisting of all resource types in the system. Request edge – directed edge Pi → Rj. Assignment edge – directed edge Rj → Pi. Adapted from Operating System Concepts (Silberschatz, Galvin, Gagne) slides.
18
Resource Allocation Graph With A Deadlock
18/02/08 Resource Allocation Graph With A Deadlock If there is a deadlock => there is a cycle in the graph. However, the reverse is not true! That is, a cycle in the graph does not imply that there is a deadlock. Adapted from Operating System Concepts (Silberschatz, Galvin, Gagne) slides.
19
Resource Allocation Graph With A Cycle But No Deadlock
18/02/08 Resource Allocation Graph With A Cycle But No Deadlock However, the existence of a cycle in the graph does not necessarily imply a deadlock. Overall message: If the graph contains no cycles => no deadlock. If the graph contains a cycle: if there is only one instance per resource type => deadlock; if there are several instances per resource type => possibility of deadlock. Adapted from Operating System Concepts (Silberschatz, Galvin, Gagne) slides.
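For the single-instance case, checking for deadlock amounts to looking for a cycle in the graph. A minimal depth-first-search sketch follows. The dictionary representation (each vertex mapped to the vertices its outgoing edges point to) and the name `find_cycle` are assumptions made for this example.

```python
def find_cycle(graph):
    """Return the vertices of one cycle in a directed graph, or None if it is acyclic."""
    nodes = set(graph) | {m for targets in graph.values() for m in targets}
    state = dict.fromkeys(nodes, "unvisited")
    path = []  # vertices on the current DFS path, in order

    def visit(node):
        state[node] = "on_path"
        path.append(node)
        for nxt in graph.get(node, ()):
            if state[nxt] == "on_path":        # back edge: a cycle closes here
                return path[path.index(nxt):]
            if state[nxt] == "unvisited":
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        state[node] = "done"
        return None

    for node in sorted(nodes):
        if state[node] == "unvisited":
            cycle = visit(node)
            if cycle:
                return cycle
    return None

# Request edges point from a process to a resource, assignment edges from a
# resource to the process holding it.
deadlocked = {"P1": ["R1"], "R1": ["P2"], "P2": ["R2"], "R2": ["P1"]}
no_cycle = {"P1": ["R1"], "R1": ["P2"], "R2": ["P1"]}
```

With several instances per resource type, a cycle found this way signals only a possible deadlock, as the slide notes.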
20
Resource-Allocation Graph Algorithm
18/02/08 Resource-Allocation Graph Algorithm Claim edge Pi → Rj indicates that process Pi may request resource Rj; represented by a dashed line. A claim edge converts to a request edge when a process requests a resource. When a resource is released by a process, the assignment edge reconverts to a claim edge. Adapted from Operating System Concepts (Silberschatz, Galvin, Gagne) slides.
21
Resource-Allocation Graph Algorithm
18/02/08 Resource-Allocation Graph Algorithm Claim edge Pi → Rj indicates that process Pi may request resource Rj; represented by a dashed line. A claim edge converts to a request edge when a process requests a resource. When a resource is released by a process, the assignment edge reconverts to a claim edge. Resources must be claimed a priori in the system. Note that the cycle detection algorithm does not work with resources that have multiple instances. Cycle => Unsafe. Adapted from Operating System Concepts (Silberschatz, Galvin, Gagne) slides.
22
Safe, unsafe and deadlock states
18/02/08 Safe, unsafe and deadlock states If a system is in a safe state => no deadlocks. If a system is in an unsafe state => possibility of deadlock. Avoidance => ensure that a system will never enter an unsafe state. Adapted from Operating System Concepts (Silberschatz, Galvin, Gagne) slides.
23
18/02/08 Safe State When a process requests an available resource, system must decide if immediate allocation leaves the system in a safe state. System is in safe state if there exists a safe sequence of all processes. Sequence <P1, P2, …, Pn> is safe if for each Pi, the resources that Pi can still request can be satisfied by currently available resources + resources held by all the Pj, with j < i. If Pi resource needs are not immediately available, then Pi can wait until all Pj have finished. When Pj is finished, Pi can obtain needed resources, execute, return allocated resources, and terminate. When Pi terminates, Pi+1 can obtain its needed resources, and so on. Adapted from Operating System Concepts (Silberschatz, Galvin, Gagne) slides.
24
18/02/08 Banker’s Algorithm When extending credit, a banker should ensure that the bank never allocates all of its cash in such a way that none of its borrowers can finish their work and pay back the loan.
25
Example
18/02/08 Example The system has three processes and 12 tape drives. State at t=t0:
Process | Maximum Needs | Current Needs
P0 | 10 | 5
P1 | 4 | 2
P2 | 9 | 2
The system at t0 is safe since the sequence <P1,P0,P2> exists.
26
Example
18/02/08 Example The system has three processes and 12 tape drives. State at t=t0:
Process | Maximum Needs | Current Needs
P0 | 10 | 5
P1 | 4 | 2
P2 | 9 | 2
P2 requests one more drive. State at t=t1:
Process | Maximum Needs | Current Needs
P0 | 10 | 5
P1 | 4 | 2
P2 | 9 | 3
The system at t1 is no longer safe. Only 2 drives are free, so only P1 can run to completion: it requests 2 more tape drives, finishes and releases 4 drives. However, 4 drives are not sufficient for either P0 or P2 to complete its operation, which could result in a deadlock.
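The safety check in this example can be reproduced with a short sketch for a single resource type. The function name `safe_sequence` is hypothetical. The "Current Needs" column is read here as the number of drives currently held, and P1's holding of 2 drives at t0 is taken from the t1 table.

```python
def safe_sequence(total, holding, maximum):
    """Return one safe sequence of process names, or None if the state is unsafe."""
    available = total - sum(holding.values())
    remaining = dict(holding)  # processes that have not finished yet
    order = []
    while remaining:
        # A process can finish if its outstanding need fits in what is free now.
        runnable = [p for p in remaining if maximum[p] - remaining[p] <= available]
        if not runnable:
            return None  # nobody can be guaranteed to finish: unsafe
        p = runnable[0]
        available += remaining.pop(p)  # p finishes and returns its drives
        order.append(p)
    return order

maximum = {"P0": 10, "P1": 4, "P2": 9}
state_t0 = {"P0": 5, "P1": 2, "P2": 2}
state_t1 = {"P0": 5, "P1": 2, "P2": 3}  # after P2 is granted one more drive

print(safe_sequence(12, state_t0, maximum))  # ['P1', 'P0', 'P2']
print(safe_sequence(12, state_t1, maximum))  # None
```

An avoidance scheme would run such a check before granting P2's request and make P2 wait, because granting it leads to the unsafe state at t1.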
27
CENG334 Introduction to Operating Systems
18/02/08 CENG334 Introduction to Operating Systems Real-world cases Topics: Race conditions Priority Inversion Erol Sahin Dept of Computer Eng. Middle East Technical University Ankara, TURKEY URL:
28
Therac-25
18/02/08 Therac-25 Computer-controlled radiation therapy machine In operation between 1983 and 1987, 11 installations Adapted from Matt Welsh’s (Harvard University) slides.
29
Therac-25
18/02/08 Therac-25 Capable of delivering electron and photon (X-Ray) treatments Completely computer controlled No hardware interlocks to prevent misconfigurations or overdoses! All software written in PDP-11 assembly language Cryptic error messages delivered to operator console “Malfunction 23” No documentation of these error codes No indication of which errors are potentially life-threatening Lots of smoke and mirrors by the manufacturer Claimed that chance of delivering wrong dose to patient No justification for this claim in the safety analysis documents Adapted from Matt Welsh’s (Harvard University) slides.
30
Accidents
18/02/08 Accidents On several occasions between June '85 and Jan '87: Massive overdoses to six people. Some of these were lethal. Typical therapeutic doses are in the 200 rad range. Several overdoses delivered 15,000 – 20,000 rads. Various lawsuits, all settled out of court. Initially, the manufacturer claimed that overdoses were impossible. Adapted from Matt Welsh’s (Harvard University) slides.
31
18/02/08 The problem Therac-25 operator console layout. The lethal computer error occurs when the operator accidentally sets the field (here in red) to "X", notices her mistake, then changes it to "E". Adapted from Matt Welsh’s (Harvard University) slides.
32
18/02/08 Race Condition #1 After some trial and error, it was discovered that overdose could be caused by operator editing the dosage on the console too quickly Operator would enter dosage on console Move cursor to bottom of screen, then move cursor back up to edit dosage “Treat” task Periodically checks “entry done” flag If flag is set, call subroutine to configure the magnets Configuring magnets takes about 8 sec “Magnet” task Called periodically to check if magnets are ready Checks if edits have been made to dosage If so, exits back to calling subroutine to restart the process Critical bug: Only checks if edits made on the first call! How this led to overdose: Operator enters dosage: Triggers magnet setting routine Operator edits dosage while the magnets are being configured Magnet routine does not notice edits have been made after first call Adapted from Matt Welsh’s (Harvard University) slides.
33
Race Condition #2
18/02/08 Race Condition #2 Second bug – totally different causes from the first. The Therac-25 has a “turntable” aperture that moves certain elements into the path of the beam. Field light mode is used to position the beam on the patient: no electron beam is expected; instead, a light simulates the beam position. Problem: Unfiltered beam exposed to patients on several occasions! [Figure: the computer controls the position of the turntable under the beam: X-ray field flattener, electron scan magnet, field light position (no electron beam).] Adapted from Matt Welsh’s (Harvard University) slides.
34
Race Condition #2
18/02/08 Race Condition #2 1) Prescription entered on console 2) Operator must press “set” button to configure turntable 3) “Set up test” task runs periodically to check position of turntable Increments a variable “Class3” on each iteration If “Class3 == 0”, everything is ready and the dosage can begin Otherwise, a series of interlock checks are performed to ensure turntable in the correct position These checks will set Class3 to 0 when they are complete Can you spot the bug? Adapted from Matt Welsh’s (Harvard University) slides.
35
Race Condition #2
18/02/08 Race Condition #2 The bug: “Class3” variable is 8 bits wide After 256 iterations of “set up test” routine, overflows and becomes zero! So, interlocking checks will not be performed Operator must press “set” button during the short interval that Class3 overflows Fix: Set “Class3” to some nonzero value, rather than incrementing it Why was this done? Probably because “inc” instruction was easy enough... Adapted from Matt Welsh’s (Harvard University) slides.
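The wraparound is easy to reproduce. The sketch below only models the counter arithmetic, not the actual PDP-11 code: the function name `count_zero_windows` and the masking with `0xFF` to emulate an 8-bit variable are assumptions for illustration.

```python
def count_zero_windows(iterations, increment=True):
    """Count how often an 8-bit "Class3"-style flag reads zero while checks are still pending."""
    class3 = 0
    zero_windows = 0
    for _ in range(iterations):
        if increment:
            class3 = (class3 + 1) & 0xFF  # buggy version: 8-bit increment wraps 255 -> 0
        else:
            class3 = 1                     # fixed version: store a constant nonzero value
        if class3 == 0:
            zero_windows += 1              # flag wrongly looks like "everything is ready"
    return zero_windows

print(count_zero_windows(1024))                   # 4
print(count_zero_windows(1024, increment=False))  # 0
```

Each wrap opens a one-iteration window in which the flag is zero even though the interlock checks have not passed, matching the slide's description of the short interval in which pressing “set” skips the checks.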
36
18/02/08 Mars Pathfinder July 4, 1997 landing on Martian surface, followed by expeditions by Sojourner rover Series of software glitches started a few days after landing Eventually debugged and patched remotely from Earth! Read the full story at: Adapted from Matt Welsh’s (Harvard University) slides.
37
VxWorks Operating System
18/02/08 VxWorks Operating System Developed by Wind River Systems – premier real time OS Multiple tasks, each with an associated priority Higher priority tasks get to run before lower-priority tasks Information bus – shared memory area used by various tasks Thread must obtain mutex to write data to the info bus – a monitor Weather Data Thread Communication Thread Information Bus Thread Obtain mutex; write data Wait for mutex to read data Information Bus Mutex Adapted from Matt Welsh’s (Harvard University) slides.
38
VxWorks Operating System
18/02/08 VxWorks Operating System Developed by Wind River Systems – premier real time OS Multiple tasks, each with an associated priority Higher priority tasks get to run before lower-priority tasks Information bus – shared memory area used by various tasks Thread must obtain mutex to write data to the info bus – a monitor Weather Data Thread Communication Thread Information Bus Thread Free mutex Information Bus Mutex Adapted from Matt Welsh’s (Harvard University) slides.
39
VxWorks Operating System
18/02/08 VxWorks Operating System Developed by Wind River Systems – premier real time OS Multiple tasks, each with an associated priority Higher priority tasks get to run before lower-priority tasks Information bus – shared memory area used by various tasks Thread must obtain mutex to write data to the info bus – a monitor Weather Data Thread Communication Thread Information Bus Thread Lock mutex and read data Information Bus Mutex Adapted from Matt Welsh’s (Harvard University) slides.
40
18/02/08 Priority Inversion What happens when threads have different priorities? Low priority Med Priority High priority Weather Data Thread Communication Thread Information Bus Thread Information Bus Mutex Adapted from Matt Welsh’s (Harvard University) slides.
41
Priority Inversion
18/02/08 Priority Inversion What happens when threads have different priorities? Interrupt! Schedule comm thread ... long running operation Low priority Med Priority High priority Weather Data Thread Communication Thread Information Bus Thread Information Bus Mutex Adapted from Matt Welsh’s (Harvard University) slides.
42
18/02/08 Priority Inversion What happens when threads have different priorities? Comm thread runs for a long time Comm thread has higher priority than weather data thread But ... the high priority info bus thread is stuck waiting! This is called priority inversion Low priority Med Priority High priority Weather Data Thread Communication Thread Information Bus Thread Mutex Information Bus Adapted from Matt Welsh’s (Harvard University) slides.
43
What is the fix?
18/02/08 What is the fix? Problem with priority inversion: A high priority thread is stuck waiting for a low priority thread to finish its work In this case, the (medium priority) thread was holding up the low-prio thread General solution: Priority inheritance If waiting for a low priority thread, allow that thread to inherit the higher priority High priority thread “donates” its priority to the low priority thread Why does this fix the problem? Medium priority comm task cannot preempt weather task Weather task inherits high priority while it is being waited on Adapted from Matt Welsh’s (Harvard University) slides.
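A toy tick-based simulation makes the effect of priority inheritance concrete. It is a simplified model with invented numbers, not VxWorks behaviour: the weather task already holds the mutex and needs 3 ticks to release it, the comm task needs 10 ticks, and both the comm task and the info bus task arrive at tick 1. The function name `simulate` is hypothetical.

```python
def simulate(inheritance):
    """Return the tick at which the high-priority bus task acquires the mutex."""
    prio = {"weather": 1, "comm": 2, "bus": 3}
    weather_cs_left = 3      # ticks the weather task still needs inside the critical section
    comm_left = 10           # ticks of work for the medium-priority comm task
    mutex_owner = "weather"
    for tick in range(100):
        arrived = tick >= 1  # comm and bus both become ready at tick 1
        if arrived and mutex_owner is None:
            return tick      # the bus task finally gets the mutex
        ready = {}
        if weather_cs_left > 0:
            # With inheritance, the holder runs at the waiter's priority.
            boosted = inheritance and arrived
            ready["weather"] = prio["bus"] if boosted else prio["weather"]
        if arrived and comm_left > 0:
            ready["comm"] = prio["comm"]
        running = max(ready, key=ready.get)  # highest effective priority runs
        if running == "weather":
            weather_cs_left -= 1
            if weather_cs_left == 0:
                mutex_owner = None
        else:
            comm_left -= 1
    return None

print(simulate(inheritance=False))  # 13
print(simulate(inheritance=True))   # 3
```

Without inheritance the bus task waits for all of the comm task's work plus the rest of the critical section. With inheritance it waits only for the critical section.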
44
How was this problem fixed?
18/02/08 How was this problem fixed? JPL had a replica of the Pathfinder system on the ground. A special tracing mode maintains logs of all interesting system events, e.g., context switches, mutex lock/unlock, interrupts. After much testing, engineers were able to replicate the problem in the lab. VxWorks mutex objects have an optional priority inheritance flag. Engineers were able to upload a patch to set this flag on the info bus mutex. After the fix, no more system resets occurred. Lessons: Automatically reset the system to a “known good” state if things run amok; this is far better than hanging or crashing. The ability to trace execution of complex multithreaded code is useful. Think through all possible thread interactions carefully!! Adapted from Matt Welsh’s (Harvard University) slides.