Slide 1: Distributed Operating Systems
CS551, Colorado State University at Lockheed-Martin
Lecture 6 -- Spring 2001
Slide 2: CS551: Lecture 6
- Topics
  - Distributed Process Management (Chapter 7)
    - Distributed Scheduling Algorithm Choices
    - Scheduling Algorithm Approaches
    - Coordinator Elections
    - Orphan Processes
  - Distributed File Systems (Chapter 8)
    - Distributed Name Service
    - Distributed File Service
    - Distributed Directory Service
Slide 3: Distributed Deadlock Prevention
- Assign each process a global timestamp when it starts
- No two processes should have the same timestamp
- Basic idea: "When one process is about to block waiting for a resource that another process is using, a check is made to see which has a larger timestamp (i.e., is younger)." (Tanenbaum, Distributed Operating Systems, 1995)
Slide 4: Distributed Deadlock Prevention
- Somehow put timestamps on each process, representing the creation time of the process
- Suppose a process needs a resource already owned by another process
- Determine the relative ages of both processes
- Decide whether the waiting process should preempt, wait, die, or wound the owning process
- Two different algorithms
Slide 5: Distributed Deadlock Prevention
- Allow a wait only if the waiting process is older
  - Since timestamps increase along any chain of waiting processes, cycles are impossible
- Or allow a wait only if the waiting process is younger
  - Here timestamps decrease along any chain of waiting processes, so cycles are again impossible
- Wiser to give older processes priority
Slide 6: Example: wait-die algorithm
[Figure: process 54 wants a resource held by process 79; being older, 54 waits. Process 79 wants a resource held by process 54; being younger, 79 dies.]
Slide 7: Example: wound-wait algorithm
[Figure: process 54 wants a resource held by process 79; being older, 54 preempts 79. Process 79 wants a resource held by process 54; being younger, 79 waits.]
Slide 8: Algorithm Comparison
- Wait-die kills the young process
  - When the young process restarts and requests the resource again, it is killed once more
  - The less efficient of the two algorithms
- Wound-wait preempts the young process
  - When the young process re-requests the resource, it has to wait for the older process to finish
  - The better of the two algorithms
(Both rules are sketched in code below.)
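A minimal sketch of the two rules, assuming each process carries its creation timestamp and that a smaller timestamp means an older process; the function names and returned action labels are illustrative, not from the course text:

    # Timestamp-based deadlock prevention: smaller timestamp = older process.

    def wait_die(requester_ts, owner_ts):
        """The requester wants a resource held by the owner."""
        if requester_ts < owner_ts:   # requester is older
            return "WAIT"             # an older process is allowed to wait
        return "DIE"                  # a younger requester is killed; it restarts
                                      # with its original timestamp, so it ages

    def wound_wait(requester_ts, owner_ts):
        """The requester wants a resource held by the owner."""
        if requester_ts < owner_ts:   # requester is older
            return "WOUND"            # preempt (kill) the younger owner
        return "WAIT"                 # a younger requester waits

    # The slides' example, with timestamps 54 (older) and 79 (younger):
    assert wait_die(54, 79) == "WAIT" and wait_die(79, 54) == "DIE"
    assert wound_wait(54, 79) == "WOUND" and wound_wait(79, 54) == "WAIT"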
Slide 9: [Figure 7.7: The Bully Algorithm. (Galli, p. 169)]
Slide 10: Process Management in a Distributed Environment
- Processes in a uniprocessor
- Processes in a multiprocessor
- Processes in a distributed system
  - Why scheduling is needed
  - Scheduling priorities
  - How to schedule
  - Scheduling algorithms
Slide 11: Distributed Scheduling
- Basically resource management
- Want to distribute the processing load among the processing elements (PEs) in order to maximize performance
- Consider several homogeneous processing elements on a LAN with equal average workloads
  - The workload may still not be evenly distributed
  - Some PEs may have idle cycles
Slide 12: Efficiency Metrics
- Communication cost
  - Low if very little or no communication is required
  - Low if all communicating processes are
    - on the same PE
    - not distant (a small number of hops)
- Execution cost
  - Relative speed of the PE
  - Relative location of needed resources
  - Type of operating system, machine code, and architecture
Slide 13: Efficiency Metrics, continued
- Resource utilization
  - May be based upon
    - Current PE loads
    - Load status state
    - Resource queue lengths
    - Memory usage
    - Other resource availability
Slide 14: Level of Scheduling
- When should a process run locally, and when should it be sent to an idle PE?
- Local scheduling
  - Allocate the process to the local PE
  - Review Galli, Chapter 2, for more information
- Global scheduling
  - Choose which PE executes which process
  - Also called process allocation
  - Precedes the local scheduling decision
Slide 15: [Figure 7.1: Scheduling Decision Chart. (Galli, p. 152)]
Slide 16: Distribution Goals
- Load balancing
  - Tries to maintain an equal load throughout the system
- Load sharing
  - Simpler
  - Tries to prevent any PE from becoming too busy
Slide 17: Load Balancing / Load Sharing
- Load balancing
  - Try to equalize the loads at the PEs
  - Requires more information
  - More overhead
- Load sharing
  - Avoid having an idle PE if there is work to do
- Anticipating transfers
  - Avoid having a PE sit idle while a task is on its way
  - Get a new task just before the PE becomes idle
Slide 18: [Figure 7.2: Load Distribution Goals. (Galli, p. 153)]
Slide 19: Processor Allocation Algorithms
- Assume virtually identical PEs
- Assume PEs are fully interconnected
- Assume processes may spawn children
- Two strategies
  - Non-migratory: static binding, non-preemptive
  - Migratory: dynamic binding, preemptive
Slide 20: Processor Allocation Strategies
- Non-migratory (static binding, non-preemptive)
  - Transfer happens before the process starts execution
  - Once assigned to a machine, the process stays there
- Migratory (dynamic binding, preemptive)
  - Processes may move after execution begins
  - Better load balancing
  - Expensive: must collect and move the entire process state
  - More complex algorithms
Slide 21: Efficiency Goals
- Optimal
  - Completion time
  - Resource utilization
  - System throughput
  - Any combination thereof
- Suboptimal
  - Suboptimal approximate
  - Suboptimal heuristic
Slide 22: Optimal Scheduling Algorithms
- Require the state of all competing processes
- The scheduler must have access to all related information
- Optimization is a hard problem
  - Usually NP-hard for multiple processors
- Thus, consider
  - Suboptimal approximate solutions
  - Suboptimal heuristic solutions
Slide 23: Suboptimal Approximate Solutions
- Similar to optimal scheduling algorithms
- Try to find good solutions, not perfect solutions
- Searches are limited
- Include intelligent shortcuts
Slide 24: Suboptimal Heuristic Solutions
- Heuristics
  - Employ rules of thumb
  - Employ intuition
  - May not be provable
- Generally considered to work in an acceptable manner
- Examples:
  - If a PE has a heavy load, don't give it more to do
  - Exploit locality of reference for related processes and data
Slide 25: [Figure 7.1: Scheduling Decision Chart. (Galli, p. 152)]
Slide 26: Types of Load Distribution Algorithms
- Static
  - Decisions are hard-wired in
- Dynamic
  - Use system state information to make decisions
  - Overhead of keeping track of that information
- Adaptive
  - A type of dynamic algorithm
  - May behave differently at different loads
Slide 27: Load Distribution Algorithm Issues
- Transfer policy
- Selection policy
- Location policy
- Information policy
- Stability
- Sender-initiated versus receiver-initiated
- Symmetrically-initiated
- Adaptive algorithms
Slide 28: Load Distribution Algorithm Issues, continued
- Transfer policy (sketched below)
  - When is it appropriate to move a task?
  - If the load at the sending PE > threshold
  - If the load at the receiving PE < threshold
- Location policy
  - Find a receiver PE
  - Methods:
    - Broadcast messages
    - Polling: random, neighbors, recent candidates
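A minimal sketch of such a threshold-based transfer policy, assuming CPU queue length as the load measure; the threshold value is purely illustrative:

    THRESHOLD = 3  # illustrative CPU-queue-length threshold

    def should_send(local_queue_length):
        # A PE whose queue exceeds the threshold tries to ship a task out.
        return local_queue_length > THRESHOLD

    def can_receive(local_queue_length):
        # A PE whose queue is below the threshold may accept a task.
        return local_queue_length < THRESHOLD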
Slide 29: Load Distribution Algorithm Issues, continued
- Selection policy
  - Which task should migrate?
  - Simple approach
    - Select new tasks
    - Non-preemptive
  - Criteria
    - Cost of transfer: should be covered by the reduction in response time
    - Size of the task
    - Number of location-dependent system calls (these favor staying on the local PE)
Slide 30: Load Distribution Algorithm Issues, continued
- Information policy
  - What information should be collected? When? From whom? By whom?
  - Demand-driven
    - Collect info when a PE becomes a sender or a receiver
    - Sender-initiated: senders look for receivers
    - Receiver-initiated: receivers look for senders
    - Symmetrically-initiated: either of the above
  - Periodic: at fixed time intervals; not adaptive
  - State-change-driven
    - Nodes send info about their own state (rather than being solicited)
Slide 31: Load Distribution Algorithm Issues, continued
- Stability
  - Queuing-theoretic view
    - Stable: the sum of the arrival load and the overhead is less than the system's capacity
    - Effective: using the algorithm gives better performance than doing no load distribution at all
    - An effective algorithm cannot be unstable
    - A stable algorithm can still be ineffective (because of its overhead)
  - Algorithmic view
    - E.g., performing overhead operations but making no forward progress
    - E.g., moving a task from PE to PE, only to learn that the move increased the destination's workload enough that the task must be transferred again
Slide 33: Load Distribution Algorithms: Sender-Initiated
- The sender PE thinks it is overloaded
- Transfer policy
  - Threshold T based on the PE's CPU queue length (QL)
  - Sender: QL > T
  - Receiver: QL < T
- Selection policy
  - Non-preemptive
    - Allows only new tasks
    - Long-lived tasks make this policy worthwhile
Slide 34: LDAs: Sender-Initiated
- Location policy (3 alternatives)
  - Random
    - Select a receiver at random
    - The transfer is wasted if the destination is itself loaded
    - Want to avoid shuttling the same task from PE to PE to PE: include a limit on the number of transfers
  - Threshold
    - Poll PEs at random
    - If a receiver is found, send the task to it
    - Limit the search to a poll limit; if the limit is hit, keep the task on the current PE
Slide 35: LDAs: Sender-Initiated
- Location policy (3 alternatives, continued)
  - Shortest
    - Poll a random set of PEs
    - Choose the PE with the shortest queue length
    - Only a little better than the Threshold location policy; not worth the additional work
Slide 36: LDAs: Sender-Initiated
- Information policy
  - Demand-driven: gather information after a sender has been identified
- Stability
  - At high load, a PE might not find a receiver
  - The polling is then wasted
  - Polling increases the load on the system
  - Could lead to instability
(The whole sender-initiated loop is sketched below.)
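A sketch of the sender-initiated algorithm with the Threshold location policy; the helpers (query_queue_length, transfer) stand in for real remote operations, and the constants are illustrative, not tuned values from the text:

    import random

    T = 3           # queue-length threshold (illustrative)
    POLL_LIMIT = 5  # maximum number of PEs to poll (illustrative)

    def sender_initiated(new_task, local_ql, peers, query_queue_length, transfer):
        if local_ql <= T:                        # transfer policy: not overloaded
            return "run locally"
        # Location policy (Threshold): poll random PEs up to the poll limit.
        for pe in random.sample(peers, min(POLL_LIMIT, len(peers))):
            if query_queue_length(pe) < T:       # found a receiver
                transfer(new_task, pe)           # non-preemptive: task not yet started
                return f"sent to PE {pe}"
        return "run locally"                     # poll limit hit: keep the task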
Slide 37: LDAs: Receiver-Initiated
- The receiver is trying to find work
- Transfer policy
  - If the local QL < T, try to find a sender
- Selection policy
  - Prefer non-preemptive transfers, but there may not be any new tasks to take
  - Still worth the effort
Slide 38: LDAs: Receiver-Initiated
- Location policy
  - Select a PE at random
  - If taking a task does not move that PE's load below the threshold, take it
  - If there is no luck after polling the poll-limit number of PEs:
    - Wait until another task completes, or
    - Wait another time period, then retry
- Information policy
  - Demand-driven
Slide 39: LDAs: Receiver-Initiated
- Stability
  - Tends to be stable
  - At high load, a sender should be found quickly
- Problem
  - Transfers tend to be preemptive
  - Tasks on the sender node have already started
(The receiver side is sketched below.)
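The receiver side can be sketched symmetrically; again the helpers (query_queue_length, fetch_task) and constants are assumed stand-ins rather than anything from the text:

    import random

    T = 3
    POLL_LIMIT = 5

    def receiver_initiated(local_ql, peers, query_queue_length, fetch_task):
        if local_ql >= T:                        # transfer policy: not underloaded
            return None
        for pe in random.sample(peers, min(POLL_LIMIT, len(peers))):
            # Take a task only if removing it still leaves that PE at or
            # above the threshold, i.e. the PE really is overloaded.
            if query_queue_length(pe) - 1 >= T:
                return fetch_task(pe)            # often preemptive: may have started
        return None   # no sender found: wait for a completion or a timer, then retry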
Slide 40: LDAs: Symmetrically-Initiated
- Both senders and receivers can initiate the search for a transfer partner
- Has both the advantages and the disadvantages of the two previous methods
- Above-average algorithm
  - Try to keep the load at each PE at an acceptable level
  - Aiming for the exact average can cause thrashing
Slide 41: LDAs: Symmetrically-Initiated
- Transfer policy
  - Each PE
    - Estimates the average load
    - Sets both an upper and a lower threshold, equidistant from its estimate
  - If load > upper threshold, the PE acts as a sender
  - If load < lower threshold, the PE acts as a receiver
Slide 42: LDAs: Symmetrically-Initiated
- Location policy: sender-initiated part
  - The sender broadcasts a TooHigh message and sets a timeout
  - A receiver replies with an Accept message, clears its own timeout, increases its recorded load value, and sets a timeout
  - If the sender still wants to send when the Accept message arrives, it sends the task
  - If the sender gets a TooLow message before an Accept, it sends the task
  - If the sender's TooHigh timeout expires with no Accept:
    - Its average estimate is too low
    - It broadcasts a ChangeAvg message to all PEs
Slide 43: LDAs: Symmetrically-Initiated
- Location policy: receiver-initiated part
  - The receiver sends a TooLow message and sets a timeout
  - The rest is the converse of the sender-initiated part
- Selection policy
  - Use any reasonable policy
    - Non-preemptive, if possible
    - Low cost
Slide 44: LDAs: Symmetrically-Initiated
- Information policy
  - Demand-driven
  - Determined at each PE
  - Low overhead
(The transfer policy of this scheme is sketched below.)
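A minimal sketch of the above-average transfer policy, with the thresholds straddling the locally estimated average load; the band width and the role labels are illustrative:

    def classify(load, avg_estimate, band=1):
        # Upper and lower thresholds sit equidistant from the local estimate
        # of the average load.
        if load > avg_estimate + band:
            return "SENDER"     # broadcast TooHigh and set a timeout
        if load < avg_estimate - band:
            return "RECEIVER"   # send TooLow, or answer a TooHigh with Accept
        return "OK"             # within the acceptable band: do nothing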
Slide 45: LDAs: Adaptive
- Stable symmetrically-initiated algorithm
  - The earlier instability was due to too much polling by senders
  - Each PE keeps lists of the other PEs, sorted into three categories:
    - Senders (overloaded)
    - Receivers (underloaded)
    - OK
  - At the start, each PE is on every other PE's receiver list
Slide 46: LDAs: Adaptive
- Transfer policy
  - Based on the PE's CPU queue length
  - Uses a low threshold (LT) and a high threshold (HT)
- Selection policy
  - Sender-initiated: only sends new tasks
  - Receiver-initiated: takes any task
  - Trying for low cost
- Information policy
  - Demand-driven; maintains the lists
Slide 47: LDAs: Adaptive
- Location policy (receiver-initiated)
  - Order of polling:
    - Sender list, head to tail (newest information first)
    - OK list, tail to head (most out-of-date information first)
    - Receiver list, tail to head
  - When a PE becomes a receiver (QL < LT), it starts polling:
    - If it finds a sender, the transfer happens
    - Otherwise it uses the replies to update its lists
  - Polling continues until:
    - It finds a sender, or
    - It is no longer a receiver, or
    - It hits the poll limit
Slide 48: LDAs: Adaptive
- Notes
  - At high loads, activity is sender-initiated, but each sender will soon have an empty receiver list, so sender polling stops
    - The system then shifts to receiver-initiated activity
  - At low loads, receiver-initiated polling usually fails
    - But overhead doesn't matter at low load
    - And the lists get updated along the way
    - So sender-initiated transfers should work quickly
(The list bookkeeping is sketched below.)
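A sketch of the bookkeeping behind this algorithm: each PE keeps the three lists and re-files a peer whenever a poll reply reveals its state. The class and method names are illustrative, not from Galli:

    from collections import deque

    class PeerLists:
        def __init__(self, peers):
            self.senders = deque()
            self.ok = deque()
            self.receivers = deque(peers)   # initially, everyone is a receiver

        def update(self, pe, state):
            # Re-file `pe` under its freshly learned state, newest at the head.
            for lst in (self.senders, self.ok, self.receivers):
                if pe in lst:
                    lst.remove(pe)
            {"SENDER": self.senders, "OK": self.ok,
             "RECEIVER": self.receivers}[state].appendleft(pe)

        def receiver_poll_order(self):
            # Sender list head to tail (newest information first), then the
            # OK and receiver lists tail to head (most out-of-date first).
            return (list(self.senders) + list(reversed(self.ok))
                    + list(reversed(self.receivers)))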
Slide 49: Load Scheduling Algorithms (Galli)
- Usage points
  - Processes are charged for using remote PEs and resources
- Graph theory
  - Minimum cutset of the assignment graph
  - Maximum flow of the graph
- Probes
  - Messages sent to locate available, appropriate PEs
- Scheduling queues
- Stochastic learning
Slide 50: [Figure 7.3: Usage Points. (Galli, p. 158)]
Slide 51: [Figure 7.4: Economic Usage Points. (Galli, p. 159)]
Slide 52: [Figure 7.5: Two-Processor Min-Cut Example. (Galli, p. 161)]
Slide 53: [Figure 7.6: A Station with Run Queues and Hints. (Galli, p. 164)]
Slide 54: CPU Queue Length as Metric
- PE queue length correlates well with response time
  - Easy to measure
  - Caution:
    - When accepting a new migrating process, increment the queue length right away
    - A time-out may be needed in case the process never arrives
- PE queue length does not correlate well with PE utilization
  - A daemon to monitor PE utilization adds overhead
(The caution is sketched below.)
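The caution amounts to charging for an in-transit process immediately and refunding the charge if it never shows up. A sketch, with an illustrative timeout value and hypothetical method names:

    import threading

    class QueueLengthMetric:
        ARRIVAL_TIMEOUT = 5.0   # seconds; illustrative

        def __init__(self):
            self.queue_len = 0
            self.lock = threading.Lock()
            self.pending = {}   # task id -> timer for its expected arrival

        def accept_migration(self, task_id):
            with self.lock:
                self.queue_len += 1                   # charge before arrival
                t = threading.Timer(self.ARRIVAL_TIMEOUT, self._expire, [task_id])
                self.pending[task_id] = t
                t.start()

        def arrived(self, task_id):
            with self.lock:
                timer = self.pending.pop(task_id, None)
                if timer:
                    timer.cancel()                    # arrived: keep the charge

        def _expire(self, task_id):
            with self.lock:
                if self.pending.pop(task_id, None) is not None:
                    self.queue_len -= 1               # never arrived: refund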
Slide 55: Election Algorithms
- The bully algorithm (Garcia-Molina, 1982)
- A ring election algorithm
Slide 56: Bully Algorithm
- Each processor has a unique number
- One processor notices that the leader/server is missing
  - It sends messages to all other processors
  - It requests to be appointed leader
  - It includes its own processor number
- Processors with higher numbers (or lower, depending on the convention; the example below uses higher) can bully the first processor
Slide 57: [Figure 7.7: The Bully Algorithm. (Galli, p. 169)]
Slide 58: Bully Algorithm, continued
- The initiating processor need only send election messages to higher-numbered processors
- Any processor that responds effectively tells the initiator that it overrules it and that the initiator is out of the running
- Each responding processor then starts sending election messages to the processors numbered above it
Slide 59: Bully Example
[Figure: six processors numbered 0 through 5; the leader, 5, has crashed. Processor 2 notices and calls an election; processors 3 and 4 respond.]
Slide 60: Bully Example, continued
[Figure: processor 3 calls an election; processor 4 calls an election.]
Slide 61: Bully Example, concluded
[Figure: processor 4 responds to 3; with 5 down, 4 is the new leader.]
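The example's outcome follows from a simple property: the highest-numbered live processor wins every election it joins. A minimal sketch that collapses the message exchange into that property, assuming higher numbers win and that liveness is known to the caller:

    def bully_election(initiator, alive):
        """initiator: number of the processor calling the election;
        alive: set of live processor numbers (crashed leader excluded)."""
        higher = [p for p in alive if p > initiator]
        if not higher:
            return initiator          # nobody overrules: initiator is the leader
        # Every higher live processor bullies the initiator and holds its own
        # election in turn, so the highest live processor ends up winning.
        return max(higher)

    # The slides' example: processors 0-5, leader 5 crashed, 2 calls the election.
    assert bully_election(2, alive={0, 1, 2, 3, 4}) == 4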
Slide 62: A Ring Election Algorithm
- No token
- Each processor knows its successor
- When a processor notices the leader is down, it sends an election message to its successor
- If the successor is down, it sends to the next processor around the ring
- Each sender adds its own number to the message
Slide 63: Ring Election Algorithm, continued
- The first processor eventually receives back the election message containing its own number
- The election message is changed to a coordinator message and resent around the ring
- The highest processor number in the message becomes the new leader
- When the first processor receives the coordinator message back, the message is deleted
Slide 64: Ring Election Example
[Figure: eight processors, 0 through 7, in a ring; processor 7 is down. Processor 3 starts the election, and the message grows as it circulates: 3; 3,4; 3,4,5; 3,4,5,6; 3,4,5,6,0; 3,4,5,6,0,1; 3,4,5,6,0,1,2. It then returns to processor 3, and 6, the highest number in the message, becomes the new leader.]
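A sketch of one circulation of the election message, assuming the ring order and the set of live processors are known; dead processors are simply skipped:

    def ring_election(ring, start, alive):
        """ring: processor numbers in ring order; start: the initiator."""
        n = len(ring)
        msg = []
        i = ring.index(start)
        while True:
            pe = ring[i % n]
            if pe in alive:
                if pe == start and msg:   # the message has come all the way around
                    return max(msg), msg  # highest number becomes the leader; a
                                          # coordinator message then circulates
                msg.append(pe)            # each live processor appends its number
            i += 1                        # a dead successor is skipped

    # The slides' example: ring 0-7 with processor 7 down, processor 3 starts.
    leader, trail = ring_election(list(range(8)), 3, alive=set(range(7)))
    assert leader == 6 and trail == [3, 4, 5, 6, 0, 1, 2]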
Slide 65: Orphan Processes
- An orphan is a child process that is still active after its parent process has terminated prematurely
- Can happen with remote procedure calls
- Wastes resources
- Can corrupt shared data
- Can create more processes
- Three solutions follow
Slide 66: Orphan Cleanup
- A process must clean up after itself after a crash
  - Requires each parent to keep a list of its children
  - The parent thus has access to the family tree
  - The list must be kept in nonvolatile storage
  - On restart, each member of the family tree is told of the parent process's death and halts execution
- Disadvantage: parent overhead
Slide 67: [Figure 7.8: Orphan Cleanup Family Trees. (Galli, p. 170)]
Slide 68: Child Process Allowance
- All child processes receive a finite time allowance
- If no time is left, the child must request more time from its parent
- If the parent has terminated prematurely, the child's request goes unanswered
- With no time allowance remaining, the child process dies
- Requires more communication
- Slows the execution of child processes
(The child's side of the scheme is sketched below.)
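A sketch of the child's side of the allowance scheme; the allowance length and the helper callbacks (request_more_time, do_work) are illustrative assumptions, not from Galli:

    import time

    ALLOWANCE = 10.0   # seconds granted per request; illustrative

    def child_loop(request_more_time, do_work):
        """request_more_time() returns a fresh allowance in seconds, or None
        if the parent does not answer; do_work() runs one unit of work and
        returns False when the job is finished."""
        deadline = time.monotonic() + ALLOWANCE
        while do_work():
            if time.monotonic() >= deadline:          # allowance used up
                grant = request_more_time()           # ask the parent for more
                if grant is None:                     # unanswered: parent is dead
                    return "orphan: terminating"
                deadline = time.monotonic() + grant
        return "finished"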
Slide 69: [Figure 7.9: Child Process Allowance. (Galli, p. 172)]
Slide 70: Process Version Numbers
- Each process must keep track of a version number for its parent
- After a system crash, the entire distributed system is assigned a new version number
- A child is forced to terminate if its version number is out-of-date
- The child may first try to find its parent
  - It terminates if unsuccessful
- Requires a lot of communication
(The check is sketched below.)
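A sketch of the check a child would perform, with hypothetical names; find_parent() stands in for whatever parent-lookup mechanism the system provides:

    def check_version(parent_version_seen, system_version, find_parent):
        if parent_version_seen == system_version:
            return "continue"        # no crash since the child last checked
        parent = find_parent()       # optionally try to re-locate the parent
        if parent is None:
            return "terminate"       # stale version and no parent: an orphan
        return "continue under new version"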
Slide 71: [Figure 7.10: Process Version Numbers. (Galli, p. 174)]