Distributed Process Coordination Presentation 1 - Sept. 14th 2002 CSE Spring 02 Group A4: Chris Sun, Min Fang, Bryan Maden
Introduction Coordination requires either global time or global ordering. Can be accomplished through hardware, software, or a combination of both. Can be centralized or distributed. Used to solve Critical Section problems.
Introduction Cont. Solutions: Mutual Exclusion (Centralized, Distributed, Token Ring); Clocks (Lamport Timestamps, Cristian’s Algorithm, Berkeley Algorithm); Coordinators (Election Algorithm, Bully Algorithm)
Mutual Exclusion Centralized One process is chosen to be the coordinator. Any process wanting to enter the Critical Section (CS) sends a request message to the coordinator. The coordinator chooses who enters the CS and sends a reply message to allow that process to enter. When the process is finished in the CS, it sends a ‘finished’ message to the coordinator. The coordinator then chooses the next process to enter the CS.
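The request/reply/finished exchange above can be sketched as a small coordinator object. This is a minimal illustration, not a definitive implementation: the class name `Coordinator`, its method names, and the first-come-first-served queue are assumptions (the slide only says the coordinator "chooses" who enters next).

```python
from collections import deque


class Coordinator:
    """Grants the critical section (CS) to one process at a time."""

    def __init__(self):
        self.holder = None       # process currently in the CS, if any
        self.waiting = deque()   # deferred requests, in arrival order (assumed policy)

    def request(self, pid):
        """Handle a request message; return True if the reply is sent right away."""
        if self.holder is None:
            self.holder = pid
            return True
        self.waiting.append(pid)  # CS is busy: defer the reply
        return False

    def finished(self, pid):
        """Handle a 'finished' message; return the next process granted, or None."""
        if pid != self.holder:
            raise ValueError(f"process {pid} does not hold the CS")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder
```

A single queue keeps the sketch fair, but any other selection policy would fit the same interface.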
Mutual Exclusion Distributed Process Pi wants to enter the CS. It generates a timestamp TS and sends the message request(Pi, TS) to all processes in the system (including itself). When process Pj receives a request message, it may reply immediately with a reply message, or it may defer sending the reply. When process Pi has received a reply message from all other processes, it enters the CS.
Mutual Exclusion Distributed cont. Process Pj decides when to send its reply to process Pi based on three factors: 1) If Pj is in its critical section, it defers its reply to Pi. 2) If Pj does not want to enter its critical section, it sends a reply to Pi immediately. 3) If Pj is waiting to enter its critical section, Pj compares its own request timestamp with the timestamp TS of the incoming request. If its own request timestamp is greater than TS, Pj sends a reply immediately; otherwise the reply is deferred.
Mutual Exclusion Distributed cont. This algorithm provides: mutual exclusion, deadlock prevention, starvation prevention. All processes in the system must: know all the other processes; receive information about new processes.
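The three-way reply rule can be written as one decision function. This is a sketch under stated assumptions: the state labels `"IN_CS"`, `"IDLE"`, and `"WAITING"` and the function name are hypothetical, and ties between equal timestamps are broken by process id, which the slide does not specify.

```python
def should_reply_now(state, own_request, incoming_request):
    """Decide whether Pj replies immediately to an incoming request.

    state: "IN_CS", "IDLE", or "WAITING" (hypothetical labels).
    own_request, incoming_request: (timestamp, process_id) tuples;
    own_request is only consulted when state is "WAITING".
    """
    if state == "IN_CS":
        return False   # rule 1: defer until Pj leaves the CS
    if state == "IDLE":
        return True    # rule 2: Pj does not want the CS
    if state != "WAITING":
        raise ValueError(f"unknown state: {state}")
    # Rule 3: reply at once only if Pj's own request is the later one.
    # Tuple comparison breaks timestamp ties by process id (assumption).
    return own_request > incoming_request
```

For example, a waiting process with request `(5, 2)` replies immediately to an incoming `(3, 1)`, while the sender of `(3, 1)` defers its reply to `(5, 2)`.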
Token Passing A token is a special type of message that is passed around the system. Possession of the token allows the process to enter the CS. When a process receives the token it can hold the token and enter the CS, or it may pass the token giving up the right to enter the CS. If the token is lost an election is held to create a new one. If a process dies, the ring is broken and a new one must be created. Starvation and deadlock are avoided with uni-directional ring and one token.
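One trip of the token around a unidirectional ring can be simulated in a few lines. This is an illustrative sketch only: the function name `circulate` and its parameters are assumptions, and token loss and ring repair are left out.

```python
def circulate(ring, wants_cs, start=0):
    """Pass the token once around the ring; return the order of CS entries.

    ring: list of process ids in ring order.
    wants_cs: set of process ids that want the critical section.
    start: index in ring of the process that currently holds the token.
    """
    entered = []
    holder = start
    for _ in range(len(ring)):
        pid = ring[holder]
        if pid in wants_cs:
            entered.append(pid)             # keeps the token, enters and leaves the CS
        holder = (holder + 1) % len(ring)   # passes the token to its successor
    return entered
```

With `ring = [0, 1, 2, 3]`, `wants_cs = {1, 3}` and the token at index 2, process 3 enters first and process 1 second. Every process is visited once per trip, which is why no requester can starve.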
Logical Clocks
Logical Clock and Lamport Timestamp Logical clock –Order of events matters more than absolute time –E.g. UNIX make comparing the modification times of input.c and input.o Lamport timestamp –Synchronizes logical clocks Happens-before relation –A -> B : A happens before B –Two cases determine “happens-before” –A and B are in the same process, and A occurs before B: A -> B –A is the send-event of message M, and B is the receive-event of the same message M: A -> B Transitive relation –If A -> B and B -> C, then A -> C Concurrent events –Neither A -> B nor B -> A is true
Lamport Algorithm Assign time value C(a) such that –If a happens before b in the same process, C(a) < C(b) –If a and b represent the sending and receiving of a message, C(a) < C(b) Lamport Algorithm –Each process increments its local clock between any two successive events –Each message contains a timestamp –Upon receiving a message, if the received timestamp is ahead, the receiver fast-forwards its clock to be one more than the sending time Extension for total ordering –Requirement: for all distinct events a and b, C(a) ≠ C(b) –Solution: break ties between concurrent events using the process number
Lamport Timestamp Example Clocks run at different rates. Correct the clocks using the Lamport Algorithm. (Figure labels: A, B, C, D)
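The clock rules above fit in a short class. This is a minimal sketch; the class and method names are illustrative, and the receive rule is written as `max(local, received) + 1`, which fast-forwards to one more than the sending time when the message is ahead and otherwise simply counts the receive as the next local event.

```python
class LamportClock:
    """Logical clock for one process."""

    def __init__(self, pid):
        self.pid = pid
        self.time = 0

    def tick(self):
        """Advance the clock for a local event."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event; the returned value is the message timestamp."""
        return self.tick()

    def receive(self, msg_time):
        """Move past the message timestamp if it is ahead of the local clock."""
        self.time = max(self.time, msg_time) + 1
        return self.time

    def stamp(self):
        """(time, pid) pair: the process number breaks ties for total ordering."""
        return (self.time, self.pid)
```

Comparing `stamp()` tuples gives the total order: equal clock values are ordered by process number.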
Totally-Ordered Multicast Definition: sending messages to a set of processes in such a way that all messages are delivered to the correct destinations in the same order. Scenario –Replicated accounts in New York (NY) and San Francisco (SF) –Two transactions occur at the same time and are multicast: current balance $1,000; add $100 at SF; add interest of 1% at NY –Possible results: $1,111 (deposit applied first) or $1,110 (interest applied first)
Totally Ordered Multicast Use Lamport timestamps. Algorithm (communication history) –Message is time-stamped with the sender’s logical time –Message is multicast (including to the sender itself) –When a message is received: it is put into a local queue, ordered according to timestamp, and an acknowledgement is multicast –Message is delivered to the application only when it is at the head of the queue and it has been acknowledged by all involved processes Other algorithms: sequencer and destination agreement.
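The two possible balances come from applying the same two updates in different orders, which a short calculation confirms. The helper names here are illustrative, and whole-dollar integer arithmetic is assumed.

```python
def deposit(balance):
    """Add $100 (the San Francisco update)."""
    return balance + 100


def interest(balance):
    """Add 1% interest in whole dollars (the New York update)."""
    return balance * 101 // 100


def apply_all(balance, updates):
    """Apply the updates in the given order and return the final balance."""
    for update in updates:
        balance = update(balance)
    return balance


deposit_first = apply_all(1000, [deposit, interest])   # 1111
interest_first = apply_all(1000, [interest, deposit])  # 1110
```

Both replicas end up consistent only if they apply the updates in the same order, whichever order that is.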
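The delivery rule (head of the queue and acknowledged by everyone) can be sketched for a single replica. This is a simplified illustration: the class `Replica` and its methods are assumptions, timestamps are supplied by the caller rather than by a Lamport clock, and the network is left out.

```python
import heapq


class Replica:
    """Holds received messages until they may be delivered to the application."""

    def __init__(self, group_size):
        self.group_size = group_size
        self.queue = []      # min-heap of (timestamp, sender, payload)
        self.acks = {}       # (timestamp, sender) -> set of acknowledging processes
        self.delivered = []  # payloads handed to the application, in order

    def on_message(self, ts, sender, payload):
        heapq.heappush(self.queue, (ts, sender, payload))
        self._deliver_ready()

    def on_ack(self, ts, sender, acker):
        self.acks.setdefault((ts, sender), set()).add(acker)
        self._deliver_ready()

    def _deliver_ready(self):
        # Deliver only from the head of the queue, and only once every
        # process in the group has acknowledged that message.
        while self.queue:
            ts, sender, payload = self.queue[0]
            if len(self.acks.get((ts, sender), ())) < self.group_size:
                break
            heapq.heappop(self.queue)
            self.delivered.append(payload)
```

Because the queue is ordered by (timestamp, sender), two replicas that receive the same messages in different arrival orders still deliver them in the same order.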
Vector Timestamps Problem of Lamport timestamps –C(a) < C(b) does not imply a -> b –They do not capture “causality” Vector Timestamp –VT(a) < VT(b) when event a causally precedes event b –Vi[i]: number of events that have occurred so far at Pi –If Vi[j] = k then Pi knows that k events have occurred at Pj –Increment Vi[i] at each new event at Pi –When Pi sends message m, it piggybacks its current vector vt –When Pj receives m, it adjusts its vector: Vj[k] = max{Vj[k], vt[k]} for each k, then Vj[j] is incremented by 1 for the receive event
Vector Timestamps Example Vi[i]: the number of events that have occurred so far at Pi. Vi[j]: the number of events that have occurred at Pj that Pi has potentially been affected by, where j ≠ i.
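A compact vector clock makes the update rules concrete. This is a sketch under assumptions: the class and function names are illustrative, processes are numbered from 0, and a receive is counted as an event at the receiver, so the receiver increments its own entry after taking the element-wise maximum.

```python
class VectorClock:
    """Vector clock for process `pid` in a group of `n` processes."""

    def __init__(self, pid, n):
        self.pid = pid
        self.v = [0] * n

    def event(self):
        """Count a local event and return a copy of the vector."""
        self.v[self.pid] += 1
        return list(self.v)

    def send(self):
        """Sending is an event; the returned vector is piggybacked on the message."""
        return self.event()

    def receive(self, vt):
        """Merge the piggybacked vector, then count the receive event."""
        self.v = [max(own, other) for own, other in zip(self.v, vt)]
        self.v[self.pid] += 1
        return list(self.v)


def happened_before(a, b):
    """True if timestamp a causally precedes timestamp b."""
    return a != b and all(x <= y for x, y in zip(a, b))
```

Two timestamps for which `happened_before` is false in both directions belong to concurrent events, which is exactly the distinction a single Lamport counter cannot express.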
Physical Clocks
Clock Sync. Algorithm Distributed System, P.245~P.250 Overview Cristian’s Algorithm UNIX Averaging
Overview TAI: International Atomic Time –BIH, Paris –50 cesium 133 clocks UTC: Universal Coordinated Time –Based on TAI –Basis of all modern timekeeping –Replaced Greenwich Mean Time, an astronomical time
Overview-2 WWV –Shortwave radio station –NIST, National Institute of Standards and Technology –Broadcasts UTC
Algorithm types in Distributed Systems Centralized –Clients ask Server –Server polls Clients Decentralized –Host collects times from others
Cristian’s Algorithm Client asks Server for the time –Time Server has a WWV receiver –No more than Δt = δ/2ρ sec. between requests for the current time –ρ: max drift rate, about 10^-5, i.e. about 2 ticks every hour for H = 60 –δ: max time deviation between hosts –δ = 2ρΔt
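The resynchronization bound can be checked numerically. In this sketch rho stands for the maximum drift rate and delta for the largest deviation allowed between two hosts; because two clocks may drift in opposite directions, they can move apart at a rate of 2 * rho. The function names and the sample values are illustrative assumptions.

```python
def resync_interval(delta, rho):
    """Longest time in seconds between resyncs that keeps two clocks within delta."""
    if delta <= 0 or rho <= 0:
        raise ValueError("delta and rho must be positive")
    return delta / (2 * rho)


def drift_ticks_per_hour(rho, ticks_per_second):
    """Worst-case tick error one clock accumulates in an hour at drift rate rho."""
    return rho * ticks_per_second * 3600


interval = resync_interval(delta=0.001, rho=1e-5)           # about 50 seconds
error = drift_ticks_per_hour(rho=1e-5, ticks_per_second=60)  # about 2.16 ticks
```

With a drift rate of 10^-5, keeping hosts within one millisecond of each other therefore means asking the time server roughly every 50 seconds.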
Cristian’s Algorithm-2 Time must never go backward –A backward jump makes timestamps inconsistent –Instead, gradually slow the clock down Corrections –Propagation time –Interrupt handling time –Threshold on the round-trip time –Multiple requests and averaging
Berkeley UNIX Time daemon polls the clients, computes a standard time, and broadcasts the standard time.
Averaging Algorithm Every interval, each host broadcasts its time to all others. Each one computes its own time based on the information from the others. Discard extreme data. Correction: propagation, topology.
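The propagation-time correction is commonly computed from the request's round trip: the client assumes the reply took half of the round trip, after subtracting the server's interrupt handling time. The sketch below follows that approach; the function names, the tuple layout of a sample, and the treatment of the threshold as a limit on round-trip time are assumptions.

```python
def cristian_estimate(t0, t1, server_time, interrupt_time=0.0):
    """Estimate the current time from one request.

    t0, t1: local clock readings when the request was sent and the reply arrived.
    server_time: the time reported by the server.
    interrupt_time: server-side handling time, if known.
    """
    one_way = (t1 - t0 - interrupt_time) / 2
    return server_time + one_way


def averaged_offset(samples, threshold):
    """Average clock offset over samples whose round trip is within threshold.

    samples: list of (t0, t1, server_time) tuples.
    Returns the amount to add to the local clock.
    """
    offsets = [
        cristian_estimate(t0, t1, server_time) - t1
        for t0, t1, server_time in samples
        if t1 - t0 <= threshold
    ]
    if not offsets:
        raise ValueError("no sample within the threshold")
    return sum(offsets) / len(offsets)
```

A negative offset should be applied by slowing the clock down gradually rather than by stepping it back, so that local time never runs backward.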
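One polling round of the time daemon can be sketched as a plain average. This is an illustration under assumptions: the function name is hypothetical, times are given in minutes, the daemon includes its own clock in the average, and network delay is ignored.

```python
def berkeley_round(daemon_time, client_times):
    """Return the agreed time and the adjustment for each machine.

    The first adjustment is for the daemon itself; the rest follow the
    order of client_times.
    """
    all_times = [daemon_time] + list(client_times)
    standard = sum(all_times) / len(all_times)
    adjustments = [standard - t for t in all_times]
    return standard, adjustments


# Daemon at 3:00, clients at 3:25 and 2:50, expressed in minutes.
standard, adjustments = berkeley_round(180, [205, 170])
```

Here the agreed time is 185 minutes (3:05), so the daemon advances by 5 minutes, the first client must lose 20 minutes, and the second client advances by 15.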
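Discarding extreme values before averaging protects each host from a single badly wrong clock. The sketch below drops the lowest and highest readings; the function name and the `discard` parameter are illustrative assumptions, and the propagation correction is omitted.

```python
def trimmed_average(times, discard=1):
    """Average the received times after dropping the `discard` lowest and highest."""
    if len(times) <= 2 * discard:
        raise ValueError("not enough readings to discard extremes")
    kept = sorted(times)[discard:len(times) - discard]
    return sum(kept) / len(kept)


# Five broadcast readings, two of them far off.
new_time = trimmed_average([100, 101, 99, 250, 2])
```

The two outliers (250 and 2) are ignored, so the host settles on 100.0 instead of the plain mean of 110.4.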
Process Coordination
Election Algorithm A leader is often needed in distributed systems –As controller or coordinator We need to elect a leader on startup and when the current leader fails. –E.g. take over the role of a failed process, pick a master in the Berkeley clock synchronization algorithm Assumption –Every process knows the ID of all the other processes Conditions –Operational process with the largest ID wins –All operational processes should be informed of the new leader –A recovering process can find the current leader Types of election algorithms: Bully and Ring algorithms
Bully Algorithm An election is initiated by any process (P) that notices that the coordinator is no longer responding –Concurrent multiple elections are possible Algorithm –P sends ELECTION messages to all processes with a higher ID –If no one responds, P wins and becomes coordinator –It sends out COORDINATOR messages to all other processes –If one of the higher-ups answers, it takes over. P is done. –3 message types: election, OK, I won –Several processes can initiate an election simultaneously –O(n^2) messages required with n processes
Bully Algorithm Example Process 4 holds an election. Processes 5 and 6 respond, telling 4 to stop. Now 5 and 6 each hold an election.
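The election in the example can be replayed with a small simulation that also counts messages. This is a sketch under assumptions: the function name and parameters are hypothetical, the initiator is assumed to be alive, messages are never lost, and COORDINATOR messages are counted for every other process, including crashed ones.

```python
def bully_election(initiator, all_ids, alive):
    """Run a bully election; return (coordinator, messages_sent).

    all_ids: every process id in the system.
    alive: set of ids that are currently operational.
    """
    messages = 0
    pending = [initiator]   # processes that still have to hold an election
    started = set()
    coordinator = None
    while pending:
        p = pending.pop()
        if p in started:
            continue
        started.add(p)
        higher = [q for q in all_ids if q > p]
        messages += len(higher)           # ELECTION to every higher id
        responders = [q for q in higher if q in alive]
        messages += len(responders)       # OK from each live higher-up
        if responders:
            pending.extend(responders)    # each higher-up takes over
        else:
            coordinator = p               # nobody answered: p wins
            messages += len(all_ids) - 1  # COORDINATOR to all others
    return coordinator, messages


# Eight processes 0..7; coordinator 7 has crashed and process 4 notices.
winner, count = bully_election(4, list(range(8)), set(range(7)))
```

In this run process 6 becomes coordinator after 16 messages: 4 is told to stop by 5 and 6, 5 is told to stop by 6, and 6 gets no answer from 7.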
References Tanenbaum and Steen, “Distributed Systems Principles and Paradigms”, P.245~P Silberschatz, Galvin, Gagne, “Applied Operating System Concepts”, First Edition, Wiley & Sons, 2000, Pg