CS 582 / CMPE 481 Distributed Systems Synchronization (cont.)
Class Overview
Synchronization in Distributed Systems: Physical and Logical Time; Global State; Distributed Synchronization; Election Algorithms
Global State
The global state is the current state of a distributed computation. Because there is no global clock, a meaningful global state has to be assembled from local states recorded at different local times. A distributed snapshot captures such a state: it consists of all local process states together with the messages in transit. A distributed snapshot should reflect a consistent state.
Global State (cont.)
A distributed system is defined as a collection P of N processes $p_i$, $i = 1, 2, \ldots, N$.
History of $p_i$: $h_i = \langle e_i^0, e_i^1, e_i^2, \ldots \rangle$
Finite prefix of a process history: $h_i^k = \langle e_i^0, e_i^1, \ldots, e_i^k \rangle$
State of $p_i$ immediately before the kth event occurs: $s_i^k$, where $s_i^0$ is the initial state
Global history of P: $H = h_1 \cup h_2 \cup \cdots \cup h_N$
Global state: $S = (s_1, s_2, \ldots, s_N)$
Global State (cont.)
A meaningful global state consists of process states that could have occurred at the same time; it corresponds to initial prefixes of the individual process histories.
A cut of the system's execution is a subset of its global history that is a union of prefixes of process histories: $C = h_1^{c_1} \cup h_2^{c_2} \cup \cdots \cup h_N^{c_N}$.
The state $s_i \in S$ corresponding to the cut C is the state of $p_i$ immediately after the last event processed by $p_i$ in the cut. The set of these last events, $\{e_i^{c_i} : i = 1, 2, \ldots, N\}$, is the frontier of the cut.
Global State (cont.)
Consistent cut: a cut C is consistent if, for each event it contains, it also contains all the events that happened before that event:
$\forall e \in C: \; f \rightarrow e \Rightarrow f \in C$
Consistent global state: a global state that corresponds to a consistent cut.
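The consistency condition can be checked mechanically. The sketch below is an illustration, not part of the original material: the names `Message`, `is_consistent` and `frontier` are hypothetical, and a cut is given as a list `cut` where `cut[i]` is the number of events of $p_i$ in the cut (k + 1 for the prefix $h_i^k$, with events numbered from 0). Since happened-before is generated by local order plus send/receive pairs, and a cut already contains whole prefixes, it is enough to check that every receive event in the cut has its matching send in the cut.

```python
from typing import NamedTuple


class Message(NamedTuple):
    """Hypothetical record of one message: where it was sent and where it was received."""
    sender: int      # index of the sending process
    send_event: int  # 0-based index of the send event in the sender's history
    receiver: int    # index of the receiving process
    recv_event: int  # 0-based index of the receive event in the receiver's history


def is_consistent(cut: list[int], messages: list[Message]) -> bool:
    """cut[i] is the number of events of process i included in the cut (a prefix length).

    The cut is consistent iff no message is received inside the cut but sent outside it.
    """
    for m in messages:
        received_in_cut = m.recv_event < cut[m.receiver]
        sent_in_cut = m.send_event < cut[m.sender]
        if received_in_cut and not sent_in_cut:
            return False
    return True


def frontier(cut: list[int]) -> list[tuple[int, int]]:
    """Last event (process, event index) of each non-empty prefix in the cut."""
    return [(i, c - 1) for i, c in enumerate(cut) if c > 0]


# p0 sends a message at its event 1; p1 receives it at its event 0.
msgs = [Message(sender=0, send_event=1, receiver=1, recv_event=0)]
print(is_consistent([2, 1], msgs))  # True: send and receive are both in the cut
print(is_consistent([2, 0], msgs))  # True: the message is simply in transit
print(is_consistent([1, 1], msgs))  # False: received but not yet sent
```

Note that a consistent cut may still contain messages that were sent but not received; those are exactly the messages in transit that a snapshot must record.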
Global State (cont.)
Execution of a distributed system: a series of transitions between global states of the system, $S_0 \rightarrow S_1 \rightarrow S_2 \rightarrow \cdots$; in each transition exactly one event occurs at a single process.
Run: a total ordering of all the events in a global history that is consistent with each local history's ordering.
Consistent run (linearization): an ordering of the events in a global history that is consistent with the happened-before relation on H. A linearization passes only through consistent global states.
Reachable states: S' is reachable from S if there is a linearization that passes through S and then S'.
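To make the difference between a run and a linearization concrete, here is a minimal sketch under stated assumptions: events are hypothetical `(process, index)` pairs numbered from 0, `lengths[i]` is the length of $p_i$'s history, and messages are hypothetical send/receive records. The helper names are illustrative only.

```python
from typing import NamedTuple

Event = tuple[int, int]  # (process index, 0-based event index in that process's history)


class Message(NamedTuple):
    sender: int
    send_event: int
    receiver: int
    recv_event: int


def is_run(order: list[Event], lengths: list[int]) -> bool:
    """True if `order` lists every event exactly once, respecting each local history."""
    next_index = [0] * len(lengths)
    for p, k in order:
        if k != next_index[p]:  # out of local order, duplicated, or skipped
            return False
        next_index[p] += 1
    return next_index == list(lengths)


def is_linearization(order: list[Event], lengths: list[int], messages: list[Message]) -> bool:
    """A run that also respects happened-before: each send precedes its receive."""
    if not is_run(order, lengths):
        return False
    position = {event: i for i, event in enumerate(order)}
    return all(position[(m.sender, m.send_event)] < position[(m.receiver, m.recv_event)]
               for m in messages)


def global_states(order: list[Event], n: int) -> list[tuple[int, ...]]:
    """Global states S0 -> S1 -> ... visited by a run, as per-process prefix lengths."""
    counts = [0] * n
    states = [tuple(counts)]
    for p, _ in order:
        counts[p] += 1
        states.append(tuple(counts))
    return states


# p0: send m (event 0), then a local event (event 1); p1: receive m (event 0).
msgs = [Message(0, 0, 1, 0)]
a = [(0, 0), (1, 0), (0, 1)]  # a linearization
b = [(1, 0), (0, 0), (0, 1)]  # a run, but the receive comes before the send
print(is_linearization(a, [2, 1], msgs), is_linearization(b, [2, 1], msgs))  # True False
print(global_states(a, 2))  # [(0, 0), (1, 0), (1, 1), (2, 1)]
```

Because `a` passes through (1, 0) and then (1, 1), the state (1, 1) is reachable from (1, 0) in the sense defined above.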
Distributed Snapshot Algorithm: Chandy & Lamport [1985]
The algorithm records a set of process and channel states for a set of processes p_i such that the recorded global state is consistent; each state is recorded locally at p_i.
Assumptions: communication is reliable (exactly-once semantics); processes and channels do not fail; channels are unidirectional with FIFO delivery; there is a path between any two processes (the process-channel graph is strongly connected); any process may initiate the snapshot algorithm at any time; processes may continue to execute, and to send and receive normal messages, while the snapshot algorithm runs.
Distributed Snapshot Algorithm (cont.)
Marker receiving rule for process p_i
  On p_i's receipt of a marker message over channel c:
    if (p_i has not yet recorded its state)
      it records its process state now;
      records the state of c as the empty set;
      turns on recording of messages arriving over other incoming channels;
    else
      p_i records the state of c as the set of messages it has received over c since it saved its state;
    end if
Marker sending rule for process p_i
  After p_i has recorded its state, for each outgoing channel c:
    p_i sends one marker message over c (before it sends any other message over c).
The initiating process behaves as if it had received a marker over a non-existent channel.
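The two rules can be exercised in a small deterministic simulation. This is a sketch under assumptions not in the slides: the application is a hypothetical money-transfer system (so a correct snapshot must conserve the total), every pair of processes is joined by a channel in each direction, and message delivery is scheduled by hand. `Process`, `System`, `transfer`, `initiate`, `deliver` and `snapshot_total` are illustrative names.

```python
from collections import deque

MARKER = "MARKER"  # control message; any other payload is an amount of money


class Process:
    def __init__(self, pid: int, balance: int, peers: list[int]):
        self.pid = pid
        self.balance = balance            # live process state
        self.peers = peers                # every other process (complete graph assumed)
        self.recorded_balance = None      # recorded process state (None = not yet recorded)
        self.channel_state = {}           # sender id -> messages recorded on that incoming channel
        self.recording = set()            # incoming channels currently being recorded


class System:
    def __init__(self, balances: list[int]):
        n = len(balances)
        self.procs = [Process(i, b, [j for j in range(n) if j != i]) for i, b in enumerate(balances)]
        # One FIFO queue per unidirectional channel (sender, receiver).
        self.channels = {(i, j): deque() for i in range(n) for j in range(n) if i != j}

    def transfer(self, src: int, dst: int, amount: int) -> None:
        """Application message: move `amount` from src to dst (delivered later)."""
        self.procs[src].balance -= amount
        self.channels[(src, dst)].append(amount)

    def _record(self, p: Process, skip=None) -> None:
        p.recorded_balance = p.balance
        p.channel_state = {q: [] for q in p.peers}   # every channel starts as the empty set
        p.recording = set(p.peers) - {skip}          # the marker's channel stays empty
        for q in p.peers:                            # marker sending rule
            self.channels[(p.pid, q)].append(MARKER)

    def initiate(self, pid: int) -> None:
        """The initiator acts as if it had received a marker over a non-existent channel."""
        self._record(self.procs[pid])

    def deliver(self, src: int, dst: int) -> None:
        """Deliver the oldest message waiting on channel src -> dst."""
        msg = self.channels[(src, dst)].popleft()
        p = self.procs[dst]
        if msg == MARKER:                            # marker receiving rule
            if p.recorded_balance is None:
                self._record(p, skip=src)
            else:
                p.recording.discard(src)             # state of src = messages recorded so far
        else:
            p.balance += msg
            if src in p.recording:
                p.channel_state[src].append(msg)

    def snapshot_total(self) -> int:
        """Money in recorded process states plus money recorded as in transit."""
        in_transit = sum(sum(m) for p in self.procs for m in p.channel_state.values())
        return sum(p.recorded_balance for p in self.procs) + in_transit


s = System([100, 200, 300])
s.transfer(0, 1, 10)
s.transfer(1, 2, 20)
s.initiate(0)            # p0 records 90
s.transfer(2, 0, 30)     # sent before p2 records, received after p0 records
for src, dst in [(0, 1), (0, 1), (2, 0), (1, 2), (1, 2), (0, 2), (1, 0), (2, 0), (2, 1)]:
    s.deliver(src, dst)
print([p.recorded_balance for p in s.procs])  # [90, 190, 290]
print(s.procs[0].channel_state[2])            # [30]
print(s.snapshot_total())                     # 600
```

The 30 units never appear in any recorded process state, yet the snapshot total is still 600 because the marker rules record them as in transit on channel p2 to p0.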
Distributed Synchronization
In a distributed system, resources are shared by multiple processes whose activities need to be synchronized; mutual exclusion is often required to prevent interference and ensure consistency.
Distributed mutual exclusion requirements:
ME1 (safety): at most one process may execute in the critical section (CS) at a time.
ME2 (liveness): a process requesting entry to the CS is eventually granted it.
ME3 (ordering): entry to the CS should be granted in happened-before order.
Approaches: centralized and decentralized.
Centralized Solution
A server process coordinates mutual exclusion.
Algorithm
Clients: before entering the CS, a process sends a request message to the server and waits for a reply from it; when leaving the CS, it sends a release message to the server.
Server: on receipt of a request, if no process is in the CS and the queue is empty, it sends a reply message; otherwise, it queues the request. On receipt of a release, if the queue is non-empty, it removes the oldest request from the queue and sends a reply to its sender; otherwise, it records that the CS is free.
The server is a single point of failure.
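The server's two handlers can be sketched as a small state machine. This is an illustrative sketch, not a reference implementation: `LockServer`, `on_request` and `on_release` are hypothetical names, message transport is left out, and each handler simply returns the ids of the processes that should receive a reply (grant).

```python
from collections import deque
from typing import Optional


class LockServer:
    """Hypothetical coordinator for the centralized mutual exclusion algorithm."""

    def __init__(self) -> None:
        self.holder: Optional[int] = None  # process currently in the CS
        self.queue = deque()               # waiting requests, oldest first

    def on_request(self, pid: int) -> list[int]:
        """Grant immediately if the CS is free and nobody is waiting; otherwise queue."""
        if self.holder is None and not self.queue:
            self.holder = pid
            return [pid]
        self.queue.append(pid)
        return []

    def on_release(self, pid: int) -> list[int]:
        """Pass the CS to the oldest waiting request, or mark the CS as free."""
        if pid != self.holder:
            raise ValueError(f"process {pid} does not hold the CS")
        if self.queue:
            self.holder = self.queue.popleft()
            return [self.holder]
        self.holder = None
        return []


server = LockServer()
print(server.on_request(1))  # [1]: granted at once
print(server.on_request(2))  # []: queued
print(server.on_release(1))  # [2]: p2 is granted next
```

Each entry costs three messages (request, reply, release), and every grant passes through the single server, which is why its failure stops the whole system.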
Ring-Based Distributed Algorithm
Processes form a logical ring, and a token message is circulated around it. Possession of the token confers the right to enter the CS; after leaving the CS (or immediately, if it does not need the CS), a process passes the token to its neighbour.
Analysis: 1 to (N - 1) messages are taken to get the token; the token is not necessarily obtained in happened-before order, so ME3 is not satisfied; if one process fails, the ring must be reconfigured; and a process wrongly assumed to have failed may inject an old token, so that two tokens circulate.
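A short simulation shows both the message cost and the ordering problem. In this sketch (hypothetical function `circulate`), processes are numbered 0 to n-1 around the ring, the token starts at `start`, and `wanting` is the set of processes that want the CS during one trip of the token. Hops are counted from the start of the trip, so a process that already holds the token enters after 0 hops.

```python
def circulate(n: int, start: int, wanting: set[int]) -> list[tuple[int, int]]:
    """One trip of the token around the ring 0 -> 1 -> ... -> n-1 -> 0, starting at `start`.

    Returns (pid, hops) for every process in `wanting`, in the order it enters the CS,
    where hops is the number of token messages sent since the trip started.
    """
    entries = []
    pid, hops = start, 0
    for _ in range(n):
        if pid in wanting:
            entries.append((pid, hops))  # enter and leave the CS, then pass the token on
        pid = (pid + 1) % n              # pass the token to the neighbour
        hops += 1
    return entries


print(circulate(5, 3, {1, 4}))  # [(4, 1), (1, 3)]
```

Even if p1's request happened-before p4's, p4 enters first simply because it lies earlier on the token's path, which is why ring-based mutual exclusion does not guarantee happened-before ordering.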
Distributed Algorithm: Ricart & Agrawala [1981]
Based on distributed agreement, using event ordering and timestamps.
Assumptions: processes p_1, …, p_N know one another's addresses; all messages sent are eventually delivered; each process p_i keeps a logical clock conforming to LC1 and LC2.
Each process records its state in a variable (no token is used), which takes one of three values: RELEASED, WANTED, HELD.
Distributed Algorithm (cont.)
On initialization
  state := RELEASED;
To enter the section
  state := WANTED;
  Multicast request to all processes;   (request processing is deferred here)
  T := request's timestamp;
  Wait until (number of replies received = (N - 1));
  state := HELD;
On receipt of a request <T_i, p_i> at p_j (i ≠ j)
  if (state = HELD or (state = WANTED and (T, p_j) < (T_i, p_i)))
  then
    queue request from p_i without replying;
  else
    reply immediately to p_i;
  end if
To exit the critical section
  state := RELEASED;
  reply to any queued requests;
Here (T, p_j) is p_j's own pending request. Pairs are compared by timestamp first and process identifier second, which totally orders all requests.
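The pseudocode translates almost line for line into a per-process state machine. The sketch below is illustrative only: `RAProcess` and `deliver` are hypothetical names, messages are delivered by direct method calls rather than a network, and for brevity the Lamport clock is advanced only for requests (replies carry no timestamp here).

```python
from dataclasses import dataclass, field
from typing import Optional

RELEASED, WANTED, HELD = "RELEASED", "WANTED", "HELD"


@dataclass
class RAProcess:
    pid: int
    n: int                                  # total number of processes N
    clock: int = 0                          # Lamport clock
    state: str = RELEASED
    T: Optional[int] = None                 # timestamp of this process's pending request
    replies: int = 0
    deferred: list = field(default_factory=list)

    def request(self) -> tuple[int, int]:
        """Enter protocol: returns the <T, pid> request to multicast to the other N - 1."""
        self.clock += 1                     # LC1: sending the request is an event
        self.state, self.T, self.replies = WANTED, self.clock, 0
        return (self.T, self.pid)

    def on_request(self, Ti: int, i: int) -> bool:
        """Returns True to reply immediately, False if the request is queued (deferred)."""
        self.clock = max(self.clock, Ti) + 1  # LC2
        if self.state == HELD or (self.state == WANTED and (self.T, self.pid) < (Ti, i)):
            self.deferred.append(i)
            return False
        return True

    def on_reply(self) -> None:
        self.replies += 1
        if self.state == WANTED and self.replies == self.n - 1:
            self.state = HELD

    def release(self) -> list[int]:
        """Exit protocol: returns the ids of the deferred requesters to reply to now."""
        self.state, self.T = RELEASED, None
        to_reply, self.deferred = self.deferred, []
        return to_reply


def deliver(procs: list[RAProcess], T: int, i: int) -> None:
    """Deliver request <T, i> to every other process; immediate replies go back to p_i."""
    for p in procs:
        if p.pid != i and p.on_request(T, i):
            procs[i].on_reply()


procs = [RAProcess(i, 3) for i in range(3)]
r0 = procs[0].request()   # (1, 0)
r1 = procs[1].request()   # (1, 1): concurrent, same timestamp; the lower id wins the tie
deliver(procs, *r0)
deliver(procs, *r1)
print(procs[0].state, procs[1].state)  # HELD WANTED
for j in procs[0].release():
    procs[j].on_reply()
print(procs[1].state)  # HELD
```

The run uses 2(N - 1) = 4 messages per entry: each requester multicasts N - 1 requests and collects N - 1 replies, one of which (p0's reply to p1) is deferred until p0 exits.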
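Distributed Algorithm (cont.)
Analysis: 2(N - 1) messages are required to enter the CS (N - 1 requests and N - 1 replies). This is more expensive than the centralized algorithm, and the failure of any process blocks progress, so every process becomes a potential bottleneck. There is also extra overhead: even if the requesting process was the last one to use the CS, it still has to go through the whole protocol above.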
Elections
Purpose: algorithms to choose a unique process from a group to play a particular role, for example to select a new master in the Berkeley clock synchronization algorithm, or to select the process that regenerates the token in ring-based distributed synchronization.
Algorithms: ring-based and bully.
Ring-based Election: Chang & Roberts [1979]
Goal: elect a single process, the coordinator, which is the process with the largest identifier.
Algorithm
Initially, every process is marked as a non-participant.
Any process can begin an election by marking itself as a participant and sending an election message containing its identifier to its neighbour.
When a process receives an election message, it compares the arrived identifier with its own:
  if the arrived identifier is greater, it marks itself as a participant and forwards the message;
  if its own identifier is greater and it is not yet a participant, it substitutes its own identifier, marks itself as a participant and forwards the message;
  if its own identifier is greater and it is already a participant, it does not forward the message;
  if the arrived identifier is its own, it has the largest identifier and becomes the coordinator: it marks itself as a non-participant again and sends an elected message carrying its identifier.
A process that receives an elected message marks itself as a non-participant, records the coordinator's identifier and forwards the message.
Analysis: with a single initiator, 3N - 1 messages in the worst case and 2N in the best case.
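The message counts can be checked with a small simulation. In this sketch (hypothetical function `chang_roberts`), `ids[k]` is the identifier of the process at ring position k, messages travel from position k to (k + 1) % n, and all in-flight messages share one FIFO queue, which also keeps each channel FIFO. It returns the coordinator recorded by each process and the total number of election and elected messages.

```python
from collections import deque


def chang_roberts(ids: list[int], initiators: list[int]) -> tuple[list, int]:
    """Simulate Chang & Roberts on a unidirectional ring (positions k -> (k + 1) % n)."""
    n = len(ids)
    participant = [False] * n
    elected = [None] * n             # coordinator id recorded by each process
    in_flight = deque()              # (destination position, kind, identifier), FIFO
    sent = 0

    def send(src: int, kind: str, ident: int) -> None:
        nonlocal sent
        sent += 1
        in_flight.append(((src + 1) % n, kind, ident))

    for k in initiators:
        participant[k] = True
        send(k, "election", ids[k])

    while in_flight:
        pos, kind, ident = in_flight.popleft()
        mine = ids[pos]
        if kind == "election":
            if ident > mine:                 # forward the larger identifier
                participant[pos] = True
                send(pos, "election", ident)
            elif ident < mine:
                if not participant[pos]:     # substitute own identifier
                    participant[pos] = True
                    send(pos, "election", mine)
                # already a participant: do not forward
            else:                            # own identifier came back: largest on the ring
                participant[pos] = False
                elected[pos] = mine
                send(pos, "elected", mine)
        elif ident != mine:                  # elected message: record it and pass it on
            participant[pos] = False
            elected[pos] = ident
            send(pos, "elected", ident)
        # an elected message back at the coordinator ends the election
    return elected, sent


ring = [5, 1, 8, 3]
print(chang_roberts(ring, [2]))  # ([8, 8, 8, 8], 8): best case, 2N
print(chang_roberts(ring, [3]))  # ([8, 8, 8, 8], 11): worst case, 3N - 1
```

The worst case arises when the initiator is the ring successor of the largest process: N - 1 messages reach the largest process, N more carry its identifier round the ring, and N elected messages follow.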
[Figure: A ring-based election in progress. The election was started by process 17; the highest process identifier encountered so far is 24. Participant processes are shown darkened.]
Bully Algorithm: Garcia-Molina [1982]
Assumptions: each process has a unique id; processes know the id and address of every other process; communication is reliable, but processes can fail, including during an election; an election begins when a process detects that the coordinator has failed.
Algorithm
To begin an election, a process sends an election message to all processes with higher ids and awaits an answer message.
If no answer message arrives, the process becomes the coordinator and sends a coordinator message to all processes with lower ids.
If the process receives an answer message, it waits for a coordinator message, and begins a new election if none arrives.
If a process receives an election message, it returns an answer message and starts an election of its own.
If a process receives a coordinator message, it treats the sender as the coordinator.
If a failed process with the highest id is restarted, it starts an election and overrides (bullies) the current coordinator.
Analysis: (N - 2) messages in the best case and O(N^2) messages in the worst case.
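A synchronous sketch of the bully algorithm can count messages by type. It rests on simplifying assumptions that are not in the slides: `bully` is a hypothetical function, the set `alive` of non-failed processes is fixed for the whole run (no failures during the election), a failed process simply never answers (standing in for a timeout), and election messages sent to failed processes are counted too. The slide's best case of N - 2 counts only the coordinator messages; this sketch additionally counts the one election message sent to the failed coordinator.

```python
from collections import deque


def bully(ids: list[int], alive: set[int], initiator: int) -> tuple[int, dict[str, int]]:
    """Return (coordinator, message counts) for an election started by `initiator`."""
    counts = {"election": 0, "answer": 0, "coordinator": 0}
    started = set()
    pending = deque([initiator])     # processes that have to start an election
    coordinator = None
    while pending:
        p = pending.popleft()
        if p in started:             # each process starts at most one election here
            continue
        started.add(p)
        higher = [q for q in ids if q > p]
        counts["election"] += len(higher)
        answered = False
        for q in higher:
            if q in alive:           # failed processes never answer
                counts["answer"] += 1
                answered = True
                pending.append(q)    # q starts an election of its own
        if not answered:             # no answer: p has the highest id still alive
            coordinator = p
            counts["coordinator"] += len([q for q in ids if q < p])
    return coordinator, counts


# p4 and p3 have failed; p1 detects the failure (compare the example that follows).
print(bully([1, 2, 3, 4], {1, 2}, 1))  # (2, {'election': 5, 'answer': 1, 'coordinator': 1})
```

Started by the lowest id with all but the old coordinator alive, every process elects in turn, so the election and answer messages grow quadratically, matching the O(N^2) worst case.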
The Bully Algorithm: Example
[Figure: The election of coordinator p2, after the failure of p4 and then p3.]