Idit Keidar, Principles of Reliable Distributed Systems, Technion EE, Spring Principles of Reliable Distributed Systems Lecture 3: Fault-Tolerant Broadcasts Spring 2008 Prof. Idit Keidar
Today’s Material
Distributed Systems, 2nd edition, Sape Mullender (Editor)
  – Causal order, Ch. 4
  – Fault-tolerant broadcasts, Ch. 5
  – Vector clocks, Ch. 4
  – State machine replication, from Ch. 7
Attiya-Welch
  – Vector clocks, Ch. 6

Broadcast
Sending a message to all the nodes in the system
  – E.g., our course mailing list
What’s it good for?
Allows for redundancy
  – In storage, processing
Broadcast Service: building block for replication

Motivation: Replication
2 Paradigms:
  – Primary-Backup: Passive
  – State Machine Replication (SMR): Active

Primary-Backup Replication
[Figure labels: Client, Primary, Backup(s), broadcast]

Primary-Backup (Passive) Replication
“Hot” standby
Client talks to primary server
Primary updates backup(s)
Client detects server failure using timeout
  – Performs “fail-over” to backup server
  – May need to repeat last operation(s)
Can be a problem with “false suspicions”
Works with benign servers only

Primary-Backup Variants
Backups can detect primary failure
Client can be oblivious to failure
  – Using dispatcher
  – Using IP takeover
[Figure: picture from IBM Web site, “Highly available Web server cluster on HTTP Server”, primary/backup with a network dispatcher model]

State Machine Replication (SMR): The Idea
[Figure: replicas and operations a, b, c]
Replicas are identical deterministic state machines
Process operations in the same order to remain consistent

SMR Architecture
[Figure labels: Client, broadcast]

SMR Architecture: Option II
[Figure labels: Client A, Client B, broadcast]

State Machine Replication
Send updates to all servers
All servers are identical deterministic state machines
  – Servers begin in the same initial state
  – Perform operations in the same order to remain consistent
May be slower than primary backup, but provides quicker, smoother fail-over
Can overcome false suspicions and tolerate malicious servers
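
To make the "same initial state, same order" requirement concrete, here is a minimal sketch. The `Counter` state machine and its `add` / `mul` operations are hypothetical, chosen only because they do not commute. Replicas that apply the same sequence agree, while a replica that swaps two operations diverges.

```python
class Counter:
    """A toy deterministic state machine whose state is one integer."""

    def __init__(self):
        self.value = 0  # every replica begins in the same initial state

    def apply(self, op):
        kind, arg = op
        if kind == "add":
            self.value += arg
        elif kind == "mul":
            self.value *= arg
        else:
            raise ValueError(f"unknown operation: {kind}")


def run_replica(ops):
    """Apply ops in the given order and return the final state."""
    sm = Counter()
    for op in ops:
        sm.apply(op)
    return sm.value


ops = [("add", 2), ("mul", 3), ("add", 1)]

# Three replicas that see the same order end in the same state.
same_order = [run_replica(ops) for _ in range(3)]

# A replica that sees the first two operations swapped ends elsewhere.
reordered = run_replica([ops[1], ops[0], ops[2]])
```

Here `same_order` is `[7, 7, 7]` while `reordered` is `3`, which is why the broadcast service must provide Order in addition to Agreement.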

Replica Coordination Requirements
Agreement: all correct replicas receive all client requests
Order: replicas process requests in the same order
We want a Broadcast Service satisfying these
We’ll start with Agreement

Before We Begin …
Define the model where we want to solve the problem
Define the service interface
Specify the service
  – Using properties

Model: Correct vs. Faulty Processes
Look at a complete run (execution)
  – External observer’s view
A process that does not fail in a run is correct in that run
Otherwise, the process is faulty in the run
  – A process that fails any time in the run is faulty throughout the entire run

Threshold Failure Model
t out of n processes may fail
t is usually given as a function of n, e.g.,
  – t < n
  – 2t < n
  – 3t < n
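
As a small worked example of these thresholds: the largest integer t with k·t < n is ⌊(n − 1)/k⌋. The helper name `max_faults` below is hypothetical and only illustrates the arithmetic.

```python
def max_faults(n, k):
    """Largest integer t with k * t < n (k = 1, 2, 3 for the bounds above)."""
    return (n - 1) // k


n = 7
# For n = 7: t < n allows 6 failures, 2t < n allows 3, 3t < n allows 2.
bounds = {k: max_faults(n, k) for k in (1, 2, 3)}
```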

Model: Synchronous vs. Asynchronous
Synchrony:
  – Bounded latency, clock drift, processing time
  – Process crash failures can be accurately detected
Asynchrony: a non-assumption (no timing bounds are assumed)
  – Process crash failures cannot be accurately detected
Failstop
  – Time-free, but crash failures are accurately detected
Unreliable failure detectors: later in the course

Service Interface: Separating Reception from Delivery
[Figure: layered stack. Application (e.g., state machine) on top, with deliver: update state. Delivery Layer in the middle: wait for messages that should be acted on first. Network at the bottom, with receive.]

Broadcast Service for Replication
Primitives: broadcast(m), deliver(m).
  – For simplicity, assume m is unique.
[Figure: at each process, the Application sits above the Broadcast Algorithm, which sits above the Network. The Application uses broadcast and deliver; the Broadcast Algorithm uses send and receive.]

Reliable Broadcast Specification
Validity: if a correct process broadcasts m then all correct processes eventually deliver m
Agreement: if a correct process delivers m then all correct processes eventually deliver m
  – Uniform agreement: if any process delivers m then all correct processes eventually deliver m
Integrity: m is delivered by a correct process at most once, and only if it was previously broadcast

Model for Implementation
Asynchronous
Process crash failures
  – Note: cannot be detected
Pair-wise point-to-point reliable links between correct processes
  – If a correct process p sends a message to a correct process q, then q receives the message
  – Safety or liveness?

Reliable Broadcast Implementation
Simple implementation …
Does it solve Uniform Reliable Broadcast?
What if there are some link failures?
  – Under what condition does the protocol still work?
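
The slide does not spell out the implementation. One common textbook approach, assumed here and not necessarily the one intended, is flooding: a process that sees m for the first time relays it to everyone else and then delivers it. The sketch below simulates this with a single in-memory queue; the `Process` and `Network` classes and the `crash_after` parameter are hypothetical test scaffolding, not part of any real library.

```python
from collections import deque


class Process:
    def __init__(self, pid, network):
        self.pid = pid
        self.network = network
        self.seen = set()      # messages already relayed
        self.delivered = []    # messages handed to the application
        self.crashed = False

    def broadcast(self, m, crash_after=None):
        # The sender treats its own message like a first reception.
        self.on_receive(m, crash_after)

    def on_receive(self, m, crash_after=None):
        if self.crashed or m in self.seen:
            return  # crashed processes do nothing; duplicates are ignored
        self.seen.add(m)
        for sent, dest in enumerate(self.network.pids_except(self.pid)):
            if crash_after is not None and sent == crash_after:
                self.crashed = True  # simulate a crash in the middle of relaying
                return
            self.network.send(dest, m)
        self.delivered.append(m)  # deliver only after relaying to all


class Network:
    """Reliable point-to-point links, modelled as one FIFO queue."""

    def __init__(self, n):
        self.queue = deque()
        self.procs = [Process(i, self) for i in range(n)]

    def pids_except(self, pid):
        return [p.pid for p in self.procs if p.pid != pid]

    def send(self, dest, m):
        self.queue.append((dest, m))

    def run(self):
        while self.queue:
            dest, m = self.queue.popleft()
            self.procs[dest].on_receive(m)


net = Network(4)
net.procs[0].broadcast("m", crash_after=1)  # p0 reaches only p1, then crashes
net.run()
delivered = {p.pid: p.delivered for p in net.procs}
```

In this run the faulty sender p0 delivers nothing, yet p1 relays m, so every correct process delivers it exactly once. The sketch does not settle the slide's question about uniform agreement, which depends on what the links guarantee for messages sent by a process that later crashes.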

Replica Coordination Requirements
Agreement: all correct replicas receive all client requests
Order: replicas process requests in the same order

Possible Reception Orders?
Process 1: send “1”; receive message; send “2”; receive message
Process 2: send “a”; send “b”; receive message
[Figure: space-time diagram of P1 and P2 with messages 1, a, 2, b]

FIFO Order
If some process broadcasts message m before message m’, then every correct process that delivers m’ delivers m beforehand.
Trivial to implement (How?)

Possible Delivery Orders?
Process 1: bcast “1”; dlvr msg (“a”); dlvr msg; bcast “2”; dlvr msg
Process 2: bcast “a”; dlvr msg (“2”)
What delivery orders make sense? (“a”)

Cause and Effect
When observing a distributed system, there is no common clock to order events of different processes
We instead use the (weaker) notion of cause and effect
If one event e caused another event f to happen, then e and f could never have happened simultaneously
  – e happened before f
When we do not know a given program’s logic but can only observe its communication,
  – We cannot tell whether one event causes another
  – We can tell whether it could have caused another

Happened Before or Causal Order [Lamport 78]
Event e happens before (causally precedes) event f, denoted e → f, if:
  – The same process executes e before f, or
  – e is send(m) and f is receive(m), or
  – There exists h such that e → h and h → f
We define concurrent, e || f, as: ¬(e → f ∨ f → e)
In a broadcast service, application-level causality is between broadcast and deliver
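
The three rules can be turned into a small computation. The sketch below is illustrative only: the event names and the functions `happened_before` and `concurrent` are hypothetical, and events are assumed to have unique names. It builds the relation from per-process event lists and send/receive pairs, then closes it under transitivity.

```python
from itertools import product


def happened_before(process_events, messages):
    """Return the set of pairs (e, f) such that e -> f."""
    hb = set()
    # Rule 1: events of the same process, in program order.
    for events in process_events.values():
        for i, e in enumerate(events):
            for f in events[i + 1:]:
                hb.add((e, f))
    # Rule 2: send(m) -> receive(m).
    hb |= set(messages)
    # Rule 3: transitive closure, repeated until nothing new is added.
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(hb), repeat=2):
            if b == c and (a, d) not in hb:
                hb.add((a, d))
                changed = True
    return hb


def concurrent(hb, e, f):
    """e || f holds when neither e -> f nor f -> e."""
    return (e, f) not in hb and (f, e) not in hb


process_events = {
    "p1": ["send_1", "recv_a"],
    "p2": ["send_a", "recv_1", "send_b"],
}
messages = [("send_1", "recv_1"), ("send_a", "recv_a")]
hb = happened_before(process_events, messages)
```

In this example `send_1 → send_b` holds by transitivity through `recv_1`, while `send_1` and `send_a` are concurrent.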

Causal Order
For two messages, m and m’, m → m’ if send(m) causally precedes send(m’)
  – Causal order: transitive closure of FIFO + some process delivers m before broadcasting m’
Causal Delivery: if m → m’ then every correct process that delivers m’ delivers m beforehand

Does FIFO Between all Processes Guarantee Causal Order?

Total Order
If two correct processes deliver both m and m’, they deliver them in the same order
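
One way to read this property is as a check on two delivery logs: restricted to the messages both processes delivered, the two sequences must be identical. The function name `same_relative_order` is hypothetical, and the sketch assumes messages are unique, as the slides do.

```python
def same_relative_order(seq_a, seq_b):
    """True if messages delivered by both processes appear in the same order."""
    common = set(seq_a) & set(seq_b)
    return ([m for m in seq_a if m in common] ==
            [m for m in seq_b if m in common])


# The two logs disagree on the order of "a" and "1": total order is violated.
violates = same_relative_order(["a", "1", "2"], ["1", "a", "2"])

# A process may deliver extra messages; only common ones are compared.
satisfies = same_relative_order(["a", "1"], ["a", "2", "1"])
```

Note that the property says nothing about messages that only one of the two processes delivers.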

FIFO Broadcast
Reliable Broadcast + FIFO
Simple implementation on top of reliable broadcast (using sequence numbers)
[Figure: at each process, the Application sits above FIFO Broadcast, which sits above Reliable Broadcast, which sits above the Network. Each layer offers broadcast and deliver to the layer above; Reliable Broadcast uses send and receive on the Network.]
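
A minimal sketch of the sequence-number idea follows, assuming each sender numbers its broadcasts 0, 1, 2, … and that the underlying reliable broadcast delivers each message at most once. The class `FifoLayer` and its method `on_rb_deliver` are hypothetical names for the layer that sits between Reliable Broadcast and the application.

```python
class FifoLayer:
    def __init__(self):
        self.next_seq = {}   # sender -> next sequence number expected
        self.held = {}       # (sender, seq) -> message waiting for predecessors
        self.delivered = []  # messages handed to the application, in order

    def on_rb_deliver(self, sender, seq, m):
        """Called when Reliable Broadcast delivers (m, seq) from sender."""
        self.held[(sender, seq)] = m
        expected = self.next_seq.get(sender, 0)
        # Release the longest run of consecutive messages from this sender.
        while (sender, expected) in self.held:
            self.delivered.append(self.held.pop((sender, expected)))
            expected += 1
        self.next_seq[sender] = expected


layer = FifoLayer()
layer.on_rb_deliver("p1", 1, "2")  # arrives early, held back
layer.on_rb_deliver("p2", 0, "a")  # first from p2, delivered at once
layer.on_rb_deliver("p1", 0, "1")  # unblocks "1" and then "2"
```

After these three calls `layer.delivered` is `["a", "1", "2"]`: p1's messages come out in sending order even though the second one arrived first.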

Causal Broadcast
Reliable Broadcast + Causal
Implementation on top of Reliable Broadcast

If We Could Use Clocks …
Assume: processes have access to a global real-time clock RC, message delays are bounded by D
  – Every message m contains a timestamp TS(m) = RC
  – DR1: at time t, deliver all received messages with timestamps up to t − D in increasing timestamp order
  – If two messages have the same timestamp, break ties by process id (deliver the one with the lower id first)
Clock Condition: if e → f then RC(e) < RC(f)
Hence, if m → m’ then TS(m) < TS(m’)
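
Rule DR1 can be sketched as a filter followed by a sort. The function name `dr1_deliverable` and the tuple layout `(timestamp, pid, payload)` are assumptions made for illustration.

```python
def dr1_deliverable(received, now, delay_bound):
    """Messages deliverable at time `now` under DR1, in delivery order.

    Each message is a (timestamp, pid, payload) tuple. Only messages with
    timestamp <= now - delay_bound are returned: any message with a smaller
    or equal timestamp must already have arrived, since delays are bounded.
    """
    ready = [msg for msg in received if msg[0] <= now - delay_bound]
    # Increasing timestamp order; ties broken by the lower process id.
    return sorted(ready, key=lambda msg: (msg[0], msg[1]))


received = [(5, 2, "b"), (3, 1, "a"), (5, 1, "c"), (9, 1, "d")]
order = [payload for _, _, payload in
         dr1_deliverable(received, now=10, delay_bound=4)]
```

With `now=10` and `delay_bound=4`, the message stamped 9 must wait, and the two messages stamped 5 are ordered by process id, giving `["a", "c", "b"]`.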

Lamport’s Timestamps (LTS): Logical Clocks
Invent a clock that satisfies the clock condition
Each process maintains a logical clock
  – Local positive-integer variable
Each message is tagged with the source’s logical clock at the time the message is sent

LTS Broadcast Algorithm Part I: Logical Clock Assignment
code for process p_i

TS[j] ← 0, j = 0,…,n
pending ← empty

broadcast (m)
    TS[i] ← TS[i] + 1              /* p_i’s logical clock respects FIFO */
    send (m, TS[i], i) to all

upon receive (m, t, j)
    TS[j] ← t
    add (m, t, j) to pending
    TS[i] ← max (TS[i], t) + 1     /* p_i’s logical clock respects causality */
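
The clock updates alone can be exercised with a very small sketch. The class `LamportClock` and its method names are hypothetical; it keeps only a process's own clock (TS[i] above) and leaves out `pending` and the per-sender entries.

```python
class LamportClock:
    """The local logical clock of one process."""

    def __init__(self):
        self.time = 0

    def tick_send(self):
        # Increment before sending, so successive sends get growing stamps.
        self.time += 1
        return self.time

    def on_receive(self, t):
        # Jump past the sender's stamp, so later events are stamped higher.
        self.time = max(self.time, t) + 1
        return self.time


p1, p2, p3 = LamportClock(), LamportClock(), LamportClock()
t1 = p1.tick_send()   # p1 sends with timestamp 1
t2 = p2.tick_send()   # p2 sends concurrently, also with timestamp 1
p3.on_receive(t1)     # p3's clock becomes max(0, 1) + 1 = 2
t3 = p3.tick_send()   # p3 then sends with timestamp 3
```

Since p1's send causally precedes p3's send, the clock condition requires `t1 < t3`, which holds here. The concurrent sends of p1 and p2 share the value 1, which is why ties are broken by process id.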

Example: Logical Clocks
[Figure: messages labelled LTS = ⟨1, p1⟩, LTS = ⟨1, p2⟩, LTS = ⟨3, p3⟩]

Does This Solve The Problem?
When do we deliver a message?
Deliver a message with TS = t when no message with TS < t can be received
A message m received by p_i is stable at p_i if no future messages with timestamps smaller than TS(m) can be received by p_i
DR2: Deliver all received messages that are stable at p_i in increasing timestamp order

Detecting Stability
Assume FIFO communication
If p_i receives m from p_j with TS(m), p_i cannot later receive a message m’ from p_j with TS(m’) < TS(m)
Stability of m at p_i is guaranteed when
  – p_i has received a message with timestamp greater than TS(m) from all processes
Note: the timestamp is a pair ⟨LC, pid⟩

LTS Broadcast Algorithm Part II: Delivery Rule
code for process p_i

let (m, t, j) be the entry in pending with the smallest ⟨t, j⟩
if ⟨t, j⟩ ≤ ⟨TS[k], k⟩ ∀ k = 0,…,n then     /* DR2 */
    deliver (m)
    remove (m, t, j) from pending
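
Parts I and II can be combined into a runnable sketch. It makes several assumptions that the slides do not state: processes are numbered 0 to n − 1, links are FIFO, a sender places its own message in `pending` directly instead of receiving it over the network, and pairs ⟨t, j⟩ are compared lexicographically. The class `LtsProcess` and the hand-driven message passing are hypothetical scaffolding.

```python
class LtsProcess:
    def __init__(self, i, n):
        self.i, self.n = i, n
        self.ts = [0] * n     # ts[k]: last stamp heard from p_k; ts[i]: own clock
        self.pending = []     # entries (t, j, m), not yet stable
        self.delivered = []

    def broadcast(self, m):
        self.ts[self.i] += 1
        t = self.ts[self.i]
        self.pending.append((t, self.i, m))  # own message also awaits stability
        return (m, t, self.i)                # to be sent to all other processes

    def receive(self, msg):
        m, t, j = msg
        self.ts[j] = t
        self.pending.append((t, j, m))
        self.ts[self.i] = max(self.ts[self.i], t) + 1
        self.try_deliver()

    def try_deliver(self):
        # DR2: deliver the smallest pending entry while it is stable.
        while self.pending:
            t, j, m = min(self.pending)
            if all((t, j) <= (self.ts[k], k) for k in range(self.n)):
                self.pending.remove((t, j, m))
                self.delivered.append(m)
            else:
                break


p0, p1, p2 = (LtsProcess(i, 3) for i in range(3))
x = p0.broadcast("x")
y = p1.broadcast("y")
p2.receive(y); p2.receive(x)   # p2 sees y before x
p0.receive(y)
p1.receive(x)
z = p2.broadcast("z")
p0.receive(z); p1.receive(z)
w = p0.broadcast("w")
p1.receive(w); p2.receive(w)
```

In this run p0 and p2 have delivered `["x", "y"]` and p1 has delivered `["x", "y", "z"]`: all three agree on a common order, but a message is released only after the process has heard a sufficiently large timestamp from every other process, so progress depends on everyone continuing to send.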

The LTS Algorithm Implements Causal Broadcast
Causal Delivery: if m → m’ then every correct process that delivers m’ delivers m beforehand
Timestamps respect the Clock Condition:
  – If m → m’ then TS(m) < TS(m’)
DR2 + FIFO links ensure that if m is in pending, all messages with lower TS were received
  – Were either delivered or are pending
Delivery from pending is by TS order

Fault Tolerance?

Vector Clocks
Process p_i has a vector clock VC[1…n]
  – VC[i] is the local message sequence number of the last message broadcast by p_i
  – For j ≠ i, VC[j] is the latest message p_i delivered from p_j
VC is sent with each message m
  – For j ≠ i, m.VC[j] is p_j’s latest message that causally precedes m

Vector Clocks: Sending
At process p_i, on broadcast(m)
  – VC[i] ← VC[i] + 1
  – Use Reliable Broadcast to send (m, VC) to all (no need to send to myself)
  – Deliver m locally

Vector Clocks: Delivery Rule
Upon receive m
  – Place in message buffer
Deliver m from p_j from buffer if
  – VC[j] = m.VC[j] − 1 (FIFO), and
  – For all k ≠ j: VC[k] ≥ m.VC[k] (messages causally preceding m were delivered)
Upon deliver
  – VC[j] := VC[j] + 1
  – VC[j] is the number of messages of p_j that causally precede p_i’s subsequent messages
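
The sending and delivery rules can be sketched as follows. The class `VcProcess` is hypothetical, processes are numbered from 0, and messages are passed by hand rather than through a real reliable broadcast layer.

```python
class VcProcess:
    def __init__(self, i, n):
        self.i = i
        self.vc = [0] * n
        self.buffer = []      # received but not yet deliverable
        self.delivered = []

    def broadcast(self, m):
        self.vc[self.i] += 1
        self.delivered.append(m)            # deliver m locally
        return (m, list(self.vc), self.i)   # send a copy of the vector

    def receive(self, msg):
        self.buffer.append(msg)
        progress = True
        while progress:                     # one delivery may unblock others
            progress = False
            for entry in list(self.buffer):
                m, mvc, j = entry
                if self._deliverable(mvc, j):
                    self.buffer.remove(entry)
                    self.delivered.append(m)
                    self.vc[j] += 1
                    progress = True

    def _deliverable(self, mvc, j):
        fifo_ok = self.vc[j] == mvc[j] - 1
        causal_ok = all(self.vc[k] >= mvc[k]
                        for k in range(len(mvc)) if k != j)
        return fifo_ok and causal_ok


p0, p1, p2 = (VcProcess(i, 3) for i in range(3))
q = p0.broadcast("q")
p1.receive(q)             # p1 delivers q ...
r = p1.broadcast("r")     # ... and then broadcasts r, so q -> r
p2.receive(r)             # r arrives first at p2 and is buffered
held_back = list(p2.delivered)
p2.receive(q)             # q is delivered, which then releases r
```

At p2, r arrives before q. This does not violate FIFO, since the two messages come from different senders, but delivering r first would violate causal order. The vector check holds r back until q has been delivered, so `p2.delivered` ends as `["q", "r"]`.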

Vector Clocks Example

Atomic Broadcast Services
Atomic Broadcast:
  – Reliable Broadcast + Total Order
FIFO Atomic Broadcast
  – FIFO + Reliable Broadcast + Total Order
Causal Atomic Broadcast
  – Causal + Reliable Broadcast + Total Order
HW question: are FIFO Atomic Broadcast and Causal Atomic Broadcast equivalent?

Causal Atomic Broadcast in Failstop Model
Order messages by logical timestamp (LTS)
  – Break ties by process id
Use FIFO Broadcast
When is a message with LTS t delivered?
  – Reminder: failstop = failures are accurately detected
  – Assume further that no message from a faulty process arrives after its failure is detected

Atomic Broadcast in Asynchronous Systems???
Alas, impossible if even one process can crash

Now, Back to State Machines
We can build state machines using Atomic Broadcast
A client can
  – Broadcast to all servers; or
  – Forward its request to one of the servers to broadcast to the others
Resend on timeout if the server fails
What about client failures?

Multicast Problems
Processes organized in groups
  – Groups have names
  – Messages sent to groups
  – Like broadcast, but for group members
  – Processes can join and leave groups
Processes may care about who else is a member of the group (group membership)