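Revisiting failure detectors
Some of you asked how implementing consensus with the strong failure detector S differs from reaching consensus with the perfect failure detector P. Here it is.
Recall the definition of the strong failure detector S: strong completeness + weak accuracy.
- Strong completeness: every crashed process is eventually suspected by every correct process.
- Weak accuracy: at least one correct process is never suspected.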
Consensus using S
{Program for process p} (⊥ denotes an undefined entry)
V_p := (⊥, ⊥, ..., ⊥); V_p[p] := input of p; Δ_p := V_p
(Phase 1) Same as phase 1 of consensus with P
(Phase 2) send (V_p, p) to all;
          receive (V_q, q) from all q, or q is a suspect;
          k := 1;
          do k ≤ n →
             if ∃ V_q[k]: V_p[k] ≠ ⊥ ∧ V_q[k] = ⊥ → V_p[k] := ⊥; Δ_p[k] := ⊥
             fi;
             k := k + 1
          od
(Phase 3) Decide on the first element V_p[j]: V_p[j] ≠ ⊥
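Phases 2 and 3 above can be illustrated with a small local sketch. It assumes 0-based indices, uses Python's None to stand for an undefined entry, and takes the vectors received from non-suspected processes as a plain list; the function names phase2 and decide are hypothetical and chosen only for this illustration.

```python
from typing import List, Optional

Vector = List[Optional[str]]  # None plays the role of an undefined entry


def phase2(v_p: Vector, received: List[Vector]) -> Vector:
    """Clear entry k of V_p if some received vector has entry k undefined."""
    result = list(v_p)
    for k in range(len(result)):
        if result[k] is not None and any(v_q[k] is None for v_q in received):
            result[k] = None
    return result


def decide(v_p: Vector) -> Optional[str]:
    """Phase 3: decide on the first defined entry (None if there is none)."""
    for value in v_p:
        if value is not None:
            return value
    return None


# Hypothetical vectors for a 4-process run.
v_p = ["a", "b", None, "d"]
received = [["a", None, None, "d"], ["a", "b", None, "d"]]
after = phase2(v_p, received)
decision = decide(after)
```

Because every process drops any entry that some other process is missing, the surviving entries are the ones all of them share, which is why taking the first defined entry can lead to the same decision everywhere.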
Example
[Figure: example run of consensus using S. One process has crashed and one process is never suspected. Lists of suspects: {1, 4}, {2, 4}, {4}, {2, 4}. The figure also shows the vectors V after Phase 1 and after Phase 2.]
Atomic Commit Protocols
Network of servers (S1, S2, S3). Servers may crash.
The initiator of a transaction is called the coordinator, and the remaining servers are participants.
Requirements of Atomic Commit Protocols
- Termination. All non-faulty servers must eventually reach an irrevocable decision.
- Agreement. If any server decides to commit, then every server must have voted to commit.
- Validity. If all servers vote commit and there is no failure, then all servers must commit.
One-phase Commit
[Figure: a client contacts the coordinator server, which sends commit / abort to the participant servers.]
If a participant deadlocks or runs into a problem, the coordinator may never find out. Too simplistic.
Two-phase commit (2PC)
Phase 1: The coordinator sends VOTE to the participants and receives yes / no from them.
Phase 2:
  if ∀ servers j: vote(j) = yes → multicast COMMIT to all servers
  □ ∃ server j: vote(j) = no → multicast ABORT to all servers
  fi
What if failures occur?
Failure scenarios in 2PC (Phase 1)
Fault: The coordinator did not receive YES / NO, or a participant did not receive VOTE.
Solution: Broadcast ABORT; abort local transactions.
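The coordinator's decision rule, including the Phase 1 failure case, can be sketched as follows. This is only an illustration under stated assumptions: votes are collected into a dictionary, a missing reply is represented by None (standing for a timeout), and the name coordinator_decision is hypothetical.

```python
from typing import Dict, Optional


def coordinator_decision(votes: Dict[str, Optional[str]]) -> str:
    """Return "COMMIT" only if every participant voted "yes".

    A "no" vote, or a missing vote (None, assumed to mean the reply
    did not arrive in time), leads to "ABORT".
    """
    if all(vote == "yes" for vote in votes.values()):
        return "COMMIT"
    return "ABORT"


all_yes = coordinator_decision({"S1": "yes", "S2": "yes", "S3": "yes"})
one_no = coordinator_decision({"S1": "yes", "S2": "no", "S3": "yes"})
one_missing = coordinator_decision({"S1": "yes", "S2": None, "S3": "yes"})
```

Treating a missing vote like a "no" vote is safe in Phase 1 because no server can have committed yet.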
Failure scenarios in 2PC (Phase 2)
Fault: A participant does not receive a COMMIT or ABORT message from the coordinator (the coordinator may have crashed after sending ABORT or COMMIT to only a fraction of the servers). The participant then remains undecided until the coordinator is repaired and reinstalled into the system.
This blocking is a known weakness of 2PC.
Coping with blocking in 2PC
A non-faulty participant can ask the other participants which message (COMMIT or ABORT) they received from the coordinator, and act accordingly.
But what if no non-faulty participant received anything? Who knows whether the coordinator committed or aborted its local transaction before crashing?
Continue to wait …
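The "ask the other participants" step can be sketched as a small function. It assumes the replies from peers are already gathered in a list, with None for a peer that received nothing or did not answer; the name resolve_by_asking_peers is hypothetical.

```python
from typing import List, Optional


def resolve_by_asking_peers(peer_answers: List[Optional[str]]) -> Optional[str]:
    """Adopt the first COMMIT or ABORT that any peer reports.

    Returns None when no peer knows the outcome, in which case the
    participant stays undecided and has to keep waiting.
    """
    for answer in peer_answers:
        if answer in ("COMMIT", "ABORT"):
            return answer
    return None


known = resolve_by_asking_peers([None, "COMMIT", None])
unknown = resolve_by_asking_peers([None, None])
```

The second call shows the case the slide warns about: when nobody received anything, asking around does not remove the blocking.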
Non-blocking Atomic Commit
A blocking protocol has the potential to prevent non-faulty participants from reaching a final decision.
A solution to the atomic commitment problem is called non-blocking if, in spite of server crashes, every non-faulty participant eventually decides.
One solution is to impose the requirement of uniform agreement.
Uniform agreement
If any participant (faulty or not) delivers a message m (commit or abort), then all correct processes eventually deliver m.
To implement uniform agreement, no server should deliver a COMMIT or ABORT message until it has relayed it to all other servers.
If a process times out in phase 2, then it decides abort.
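The relay-before-deliver rule can be illustrated with an in-memory sketch. It is a simplification under stated assumptions: message passing is modelled as direct method calls, no server crashes, and the phase 2 timeout is not modelled. The Server class, make_group, and the log format are hypothetical names introduced for this example.

```python
from typing import List, Optional, Tuple


class Server:
    def __init__(self, name: str) -> None:
        self.name = name
        self.peers: List["Server"] = []
        self.seen = False                      # has the decision arrived before?
        self.delivered: Optional[str] = None   # decision delivered locally

    def receive(self, decision: str, log: List[Tuple[str, ...]]) -> None:
        if self.seen:
            return                             # ignore repeated copies
        self.seen = True
        for peer in self.peers:                # first relay to all other servers
            log.append(("relay", self.name, peer.name))
            peer.receive(decision, log)
        self.delivered = decision              # only then deliver locally
        log.append(("deliver", self.name))


def make_group(names: List[str]) -> List[Server]:
    servers = [Server(n) for n in names]
    for s in servers:
        s.peers = [p for p in servers if p is not s]
    return servers


servers = make_group(["S1", "S2", "S3"])
log: List[Tuple[str, ...]] = []
servers[0].receive("COMMIT", log)
```

In the resulting log, every "deliver" entry of a server appears after all of that server's "relay" entries, which is the ordering the rule asks for.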
Recovery: Stable storage
[Figure: a writer updates, and a reader inspects, two disks A0 and A1.]
Creates the illusion of an incorruptible storage, even if a writer or a disk crashes at any time.
The implementation uses at least two independent disks.
Stable storage
To write, do the following:
  copy on disk A0; record timestamp T0; compute checksum S0;
  copy on disk A1; record timestamp T1; compute checksum S1
Readers check four cases:
- Both checksums OK and T1 > T0
- Both checksums OK and T1 < T0
- Checksum on A1 wrong
- Checksum on A0 wrong
(Which copy to accept in each case?)
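One plausible reading policy for the four cases is "among the copies whose checksum verifies, take the one with the later timestamp". The sketch below illustrates that policy only; it is not presented as the intended answer. It assumes an in-memory stand-in for the two disks, a logical counter as timestamp, and CRC32 (zlib.crc32) as checksum. The names Copy and StableStorage are hypothetical.

```python
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Copy:
    data: bytes = b""
    timestamp: int = 0
    checksum: int = zlib.crc32(b"")

    def valid(self) -> bool:
        return zlib.crc32(self.data) == self.checksum


class StableStorage:
    def __init__(self) -> None:
        self.a0 = Copy()
        self.a1 = Copy()
        self.clock = 0                       # logical timestamp source

    def write_copy(self, copy: Copy, data: bytes) -> None:
        self.clock += 1
        copy.data = data
        copy.timestamp = self.clock
        copy.checksum = zlib.crc32(data)

    def write(self, data: bytes) -> None:
        self.write_copy(self.a0, data)       # A0 first
        self.write_copy(self.a1, data)       # then A1

    def read(self) -> Optional[bytes]:
        valid = [c for c in (self.a0, self.a1) if c.valid()]
        if not valid:
            return None                      # both copies damaged
        return max(valid, key=lambda c: c.timestamp).data


store = StableStorage()
store.write(b"v1")
store.write_copy(store.a0, b"v2")            # writer crashes before updating A1
after_crash = store.read()                   # A0 is newer, so T1 < T0
store.a0.data = b"garbage"                   # A0 later gets corrupted
after_corruption = store.read()              # only A1 verifies
```

Under this policy a write that reached A0 but not A1 is still visible, and a single damaged disk is masked by the other one.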
Checkpointing
A mechanism for (backward) error recovery.
Transaction states are periodically stored on stable storage. Following a failure, the transaction rolls back to the nearest checkpoint.
Checkpointing may be independent (unsynchronized) or coordinated (synchronized).
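Classification of checkpointing
Coordinated checkpointing takes a consistent snapshot. It has some overhead.
Uncoordinated checkpointing apparently has no overhead, but it may have some efficiency problems: independent checkpoints need not form a consistent global state, so a rollback may cascade across processes (the domino effect).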
Checkpointing (continued)
Some actions can be reversed, but some cannot (like dispensing cash from an ATM, printing a document, etc.).
Such actions are logged, and during replay, the logs substitute for the real actions.
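The idea that logs substitute for real actions during replay can be sketched as follows. The sketch assumes each irreversible step has a unique identifier and that the log survives the failure (for example, on stable storage); the names run, dispense, and the step identifiers are hypothetical.

```python
from typing import Callable, Dict, List


def run(steps: List[str], log: Dict[str, str],
        perform: Callable[[str], str]) -> List[str]:
    """Execute steps, performing each irreversible action at most once.

    If a step is already in the log, its logged result is reused
    instead of performing the real action again.
    """
    results = []
    for step in steps:
        if step not in log:
            log[step] = perform(step)    # real, irreversible action
        results.append(log[step])        # logged result stands in on replay
    return results


dispensed: List[str] = []


def dispense(step: str) -> str:
    dispensed.append(step)               # pretend cash leaves the machine
    return "done:" + step


log: Dict[str, str] = {}
run(["cash-1"], log, dispense)                       # first run, then a failure
replayed = run(["cash-1", "cash-2"], log, dispense)  # replay after rollback
```

After the replay, "cash-1" has been dispensed only once even though the step was executed twice.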
Group Communication
Group-oriented activities are steadily increasing.
There are many types of groups:
- Open and closed groups
- Peer-to-peer and hierarchical groups
Major issues
- Atomic multicast
- Ordered multicast
- Dynamic groups
- Failure handling
Atomic multicast
A multicast is called atomic when the message is delivered to every correct (i.e. functioning) member, or to no member at all.
Sometimes, certain features available in the infrastructure of a distributed system simplify the implementation of multicast. Examples are (1) multicast on an Ethernet LAN and (2) IP multicast.
Basic vs. reliable multicast
Basic multicast does not consider crash failures. Reliable multicast does.
Three criteria for basic multicast:
- Liveness. Each process must receive every message.
- Integrity. No spurious message is received.
- No duplicate. Each process accepts exactly one copy of a message.
Reliable atomic multicast
Sender's program:
  i := 0;
  do i ≠ n →
     send message to i;
     i := i + 1
  od
Receiver's program:
  if m is new → accept it; multicast m
  □ m is duplicate → discard m
  fi
Tolerates process crashes.
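The sender and receiver programs above can be exercised with a small simulation. This is a sketch under stated assumptions: a single message, reliable channels modelled as a FIFO queue of destination ids, and a sender that may crash after reaching only some destinations. The function name reliable_multicast and the parameter sent_before_crash are hypothetical.

```python
from collections import deque
from typing import Optional, Set


def reliable_multicast(n: int, sender: int,
                       sent_before_crash: Optional[int] = None) -> Set[int]:
    """Simulate one multicast among processes 0..n-1.

    Returns the ids of the processes that accepted the message. If
    sent_before_crash is given, the sender crashes after sending to
    that many destinations and does not accept the message itself.
    """
    targets = [i for i in range(n) if i != sender]
    crashed: Set[int] = set()
    accepted: Set[int] = {sender}
    if sent_before_crash is not None:
        targets = targets[:sent_before_crash]
        crashed.add(sender)
        accepted = set()
    pending = deque(targets)                 # destinations with a copy in transit
    while pending:
        dest = pending.popleft()
        if dest in crashed or dest in accepted:
            continue                         # crashed process, or duplicate: discard
        accepted.add(dest)                   # m is new: accept it
        pending.extend(i for i in range(n) if i != dest)  # multicast m
    return accepted


no_crash = reliable_multicast(4, sender=0)
partial = reliable_multicast(4, sender=0, sent_before_crash=1)
nothing_sent = reliable_multicast(4, sender=0, sent_before_crash=0)
```

With a crash after one send, the single recipient's re-multicast still reaches every other correct process; with a crash before any send, no process accepts the message. Both outcomes match the "every correct member, or no member at all" requirement.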