Reliable Communication in the Presence of Failures
Kenneth Birman, Thomas Joseph
Cornell University, 1987
Julia Campbell, 19 November 2003
Agenda
- Overview
- Assumptions and Definition of Terms
- System Components
- Communication Primitives
- Fault Tolerance Example
- Summary
Overview
- Communication facility for distributed systems
- Failures can occur
- Applicable to local and wide-area networks
- Fault-tolerant process groups
- Consistent orderings of events
- Events delivered despite failures
Assumptions and Definition of Terms
- Failure: a process stops without taking incorrect actions
- Event orderings are controlled by the communication layer
- Fault tolerance: continued operation
- Failures are detected and other processes are notified
- Logical approach, rather than physical
- Pretend events took place either before or after
- No communication among inconsistent processes
System Components: Fault-Tolerant Process Groups
- Processes cooperating to perform a distributed transaction
- No shared memory or synchronized clocks
- Changes in membership are ordered with respect to events
- Members monitor each other
System Components: Managing Group Membership
- View Manager
  - Oldest site
  - Calculates “view extensions”
- View Extension
  - Current view + 1 extension
  - Other changes can get added on
- Site Manager
[Diagram: labels S1, S2, S3, S4 and S2-1, S2-2, S2-3]
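A minimal sketch of how a view manager might accumulate changes into a view extension. The class name `ViewManager`, its methods, and the `"down"`/`"join"` change kinds are hypothetical names chosen for illustration, not taken from the paper; the sketch only assumes that a view is an ordered member list and that an extension is the current view with pending changes applied in order.

```python
from dataclasses import dataclass, field


@dataclass
class ViewManager:
    """Hypothetical view manager (illustrative names, not from the paper)."""
    view: tuple                                   # current membership, oldest first
    pending: list = field(default_factory=list)   # changes not yet committed

    def note_change(self, kind, member):
        # Further changes can be added on before the extension is committed.
        self.pending.append((kind, member))

    def extension(self):
        # Proposed next view: current view with pending changes applied in order.
        members = list(self.view)
        for kind, member in self.pending:
            if kind == "down" and member in members:
                members.remove(member)
            elif kind == "join" and member not in members:
                members.append(member)
        return tuple(members)

    def commit(self):
        # Install the extension as the new current view.
        self.view = self.extension()
        self.pending.clear()


vm = ViewManager(("C", "P1", "P2", "P3", "P4"))
vm.note_change("down", "P1")
print(vm.extension())
```

Keeping pending changes separate from the committed view lets a second change (for example, a further failure) be folded into the same extension before it is installed.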
Communication Primitives
- Messages are sent only to members of the group
- Members can be at the same site or remote
- GBCAST: group broadcast
- ABCAST: atomic broadcast
- CBCAST: causal broadcast
- All are atomic
Communication Primitives: GBCAST (action, G)
- Broadcasts membership changes
- Issued by the coordinator, using two-phase commit
- Coordinator calculates the change
- Change received: acknowledge, then commit
- Change doesn't match: NACK with missing events
- Delivered after messages from the failed member
- Failed process will never be heard from again
- If declared “dead”, a process must go through recovery
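The two-phase exchange above can be sketched roughly as follows. This is a simplified, single-process illustration, not the protocol from the paper: `Member`, `prepare`, `commit`, and `gbcast` are hypothetical names, an in-memory field stands in for stable storage, and a NACK here returns the member's own view instead of the missing events.

```python
class Member:
    """Hypothetical group member (illustrative, not the paper's interface)."""

    def __init__(self, name, view):
        self.name = name
        self.view = tuple(view)
        self.proposed = None  # stands in for stable storage

    def prepare(self, old_view, new_view):
        # Phase 1: acknowledge only if the coordinator started from our view.
        if tuple(old_view) != self.view:
            return ("NACK", self.view)
        self.proposed = tuple(new_view)
        return ("ACK", None)

    def commit(self):
        # Phase 2: install the saved view.
        self.view, self.proposed = self.proposed, None


def gbcast(members, old_view, failed):
    """Coordinator side: propose removing `failed`, commit if all members ACK."""
    new_view = tuple(m for m in old_view if m != failed)
    replies = [m.prepare(old_view, new_view) for m in members]
    if all(tag == "ACK" for tag, _ in replies):
        for m in members:
            m.commit()
        return new_view
    return None  # at least one NACK: the coordinator must recompute the change


start = ("C", "P1", "P2", "P3", "P4")
group = [Member(n, start) for n in ("P2", "P3", "P4")]
print(gbcast(group, start, "P1"))
```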
Normal GBCAST
- Participants: Coordinator (C), P1, P2, P3, P4; P1 is down
- GBCAST (P1 down, (C,P2,P3,P4))
- Compare current view (C, P1, P2, P3, P4) with new view. Save to stable storage.
- ACK
- Commit
- New view: (C, P2, P3, P4), P1 down
Communication Primitives: ABCAST (msg, label, dests)
- Ensures messages are received in the same order
- Issued by the sender of the message
- Recipient queues the message, assigns max priority, tags it undeliverable, replies
- Sender collects responses, computes the max, sends that value
- Recipient changes the priority, tags the message deliverable, re-sorts the queue, transfers messages to the delivery queue in order
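The priority exchange above can be sketched as a small in-memory model. The names `Recipient`, `propose`, `finalize`, and `abcast` are hypothetical, and two details are assumptions made for the sketch: a priority is a `(counter, site_id)` pair so that ties break deterministically, and message labels are left out.

```python
class Recipient:
    """Hypothetical ABCAST recipient (illustrative names and priority format)."""

    def __init__(self, site_id):
        self.site_id = site_id
        self.max_seen = 0     # largest priority counter seen so far
        self.queue = {}       # msg_id -> [priority, deliverable flag]
        self.delivered = []   # delivery order at this recipient

    def propose(self, msg_id):
        # Queue the message as undeliverable with a priority above any seen.
        self.max_seen += 1
        priority = (self.max_seen, self.site_id)
        self.queue[msg_id] = [priority, False]
        return priority

    def finalize(self, msg_id, priority):
        # Adopt the sender's final priority and mark the message deliverable.
        self.queue[msg_id] = [priority, True]
        self.max_seen = max(self.max_seen, priority[0])
        # Deliver from the front while the lowest-priority entry is deliverable.
        while self.queue:
            head = min(self.queue, key=lambda m: self.queue[m][0])
            if not self.queue[head][1]:
                break
            del self.queue[head]
            self.delivered.append(head)


def abcast(msg_id, dests):
    """Sender side: collect proposals, take the max, send it back to everyone."""
    final = max(r.propose(msg_id) for r in dests)
    for r in dests:
        r.finalize(msg_id, final)


a, b = Recipient(1), Recipient(2)
abcast("m1", [a, b])
abcast("m2", [a, b])
print(a.delivered == b.delivered)
```

A message at the head of the queue that is still undeliverable blocks everything behind it, which is what keeps two recipients from delivering concurrent messages in different orders.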
Communication Primitives: CBCAST (msg, clabel, dests)
- Ensures relative ordering when necessary
- clabels are comparable or incomparable
- No common destinations, no comparison
- Previous messages are included in the transmission
- Optimization possible
  - Intersite packets
  - Common message pool and pointers to it
  - Flags track where each message has been sent
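A rough sketch of the "previous messages included" idea: each transmission carries the messages that precede it, so a recipient can deliver predecessors first even if the original packet arrives late. `Process`, `cbcast`, and `receive` are hypothetical names; the sketch does not model clabels or the optimizations listed above, and the history is never trimmed.

```python
class Process:
    """Hypothetical CBCAST endpoint that piggybacks its message history."""

    def __init__(self, name):
        self.name = name
        self.history = []    # messages known here, in an order consistent with causality
        self.seen = set()    # msg_ids already recorded (suppresses duplicates)
        self.delivered = []  # msg_ids delivered to this process, in order

    def _record(self, entry):
        msg_id, dests, _payload = entry
        if msg_id in self.seen:
            return
        self.seen.add(msg_id)
        self.history.append(entry)
        if self.name in dests:
            self.delivered.append(msg_id)

    def cbcast(self, msg_id, payload, dests):
        # The outgoing packet is every earlier known message plus the new one.
        self._record((msg_id, frozenset(dests), payload))
        return list(self.history)

    def receive(self, packet):
        # Entries are processed in packet order, so predecessors come first.
        for entry in packet:
            self._record(entry)


p, q, r = Process("p"), Process("q"), Process("r")
packet1 = p.cbcast("m1", "x", {"q", "r"})
q.receive(packet1)
packet2 = q.cbcast("m2", "y", {"r"})
r.receive(packet2)   # arrives before packet1
r.receive(packet1)   # late duplicate is ignored
print(r.delivered)
```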
GBCAST: Coordinator Fails (1)
- Participants: Coordinator (C), P1, P2, P3, P4; P1 is down
- GBCAST (P1 down, (C,P2,P3,P4))
- Compare current view (C, P1, P2, P3, P4) with new view. Save to stable storage.
- ACK
- Commit. New view: (C, P2, P3, P4), P1 down (appears twice in the diagram)
GBCAST: Coordinator Fails (2)
- Participants: Coordinator (failed), P1, P2, P3, P4; a new Coordinator takes over
- Pending view: (C, P2, P3, P4), P1 down
- GBCAST (C down, ((P2,P3,P4) P1 down))
- Compare current view (C, P1, P2, P3, P4) with new view. Note there are 2 changes. Save to stable storage.
- ACK
- Commit
- New view: (P2, P3, P4); P1, C down
Summary
- Communication protocols for distributed systems
- Defined members, protocols
- Failures can be tolerated
- Members have a consistent view
- Used at Cornell (ISIS): fault-tolerant objects, bulletin boards