Ziv Dayan, Tom Afek
Kafka
Instructor: Ittay Eyal
What is a failure detector?
Our failure detector: a software implementation, gossip style, running as an independent local unit.
Communication is by messages. Each message contains a list of heartbeats, and each heartbeat contains the IP of its creator and the time since its creation.
Each node holds its own Local Node (figure: Local Node, Net Members, Node Neighbors, Versions, Neighbor Version).
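A minimal sketch of the heartbeat format and of how a receiver might fold an incoming list into its local state. All names here are hypothetical. The sketch assumes that "time since creation" is an age in seconds, and that the receiver converts each age to a creation time on its own clock, which avoids any need for synchronized clocks. Transmission delay is ignored.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Heartbeat:
    """One entry of a gossip message (illustrative field names)."""
    creator_ip: str   # IP of the node that created the heartbeat
    age: float        # seconds elapsed since the heartbeat was created


def merge(last_created: Dict[str, float], message: List[Heartbeat],
          now: Optional[float] = None) -> Dict[str, float]:
    """Fold a received heartbeat list into the local table.

    last_created maps creator IP -> estimated creation time on the LOCAL clock.
    """
    now = time.monotonic() if now is None else now
    for hb in message:
        created = now - hb.age  # translate the age into local time
        # Keep only the freshest heartbeat seen for each creator.
        if created > last_created.get(hb.creator_ip, float("-inf")):
            last_created[hb.creator_ip] = created
    return last_created


def outgoing(last_created: Dict[str, float],
             now: Optional[float] = None) -> List[Heartbeat]:
    """Build the list to gossip by re-expressing local times as ages."""
    now = time.monotonic() if now is None else now
    return [Heartbeat(ip, now - t) for ip, t in last_created.items()]
```

Before each send, a node would set its own entry to the current time so that its own heartbeat is the freshest one in circulation.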
Repeat periodically:
1. Choose the node whose threshold is closest to expiring.
2. Wait until that threshold has expired.
3. Check the local creation time of the last heartbeat received from the suspected node: if it has changed, the node is OK; otherwise, the suspected node has crashed.
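The detection loop above can be sketched as a single round. This is a minimal sketch under stated assumptions: the helper name detect_once is hypothetical, one threshold is shared by all nodes, and the input is a table mapping each creator IP to the local creation time of its latest heartbeat. With a shared threshold, the node closest to expiring is simply the one whose latest heartbeat is oldest. The clock and sleep functions are injectable so that the logic can be exercised without real waiting.

```python
import time
from typing import Callable, Dict, Optional


def detect_once(last_created: Dict[str, float], threshold: float,
                sleep: Callable[[float], None] = time.sleep,
                clock: Callable[[], float] = time.monotonic) -> Optional[str]:
    """One detection round; returns the IP of a crashed node, or None."""
    if not last_created:
        return None
    # 1. Closest to expiring = oldest latest-heartbeat (shared threshold assumed).
    suspect = min(last_created, key=last_created.get)
    seen = last_created[suspect]
    # 2. Wait until the suspect's threshold has expired.
    sleep(max(0.0, seen + threshold - clock()))
    # 3. A newer heartbeat arrived meanwhile -> the node is OK.
    if last_created[suspect] != seen:
        return None
    # Unchanged -> the suspected node is considered crashed.
    return suspect
```

In a running node, a listener thread would update the table concurrently, so access would need a lock. The caller would also remove a node returned by detect_once from the table, so that the node is not reported again in later rounds.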
Figure: architecture of the software on each computer (components: Main, Listener, Message Handler, Message Sender, Sender, Detector).
A new abstract class, NetMessage, is added.
Method 1: Handle() decodes the received message using the proper version and returns a Message.
Method 2: toString() is used for serialization.
Figure: class diagram with NetMessage, SHA1Message, NormalMessage and Message.
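The NetMessage design can be sketched in Python under several hypotheses. The slide's Handle() and toString() map to handle() and __str__. The two subclasses are treated as two wire versions. The SHA-1 digest is assumed to serve as an integrity check, since the slides only name the SHA1Message class. The ';' and '|' separators and the string-based Message are illustrative choices.

```python
import hashlib
from abc import ABC, abstractmethod
from typing import List


class Message:
    """Decoded payload (simplified: heartbeat entries as strings)."""
    def __init__(self, heartbeats: List[str]):
        self.heartbeats = heartbeats


class NetMessage(ABC):
    """Wire-level message; each subclass is one encoding version."""

    def __init__(self, text: str):
        self.text = text  # serialized form, as sent or received

    @abstractmethod
    def handle(self) -> Message:
        """Decode the received text using this version's rules."""

    def __str__(self) -> str:  # Python counterpart of toString()
        return self.text


class NormalMessage(NetMessage):
    """Plain version: entries separated by ';'."""

    @classmethod
    def from_message(cls, msg: Message) -> "NormalMessage":
        return cls(";".join(msg.heartbeats))

    def handle(self) -> Message:
        return Message(self.text.split(";") if self.text else [])


class SHA1Message(NetMessage):
    """Same body, prefixed by its SHA-1 hex digest (assumed integrity check)."""

    @classmethod
    def from_message(cls, msg: Message) -> "SHA1Message":
        body = ";".join(msg.heartbeats)
        return cls(hashlib.sha1(body.encode()).hexdigest() + "|" + body)

    def handle(self) -> Message:
        digest, _, body = self.text.partition("|")
        if hashlib.sha1(body.encode()).hexdigest() != digest:
            raise ValueError("SHA-1 digest mismatch")
        return Message(body.split(";") if body else [])
```

A receiver picks the subclass that matches the sender's version and calls handle(). The rest of the system sees only Message, so adding a new version means adding one subclass.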
H = f(P, n, threshold)
Assumptions required (simplicity vs. efficiency): full topology; spread time << threshold.
Assumption: local information (a strong assumption).
Reliability: x is the number of messages; probability of a false detection.
We want:
Result:
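The reliability argument can be illustrated with one simple model, which may not match the authors' own derivation. Assume that each of the x messages that could refresh a heartbeat before the threshold expires is lost independently with probability q. A false detection then requires all x messages to be lost, which happens with probability q^x. Writing P for the tolerated false-detection probability (our reading of the symbol), the requirement q^x <= P gives x >= ln P / ln q. The function name min_messages and the parameter q are hypothetical.

```python
import math


def min_messages(q: float, p_target: float) -> int:
    """Smallest x with q**x <= p_target, assuming independent losses with prob q."""
    if not (0.0 < q < 1.0 and 0.0 < p_target < 1.0):
        raise ValueError("q and p_target must lie strictly between 0 and 1")
    x = math.ceil(math.log(p_target) / math.log(q))
    # Nudge x so that q**x <= p_target holds as computed in floating point.
    while q ** x > p_target:
        x += 1
    while x > 1 and q ** (x - 1) <= p_target:
        x -= 1
    return x
```

For example, with a 50% loss rate and a 1% target, 7 messages are needed, because 0.5^6 is about 1.6% and 0.5^7 is about 0.8%. Under this model, x grows only logarithmically as the target P shrinks.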
Linear performance: the larger P is, the steeper the slope.
Assumptions: synchrony, consistency.
Calculation for the average case.
High Performance
Comparison categories: efficiency, scalability, dynamism, reliability.