Published by Camron Daniels. Modified over 9 years ago.
1
Membership and Clique Avoidance in TTP/C Gunther Bauer, Michael Paulitsch Presented by Michael Sirivianos 02/01/2005
2
Overview
Membership in hard real-time systems: what is it and why?
Objectives
TTP/C overview
Group membership: Clique Avoidance and Implicit Acks
Cluster model and fault model
General properties
Analysis
Conclusions
3
What is a RT Membership Service? Safety critical RT systems use a bus system for communication. A class C system offers the required FT. A membership service gives timely and consistent info on the state of all nodes.
4
Why do we need it? Membership service establishes replica-deterministic agreement on all messages. Prevents clique formation and certain classes of arbitrary faults Allows global knowledge thus consistent and timely reaction to faults. Membership is a critical function for the correct operation of the communication system. Should be placed below the app. Layer within the TTP layer.
5
TTP/C Overview
Services:
Message transport at specific time instants, with minimal jitter.
Fault-tolerant clock synchronization.
Fault-tolerant membership management.
TDMA media access: time slots are not necessarily equally sized.
The MEssage Descriptor List (MEDL) contains the TDMA schedule and groups several TDMA rounds into cluster cycles. It is statically assigned to all nodes.
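A static schedule with unequal slots can be pictured with a small lookup table. The following is only a sketch: the `Slot` type, the node names, and the slot durations are made up for illustration, and a real MEDL groups several TDMA rounds into a cluster cycle rather than repeating a single round.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    sender: str       # node that owns this sending slot
    duration_us: int  # slots need not be equally long


# Hypothetical MEDL fragment: one TDMA round with five sending slots.
MEDL = [
    Slot("A0", 200), Slot("A1", 150), Slot("A2", 200),
    Slot("A3", 100), Slot("A4", 150),
]


def sender_of(slot_number: int) -> str:
    """Return the node that owns a global slot number."""
    return MEDL[slot_number % len(MEDL)].sender


def slot_start_us(slot_number: int) -> int:
    """Start time of a slot, relative to the start of slot 0."""
    round_len = sum(s.duration_us for s in MEDL)
    full_rounds, index = divmod(slot_number, len(MEDL))
    return full_rounds * round_len + sum(s.duration_us for s in MEDL[:index])
```

Because every node holds the same table, each one can tell who is allowed to send at any instant without any arbitration on the bus.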
6
TTP/C Overview, cont.
State of the distributed system (C-state). It comprises:
Membership
The global time at which the last frame broadcast (B/C) started
Number of the current TDMA slot
I frames (protocol state info) and X frames (protocol + app. data info) are transmitted periodically and carry the C-state. N frames carry app. data only.
Consistency of the C-state is determined by calculating a CRC over both the app. data and the C-state.
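The CRC-based consistency check can be sketched as follows. This is an illustration under assumptions, not the TTP/C frame format: `zlib.crc32` stands in for the protocol's own CRC, and the `CState` field layout and widths are hypothetical. The point is that the receiver recomputes the checksum with its own C-state, so any disagreement makes the frame look corrupted.

```python
import struct
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class CState:
    global_time: int       # time at which the last frame started
    slot: int              # current TDMA slot number
    membership: frozenset  # integer IDs of nodes considered operational

    def encode(self) -> bytes:
        # Membership as a bit vector: bit i set means node i is a member.
        bits = sum(1 << i for i in self.membership)
        return struct.pack(">IHI", self.global_time, self.slot, bits)


def frame_crc(app_data: bytes, cstate: CState) -> int:
    """Checksum over application data followed by the encoded C-state."""
    return zlib.crc32(app_data + cstate.encode())


def accept(app_data: bytes, received_crc: int, local: CState) -> bool:
    """Accept a frame only if the CRC matches under the local C-state."""
    return frame_crc(app_data, local) == received_crc
```

A receiver whose membership differs from the sender's computes a different checksum and rejects the frame, even though the application data arrived intact.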
7
TTP/C Overview, cont. A node in the cluster, which is included in the schedule but has been inactive, can be integrated using global time and C-state info from the I/X frames.
8
TTP Protocol Stack
Layer                       Function
Host Layer                  Application software in host
  (FTU CNI)
FTU Layer                   FTU membership
  (Basic CNI)
RM Layer                    Redundancy management
SRU Layer                   SRU membership, clock synchronization
Data Link/Physical Layer    Media access: TDMA
9
TTP Protocol Stack (cont.)
Data Link/Physical Layer: provides the means to exchange frames between the nodes.
SRU Layer: stores the data fields of the received frames.
RM Layer: provides the mechanisms for the cold start of a TTP/C cluster.
FTU Layer: groups two or more nodes into FTUs.
Host Layer: provides the application software.
Basic CNI: a data-sharing interface between the RM layer and the FTU layer.
FTU CNI: the interface between the FTU layer and the Host Layer.
10
Structure of a TTP/C Based System
11
Timeline in TTP/C TDMA Cycle One FTU sends message twice The pattern is repeated when TDMA round ends Cluster Cycle Cluster cycle involves scheduling all possible messages and tasks
12
TTP/C Frame Structure N-Frame:
13
Paper Objective Investigate properties of the Clique Avoidance algorithm. Performance analysis and study of interaction with Implicit Acks mechanism. Study ability to resolve and detect conflicts in membership views of nodes within a cluster. Provide time bounds for detecting and removing faulty members. For their analysis, they assume arbitrary failures with bounded frequency.
14
Initial TTP/C Fault Hypothesis. Nodes.
Only one faulty node within the duration of a TDMA round.
A node may become faulty only after any previously faulty node has either shut down or operates correctly again.
A transmission fault is consistent (all nodes will consistently consider the respective frame either faulty or correct).
A node does not send data, faulty or correct, outside its assigned sending slots.
A node never hides its identity when sending frames.
15
Initial TTP/C Fault Hypothesis. Network. Only one channel can be faulty during a TDMA slot. A channel does not spontaneously create correct frames A channel will deliver a frame either within some known time bounds or never. Bus Guardian transforms node errors, to comply with hypothesis. Central Guardian a more cost effective solution. Handles several arbitrary faults.
16
Cluster Model - Extended Fault Hypothesis
No failures other than the one that caused the cluster partition occur within two TDMA rounds before and after that failure. Thus, initially there is a single clique to which all nodes are assigned.
The partition failure should leave both partitions with more than one member.
It should affect both channels and be inconsistent, contrary to the initial hypothesis.
TTP/C can handle faults that violate the hypothesis, but in that case there is no guarantee that it selects the correct clique.
17
Group Membership Protocol Clique Avoidance algorithm Removes faulty nodes from cluster Prevents several coexisting cliques Implicit Acknowledgement The node inspects the membership list sent by the receiving nodes, to determine whether its message was correctly received.
18
Cluster Model - Slot n slots per TDMA round
19
Clique Avoidance
A reception is considered correct if the received C-state matches the local C-state and the data are not corrupted, i.e., the transmission time is correct and the memberships match after adding the sender. If the ML (membership list) of the receiver differs only by the sender, the reception is successful.
After a successful reception, the sender is added to the receiver's ML. After an incorrect reception, the sender is removed from the ML.
The Accept Counter (AC) is increased for every successful reception. The Failed Counter (FC) is increased for every incorrect reception.
If FC >= AC, the node raises an Ack Error and shuts down (freezes).
FC and AC are reset to 0 in each TDMA round.
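The counter rules above can be sketched in a few lines. This is a simplified model under stated assumptions: the `Node` class and its method names are hypothetical, the check runs once before the node's own sending slot, and the counters restart with AC = 1 (the node's own transmission) after a passed check.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    view: set            # local membership list (ML)
    ac: int = 0          # Accept Counter
    fc: int = 0          # Failed Counter
    frozen: bool = False

    def on_reception(self, sender: str, sent_view: set, data_ok: bool = True) -> bool:
        """Evaluate a frame from another node and update ML and counters."""
        # Correct reception: data intact and memberships equal once the
        # sender has been added to the local list.
        correct = data_ok and (self.view | {sender}) == set(sent_view)
        if correct:
            self.view.add(sender)
            self.ac += 1
        else:
            self.view.discard(sender)
            self.fc += 1
        return correct

    def before_sending(self) -> bool:
        """Clique avoidance check; returns False if the node freezes."""
        if self.fc >= self.ac:
            self.frozen = True
            return False
        self.ac, self.fc = 1, 0  # assumed: counters restart, own frame counts
        return True
```

For example, a node that starts with AC = 1, FC = 1 and then accepts one frame but rejects two reaches AC = 2, FC = 3 and freezes at its next sending slot.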
20
Clique formation under the extended fault hypothesis
Prior to the failure, there is consensus on membership.
A transient failure occurs at slot 0, when node A is transmitting: an asymmetric send fault. As a result, some nodes in the cluster correctly receive A's transmission and the rest do not.
Two cliques are formed: one whose members' membership includes A and one whose members' membership does not include A.
21
Implicit Acks - Successors
After a successful transmission, A increases AC. B checks the frame for correctness.
1. A waits for the expected message from B.
2. If the reception was successful, B adds A to its ML and transmits a non-corrupted message.
3. If the MLs are the same or B's differs only by A, then A considers B its successor.
If the MLs are the same, A is acked: it increases its AC and adds B to its ML. (case 1)
If B's ML differs by A, B's reception was not successful and B removed A: A increases FC and removes B. (case 2)
4. Otherwise A removes B from its ML and increases FC, unless B did not transmit at all. A goes back to step 1. (case 3)
22
Implicit Acks - Successors
5. A waits for the expected message from the subsequent node C.
6. If A finds a successor C whose ML contains A, then A is acknowledged. B is assumed faulty, and both FC and ML were updated correctly. A increases AC and adds C to its ML. (case 4)
However, if C's ML does not include A, A considers itself erroneous. A removes itself from its local list, adds both B and C, and increases AC. It now has the same ML as B and C. (case 5)
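The five cases can be sketched as a small state machine on the sender's side. This is an interpretation of the slides, not the protocol's reference logic: the `SenderState` class, the `status` strings, and the exact set comparisons are assumptions chosen so that the counters agree with the walk-through slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SenderState:
    me: str
    view: set                      # sender's ML, including itself
    ac: int = 1                    # the sender counts its own transmission
    fc: int = 0
    first_successor: Optional[str] = None
    status: str = "waiting"        # "waiting", "acked" or "defected"

    def on_frame(self, sender: str, sent_view: Optional[set]) -> int:
        """Process the next frame (None = no frame sent); return the case number."""
        if self.status != "waiting":
            return 0                                  # nothing left to decide
        if sent_view is None:                         # case 3, silent node
            self.view.discard(sender)
            return 3
        sent_view = set(sent_view)
        agree = self.view | {sender}                  # sender's ML plus this node
        if self.first_successor is None:
            disagree = agree - {self.me}
        else:
            disagree = (self.view - {self.me}) | {self.first_successor, sender}
        if sent_view == agree:                        # case 1 or case 4
            self.view.add(sender)
            self.ac += 1
            self.status = "acked"
            return 1 if self.first_successor is None else 4
        if sent_view == disagree:
            if self.first_successor is None:          # case 2
                self.first_successor = sender
                self.view.discard(sender)
                self.fc += 1
                return 2
            self.view = disagree                      # case 5: adopt the other view
            self.ac += 1
            self.status = "defected"
            return 5
        self.view.discard(sender)                     # case 3, unrelated view
        self.fc += 1
        return 3
```

Feeding it A1's frame without A0 and then A2's frame without A0 yields cases 2 and 5, leaving the sender with AC = 2, FC = 1 and a view that no longer contains itself.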
23
Implicit Acks - Defector In case 5, A changes clique membership. Becomes defector. Other nodes become aware of a defector only in its next sending round, by the transmitted ML. If defector becomes implicitly acknowledged, then it is no longer defector. If not, it freezes due to CA.
24
Partition failure. Slot 0 Preparation Phase
Sending schedule (slots 0-9): A0 A1 A2 A3 A4 A0 A1 A2 A3 A4
Node  AC  FC  View
A0    5   0   A0 A1 A2 A3 A4
A1    4   0   A0 A1 A2 A3 A4
A2    3   0   A0 A1 A2 A3 A4
A3    2   0   A0 A1 A2 A3 A4
A4    1   0   A0 A1 A2 A3 A4
25
Partition failure. Slot 0 Transmission Phase
Sending node: A0
Transmitted membership: A0 A1 A2 A3 A4
26
Partition failure. Slot 0 Evaluation Phase
Node  Correctly received frame?
A0    Yes (itself)
A1    No
A2    Yes
A3    Yes
A4    No
27
Partition failure. Slot 1 Preparation Phase
Node  AC  FC  View
A0    1   0   A0 A1 A2 A3 A4
A1    4   1   A1 A2 A3 A4
A2    4   0   A0 A1 A2 A3 A4
A3    3   0   A0 A1 A2 A3 A4
A4    1   1   A1 A2 A3 A4
28
Partition failure. Slot 1 Transmission Phase
Sending node: A1
Transmitted membership: A1 A2 A3 A4
29
Partition failure. Slot 1 Evaluation Phase
Node  Same membership?  Action
A0    No, by A0         A1 becomes 1st successor. Remove A1, inc. FC
A1    Yes (itself)      Inc. AC
A2    No, by A0         Remove A1, inc. FC
A3    No, by A0         Remove A1, inc. FC
A4    Yes               Inc. AC
30
Partition failure. Slot 2 Preparation Phase
Node  AC  FC  View
A0    1   1   A0 A2 A3 A4
A1    1   0   A1 A2 A3 A4
A2    4   1   A0 A2 A3 A4
A3    3   1   A0 A2 A3 A4
A4    2   1   A1 A2 A3 A4
31
Partition failure. Slot 2 Transmission Phase
Sending node: A2
Transmitted membership: A0 A2 A3 A4
32
Partition failure. Slot 2 Evaluation Phase
Node  Same membership?  Action
A0    Yes               A2 becomes 2nd successor. Inc. AC. It is acked.
A1    No, by A0, A1     Remove A2, inc. FC
A2    Yes (itself)      Inc. AC
A3    Yes               Inc. AC
A4    No, by A0, A1     Remove A2, inc. FC
33
Partition failure. Slot 3 Preparation Phase
Node  AC  FC  View
A0    2   1   A0 A2 A3 A4
A1    1   1   A1 A3 A4
A2    1   0   A0 A2 A3 A4
A3    4   1   A0 A2 A3 A4
A4    2   2   A1 A3 A4
34
Partition failure. Slot 3 Transmission Phase
Sending node: A3
Transmitted membership: A0 A2 A3 A4
35
Partition failure. Slot 3 Evaluation Phase
Node  Same membership?   Action
A0    Yes                Inc. AC
A1    No, by A0, A1, A2  Remove A3, inc. FC
A2    Yes                Inc. AC (it is acked)
A3    Yes (itself)       Inc. AC
A4    No, by A0, A1, A2  Remove A3, inc. FC
36
Partition failure. Slot 4 Preparation Phase
Node  AC  FC  View
A0    3   1   A0 A2 A3 A4
A1    1   2   A1 A4
A2    2   0   A0 A2 A3 A4
A3    1   0   A0 A2 A3 A4
A4    2   3   A1 A4
37
Partition failure. Slot 4 Preparation Phase
FC > AC: node A4 freezes!
38
Partition failure. Slot 4 Evaluation Phase
Node  Null message?  Action
A0    Yes            Remove A4
A1    Yes            Remove A4
A2    Yes            Remove A4
A3    Yes            Remove A4
A4    Frozen
39
Partition failure. Slot 5 Preparation Phase
Node  AC  FC  View
A0    3   1   A0 A2 A3
A1    1   2   A1
A2    2   0   A0 A2 A3
A3    1   0   A0 A2 A3
A4    Frozen
40
Partition failure. Slot 5 Transmission Phase
Sending node: A0
Transmitted membership: A0 A2 A3
41
Partition failure. Slot 5 Evaluation Phase
Node  Same membership?       Action
A0    Yes (itself)           Inc. AC
A1    No, by A0, A1, A2, A3  Inc. FC
A2    Yes                    Inc. AC
A3    Yes                    Inc. AC
A4    Frozen
42
Partition failure. Slot 6 Preparation Phase
Node  AC  FC  View
A0    3   1   A0 A2 A3
A1    1   3   A1
A2    3   0   A0 A2 A3
A3    2   0   A0 A2 A3
A4    Frozen
43
Partition failure. Slot 6 Preparation Phase
FC > AC: node A1 freezes!
44
Partition failure. Slot 6 Evaluation Phase
Node  Null message?  Action
A0    Yes            Remove A1
A1    Frozen
A2    Yes            Remove A1
A3    Yes            Remove A1
A4    Frozen
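The partition walk-through can be replayed with a short simulation. This is a sketch under assumptions: five nodes named A0 to A4, the steady-state counters AC = 5, 4, 3, 2, 1 before slot 0, a slot 0 frame lost only by A1 and A4, counters restarting at AC = 1, FC = 0 when a node sends, and no defector handling. The function name `simulate` and its parameters are hypothetical.

```python
def simulate(n=5, slots=10, missed_slot0=("A1", "A4")):
    names = [f"A{i}" for i in range(n)]
    view = {x: set(names) for x in names}
    # Counters as seen in the preparation phase of slot 0.
    ac = {x: n - i for i, x in enumerate(names)}
    fc = {x: 0 for x in names}
    frozen_at = {}

    for slot in range(slots):
        sender = names[slot % n]
        # Preparation phase: clique avoidance check before sending.
        if sender not in frozen_at and fc[sender] >= ac[sender]:
            frozen_at[sender] = slot
        if sender in frozen_at:
            # Null message: active nodes drop the silent sender,
            # counters stay untouched.
            for r in names:
                if r not in frozen_at:
                    view[r].discard(sender)
            continue
        ac[sender], fc[sender] = 1, 0
        sent = set(view[sender])
        # Evaluation phase at every other active node.
        for r in names:
            if r == sender or r in frozen_at:
                continue
            lost = slot == 0 and r in missed_slot0
            if not lost and view[r] | {sender} == sent:
                view[r].add(sender)
                ac[r] += 1
            else:
                view[r].discard(sender)
                fc[r] += 1

    survivors = [x for x in names if x not in frozen_at]
    return frozen_at, {x: sorted(view[x]) for x in survivors}
```

Running `simulate()` freezes A4 in slot 4 and A1 in slot 6, and leaves A0, A2 and A3 agreeing on the membership A0 A2 A3.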
45
Partition failure-Defection. Slot 0 Preparation Phase
Sending schedule (slots 0-9): A0 A1 A2 A3 A4 A0 A1 A2 A3 A4
Node  AC  FC  View
A0    5   0   A0 A1 A2 A3 A4
A1    4   0   A0 A1 A2 A3 A4
A2    3   0   A0 A1 A2 A3 A4
A3    2   0   A0 A1 A2 A3 A4
A4    1   0   A0 A1 A2 A3 A4
46
Partition failure-Defection. Slot 0 Transmission Phase
Sending node: A0
Transmitted membership: A0 A1 A2 A3 A4
47
Partition failure-Defection. Slot 0 Evaluation Phase
Node  Correctly received frame?
A0    Yes (itself)
A1    No
A2    No
A3    Yes
A4    No
48
Partition failure-Defection. Slot 1 Preparation Phase
Node  AC  FC  View
A0    1   0   A0 A1 A2 A3 A4
A1    4   1   A1 A2 A3 A4
A2    3   1   A1 A2 A3 A4
A3    3   0   A0 A1 A2 A3 A4
A4    1   1   A1 A2 A3 A4
49
Partition failure-Defection. Slot 1 Transmission Phase
Sending node: A1
Transmitted membership: A1 A2 A3 A4
50
Partition failure-Defection. Slot 1 Evaluation Phase
Node  Same membership?  Action
A0    No, by A0         A1 becomes 1st successor. Remove A1, inc. FC
A1    Yes (itself)      Inc. AC
A2    Yes               Inc. AC
A3    No, by A0         Remove A1, inc. FC
A4    Yes               Inc. AC
51
Partition failure-Defection. Slot 2 Preparation Phase
Node  AC  FC  View
A0    1   1   A0 A2 A3 A4
A1    1   0   A1 A2 A3 A4
A2    4   1   A1 A2 A3 A4
A3    3   1   A0 A2 A3 A4
A4    2   1   A1 A2 A3 A4
52
Partition failure-Defection. Slot 2 Transmission Phase
Sending node: A2
Transmitted membership: A1 A2 A3 A4
53
Partition failure-Defection. Slot 2 Evaluation Phase
Node  Same membership?  Action
A0    No, by A0, A1     A2 becomes 2nd successor. Inc. AC. Defects.
A1    Yes               Inc. AC
A2    Yes (itself)      Inc. AC
A3    No, by A0, A1     Remove A2, inc. FC
A4    Yes               Inc. AC
54
Partition failure-Defection. Slot 3 Preparation Phase
Node  AC  FC  View
A0    2   1   A1 A2 A3 A4
A1    2   0   A1 A2 A3 A4
A2    1   0   A1 A2 A3 A4
A3    3   2   A0 A3 A4
A4    3   1   A1 A2 A3 A4
55
Partition failure-Defection. Slot 3 Transmission Phase
Sending node: A3
Transmitted membership: A0 A3 A4
56
Partition failure-Defection. Slot 3 Evaluation Phase
Node  Same membership?   Action
A0    No, by A0, A1, A2  Remove A3, inc. FC
A1    No, by A0, A1, A2  Remove A3, inc. FC
A2    No, by A0, A1, A2  Remove A3, inc. FC
A3    Yes (itself)       Inc. AC
A4    No, by A0, A1, A2  Remove A3, inc. FC
57
Partition failure-Defection. Slot 4 Preparation Phase
Node  AC  FC  View
A0    2   2   A1 A2 A4
A1    2   1   A1 A2 A4
A2    1   1   A1 A2 A4
A3    1   0   A0 A3 A4
A4    3   2   A1 A2 A4
58
Partition failure-Defection. Slot 4 Transmission Phase
Sending node: A4
Transmitted membership: A1 A2 A4
59
Partition failure-Defection. Slot 4 Evaluation Phase
Node  Same membership?       Action
A0    Yes                    Inc. AC
A1    Yes                    Inc. AC
A2    Yes                    Inc. AC
A3    No, by A0, A1, A2, A3  Remove A4, inc. FC
A4    Yes (itself)           Inc. AC
60
Partition failure-Defection. Slot 5 Preparation Phase
Node  AC  FC  View
A0    3   2   A1 A2 A4
A1    3   1   A1 A2 A4
A2    2   1   A1 A2 A4
A3    1   1   A0 A3
A4    1   0   A1 A2 A4
61
Partition failure-Defection. Slot 5 Transmission Phase
Sending node: A0
Transmitted membership: A0 A1 A2 A4
62
Partition failure-Defection. Slot 5 Evaluation Phase
Node  Same membership?       Action
A0    Yes (itself)           Inc. AC
A1    Yes                    Inc. AC
A2    Yes                    Inc. AC
A3    No, by A1, A2, A3, A4  Remove A0, inc. FC
A4    Yes                    Inc. AC
63
Partition failure-Defection. Slot 6 Preparation Phase
Node  AC  FC  View
A0    1   0   A0 A1 A2 A4
A1    4   1   A0 A1 A2 A4
A2    3   1   A0 A1 A2 A4
A3    1   2   A3
A4    4   0   A0 A1 A2 A4
64
Partition failure-Defection. Slot 8 Preparation Phase
Node  AC  FC  View
A0    3   0   A0 A1 A2 A4
A1    2   0   A0 A1 A2 A4
A2    1   0   A0 A1 A2 A4
A3    1   4   A3
A4    2   0   A0 A1 A2 A4
65
Partition failure-Defection. Slot 8 Preparation Phase
FC > AC: node A3 freezes!
66
Partition failure-Defection. Slot 8 Evaluation Phase
Node  Null message?  Action
A0    Yes            Remove A3
A1    Yes            Remove A3
A2    Yes            Remove A3
A3    Frozen
A4    Yes            Remove A3
67
Cluster Model- IoI The length of Interval of Interest is one TDMA round. Each node has its own.
68
CA General Properties
Property 1. A node becomes a defector iff it sends in slot 0 (the moment of failure).
Proof. Suppose the partition created cliques A and B. Let A0 ∈ A. If A1, A2 ∈ B, then by case 5 A0 reconsiders its clique membership. If the failure is such that members of A are still able to communicate with B, in the next round A0 will send a correct frame containing B's ML including itself, and it will be added to B.
Let A_x, x > 0, belong either to the same clique as A_{x+1} or to another. If A_x and A_{x+1} are in the same clique, then A_x is acked. If not, there is a preceding slot in which another member of A_x's or A_{x+1}'s clique has sent. Thus A_x and A_{x+1} disagree on membership status, and A_{x+1} does not become A_x's 1st successor.
69
CA General Properties
Property 2. The results of the CA algorithm during the first TDMA round do not depend on the existence of a defector.
Proof. Let A0 ∈ A. No A_x with x > 0 knows whether A0 is a defector, because the other nodes evaluate the membership of A0 during the eval. phase of slot 0, and it takes up to the eval. phase of slot 2 for A0 to become a defector. A0 is not influenced either, since it performs CA in its prep. phase. At this time (slot 0) it is not a defector, since no failure can occur in the previous round (hypothesis).
70
Analysis
Property 4. If there is no defector, no member of the larger clique will shut down. All members of the smaller clique will shut down within 2 TDMA rounds of clique formation. No member of the smaller clique will send in the 2nd TDMA round.
Proof. Let |A| + |B| = n. A node freezes iff FC >= n/2. Let |A| > |B|. The max FC for members of A is FC_A <= |B| < n/2. The min AC for members of A is AC_A >= |A| > n/2. Thus, no member of A will shut down. B's members that did not shut down by the end of the 1st round will have FC_B = |A| > n/2 and AC_B <= |B| < n/2, and will shut down in the 2nd round.
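The inequalities in the proof can be checked numerically for concrete clique sizes. The helper below is a hypothetical illustration that only restates the bounds from the slide; it is not part of the paper's analysis.

```python
def property4_bounds(size_a: int, size_b: int) -> dict:
    """Counter bounds from the Property 4 proof for cliques with |A| > |B|."""
    if size_a <= size_b:
        raise ValueError("clique A must be the larger one")
    n = size_a + size_b
    return {
        "half": n / 2,
        "max_fc_a": size_b,  # a member of A rejects at most the frames from B
        "min_ac_a": size_a,  # it accepts itself and every other member of A
        "fc_b": size_a,      # a surviving member of B rejects all of A
        "max_ac_b": size_b,  # it accepts at most the members of B
    }


b = property4_bounds(3, 2)
print(b["max_fc_a"] < b["half"] < b["min_ac_a"])  # larger clique survives
print(b["fc_b"] > b["half"] > b["max_ac_b"])      # smaller clique freezes
```

With |A| = 3 and |B| = 2 this matches the five-node walk-through: members of A never collect more than 2 failures, while the remaining members of B collect 3.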
71
Property 4 Example
Sending schedule (slots 0-11): A0 A1 A2 A3 A4 A5 A0 A1 A2 A3 A4 A5
A4 freezes in the 1st TDMA round after the failure: in the slot 4 prep. phase it has AC = FC = 3.
A2 freezes in the 2nd TDMA round after the failure: in the slot 8 prep. phase it has FC = 5, AC = 1.
72
Analysis (cont.)
Property 5. If there is no defector and |A| = |B| = n/2, the clique whose last member sends before the last member of the other clique wins. No member of the winning clique will shut down, and no member of the losing clique will send in the 2nd round.
Proof. Let A_L and B_L be the last members of A and B. As long as neither A_L nor B_L has sent, no clique will shut down, since FC_A <= |B| - 1 = n/2 - 1 and FC_B <= |A| - 1. Let A_L have its sending slot first. A_L will not shut down. All members of B will shut down because FC_B = |A|. At least one member of B will die in the 1st round after the failure.
73
Analysis (cont.)
Property 6. If there is a defector, all other nodes will shut down or continue just as if there were no defector. The defector will shut down if it switches to the losing clique.
Proof. From Property 2 nothing changes, so we can apply Properties 4 and 5. The defector will have a new clique. Let |A| > n/2. If it switches to A and initially |A| > |B|, it will find FC <= |B| < AC = |A|. If initially |B| = |A|, then at least one member of B has shut down in the 1st round, so FC <= |B| - 1 < AC = |A|. In both cases the defector survives and becomes a member of A.
74
Analysis (cont.)
However, if |A| > |B| and the defector switches to the losing clique B, it will see FC = |A| > AC, with AC <= |B|, and freeze. FC = |A| counts the |A| - 1 successive members of A plus the 1st successor, which is in B. AC <= |B| because, from Property 4, members of B may have shut down during its IoI.
Members of A in the 2nd TDMA round will find FC <= |B| - 2 < AC = |A| - 1: the defector has left the clique and shut down, and the 1st and 2nd successors of the defector in B are also down.
If initially |A| = |B|, at least one member of B shuts down in the 1st round, so the defector has FC = |A| > AC, with AC <= |B| - 1, and will shut down.
75
Integration
An integrating node copies the ML from any I frame and waits for its first slot. It neither performs Clique Avoidance nor sends data. In a partitioned cluster, it joins the clique of the node it copied the ML from.
It only sends in its next assigned slot, so that its counters are affected by all nodes in the cluster. This avoids sending in the second TDMA round after a partition while it is in the losing clique.
If it joins the losing clique, it will have FC > AC and freeze.
76
Analysis Summary The clique with the majority of nodes always wins After partitioning, one of equally sized partitions wins. Defector does not change the clique selection. No member of winners including a newly joined Defector will ever shut down. All losers will freeze at the latest in the second round after failure. There will always be at least ceil(n/2 – 1) survivors.
77
Conclusions The algorithm prevents cluster partitioning after one TDMA round instability interval. Provides nodes with consistent membership view among the active nodes. This is true even after a single arbitrary fault within 2 TDMA rounds. This does not comply with the initial fault consistency hypothesis.
78
Open Issues
A0 makes an “educated” guess based on its successors' feedback. If both successors are in the losing clique, then it guessed wrong.
How can we reduce the probability of defecting to the losers? Perhaps add more states, e.g. a 3rd successor state?
Analysis of the probability of defecting to the losers with respect to the number of nodes in the cluster and the number of successor states.
79
References
G. Bauer and M. Paulitsch. An Investigation of Membership and Clique Avoidance in TTP/C. 19th IEEE SRDS (2000).
Hermann Kopetz and Günter Grünsteidl. TTP - A Protocol for Fault-Tolerant Real-Time Systems. IEEE Comp. Society Press (1994).
Gunther Bauer, Hermann Kopetz, and Wilfried Steiner. Byzantine Fault Containment in TTP/C. Workshop on Real-Time LANs in the Internet Age (2002).