Published by Bambang Sonny Oesman. Modified over 6 years ago.
1
DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 6 Synchronization Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved
2
Clock Synchronization
Figure 6-1. When each machine has its own clock, an event that occurred after another event may nevertheless be assigned an earlier time.
3
Physical Clocks (1)
Figure 6-2. Computation of the mean solar day.
4
Physical Clocks (2)
Figure 6-3. International Atomic Time (TAI) seconds are of constant length, unlike solar seconds. Leap seconds are introduced when necessary to keep in phase with the sun.
5
Global Positioning System (1)
Figure 6-4. Computing a position in a two-dimensional space.
6
Global Positioning System (2)
Real-world facts that complicate GPS:
It takes a while before data on a satellite’s position reaches the receiver.
The receiver’s clock is generally not in sync with that of a satellite.
7
Clock Synchronization Algorithms
Figure 6-5. The relation between clock time and UTC when clocks tick at different rates.
Figure annotations: correction to local clock C at the time of synch; can't let time “run backward”.
8
Network Time Protocol
Figure 6-6. Getting the current time from a time server.
9
NTP (2)
T1 and T4 are the request-send and reply-receive times on the requester's clock; T2 and T3 are the request-receive and reply-send times on the server's clock.
Assume the delay is approximately symmetric. Skew = [(T2-T1)+(T3-T4)]/2.
Take many (8) pairs of (skew, delay) and use the one with the best (least) delay.
Clocks have “strata”: the lowest stratum is best. Only adjust your clock if your stratum is higher than that of the server you are comparing against.
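The skew formula above can be tried out with a small sketch. The function names are hypothetical, and the delay estimate ((T4-T1)-(T3-T2))/2 is an assumption that follows from the same symmetric-delay premise; the slide itself only gives the skew.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Estimate the server-minus-client clock skew and the one-way delay.

    t1: request sent (client clock)    t2: request received (server clock)
    t3: reply sent (server clock)      t4: reply received (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2   # skew formula from the slide
    delay = ((t4 - t1) - (t3 - t2)) / 2    # assumed symmetric one-way delay
    return offset, delay


def best_sample(samples):
    """Of several (offset, delay) pairs, keep the one with the least delay."""
    return min(samples, key=lambda pair: pair[1])


# Client clock is 10 units behind the server; each direction takes 2 units.
print(ntp_offset_and_delay(100, 112, 113, 105))
```

Picking the sample with the smallest delay is a reasonable heuristic because a short round trip leaves the least room for asymmetry between the two directions.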
10
The Berkeley Algorithm (1)
Figure 6-7. (a) The time daemon asks all the other machines for their clock values.
11
The Berkeley Algorithm (2)
Figure 6-7. (b) The machines answer.
12
The Berkeley Algorithm (3)
Figure 6-7. (c) The time daemon tells everyone how to adjust their clock.
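A minimal sketch of the three steps in Figure 6-7, assuming the daemon simply averages all clock values (its own included) and sends each machine the difference. The function name and the sample times are illustrative, not taken from the slides.

```python
def berkeley_adjustments(daemon_time, reported_times):
    """Return the adjustment for each clock, daemon first.

    Assumes a plain average of all values; a positive adjustment means
    "advance your clock", a negative one means "slow down until you match".
    """
    all_times = [daemon_time] + list(reported_times)
    average = sum(all_times) / len(all_times)
    return [average - t for t in all_times]


# Times in minutes since midnight: daemon at 3:00, others at 2:50 and 3:25.
print(berkeley_adjustments(180, [170, 205]))
```

A machine that is told to go backward would in practice slow its clock rather than jump, since time must not run backward.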
13
Clock Synchronization in Wireless Networks (1)
Figure 6-8. (a) The usual critical path in determining network delays.
14
Clock Synchronization in Wireless Networks (2)
Figure 6-8. (b) The critical path in the case of RBS.
Reference Broadcast Synchronization (RBS): don't reset the clock, just track offsets using logarithmic decay.
15
Clock Types
Time-of-day (wall)
  Local clock (oscillator)
  Global reference
  Distributed clock
Logical (event ordering)
  Lamport clock
  Vector clock
  Matrix clock
16
Lamport’s Logical Clocks (1)
The "happens-before" relation → can be observed directly in two situations:
1. If a and b are events in the same process, and a occurs before b, then a → b is true.
2. If a is the event of a message being sent by one process, and b is the event of the message being received by another process, then a → b is also true.
17
Lamport’s Logical Clocks (2)
Figure 6-9. (a) Three processes, each with its own clock. The clocks run at different rates.
18
Lamport’s Logical Clocks (3)
Figure 6-9. (b) Lamport’s algorithm corrects the clocks.
19
Lamport’s Logical Clocks (4)
Figure. The positioning of Lamport’s logical clocks in distributed systems.
20
Lamport’s Logical Clocks (5)
Updating counter Ci for process Pi:
1. Before executing an event, Pi executes Ci ← Ci + 1 (or Ci ← Ci + Δi in general).
2. When process Pi sends a message m to Pj, it sets m’s timestamp ts(m) equal to Ci after having executed the previous step.
3. Upon the receipt of a message m, process Pj adjusts its own local counter as Cj ← max{Cj, ts(m)}, after which it executes the first step and delivers the message to the application.
Use the process ID to break ties if necessary.
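The three update rules can be sketched as a small class. The class and method names are hypothetical, and the increment is fixed at 1 rather than a general Δi.

```python
class LamportClock:
    """Lamport logical clock for one process (illustrative sketch)."""

    def __init__(self, pid):
        self.pid = pid
        self.counter = 0

    def tick(self):
        """Step 1: increment before executing any event."""
        self.counter += 1
        return self.counter

    def send(self):
        """Step 2: a send is an event; the message carries the new counter."""
        return self.tick()

    def receive(self, ts):
        """Step 3: take the max with the message timestamp, then tick."""
        self.counter = max(self.counter, ts)
        return self.tick()

    def stamp(self):
        """(counter, pid) pairs compare lexicographically, breaking ties by pid."""
        return (self.counter, self.pid)


p1, p2 = LamportClock(1), LamportClock(2)
ts = p1.send()          # p1's counter becomes 1
p2.receive(ts)          # p2's counter becomes max(0, 1) + 1 = 2
print(p1.stamp(), p2.stamp())
```

Comparing the `(counter, pid)` tuples gives a total order on events that is consistent with happens-before.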
21
Lamport’s Logical Clocks (6)
Do we obtain a total order on events?
  Yes: a single integer, with ties broken using the process ID.
What information does a Lamport clock convey?
  ei → ej ⇒ ts(ei) < ts(ej)? Yes: the timestamp of a causally dependent event must be larger because of how the protocol works.
  ts(ei) < ts(ej) ⇒ ei → ej? No: unrelated events can have either order.
22
Group Messaging Properties
Two related properties of interest:
Ordering
  FIFO (per sender)
  Causal
  Total
Reliability
  Best effort
  Duplicate detection
  Omission detection/recovery per receiver
  Atomic to all receivers (all or none)
  Digital Fountain (fountain codes)
23
Example: Totally Ordered Multicasting
Figure. Updating a replicated database and leaving it in an inconsistent state.
Key concerns: do operations commute (is the final state the same), and what values do clients see (intermediate states)?
24
Causal Message Delivery
Want a method by which messages can be delivered in causal order, not necessarily in some total order across all processes.
Lamport logical clocks impose a total order that obeys causal order, but this is too restrictive: messages that are not causally related have a delivery order imposed on them unnecessarily.
Vector clocks can capture causality information precisely.
25
Vector Clocks (1)
Figure 6-12. Concurrent message transmission using logical clocks.
26
Vector Clocks (2)
Vector clocks are constructed by letting each process Pi maintain a vector VCi with the following two properties:
1. VCi[i] is the number of events that have occurred so far at Pi. In other words, VCi[i] is the local logical clock at process Pi.
2. If VCi[j] = k, then Pi knows that k events have occurred at Pj. It is thus Pi’s knowledge of the local time at Pj.
27
Vector Clocks (3)
Steps carried out to accomplish property 2 of the previous slide and enforce causal delivery:
1. Before executing an event (e.g., a message send), Pi executes VCi[i] ← VCi[i] + 1.
2. When process Pi sends a message m to Pj, it sets m’s (vector) timestamp ts(m) equal to VCi after having executed the previous step.
3. Upon the receipt of a message m from Pi, process Pj first compares ts(m) with its current vector. If VCj[k] < ts(m)[k] for some k other than i, or VCj[i] + 1 < ts(m)[i], then Pj has not yet seen a message that causally precedes m, so it buffers m. Otherwise it sets VCj[k] ← max{VCj[k], ts(m)[k]} for each k and delivers the message to the application.
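The buffering rule can be sketched as follows. This is an illustrative sketch, not the book's code: the class name and message layout are assumptions, and only message sends are counted as events, so that each vector entry counts messages sent.

```python
class CausalProcess:
    """One process that delivers multicast messages in causal order."""

    def __init__(self, pid, n):
        self.pid = pid
        self.vc = [0] * n
        self.buffer = []      # received but not yet deliverable
        self.delivered = []   # payloads handed to the application, in order

    def send(self, payload):
        self.vc[self.pid] += 1
        return (self.pid, list(self.vc), payload)

    def _deliverable(self, sender, ts):
        # Buffer if we are missing an earlier message from the sender ...
        if self.vc[sender] + 1 < ts[sender]:
            return False
        # ... or a message from someone else that the sender had already seen.
        return all(ts[k] <= self.vc[k] for k in range(len(ts)) if k != sender)

    def receive(self, message):
        self.buffer.append(message)
        progress = True
        while progress:                      # retry buffered messages
            progress = False
            for msg in list(self.buffer):
                sender, ts, payload = msg
                if self._deliverable(sender, ts):
                    self.buffer.remove(msg)
                    self.vc = [max(a, b) for a, b in zip(self.vc, ts)]
                    self.delivered.append(payload)
                    progress = True


p0, p1, p2 = (CausalProcess(i, 3) for i in range(3))
m1 = p0.send("m1")
p1.receive(m1)
m2 = p1.send("m2")      # m2 causally follows m1
p2.receive(m2)          # arrives first, so it is buffered
p2.receive(m1)          # m1 is delivered, which then releases m2
print(p2.delivered)
```

Checking the condition against the vector before merging matters: merging first would make every message look deliverable.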
28
Enforcing Causal Communication
Figure. Enforcing causal communication.
29
Enforcing Causal Communication
Figure. Enforcing causal communication. (Diagram annotated with the vector timestamps (0000), (1000), (0001), (1001), and (1101) for processes P0 to P3.)
30
Atomic Multicast
Want all nodes in group G to process all messages in the same order.
Each node maintains a Lamport logical clock C.
When node n sends a message m to G, it includes a timestamp ts(m) = C.
When node n' receives m, it updates its clock C' and sends a timestamped ACK a', with ts(a') = C'.
Node n collects all ACKs, takes the maximum timestamp T(m) = max{ts(a)} over all ACKs a, and sends a commit message c containing T(m) to G.
Node n' can deliver m when it has received c and it knows that T(m) < T(m') for every other message m' that it has not yet delivered.
31
Mutual Exclusion
Need for synchronization
  Access to shared variables
  Access to serially reusable resources
Approaches
  Centralized
  Distributed
  Voting
  Token-passing
Characteristics
  Correctness (safety)
  Efficiency (messages passed)
  Fault tolerance
  Fairness (liveness)
32
Mutual Exclusion: A Centralized Algorithm (1)
Figure. (a) Process 1 asks the coordinator for permission to access a shared resource. Permission is granted since locked=F.
33
Mutual Exclusion: A Centralized Algorithm (2)
Figure. (b) Process 2 then asks permission to access the same resource. The coordinator does not reply since the resource is locked (locked=T). Process 2 is enqueued.
34
Mutual Exclusion: A Centralized Algorithm (3)
Figure. (c) When process 1 releases the resource, it tells the coordinator, which dequeues 2 and replies to it (locked stays T). When 2 sends its release, locked is set to F (assuming the queue is empty).
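The coordinator's behaviour across the three figures can be sketched like this. The class and method names are hypothetical, and message passing is reduced to return values.

```python
from collections import deque


class Coordinator:
    """Centralized lock coordinator (illustrative sketch)."""

    def __init__(self):
        self.locked = False
        self.queue = deque()

    def request(self, pid):
        """Return True if permission is granted at once.

        Otherwise the request is queued and no reply is sent (False).
        """
        if not self.locked:
            self.locked = True
            return True
        self.queue.append(pid)
        return False

    def release(self):
        """Return the next process to be granted the resource, or None."""
        if self.queue:
            return self.queue.popleft()   # locked stays True
        self.locked = False
        return None


c = Coordinator()
print(c.request(1))   # granted
print(c.request(2))   # queued, no reply
print(c.release())    # process 1 releases; process 2 is granted
print(c.release())    # process 2 releases; queue is empty
```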
35
A Distributed Algorithm (1)
Lamport Timestamp DME algorithm
When a request arrives, there are three different cases:
1. If the receiver is not accessing the resource and does not want to access it, it sends back an OK message to the sender.
2. If the receiver already has access to the resource, it simply does not reply. Instead, it queues the request.
3. If the receiver wants to access the resource as well but has not yet done so, it compares the timestamp of the incoming message with the one contained in the message that it has sent everyone. The lowest one wins: if the incoming message has the lower timestamp, the receiver sends back an OK; otherwise it queues the request and sends nothing.
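The three cases map onto a small decision function. The state names and the `(timestamp, pid)` request tuples are assumptions made for this sketch; ties on the timestamp fall back to the process ID.

```python
RELEASED, WANTED, HELD = "released", "wanted", "held"


def on_request(state, own_request, incoming_request):
    """Decide how a receiver reacts to an incoming request.

    Requests are (timestamp, pid) tuples; own_request is ignored unless
    the receiver is itself waiting for the resource.
    Returns "ok" (reply now) or "queue" (defer the reply).
    """
    if state == RELEASED:          # case 1: not interested
        return "ok"
    if state == HELD:              # case 2: currently using the resource
        return "queue"
    # case 3: both want it, so the lowest timestamp wins
    return "ok" if incoming_request < own_request else "queue"


# Process 2 (timestamp 12) receives a request from process 0 (timestamp 8).
print(on_request(WANTED, (12, 2), (8, 0)))
# Process 0 receives the request from process 2 and defers its reply.
print(on_request(WANTED, (8, 0), (12, 2)))
```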
36
A Distributed Algorithm (2)
Figure. (a) Two processes want to access a shared resource at the same moment.
37
A Distributed Algorithm (3)
Figure. (b) Process 0 has the lowest timestamp, so it wins.
38
A Distributed Algorithm (4)
Figure. (c) When process 0 is done, it sends an OK also, so 2 can now go ahead.
39
A Distributed Algorithm (5)
Lamport Timestamp DME algorithm
Observations:
No deadlock: the lowest timestamp wins.
No starvation: eventually your request has the lowest timestamp (all later ones will “see” yours and have a larger timestamp).
Replaces one overloaded node with N of them.
Replaces one point of failure with N of them.
“Fix” failures by requiring a “Deny” response to denied requests: the sender can then resend until it gets a reply, and so distinguish a denial from a failure.
40
Voting Algorithms (1)
Lamport Voting DME algorithm
Send request as before.
A node gives its vote if it has not already given it to someone else.
Queue requests in timestamp order.
A requester wins when it has collected a majority of the votes.
Return the vote to its grantor when done with the resource.
A grantor gives a returned vote to the head of its queue, if the queue is not empty.
41
Voting Algorithms (2)
Lamport Voting DME algorithm
Observations:
Can tolerate up to ⌈N/2⌉ - 1 failures (N/2 - 1 for even N), since a majority must remain available to vote.
Don't have to send messages to all nodes, just enough to collect the votes needed.
Problem: multiple nodes may collect votes and none gets a majority: deadlock!
Fix this with a “Rescind” message, sent by a grantor that gets a request with a smaller timestamp than the one it voted for. The grantor gives its vote to the priority request when the vote is returned.
If a node receives a Rescind message and is not in the CS, it must return the vote and wait for a majority.
If a node receives a Rescind message while in the CS, it returns the vote when done.
42
Voting Algorithms (3)
Quorum Voting DME algorithm (coteries)
Don't need a majority, only a way to break ties.
Each node n has its own quorum Q(n), whose votes it must get to obtain the lock.
For all n and n', Q(n) intersects Q(n').
The node(s) in the intersection of Q(n) and Q(n') decide between the requests of n and n' (safety).
Can make |Q(n)| about sqrt(N) for N nodes.
Can also make the number of quorums of which a node n is a member about sqrt(N), so no node is overloaded with requests.
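One simple way to get pairwise-intersecting quorums of size on the order of sqrt(N) is a grid layout, where a node's quorum is its row plus its column. This construction is an assumption chosen for illustration (the slide does not say how the coterie is built), and it yields 2·sqrt(N) - 1 members per quorum.

```python
import math


def grid_quorum(node, n):
    """Quorum of `node` when nodes 0..n-1 are laid out in a square grid."""
    side = math.isqrt(n)
    if side * side != n:
        raise ValueError("this sketch assumes n is a perfect square")
    row, col = divmod(node, side)
    same_row = {row * side + c for c in range(side)}
    same_col = {r * side + col for r in range(side)}
    return same_row | same_col


n = 16
quorums = [grid_quorum(i, n) for i in range(n)]
# Any two quorums share a node: a's row always crosses b's column.
print(all(qa & qb for qa in quorums for qb in quorums))
print(len(quorums[0]))
```

Each node also appears in 2·sqrt(N) - 1 quorums, so the voting load is spread evenly.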
43
A Token Ring Algorithm
Figure. (a) An unordered group of processes on a network. (b) A logical ring constructed in software.
44
Tree Algorithm (1)
Form a tree of all nodes, with the node currently holding the token as the root.
Each node maintains a pointer to its current parent and a list of children, along with a queue of requests (initially empty) and its state (holds the token or not).
If node n wants to get the lock, it checks whether it has the token. If so, it enters the CS. Otherwise it checks whether its queue is empty; if so, it sends a request to its parent. Either way, it puts itself on the queue.
If a node receives a request, has the token, and is not in the CS, it sends the token to the requester and makes it its parent. Otherwise it sends a request to its parent (if the parent is not itself) and enqueues the request.
When a node is done with the CS or receives the token, it sends the token to the first node in its queue and makes that node its new parent.
45
Tree Algorithm (2)
[Diagram on this and the following slides: tree of nodes 1 to 8, with T marking the token holder and each node's request queue.]
Node 5 has token, is root of tree. Node 8 requests token, enqueues self.
46
Tree Algorithm (3)
Node 6 receives 8's request, enqueues 8 and requests token. Node 7 requests token, enqueues self.
47
Tree Algorithm (4)
Node 6 receives request, enqueues 7. Node 5 receives request, sends token to 6, makes 6 parent.
48
Tree Algorithm (5)
Node 6 receives token, dequeues 8, sends 8 token, with request. Node 3 requests token, enqueues self.
49
Tree Algorithm (6)
Node 7 receives request, enqueues 3. Node 8 receives token, enters CS, makes self root, enqueues 6.
50
Tree Algorithm (7)
Node 8 leaves CS, dequeues 6, sends 6 token, makes 6 parent. Node 1 requests token, enqueues self.
51
Tree Algorithm (8)
Node 6 receives token, dequeues 7, sends 7 token, makes 7 parent. Node 5 receives request, enqueues 1, requests token.
52
Tree Algorithm (9)
Node 7 receives token, dequeues self, enters CS, makes self parent. Node 6 receives request, enqueues 5, requests token.
53
Tree Algorithm (10)
Node 7 receives request, enqueues 6. Node 4 requests token, enqueues self.
54
Tree Algorithm (11)
Node 5 receives request, enqueues 4. Node 7 leaves CS, dequeues 3, sends 3 token, makes 3 parent.
55
A Comparison of the Four Algorithms ... plus two more...
Figure. A comparison of three mutual exclusion algorithms.
Rows added to the figure's table (cells in the figure's column order):
Coteries | ~2 sqrt(n) | ~2 sqrt(n) | crash of p in quorum
Token Tree | to log n | to 2 log n | lost token, p crash
56
Leader Election
Why?
  Often simpler to use a centralized approach
  Can get around SPoF and congestion by using hierarchies
How?
  Bully algorithm
  Ring algorithm
  Variations on these
57
Election Algorithms: The Bully Algorithm
1. P sends an ELECTION message to all processes with higher numbers.
2. If no one responds, P wins the election and becomes coordinator.
3. If one of the higher-ups answers, it takes over. P’s job is done.
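The three steps can be simulated in a few lines. This is a simplified sketch: the function name is hypothetical, failures are modelled by a fixed set of live process IDs, and only one of the responding higher-ups is followed, although in the real protocol every responder holds its own election.

```python
def bully_election(initiator, alive):
    """Return the ID of the process that ends up as coordinator.

    `alive` is the set of process IDs that still respond to messages;
    the initiator is assumed to be alive.
    """
    current = initiator
    while True:
        # Step 1: send ELECTION to all higher-numbered processes.
        responders = [p for p in alive if p > current]
        if not responders:
            return current            # Step 2: nobody answered, so we win.
        # Step 3: a higher-up takes over. Following the lowest responder
        # is an arbitrary choice; the final winner is the same either way.
        current = min(responders)


# Processes 0..7, where the old coordinator 7 has crashed; 4 notices first.
print(bully_election(4, {0, 1, 2, 3, 4, 5, 6}))
```

The winner is always the highest-numbered live process, which is where the name "bully" comes from.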
58
The Bully Algorithm (1)
Figure. The bully election algorithm. (a) Process 4 holds an election. (b) Processes 5 and 6 respond, telling 4 to stop. (c) Now 5 and 6 each hold an election.
59
The Bully Algorithm (2)
Figure. The bully election algorithm. (d) Process 6 tells 5 to stop. (e) Process 6 wins and tells everyone.
60
A Ring Algorithm
Figure 6-21. Election algorithm using a ring.
Note that in the figure, each message collects node IDs as it circulates the ring, so the winner knows all live nodes at the end. Strictly, only the best candidate ID must be circulated, and a new message needs to be propagated only if its candidate is better.
61
Ring Algorithm
[Ring diagram. In flight: Elect 2, Elect 6.]
Nodes 6 and 2 initiate elections, believing node 7 (previous leader) to be dead. Here, only the best candidate ID is circulated. A node enters election if it detects leader failure or if it receives an election message.
62
Ring Algorithm
[Ring diagram. In flight: Elect 6, Elect 3.]
Nodes 0 and 3 receive election messages, enter election. A node sends an election message if it initiates or if it receives a message with a better candidate. Node 0 continues with 6 as the best candidate, while node 3 considers itself the best candidate.
63
Ring Algorithm
[Ring diagram. In flight: Elect 6, Elect 4.]
Nodes 1 and 4 receive election messages, enter election. Node 1 continues with 6 as the best candidate, while node 4 considers itself the best candidate.
64
Ring Algorithm
[Ring diagram. In flight: Elect 6, Elect 5.]
Nodes 5 and 2 receive election messages, 5 enters election (2 is already in). Node 2 realizes that 6 is a better candidate than the best it has seen (i.e., 2) so forwards message, while node 5 considers itself the best candidate.
65
Ring Algorithm
[Ring diagram. In flight: Elect 6, Elect 5.]
Nodes 6 and 3 receive election messages, are both already in. Node 3 realizes that 6 is a better candidate than the best it has seen (i.e., 3) so forwards message, while node 6 knows it has seen a better candidate (itself).
66
Ring Algorithm
[Ring diagram. In flight: Elect 6.]
Node 4 receives election message, but is already in. Node 4 realizes that 6 is a better candidate than the best it has seen (i.e., 4) so forwards message, while node 6 waits, having suppressed the election message from 5.
67
Ring Algorithm
[Ring diagram. In flight: Elect 6.]
Node 5 receives election message, but is already in. Node 5 realizes that 6 is a better candidate than the best it has seen (i.e., 5) so forwards message.
68
Ring Algorithm
[Ring diagram. In flight: Elect 6.]
Node 6 receives election message with its own ID as the best candidate seen. Since only the candidate itself will start an election message with its own ID, it knows there is no better candidate in the ring and it has won!
69
Ring Algorithm
[Ring diagram. In flight: "6 won!" circulating to every node.]
The winner knows it won when it sees its ID on a message, and then circulates an election results message around the ring with its ID. The results message causes each node to exit the election and return to normalcy.
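The walkthrough above can be replayed with a short simulation of the best-candidate variant. The function name, the FIFO message list, and the rule that a higher ID is a better candidate are assumptions of this sketch; the results message that follows the win is not modelled.

```python
def ring_election(ring, alive, initiators):
    """Return the winner of a ring election that circulates only the best ID.

    ring: node IDs in ring order; alive: set of live IDs;
    initiators: live nodes that detected the leader's failure.
    """
    def successor(node):
        i = ring.index(node)
        for step in range(1, len(ring) + 1):
            nxt = ring[(i + step) % len(ring)]
            if nxt in alive:              # skip over dead nodes
                return nxt

    best_seen = {}                        # node -> best candidate it has seen
    in_flight = []                        # (destination, candidate) messages
    for node in initiators:
        best_seen[node] = node
        in_flight.append((successor(node), node))

    while in_flight:
        dest, candidate = in_flight.pop(0)
        if candidate == dest:
            return dest                   # own ID came back around: winner
        if dest not in best_seen:
            # Entering the election: propose the better of self and candidate.
            best_seen[dest] = max(dest, candidate)
            in_flight.append((successor(dest), best_seen[dest]))
        elif candidate > best_seen[dest]:
            best_seen[dest] = candidate   # forward a better candidate
            in_flight.append((successor(dest), candidate))
        # Otherwise the message is suppressed.
    return None


# Nodes 0..7 in a ring, node 7 is dead, nodes 6 and 2 start elections.
print(ring_election(list(range(8)), set(range(7)), [6, 2]))
```

With these inputs the messages follow the same sequence as the slides: Elect 3, Elect 4, and Elect 5 are each overtaken or suppressed, and Elect 6 returns to node 6.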
70
Improving Elections
Elections are generally held when a node fails.
Use heartbeats to detect failure, and make “better” candidates more sensitive (i.e., time out sooner) so that only one node starts an election.
The node starting an election can just contact the best candidate it believes to be alive, plus the better ones it thinks have failed (to be sure).
To avoid lengthy delays, the initiator can contact the K best candidates, or, if the best “live” candidate(s) don't respond, it can escalate to contact more, lower-tier candidates.
71
Elections vs. DME
While both types of protocol end up selecting a single winner, there are some important differences:
A DME participant keeps trying until it wins; this is not needed in leader election (any winner will do).
There is no issue of fairness in leader election.
All nodes need to know the winner when leader election is done; this is not needed in DME.
Hence DME can be more message-efficient (i.e., sublinear) compared to leader election.
72
Elections in Wireless Environments (1)
Figure. Election algorithm in a wireless network, with node a as the source. (a) Initial network. (b)–(e) The build-tree phase.
73
Elections in Wireless Environments (2)
Figure. Election algorithm in a wireless network, with node a as the source. (a) Initial network. (b)–(e) The build-tree phase.
74
Elections in Wireless Environments (3)
Figure. (e) The build-tree phase. (f) Reporting of best node to source.
75
Elections in Large-Scale Systems (1)
Requirements for superpeer selection:
1. Normal nodes should have low-latency access to superpeers.
2. Superpeers should be evenly distributed across the overlay network.
3. There should be a predefined portion of superpeers relative to the total number of nodes in the overlay network.
4. Each superpeer should not need to serve more than a fixed number of normal nodes.
76
Elections in Large-Scale Systems (2)
Figure. Moving tokens in a two-dimensional space using repulsion forces.