Project 2 Review (Part 2) Ananth Rao. Overview Stabilize and Notify Join (slides stolen from lecture) Coding Trivia Bootstrapping and debugging.


1 Project 2 Review (Part 2) Ananth Rao

2 Overview Stabilize and Notify Join (slides stolen from lecture) Coding Trivia Bootstrapping and debugging

3 Identifier to Node Mapping Example Node 8 maps [5,8]; Node 15 maps [9,15]; Node 20 maps [16,20]; …; Node 4 maps [59,4]. [Figure: ring with nodes 4, 8, 15, 20, 32, 35, 44, 58]
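
The mapping on this slide can be sketched as a lookup: an ID belongs to the first node at or after it on the ring, wrapping around to the smallest node. This is only an illustration; `responsibleNode` is a hypothetical name, and the sketch assumes a global view of the ring that a real node would not have.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: returns the node responsible for `id`, i.e. the
// first node clockwise from `id` (inclusive). If no node id is >= `id`,
// the lookup wraps around to the smallest node on the ring.
int responsibleNode(std::vector<int> nodes, int id) {
    assert(!nodes.empty());
    std::sort(nodes.begin(), nodes.end());
    auto it = std::lower_bound(nodes.begin(), nodes.end(), id);
    return it == nodes.end() ? nodes.front() : *it;
}
```

With the ring from the slide, IDs 5 through 8 land on node 8, and IDs 59 and above wrap around to node 4.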

4 Routing Each node maintains its successor. Route packet (ID, data) to the node responsible for ID using successor pointers. [Figure: ring with nodes 4, 8, 15, 20, 32, 35, 44, 58; send(34, data)]
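
A minimal sketch of successor-only routing, assuming each node knows just its own ID and its successor. The names `inInterval` and `route` are hypothetical, and the ring is simulated in one process with a map rather than with real packets.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// True if x lies in the ring interval (a, b], allowing wrap-around.
// When a == b the interval covers the whole ring (single-node case).
bool inInterval(int x, int a, int b) {
    if (a < b) return a < x && x <= b;
    return x > a || x <= b;
}

// Follows successor pointers from `start` until reaching the node
// responsible for `id`. Returns every node visited, last one responsible.
std::vector<int> route(const std::map<int, int>& succ, int start, int id) {
    assert(succ.count(start) == 1);
    std::vector<int> path{start};
    int cur = start;
    // At most one full trip around the ring is ever needed.
    for (std::size_t hops = 0; hops < succ.size(); ++hops) {
        if (cur == id) break;            // the packet is already at its owner
        int next = succ.at(cur);
        path.push_back(next);
        if (inInterval(id, cur, next)) break;  // `next` owns the ID
        cur = next;
    }
    return path;
}
```

For send(34, data) starting at node 4, the packet visits 4, 8, 15, 20, 32 and is delivered at 35. The linear hop count is the price of keeping only successor pointers.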

5 Stabilize Sent periodically to the current successorNode. Acts as a “request” for a notify packet from the successor.

6 Notify Sent in reply to the stabilize packet. Helps build a list of k-successors at the predecessor.

7 Stabilize-Notify Direct communication only with the immediate successor and predecessor. You receive only “nth-hand” info about the nth successor. It takes n*STABILIZE_PERIOD for a change in the nth successor to propagate.
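
One way the k-successor list could be rebuilt on each notify is sketched below. It assumes the notify packet carries the successor's own successor list, which the slides do not spell out; `mergeOnNotify` and its parameters are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Builds this node's successor list from a notify: the immediate successor
// first, then the successor's own list (second-hand information), cut to k.
// Stops if the list wraps around to this node, so a ring with fewer than
// k other nodes yields a shorter list instead of duplicates.
std::vector<int> mergeOnNotify(int self, int successor,
                               const std::vector<int>& successorsOfSuccessor,
                               std::size_t k) {
    assert(k >= 1);
    std::vector<int> merged{successor};
    for (int id : successorsOfSuccessor) {
        if (merged.size() == k) break;
        if (id == self) break;  // wrapped around the whole ring
        if (std::find(merged.begin(), merged.end(), id) == merged.end()) {
            merged.push_back(id);
        }
    }
    return merged;
}
```

Because each entry beyond the first is copied from the successor, information about the nth successor has passed through n nodes, which is why a change takes about n stabilize periods to arrive.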

8 Dealing with failures What happens when successorNode fails? –Timeout while waiting to receive a notify –Shift successorNode list by one. What happens when predecessorNode fails? –Timeout on receiving a stabilize from the predecessor

9 Dealing with failures (cont.) We use fine-grained timers for detecting successor failures. We use a coarse-grained timer for detecting a predecessor failure –Predecessor is not useful for forwarding anyway –A fine-grained timer is not useful unless we maintain a list of predecessors
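
The successor-failure handling (time out on the notify, then shift the successor list by one) might look like the sketch below. `SuccessorState` and `onNotifyTimeout` are hypothetical names, and counting retries before giving up on a successor is an assumption based on the MAX_STABILIZE_RETRIES constant mentioned later in the deck.

```cpp
#include <deque>

// Hypothetical per-node state for tracking the successor list.
struct SuccessorState {
    std::deque<int> successors;  // front() is the current successorNode
    int missedNotifies = 0;      // consecutive notify timeouts so far
};

// Called when a notify did not arrive in time. After more than
// `maxRetries` consecutive misses, the current successor is declared
// failed and the list is shifted by one. Returns false when no
// successor is left to fall back on.
bool onNotifyTimeout(SuccessorState& s, int maxRetries) {
    if (s.successors.empty()) return false;
    if (++s.missedNotifies > maxRetries) {
        s.successors.pop_front();
        s.missedNotifies = 0;
    }
    return !s.successors.empty();
}
```

A received notify would reset `missedNotifies` to zero; that path is omitted here.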

10 Joining Operation Node 50 asks node 15 to forward a join message. When join(50) reaches the destination (i.e., node 58), node 58 returns a notify message to node 50. Node 50 updates its successor to 58. [Figure: ring with nodes 4, 8, 15, 20, 32, 35, 44, 58 and joining node 50; labels join(50), notify(58), succ=58]

11 Joining Operation (cont’d) Node 50 sends a stabilize to node 58. The predecessor gets updated at node 58. Node 44 sends a stabilize message to its successor, node 58. Node 58 replies with a notify message. Node 44 updates its successor to 50. [Figure: ring with nodes 4, 8, 15, 20, 32, 35, 44, 58 and joining node 50; labels succ=58, stabilize(), notify(predecessor=50), succ=50, pred=50]

12 Joining Operation (cont’d) Node 44 sends a stabilize message to its new successor, node 50. Node 50 sets its predecessor to node 44. [Figure: ring with nodes 4, 8, 15, 20, 32, 35, 44, 58 and joining node 50; labels succ=58, succ=50, stabilize(), pred=44, pred=50]

13 Joining Operation (cont’d) This completes the joining operation! [Figure: ring with nodes 4, 8, 15, 20, 32, 35, 44, 58 and joining node 50; labels succ=58, succ=50, pred=44, pred=50]
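
The join walk-through can be replayed with a small in-process simulation. This is a sketch under stated assumptions: the stabilize/notify exchange is collapsed into one function call, `pred == -1` stands for "predecessor unknown", and `Node`, `inOpen`, `makeExampleRing`, and `stabilize` are hypothetical names rather than the project's API.

```cpp
#include <map>

struct Node {
    int succ;
    int pred;  // -1 means the predecessor is not known yet
};

// True if x lies strictly between a and b on the ring, with wrap-around.
bool inOpen(int x, int a, int b) {
    if (a < b) return a < x && x < b;
    return x > a || x < b;
}

// Builds the eight-node ring from the slides with consistent pointers.
std::map<int, Node> makeExampleRing() {
    const int ids[] = {4, 8, 15, 20, 32, 35, 44, 58};
    const int n = sizeof(ids) / sizeof(ids[0]);
    std::map<int, Node> ring;
    for (int i = 0; i < n; ++i) {
        ring[ids[i]] = Node{ids[(i + 1) % n], ids[(i + n - 1) % n]};
    }
    return ring;
}

// One stabilize/notify round trip started by node n.
void stabilize(std::map<int, Node>& ring, int n) {
    int s = ring.at(n).succ;
    Node& successor = ring.at(s);
    // Successor side: adopt n as predecessor if it is a closer one.
    if (successor.pred == -1 || inOpen(n, successor.pred, s)) {
        successor.pred = n;
    }
    // Notify carries the successor's predecessor back to n.
    int reported = successor.pred;
    // Sender side: a node sitting between n and s becomes the new successor.
    if (inOpen(reported, n, s)) {
        ring.at(n).succ = reported;
    }
}
```

After node 50 is added with succ=58, calling `stabilize` for node 50 and then twice for node 44 reproduces the sequence on slides 11 to 13: pred=50 at node 58, succ=50 at node 44, and pred=44 at node 50.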

14 Stabilize-Notify-Join Very simple. Easy to code. Can handle concurrent joins and failures –Try a few examples. It may take a few more STABILIZE_PERIODs to converge, but will eventually converge

15 Stabilize-Notify-Join (cont.) Not easy to understand –When you get it, you get it. Very hard to debug. Hard to bootstrap –Lots of corner cases when there are fewer than k nodes in the ring

16 Coding Advice Checkpoint submissions better than expected :-) No major flaws. Be careful with timers –“select” returns “no sooner than the requested timeout period” –Each function call takes time! –Be careful when dealing with a negative struct timeval. More feedback coming soon –Watch the newsgroup over the weekend :-(
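
A negative struct timeval typically shows up when a deadline has already passed by the time the remaining wait is computed, and a negative value is not a valid timeout for select. A minimal sketch of a clamped subtraction follows; `makeTimeval` and `timeUntil` are hypothetical helper names, and a POSIX `<sys/time.h>` is assumed.

```cpp
#include <sys/time.h>

// Small helper so the field order of struct timeval never matters.
timeval makeTimeval(long sec, long usec) {
    timeval tv;
    tv.tv_sec = sec;
    tv.tv_usec = usec;
    return tv;
}

// Returns (due - now), clamped to zero when the deadline has passed,
// so the result is always safe to hand to select as a timeout.
timeval timeUntil(const timeval& due, const timeval& now) {
    long long us = (due.tv_sec - now.tv_sec) * 1000000LL +
                   (due.tv_usec - now.tv_usec);
    if (us < 0) us = 0;  // overdue: poll immediately instead of waiting
    return makeTimeval(static_cast<long>(us / 1000000),
                       static_cast<long>(us % 1000000));
}
```

Doing the arithmetic in whole microseconds avoids the borrow handling needed when the two fields are subtracted separately.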

17 Problems with timers After handling the event at the head of the queue: –Get the current time again –Check the “due time” of the next event in the queue
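
The advice above can be sketched as an event loop that re-reads the clock after every handler, because the handler itself takes time. `Event`, `EventQueue`, and `runDueEvents` are hypothetical names, and the clock is passed in as a function so the loop can be exercised with a fake clock.

```cpp
#include <functional>
#include <queue>
#include <vector>

struct Event {
    long long dueUs;  // absolute due time in microseconds
    int id;
};

// Orders the priority queue so the earliest due time is on top.
struct Later {
    bool operator()(const Event& a, const Event& b) const {
        return a.dueUs > b.dueUs;
    }
};

using EventQueue = std::priority_queue<Event, std::vector<Event>, Later>;

// Handles every event that is due. The clock is read again on each pass,
// so events that became due while a handler ran are not left waiting.
// Returns the microseconds until the next event, or -1 if none remain.
long long runDueEvents(EventQueue& q,
                       const std::function<long long()>& nowUs,
                       const std::function<void(const Event&)>& handle) {
    while (!q.empty()) {
        long long now = nowUs();  // fresh reading, not a cached one
        if (q.top().dueUs > now) return q.top().dueUs - now;
        Event e = q.top();
        q.pop();
        handle(e);
    }
    return -1;
}
```

The return value is what would be converted into the timeout for the next select call.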

18 Timers for stabilize Timeout for receiving a notify. When to send the next stabilize –Keep track of lastStabilizeSentTime –Use MIN(lastStabilizeSentTime + STABILIZE_PERIOD - currTime, nextEventDueTime) as the timeout for select –Be careful when the successorNode changes
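
The MIN expression can be sketched as below. The sketch assumes all times are absolute microsecond counts and converts nextEventDueTime to a remaining duration before comparing, which is one reading of the slide; `selectTimeoutUs` is a hypothetical name, and clamping at zero is an added safeguard.

```cpp
#include <algorithm>

// Computes how long select may block: until the next stabilize is due or
// until the next queued event is due, whichever comes first. All arguments
// are absolute times in microseconds except the period, a duration.
long long selectTimeoutUs(long long lastStabilizeSentUs,
                          long long stabilizePeriodUs,
                          long long nowUs,
                          long long nextEventDueUs) {
    long long untilStabilize = lastStabilizeSentUs + stabilizePeriodUs - nowUs;
    long long untilEvent = nextEventDueUs - nowUs;
    // Never return a negative timeout, even if something is overdue.
    return std::max(0LL, std::min(untilStabilize, untilEvent));
}
```

For example, with the last stabilize sent at 1000, a period of 500, the current time at 1200, and the next event due at 1400, the next stabilize is 300 away and the event 200 away, so the timeout is 200.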

19 Debugging Tips Most problems occur when bootstrapping the ring. Prefer cerr/fprintf debugging to using gdb –If you set a breakpoint in gdb, every other program on the ring is going to time out for some reason or other. In the beginning, you may want to increase timers to large values

20 Testing with lost packets With large timeouts –Use keyboard input to determine whether or not to send a packet –Make sure STABILIZE_PERIOD > (MAX_STABILIZE_RETRIES+1) * STABILIZE_TIMEOUT. Use randomized drops with a small drop percentage
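
Both points on this slide can be sketched in a few lines: the timing constraint as a compile-time check, and randomized drops as a seeded coin flip consulted before each send. The constant values are made up for illustration, and `PacketDropper` is a hypothetical class, not part of the project code.

```cpp
#include <random>

// Illustrative values only; the real project constants may differ.
constexpr int STABILIZE_PERIOD_MS = 1000;
constexpr int STABILIZE_TIMEOUT_MS = 200;
constexpr int MAX_STABILIZE_RETRIES = 3;

// All retries for one stabilize must finish before the next one is sent.
static_assert(STABILIZE_PERIOD_MS >
                  (MAX_STABILIZE_RETRIES + 1) * STABILIZE_TIMEOUT_MS,
              "stabilize period too short for the retry budget");

// Decides at random whether an outgoing packet should be dropped.
// A fixed seed keeps a failing test run reproducible.
class PacketDropper {
public:
    PacketDropper(double dropProbability, unsigned seed)
        : rng_(seed), drop_(dropProbability) {}

    // True if the caller should skip sending this packet.
    bool shouldDrop() { return drop_(rng_); }

private:
    std::mt19937 rng_;
    std::bernoulli_distribution drop_;
};
```

A sender would call `shouldDrop()` just before its send call and return early when it yields true; a probability such as 0.05 matches the "small drop percentage" suggestion.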

21 Go step-by-step Before implementing join, try to implement stabilize and notify –Start with a predetermined ring –Start with only one successor on the command line, but the list should soon grow (because of stabilize-notify) –Detect failures only (no new nodes) –Use a large (1s) timeout so you don’t have to start all “chatpeers” at exactly the same time. Helps get rid of bootstrapping artifacts in the first step

