Project 2 Review (Part 2)
Ananth Rao

Overview
Stabilize and Notify
Join (slides stolen from lecture)
Coding Trivia
Bootstrapping and debugging

Identifier to Node Mapping Example
Node 8 maps [5, 8]
Node 15 maps [9, 15]
Node 20 maps [16, 20]
…
Node 4 maps [59, 4]
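
The mapping rule in this example (each node owns the identifiers from just after its predecessor up to and including its own ID, wrapping around the ring) can be sketched as below. This is a minimal Python sketch, not the project's code: the 64-identifier space, the node set (only node IDs named in these slides) and the function name `owner` are assumptions made for illustration.

```python
from bisect import bisect_left

ID_SPACE = 64  # assumed identifier space size; the slides do not state it


def owner(nodes, ident):
    """Return the node responsible for ident: the first node ID >= ident,
    wrapping around to the smallest node ID past the top of the ring."""
    ring = sorted(nodes)
    i = bisect_left(ring, ident % ID_SPACE)
    return ring[i % len(ring)]


# Assumed ring built only from node IDs mentioned in the slides.
NODES = [4, 8, 15, 20, 44, 58]
print(owner(NODES, 5), owner(NODES, 16), owner(NODES, 59))
```

With this assumed ring, the three lookups resolve to nodes 8, 20 and 4, matching the intervals listed on the slide.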

Routing
Each node maintains its successor
Route packet (ID, data) to the node responsible for ID using successor pointers
Example: send(34, data)
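
Successor-only routing can be sketched as a walk around the ring that stops once the ID falls between the current node and its successor. This is a minimal sketch under assumptions: the 64-identifier space, the `SUCC` table (built from node IDs named in these slides) and the helper names are hypothetical, and a real node would forward a packet rather than loop locally.

```python
ID_SPACE = 64  # assumed identifier space size


def in_half_open(x, a, b):
    """True if x lies in the ring interval (a, b], with wrap-around."""
    x, a, b = x % ID_SPACE, a % ID_SPACE, b % ID_SPACE
    if a < b:
        return a < x <= b
    return x > a or x <= b  # interval wraps past zero


def route(successor, start, ident):
    """Follow successor pointers from start.
    Returns (responsible node, list of nodes visited)."""
    node, path = start, [start]
    while not in_half_open(ident, node, successor[node]):
        node = successor[node]
        path.append(node)
    path.append(successor[node])
    return successor[node], path


# Assumed successor table for a ring of node IDs mentioned in the slides.
SUCC = {4: 8, 8: 15, 15: 20, 20: 44, 44: 58, 58: 4}
print(route(SUCC, 4, 34))
```

On this assumed ring, send(34, data) issued at node 4 visits 4, 8, 15, 20 and is delivered at node 44. The hop count grows linearly with the ring size, which is the cost of using successor pointers alone.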

Stabilize
Sent to the current successorNode periodically
“Request” for a notify packet from the successor

Notify
Sent in reply to the stabilize packet
Helps build a list of k successors at the predecessor

Stabilize-Notify
Direct communication only with immediate successor and predecessor
You receive only “n-th hand” info about the n-th successor
It takes n*STABILIZE_PERIOD for a change in the n-th successor to get propagated
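
The hop-by-hop propagation can be illustrated with a small round-based simulation. This is a sketch under assumptions: all nodes stabilize in lock-step, the notify reply carries the successor's own list (the slides only say notify helps build the list), and `K`, `stabilize_round` and the ring contents are hypothetical.

```python
K = 3  # assumed length of the successor list


def stabilize_round(succ_lists):
    """One synchronous STABILIZE_PERIOD: every node asks its first successor
    for a notify and rebuilds its list from the (old) list in the reply."""
    new_lists = {}
    for node, lst in succ_lists.items():
        first = lst[0]
        reply = succ_lists[first]  # assumed notify payload
        new_lists[node] = ([first] + reply)[:K]
    return new_lists


ring = [4, 8, 15, 20, 44, 58]
# Bootstrapped state: every node knows only its immediate successor.
lists = {n: [ring[(i + 1) % len(ring)]] for i, n in enumerate(ring)}
for round_no in range(1, K):
    lists = stabilize_round(lists)
    print(round_no, lists[4])
```

Each round extends every list by one entry, so information about a successor n hops away needs on the order of n periods to arrive, which is why failures far down the list are noticed late.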

Dealing with failures
What happens when successorNode fails?
–Timeout while waiting to receive a notify
–Shift successorNode list by one
What happens when predecessorNode fails?
–Timeout on receiving a stabilize from the predecessor
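
The "shift by one" step might look like the sketch below. The function name and the plain-list representation are assumptions; the slides do not say how the list is stored.

```python
def on_notify_timeout(successors):
    """Called when no notify arrives in time: drop the dead first successor
    and promote the next one. Returns the new successorNode, or None if the
    list is exhausted."""
    if successors:
        successors.pop(0)
    return successors[0] if successors else None


succs = [8, 15, 20]
print(on_notify_timeout(succs), succs)
```

After the shift the list is one entry short; later stabilize-notify rounds are what refill it.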

Dealing with failures (cont.)
We use fine-grained timers for detecting successor failures
We use a coarse-grained timer for detecting a predecessor failure
–Predecessor is not useful for forwarding anyway
–A fine-grained timer is not useful unless we maintain a list of predecessors

Joining Operation
Node 50 asks node 15 to forward join message
When join(50) reaches the destination (i.e., node 58), node 58 returns a notify message to node 50
Node 50 updates its successor to 58
[Figure labels: join(50), notify(58), succ=58]

Joining Operation (cont’d)
Node 50 sends a stabilize to node 58; the predecessor gets updated at node 58
Node 44 sends a stabilize message to its successor, node 58
Node 58 replies with a notify message
Node 44 updates its successor to 50
[Figure labels: succ=58, stabilize(), notify(predecessor=50), succ=50, pred=50]

Joining Operation (cont’d)
Node 44 sends a stabilize message to its new successor, node 50
Node 50 sets its predecessor to node 44
[Figure labels: succ=58, succ=50, stabilize(), pred=44, pred=50]

Joining Operation (cont’d)
This completes the joining operation!
[Figure labels: succ=58, succ=50, pred=44, pred=50]
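
The join walk-through for nodes 44, 50 and 58 can be replayed with a few lines of Python. This is a sketch, not the project's implementation: the `Node` class, the `between` helper and the exact condition under which a node adopts a new predecessor (the usual Chord rule) are assumptions, and routing of the join message is reduced to a direct assignment.

```python
class Node:
    def __init__(self, ident, succ=None, pred=None):
        self.id, self.succ, self.pred = ident, succ, pred


def between(x, a, b):
    """True if x lies strictly inside the ring interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b


def stabilize(sender, nodes):
    """sender sends stabilize to its successor; the successor may adopt the
    sender as predecessor and replies with notify(predecessor=...)."""
    target = nodes[sender.succ]
    if target.pred is None or between(sender.id, target.pred, target.id):
        target.pred = sender.id
    reported = target.pred  # carried in the notify reply
    if reported != sender.id and between(reported, sender.id, sender.succ):
        sender.succ = reported  # a closer successor exists


# Only the three nodes from the slides; the rest of the ring is omitted.
nodes = {44: Node(44, succ=58), 58: Node(58, pred=44), 50: Node(50)}
nodes[50].succ = 58            # join(50) reached node 58, which sent notify(58)
stabilize(nodes[50], nodes)    # node 58 sets pred=50
stabilize(nodes[44], nodes)    # notify(predecessor=50): node 44 sets succ=50
stabilize(nodes[44], nodes)    # node 50 sets pred=44
print(nodes[44].succ, nodes[50].succ, nodes[50].pred, nodes[58].pred)
```

The final state is succ=50 at node 44, succ=58 and pred=44 at node 50, and pred=50 at node 58, the same state shown on the last join slide.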

Stabilize-Notify-Join
Very simple
Easy to code
Can handle concurrent joins and failures
–Try a few examples
It may take a few more STABILIZE_PERIODs to converge, but it will eventually converge

Stabilize-Notify-Join (cont.)
Not easy to understand
–When you get it, you get it
Very hard to debug
Hard to bootstrap
–Lots of corner cases when there are fewer than k nodes in the ring

Coding Advice
Checkpoint submissions better than expected :-)
No major flaws
Be careful with timers
–“select” returns “no sooner than the requested timeout period”
–Each function call takes time!!
–Be careful when dealing with a negative struct timeval
More feedback coming soon
–Watch the newsgroup over the weekend :-(

Problems with timers
After handling the event at the head of the queue:
–Get current time again
–Check the “due time” of the next event in the queue
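
One way to follow this advice is to re-read the clock on every pass through the event loop, as in the sketch below. The queue layout `(due_time, sequence_number, handler)` and the function name are assumptions; the clock is passed in as a parameter so the loop can be tested with a fake one.

```python
import heapq
import time


def run_due_events(queue, now=time.monotonic):
    """Run every event whose due time has passed. The clock is re-read after
    each handler, because handlers take time and more events may have come
    due. Returns seconds until the next event, or None if the queue is empty."""
    while queue:
        due, _, handler = queue[0]
        remaining = due - now()  # fresh timestamp on every iteration
        if remaining > 0:
            return remaining
        heapq.heappop(queue)
        handler()
    return None
```

The returned value is always positive, so it can be used directly as the next timeout. Reusing a timestamp taken before the handlers ran would overstate how long the loop may sleep.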

Timers for stabilize
Timeout for receiving a notify
When to send the next stabilize
–Keep track of lastStabilizeSentTime
–Use MIN(lastStabilizeSentTime + STABILIZE_PERIOD - currTime, nextEventDueTime) as the timeout for select
–Be careful when the successorNode changes
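
The timeout computation can be sketched as below. Two assumptions are made here: `next_event_due` is treated as an absolute timestamp and converted to a remaining time before taking the minimum (drop that subtraction if your queue already stores a relative delay), and the result is clamped at zero so an overdue timer never produces a negative timeout. The constant value and the function name are hypothetical.

```python
STABILIZE_PERIOD = 1.0  # seconds; assumed value


def select_timeout(last_stabilize_sent, next_event_due, now):
    """Timeout to hand to select(): time until the next stabilize is due or
    the next queued event is due, whichever comes first, clamped at zero."""
    until_stabilize = last_stabilize_sent + STABILIZE_PERIOD - now
    until_event = next_event_due - now
    return max(0.0, min(until_stabilize, until_event))


print(select_timeout(10.0, 10.5, now=10.25))  # next event is due first
print(select_timeout(10.0, 12.0, now=11.5))   # stabilize is overdue
```

In C the same clamp would be applied before filling in the struct timeval, which is the negative-value case the coding advice warns about.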

Debugging Tips
Most problems occur when bootstrapping the ring
Prefer cerr/fprintf debugging to using gdb
–If you set a breakpoint in gdb, every other program on the ring is going to time out for one reason or another
In the beginning, you may want to increase timers to large values

Testing with lost packets
With large timeouts
–Use keyboard input to determine whether or not to send a packet
–Make sure STABILIZE_PERIOD > (MAX_STABILIZE_RETRIES+1) * STABILIZE_TIMEOUT
Use randomized drops with a small drop percentage
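
Both checks can be sketched in a few lines. The constant values, `timers_consistent` and `make_dropper` are assumptions for illustration; the seed parameter is there so that a failing run can be reproduced.

```python
import random

STABILIZE_PERIOD = 5.0       # assumed values, in seconds
STABILIZE_TIMEOUT = 1.0
MAX_STABILIZE_RETRIES = 3


def timers_consistent():
    """All retries for one stabilize must finish before the next one is due."""
    return STABILIZE_PERIOD > (MAX_STABILIZE_RETRIES + 1) * STABILIZE_TIMEOUT


def make_dropper(drop_percent, seed=None):
    """Return should_drop(): True for roughly drop_percent % of calls."""
    rng = random.Random(seed)
    return lambda: rng.random() * 100.0 < drop_percent


should_drop = make_dropper(5, seed=1)
dropped = sum(should_drop() for _ in range(1000))
print(timers_consistent(), dropped)
```

Calling `should_drop()` just before each send and skipping the send when it returns True gives the randomized loss; running `timers_consistent()` at startup catches a timer configuration in which retries would overlap the next stabilize.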

Go step-by-step
Before implementing join, try to implement stabilize and notify
–Start with a predetermined ring
–Start with only one successor on the command line, but the list should soon grow (because of stabilize-notify)
–Detect failures only (no new nodes)
–Use a large (1s) timeout so you don’t have to start all “chatpeers” at exactly the same time
Helps get rid of bootstrapping artifacts in the first step