Structured P2P Networks Guo Shuqiao Yao Zhen Rakesh Kumar Gupta CS6203 Advanced Topics in Database Systems.



Introduction: P2P Network
A peer-to-peer (P2P) network is a distributed system in which peers employ distributed resources to perform a critical function in a decentralized fashion [LW2004].
Classification of P2P networks:
- Unstructured and structured
- Centralized and decentralized
- Hierarchical and non-hierarchical

Structured P2P Network
Distributed hash table (DHT)
- A DHT is a structured overlay that offers extreme scalability and a hash-table-like lookup interface
- Examples: CAN, Chord, Pastry
Other techniques
- Skip list based: Skip Graph, SkipNet

Outline
Hash-based techniques in P2P
- Hash-based structured P2P systems: Pastry, P-Grid
- Two important issues: load balancing, neighbor table consistency preserving
- Comparison of DHT techniques
Skip-list based system
- SkipNet
Conclusion


Pastry [RD2001]
Pastry is a hash-based P2P object location and routing scheme.
Properties:
- Completely decentralized
- Scalable
- Self-organizing
- Fault-resilient
- Efficient search

Design of Pastry
nodeID: each node has a unique 128-bit numeric identifier.
- Assigned randomly, so nodes with adjacent nodeIDs are diverse in geography, ownership, etc.
- Assumption: nodeIDs are uniformly distributed in the ID space
- Represented as a sequence of digits with base $2^b$, where b is a configuration parameter (typically 4)

Design of Pastry (cont.)
A message or query has a numeric key of the same length as a nodeID.
- The key is represented as a sequence of digits with base $2^b$
Routing: a message is routed to the node whose nodeID is numerically closest to the key.

Destination of Routing
[Figure: a message with key = 10 and its destination node.]

Pastry Schema
Given a message with key k, a node A forwards the message to the node whose ID is numerically closest to k among all nodes known to A.
Each node maintains some routing state.

Pastry Node State
- A leaf set L
- A routing table
- A neighborhood set M
[Figure: state of a Pastry node, showing its nodeID, routing table, leaf set (smaller and larger halves), and neighborhood set.]

Meanings of 'Close'
- Closest according to the proximity metric (real network distance): the nearest neighbor
- Closest according to numerical value: the node with the closest nodeID

Pastry Node State
A leaf set
- The |L| nodes with the closest nodeIDs: |L|/2 larger ones and |L|/2 smaller ones
- Useful in message routing
A neighborhood set
- The |M| nearest neighbors
- Useful in maintaining locality properties

Leaf Set and Neighborhood Set
In this example b = 2 and l = 8, so |L| = 2 × $2^b$ = 8 and |M| = 2 × $2^b$ = 8.
[Figure: node A's state table, with the smaller and larger halves of the leaf set and the neighborhood set marked.]

Routing Table
l rows and $2^b$ columns
- Row i: nodes whose nodeIDs share a prefix of i digits with this node's nodeID
- Column j: the next digit after the shared prefix is j
With b = 2 and l = 8, the table has 8 rows and 4 columns.
[Figure: node A's routing table, with columns j = 0, j = 1, and j = 3 marked.]

Routing, Step 1
If k falls within the range of nodeIDs covered by A's leaf set, forward the message to the node in the leaf set whose nodeID is closest to k.
If k is not covered by the leaf set, go to step 2.
[Figure: node A's state table with an example key k that falls inside the leaf-set range and is forwarded to the closest leaf node.]

Routing, Step 2
Use the routing table: forward the message to a node whose ID shares a longer prefix with k than A's nodeID does.
If the appropriate entry in the routing table is empty, go to step 3.
[Figure: node A's state table with an example key k that is forwarded through a routing-table entry.]

Routing, Step 3
Forward the message to a node in the leaf set whose ID shares a prefix with k at least as long as A's does, but is numerically closer to k than A.
If no such node exists, A is the destination node.
[Figure: node A's state table with an example key k that is forwarded to a numerically closer leaf node.]
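The three routing steps can be sketched in code. This is a minimal illustration rather than the Pastry implementation: it follows the slides' description, treats nodeIDs as base-4 digit strings (b = 2), ignores wrap-around of the ID space, and uses a hypothetical node state (`NODE_ID`, `LEAF_SET`, `ROUTING_TABLE`) invented for the example.

```python
from typing import List, Optional

BASE = 4  # digits are base 2^b with b = 2 (assumption for this example)


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def next_hop(node_id: str, key: str, leaf_set: List[str],
             routing_table: List[List[Optional[str]]]) -> Optional[str]:
    """Return the next hop for key, or None if this node is the destination."""
    def value(x: str) -> int:
        return int(x, BASE)

    def dist(x: str) -> int:
        return abs(value(x) - value(key))

    if key == node_id:
        return None
    # Step 1: key lies within the range covered by the leaf set.
    members = leaf_set + [node_id]
    if min(map(value, members)) <= value(key) <= max(map(value, members)):
        closest = min(members, key=dist)
        return None if closest == node_id else closest
    # Step 2: routing-table entry that shares a longer prefix with the key.
    p = shared_prefix_len(node_id, key)
    entry = routing_table[p][int(key[p], BASE)]
    if entry is not None:
        return entry
    # Step 3: leaf with an equally long prefix that is numerically closer.
    closer = [x for x in leaf_set
              if shared_prefix_len(x, key) >= p and dist(x) < dist(node_id)]
    return min(closer, key=dist) if closer else None


# Hypothetical state of node 1023 (row i holds nodes sharing i leading digits).
NODE_ID = "1023"
LEAF_SET = ["1021", "1022", "1030", "1031"]
ROUTING_TABLE = [
    ["0212", None, "2301", "3120"],
    [None, "1132", "1203", None],
    ["1002", "1013", None, "1031"],
    ["1020", "1021", "1022", None],
]

print(next_hop(NODE_ID, "1030", LEAF_SET, ROUTING_TABLE))  # step 1 applies
print(next_hop(NODE_ID, "2311", LEAF_SET, ROUTING_TABLE))  # step 2 applies
print(next_hop(NODE_ID, "1310", LEAF_SET, ROUTING_TABLE))  # step 3 applies
```

In this sketch a return value of `None` means the local node delivers the message itself.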

Routing
The routing procedure always converges, since each step chooses a node that either
- shares a longer prefix with the key, or
- shares a prefix of the same length but is numerically closer to the key
Routing performance
- The expected number of routing steps is $\log_{2^b} N$
- Assumption: accurate routing tables and no recent node failures

Performance
[Figure: average number of routing hops versus number of Pastry nodes, with b = 4, |L| = 16, |M| = 32, and 200,000 lookups.]

Discussion of Pastry
The parameters make Pastry flexible.
- b is the most important parameter and determines the power of the system: it sets the trade-off between routing efficiency ($\log_{2^b} N$ hops) and routing table size ($\log_{2^b} N \times 2^b$ entries)
- Each node can choose its own |L| and |M| based on its own situation
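To make the trade-off concrete, the two formulas can be evaluated for a few values of b. This is a small sketch that uses only the expressions on the slide; the function name `pastry_costs` and the network size of one million nodes are assumptions chosen for illustration.

```python
import math
from typing import Tuple


def pastry_costs(n_nodes: int, b: int) -> Tuple[float, float]:
    """Expected hops log_{2^b} N and routing-table size log_{2^b} N * 2^b."""
    hops = math.log(n_nodes, 2 ** b)
    table_entries = hops * (2 ** b)
    return hops, table_entries


# Larger b means fewer hops but a larger routing table.
for b in (1, 2, 3, 4):
    hops, entries = pastry_costs(1_000_000, b)
    print(f"b={b}: about {hops:.1f} hops, about {entries:.0f} table entries")
```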

Discussion of Pastry: Routing Schema
Is the next hop locally optimal?
In the example, node Y is numerically closer to k than node X (Dis(k, Y's ID) = 1 versus Dis(k, X's ID) = 32), so Y is the locally optimal node, yet Pastry forwards the message to node X.
[Figure: node A's state table with the example key k and the nodes X and Y.]

P-Grid [Aberer2001]
P-Grid is a scalable access structure for P2P.
- Hash-based, organized as a virtual binary search tree
- Randomized algorithms are used to construct the access structure
[Figure: virtual binary tree with the peers' routing references and an example query k = 100.]

P-Grid (cont.)
Properties:
- Completely decentralized
- Scalable in the total number of nodes and data items
- Fault-resilient: search is robust against node failures
- Efficient search

Discussion of Pastry and P-Grid
Both systems rely on a uniformity assumption:
- Pastry: the ID space
- P-Grid: the data distribution and peer behavior
If the distribution of data, messages, or queries is skewed, Pastry and P-Grid cannot balance the load.


Load Balancing
Consider a DHT-based P2P system with N nodes.
- The imbalance factor is Θ(log N) if item IDs are uniformly distributed [SMKKB2001]
- It is even worse if applications associate semantics with item IDs, because the IDs are then no longer uniformly distributed
How to
- minimize the load imbalance?
- minimize the amount of load moved?

Load Balancing
Challenges
- Data items are continuously inserted and deleted
- Nodes join and depart continuously
- The distribution of data item IDs and item sizes can be skewed
Solution: [GLSKS2004]

Load Balancing
Virtual server
- Represents a peer in the DHT, rather than a physical node
- A physical node hosts one or more virtual servers
- Load of a node = total load of its virtual servers
[Figure: Chord example in which physical nodes host virtual servers, each with its own finger table (FT).]

Load Balancing
Basic idea: directories
- Store load information of the peer nodes
- Periodically schedule reassignments of virtual servers
The distributed load balancing problem is thereby reduced to a centralized problem at each directory.

Load Balancing
Load balancing algorithm
Node:
- Randomly chooses a directory (directory IDs are known to all nodes)
- Sends to the directory (1) the loads of all virtual servers that it is responsible for and (2) its capacity
- Does so in each new cycle, or immediately if its utilization exceeds $K_e$ (emergency load balancing)
Directory:
- Receives information from nodes
- After a delay of T, computes a schedule of virtual server transfers among the nodes contacting it, in order to reduce their maximal utilization

Load Balancing
Load balancing algorithm (cont.)
- Computing an optimal reassignment is NP-complete
- A greedy algorithm runs in O(m log m):
  - For each heavily loaded node, move its least loaded virtual server to a pool
  - For each virtual server in the pool, from heaviest to lightest, assign it to the node n that minimizes the resulting load
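The greedy step can be sketched as follows. This is an illustration under stated assumptions, not the algorithm from [GLSKS2004] verbatim: a node counts as heavily loaded while its utilization (load divided by capacity) exceeds a hypothetical `threshold`, virtual servers are shed until it no longer does, and "resulting load" is interpreted as the resulting utilization of the receiving node. The sketch also recomputes sums for clarity and does not aim for the O(m log m) bound.

```python
from typing import Dict, List


def greedy_reassign(loads: Dict[str, List[float]],
                    capacity: Dict[str, float],
                    threshold: float = 1.0) -> Dict[str, List[float]]:
    """Return a new node -> virtual-server-loads assignment."""
    result = {node: sorted(vs) for node, vs in loads.items()}
    pool: List[float] = []
    # Phase 1: overloaded nodes shed their lightest virtual servers.
    for node, vs in result.items():
        while vs and sum(vs) / capacity[node] > threshold:
            pool.append(vs.pop(0))
    # Phase 2: place pooled servers, heaviest first, where the
    # resulting utilization is smallest.
    for v in sorted(pool, reverse=True):
        target = min(result,
                     key=lambda n: (sum(result[n]) + v) / capacity[n])
        result[target].append(v)
    return result


example_loads = {"a": [5.0, 3.0, 1.0], "b": [1.0], "c": []}
example_capacity = {"a": 5.0, "b": 5.0, "c": 5.0}
balanced = greedy_reassign(example_loads, example_capacity)
print(balanced)
```

Placing the heaviest pooled servers first leaves the light ones to fill the remaining gaps, which is the usual reason for this ordering in greedy bin-packing heuristics.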

Load Balancing
Performance
- Trade-off: load movement versus load balance (measured as maximum node utilization)
- When T decreases, maximum node utilization decreases and load movement increases
- Effective for system utilization as high as 90%, while transferring only 8% of the load that arrives in the system
- Emergency load balancing is necessary

Consistency Preserving
Neighbor table
- A table of neighbor pointers
- Used for efficient routing in a P2P system
Challenge
- How to maintain consistent neighbor tables in a dynamic network where nodes may join, leave, and fail concurrently and frequently?

Consistency Preserving
Consistent network
- For every entry in the neighbor tables, if at least one qualified node exists in the network, the entry stores at least one qualified node
- Otherwise, the entry is empty
A qualified node for an entry of a node's neighbor table is a node whose ID has the suffix required by that entry.

Consistency Preserving
K-consistent network
- For every entry in the neighbor tables, if H qualified nodes exist in the network, the entry stores at least min(K, H) qualified nodes
- Otherwise, the entry is empty
For K > 0, K-consistency implies consistency; 1-consistency is the same as consistency.
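The definition can be turned into a small checker. This is only a sketch: it assumes each neighbor table is given as a mapping from an entry's required suffix to the list of node IDs stored in that entry, and that every entry (including empty ones) is listed. The representation and the name `is_k_consistent` are hypothetical and are not taken from [LL2004b].

```python
from typing import Dict, List, Set


def is_k_consistent(node_ids: Set[str],
                    tables: Dict[str, Dict[str, List[str]]],
                    k: int) -> bool:
    """Check the K-consistency condition for every entry of every table."""
    for owner, table in tables.items():
        for required_suffix, stored in table.items():
            # Qualified nodes: IDs that end with the entry's required suffix.
            qualified = {n for n in node_ids if n.endswith(required_suffix)}
            # Entries may only hold qualified nodes (else they stay empty).
            if any(s not in qualified for s in stored):
                return False
            # With H qualified nodes, at least min(K, H) must be stored.
            if len(set(stored)) < min(k, len(qualified)):
                return False
    return True


ids = {"00", "01", "10", "11"}
sparse = {"00": {"1": ["01"], "10": ["10"]}}
print(is_k_consistent(ids, sparse, 1))  # one qualified node per entry suffices
print(is_k_consistent(ids, sparse, 2))  # entry "1" has H = 2 but stores only 1
```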

Consistency Preserving
General strategy
- Identify a consistent subnet that is as large as possible
- Only replace a neighbor with a closer one if both of them belong to the subnet
- Expand the consistent subnet after new nodes join
- Maintain the consistency of the subnet when nodes fail

Consistency Preserving
Approach of [LL2004b]
- Design a join protocol such that
  - an initially K-consistent network remains K-consistent after the join processes of a set of nodes terminate
  - termination of a join implies that the joined node belongs to the consistent subnet
- Design a failure recovery protocol that
  - recovers K-consistency of the subnet by repairing the holes left by failed neighbors with qualified nodes in the subnet
  - is presented in [LL2004a], and is integrated with the join protocol in the experiments of [LL2004b]

Consistency Preserving
Join protocol
- Each node has a status: copying, waiting, notifying, cset_waiting, or in_system
  - S-node: a node in status in_system
  - T-node: a node in any other status
- All S-nodes form a consistent subnet

Consistency Preserving
Status transitions of a joining node x:
1. copying: copy neighbor information from S-nodes to fill in most entries of x's table, level by level. Move on when no qualified S-node can be found for some level i >= 1.
2. waiting: try to find an S-node that shares at least the rightmost i-1 digits with x and stores x as a neighbor. Move on when such a node, say y, is found.
3. notifying: seek and notify the nodes that share the rightmost j digits with x, where j is the lowest level at which x is stored in y's table. Move on when notification is finished.
4. cset_waiting: wait for the nodes that are joining concurrently and are likely to be in the same consistent subnet. Move on when all of them are confirmed to have exited the notifying status.
5. in_system

Consistency Preserving
Performance
- p-ratio: if the primary neighbor stored in an entry of x's table is y, and the true primary neighbor should be z, then p-ratio = (delay from x to y) / (delay from x to z)
- K-consistency is maintained in all experiments
- When K increases, the p-ratio decreases, but more neighbor information is stored and so more messages are needed
- Even with massive joins and failures, the tables remain well optimized


Comparing DHTs [DGPR2003]
Each DHT algorithm has many details, which makes direct comparison difficult. We use a component-based analysis approach:
- Break the DHT design into independent components
- Analyze the impact of each component choice separately
Two types of components:
- Routing level: neighbor and route selection
- System level: caching, replication, querying policy, latency

Metrics Used
Metrics used in the comparison:
- Flexibility: what options exist in choosing neighbors and routes?
- Resilience: can the DHT still route when nodes go down?
- Load balancing: is the content evenly distributed?
- Proximity and latency: is the content stored nearby?
Aspects of a DHT:
- Geometry: the structure that inspires a DHT design
- Distance function: the distance between two nodes
- Algorithm: the rules for selecting neighbors and routes using the distance function

Algorithm & Geometry
What are the routing algorithm and the geometry?
- Routing algorithm: the exact rules for selecting neighbors and routes (e.g. Chord, CAN, PRR, Tapestry, Pastry)
- Geometry: the algorithm's underlying structure, derived from the way in which neighbors and routes are chosen (e.g. Chord routes on a ring)
Why is geometry important? Geometry captures the flexibility in selecting neighbors and routes.
- Neighbor selection: can neighbors be chosen based on proximity? This leads to shorter paths.
- Route selection: how many options are there for the next hop? More options lead to shorter, more reliable paths.

DHT Algorithms Analysis
The table summarizes the geometries and algorithms. We examine flexibility in two respects:
- Flexibility in neighbor selection
- Flexibility in route selection

Geometry  | Algorithm
Tree      | PRR
Hypercube | CAN
Butterfly | Viceroy
Ring      | Chord
XOR       | Kademlia
Hybrid    | Pastry

Tree Geometry
PRR uses a tree geometry.
- The distance between two nodes is the height of their smallest common subtree (log N for a well-balanced tree)
- Neighbor selection flexibility: a node has $2^{i-1}$ options for choosing a neighbor at distance i
- No flexibility in route selection
[Figure: binary tree with its root, subtrees of height 1 and height 2, and the leaf set.]

Hypercube Geometry
CAN uses a d-torus, which resembles a hypercube.
- Each node has log n neighbors, each differing from it in exactly one bit, so there is no flexibility in choosing neighbors
- Routing is greedy, correcting the differing bits in any order
- For a destination at distance log n, the first hop has log n next-hop choices, the second hop has (log n) - 1 choices, and so on, giving (log n)! possible routes

Butterfly Geometry
Viceroy uses a butterfly geometry.
- Nodes are organized in a series of log n "stages", where all the nodes at stage i are capable of correcting the i-th bit
- Routing consists of 3 phases and completes in O(log n) hops
- No flexibility in route selection or neighbor selection

Ring Geometry
Chord uses a ring geometry.
- Each node maintains log n neighbors and routes to an arbitrary destination in O(log n) hops
- Flexibility in neighbor selection: a node has $2^{i-1}$ options for picking its i-th neighbor, giving approximately $n^{(\log n)/2}$ possible routing tables per node
- Flexibility in route selection: there are (log n)! possible routes from a source to a destination at distance log n
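The count of possible routing tables follows from multiplying the number of options for each neighbor. As a short worked derivation, assuming logarithms are base 2 and that the choices for the i-th neighbor are independent:

```latex
\prod_{i=1}^{\log_2 n} 2^{\,i-1}
  = 2^{\sum_{i=1}^{\log_2 n} (i-1)}
  = 2^{\frac{\log_2 n \,(\log_2 n - 1)}{2}}
  = n^{\frac{\log_2 n - 1}{2}}
  \approx n^{(\log_2 n)/2}
```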

Ring Geometry
To route from 000 to 110, there are two routes:
- Route to 100 and then to 110
- Route to 010 and then to 110
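The two routes in this example correspond to the two orders in which the power-of-two hops that make up the distance can be taken. A minimal sketch, assuming every identifier on the ring is occupied by a node and that only optimal (shortest) routes are enumerated; the function name `ring_routes` is hypothetical.

```python
from itertools import permutations
from typing import List


def ring_routes(src: int, dst: int, bits: int) -> List[List[str]]:
    """Enumerate shortest routes on a full ring of 2**bits identifiers."""
    size = 1 << bits
    gap = (dst - src) % size
    # The power-of-two hops whose sum equals the clockwise distance.
    hops = [1 << i for i in range(bits) if gap & (1 << i)]
    routes = []
    for order in permutations(hops):
        pos = src
        path = [format(pos, f"0{bits}b")]
        for hop in order:
            pos = (pos + hop) % size
            path.append(format(pos, f"0{bits}b"))
        routes.append(path)
    return routes


for route in ring_routes(0b000, 0b110, 3):
    print(" -> ".join(route))
```

With k hops to combine there are k! orderings, which is where the (log n)! figure for a destination at distance log n comes from.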

XOR Geometry
Kademlia uses an XOR geometry.
- The distance between two nodes is the XOR of their identifiers
- A node has $2^{i-1}$ options for choosing its neighbor at distance i, giving approximately $n^{(\log n)/2}$ possible routing tables per node
- Route flexibility: if the optimal next hop is not available, lower-order bits can be fixed before higher-order bits
- This may result in longer paths, because lower-order bits that have been fixed need not be preserved by later routing steps

Hybrid Geometry
Pastry is a hybrid.
- Its nodes are regarded both as leaves of a binary tree and as points on a one-dimensional circle
- The distance between nodes is either the tree distance or the cyclic distance
- A node has $2^{i-1}$ options for choosing its neighbor at distance i, giving approximately $n^{(\log n)/2}$ possible routing tables per node
- Route selection freedom: hops on the ring are allowed, but such paths might not retain the O(log n) bound on route length

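Flexibility Overview

Property                                  | Tree             | Hypercube       | Ring             | Butterfly | XOR              | Hybrid
Neighbor selection                        | $n^{(\log n)/2}$ | 1               | $n^{(\log n)/2}$ | 1         | $n^{(\log n)/2}$ | $n^{(\log n)/2}$
Route selection (optimal)                 | 1                | $c_1(\log n)$   | $c_1(\log n)$    | 1         | 1                | 1
Natural support for sequential neighbors? | no               | no              | yes              | no        | no               | default: no; fallback: yes

Ring and Hypercube have twice the routing flexibility of the Hybrid and XOR geometries.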

Resilience
Two aspects of robust routing:
- Static resilience: how well the algorithm can still route after failures, before the recovery algorithms take effect
- Dynamic recovery: how quickly routing state is recovered after failures
Results with 30% of nodes failed:
- Tree: 90% of routes failed (no route selection flexibility)
- Ring, Hypercube: 7% of routes failed (most route selection flexibility)
- Hybrid, XOR: 20% of routes failed (half the flexibility of the ring)
Route selection flexibility affects static resilience.

Path Latency
The goal is to minimise the end-to-end latency of overlay networks. Two proximity methods are considered:
- Proximity Neighbor Selection (PNS): neighbors are chosen based on their proximity
- Proximity Route Selection (PRS): routes are selected depending on the proximity of the neighbors
PNS improves on PRS, which in turn improves on the plain version.
The geometry itself does not affect the performance of PNS or PRS.
- Thus it is important to choose a routing algorithm whose geometry accommodates PNS

Local Convergence
Do messages sent from two nodes to the same destination converge at a node near the two sources? If so, this leads to low latencies in:
- Overlay multicast
- Caching
- Server selection
Measured by the number of exit points in the network.
- In the best case, only one node sends a message off-domain

Limitations & Findings
Limitations
- The authors have not considered all geometries
- Other factors and performance metrics are not considered
Findings
- Routing geometry is important
- Flexibility improves resilience and proximity
Why not the ring?
- Great flexibility in choosing neighbors and routes, so both proximity methods, PNS and PRS, can be implemented
- Highest performance in the resilience tests, and as good as the other geometries in path length and local convergence


Skip List [PSL1990]
- Skip lists are data structures that can be used in place of balanced trees
- They use probabilistic balancing techniques, so the algorithms are simpler and faster
- A skip list is a sorted linked list in which some nodes are supplemented with pointers that skip over many list elements
[Figure: example skip list running from the header (HDR) to NIL.]

Perfect Skip List
- A perfect skip list is one where the height of the i-th node is the exponent of the largest power of two that divides i
- Pointers at level h have length $2^h$; for example, a level 2 pointer skips over $2^2$ nodes
- A perfect skip list supports searches in O(log N)
- Because insertions and deletions are expensive in a perfect skip list, a probabilistically balanced skip list is used instead, in which node heights are chosen by consulting a random number generator
[Figure: perfect skip list from HDR to NIL, with nodes of height 2 and height 3 marked.]
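The height rule for a perfect skip list can be computed directly from the position i. A small sketch, assuming positions are numbered from 1; the function name `perfect_height` is chosen for illustration.

```python
def perfect_height(i: int) -> int:
    """Exponent of the largest power of two that divides i (i >= 1)."""
    if i < 1:
        raise ValueError("positions are numbered from 1")
    # i & -i isolates the lowest set bit, i.e. the largest power of two
    # dividing i; bit_length() - 1 turns that power into its exponent.
    return (i & -i).bit_length() - 1


print([perfect_height(i) for i in range(1, 9)])
```

Every second node has height at least 1, every fourth at least 2, and so on, which is what gives level h pointers their length of $2^h$.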

Examples
Starting from an empty list (HDR followed by NIL), with every height chosen randomly:
1. Add node 10 (height 1)
2. Add node 5 (height 0)
3. Add node 8 (height 2)
4. Add node 12 (height 0)
5. Add node 2 (height 0)
[Figure: the skip list after each insertion.]

Search Skip List
- Search for node 30: from HDR to node 29, then stop; the search fails (illustrated)
- Search for node 23: from HDR to node 16; drop two levels; from node 16 to node 23; found
- Search for node 27: from HDR to node 16; drop one level; from node 16 to node 25; drop one level; from node 25 to node 27; found
[Figure: the example skip list from HDR to NIL, with the search path for node 30 highlighted.]
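Insertion and search in a probabilistic skip list can be sketched as below. This is a compact illustration, not Pugh's reference code: heights are counted from 0 as on the slides, the maximum height of 8 and promotion probability of 0.5 are arbitrary assumptions, and `insert` accepts an optional explicit height so that a fixed sequence of heights can be replayed.

```python
import random
from typing import List, Optional


class Node:
    def __init__(self, key, height: int):
        self.key = key
        # forward[h] is the next node at level h (None plays the role of NIL).
        self.forward: List[Optional["Node"]] = [None] * (height + 1)


class SkipList:
    def __init__(self, max_height: int = 8, p: float = 0.5,
                 seed: Optional[int] = None):
        self.max_height = max_height
        self.p = p
        self.rng = random.Random(seed)
        self.header = Node(None, max_height)  # HDR

    def random_height(self) -> int:
        h = 0
        while h < self.max_height and self.rng.random() < self.p:
            h += 1
        return h

    def insert(self, key, height: Optional[int] = None) -> None:
        if height is None:
            height = self.random_height()
        node = Node(key, height)
        cur = self.header
        for lvl in range(self.max_height, -1, -1):
            # Advance while the next key at this level is still smaller.
            while cur.forward[lvl] is not None and cur.forward[lvl].key < key:
                cur = cur.forward[lvl]
            if lvl <= height:  # splice the new node in at this level
                node.forward[lvl] = cur.forward[lvl]
                cur.forward[lvl] = node

    def search(self, key) -> bool:
        cur = self.header
        for lvl in range(self.max_height, -1, -1):
            while cur.forward[lvl] is not None and cur.forward[lvl].key < key:
                cur = cur.forward[lvl]
        nxt = cur.forward[0]
        return nxt is not None and nxt.key == key

    def keys(self) -> list:
        out, cur = [], self.header.forward[0]
        while cur is not None:
            out.append(cur.key)
            cur = cur.forward[0]
        return out


sl = SkipList()
for key, height in [(10, 1), (5, 0), (8, 2), (12, 0), (2, 0)]:
    sl.insert(key, height)
print(sl.keys(), sl.search(8), sl.search(7))
```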

Skip List
- Worst-case performance occurs when the list is significantly unbalanced
- Space efficient: can use an average of 1.33 pointers per element
- Supports O(log N) searches with high probability
Comparison with AVL trees, recursive 2-3 trees, and self-adjusting trees:
- Skip lists perform more comparisons than the other methods
- Skip lists are slightly slower than AVL trees in searches, but insertions and deletions are faster
- Skip lists are faster than self-adjusting trees for uniform distributions, but slower for highly skewed distributions

SkipNet Introduction [SNL2003]
In DHTs, we cannot control where data is stored.
- Data might be stored far away from its administrative domain, which makes access privileges hard to administer. Can we adapt?
- This exposes the data to denial-of-service attacks and traffic analysis
Solution: SkipNet, a scalable overlay network that provides controlled data placement and guaranteed routing locality by organizing data by string names.
- Content can be placed on a pre-defined node or distributed uniformly across the nodes of a hierarchical naming subtree

Motivation
Disadvantages of Chord, CAN, Tapestry, and Pastry:
- No content locality, i.e. the ability to explicitly place data on specific overlay nodes or to distribute it across the nodes of a specified domain. Content locality makes data less prone to traffic analysis and denial-of-service attacks.
- No path locality, i.e. the guarantee that the routing path between two overlay nodes in a domain does not leave the domain. Path locality adds security: traffic is not passed through another domain, which could belong to a competitor.
SkipNet provides both content and path locality.

How does SkipNet do it?
SkipNet employs two ID spaces: a string name ID space and a numeric ID space.
- Node names and content identifier strings are mapped directly into the name ID space
- Hashes of the node names and content identifiers are mapped into the numeric ID space
By arranging content in name ID order rather than dispersing it, SkipNet achieves content and path locality.

Advantages of Locality
Improved availability
- Data is stored within the organisation and can be searched even if the network is partitioned
- Resilience against Internet failures: nodes within a cluster gracefully survive failures that disconnect the cluster from the rest of the Internet (a useful property of SkipNet)
Performance
- Searches are faster because data is stored near the nodes that use it
Manageability
- Facilitates control and maintenance within an administrative domain
Security
- Helps deal with traffic analysis and denial-of-service attacks

SkipNet Structure
SkipNet adapts the skip list structure:
- Traversals can start from any node
- State and processing costs should be the same for all nodes
- A ring of doubly linked lists is used, along with other enhancements
- Each node stores 2 log N pointers rather than a highly variable number of pointers
Two variants:
- Perfect SkipNet: pointers at level h point to the nodes that are exactly $2^h$ nodes to the left and right
- Probabilistic SkipNet: a node at level h probabilistically determines which ring it belongs to

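SkipNet Structure
SkipNet nodes are ordered by name ID. The routing tables of nodes A and V are shown.

Routing table of A:
Level | Clockwise | Counterclockwise
2     | T         | T
1     | M         | X
0     | D         | Z

Routing table of V:
Level | Clockwise | Counterclockwise
2     | D         | D
1     | Z         | O
0     | X         | T

[Figure: ring of the eight nodes A, D, M, O, T, V, X, Z.]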
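The routing tables of a perfect SkipNet follow mechanically from the name ordering. A minimal sketch, assuming the number of nodes is a power of two and that the first pointer of each level is the clockwise one (increasing name ID); the function name `perfect_skipnet_tables` is hypothetical.

```python
from typing import Dict, List, Tuple


def perfect_skipnet_tables(names: List[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each node to its (clockwise, counterclockwise) pointer per level."""
    ring = sorted(names)
    n = len(ring)
    levels = n.bit_length() - 1  # log2(n) when n is a power of two
    tables = {}
    for i, name in enumerate(ring):
        # The level h pointers reach exactly 2**h nodes in each direction.
        tables[name] = [(ring[(i + 2 ** h) % n], ring[(i - 2 ** h) % n])
                        for h in range(levels)]
    return tables


tables = perfect_skipnet_tables(list("ADMOTVXZ"))
for level in (2, 1, 0):
    print("A level", level, tables["A"][level])
```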

SkipNet Structure
The full SkipNet routing infrastructure for an 8-node system, including the ring labels:

Level | Rings
L = 0 | Root ring: A D M O T V X Z
L = 1 | Ring 0: A M T X; Ring 1: D O V Z
L = 2 | Ring 00: A T; Ring 01: M X; Ring 10: O Z; Ring 11: D V
L = 3 | Ring 000: A; Ring 001: T; Ring 010: M; Ring 011: X; Ring 100: Z; Ring 101: O; Ring 110: D; Ring 111: V

Routing by Name ID
Similar to search in skip lists:
- At each node, the message is forwarded along the highest-level pointer, in either the clockwise or the counterclockwise direction, that does not point past the destination name ID
- Routing terminates when the message arrives at the node whose name ID is closest to the destination
- Because nodes are doubly linked, the scheme can route using either the left or the right pointers, depending on the name IDs
- The number of hops is O(log N)

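Example
Routing a message from node A to node V. Path:
- A (level 2, clockwise) → T, since "T" < "V"
- T (level 2, clockwise) → failed
- T (level 1, clockwise) → failed
- T (level 0, clockwise) → V (destination)
[Figure: ring of nodes with the routing tables of A (level 2: T, T; level 1: M, X; level 0: D, Z), V (level 2: D, D; level 1: Z, O; level 0: X, T), and T (level 2: A, A; level 1: X, M; level 0: V, O).]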

Routing Algorithm

SendMsg(nameID, msg) {
  if (LongestPrefix(nameID, localNode.nameID) == 0)
    msg.dir = RandomDirection();
  else if (nameID < localNode.nameID)
    msg.dir = counterClockwise;
  else
    msg.dir = clockwise;
  msg.nameID = nameID;
  RouteByNameID(msg);
}

// Invoked at all nodes (including the source and
// destination nodes) along the routing path.
RouteByNameID(msg) {
  // Forward along the longest pointer
  // that is between us and msg.nameID.
  h = localNode.maxHeight;
  while (h >= 0) {
    nbr = localNode.RouteTable[msg.dir][h];
    if (LiesBetween(localNode.nameID, nbr.nameID, msg.nameID, msg.dir)) {
      SendToNode(msg, nbr);
      return;
    }
    h = h - 1;
  }
  // h<0 implies we are the closest node.
  DeliverMessage(msg.msg);
}
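The pseudocode can be exercised on the 8-node example as a single-process simulation. This sketch makes several simplifying assumptions that are not in the slides: the SkipNet is perfect, the destination name ID is the name of an existing node, the random-direction case of SendMsg is omitted, and `LiesBetween` is approximated by comparing ring distances in the travel direction.

```python
from typing import List

NAMES = sorted("ADMOTVXZ")           # ring order by name ID
N = len(NAMES)
IDX = {name: i for i, name in enumerate(NAMES)}
MAX_HEIGHT = 2                       # levels 0..2 for 8 nodes


def neighbor(name: str, direction: int, h: int) -> str:
    """Level h pointer: 2**h nodes away (+1 clockwise, -1 counterclockwise)."""
    return NAMES[(IDX[name] + direction * 2 ** h) % N]


def ring_distance(a: str, b: str, direction: int) -> int:
    """Number of steps from a to b when travelling in the given direction."""
    return ((IDX[b] - IDX[a]) * direction) % N


def route_by_name_id(src: str, dest: str) -> List[str]:
    direction = 1 if dest > src else -1
    path, local = [src], src
    while True:
        for h in range(MAX_HEIGHT, -1, -1):
            nbr = neighbor(local, direction, h)
            # Stand-in for LiesBetween: nbr must not lie past the destination.
            if ring_distance(local, nbr, direction) <= ring_distance(local, dest, direction):
                local = nbr
                path.append(nbr)
                break
        else:
            return path  # no usable pointer: we are the closest node


print(route_by_name_id("A", "V"))
```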

Routing By Numeric ID
• Routing begins in the level 0 ring and continues until a node is found whose numeric ID matches the destination numeric ID in the first digit.
• The message is then forwarded from a ring at level h, R_h, to a ring at level h+1, R_{h+1}, such that the nodes in R_{h+1} share h+1 digits with the destination numeric ID.
• Routing terminates when
  o the message is delivered to a node with numeric ID = key, or
  o none of the nodes in R_h share h+1 digits with the destination numeric ID; the message is then delivered to the node in R_h whose numeric ID is closest to the destination's numeric ID.
• The number of message hops is O(log N).

Routing By Numeric ID
E.g. let Z = 1000 and O = 1001, and route a message with numeric ID 1011 from A.
Path: A (0000) → D (1100, move up a level) → O (1001, move up a level) → Z (1000) → O (1001, closest match for 1011; deliver).
[Figure: the SkipNet ring hierarchy, from the root ring up to the top-level rings, annotated for this example.]

Routing Algorithm
// Invoked at all nodes (including the source and destination nodes) along the routing path.
// Initially: msg.ringLvl = -1, msg.startNode = msg.bestNode = null and msg.finalDestination = false
RouteByNumericID(msg) {
  if (msg.numID == localNode.numID || msg.finalDestination) {
    DeliverMessage(msg.msg);
    return;
  }
  if (localNode == msg.startNode) {
    // Done traversing current ring.
    msg.finalDestination = true;
    SendToNode(msg, msg.bestNode);
    return;
  }
  h = CommonPrefixLen(msg.numID, localNode.numID);
  if (h > msg.ringLvl) {
    // Found a higher ring.
    msg.ringLvl = h;
    msg.startNode = msg.bestNode = localNode;
  } else if (abs(localNode.numID - msg.numID) < abs(msg.bestNode.numID - msg.numID)) {
    // Found a better candidate for current ring.
    msg.bestNode = localNode;
  }
  // Forward along current ring.
  nbr = localNode.RouteTable[clockWise][msg.ringLvl];
  SendToNode(msg, nbr);
}
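The numeric ID routing loop can be simulated in a single process as sketched below. This is an illustrative Python approximation, not the authors' code: the 3-digit numeric IDs are assumptions taken from the ring labels of the earlier 8-node figure, rings are recomputed from a global view instead of per-node routing tables, and all helper names are hypothetical.

```python
def common_prefix_len(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def clockwise_neighbor(numeric_id, node, level):
    """Next node clockwise in the level-`level` ring that `node` belongs to."""
    prefix = numeric_id[node][:level]
    ring = sorted(n for n, num in numeric_id.items() if num.startswith(prefix))
    return ring[(ring.index(node) + 1) % len(ring)]

def distance(numeric_id, node, key):
    return abs(int(numeric_id[node], 2) - int(key, 2))

def route_by_numeric_id(numeric_id, source, key):
    """Return (delivering node, list of nodes visited)."""
    ring_lvl, start, best, final = -1, None, None, False
    local, path = source, [source]
    while True:
        if numeric_id[local] == key or final:
            return local, path                   # DeliverMessage
        if local == start:
            final, local = True, best            # done traversing current ring
        else:
            h = common_prefix_len(key, numeric_id[local])
            if h > ring_lvl:                     # found a higher ring
                ring_lvl, start, best = h, local, local
            elif distance(numeric_id, local, key) < distance(numeric_id, best, key):
                best = local                     # better candidate in this ring
            local = clockwise_neighbor(numeric_id, local, ring_lvl)
        path.append(local)

# Assumed numeric IDs (read off the ring labels of the 8-node figure).
ids = {"A": "000", "T": "001", "M": "010", "X": "011",
       "Z": "100", "O": "101", "D": "110", "V": "111"}
exact_node, exact_path = route_by_numeric_id(ids, "A", "101")

# Without node O there is no exact match, so the closest numeric ID wins.
ids_without_o = {name: num for name, num in ids.items() if name != "O"}
closest_node, closest_path = route_by_numeric_id(ids_without_o, "A", "101")
print(exact_node, exact_path, closest_node, closest_path)
```

In the second call the message climbs to the ring with prefix 10, finds only Z there, and is delivered to Z as the closest match for 101. The path lists Z more than once because a node alone in its ring forwards to itself.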

Benefits
• SkipNet supports routing with the same data structure by
  o name ID
  o numeric ID
• The root ring is sorted by name ID, and membership in the higher-level rings is determined by numeric ID.
• For a given node, the SkipNet rings to which it belongs precisely form a skip list that is a ring and doubly linked.

Node Joins & Departure
Node joins
• A new node finds the top-level ring that matches its numeric ID.
• It finds a neighbor in that top-level ring using a name ID search.
• Starting from one of these neighbors, it searches for its name ID at the next lower level and thus finds its neighbors at that level.
• This is repeated until it reaches the root ring.
• Existing nodes point to the new node only after it has joined the root ring.
• An insertion traverses O(log N) hops with high probability.
Node departure
• Messages can be routed correctly as long as the root ring is maintained.
• Other levels are regarded as optimization hints; upper-ring membership is maintained through a background repair process.

Example
Join: insert node O (101)
• Search by numeric ID 101
  o The highest attainable level is 2.
  o O joins the ring containing Z at level 2.
  o Z forwards the join message to D at the next lower level, 1.
• Proceed by searching by name ID in the next lower levels
  o D and V are neighbors at level 1.
  o M and T are neighbors at level 0.
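The neighbors that O ends up with can be checked with a small Python sketch. It computes the result of the join from a global view of the network rather than simulating the join messages, so it is only an illustration: the numeric IDs are assumptions taken from the ring labels of the 8-node figure, and `join_neighbors` is a hypothetical helper.

```python
import bisect

def common_prefix_len(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def join_neighbors(numeric_id, new_name, new_num):
    """Return {level: (predecessor, successor)} by name ID for the joining node."""
    # Highest attainable level: longest numeric ID prefix shared with any node.
    top = max(common_prefix_len(new_num, num) for num in numeric_id.values())
    neighbors = {}
    for level in range(top, -1, -1):             # top ring first, root ring last
        prefix = new_num[:level]
        ring = sorted(n for n, num in numeric_id.items() if num.startswith(prefix))
        pos = bisect.bisect_left(ring, new_name)
        neighbors[level] = (ring[pos - 1], ring[pos % len(ring)])
    return neighbors

# Assumed numeric IDs of the seven nodes present before O joins.
existing = {"A": "000", "T": "001", "M": "010", "X": "011",
            "Z": "100", "D": "110", "V": "111"}
neighbors_of_o = join_neighbors(existing, "O", "101")
print(neighbors_of_o)
```

The result reproduces the slide: O shares the level 2 ring with Z, sits between D and V at level 1, and between M and T at level 0.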

Properties of SkipNet
Content & path locality
• Nodes are named like DNS entries, which gives path locality for groups in which nodes share a single DNS suffix. E.g. reversing DNS names: john.microsoft.com becomes com.microsoft.john
• Incorporating a node's name ID into a content name guarantees that the content will be hosted on that node. E.g. com.microsoft.john/doc-name
Constrained load balancing (CLB)
• A name is stored using two parts, a CLB domain and a CLB suffix. For example, a document using the name msn.com/DataCenter!TopStories.html.
• Searching: first search for a node in the CLB domain using a name ID search, then search by numeric ID for the hash of the CLB suffix, constrained to the domain. Because the search is constrained by a name ID prefix, it uses the doubly linked list. This type of search costs a factor of 2 in performance.
• CLB is performed over a naming subtree, not over an arbitrary subset of nodes.
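The naming conventions on this slide are easy to illustrate in Python. The sketch below is an assumption-laden example, not SkipNet's actual code: the function names are hypothetical, and SHA-1 truncated to 16 bits is just a stand-in for whatever hash the system applies to the CLB suffix.

```python
import hashlib

def reverse_dns(name):
    """john.microsoft.com -> com.microsoft.john, so shared suffixes become shared prefixes."""
    return ".".join(reversed(name.split(".")))

def split_clb(name):
    """Split a content name at '!' into (CLB domain, CLB suffix)."""
    domain, sep, suffix = name.partition("!")
    if not sep:
        raise ValueError("name has no CLB delimiter '!'")
    return domain, suffix

def clb_numeric_id(suffix, bits=16):
    """Hash the CLB suffix to a numeric ID (hash choice is an assumption)."""
    digest = hashlib.sha1(suffix.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (len(digest) * 8 - bits)

name_id = reverse_dns("john.microsoft.com")
domain, suffix = split_clb("msn.com/DataCenter!TopStories.html")
numeric_id = clb_numeric_id(suffix)
print(name_id, domain, suffix, numeric_id)
```

A lookup would then route by name ID to the domain part and by numeric ID to the hash of the suffix, restricted to nodes whose name IDs start with that domain.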

Properties of SkipNet
Fault tolerance
• Only the neighbors at level 0 need to be kept correct.
  o Each node has 16 neighbors at level 0.
  o Level 0 is repaired easily by contacting live nodes.
  o Background stabilization mechanisms are employed when failures occur.
• A failure across organizational boundaries only segments the overlay, which survives gracefully.
Security
• Nodes cannot create global names containing the suffix of a registered domain.
• Path locality helps avoid traffic analysis.
• However, outbound traffic is still easily analyzed.
Range queries
• Queries can be performed over contiguous ring segments.

Enhancements
Sparse & dense routing tables
• Use a density parameter k and non-binary random digits to the base k for the numeric ID.
Duplicate pointer elimination
• Remove duplicate pointers from the routing table. A 25% improvement can be achieved.
Network proximity for routing by name ID
• Introduce a P-table for proximity routing. The goal of the P-table is to maintain routing in O(log N) hops.
• It ensures that each hop has low latency by keeping track of nodes that are close in network distance.

Enhancements
Network proximity for routing by numeric ID
• Add a C-table to incorporate network proximity when searching by numeric ID.
• The C-table keeps track of nodes that are close and within the CLB domain.

Design Alternatives
IP routing & DNS
o Content placement by routing using IP and DNS lookup.
Single overlay network
o For content locality, name a node with the hash of a data object's name. This requires a separate routing table for each object.
o Use a two-part naming scheme in which a content name consists of a node address concatenated with a node-relative name. This does not support guaranteed path locality.
o Add constraints to messages to limit the paths they may take. However, this prevents routing from being consistent.
o Use two-part segments with a numeric ID and a name ID, like SkipNet. The result is a static form of constrained load balancing.

Design Alternatives
Multiple overlay networks
o Multiple overlays, each with its own membership, could be considered.
o Access to other overlays must go through gateways.
o Access to data is constrained and load balanced within a single overlay, which is not accessible to outside clients except via gateways.
SkipNet provides explicit content placement, allows clients to dynamically define new DHTs over any name prefix scope, and guarantees path locality within a shared name prefix, all within a single infrastructure.

Experiments
The authors ran experiments against the following:
• Basic SkipNet, using only the R-table
• Full SkipNet, using the R-table, P-table and C-table
• Pastry
• Chord
Lookup performance metrics:
• Relative Delay Penalty (RDP): latency of the overlay path compared to the latency of the IP path
• Physical network hops: length of the overlay path measured in IP hops
• Number of failed lookups
Other metrics (refer to the paper):
• Format of node names
• Organisation size
• Models for the distribution of nodes and data
• Using host-generated or organisation-generated node names
• Simulation of domain isolation by failing an organisation's link

Experiment Results
Basic routing costs
• Full SkipNet and Pastry are locality aware, while basic SkipNet and Chord are not; the locality-aware systems therefore performed better.
• Non-uniform distribution of data does not affect performance.
Routing entries per node
• [Table: routing entries per node for Chord, Basic SkipNet, Full SkipNet and Pastry; values not preserved.]
Locality of placement
• Measures physical network hops.
• Chord and Pastry show constant physical hops: they diffuse data throughout the network and are therefore oblivious to the locality of data.
• SkipNet's performance improves as the locality of data references increases.

Experiment Results
Fault tolerance (when an organisation is disconnected)
• Locality improves fault tolerance.
• Chord and Pastry fail totally for local lookups because data is diffused.
• SkipNet keeps functioning and performs local lookups.
Constrained load balancing (within a domain)
• Studies the Relative Delay Penalty (RDP) as the number of nodes increases.
• Basic CLB, using the R-table, incurs higher delay penalties.
• Full CLB incurs intermediate delay penalties.
• Pastry has low delay penalties.
Network proximity
• Studies the effect on RDP of the density parameter k, which controls the P-table entries.
• RDP levels off after k = 8 because of the increase in pointers in the P-table.

SkipNet Summary
• SkipNet is the first P2P system that achieves both path and content locality.
• It provides content locality at the desired degree and granularity.
• Clustering node names allows SkipNet to perform gracefully in the face of link failures.
• Performance is similar to other P2P systems such as Chord and Pastry under uniform access patterns.
• Under access patterns where intra-organisation traffic predominates, SkipNet performs better.
• SkipNet is also more resilient to network partitions than other P2P systems.

Conclusion
Looked at hash-based techniques in P2P
• Pastry
• P-Grid
Two important issues
• Load balancing
• Preserving neighbor table consistency
Comparison of DHT techniques
SkipNet: a skip list adaptation

References
[CAN2001] Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Shenker. A Scalable Content-Addressable Network. SIGCOMM’01, August.
[CPLS2001] Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan. Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications. SIGCOMM’01, August 27-31.
[CSWH2000] I. Clarke, O. Sandberg, B. Wiley, T. W. Hong. Freenet: A distributed anonymous information storage and retrieval system. Proc. of ICSI Workshop on Design Issues in Anonymity and Unobservability.
[DRGR2003] K. Gummadi, R. Gummadi, S. Gribble, S. Ratnasamy, S. Shenker, I. Stoica. The Impact of DHT Routing Geometry on Resilience and Proximity. SIGCOMM’03, August 25–29.
[LL2004a] S. S. Lam, H. Liu. Failure recovery for structured P2P networks: Protocol design and performance evaluation. In Proc. of ACM SIGMETRICS, June.
[LL2004b] Consistency-preserving Neighbor Table Optimization for P2P Networks. Technical Report TR-04-01, Dept. of CS, Univ. of Texas at Austin, January 2004.

References (cont.)
[GLSKS2004] Load Balancing in Dynamic Structured P2P Systems. Proc. of IEEE INFOCOM, Portland, Oregon, USA.
[PSL1990] William Pugh. Skip lists: A probabilistic alternative to balanced trees. Communications of the ACM, June 1990.
[RD2001] A. Rowstron, P. Druschel. Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems. In Proc. of the 18th IFIP/ACM International Conf. on Distributed Systems Platforms, November.
[SMKKB2001] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, H. Balakrishnan. Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications. Proc. of SIGCOMM ’01, San Diego, California, USA.
[SML+2004] I. Stoica, R. Morris, D. Liben-Nowell, D. R. Karger, M. F. Kaashoek, F. Dabek, H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. Proc. of the 2001 ACM Annual Conference of the Special Interest Group on Data Communication (ACM SIGCOMM’01).
[SNL2003] Nicholas J.A. Harvey, Michael B. Jones, Stefan Saroiu, Marvin Theimer, Alec Wolman. SkipNet: A Scalable Overlay Network with Practical Locality Properties. Proceedings of the Fourth USENIX Symposium on Internet Technologies and Systems (USITS '03), Seattle, WA, March 2003.