Structured P2P Networks Guo Shuqiao Yao Zhen Rakesh Kumar Gupta CS6203 Advanced Topics in Database Systems
Introduction-P2P Network A peer-to-peer (P2P) network is a distributed system in which peers employ distributed resources to perform a critical function in a decentralized fashion [LW2004] Classification of P2P networks Unstructured and Structured Centralized and Decentralized Hierarchical and Non-Hierarchical
Structured P2P network Distributed hash table (DHT) DHT is a structured overlay that offers extreme scalability and hash-table-like lookup interface CAN, Chord, Pastry Other techniques Skip list Skipgraph, SkipNet
Outline Hash-based techniques in P2P Hash-based structured P2P systems Pastry P-Grid Two important issues Load balancing Neighbor table consistency preserving Comparison of DHT techniques Skip-list based system SkipNet Conclusion
Pastry [RD2001] Pastry is a P2P object location and routing scheme Hash-based Properties Completely decentralized Scalable Self-organized Fault-resilient Efficient search
Design of Pastry nodeID: each node has a unique numeric identifier (128 bits) Assigned randomly Nodes with adjacent nodeIDs are diverse in geography, ownership, etc. Assumption: nodeIDs are uniformly distributed in the ID space Presented as a sequence of digits with base 2^b b is a configuration parameter (typically 4)
Design of Pastry (cont’) Each message/query has a numeric key of the same length as the nodeIDs The key is presented as a sequence of digits with base 2^b Route: a message is routed to the node whose nodeID is numerically closest to the key
[Figure: destination of routing. A message with key = 10 is delivered to its destination node.]
Pastry Schema Given a message of key k, a node A forwards the message to a node whose ID is numerically closest to k among all nodes known to A Each node maintains some routing state
Pastry Node State A leaf set L A routing table A neighborhood set M [Figure: example node state showing the nodeID, the routing table, the leaf set (SMALLER and LARGER halves) and the neighborhood set]
Meanings of ‘Close’ Closest according to the proximity metric (real distance): nearest neighbor Closest according to numerical value: node with the closest nodeID
Pastry Node State A leaf set |L| nodes with closest nodeIDs |L|/2 larger ones and |L|/2 smaller ones Useful in message routing A neighborhood set |M| nearest neighbors Useful in maintaining locality properties
Leaf Set and Neighborhood Set In this example b=2, l=8 |L| = 2 × 2^b = 8 |M| = 2 × 2^b = 8 [Figure: state of node A, with the SMALLER and LARGER halves of the leaf set and the neighborhood set highlighted]
Routing Table l rows and 2^b columns i-th row: entries share an i-digit prefix with the nodeID j-th column: the next digit after the prefix is j b=2, l=8 -> 8 rows and 4 columns [Figure: routing table of node A, with columns j=0, j=1 and j=3 marked]
Routing Step 1: If k falls within the range of nodeIDs covered by A’s leaf set, forward the message to the node in the leaf set whose nodeID is closest to k If k is not covered by the leaf set, go to step 2 [Figure: state of node A, with a worked example of a key that falls inside the leaf-set range]
Routing Step 2: The routing table is used and the message is forwarded to a node whose ID shares a longer prefix with k than A’s nodeID does If the appropriate entry in the routing table is empty, go to step 3 [Figure: state of node A, with a worked example of a key forwarded through the routing table]
Routing Step 3: The message is forwarded to a node in the leaf set whose ID shares a prefix with k at least as long as A’s does but is numerically closer to k than A If such a node does not exist, A is the destination node [Figure: state of node A, with a worked example of a key forwarded through the leaf set]
Routing The routing procedure always converges, since each step chooses a node that either shares a longer prefix with the key, or shares a prefix of the same length but is numerically closer to the key Routing performance The expected number of routing steps is log_{2^b} N Assumption: accurate routing tables and no recent node failures
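The three routing steps can be sketched as follows. This is a minimal illustration, not Pastry's implementation: the dictionary layout of the routing table, the helper names and the example entries are assumptions, and wrap-around of the ID space is ignored.

```python
from typing import Dict, List, Optional, Tuple

B = 2        # bits per digit, so digits are base 2**B = 4 (as in the slides' example)
DIGITS = 8   # digits per nodeID (l = 8 in the example)


def from_base4(s: str) -> int:
    """Hypothetical helper: parse a base-4 digit string into an integer ID."""
    return int(s, 4)


def to_digits(node_id: int) -> List[int]:
    """Split an ID into DIGITS base-2**B digits, most significant first."""
    base = 1 << B
    return [(node_id // base ** (DIGITS - 1 - i)) % base for i in range(DIGITS)]


def shared_prefix_len(a: int, b: int) -> int:
    """Number of leading digits two IDs have in common."""
    n = 0
    for x, y in zip(to_digits(a), to_digits(b)):
        if x != y:
            break
        n += 1
    return n


def next_hop(node_id: int, key: int, leaf_set: List[int],
             routing_table: Dict[Tuple[int, int], int]) -> Optional[int]:
    """Return the next node for `key`, or None if this node is the destination."""
    if key == node_id:
        return None
    # Step 1: key inside the leaf-set range -> numerically closest node.
    # The local node is included so that it can recognise itself as destination.
    if leaf_set and min(leaf_set) <= key <= max(leaf_set):
        best = min(leaf_set + [node_id], key=lambda n: abs(n - key))
        return None if best == node_id else best
    # Step 2: routing table entry (shared prefix length, next digit of the key).
    p = shared_prefix_len(node_id, key)
    entry = routing_table.get((p, to_digits(key)[p]))
    if entry is not None:
        return entry
    # Step 3: leaf-set node with an equally long prefix that is numerically closer.
    closer = [n for n in leaf_set
              if shared_prefix_len(n, key) >= p and abs(n - key) < abs(node_id - key)]
    return min(closer, key=lambda n: abs(n - key)) if closer else None


# Made-up example state for a node with ID 10233102 (base 4).
NODE = from_base4("10233102")
LEAF_SET = [from_base4(s) for s in ("10233000", "10233001", "10233021", "10233033",
                                    "10233120", "10233122", "10233230", "10233232")]
ROUTING_TABLE = {(2, 3): from_base4("10323302")}  # only one entry filled in
```

With this state, a key inside the leaf-set range is resolved by step 1, a key with prefix 103 uses the single routing-table entry, and a key whose entry is missing falls back to the leaf set.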
Performance Average number of routing hops versus number of Pastry nodes b = 4, |L| = 16, |M| =32 and 200,000 lookups.
Discussion of Pastry Pastry: the parameters make it flexible b is the most important parameter and determines the power of the system Trade-off between routing efficiency (log_{2^b} N hops) and routing table size (log_{2^b} N × 2^b entries) Each node can choose its own |L| and |M| based on its own situation
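To make the trade-off concrete, the two formulas from this slide can be evaluated for a few values of b. The function name and the network size of one million nodes are illustrative assumptions.

```python
import math


def pastry_costs(n_nodes: int, b: int):
    """Expected hops and routing-table size, using the formulas on the slide."""
    hops = math.log(n_nodes, 2 ** b)      # log_{2^b} N
    table_entries = hops * 2 ** b         # log_{2^b} N x 2^b
    return hops, table_entries


if __name__ == "__main__":
    for b in (1, 2, 4):
        hops, entries = pastry_costs(1_000_000, b)
        print(f"b={b}: about {hops:.1f} hops, about {entries:.0f} routing-table entries")
```

A larger b shortens routes but widens every row of the routing table, which is the trade-off the slide describes.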
Discussion of Pastry – routing schema Is each routing step locally optimal? E.g., for a key k, node A knows two candidate nodes X and Y with Dis(k, X’s ID) = 32 and Dis(k, Y’s ID) = 1 The locally optimal node is Y, but Pastry forwards to node X [Figure: state of node A with the nodeIDs of X and Y marked]
P-Grid [Aberer2001] P-Grid is a scalable access structure for P2P Hash-based & virtual binary search tree Randomized algorithms are used for constructing the access structure [Figure: virtual binary tree with per-peer routing references and an example query k=100]
P-Grid (cont’) Properties Completely decentralized Scalable with the total number of nodes and data items Fault-resilient: search is robust against node failures Efficient search
Discussion of Pastry and P-Grid Both systems make a uniformity assumption Pastry: ID space P-Grid: data distribution and peer behavior If the data/message/query distribution is skewed, Pastry and P-Grid are not able to balance the load
Load Balancing Consider a DHT P2P system with N nodes Θ(log N) imbalance factor if item IDs are uniformly distributed [SMKKB2001] Even worse if applications associate semantics with the item IDs, because IDs would no longer be uniformly distributed How to minimize the load imbalance? How to minimize the amount of load moved?
Load Balancing Challenges Data items are continuously inserted/deleted Nodes join and depart continuously The distribution of data item IDs and item sizes can be skewed Solution: [GLSKS2004]
Load Balancing Virtual server A virtual server, rather than a physical node, represents a peer in the DHT A physical node hosts one or more virtual servers Load of a node = total load of its virtual servers [Figure: example in Chord, with physical nodes hosting virtual servers that each keep their own finger table (FT)]
Load Balancing Basic idea Directories Store load information of the peer nodes Periodically schedule reassignments of virtual servers The distributed load balancing problem is reduced to a centralized problem at each directory
Load Balancing Load balancing algorithm Node: randomly chooses a directory (directory IDs are known to all nodes) and sends it (1) the loads of all virtual servers that it is responsible for and (2) its capacity Directory: receives information from nodes; after a delay of T, computes a schedule of virtual server transfers among the nodes contacting it in order to reduce their maximal utilization A node contacts a directory again in a new cycle, or as soon as its utilization exceeds K_e, which triggers emergency load balancing
Load Balancing Load balancing algorithm (cont.) Computing the optimal reassignment is NP-complete Greedy algorithm, O(m log m) For each heavily loaded node, move its least loaded virtual server to a pool For each virtual server in the pool, from heaviest to lightest, assign it to the node n that minimizes the resulting load
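The greedy step can be sketched as below. This is a simplified reading of the slide, not the algorithm of [GLSKS2004] verbatim: the threshold `target_util`, the repeated shedding until a node drops below it, and all names are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def greedy_reassign(capacity: Dict[str, float],
                    servers: Dict[str, Dict[str, float]],
                    target_util: float) -> List[Tuple[str, str, str]]:
    """Return (virtual_server, from_node, to_node) transfers.

    servers[node] maps virtual-server name -> load; it is updated in place.
    """
    def util(node: str) -> float:
        return sum(servers[node].values()) / capacity[node]

    # Phase 1: heavily loaded nodes shed their least loaded virtual servers.
    pool = []  # entries are (load, virtual_server, origin_node)
    for node in capacity:
        while servers[node] and util(node) > target_util:
            vs = min(servers[node], key=servers[node].get)
            pool.append((servers[node].pop(vs), vs, node))

    # Phase 2: place pooled servers, heaviest first, where the resulting
    # utilization is smallest.
    transfers = []
    for load, vs, origin in sorted(pool, reverse=True):
        dest = min(capacity,
                   key=lambda n: (sum(servers[n].values()) + load) / capacity[n])
        servers[dest][vs] = load
        if dest != origin:
            transfers.append((vs, origin, dest))
    return transfers
```

For example, with three nodes of capacity 10 and node "a" holding loads 6, 5 and 1, a target utilization of 0.8 moves the two lighter virtual servers off "a" and spreads them over the other nodes.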
Load Balancing Performance Trade-off: load movement vs. load balance Load balance is measured by the maximum node utilization When T decreases, the maximum node utilization decreases and load movement increases Effective in achieving load balance for system utilization as high as 90% Transfers only about 8% of the load that arrives in the system Emergency load balancing is necessary
Consistency Preserving Neighbor table A table of neighbor pointers For efficient routing in a P2P system Challenge How to maintain consistent neighbor tables in a dynamic network where nodes may join, leave and fail concurrently and frequently?
Consistency Preserving Consistent network For every entry in the neighbor tables, if there exists at least one qualified node in the network, then the entry stores at least one qualified node; otherwise, the entry is empty Qualified node for an entry of a node’s neighbor table: a node whose ID has the suffix required by that entry
Consistency Preserving K-consistent network For every entry in neighbor tables, if there exist H qualified nodes in the network, then the entry stores at least min(K,H) qualified nodes Otherwise, the entry is empty For K>0, K-consistency => consistency 1-consistency = consistency
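The definition can be checked mechanically. The sketch below assumes a suffix-based table layout in which entry (i, j) of node x requires the suffix "digit j followed by the rightmost i digits of x"; that layout, the string IDs and the function names are assumptions for illustration and are not taken from the slides.

```python
from typing import Dict, List, Set, Tuple

Table = Dict[Tuple[int, str], Set[str]]  # (level i, digit j) -> stored neighbor IDs


def required_suffix(node_id: str, level: int, digit: str) -> str:
    """Suffix a node must have to qualify for entry (level, digit) of node_id."""
    return digit + node_id[len(node_id) - level:]


def build_tables(nodes: List[str], k: int, digits: str = "01") -> Dict[str, Table]:
    """Fill every entry with up to k qualified nodes."""
    tables: Dict[str, Table] = {}
    for x in nodes:
        tables[x] = {}
        for level in range(len(x)):
            for digit in digits:
                suffix = required_suffix(x, level, digit)
                qualified = sorted(n for n in nodes if n.endswith(suffix))
                if qualified:
                    tables[x][(level, digit)] = set(qualified[:k])
    return tables


def is_k_consistent(tables: Dict[str, Table], k: int, digits: str = "01") -> bool:
    """Every entry stores only qualified nodes, and at least min(K, H) of them."""
    nodes = list(tables)
    for x, table in tables.items():
        for level in range(len(x)):
            for digit in digits:
                suffix = required_suffix(x, level, digit)
                qualified = {n for n in nodes if n.endswith(suffix)}
                stored = table.get((level, digit), set())
                if not stored <= qualified:
                    return False
                if len(stored) < min(k, len(qualified)):
                    return False
    return True
```

Dropping one of two stored neighbors from an entry breaks 2-consistency but leaves the network 1-consistent, i.e. consistent in the sense of the previous slide.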
Consistency Preserving General strategy Identify a consistent subnet as large as possible Only replace a neighbor with a closer one if both of them belong to the subnet Expand the consistent subnet after new nodes join Maintain consistency of the subnet when nodes fail
Consistency Preserving Approach of [LL2004b] Design a join protocol such that an initially K-consistent network remains K-consistent after the join processes of a set of nodes terminate The termination of a join implies that the joined node belongs to this consistent subnet Design a failure recovery protocol that recovers K-consistency of the subnet by repairing the holes left by failed neighbors with qualified nodes in the subnet The failure recovery protocol is presented in [LL2004a] and is integrated with the join protocol in the experiments of [LL2004b]
Consistency Preserving Join protocol Each node has a status: copying, waiting, notifying, cset_waiting or in_system S-node: a node in status in_system T-node: any other node All S-nodes form a consistent subnet
Consistency Preserving Status transitions of a joining node x: copying -> waiting -> notifying -> cset_waiting -> in_system copying: copy neighbor information from S-nodes to fill in most entries of its table, level by level; move to waiting when no qualified S-node can be found for a level i >= 1 waiting: try to find an S-node that shares at least the rightmost i-1 digits with x and stores x as a neighbor; move to notifying when such a node, say y, is found notifying: seek and notify nodes that share the rightmost j digits with x, where j is the lowest level at which x is stored in y’s table; move to cset_waiting when notification is finished cset_waiting: wait for the nodes that are joining concurrently and are likely to be in the same consistent subnet; move to in_system once all of them are confirmed to have exited the notifying status
Consistency Preserving Performance p-ratio: if the primary neighbor stored in an entry of x’s table is y, while the true primary neighbor should be z, then p-ratio = delay from x to y / delay from x to z K-consistency is always maintained in all experiments When K increases, p-ratio decreases More neighbor information is stored => more messages Even with massive joins and failures, tables are still optimized greatly
Comparing DHTs [DGPR2003] Each DHT algorithm has many details, making the algorithms difficult to compare. We use a component-based analysis approach Break the DHT design into independent components Analyze the impact of each component choice separately Two types of components Routing-level: neighbor & route selection System-level: caching, replication, querying policy, latency
Metrics Used Metrics used in comparison Flexibility: options in choosing neighbors and routes Resilience: does it still route when nodes go down? Load balancing: is the content evenly distributed? Proximity & latency: is the content stored nearby? Aspects of a DHT Geometry: the structure that inspires a DHT design Distance function: the distance between two nodes Algorithm: rules for selecting neighbors and routes using the distance function
Algorithm & Geometry What are the routing algorithm and the geometry? Routing algorithm: the exact rules for selecting neighbors and routes (e.g. Chord, CAN, PRR, Tapestry, Pastry) Geometry: the algorithm’s underlying structure, derived from the way in which neighbors and routes are chosen (e.g. Chord routes on a ring) Why is geometry important? Geometry captures the flexibility in the selection of neighbors and routes. Neighbor selection: does the geometry allow neighbors to be chosen based on proximity? Leads to shorter paths. Route selection: the number of options for selecting next hops. Leads to shorter, more reliable paths.
DHT Algorithms Analysis The table summarizes the geometries & algorithms. We examine flexibility in two respects: flexibility in neighbor selection and flexibility in route selection
Geometry | Algorithm
Tree | PRR
Hypercube | CAN
Butterfly | Viceroy
Ring | Chord
XOR | Kademlia
Hybrid | Pastry
Tree Geometry PRR uses the tree geometry. The distance between two nodes is the height of their smallest common subtree (at most log N in a well-balanced tree) Neighbor selection flexibility: a node has 2^(i-1) options for choosing its neighbor at distance i. No route selection flexibility [Figure: binary tree showing the root, subtrees of height 1 and height 2, and the leaf set]
Hypercube Geometry CAN uses a d-dimensional torus, which becomes a hypercube when d = log n. Each node has log n neighbors. Neighbors differ by exactly one bit, so there is no flexibility in choosing neighbors. Routing proceeds greedily by correcting the differing bits in any order. When routing from a source to a destination at distance log n, the first node has log n next-hop choices, the second has (log n) - 1 choices, and so on. Hence (log n)! possible routes
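The (log n)! count can be checked by enumerating the orders in which the differing bits are corrected. A minimal sketch, assuming node IDs are plain integers and every hop flips exactly one bit; the function name is illustrative.

```python
from itertools import permutations
from math import factorial
from typing import List


def hypercube_routes(src: int, dst: int, bits: int) -> List[List[int]]:
    """All shortest routes from src to dst, one per bit-correction order."""
    differing = [i for i in range(bits) if (src ^ dst) >> i & 1]
    routes = []
    for order in permutations(differing):
        node, path = src, [src]
        for bit in order:
            node ^= 1 << bit   # correct one differing bit per hop
            path.append(node)
        routes.append(path)
    return routes


if __name__ == "__main__":
    routes = hypercube_routes(0b000, 0b111, 3)
    print(len(routes), "routes; (log n)! =", factorial(3))
```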
Butterfly Geometry Viceroy uses butterfly geometry. Nodes organized in a series of log n “stages” where all the nodes at stage i are capable of correcting the i th bit. Routing consists of 3 phases. Done in O(log N) hops No flexibility in route selection and neighbor selection.
Ring Geometry Chord uses the ring. Each node maintains log n neighbors and routes to an arbitrary destination in O(log n) hops. Flexibility in neighbor selection: a node has 2^(i-1) possible options for picking its i-th neighbor, giving approximately n^((log n)/2) possible routing tables for each node. Flexibility in route selection: (log n)! possible routes from a source to a destination at distance log n
Ring Geometry To route from 000 to 110, we have two routes. Route to 100 and then to 110. Route to 010 and then to 110.
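The two routes in this example come from taking the same power-of-two hops in a different order. A small sketch, assuming an idealised ring in which every ID exists and each node can hop forward by any power of two; the function name is an assumption.

```python
from itertools import permutations
from typing import List, Tuple


def ring_routes(src: int, dst: int, bits: int) -> List[Tuple[int, ...]]:
    """Routes that cover the clockwise distance using its power-of-two hops."""
    size = 1 << bits
    distance = (dst - src) % size
    hops = [1 << i for i in range(bits) if distance >> i & 1]
    routes = set()
    for order in permutations(hops):
        node, path = src, [src]
        for hop in order:
            node = (node + hop) % size
            path.append(node)
        routes.add(tuple(path))
    return sorted(routes)


if __name__ == "__main__":
    for route in ring_routes(0b000, 0b110, 3):
        print(" -> ".join(format(n, "03b") for n in route))
```

For 000 to 110 the clockwise distance is 6 = 4 + 2, so the sketch yields the route via 010 and the route via 100.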
XOR Kademlia uses the XOR geometry. The distance between two nodes is the XOR of their identifiers. A node has 2^(i-1) options for choosing its i-th neighbor, which yields approximately n^((log n)/2) possible routing tables per node. Route flexibility: if the optimal next hop is not available, lower-order bits can be fixed before the higher-order bits. This may result in longer paths, as the lower-order bits that were fixed need not be preserved by later routing steps.
Hybrid Pastry is a hybrid. Its nodes are regarded both as leaves of a binary tree and as points on a one-dimensional circle. The distance between two nodes is either their tree distance or their cyclic distance. A node has 2^(i-1) options for choosing its neighbor at distance i, which yields approximately n^((log n)/2) possible routing tables per node. Route selection freedom: hops on the ring are allowed, but these paths might not retain the O(log n) bound on route length.
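A short sketch of the XOR metric and of the 2^(i-1) candidates for the i-th neighbor. It assumes integer IDs and numbers neighbors from i = 1, with the i-th neighbor drawn from XOR distances in [2^(i-1), 2^i); the function names are illustrative and not Kademlia's API.

```python
from typing import List


def xor_distance(a: int, b: int) -> int:
    """Distance between two identifiers in the XOR geometry."""
    return a ^ b


def closest_node(nodes: List[int], key: int) -> int:
    """Known node with the smallest XOR distance to the key."""
    return min(nodes, key=lambda n: xor_distance(n, key))


def ith_neighbor_candidates(node: int, i: int) -> List[int]:
    """All IDs at XOR distance in [2^(i-1), 2^i) from `node`."""
    return [node ^ d for d in range(1 << (i - 1), 1 << i)]


if __name__ == "__main__":
    print(len(ith_neighbor_candidates(0b0101, 3)), "candidates for the 3rd neighbor")
```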
Flexibility Overview
Property | Tree | Hypercube | Ring | Butterfly | XOR | Hybrid
Neighbor selection | n^((log n)/2) | 1 | n^((log n)/2) | 1 | n^((log n)/2) | n^((log n)/2)
Route selection (optimal) | 1 | c1 (log n) | c1 (log n) | 1 | 1 | 1
Natural support for sequential neighbors? | no | no | yes | no | no | Default: no; Fallback: yes
Ring & Hypercube have about twice the routing flexibility of the Hybrid & XOR geometries
Resilience Two aspects of robust routing Static resilience measures how well the algorithm can route after failures but before the recovery algorithms take effect. Dynamic recovery measures how quickly routing state is recovered after failures. With 30% of the nodes failed: Tree: 90% of routes failed (no route selection flexibility) Ring, Hypercube: 7% of routes failed (most route selection flexibility) Hybrid, XOR: 20% of routes failed (half the flexibility of the ring) Route selection flexibility affects static resilience
Path Latency The goal is to minimise the end-to-end latency of overlay networks. Two proximity methods are considered. Proximity Neighbor Selection (PNS): neighbors are chosen based on their proximity. Proximity Route Selection (PRS): routes are selected depending on the proximity of the neighbors. PNS improves on PRS, which in turn improves on the plain version. The geometry itself does not affect the performance of PNS / PRS. Thus it is important to choose a routing algorithm whose geometry accommodates PNS.
Local Convergence Do messages sent from two nodes to the same destination converge at a node near the two sources? Local convergence leads to low latencies in the following: Overlay multicast Caching Server selection Measured by the number of exit points in the network. In the best case, only one node sends a message off-domain.
Limitations & Findings Limitations The authors have not considered all geometries Other factors and performance metrics are not considered Findings Routing geometry is important. Flexibility improves resilience & proximity. Why not the ring? Great flexibility in choosing neighbors and routes. Can implement both proximity methods, PNS & PRS. Highest performance in the resilience tests, and as good as the other geometries in path length and local convergence.
Skip List [PSL1990] Skip lists are data structures that can be used in place of balanced trees. They use probabilistic balancing techniques, hence the algorithms are simpler and faster. A skip list is a sorted linked list in which some nodes are supplemented with pointers that skip over many list elements. [Figure: example skip list from HDR to NIL]
Perfect Skip List A perfect skip list is one where the height of the i-th node is the exponent of the largest power of two that divides i. Pointers at level h have length 2^h. A perfect skip list supports searches in O(log N). Because insertions and deletions are expensive in a perfect skip list, a probabilistically balanced skip list is proposed, in which node heights are chosen by consulting a random number generator. [Figure: perfect skip list from HDR to NIL; a node of height 2 sits at a multiple of 2^2 and a node of height 3 at a multiple of 2^3, and a level 2 pointer skips over 2^2 nodes]
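The height rule for a perfect skip list is a one-liner. A minimal sketch, assuming nodes are numbered from 1; the function name is illustrative.

```python
def perfect_height(i: int) -> int:
    """Exponent of the largest power of two that divides i (i >= 1)."""
    if i < 1:
        raise ValueError("node positions are numbered from 1")
    return (i & -i).bit_length() - 1   # i & -i isolates the lowest set bit


if __name__ == "__main__":
    print([perfect_height(i) for i in range(1, 17)])
```

Every second node has height at least 1 and every fourth node height at least 2, which is why a level-h pointer spans 2^h nodes.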
Examples Inserting nodes with randomly chosen heights: add node 10 (height 1), add node 5 (height 0), add node 8 (height 2), add node 12 (height 0), add node 2 (height 0). [Figure: the list between HDR and NIL after each insertion]
Search Skip List [Figure: example skip list from HDR to NIL] Search for node 30: from HDR to node 29, then stop; the search fails (illustrated). Search for node 23: from HDR to node 16, drop two levels, then from node 16 to node 23. Found. Search for node 27: from HDR to node 16, drop one level, from node 16 to node 25, drop one level, from node 25 to node 27. Found.
Skip List Worst-case performance occurs when the list is significantly unbalanced. Space efficient: can use as few as 1.33 pointers per element. Maintains O(log N) searches with high probability. Comparison with AVL trees, recursive 2-3 trees & self-adjusting trees Skip lists perform more comparisons than the other methods. Skip lists are slightly slower than AVL trees in searches, but insertions and deletions in a skip list are faster. Skip lists are faster than self-adjusting trees when a uniform distribution is encountered, but slower for highly skewed distributions
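Insertion and search in a probabilistic skip list can be sketched as follows. This is an illustrative version, not Pugh's code: the class and method names, the height cap of 16 and the option to pass an explicit height (used to replay the insertion example above) are assumptions.

```python
import random
from typing import List, Optional


class Node:
    def __init__(self, key, height: int):
        self.key = key
        # A node of height h has pointers at levels 0..h.
        self.next: List[Optional["Node"]] = [None] * (height + 1)


class SkipList:
    MAX_HEIGHT = 16

    def __init__(self, p: float = 0.5, seed: Optional[int] = None):
        self.head = Node(None, self.MAX_HEIGHT)  # HDR; None plays the role of NIL
        self.p = p
        self.rng = random.Random(seed)

    def random_height(self) -> int:
        h = 0
        while h < self.MAX_HEIGHT and self.rng.random() < self.p:
            h += 1
        return h

    def insert(self, key, height: Optional[int] = None) -> None:
        if height is None:
            height = self.random_height()
        node = Node(key, height)
        cur = self.head
        for level in range(self.MAX_HEIGHT, -1, -1):
            while cur.next[level] is not None and cur.next[level].key < key:
                cur = cur.next[level]
            if level <= height:          # splice the new node in at this level
                node.next[level] = cur.next[level]
                cur.next[level] = node

    def search(self, key) -> bool:
        cur = self.head
        for level in range(self.MAX_HEIGHT, -1, -1):
            while cur.next[level] is not None and cur.next[level].key < key:
                cur = cur.next[level]
        nxt = cur.next[0]
        return nxt is not None and nxt.key == key

    def keys(self, level: int = 0) -> list:
        out, cur = [], self.head.next[level]
        while cur is not None:
            out.append(cur.key)
            cur = cur.next[level]
        return out
```

Replaying the example insertions (10 with height 1, 5 with height 0, 8 with height 2, 12 and 2 with height 0) leaves all five keys on level 0, nodes 8 and 10 on level 1 and only node 8 on level 2.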
SkipNet Introduction [SNL2003] In DHTs, we cannot control where data is stored Data might be stored far away from its administrative domain, which makes privileges hard to administer. Can we adapt? This also gives rise to denial-of-service attacks and traffic analysis. Solution: use SkipNet, a scalable overlay network that provides controlled data placement and guaranteed routing locality by organizing data by string names Content can be placed on a pre-defined node or distributed uniformly across the nodes of a hierarchical naming subtree.
Motivation Disadvantages of Chord, CAN, Tapestry, Pastry: No content locality: the ability to explicitly place data on specific overlay nodes or to distribute it across the nodes of a specified domain. Without it, data is prone to traffic analysis & denial-of-service attacks No path locality: the guarantee that the routing path between two overlay nodes in a domain does not leave the domain. This adds security, since traffic is not passed through another domain, which could belong to a competitor. SkipNet provides both content & path locality.
How does SkipNet do it? Employs both a string name ID space and a numeric ID space. Node names and content identifier strings are mapped into the name ID space Hashes of the node names and content identifiers are mapped into the numeric ID space. By arranging content in name ID order rather than dispersing it, we can achieve content & path locality.
Advantages of locality Improved availability: data is stored within the organisation and can be searched even if the network is partitioned. Resilience against Internet failures: nodes within a cluster gracefully survive failures that disconnect the cluster from the rest of the Internet (a useful property of SkipNet) Performance: searches are faster as data is stored near the nodes that use it. Manageability: facilitates control and maintenance within an administrative domain Security: can deal with traffic analysis & denial-of-service attacks.
SkipNet Structure Adapts the skip list structure Traversals start from any node State and processing costs should be the same for all nodes We use a ring & doubly linked lists. Other enhancements: each node stores 2 log N pointers rather than a highly variable number of pointers. SkipNet variants Perfect: pointers at level h point to the nodes that are exactly 2^h nodes to the left and right. Probabilistic: a node at level h probabilistically determines which ring it belongs to.
SkipNet Structure SkipNet nodes are ordered by name ID: A D M O T V X Z. Routing tables of nodes A and V (level: clockwise, counter-clockwise) A: level 2: T, T; level 1: M, X; level 0: D, Z V: level 2: D, D; level 1: Z, O; level 0: X, T
SkipNet Structure [Figure: the full SkipNet routing infrastructure for an 8-node system, including the ring labels.] Level 0 (root ring): A D M O T V X Z Level 1: Ring 0 = A M T X, Ring 1 = D O V Z Level 2: Ring 00 = A T, Ring 01 = M X, Ring 10 = O Z, Ring 11 = D V Level 3: Ring 000 = A, Ring 001 = T, Ring 010 = M, Ring 011 = X, Ring 100 = Z, Ring 101 = O, Ring 110 = D, Ring 111 = V
Routing By Name ID Similar to search in skip lists At each node, the message is routed along the highest-level pointer, in either the clockwise or the counter-clockwise direction, that leads to a node whose name ID is not past the destination value. Terminates when the message arrives at the node whose name ID is closest to the destination. Because the rings are doubly linked, the scheme routes using either the left or the right pointers, depending on the name IDs. Number of hops is O(log N)
Example Routing a message from node A to node V Path: A (level 2, clockwise) -> T, since “T” < “V” T (level 2, clockwise): failed T (level 1, clockwise): failed T (level 0, clockwise) -> V (destination) Routing tables (level: clockwise, counter-clockwise) A: level 2: T, T; level 1: M, X; level 0: D, Z V: level 2: D, D; level 1: Z, O; level 0: X, T T: level 2: A, A; level 1: X, M; level 0: V, O
Routing Algorithm
SendMsg(nameID, msg) {
  if (LongestPrefix(nameID, localNode.nameID) == 0)
    msg.dir = RandomDirection();
  else if (nameID < localNode.nameID)
    msg.dir = counterClockwise;
  else
    msg.dir = clockwise;
  msg.nameID = nameID;
  RouteByNameID(msg);
}

// Invoked at all nodes (including the source and
// destination nodes) along the routing path.
RouteByNameID(msg) {
  // Forward along the longest pointer
  // that is between us and msg.nameID.
  h = localNode.maxHeight;
  while (h >= 0) {
    nbr = localNode.RouteTable[msg.dir][h];
    if (LiesBetween(localNode.nameID, nbr.nameID, msg.nameID, msg.dir)) {
      SendToNode(msg, nbr);
      return;
    }
    h = h - 1;
  }
  // h<0 implies we are the closest node.
  DeliverMessage(msg.msg);
}
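The name-ID routing above can be tried out on the 8-node example. The sketch below is a simplified stand-in for the pseudocode, not SkipNet's code: it builds the rings from the node names and numeric IDs of the example, picks the direction by plain string comparison, and ignores wrap-around and the random-direction case; all Python names are assumptions.

```python
from typing import Dict, List, Tuple

# name ID -> numeric ID, as in the 8-node example
NODES = {"A": "000", "T": "001", "M": "010", "X": "011",
         "Z": "100", "O": "101", "D": "110", "V": "111"}

Tables = Dict[str, List[Tuple[str, str]]]


def build_tables(nodes: Dict[str, str]) -> Tables:
    """tables[name][h] = (clockwise, counter-clockwise) neighbor in the level-h ring."""
    depth = len(next(iter(nodes.values())))
    tables: Tables = {name: [] for name in nodes}
    for h in range(depth):
        rings: Dict[str, List[str]] = {}
        for name in sorted(nodes):                 # every ring is sorted by name ID
            rings.setdefault(nodes[name][:h], []).append(name)
        for ring in rings.values():
            for i, name in enumerate(ring):
                tables[name].append((ring[(i + 1) % len(ring)], ring[i - 1]))
    return tables


def lies_between(cur: str, nbr: str, dest: str, clockwise: bool) -> bool:
    """True if nbr is ahead of cur but not past dest (no wrap-around handling)."""
    return cur < nbr <= dest if clockwise else dest <= nbr < cur


def route_by_name(tables: Tables, src: str, dest: str) -> List[str]:
    """Follow the highest-level pointer that does not overshoot dest."""
    clockwise = dest > src
    path, cur = [src], src
    while True:
        for h in range(len(tables[cur]) - 1, -1, -1):
            nbr = tables[cur][h][0 if clockwise else 1]
            if lies_between(cur, nbr, dest, clockwise):
                cur = nbr
                path.append(cur)
                break
        else:
            return path                            # no pointer helps: deliver here


if __name__ == "__main__":
    print(" -> ".join(route_by_name(build_tables(NODES), "A", "V")))
```

Running it reproduces the example path from A to V via T, and the generated tables for A, V and T agree with the routing tables shown on the example slide.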
Routing By Numeric ID Routing begins at the level 0 ring and continues until a node is found whose numeric ID matches the destination numeric ID in the first digit. Messages are forwarded from a ring at level h, R_h, to a ring at level h+1, R_{h+1}, such that the nodes in R_{h+1} share h+1 digits with the destination numeric ID. Terminates when the message is delivered to the node with numeric ID = key, or, if none of the nodes in R_h share h+1 digits with the destination numeric ID, to the node in R_h whose numeric ID is closest to the destination’s numeric ID. Number of message hops is O(log N)
Routing By Numeric ID E.g., let Z = 1000 and O = 1001, and route from A to numeric ID 1011. Path: A (0000) -> D (1100, move up a level) -> O (1001, move up a level) -> Z (1000) -> O (1001, closest match for 1011; deliver). [Figure: the SkipNet rings of the 8-node system, from the root ring up to the rings labeled by 4-digit numeric IDs, with the route marked]
Routing Algorithm
// Invoked at all nodes (including the source and destination nodes) along the routing path.
// Initially: msg.ringLvl = -1, msg.startNode = msg.bestNode = null & msg.finalDestination = false
RouteByNumericID(msg) {
  if (msg.numID == localNode.numID || msg.finalDestination) {
    DeliverMessage(msg.msg);
    return;
  }
  if (localNode == msg.startNode) {
    // Done traversing current ring.
    msg.finalDestination = true;
    SendToNode(msg, msg.bestNode);
    return;
  }
  h = CommonPrefixLen(msg.numID, localNode.numID);
  if (h > msg.ringLvl) {
    // Found a higher ring.
    msg.ringLvl = h;
    msg.startNode = msg.bestNode = localNode;
  } else if (abs(localNode.numID - msg.numID) < abs(msg.bestNode.numID - msg.numID)) {
    // Found a better candidate for current ring.
    msg.bestNode = localNode;
  }
  // Forward along current ring.
  nbr = localNode.RouteTable[clockwise][msg.ringLvl];
  SendToNode(msg, nbr);
}
Benefits SkipNet supports routing by both name ID and numeric ID with the same data structure The bottom (root) ring is sorted by name ID, and the upper rings are formed according to numeric ID. For a given node, the SkipNet rings to which it belongs form precisely a skip list that is a ring & doubly linked.
Node Joins & Departure Node joins A new node finds the top-level ring that matches its numeric ID. It finds a neighbor in the top ring using a name ID search. Starting from one of these neighbors, it searches for its name ID at the next lower level and thus finds its neighbors at that level. This is repeated until it reaches the root ring. The existing nodes point to the new node only after it has joined the root ring. An insertion traverses O(log N) hops with high probability Node departure SkipNet can route correctly as long as the root-level ring is maintained. The other levels are regarded as optimization hints; upper-ring membership is maintained through a background repair process.
Example Join - Insert node O (101) Search by numeric ID 101 Highest attainable level is 2 O joins ring containing Z at level 2 Z forwards join message to D at next lower level 1 Proceed by searching by name ID in next lower levels D, V are neighbors in level 1 M, T are neighbors in level 0
Properties of SkipNet Content & Path Locality Nodes are named like DNS entries. Path locality holds for groups in which nodes share a single DNS suffix, by reversing DNS names: e.g. john.microsoft.com becomes com.microsoft.john Incorporating a node’s name ID into a content name guarantees that the content will be hosted on that node, e.g. com.microsoft.john/doc-name Constrained Load Balancing (CLB) A name is stored in two parts: a CLB domain and a CLB suffix, for example a document named msn.com/DataCenter!TopStories.html. Searching First search for any node in the CLB domain using a name ID search. Then search by numeric ID for the hash of the CLB suffix, constrained to the domain. Because the search is constrained by a name ID prefix, it uses the doubly linked list. This type of search affects performance by a factor of 2. CLB can be performed over a naming subtree but not over an arbitrary subset of nodes.
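The naming conventions on this slide are easy to illustrate. A minimal sketch: the function names are made up, and using SHA-1 to derive the numeric ID of the CLB suffix is an assumption chosen for illustration, not something stated in the slides.

```python
import hashlib
from typing import Tuple


def reverse_dns(name: str) -> str:
    """john.microsoft.com -> com.microsoft.john, so that nodes sharing a DNS
    suffix share a name ID prefix."""
    return ".".join(reversed(name.split(".")))


def split_clb(name: str) -> Tuple[str, str]:
    """Split a constrained-load-balancing name at '!' into (domain, suffix)."""
    domain, sep, suffix = name.partition("!")
    if not sep:
        raise ValueError("not a CLB name: missing '!'")
    return domain, suffix


def suffix_numeric_id(suffix: str) -> str:
    """Hash of the CLB suffix, used as the numeric ID to search for (assumed SHA-1)."""
    return hashlib.sha1(suffix.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(reverse_dns("john.microsoft.com"))
    domain, suffix = split_clb("msn.com/DataCenter!TopStories.html")
    print(domain, suffix, suffix_numeric_id(suffix)[:8])
```

The domain part is located by a name ID search, and the hash of the suffix is then looked up by numeric ID within that domain.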
Properties of SkipNet Fault tolerance: Only the neighbors at level 0 need to be maintained correctly Each node has 16 neighbors at level 0. Level 0 is repaired easily by contacting live nodes. Background stabilization mechanisms are employed after failures A failure along organizational boundaries only partitions the overlay into segments, which survive gracefully. Security: Nodes cannot create global names containing the suffix of a registered domain. Path locality avoids traffic analysis However, outbound traffic is still easily prone to analysis. Range queries: Ability to perform queries over contiguous ring segments.
Enhancements Use sparse & dense routing tables Use a density parameter k & non-binary random digits to the base k for the numeric ID. Duplicate pointer elimination Remove duplicate pointers from the routing table. An improvement of 25% can be achieved. Incorporate network proximity for routing by name ID Introduce a P-table for proximity routing. The goal of the P-table is to maintain routing in O(log N) hops while ensuring that each hop has low latency. It keeps track of nodes that are close to the local node in network distance.
Enhancements Incorporate Network proximity for routing by numeric id Add a C-table to incorporate network proximity when searching by numeric ID. Keeps track of nodes that are close and within CLB domain.
Design Alternative IP routing & DNS o Content placement by routing using IP and DNS lookup. Single overlay network o For content locality, name the node with the hash of the data object’s name. Requires a separate routing table for each object o Use a two-part naming scheme: a content name consists of a node address concatenated with a node-relative name. Does not support guaranteed path locality o Add constraints to messages to enforce path locality. However, this prevents routing from being consistent. o Use two-part segments with a numeric ID and a name ID, as in SkipNet. The result is a static form of constrained load balancing.
Design Alternative Multiple overlay network o Multiple overlays with membership could be considered. o Requires that access to other overlays are by gateways. o Access to data is constrained and load balanced within a single overlay not accessible to clients outside except via gateways. SkipNet provides explicit content placement, allows clients to dynamically define new DHTs over any name prefix scope and guarantees path locality within shared name prefix within a single infrastructure.
Experiments The authors ran experiments on the following: Basic SkipNet using only the R-Table Full SkipNet using the R-Table, P-Table and C-Table Pastry Chord The following lookup performance metrics are used Relative Delay Penalty (RDP): latency of the overlay path compared to the IP path Physical network hops: length of the overlay path measured in IP hops Number of failed lookups Other parameters (refer to the paper) Format of node names Organisation size Models for the distribution of nodes and data Host-generated or organisation-generated node names Simulation of domain isolation by failing an organisation’s link
Experiment Results Basic routing costs Full SkipNet and Pastry are locality aware while basic SkipNet and Chord are not; hence the former perform better. Non-uniform distribution of data does not affect performance. Routing entries per node [Table: routing entries per node for Chord, Basic SkipNet, Full SkipNet and Pastry] Locality of placement Measures physical network hops. Chord and Pastry have a constant number of physical hops because they are oblivious to the locality of data: they diffuse data throughout the network. SkipNet shows performance improvements as the locality of the data references increases.
Experiment Results Fault tolerance when an organisation is disconnected Locality improves fault tolerance. Chord and Pastry fail for local lookups because the data is diffused SkipNet keeps functioning and performs local lookups Constrained load balancing (within a domain) Studies the Relative Delay Penalty (RDP) as the number of nodes increases Basic CLB using only the R-Table causes higher delay penalties Full CLB causes intermediate delay penalties Pastry has low delay penalties. Network proximity Studies the effect on RDP of the density k, which controls the number of P-Table entries. RDP levels off after k=8 because of the increase in the number of pointers in the P-Table
SkipNet Summary SkipNet is the first P2P system that achieves both path and content locality. Provides content locality at the desired degree and granularity. Clustering node names allows SkipNet to perform gracefully in the face of link failures. Performance is similar to that of other P2P systems such as Chord and Pastry under uniform access patterns. Under access patterns where intra-organisation traffic predominates, SkipNet performs better. SkipNet is also more resilient to network partitions than other P2P systems.
Conclusion Looked at hash-based techniques in P2P Pastry P-Grid Two important issues Load balancing Neighbor table consistency preserving Comparison of DHT techniques SkipNet: a skip list adaptation
References
[CAN2001] Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Shenker. A Scalable Content-Addressable Network. SIGCOMM’01, August.
[CPLS2001] Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan. Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications. SIGCOMM’01, August 27-31.
[CSWH2000] I. Clarke, O. Sandberg, B. Wiley, and T. W. Hong. Freenet: A distributed anonymous information storage and retrieval system. Proc. of ICSI Workshop on Design Issues in Anonymity and Unobservability.
[DGPR2003] K. Gummadi, R. Gummadi, S. Gribble, S. Ratnasamy, S. Shenker, I. Stoica. The Impact of DHT Routing Geometry on Resilience and Proximity. SIGCOMM’03, August 25–29.
[LL2004a] S. S. Lam and H. Liu. Failure recovery for structured P2P networks: Protocol design and performance evaluation. In Proc. of ACM SIGMETRICS, June.
[LL2004b] Consistency-preserving Neighbor Table Optimization for P2P Networks. Technical Report TR-04-01, Dept. of CS, Univ. of Texas at Austin, January 2004.
References (cont.)
[GLSKS2004] Load Balancing in Dynamic Structured P2P Systems. Proc. of IEEE INFOCOM, Portland, Oregon, USA.
[PSL1990] William Pugh. Skip lists: A probabilistic alternative to balanced trees. Communications of the ACM, June 1990.
[RD2001] A. Rowstron and P. Druschel. Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems. In Proc. of the 18th IFIP/ACM International Conf. on Distributed Systems Platforms, November.
[SMKKB2001] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, H. Balakrishnan. Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications. Proc. of SIGCOMM ’01, San Diego, California, USA.
[SML+2004] I. Stoica, R. Morris, D. Liben-Nowell, D. R. Karger, M. F. Kaashoek, F. Dabek, and H. Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. Proc. of the 2001 ACM Annual Conference of the Special Interest Group on Data Communication (ACM SIGCOMM’01).
[SNL2003] Nicholas J.A. Harvey, Michael B. Jones, Stefan Saroiu, Marvin Theimer, Alec Wolman. SkipNet: A Scalable Overlay Network with Practical Locality Properties. Proceedings of the Fourth USENIX Symposium on Internet Technologies and Systems (USITS ’03), Seattle, WA, March 2003.