1 Plaxton Routing
2 History Greg Plaxton, Rajmohan Rajaraman, Andrea Richa. Accessing nearby copies of replicated objects, SPAA 1997. Used in several important P2P networks such as Microsoft’s Pastry and UC Berkeley’s Tapestry (Zhao, Kubiatowicz, Joseph et al.); Tapestry is also the backbone of OceanStore, a blueprint for a persistent global-scale storage system.
3 Goal A set A of m objects resides in a network G. Plaxton et al. proposed an algorithm for accessing such a shared object through a nearby copy of it. Supported operations are insert, delete, and read (but not write) on shared objects. [What is the challenge here? Think about this: if every node maintains a copy, then read is fast, but insert and delete are very expensive, and the storage overhead grows. The algorithm must be efficient with respect to both time and space.]
4 Main idea Embed an n-node virtual height-balanced tree T into the network. Each node u maintains information about copies of the object in the subtree of T rooted at u. To read, u checks the local memory of the subtree under it. If a copy of the object or a pointer to it is available, u uses that information; otherwise, it passes the request to its parent in T.
5 Plaxton tree A Plaxton tree (some call it a Plaxton mesh) is a data structure that allows peers to efficiently locate objects and route to them across an arbitrarily sized network, using a small routing map at each hop. Each node serves as a client, a server, and a router.
6 Object and Node names Objects and nodes acquire names independent of their location and semantic properties, in the form of random fixed-length bit sequences represented in a common base (e.g., 40 hex digits representing 160 bits), using the output of hashing algorithms such as SHA-1 (which leads to a roughly even distribution in the name space). Assume that n is a power of 2^b, where b is a fixed positive integer. Thus, name of object B = 1032 (base 4 = 2^2, so b = 2), name of object C = 3011, etc. Here n = 256.
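The naming scheme above can be sketched as follows. This is a minimal illustration, not the scheme of any particular system: the helper `name_of` and its parameters `digits` and `bits_per_digit` are hypothetical, and it assumes the name is taken from the low-order bits of a SHA-1 digest.

```python
import hashlib

DIGITS = "0123456789abcdef"  # supports bases up to 16 (bits_per_digit <= 4)

def name_of(label: str, digits: int = 4, bits_per_digit: int = 2) -> str:
    """Hypothetical helper: derive a fixed-length name in base 2**bits_per_digit.

    The name is built from the low-order bits of the SHA-1 digest of `label`.
    """
    base = 1 << bits_per_digit
    value = int.from_bytes(hashlib.sha1(label.encode("utf-8")).digest(), "big")
    out = []
    for _ in range(digits):
        out.append(DIGITS[value % base])  # peel off the lowest digit
        value //= base
    return "".join(reversed(out))

# A 4-digit base-4 name, as in the n = 256 example (4**4 = 256 possible names).
print(name_of("object B"))
# 40 hex digits cover all 160 bits of the SHA-1 output.
print(name_of("object B", digits=40, bits_per_digit=4))
```

Because the name depends only on the hash of the label, it carries no information about where the object or node is located.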
7 Neighbor map A level-i entry matches a suffix of i digits. Number of entries per level = ID base (here it is 16). Each entry is the suffix-matching node with least cost. If no such entry exists, then pick the one with the highest id and largest suffix match. These are all primary entries. Size of the table: b · log_b(n), where b here denotes the ID base. [Figure: neighbor map of a node x with levels L0 to L2 and example destinations 4307 and 7353; at each level the primary neighbor is the cheapest candidate, i.e., c(x,y1) < c(x,y1’) and c(x,y2) < c(x,y2’).]
8 Neighbor map of 5642 Level i matches i suffix entries. Number of entries per level = ID base (here it is 8). Each entry is the suffix-matching node with least cost. If no such entry exists, then pick the one with the highest id and largest suffix match. These are all primary entries. [Figure: neighbor map of node 5642 with levels L0, L1, L2, L3.]
9 Neighbor map In addition to primary entries, each level contains secondary entries. A node u is a secondary entry of node x at level i if (1) it is not the primary neighbor y, and (2) c(x,u) is at most d · c(x,w) for all w with a matching suffix of size i, where d is a constant. Finally, each node stores reverse neighbors for each level: node y is a reverse neighbor of x iff x is a primary neighbor of y. Drawback (-): all entries are statically chosen, and this needs global knowledge.
10 Routing Algorithm Assume that the destination node is a tree root. To route to node (xyz): (1) let n be the length of the suffix shared with the destination so far; (2) look at level n+1 of the neighbor map; (3) match the next digit of the destination id; (4) send the message to that node. Eventually the message gets relayed to the destination.
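The routing steps can be sketched in a few lines. This is a simplified illustration under stated assumptions: instead of precomputed neighbor maps, each hop picks the cheapest node in a global node list that extends the shared suffix by one digit, the destination is assumed to exist in the network, and `octal_distance` is a made-up stand-in for the cost function c.

```python
def shared_suffix_len(a: str, b: str) -> int:
    """Number of trailing digits that two equal-length ids have in common."""
    n = 0
    while n < len(a) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def next_hop(current: str, dest: str, nodes, cost) -> str:
    """Pick the cheapest node that matches one more suffix digit of dest."""
    n = shared_suffix_len(current, dest)
    wanted = dest[len(dest) - (n + 1):]  # suffix of length n + 1
    candidates = [v for v in nodes if v.endswith(wanted)]
    return min(candidates, key=lambda v: cost(current, v))

def route(source: str, dest: str, nodes, cost):
    """Return the list of nodes visited from source to dest (dest must be in nodes)."""
    path = [source]
    while path[-1] != dest:
        path.append(next_hop(path[-1], dest, nodes, cost))
    return path

def octal_distance(u: str, v: str) -> int:
    """Toy cost function (an assumption): distance between ids read as octal numbers."""
    return abs(int(u, 8) - int(v, 8))

nodes = ["0325", "1453", "2353", "7353", "4307"]
print(route("0325", "7353", nodes, octal_distance))
# → ['0325', '1453', '2353', '7353']
```

Each hop fixes at least one more trailing digit, so the number of hops is bounded by the number of digits in the id.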
11 Example of suffix routing Consider a 2^18 namespace. How many hops?
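One way to reason about the question: suffix routing resolves at least one digit per hop, so the hop count is bounded by the number of digits in an id. The slide does not state the digit base, so the sketch below treats it as a parameter; `max_hops` is a hypothetical helper.

```python
def max_hops(namespace_bits: int, bits_per_digit: int) -> int:
    """Upper bound on hops: number of base-2**bits_per_digit digits in an id."""
    return -(-namespace_bits // bits_per_digit)  # ceiling division

# For an 18-bit namespace:
print(max_hops(18, 3))  # octal digits → 6
print(max_hops(18, 4))  # hex digits → 5
print(max_hops(18, 1))  # binary digits → 18
```

A larger base means fewer hops but more entries per level in each neighbor map.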
12 Choosing the root For each object, the root node simply stores a pointer to the server that stores the object. The root does not hold a copy of the object itself. An object’s root or surrogate node is chosen as a node whose id matches the object’s id in the largest number of trailing bit positions. In case there are multiple such nodes, choose the node with the largest such id.
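The root selection rule can be sketched as below. This is an illustration only: ids are modeled as plain integers of `width` bits, and `trailing_match_bits` and `choose_root` are hypothetical names. Like the neighbor tables, it assumes global knowledge of all node ids.

```python
def trailing_match_bits(a: int, b: int, width: int) -> int:
    """Count the trailing bit positions in which a and b agree."""
    diff = a ^ b
    if diff == 0:
        return width  # identical ids agree in every position
    return (diff & -diff).bit_length() - 1  # index of the lowest differing bit

def choose_root(object_id: int, node_ids, width: int = 5) -> int:
    """Node with the longest trailing match; ties go to the largest id."""
    return max(node_ids, key=lambda v: (trailing_match_bits(v, object_id, width), v))

obj = 0b01011
print(bin(choose_root(obj, [0b00011, 0b00111, 0b01100])))  # → 0b11
print(bin(choose_root(obj, [0b00011, 0b10011, 0b00111])))  # → 0b10011 (tie broken by largest id)
```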
13 Pointer list Each node x has a pointer list P(x): a list of triples (O,S,k), where O is the object, S is the node that holds a copy of O, and k is an upper bound on c(x,S). The pointer list is updated by insert and delete operations. [Figure: tree rooted at the root of O with servers S and S’; nodes on the paths to the root hold pointers such as (O,S,1), (O,S,2), (O,S’,1), (O,S’,2), (O,S’,3).]
14 Inserting an Object To insert an object, the publishing process computes the root and sends the message (O,S) towards the root node. At each hop along the way, the publishing process stores location information in the form of a mapping from O to S. [Figure: server S sends (O,S) towards the root of O; nodes along the paths hold pointers (O,S,1), (O,S,2), (O,S’,3), (O,S’,4), (O,S’,5).]
15 Inserting another copy Intermediate nodes maintain the cost of access. While inserting a duplicate copy, the pointers are modified so that they point to the closest copy. [Figure: servers S and S’ below the root of O; pointers (O,S,1), (O,S,2), (O,S’,1), (O,S’,2), (O,S’,3) along the paths to the root.]
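A minimal sketch of insertion with pointer lists, under simplifying assumptions: the path from the server to the root and the per-hop costs are given, k is taken to be the accumulated path cost, and each node keeps only its cheapest pointer per object. The function `publish` and the dictionary layout are hypothetical.

```python
def publish(pointers: dict, obj: str, server: str, path, hop_costs) -> None:
    """Store (obj, server, k) at every node on the path towards the root.

    path[i] is the i-th node after the server; hop_costs[i] is the cost of
    the hop that reaches path[i]. An existing pointer is replaced only when
    the new copy is strictly closer.
    """
    k = 0
    for node, hop_cost in zip(path, hop_costs):
        k += hop_cost  # accumulated cost from this node back to the server
        current = pointers.setdefault(node, {}).get(obj)
        if current is None or k < current[1]:
            pointers[node][obj] = (server, k)

pointers = {}
publish(pointers, "O", "S", ["a", "b", "root"], [1, 1, 1])
publish(pointers, "O", "S'", ["b", "root"], [1, 1])  # duplicate copy, closer to b
print(pointers["a"]["O"])     # → ('S', 1)
print(pointers["b"]["O"])     # → ("S'", 1)
print(pointers["root"]["O"])  # → ("S'", 2)
```

After the second insert, node a still points to S, while b and the root have been redirected to the closer copy at S’.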
16 Read The client routes towards the root of O and then routes to S once a pointer (O,S) is found. If any intermediate node stores (O,S), then that leads to a faster discovery. Also, nodes handling requests en route communicate with their primary and secondary neighbors to check for the existence of another close-by copy. [Figure: servers S and S’ below the root of O; pointers (O,S,1), (O,S,2), (O,S’,1), (O,S’,2), (O,S’,3) along the paths to the root.]
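The read operation can be sketched as a walk towards the root that stops at the first node holding a pointer for the object. This is a simplified illustration: the check with primary and secondary neighbors is omitted, and the function `read` and the `pointers` layout (node → object → (server, k)) are hypothetical.

```python
def read(pointers: dict, obj: str, path_to_root):
    """Return (server, hops_travelled) for the first pointer found, else None.

    path_to_root lists the nodes visited from the client up to the root of obj.
    """
    for hops, node in enumerate(path_to_root):
        entry = pointers.get(node, {}).get(obj)
        if entry is not None:
            server, _k = entry
            return server, hops  # the request is redirected to this server
    return None  # no copy of obj has been inserted

pointers = {"b": {"O": ("S", 2)}, "root": {"O": ("S", 3)}}
print(read(pointers, "O", ["c", "b", "root"]))  # → ('S', 1)
print(read(pointers, "X", ["c", "b", "root"]))  # → None
```

In the first call the pointer is found at intermediate node b, so the request never reaches the root.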
17 Deleting an object Deletion removes all pointers to that copy of the object from the pointer lists of the nodes leading to the root. Otherwise, it uses the reverse pointers to update the entry. [Figure: servers S and S’ below the root of O; pointers (O,S,1), (O,S,2), (O,S,3), (O,S,4), (O,S,5), (O,S’,1), (O,S’,2), (O,S’,3) along the paths to the root.]
18 Results Let C = max{c(u,v) : u,v in V}. Cost of read = O(f(L(A)) c(x,y)), where L is the length of the message. If there is no shared copy, then it is O(C). Cost of insert = O(C). Cost of delete = O(C log n). Auxiliary memory to store q objects = O(q log^2 n) w.h.p.
19 Benefits and Limitations + Scalable solution + Existing objects are guaranteed to be found - Needs global knowledge to form the neighbor tables - The root node for each object may become a bottleneck.
20 Benefits and limitations + Simple fault handling + Scalable: all routing is done using locally available data + Optimal routing distance