Turning Heterogeneity into an Advantage in Overlay Routing (To be presented at IEEE Infocom’03) Zhichen Xu, Mallik Mahalingam, Magnus Karlsson Internet Systems and Storage Lab Hewlett-Packard Company
9/20/2018 Motivation: For a large distributed system to function well, it must be scalable, fault-tolerant, secure, and reliable, and have low maintenance cost. Distributed hash table (DHT) based overlay networks provide a simple abstraction that maps “keys” to “values”. They can be used in many important applications, which then inherit these properties, e.g., distributed storage, DNS, media streaming, web caching, content-based searching, and distributed firewalls. Several proposals exist: Pastry, Tapestry, CAN, eCAN, SkipNet, etc. They provide a homogeneous abstraction to applications, but vary in their logical structures and flexibility.
Baseline DHT, a 2-dimensional CAN: The Cartesian space is partitioned into zones. A node serves as the “owner” of a zone. A key is a “point” in the Cartesian space. The “value” is stored on the node that owns the zone containing the point (key). (Figure: nodes and their zones.)
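A minimal Python sketch of the key-to-zone mapping on this slide may help. The way the key is hashed to a point (two slices of a SHA-1 digest), the `Zone` class, and the function names are illustrative assumptions, not details given in the slides.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    """A rectangular zone [x0, x1) x [y0, y1) of the unit square (hypothetical type)."""
    owner: str
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x, y):
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1


def key_to_point(key):
    """Map a key to a point in [0, 1) x [0, 1); the hashing scheme is an assumption."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    x = int.from_bytes(digest[:4], "big") / 2**32
    y = int.from_bytes(digest[4:8], "big") / 2**32
    return x, y


def owner_of(key, zones):
    """Return the owner of the zone that contains the key's point."""
    x, y = key_to_point(key)
    for zone in zones:
        if zone.contains(x, y):
            return zone.owner
    raise LookupError("zones do not cover the point")
```

With zones that cover the whole unit square, every key resolves to exactly one owner, which is the node that stores the value.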
Low maintenance cost & self-organizing: Node join: the new node picks a point and splits the zone with the node that currently owns the point. Node departure: a neighboring node takes over the “state” of the departing node. These dynamics are shielded from the users and applications! (Figure: a new node and its new zone.)
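The join step can be sketched in Python as follows. Splitting along the longer side and giving the new node the half that contains its chosen point are assumptions made for illustration; the slide only says that the zone is split.

```python
def split_zone(zone, point):
    """Split zone (x0, y0, x1, y1) in half for a joining node.

    Returns (new_node_zone, previous_owner_zone). The new node is given the
    half containing `point` (an assumption of this sketch).
    """
    x0, y0, x1, y1 = zone
    if (x1 - x0) >= (y1 - y0):
        mid = (x0 + x1) / 2
        low, high = (x0, y0, mid, y1), (mid, y0, x1, y1)
        in_high = point[0] >= mid
    else:
        mid = (y0 + y1) / 2
        low, high = (x0, y0, x1, mid), (x0, mid, x1, y1)
        in_high = point[1] >= mid
    return (high, low) if in_high else (low, high)


# A node joins by picking the point (0.8, 0.3) inside the whole space.
new_zone, old_zone = split_zone((0.0, 0.0, 1.0, 1.0), (0.8, 0.3))
```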
Logical routing: traverse a series of neighboring zones from source to destination. (Figure: logical hops 1, 2, 3.)
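As a rough illustration of logical routing, the Python sketch below forwards greedily through neighboring zones. It assumes equal-size zones laid out as a square grid without wrap-around, and identifies each zone by its (column, row) cell; neither assumption comes from the slides.

```python
def greedy_route(src, dst, side):
    """Route from cell `src` to cell `dst` on a side x side grid of zones.

    At each hop the message moves to the neighboring zone that is closest
    (in grid distance) to the destination.
    """
    path = [src]
    cur = src
    while cur != dst:
        x, y = cur
        neighbors = [
            (x + dx, y + dy)
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < side and 0 <= y + dy < side
        ]
        cur = min(neighbors, key=lambda n: abs(n[0] - dst[0]) + abs(n[1] - dst[1]))
        path.append(cur)
    return path


path = greedy_route((0, 0), (3, 2), side=4)
```

The number of logical hops is `len(path) - 1`; each of those hops may still cross several physical links.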
Each logical hop can correspond to multiple physical hops. It is important that the structure of the overlay uses the underlying physical network efficiently! (Figure: logical hops 1, 2, 3 and the physical paths they follow.)
Techniques for achieving proximity awareness. Within the overlay [Castro et al]: Geographic layout, e.g., topologically-aware CAN: uneven distribution of the nodes and a chance of overloading nodes. Proximity routing, e.g., Chord: choices are limited (figure: source s, destination d, and next-hop candidates 1 to 3, with the candidate closest to s marked). Proximity-neighbor selection, e.g., Pastry, Tapestry, eCAN: routing table entries are selected according to a proximity metric among the nodes that satisfy the constraint; performance is constrained by the logical structure of the default overlay. Auxiliary networks, e.g., Brocade: construct a secondary overlay network; still use logical routing in the secondary network; this pushes the problem to an auxiliary network of a smaller size; there is a dilemma in picking the size of the secondary network.
Our contributions: Decouple the homogeneous abstraction from routing. Construct an auxiliary routing network using an AS-level topology derived from BGP reports. A landmark-numbering scheme. Route advertisement using a “distance vector” algorithm with route summarization to reduce state. Works with all currently existing overlays. Simulation results show that our approach can achieve close to optimal routing performance: 1.04 to 1.12 times optimal for an Internet-like topology, whereas previous approaches are 2.5 to 5 times optimal for the same topology.
Outline: Motivation. Related work. Default overlay network: eCAN. Expressway: unconstrained auxiliary network (How does a node find the close-by nodes? How do we control the routing state? What can the expressway be used for?). Experimental results. Discussions & conclusions.
eCAN represents the state of the art. The default CAN zones are order-1 zones.
K default CAN zones make an order-2 zone. (Figure: order-2 zones.)
K order-2 zones make an order-3 zone.
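The nesting of zone orders can be sketched in Python under strong simplifying assumptions: equal-size order-1 zones on a 2-d grid, and K = 4, so that each higher-order zone is a 2 x 2 block of zones one order lower. The function name and the `(column, row)` addressing are hypothetical.

```python
def high_order_zone(cell, order, per_side=2):
    """Return the (column, row) of the order-`order` zone containing `cell`.

    `cell` is the (column, row) of an order-1 zone; K = per_side ** 2
    zones of one order make up a zone of the next order.
    """
    scale = per_side ** (order - 1)
    return (cell[0] // scale, cell[1] // scale)


cell = (5, 2)
zones_by_order = {k: high_order_zone(cell, k) for k in (1, 2, 3)}
```

Two order-1 zones belong to the same order-k zone exactly when `high_order_zone` returns the same pair for both.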
High-order routing neighbors: High-order routing tables are soft state. This allows for proximity-neighbor selection. Neighbor selection is based on landmark clustering / controlled data placement. Topology-aware Chord is equivalent to 1-d eCAN.
Expressway definitions & challenges: Expressway nodes are nodes that have good connectivity and availability. Expressway nodes connect to other close-by expressway nodes to form a backbone. Ordinary nodes connect to the closest expressway node. Traffic goes through the expressway, if possible. Challenges: How does a node (ordinary or expressway) find the close-by expressway nodes? How do we control the routing state? What can the expressway be used for?
Landmark clustering: Each node measures di, its distance to landmark i. The distances form its landmark vector <d1, d2, d3>, a point in the landmark space (figure: Landmark1, Landmark2, Landmark3). Nodes with similar distances to the landmarks are likely to be close to each other. Related work: landmark ordering [Ratnasamy et al 2002]; coordinate-based [Eugene and Zhang 2001].
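A small Python sketch of landmark vectors follows. In a real deployment the distances would be measured network delays; here, purely as an assumption for illustration, nodes and landmarks are points in a plane and the distance is Euclidean.

```python
import math


def landmark_vector(node, landmarks):
    """Return <d1, ..., dn>: the distance from `node` to each landmark."""
    return tuple(math.dist(node, lm) for lm in landmarks)


def vector_distance(u, v):
    """Distance between two landmark vectors in the landmark space."""
    return math.dist(u, v)


landmarks = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
a = landmark_vector((1.0, 1.0), landmarks)
b = landmark_vector((1.5, 1.0), landmarks)  # physically near a
c = landmark_vector((9.0, 9.0), landmarks)  # physically far from a
```

In this toy layout `vector_distance(a, b)` is much smaller than `vector_distance(a, c)`, matching the intuition that nodes with similar landmark vectors are likely close to each other.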
Locating close-by expressway node: The landmark vector is used as the key to store information about the expressway nodes on the DHT, such that distances in the “landmark space” are preserved (figure: nodes a, b, c in the landmark space and their positions on the DHT). A node uses its landmark vector to search the DHT for close-by nodes. Expressway nodes find and connect to physically close-by expressway nodes to form the expressway network.
But the dimensionality of the landmark space and that of the DHT can be different, so a dimension reduction is needed. (Figure: nodes a, b, c mapped from the landmark space to the DHT.)
Space-filling curves: Hilbert curve. Points close to each other in n-d space are mapped to points close to each other in 1-d space, and vice versa. (Figure: a Hilbert curve visiting cells 1 to 8.)
Proximity-preserving dimension reduction of landmark vectors: landmark numbering. (Figure, panels (a) and (b): cells numbered along the curve; the number of a cell is the landmark number.)
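The landmark numbering idea can be sketched in Python with the standard iterative Hilbert-curve index for a 2-d grid. Restricting the landmark vector to two components, the grid resolution `n`, and the `max_distance` used for quantization are all assumptions of this sketch; the slides do not specify them.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along the Hilbert curve of an n x n grid.

    `n` must be a power of two.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def landmark_number(vector, max_distance, n=8):
    """Quantize a 2-component landmark vector to a grid cell and number it."""
    cells = [min(int(d / max_distance * n), n - 1) for d in vector]
    return hilbert_index(n, cells[0], cells[1])
```

Consecutive numbers always fall in adjacent cells; cells that are adjacent in the grid usually, but not always, receive nearby numbers, so the mapping preserves proximity only approximately.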
Discussions: A similar procedure can be used for other overlays. For Chord, we use the landmark number as the DHT key to store information about the expressway nodes on a node whose ID is greater than or equal to the landmark number. For Tapestry and Pastry, we can use a prefix of the node IDs to partition the logical space into grids. In summary, our goal is to store expressway node information such that information about close-by nodes is stored close together on the overlay. In contrast, Pastry, for example, relies on the ability to find the physically closest node at node join and requires message exchanges to fix up the existing routing tables.
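For the Chord case, the placement rule can be sketched in Python as a successor lookup. Choosing the first node whose ID is greater than or equal to the key, and wrapping around the ring when no such node exists, follows the usual Chord convention and is an assumption here; the function name is hypothetical.

```python
import bisect


def successor(node_ids, key, ring_size):
    """Return the first node ID >= key on the ring, wrapping around at the end."""
    ids = sorted(node_ids)
    i = bisect.bisect_left(ids, key % ring_size)
    return ids[i % len(ids)]


# Information about an expressway node with landmark number 41 would be
# stored on the node returned here.
holder = successor([10, 40, 90], 41, ring_size=128)
```

Because nearby landmark numbers tend to resolve to the same or neighboring nodes, information about physically close expressway nodes ends up stored close together on the ring.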
Route advertisement with summarization: An expressway node periodically advertises all local nodes that are in its physical proximity to neighboring expressway nodes. This is the same as the standard distance vector algorithm, except that it advertises a summary of multiple nodes together with the transport address of one representative node, and that only expressway nodes participate in route advertisement. Route advertisement messages are controlled with a time-to-live (TTL), expressed as the number of expressway hops.
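A simplified Python sketch of TTL-limited distance-vector advertisement among expressway nodes is given below. It runs in synchronous rounds, with one round per expressway hop, and lets each node advertise a single summary ID; both simplifications, and all names, are assumptions of the sketch rather than the protocol from the slides.

```python
def advertise(links, local, ttl):
    """Propagate route summaries for `ttl` expressway hops.

    links: {node: {neighbor: delay}} between expressway nodes.
    local: {node: summary_id} advertised by each expressway node.
    Returns {node: {summary_id: (cost, next_hop)}}.
    """
    tables = {n: {local[n]: (0, n)} for n in links}
    for _ in range(ttl):
        updated = {n: dict(t) for n, t in tables.items()}
        for n, neighbors in links.items():
            for nb, delay in neighbors.items():
                # Merge the neighbor's table from the previous round.
                for summary, (cost, _) in tables[nb].items():
                    new_cost = cost + delay
                    if summary not in updated[n] or new_cost < updated[n][summary][0]:
                        updated[n][summary] = (new_cost, nb)
        tables = updated
    return tables


links = {
    "A": {"B": 10},
    "B": {"A": 10, "C": 10},
    "C": {"B": 10, "D": 10},
    "D": {"C": 10},
}
local = {"A": "gA", "B": "gB", "C": "gC", "D": "gD"}
tables = advertise(links, local, ttl=2)
```

With `ttl=2`, node A learns about `gC` through B but never hears about `gD`, which is three expressway hops away; a larger TTL trades more routing state for wider coverage.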
Route summarization: aggregate multiple nodes. Nodes whose zones fall in a virtual grid are summarized by the ID of the virtual grid. The pair <GridID, IP of representative node> is propagated. For CAN, we partition the Cartesian space into virtual grids. For Pastry, we can summarize multiple nodes with a nodeID prefix. For Chord, we can summarize multiple nodes with a nodeID range. (Figure: virtual grids numbered 1 to 15, with a representative node marked.)
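For the CAN case, summarization by virtual grid can be sketched in Python as follows. The row-major grid numbering, the use of the zone center to decide which grid a node falls in, and the choice of the first node seen as representative are assumptions; the example addresses are made up.

```python
def grid_id(x, y, grids_per_side):
    """ID of the virtual grid containing point (x, y) of the unit square."""
    col = min(int(x * grids_per_side), grids_per_side - 1)
    row = min(int(y * grids_per_side), grids_per_side - 1)
    return row * grids_per_side + col


def summarize(nodes, grids_per_side):
    """Collapse nodes into {GridID: IP of representative node}.

    nodes: iterable of (ip, x, y), where (x, y) is the center of the node's zone.
    """
    summary = {}
    for ip, x, y in nodes:
        summary.setdefault(grid_id(x, y, grids_per_side), ip)
    return summary


nodes = [("10.0.0.1", 0.1, 0.1), ("10.0.0.2", 0.2, 0.2), ("10.0.0.3", 0.9, 0.9)]
summary = summarize(nodes, grids_per_side=2)
```

Here three nodes collapse into two advertised pairs, which is how summarization reduces routing state.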
Direct route vs. expressway-node forwarding. Direct route: requires slightly more storage space to keep the route summary, and relies on IP routing. Expressway-node forwarding: if a node leaves the system, it is less expensive to repair; may deliver routing performance better than default IP routing [RON 2001, Detour 1999]; ordinary nodes cache addresses of nodes associated with the same expressway node. (Figure: source and destination nodes attached to expressway nodes, with the direct route marked.)
Experimental evaluation: 2-d eCAN as default overlay. AS topology: 1000 ASes from a total of 13,000 active ASes; assume 100 ms inter-AS delay and 10 ms intra-AS delay; a node is assigned to one of the 1000 ASes. Transit-stub graph using GT-ITM: 10,000 nodes, 228 transit domains, 5 nodes per transit domain, 4 stub domains per transit node, and 2 nodes in each stub domain; 100 ms for cross-transit links, 20 ms for links inside a transit domain, 5 ms for links connecting a transit node and a stub node, and 2 ms for links inside a stub. Compare against eCAN with roughly the same amount of state, and against logical auxiliary: a Brocade-like system that uses a homogeneous auxiliary logical overlay network.
eCAN with similar state: For fairness, we compare with eCAN with similar state. How do we make use of the additional state? Rather than always routing to the physically closest next-hop candidate, we route to the next hop that can bring down the overall delay. (Figure: next-hop candidates 1, 2, 3.)
Logical auxiliary: a homogeneous auxiliary overlay network on top of the default overlay. Step 0: ordinary nodes advertise themselves on the auxiliary network, using nodeIDs as keys to store their IP addresses. Step 0.5: caching along advertising paths. Step 1: contact the local super node. Step 2: look up the IP address of the destination node. Step 3: route to the destination node.
Parameters used: # of nodes: 512 to 8K (4K as default). TTL: 1 to 9 (9 as default). Virtual grids: from 1 virtual grid per node to 1 virtual grid per 16 nodes (1 per node as default). # of landmarks: 15. Fraction of nodes that are expressway nodes: 1/1 to 1/64 (1/10 as default). Routing: direct, expressway-node forwarding. Performance metric: stretch = routing delay / shortest-path delay.
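The stretch metric can be computed as in the Python sketch below. The graph representation, the use of Dijkstra's algorithm for shortest-path delay, and the assumption that each overlay hop follows the shortest physical path between its endpoints are illustrative choices, not details from the slides.

```python
import heapq


def shortest_delay(graph, src, dst):
    """Shortest-path delay from src to dst; graph is {node: {neighbor: delay}}."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    raise ValueError("destination unreachable")


def stretch(graph, overlay_path):
    """Routing delay along the overlay path divided by the shortest-path delay."""
    routed = sum(
        shortest_delay(graph, a, b) for a, b in zip(overlay_path, overlay_path[1:])
    )
    return routed / shortest_delay(graph, overlay_path[0], overlay_path[-1])


graph = {
    "a": {"b": 10, "c": 15},
    "b": {"a": 10, "c": 10},
    "c": {"a": 15, "b": 10},
}
detour = stretch(graph, ["a", "b", "c"])
```

A stretch of 1.0 means the overlay route is as fast as the shortest physical path; the detour through `b` in this toy graph costs 20 against a shortest path of 15.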
Summary of results: Expressway produces good average routing performance. Landmark clustering: for the AS topology, 1.07 times shortest-path routing, with individual measurements ranging from 1.04 to 1.12; for the transit-stub graph, 1.41 on average, with individual measurements ranging from 1.20 to 1.55 (this can be better, as ordinary nodes associated with the same expressway node do not establish direct routes among themselves). eCAN and homogeneous auxiliary stay between 2.5 and 7 times shortest-path routing.
Comparison of various approaches (figure panels: AS topology; transit-stub graph). Our approach: 1.07 to 1.41 times optimal. Other approaches: 2.5 to 7 times optimal.
Direct route vs. expressway-node forwarding: Direct route performs better than expressway-node forwarding, due to shortest-path routing. Performance of our approach improves as the number of nodes increases.
Effect of varying the ratio of expressway nodes in the system: As the percentage of expressway nodes increases, the expressway better approximates the underlying physical network, whereas “logical auxiliary” cannot take advantage of this.
Conclusions: We propose generic techniques to construct an auxiliary network for DHT-based overlays. The approach decouples routing from the DHT abstraction to take advantage of the heterogeneity that exists in the system, and achieves routing performance close to optimal. Limitations: the protocol is relatively complicated, and the expressway nodes need to be relatively stable.
More about eCAN: Topology-aware Chord is a 1-d eCAN. High-order zones allow for locality-preserving data placement (SkipNet): placement of objects can be controlled to preserve locality, and machines that belong to certain organizations can be co-located logically. (Figure: a high-order zone made up of zones 1 to 5 and their nodes.)
Varying the number of virtual grids. (Figure: results for 1 node per virtual grid, 4 nodes per virtual grid, and 16 nodes per virtual grid.)
Effect of varying TTL for route advertisement. (Figure.)
Example applications. Distributed storage space: content is hashed with SHA-1 into a key; the <Key, document> pair is placed on top of the DHT; object lookup translates to routing. Distributed content-based search: document info is placed on the DHT in a controlled way, such that documents that are similar in content are co-located; the search space is effectively controlled. It is important that the structure of the overlay uses the underlying physical network efficiently!