The Impact of DHT Routing Geometry on Resilience and Proximity
K. Gummadi, R. Gummadi, S. Gribble, S. Ratnasamy, S. Shenker, I. Stoica
Introduction
How routing geometries affect the resilience and proximity properties of DHTs
–coping with node failures: static resilience
–adapting to Internet topology, proximity issues: path latency and local convergence
Flexibility in selection of neighbors and routes is an essential factor
Evaluating various geometries to gain insight for better designs
Admitted flaws
–not evaluating all routing algorithms
–not considering management overhead
–focusing on only two performance issues
Terminology
Algorithm: the exact details of selecting neighbors and next-hops
Geometry: a geometric interpretation of how the selections are made
–geometry constrains the choices, but small changes in algorithm do not change the geometry
Flexibility: the algorithmic freedom left after the geometry is chosen
–neighbor selection
  does the geometry allow selecting neighbors based on proximity?
  does the geometry support sequential neighbors?
–route selection
  how many options are there for the next-hop in case of failure?
  does the geometry allow selecting next-hops based on proximity?
Sequential neighbors
–can route and progress towards all destinations
–global ordering on distances required (natural in Ring, add-on for others)
Basic Routing Geometries (1/2)
Tree
-e.g. PRR
-node ids are leaf nodes
-distance = height of the smallest common subtree
-routing MSB first
Hypercube
-e.g. CAN
-node id represents position
-neighbors differ by 1 bit only
-distance is # of differing bits
-routing in any order, previous corrections preserved
Butterfly
-e.g. Viceroy
-nodes in log n stages
-nodes at stage i can correct the i-th bit
-imposed global ordering: global and stage successors/predecessors required to be held as neighbors
-routing in log n hops with constant state at each node: first through stages, then through successors/predecessors
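The Tree and Hypercube distances above can be sketched in a few lines. This is a minimal illustration, assuming node ids are fixed-width non-negative integers; the function names are hypothetical and not taken from PRR or CAN.

```python
def tree_distance(a: int, b: int) -> int:
    """Height of the smallest common subtree of leaf ids a and b.

    This equals the 1-based position (counted from the LSB) of the most
    significant bit in which the ids differ, and 0 when a == b.
    """
    return (a ^ b).bit_length()


def hypercube_distance(a: int, b: int) -> int:
    """Number of bits in which a and b differ (Hamming distance)."""
    return bin(a ^ b).count("1")


def hypercube_next_hops(cur: int, dst: int, bits: int) -> list[int]:
    """All neighbors of cur that correct exactly one differing bit.

    Bits may be corrected in any order, so every differing bit yields
    a valid next hop; a tree would allow only the MSB correction.
    """
    diff = cur ^ dst
    return [cur ^ (1 << i) for i in range(bits) if (diff >> i) & 1]
```

The length of the list returned by `hypercube_next_hops` is one way to read the route-selection flexibility of the hypercube: it equals the remaining distance, whereas MSB-first tree routing has a single choice at each step.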
Basic Routing Geometries (2/2)
Ring
-e.g. Chord
-nodes in one-dimensional cyclic id space
-distance is the clockwise numeric distance between nodes
-routing in log n hops, because each hop cuts the distance in half
XOR
-e.g. Kademlia
-distance is XOR of ids
-routing MSB first, but in case of failures can correct the next bit
-these corrections are not necessarily preserved => multiple non-optimal paths
Hybrid
-e.g. Pastry
-dual mode, e.g. tree + ring
-nodes are both leaves of the tree and points on the circle
-distance is both tree and ring distance
-fallback to ring when tree fails
-can progress on the tree while not progressing on the ring
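The Ring and XOR metrics, and the halving argument behind the log n hop count, can be illustrated as follows. This is a sketch under simplifying assumptions: the id space is fully populated (every id is a live node) and each node keeps fingers at exactly +2^i. The function names are hypothetical and do not reproduce Chord's or Kademlia's actual code.

```python
def ring_distance(a: int, b: int, bits: int) -> int:
    """Clockwise numeric distance from a to b in a 2**bits id space."""
    return (b - a) % (1 << bits)


def xor_distance(a: int, b: int) -> int:
    """XOR metric: the numeric value of a XOR b."""
    return a ^ b


def ring_route(src: int, dst: int, bits: int) -> list[int]:
    """Greedy routing with fingers at +2**i on a fully populated ring.

    Each hop takes the largest finger that does not overshoot, which
    removes the top bit of the remaining distance and so at least
    halves it; the path therefore has at most `bits` hops.
    """
    n = 1 << bits
    path = [src]
    cur = src
    while cur != dst:
        d = ring_distance(cur, dst, bits)
        step = 1 << (d.bit_length() - 1)  # largest power of two <= d
        cur = (cur + step) % n
        path.append(cur)
    return path
```

For example, `ring_route(0, 13, 4)` visits 0, 8, 12, 13: three hops in a 16-id space.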
Static Resilience
DHTs are resilient to node failures
Three aspects of resilience
–data replication
–routing recovery
–static resilience
  routing before recovery algorithms kick in
  the only aspect considered in this paper
Static Resilience
Performance of DHTs with routing tables of equal sizes
What happens when failures are present?
–failed paths?
–increase in path length?
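The failed-paths question can be explored with a toy experiment. The sketch below is not the paper's simulator; it assumes a fully populated ring with fingers at +2^i, independent node failures with probability `fail_prob`, and greedy forwarding with no backtracking and no recovery. The name `failed_path_fraction` and all parameters are hypothetical.

```python
import random


def failed_path_fraction(bits: int, fail_prob: float, trials: int, seed: int = 0) -> float:
    """Fraction of routes between live nodes that hit a dead end."""
    rng = random.Random(seed)
    n = 1 << bits
    # Static failure pattern: routing tables are NOT repaired afterwards.
    alive = [rng.random() >= fail_prob for _ in range(n)]
    live_nodes = [i for i in range(n) if alive[i]]
    if len(live_nodes) < 2:
        return 0.0
    failed = 0
    for _ in range(trials):
        src, dst = rng.sample(live_nodes, 2)
        cur = src
        while cur != dst:
            d = (dst - cur) % n
            # Live fingers that make progress without overshooting dst.
            steps = [1 << i for i in range(bits)
                     if (1 << i) <= d and alive[(cur + (1 << i)) % n]]
            if not steps:
                failed += 1  # every useful neighbor is dead
                break
            cur = (cur + max(steps)) % n  # greedy: largest live finger
    return failed / trials
```

Calling `failed_path_fraction(10, 0.3, 1000)` gives one data point; sweeping `fail_prob` would trace a curve of failed paths against failure rate for this simplified ring.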
Static Resilience of Various Geometries
Consistent with flexibility in route selection
–Ring and Hypercube have twice the flexibility of Hybrid and XOR
–Tree and Butterfly have no flexibility
–Hypercube has equal-length alternative paths
–Hybrid, Tree, and XOR have only longer alternative paths
–Ring has some longer paths
–Butterfly not applicable
Static Resilience: Sequential Neighbors
16 sequential neighbors added
–XOR not applicable, Tree included in Hybrid
–Ring has natural support for sequential neighbors
Greatly increased resilience
–increase in resilience comes at the cost of path stretch
–note the scale! path increase in Butterfly is much higher
Static Resilience: Sequential vs. Regular Neighbors
–at high failure rates, sequentials are better
–sequentials can lead to longer paths
–a large number of regulars is a good compromise
Proximity
DHTs are designed to route effectively in terms of hopcounts
End-to-end latency issues are approached through proximity methods
PNS (proximity neighbor selection)
- identifying closest nodes is hard => heuristics needed
- PNS(K), K = # of samples
- a node’s latency distribution helps in choosing K
PRS (proximity route selection): next-hop selection
- complicated tradeoff between # of hops and latency
- different heuristics for different geometries
PIS (proximity identifier selection): ids based on location
- load balancing hard
- not discussed here
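The PNS(K) heuristic can be sketched as sampling K candidates and keeping the closest. This is a minimal illustration, assuming the caller supplies the list of nodes eligible for a routing-table slot and a latency function; `pns_choose` and its parameters are hypothetical names.

```python
import random
from typing import Callable, Sequence


def pns_choose(candidates: Sequence[int],
               latency: Callable[[int], float],
               k: int,
               rng: random.Random) -> int:
    """PNS(K): probe up to k random candidates, keep the lowest-latency one.

    With k >= len(candidates) this degenerates to ideal PNS, which picks
    the truly closest eligible node.
    """
    sample = rng.sample(list(candidates), min(k, len(candidates)))
    return min(sample, key=latency)
```

A usage example with made-up latencies: given `lat = {1: 50.0, 2: 10.0, 3: 30.0}`, the call `pns_choose([1, 2, 3], lat.__getitem__, 16, random.Random(0))` probes all three candidates and selects node 2.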
Proximity: Performance Results
Basic assumptions in this evaluation
–evaluation based on recursive routing
–geometries have little effect, support for PNS/PRS essential
–performance of proximity methods depends on topology and its latency characteristics
  good approximation by looking at the latency distribution for a typical node
–Hybrid left out, Butterfly not applicable
–PNS/PRS not expensive in terms of hopcounts
Proximity: PNS vs. PRS
XOR and Ring support both
–PRS better than plain
–PNS far better (PNS: 2^i neighbor options, PRS: i options)
–PRS+PNS only a small improvement over PNS
–PNS(16) gives similar results
–16 sequential neighbors improve PRS and Plain a little, but not significantly
=> PNS support most important
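The gap in the number of options can be made concrete for the Ring. The sketch assumes a Chord-like ring in which the i-th neighbor of node a may be any id in [a + 2^i, a + 2^(i+1)), and a fully populated id space; `ring_neighbor_candidates` is a hypothetical helper, not part of any DHT implementation.

```python
def ring_neighbor_candidates(a: int, i: int, bits: int) -> list[int]:
    """Ids eligible to serve as the i-th neighbor of node a.

    The candidate interval has 2**i members, so the choice available
    to neighbor selection grows exponentially with i.
    """
    n = 1 << bits
    return [(a + off) % n for off in range(1 << i, 1 << (i + 1))]
```

For instance, `ring_neighbor_candidates(0, 2, 4)` lists the four ids 4 to 7, any of which PNS could pick for that slot based on latency.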
Proximity: More Results
Does geometry matter?
–Tree: PRS n/a
–Hypercube: PNS n/a
–performance of pairs very close
=> PNS/PRS support matters, not geometry
Absolute performance
–depends on latency distribution
=> overlay latencies can be reduced to small multiples of underlying Internet path latencies
Local Convergence (1/2)
Messages sent from two nearby nodes converge at a node near the sources
=> low latencies or bandwidth savings in e.g.
- overlay multicast
- caching
- server selection
Measured by number of exit points
PNS vs. PRS: PNS nearly optimal, PRS not effective
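Counting exit points can be sketched as follows. The slide does not define the term precisely, so this assumes an exit point is the last node inside a domain on a path that leaves that domain; `count_exit_points` and its arguments are hypothetical.

```python
from typing import Iterable, Sequence, Set


def count_exit_points(paths: Iterable[Sequence[int]], domain: Set[int]) -> int:
    """Number of distinct nodes at which paths first leave the domain.

    Fewer exit points means paths from nearby sources converge earlier,
    i.e. better local convergence.
    """
    exits = set()
    for path in paths:
        for node, nxt in zip(path, path[1:]):
            if node in domain and nxt not in domain:
                exits.add(node)  # last in-domain hop before leaving
                break
    return len(exits)
```

With `domain = {1, 2, 3, 4}`, the paths `[1, 2, 9]`, `[3, 2, 9]`, and `[4, 8]` leave through nodes 2, 2, and 4, so the count is 2.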
Local Convergence (2/2)
–PNS(16) and PRS equally ineffective
–PNS(16)+PRS better
=> PRS more relevant with limited samples and domain sizes
=> geometries don’t matter, PNS/PRS support is the key
Summary
Geometry constrains other design choices
Flexibility is important (neighbor & next-hop selection)
Ring & XOR are flexible: support for both PNS and PRS
Why not Ring?
–flexibility
–natural support for sequential neighbors
–tested well