University of Oregon. Slides from Götz and Wehrle + Chord paper. Chord, CAN, Pastry. Chapter 8: Distributed Hash Tables, Part II of III: Selected DHT Algorithms. *Original slides provided by Stefan Götz and Klaus Wehrle (University of Tübingen), Peer-to-Peer Systems and Applications
Overview
Chord: Topology, Routing, Self-Organization
Pastry
CAN (Content Addressable Network)
Symphony
Viceroy
Kademlia
Summary
Chord Morris, Liben-Nowell, Karger, Kaashoek, Dabek, Balakrishnan (MIT) and Stoica (Berkeley)
Chord: Overview
Early and successful algorithm (2001)
Simple & elegant: easy to understand and implement; many improvements and optimizations exist
Main responsibilities:
Routing: flat logical address space of B-bit identifiers instead of IP addresses; efficient routing in large systems: O(log N) hops with N total nodes
Self-organization: handle node arrival, departure, and failure
Chord: Consistent Hashing and Chord IDs
Consistent hash function, e.g. SHA-1 with 160-bit output → 0 <= identifier < 2^160
Load-balanced; resilient to joins and leaves
Chord identifiers:
Key ID associated with a data item (where to store the data), e.g. key = SHA-1(value)
Node ID associated with a host (where the host sits in the Chord ring), e.g. id = SHA-1(IP address, port)
Insert and retrieve from the Chord DHT:
put(key, value) inserts data into Chord
value = get(key) retrieves data from Chord
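The ID derivation above can be sketched in a few lines of Python. This is a minimal illustration, not part of the Chord paper; the names `chord_id`, `node_id`, and `key_id` are illustrative, and the input encodings are assumptions.

```python
import hashlib

M = 160  # Chord identifier length in bits (SHA-1 output size)

def chord_id(data: bytes, m: int = M) -> int:
    """Map arbitrary data to a Chord identifier in [0, 2^m)."""
    digest = hashlib.sha1(data).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

# Node ID from (IP address, port); key ID from the stored value:
node_id = chord_id(b"192.0.2.1:4000")
key_id = chord_id(b"some value")
```

Because SHA-1 distributes its outputs (nearly) uniformly, hashing both nodes and keys into the same 160-bit space gives the load balance and join/leave resilience of consistent hashing.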
Chord: Topology (example with 3-bit IDs)
Keys and IDs live on a ring, i.e., all arithmetic is modulo 2^m (here 2^3 = 8; with 160-bit IDs, modulo 2^160)
(key, value) pairs are managed by the clockwise next node: if key = SHA-1(value), the pair is stored at the first node whose ID is equal to or follows the key. This node is called successor(key).
Example ring with nodes 0, 1, 3: successor(1) = 1, successor(2) = 3, successor(6) = 0
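The successor(key) rule can be written as a short Python sketch (the function name and list-based ring are illustrative assumptions, not Chord's actual distributed implementation, which never has a global node list):

```python
from bisect import bisect_left

def successor(key: int, node_ids: list, m: int = 3) -> int:
    """First node whose ID is equal to or follows key, clockwise mod 2^m."""
    ring = sorted(node_ids)
    i = bisect_left(ring, key % (2 ** m))
    return ring[i % len(ring)]  # wrap around past the highest ID

# The 3-bit example ring with nodes 0, 1, 3:
nodes = [0, 1, 3]
assert successor(1, nodes) == 1
assert successor(2, nodes) == 3
assert successor(6, nodes) == 0  # wraps around the ring to node 0
```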
Chord: Topology
Topology determined by links between nodes
Link: knowledge about another node, stored in the routing table on each node
Simplest topology: circular linked list; each node has a link to the clockwise next node
Chord: Routing
Primitive routing: forward the query for key x until successor(x) is found; return the result to the source of the query
Pros: simple; little node state
Cons: poor lookup efficiency: N/2 hops on average with N nodes, i.e. O(N); a single node failure breaks the circle
[Figure: a query for key 6 issued at node 0 is forwarded around the 3-bit example ring node by node]
Simple lookup algorithm:
Lookup(my-id, key-id)
  n = my successor
  if my-id < n < key-id (on the ring)
    call Lookup(key-id) on node n  // next hop
  else
    return my successor  // done
Correctness depends only on successors: the search always undershoots to the predecessor, so it never misses the real successor.
The lookup procedure is not inherently O(log N); the finger table (below) makes it so.
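The successor-only lookup can be simulated in Python. This is a sketch under assumptions: `succ` is a dict mapping each node to its immediate successor, and the helper names are illustrative.

```python
def in_interval(x, a, b, m=3):
    """True if x lies in the clockwise half-open interval (a, b] on a 2^m ring."""
    x, a, b = x % 2 ** m, a % 2 ** m, b % 2 ** m
    return a < x <= b if a < b else (x > a or x <= b)  # second case: wraps past 0

def linear_lookup(start, key, succ, m=3):
    """Follow successor pointers until the node owning `key` is found.
    Returns (successor(key), number of forwarding hops)."""
    n, hops = start, 0
    while not in_interval(key, n, succ[n], m):
        n = succ[n]  # undershoot: step to the next node clockwise
        hops += 1
    return succ[n], hops

succ = {0: 1, 1: 3, 3: 0}              # the 3-bit example ring
owner, hops = linear_lookup(0, 6, succ)  # key 6 is owned by node 0
```

Each step only needs correct successor pointers, which is why correctness survives stale finger tables; the price is the O(N) hop count noted above.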
Chord: Routing
Advanced routing: store links to the z next neighbors; forward queries for key k to the farthest known predecessor of k
For z = N: fully meshed routing system; lookup efficiency O(1), but per-node state O(N) → still poor scalability
Scalable routing: linear routing progress scales poorly; a mix of short- and long-distance links is required: accurate routing in the node's vicinity, fast routing progress over large distances, and a bounded number of links per node
Chord: Routing
Chord's routing table: the finger table
Stores log2(N) links per node
Covers exponentially increasing distances: on node n, entry i points to successor(n + 2^i) (the i-th finger)
[Figure: finger tables (start, successor, keys) for nodes 0, 1, and 3 on the 3-bit example ring]
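Computing a node's finger table with the successor(n + 2^i) rule can be sketched as follows (a centralized illustration with a global node list; a real Chord node builds its table via lookups instead):

```python
def finger_table(n, node_ids, m=3):
    """Finger i of node n points to successor(n + 2^i), for i = 0..m-1."""
    ring = sorted(node_ids)

    def succ(k):
        k %= 2 ** m
        for x in ring:
            if x >= k:
                return x
        return ring[0]  # wrap around the ring

    return [succ(n + 2 ** i) for i in range(m)]

# 3-bit example ring with nodes 0, 1, 3:
# node 0's finger starts are 1, 2, 4 -> successors 1, 3, 0
assert finger_table(0, [0, 1, 3]) == [1, 3, 0]
```

Finger 0 is always the immediate successor, and the doubling distances are what cut the remaining ring distance roughly in half per hop.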
Chord: Routing
Chord's routing algorithm: each node n forwards a query for key k clockwise to the farthest finger preceding k, until n = predecessor(k) and successor(n) = successor(k); successor(n) is then returned to the source of the query
[Figure: example on a 6-bit ring: a lookup(44) issued at node 52 hops via fingers across the ring; lookup(44) = 45]
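The finger-based routing loop can be simulated in Python. This is a self-contained sketch under assumptions: `succ` and `fingers` are dicts holding each node's successor and finger list, and the helper names are illustrative.

```python
def in_interval(x, a, b, m):
    """x in the clockwise half-open interval (a, b] on a 2^m ring."""
    x, a, b = x % 2 ** m, a % 2 ** m, b % 2 ** m
    return a < x <= b if a < b else (x > a or x <= b)

def in_open(x, a, b, m):
    """x in the clockwise open interval (a, b) on a 2^m ring."""
    x, a, b = x % 2 ** m, a % 2 ** m, b % 2 ** m
    return a < x < b if a < b else (x > a or x < b)

def closest_preceding_finger(n, key, fingers, m):
    """Farthest finger of node n that still precedes key on the ring."""
    for f in reversed(fingers[n]):
        if in_open(f, n, key, m):
            return f
    return n  # no finger precedes key: n itself is predecessor(key)

def chord_lookup(start, key, succ, fingers, m=3):
    """Greedy finger routing: hop to the farthest preceding finger
    until the current node is predecessor(key); return successor(key)."""
    n = start
    while not in_interval(key, n, succ[n], m):
        n = closest_preceding_finger(n, key, fingers, m)
    return succ[n]

# 3-bit example ring with nodes 0, 1, 3:
succ = {0: 1, 1: 3, 3: 0}
fingers = {0: [1, 3, 0], 1: [3, 3, 0], 3: [0, 0, 0]}
assert chord_lookup(0, 6, succ, fingers) == 0
assert chord_lookup(3, 2, succ, fingers) == 3
```

Because each finger hop roughly halves the clockwise distance to the key, the loop terminates in O(log N) iterations on a well-populated ring.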
Chord: Self-Organization
Handle a changing network environment: failure of nodes, network failures, arrival of new nodes, departure of participating nodes
Maintain consistent system state for routing: keep routing information up to date
Routing correctness depends on correct successor information; routing efficiency depends on correct finger tables
Failure tolerance is required for all operations
Chord: Failure Tolerance: Storage
Layered design: the Chord DHT is mainly responsible for routing; data storage (persistence, consistency, fairness) is managed by the application
Chord's soft-state approach: nodes delete (key, value) pairs after a timeout; applications need to refresh their (key, value) pairs periodically
Worst case: data is unavailable for one refresh interval after a node failure
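The soft-state idea can be sketched as a small TTL store in Python (the class and method names are illustrative, not part of Chord; real nodes would also expire entries proactively rather than only on access):

```python
import time

class SoftStateStore:
    """(key, value) pairs expire unless refreshed within `ttl` seconds."""

    def __init__(self, ttl: float):
        self.ttl = ttl
        self.data = {}  # key -> (value, expiry timestamp)

    def put(self, key, value):
        """Insert or refresh; the application re-puts periodically."""
        self.data[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        """Return the value, or None if missing or expired."""
        entry = self.data.get(key)
        if entry is None or entry[1] < time.monotonic():
            self.data.pop(key, None)  # expired: drop lazily on access
            return None
        return entry[0]
```

The trade-off is visible here: a short TTL bounds how long stale data survives a node failure, but raises the refresh traffic the application must generate.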
Chord: Failure Tolerance: Routing
Finger failures during routing: if a query cannot be forwarded to a finger, forward to the previous finger (do not overshoot the destination node) and trigger the repair mechanism: replace the finger with its successor
Active finger maintenance: periodically check the liveness of fingers and replace failed ones with correct nodes; trade-off: maintenance traffic vs. correctness & timeliness
Chord: Failure Tolerance: Routing
Successor failure during routing: the last routing step can return a failed node to the source of the query → all queries for that successor fail
Store r successors in a successor list: if successor[0] fails, use successor[1], etc.; routing fails only if r consecutive nodes fail simultaneously
Active maintenance of the successor list: periodic checks similar to finger table maintenance; crucial for correct routing
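The successor-list failover can be sketched as follows (function names and the `is_alive` probe are illustrative assumptions; in practice liveness is detected by timed-out RPCs):

```python
def first_live_successor(successor_list, is_alive):
    """Return the first reachable node in the successor list, or None
    if all r entries have failed simultaneously."""
    for node in successor_list:
        if is_alive(node):
            return node
    return None

# Example: successor[0] (node 7) has failed, so fall back to successor[1]:
alive = {7: False, 12: True, 15: True}
assert first_live_successor([7, 12, 15], alive.get) == 12
```

With r list entries, routing survives any r - 1 simultaneous consecutive failures, which is why keeping the list fresh is crucial for correctness.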
Chord: Node Arrival
New node picks an ID and contacts an existing node
Constructs its finger table via standard routing/lookup()
Retrieves (key, value) pairs from its successor
[Figure: finger tables on the 3-bit example ring after a new node joins]
Chord: Node Arrival
Examples for choosing new node IDs: random ID (equal distribution assumed but not guaranteed); hash of IP address & port; placement of new nodes based on the load on existing nodes, geographic location, etc.
Retrieval of existing node IDs: controlled flooding; DNS aliases; published through the web, etc.
[Figure: a joining node resolves entrypoint.chord.org via DNS and derives its ID, e.g. ID = hash(182.84.10.23) = 6]
Chord: Node Arrival
Construction of the finger table: iterate over the finger table rows; for each row, query the entry point for the successor (standard Chord routing on the entry point)
Construction of the successor list: add the immediate successor from the finger table, then request the successor list from that successor
[Figure: node 6 joins the 3-bit example ring, asks succ(7)?, succ(0)?, succ(2)? and learns succ(7) = 0, succ(0) = 0, succ(2) = 3]
Chord: Node Departure
Deliberate node departure: clean shutdown instead of failure
For simplicity: treat departure as a failure; the system is already failure-tolerant, and soft state gives automatic state restoration; however, state is lost briefly, and invalid finger table entries reduce routing efficiency
For efficiency: handle departure explicitly; the departing node notifies its successor, predecessor, and the nodes at finger distances, and copies its (key, value) pairs before shutdown
Chord: Performance
Impact of node failures on the lookup failure rate: the lookup failure rate is roughly equal to the node failure rate
Chord: Performance
Moderate impact of the number of nodes on lookup latency; consistent average path length
Chord: Performance
Lookup latency (number of hops/messages): ~ 1/2 log2(N), confirming the theoretical estimate
[Figure: lookup path length vs. number of nodes]
Chord: Summary
Complexity: messages per lookup O(log N); memory per node O(log N); messages per management action (join/leave/fail) O(log² N)
Advantages: theoretical models and proofs about complexity; simple & flexible
Disadvantages: no notion of node proximity and no proximity-based routing optimizations; Chord rings may become disjoint in realistic settings
Many improvements published, e.g. proximity, bi-directional links, load balancing, etc.