1
Peer-to-Peer Structured Overlay Networks
Antonino Virgillito
2
Background
Peer-to-peer systems: distribution, symmetry (communication, node roles), decentralized control, self-organization, dynamicity
3
Data Lookup in P2P Systems
Data items are spread over a large number of nodes. Which node stores which data item? A lookup mechanism is needed. Centralized directory -> bottleneck/single point of failure. Query flooding -> scalability concerns. Need more structure!
4
More Issues
Organize and maintain the overlay network: node arrivals, node failures. Resource allocation/load balancing. Resource location. Network proximity routing.
5
What is a Distributed Hash Table?
Exactly that: a service, distributed over multiple machines, with hash table semantics (put(key, value), value = get(key)). Designed to work in a peer-to-peer (P2P) environment: no central control, nodes under different administrative control. But of course it can also operate in an “infrastructure” sense.
6
What is a DHT? Hash table semantics
put(key, value), value = get(key). The key is a single flat string, which gives limited semantics compared to keyword search. put() causes the value to be stored at one (or more) peer(s); get() retrieves the value from a peer. put() and get() are accomplished with unicast routed messages; in other words, it scales. Other API calls support the application, such as notification when neighbors come and go.
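To make the interface concrete, here is a minimal single-process sketch with the hash-table semantics above. The class and method names are illustrative, not any real library's API; in a real DHT, put() and get() would be unicast messages routed to the peer responsible for the key.

```python
import hashlib

class LocalDHTStub:
    """Single-process stand-in with DHT semantics: put/get on flat string keys."""

    def __init__(self):
        self._store = {}

    def _key_id(self, key: str) -> int:
        # Keys are hashed into a flat identifier space (SHA-1 here, as Chord does).
        return int.from_bytes(hashlib.sha1(key.encode()).digest(), "big")

    def put(self, key: str, value: bytes) -> None:
        # In a real DHT this would be routed to the peer(s) owning the key's ID.
        self._store[self._key_id(key)] = value

    def get(self, key: str):
        # Returns the stored value, or None if the key is unknown.
        return self._store.get(self._key_id(key))

dht = LocalDHTStub()
dht.put("song.mp3", b"...")
assert dht.get("song.mp3") == b"..."
```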
7
Distributed Hash Tables (DHT)
Figure: a P2P overlay network in which key/value pairs (k1,v1) ... (k6,v6) are spread over the nodes. Operations: put(k,v), get(k). The P2P overlay maps keys to nodes; it is completely decentralized and self-organizing, robust, and scalable.
8
Popular DHTs
Tapestry (Berkeley): based on Plaxton trees, similar to hypercube routing; the first* DHT; complex and hard to maintain (hard to understand too!). CAN (ACIRI), Chord (MIT), and Pastry (Rice/MSR Cambridge): the second wave of DHTs (contemporary with and independent of each other).
9
DHT Basics
Node IDs can be mapped to the hash key space. Given a hash key as a “destination address”, you can route through the network to a given node, and you always reach the same node no matter where you start from. Requires no centralized control (completely distributed). Per-node state is small: constant or logarithmic in the number of nodes in the system (scalable). Nodes can route around failures (fault-tolerant).
10
Things to look at
What is the structure? How does routing work in the structure? How does it deal with node joins and departures (structure maintenance)? How does it scale? How does it deal with locality? What are the security issues?
11
The Chord Approach: Consistent Hashing, Logical Ring, Finger Pointers
12
The Chord Protocol Provides:
A mapping successor: key -> node. To look up a key K, go to node successor(K). successor is defined using consistent hashing: both keys and nodes hash into the same (circular) identifier space, and successor(K) = the first node whose hash ID is equal to or greater than hash(K), wrapping around the circle.
13
Example: The Logical Ring
Nodes 0, 1, 3; keys 1, 2, 6 (m = 3). Key 1 is stored at node 1, key 2 at node 3, and key 6 wraps around to node 0.
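A small sketch of the successor(K) rule on this example ring, assuming all IDs have already been hashed into the m-bit space (the helper names are ours, not Chord's API):

```python
from bisect import bisect_left

def successor(key_id: int, node_ids: list, m: int = 3) -> int:
    """Return the first node ID equal to or greater than key_id, wrapping around the ring."""
    ring = sorted(node_ids)
    i = bisect_left(ring, key_id % (2 ** m))
    return ring[i % len(ring)]   # wrap to the smallest ID when key_id exceeds every node

# The example ring: nodes 0, 1, 3 and keys 1, 2, 6 with m = 3.
nodes = [0, 1, 3]
assert successor(1, nodes) == 1
assert successor(2, nodes) == 3
assert successor(6, nodes) == 0   # wraps around
```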
14
Consistent Hashing [Karger et al. ‘97]
Some nice properties. Smoothness: minimal key movement on node join/leave. Load balancing: keys are equitably distributed over nodes.
15
Mapping Details: Range of the Hash Function
Circular ID space modulo 2^m. Compute a 160-bit SHA-1 hash and truncate it to m bits. The chance of collision is rare if m is large enough. Deterministic, but hard for an adversary to subvert.
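A sketch of the mapping, with keys and node addresses hashed into the same m-bit space (the function name and the example address are illustrative):

```python
import hashlib

def chord_id(name: str, m: int = 32) -> int:
    """Hash a key or a node address to an m-bit identifier in 0 .. 2^m - 1."""
    digest = hashlib.sha1(name.encode()).digest()       # 160-bit SHA-1
    return int.from_bytes(digest, "big") % (2 ** m)     # keep the low m bits (modulo 2^m)

# Keys and node addresses land in the same circular identifier space.
print(chord_id("198.51.100.7:4000"), chord_id("song.mp3"))
```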
16
Chord State: Successor/Predecessor in the Ring, Finger Pointers
n.finger[i] = successor(n + 2^(i-1)) for 1 <= i <= m, arithmetic modulo 2^m. Each node knows more about the portion of the circle close to it!
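A sketch of building a finger table from this rule, reusing the successor() helper sketched after the logical-ring example:

```python
def finger_table(n: int, node_ids: list, m: int) -> list:
    """finger[i] = successor(n + 2^(i-1)) for i = 1..m, computed modulo 2^m."""
    return [successor((n + 2 ** (i - 1)) % (2 ** m), node_ids, m)
            for i in range(1, m + 1)]

# On the example ring (m = 3, nodes 0, 1, 3), node 1's fingers are 3, 3 and 0.
print(finger_table(1, [0, 1, 3], 3))   # -> [3, 3, 0]
```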
17
Example: Finger Tables
18
Chord: routing protocol
Notation: n.foo() stands for a remote call to node n. A set of nodes on the way towards id is contacted remotely. Each node is queried for the closest node it knows of that precedes id. The process stops at a node whose successor is >= id, i.e., id falls between that node and its successor; that successor is the result of the lookup.
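A single-process sketch of this lookup, with the remote calls n.foo() simulated as ordinary method calls; the class, helper names and the hand-built fingers below are illustrative, and all interval arithmetic is on the circular m-bit ID space:

```python
def in_interval(x: int, a: int, b: int, m: int) -> bool:
    """True if x lies in the half-open ring interval (a, b]."""
    size = 2 ** m
    return x != a and (x - a) % size <= (b - a) % size

class ChordNode:
    def __init__(self, node_id: int, m: int):
        self.id, self.m = node_id, m
        self.successor = self
        self.finger = []   # finger[i-1] = successor(id + 2^(i-1))

    def closest_preceding_finger(self, key_id: int) -> "ChordNode":
        # Scan fingers from farthest to nearest for the closest known node
        # that strictly precedes key_id on the ring.
        for f in reversed(self.finger):
            if f.id != key_id and in_interval(f.id, self.id, key_id, self.m):
                return f
        return self

    def find_successor(self, key_id: int) -> "ChordNode":
        n = self
        # Stop once key_id falls in (n, n.successor]: that successor owns the key.
        while not in_interval(key_id, n.id, n.successor.id, self.m):
            nxt = n.closest_preceding_finger(key_id)   # simulated remote call
            if nxt is n:
                break
            n = nxt
        return n.successor

# Hand-built 3-node example ring (m = 3): look up key 6 starting from node 1.
n0, n1, n3 = ChordNode(0, 3), ChordNode(1, 3), ChordNode(3, 3)
n0.successor, n1.successor, n3.successor = n1, n3, n0
n1.finger = [n3, n3, n0]           # node 1's finger table from the earlier slide
print(n1.find_successor(6).id)     # -> 0
```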
19
Example: Chord Routing
Finger Pointers for Node 1
20
Lookup Complexity: with high probability, O(log N). Proof intuition:
Let p be the successor of the target key. The distance to p is at least halved in each step, so within m steps we would reach p. Stronger claim: after O(log N) steps the remaining distance is at most 2^m / N. With high probability only O(log N) nodes fall within such an interval, so from there even linear advance suffices, giving O(log N) lookup complexity overall.
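A compact write-up of the halving argument in the slide's notation (m-bit identifiers, N nodes, d_i the ring distance to p after i hops):

```latex
% Sketch of the halving argument (m-bit IDs, N nodes, p = successor of the key).
\begin{align*}
  d_0 &\le 2^m                 && \text{initial ring distance to } p, \\
  d_{i+1} &\le \tfrac12\, d_i  && \text{each hop via the closest preceding finger at least halves it}, \\
  d_{\lceil \log_2 N \rceil} &\le \frac{2^m}{N} && \text{so } O(\log N) \text{ hops shrink the distance to } 2^m/N.
\end{align*}
% Node IDs are spread (w.h.p.) evenly enough that only O(log N) nodes fall in an
% interval of size 2^m/N, so O(log N) further hops suffice: O(log N) in total.
```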
21
Chord invariants. Every key in the network can be located as long as the following invariants are preserved after joins and leaves: (1) each node's successor is correctly maintained; (2) for every key k, node successor(k) is responsible for k.
22
Chord: Node Joins. A new node B learns of at least one existing node A via external means. B asks A to look up its finger-table information: given that B's hash ID is b, A performs a lookup for B.finger[i] = successor(b + 2^(i-1)), skipping the lookup when the interval is already covered by finger[i-1]. B stores all finger information and sets up its predecessor/successor pointers.
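A sketch of this step, reusing the simulated ChordNode and in_interval helpers from the routing sketch; the new node fills its fingers through lookups at the bootstrap node, skipping lookups already covered by the previous finger:

```python
def join(b_node: "ChordNode", a_node: "ChordNode") -> None:
    """New node b_node joins via bootstrap a_node (a simulated remote peer)."""
    size = 2 ** b_node.m
    b_node.successor = a_node.find_successor(b_node.id)          # successor pointer via A
    b_node.finger = []
    for i in range(1, b_node.m + 1):
        start = (b_node.id + 2 ** (i - 1)) % size
        if b_node.finger and in_interval(start, b_node.id, b_node.finger[-1].id, b_node.m):
            # Interval already covered by the previous finger: no remote lookup needed.
            b_node.finger.append(b_node.finger[-1])
        else:
            b_node.finger.append(a_node.find_successor(start))   # simulated remote lookup
    # Predecessor pointers, updates at other nodes and key transfer (next slides)
    # are handled separately, e.g. by the stabilization protocol sketched later.
```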
23
Node Joins (contd.). Update the finger tables of existing nodes p such that: (1) p precedes b by at least 2^(i-1), and (2) the i-th finger of node p succeeds b. The update starts from p = predecessor(b - 2^(i-1)) and proceeds counter-clockwise while condition (2) remains true. Transferring keys: only from successor(b) to b; a notification must be sent to the application.
24
Example: finger table update
Node 6 joins
25
Example: transferring keys
Node 1 leaves
26
Concurrent Joins/Leaves
A stabilization protocol is needed to guard against inconsistency. Note: incorrect finger pointers may only increase latency, but incorrect successor pointers may cause lookup failure! Nodes periodically run the stabilization protocol: ask the successor for its predecessor and, if that node isn't ourselves, repair by adopting it as the new successor. The same algorithm is also run at join.
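A sketch of the periodic step, again on the simulated ChordNode (assumed here to carry an extra predecessor attribute, initially None); the function names are illustrative:

```python
def stabilize(n: "ChordNode") -> None:
    """Ask our successor for its predecessor and adopt it if it sits between us."""
    x = n.successor.predecessor                     # simulated remote call
    if x is not None and x is not n.successor and in_interval(x.id, n.id, n.successor.id, n.m):
        n.successor = x                             # repair: a node joined in between
    notify(n.successor, n)                          # tell the successor about ourselves

def notify(succ: "ChordNode", n: "ChordNode") -> None:
    """succ adopts n as predecessor if it has none, or if n lies between the two."""
    if succ.predecessor is None or in_interval(n.id, succ.predecessor.id, succ.id, succ.m):
        succ.predecessor = n
```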
27
Example: node 25 joins
28
Example: node 28 joins before 20 stabilizes (1)
29
Example: node 28 joins before 20 stabilizes (2)
30
CAN Virtual d-dimensional Cartesian coordinate system on a d-torus
Example: a 2-d space [0,1] x [0,1]. The space is dynamically partitioned among all nodes. A pair (K,V) is stored by mapping key K to a point P in the space using a uniform hash function and storing (K,V) at the node whose zone contains P. To retrieve (K,V), apply the same hash function to map K to P and fetch the entry from the node whose zone contains P. If P is not contained in the zone of the requesting node or its neighboring zones, the request is routed towards the neighbor whose zone is nearest to P.
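A sketch of mapping a key onto a point in the d-dimensional unit space, using one hash per coordinate (one reasonable choice of uniform hash; the function name is ours):

```python
import hashlib

def can_point(key: str, d: int = 2) -> tuple:
    """Map key K to a point P in [0,1)^d using an independent uniform hash per dimension."""
    coords = []
    for axis in range(d):
        h = hashlib.sha1(f"{axis}:{key}".encode()).digest()
        coords.append(int.from_bytes(h[:8], "big") / 2 ** 64)   # scale 64 bits to [0,1)
    return tuple(coords)

print(can_point("song.mp3"))   # the node whose zone contains this point stores the pair
```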
31
Routing in a CAN. Follow a straight-line path through the Cartesian space from the source to the destination coordinates. Each node maintains a table of the IP address and virtual coordinate zone of each of its neighbors, and uses greedy routing: forward to the neighbor closest to the destination. For a d-dimensional space partitioned into n equal zones, nodes maintain 2d neighbors and the average routing path length is (d/4)(n^(1/d)) hops: path length scales favorably without increasing per-node state.
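A sketch of the greedy forwarding decision, with each neighbor represented by the center of its zone for brevity (the names and the neighbor table below are illustrative):

```python
import math

def torus_distance(p: tuple, q: tuple) -> float:
    """Euclidean distance on the unit d-torus (each coordinate wraps around at 1.0)."""
    return math.sqrt(sum(min(abs(a - b), 1.0 - abs(a - b)) ** 2 for a, b in zip(p, q)))

def greedy_next_hop(neighbors: dict, dest: tuple) -> str:
    """Pick the neighbor (address -> zone-center coordinates) nearest to the destination point."""
    return min(neighbors, key=lambda addr: torus_distance(neighbors[addr], dest))

# Hypothetical neighbor table of some node, in 2-d.
neighbors = {"10.0.0.1": (0.25, 0.75), "10.0.0.2": (0.75, 0.25), "10.0.0.3": (0.75, 0.75)}
print(greedy_next_hop(neighbors, (0.9, 0.8)))   # -> "10.0.0.3"
```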
32
CAN Construction. The joining node locates a bootstrap node using the CAN DNS entry; the bootstrap node provides IP addresses of random member nodes. The joining node sends a JOIN request to a random point P in the Cartesian space. The node in the zone containing P splits the zone and allocates “half” to the joining node; the (K,V) pairs in the allocated “half” are transferred to the joining node. The joining node learns its neighbor set from the previous zone occupant, and the previous occupant updates its own neighbor set.
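A sketch of the zone split: the occupant halves its zone along its longest dimension and hands one half (plus the keys whose points fall in it) to the joiner. The split convention and the zone representation are assumptions of this sketch:

```python
Zone = list   # per-dimension (lo, hi) bounds, e.g. [(0.0, 1.0), (0.0, 0.5)]

def split_zone(zone: Zone) -> tuple:
    """Split a zone in half along its longest dimension; return (kept_half, handed_over_half)."""
    dim = max(range(len(zone)), key=lambda i: zone[i][1] - zone[i][0])
    lo, hi = zone[dim]
    mid = (lo + hi) / 2
    kept, handed = list(zone), list(zone)
    kept[dim], handed[dim] = (lo, mid), (mid, hi)
    return kept, handed

def contains(zone: Zone, p: tuple) -> bool:
    """True if point p falls inside the zone (used to pick which (K,V) pairs move)."""
    return all(lo <= x < hi for (lo, hi), x in zip(zone, p))

kept, handed = split_zone([(0.0, 1.0), (0.0, 0.5)])
print(kept, handed)   # -> [(0.0, 0.5), (0.0, 0.5)] and [(0.5, 1.0), (0.0, 0.5)]
```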
33
Departure, Recovery and Maintenance
Graceful departure: the node hands over its zone and the (K,V) pairs to a neighbor. Node failure: unreachable nodes trigger an immediate takeover algorithm that allocates the failed node's zone to a neighbor. Failure is detected via the lack of periodic refresh messages. Each neighbor starts a takeover timer initialized in proportion to its own zone volume; when the timer fires, it sends a TAKEOVER message containing its zone volume to all of the failed node's neighbors. If a received TAKEOVER reports a smaller volume, the node kills its timer; if not, it replies with its own TAKEOVER message. In this way the neighbors agree on the live neighbor with the smallest zone volume.
34
Pastry Generic p2p location and routing substrate
Self-organizing overlay network. Lookup/insert an object in < log_16 N routing steps (expected). O(log N) per-node state. Network proximity routing.
35
Pastry: Object distribution
Consistent hashing over a 128-bit circular id space (0 to 2^128 - 1). nodeIds and objIds are uniform random. Invariant: the node with the numerically closest nodeId maintains the object with id objId. Each node has a randomly assigned 128-bit nodeId in a circular namespace. Basic operation: a message with key X, sent by any Pastry node, is delivered to the live node with nodeId closest to X in at most log_16 N steps (barring node failures). Pastry uses a form of generalized hypercube routing, where the routing tables are initialized and updated dynamically.
36
Pastry: Object insertion/lookup
A message with key X is routed (Route(X)) to the live node with nodeId closest to X. Problem: a complete routing table is not feasible.
37
Pastry: Routing table (# 65a1fc)
Figure: the routing table of node 65a1fc, organized as Row 0, Row 1, Row 2, Row 3, ...; log_16 N rows in total, one row per digit of shared prefix.
38
Pastry: Leaf sets. Each node maintains the IP addresses of the nodes with the L/2 numerically closest larger and the L/2 numerically closest smaller nodeIds, respectively. Used for routing efficiency/robustness, fault detection (keep-alive), and application-specific local coordination.
39
Pastry: Routing procedure
if (destination D is within range of our leaf set)
    forward to the numerically closest member of the leaf set
else
    let l = length of the prefix shared with D
    let d = value of the l-th digit in D's address
    if (routing table entry R[l][d] exists)
        forward to R[l][d]
    else
        forward to a known node that (a) shares at least as long a prefix with D and (b) is numerically closer to D than this node
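A sketch of this decision in code, under simplifying assumptions: nodeIds are fixed-length lowercase hex strings, numeric closeness ignores ring wraparound, and routing_table[row][digit] holds a nodeId or None. The helper names are illustrative, not Pastry's API:

```python
def shared_prefix_len(a: str, b: str) -> int:
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n

def pastry_next_hop(my_id: str, key: str, leaf_set: list, routing_table: list) -> str:
    dist = lambda node_id: abs(int(node_id, 16) - int(key, 16))
    known = leaf_set + [my_id]
    # Case 1: the key falls within the range covered by the leaf set.
    if min(known) <= key <= max(known):
        return min(known, key=dist)
    l = shared_prefix_len(my_id, key)
    d = int(key[l], 16)
    # Case 2: the routing table has an entry sharing a longer prefix with the key.
    if routing_table[l][d] is not None:
        return routing_table[l][d]
    # Case 3 (rare): any known node with at least as long a prefix that is numerically closer.
    everyone = known + [e for row in routing_table for e in row if e]
    candidates = [n for n in everyone
                  if shared_prefix_len(n, key) >= l and dist(n) < dist(my_id)]
    return min(candidates, key=dist) if candidates else my_id
```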
40
Pastry: Routing Properties
log_16 N steps, O(log N) state. Figure: an example route, Route(d46a1c), starting at node 65a1fc; each hop shares a progressively longer prefix with the key (nodes shown: 65a1fc, d13da3, d4213f, d462ba, d467c4, d471f1).
41
Pastry: Performance. Integrity of overlay message delivery:
Guaranteed unless L/2 nodes with adjacent nodeIds fail simultaneously. Number of routing hops: with no failures, < log_16 N expected, 128/b + 1 maximum; during failure recovery, O(N) worst case, but the average case is much better.
42
Pastry Join: X = new node, A = bootstrap node, Z = the existing node with nodeId numerically closest to X
A routes a join message for X and thereby finds Z. In the process, A, Z, and all nodes on the path send their state tables to X. X settles on its own tables, possibly after contacting other nodes, and then tells everyone who needs to know about itself.
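A sketch of how X could settle on its tables from the state it receives, following the heuristic described in the Pastry design: the i-th node on the route from A to Z shares an i-digit prefix with X, so its row i is a reasonable starting point for X's row i, and the leaf set is seeded from Z. The function and parameter names are illustrative:

```python
def assemble_join_state(x_id: str, path_tables: list, z_leaf_set: list) -> tuple:
    """path_tables[i] is the routing table (list of rows) received from the i-th node
    on the route A..Z; returns (X's initial routing table, X's initial leaf set)."""
    rows = []
    for i in range(len(x_id)):
        if i < len(path_tables):
            rows.append(list(path_tables[i][i]))   # row i from the i-th node on the path
        else:
            rows.append([None] * 16)               # remaining rows start empty (b = 4 digits)
    return rows, list(z_leaf_set)
```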
43
Pastry Leave. A departure is noticed by leaf-set neighbors when the leaving node stops responding; the neighbors ask the highest and lowest nodes in their leaf set for a replacement leaf set. It is noticed by routing-table neighbors when forwarding a message fails; such a node can immediately route via another neighbor, and repairs the entry by asking another neighbor in the same “row” for its entry; if this fails, it asks somebody one level up.