1
Sylvia Ratnasamy (UC Berkeley Dissertation 2002) Paul Francis Mark Handley Richard Karp Scott Shenker A Scalable Content-Addressable Network Slides by Authors above with additions/modifications by Michael Isaacs (UCSB) Tony Hannan (Georgia Tech)
2
Outline Introduction Design Design Improvements Evaluation Related Work Conclusion
3
Internet-scale hash tables Hash tables – essential building block in software systems Internet-scale distributed hash tables – equally valuable to large-scale distributed systems: peer-to-peer systems (Napster, Gnutella, Groove, FreeNet, MojoNation…), large-scale storage management systems (Publius, OceanStore, PAST, Farsite, CFS...), mirroring on the Web
4
Content-Addressable Network (CAN) CAN: Internet-scale hash table Interface – insert (key, value) – value = retrieve (key) Properties – scalable – operationally simple – good performance Related systems: Chord, Plaxton/Tapestry...
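A minimal sketch of the hash-table interface CAN exposes, written here as a hypothetical Python class (the class and method names are illustrative, not from the paper):

```python
# Illustrative sketch of the CAN-style DHT interface (names are hypothetical).
class DistributedHashTable:
    def insert(self, key: str, value: bytes) -> None:
        """Store (key, value); CAN routes the pair to the node owning hash(key)."""
        raise NotImplementedError

    def retrieve(self, key: str) -> bytes:
        """Return the value stored under key, fetched from the owning node."""
        raise NotImplementedError
```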
5
Problem Scope provide the hashtable interface scalability robustness performance
6
Not in scope security anonymity higher level primitives – keyword searching – mutable content
7
Outline Introduction Design Design Improvements Evaluation Related Work Conclusion
8
CAN: basic idea [figure: nodes each holding part of a (K,V) table]
9
CAN: basic idea insert(K1,V1)
10
CAN: basic idea insert(K1,V1)
11
CAN: basic idea (K1,V1)
12
CAN: basic idea retrieve(K1)
13
CAN: solution Virtual Cartesian coordinate space Entire space is partitioned amongst all the nodes – every node “owns” a zone in the overall space Abstraction – can store data at “points” in the space – can route from one “point” to another Point = node that owns the enclosing zone
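A rough sketch of how a zone and point ownership might be represented, assuming a 2-d coordinate space (the Zone class and field names are illustrative, not the paper's code):

```python
# Illustrative 2-d zone in the CAN coordinate space (names are hypothetical).
from dataclasses import dataclass

@dataclass
class Zone:
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        # A node "owns" every point that falls inside its zone.
        return self.x_min <= x < self.x_max and self.y_min <= y < self.y_max

# Example: a node owning the lower-left quarter of a unit square.
zone = Zone(0.0, 0.5, 0.0, 0.5)
assert zone.contains(0.25, 0.1)
```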
14
CAN: simple example 1
15
1 2
16
1 2 3
17
1 2 3 4
19
I
20
node I::insert(K,V) I
21
CAN: simple example node I::insert(K,V) (1) a = h_x(K)
22
CAN: simple example node I::insert(K,V) (1) a = h_x(K), b = h_y(K)
23
CAN: simple example node I::insert(K,V) (1) a = h_x(K), b = h_y(K) (2) route(K,V) -> (a,b)
24
CAN: simple example node I::insert(K,V) (1) a = h_x(K), b = h_y(K) (2) route(K,V) -> (a,b) (3) (a,b) stores (K,V)
25
CAN: simple example node J::retrieve(K) (1) a = h_x(K), b = h_y(K) (2) route "retrieve(K)" to (a,b)
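A small sketch of the insert path just shown, assuming two independent hash functions that map a key onto the x and y coordinates (the hash choice and scaling are illustrative):

```python
# Illustrative key -> point mapping for a 2-d CAN (hash choice is an assumption).
import hashlib

def point_for_key(key: str) -> tuple[float, float]:
    """Map a key to a point (a, b) in the unit square via two hash functions."""
    hx = hashlib.sha1(b"x:" + key.encode()).digest()
    hy = hashlib.sha1(b"y:" + key.encode()).digest()
    a = int.from_bytes(hx[:8], "big") / 2**64   # a = h_x(K), scaled to [0, 1)
    b = int.from_bytes(hy[:8], "big") / 2**64   # b = h_y(K), scaled to [0, 1)
    return a, b

# insert(K, V): route (K, V) to the node whose zone contains point_for_key(K);
# retrieve(K): route the request to that same point and read the stored value.
```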
26
Data stored in the CAN is addressed by name (i.e. key), not location (i.e. IP address) CAN
27
CAN: routing table
28
CAN: routing [figure: route from (x,y) toward (a,b)]
29
A node only maintains state for its immediate neighboring nodes CAN: routing
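A sketch of the greedy forwarding rule: a node knows only its zone neighbors and hands the message to the neighbor whose zone lies closest to the destination point (the data shapes are illustrative):

```python
# Illustrative greedy CAN routing step (neighbor bookkeeping is an assumption).
import math

def next_hop(neighbors: dict[str, tuple[float, float]],
             dest: tuple[float, float]) -> str:
    """Pick the neighbor whose zone center is closest to the destination point."""
    return min(neighbors, key=lambda n: math.dist(neighbors[n], dest))

# Example: forward a message destined for point (0.8, 0.2).
hop = next_hop({"A": (0.25, 0.25), "B": (0.75, 0.25), "C": (0.25, 0.75)},
               (0.8, 0.2))   # -> "B"
```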
30
CAN: node insertion 1) new node discovers some node "I" already in CAN (via a bootstrap node)
31
CAN: node insertion 1) new node discovers some node "I" already in CAN
32
CAN: node insertion 2) new node picks a random point (p,q) in the space
33
CAN: node insertion 3) I routes to (p,q), discovers node J
34
CAN: node insertion 4) split J's zone in half… new node owns one half
35
Inserting a new node affects only a single other node and its immediate neighbors CAN: node insertion
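A rough sketch of that join step, with zones kept as simple (x_min, x_max, y_min, y_max) tuples and the split taken along the longer side (both choices are assumptions for the sketch):

```python
# Illustrative CAN join: split the owner's zone in half and hand one half to
# the new node (a sketch, not the paper's code).
import random

Zone4 = tuple[float, float, float, float]

def split_zone(zone: Zone4) -> tuple[Zone4, Zone4]:
    x_min, x_max, y_min, y_max = zone
    if (x_max - x_min) >= (y_max - y_min):
        mid = (x_min + x_max) / 2
        return (x_min, mid, y_min, y_max), (mid, x_max, y_min, y_max)
    mid = (y_min + y_max) / 2
    return (x_min, x_max, y_min, mid), (x_min, x_max, mid, y_max)

# 1) the new node picks a random point (p, q) and 2) routes to it, finding owner J;
p, q = random.random(), random.random()
# 3) J's zone is split: J keeps one half, the new node takes the other,
#    along with the (key, value) pairs whose points fall in that half.
old_half, new_half = split_zone((0.0, 1.0, 0.0, 0.5))
```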
36
CAN: node failures Need to repair the space – recover database state: use replication, rebuild database from replicas – repair routing: takeover algorithm
37
CAN: takeover algorithm Nodes periodically send update messages to their neighbors An absent update message indicates a failure – Each node sends a recover message to the takeover node – The takeover node takes over the failed node's zone
38
CAN: takeover node The takeover node is the node whose VID is numerically closest to the failed node's VID. The VID is a binary string indicating the path from the root to the zone in the binary partition tree. [Figure: binary partition tree with zones labeled by VIDs such as 0, 1, 10, 11, 100, 101]
39
CAN: VID predecessor / successor To handle multiple neighbors failing simultaneously, each node maintains its predecessor and successor in the VID ordering (idea borrowed from Chord). This guarantees that the takeover node can always be found.
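A sketch of how the takeover node could be chosen by VID, treating VIDs as binary strings and taking "numerically closest" as an integer comparison after padding (that reading is an assumption):

```python
# Illustrative takeover-node selection by VID (comparison rule is an assumption).
def takeover_node(failed_vid: str, candidate_vids: list[str]) -> str:
    """Return the candidate VID numerically closest to the failed node's VID.

    VIDs are binary strings (paths in the partition tree); here they are padded
    to a common length and compared as integers for the 'closest' test.
    """
    width = max(len(failed_vid), *(len(v) for v in candidate_vids))
    def as_int(vid: str) -> int:
        return int(vid.ljust(width, "0"), 2)
    return min(candidate_vids, key=lambda v: abs(as_int(v) - as_int(failed_vid)))

# Example: the node with VID "100" fails; "101" is the closest live candidate.
print(takeover_node("100", ["0", "101", "11"]))  # -> "101"
```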
40
Only the failed node’s immediate neighbors are required for recovery CAN: node failures
41
Basic Design recap Basic CAN – completely distributed – self-organizing – nodes only maintain state for their immediate neighbors
42
Outline Introduction Design Design Improvements Evaluation Related Work Conclusion
43
Design Improvement Objectives Reduce latency of CAN routing – Reduce CAN path length – Reduce per-hop latency (IP path length/latency per CAN hop) Improve CAN robustness – data availability – routing
44
Design Improvement Tradeoffs Routing performance and system robustness vs. increased per-node state and system complexity
45
Design Improvement Techniques 1. Increase number of dimensions in coordinate space 2. Multiple coordinate spaces (realities) 3. Better CAN routing metrics (measure & consider time) 4. Overloading coordinate zones (peer nodes) 5. Multiple hash functions 6. Geographic-sensitive construction of CAN overlay network (distributed binning) 7. More uniform partitioning (when new nodes enter) 8. Caching & Replication techniques for “hot-spots” 9. Background load balancing algorithm
46
1. Increase dimensions
47
2. Multiple realities Multiple (r) coordinate spaces Each node maintains a different zone in each reality Contents of hash table replicated on each reality Routing chooses neighbor closest in any reality (can make large jumps towards target) Advantages: – Greater data availability – Fewer hops
48
Multiple realities
49
Multiple dimensions vs realities
50
3. Time-sensitive routing Each node measures round-trip time (RTT) to each of its neighbors Routing message is forwarded to the neighbor with the maximum ratio of progress to RTT Advantage: – reduces per-hop latency (25-40% depending on # dimensions)
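A sketch of that RTT-weighted rule: among neighbors that make progress toward the destination, forward to the one maximizing progress per unit RTT (the data shapes are illustrative):

```python
# Illustrative RTT-weighted CAN forwarding (bookkeeping shapes are assumptions).
import math

def rtt_weighted_next_hop(current: tuple[float, float],
                          neighbors: dict[str, tuple[tuple[float, float], float]],
                          dest: tuple[float, float]) -> str | None:
    """neighbors maps name -> (zone center, measured RTT in ms)."""
    best, best_ratio = None, 0.0
    here = math.dist(current, dest)
    for name, (center, rtt) in neighbors.items():
        progress = here - math.dist(center, dest)   # how much closer this hop gets us
        if progress > 0 and progress / rtt > best_ratio:
            best, best_ratio = name, progress / rtt
    return best
```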
51
4. Zone overloading A node maintains a list of "peers" in the same zone Zone not split until MAXPEERS nodes are there Only the least-RTT node in each neighboring zone is maintained as a neighbor Replication or splitting of data across peers Advantages – Reduced per-hop latency – Reduced path length (because sharing zones has the same effect as reducing the total number of nodes) – Improved fault tolerance (a zone is vacant only when all its nodes fail)
52
Zone overloading (averaged over 2^8 to 2^18 nodes)
53
5. Multiple hash functions Each (key, value) pair mapped to "k" different points (and nodes) Parallel query execution on "k" paths Or choose the closest point in the space Advantages: – improve data availability – reduce path length
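A sketch of mapping one key to k points using k salted hash functions, building on the earlier point_for_key idea (the salting scheme is illustrative):

```python
# Illustrative k-fold key mapping for the multiple-hash-functions improvement.
import hashlib

def points_for_key(key: str, k: int = 3) -> list[tuple[float, float]]:
    """Map a key to k points; the (key, value) pair is stored at all of them."""
    points = []
    for i in range(k):
        hx = hashlib.sha1(f"x:{i}:{key}".encode()).digest()
        hy = hashlib.sha1(f"y:{i}:{key}".encode()).digest()
        points.append((int.from_bytes(hx[:8], "big") / 2**64,
                       int.from_bytes(hy[:8], "big") / 2**64))
    return points

# Queries can be issued to all k points in parallel, or to the one whose
# coordinates are closest to the querying node in the CAN space.
```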
54
Multiple hash functions
55
6. Geographically distributed binning Distributed binning of CAN nodes based on their relative distances from m IP landmarks (such as root DNS servers) m! partitions of coordinate space New node joins the partition associated with its landmark ordering Topologically close nodes will be assigned to the same partition Nodes in the same partition divide the space up randomly Disadvantage: coordinate space is no longer uniformly populated – background load balancing could alleviate this problem
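A sketch of the landmark-ordering idea: a joining node measures its RTT to m well-known landmarks, and the sorted order selects one of the m! partitions (the landmark names and RTTs below are placeholders):

```python
# Illustrative landmark binning (landmark hosts and RTTs are placeholders).
from itertools import permutations

LANDMARKS = ["landmark-1", "landmark-2", "landmark-3"]          # m = 3 (hypothetical)
PARTITIONS = {order: idx
              for idx, order in enumerate(permutations(range(len(LANDMARKS))))}

def partition_for(rtts_ms: list[float]) -> int:
    """Pick one of the m! coordinate-space partitions from the landmark ordering."""
    order = tuple(sorted(range(len(rtts_ms)), key=lambda i: rtts_ms[i]))
    return PARTITIONS[order]

# Example: a node closest to landmark-2, then landmark-1, then landmark-3.
print(partition_for([40.0, 12.0, 95.0]))   # nodes with the same ordering share a partition
```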
56
Distributed binning
57
7. More uniform partitioning When adding a node to a zone, choose the neighbor with the largest volume to split Advantage – 85% of nodes end up with the average zone volume, as opposed to 40% without this technique, and the largest zone volume dropped from 8x to 2x the average
58
8. Caching & Replication A node can cache values of keys it has recently requested. A node can push out "hot" keys to neighbors.
59
9. Load balancing Merge and split zones to balance loads
60
Outline Introduction Design Design Improvements Evaluation Related Work Conclusion
61
Evaluation Metrics 1. Path length 2. Neighbor state 3. Latency 4. Volume 5. Routing fault tolerance 6. Data availability
62
CAN: scalability For a uniformly partitioned space with n nodes and d dimensions – per node, the number of neighbors is 2d – the average routing path is O(d * n^(1/d)) hops Simulations show that the above results hold in practice (Transit-Stub topology: GT-ITM gen) Can scale the network without increasing per-node state
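A quick worked check of those scaling expressions, using the O(d * n^(1/d)) form quoted again in the conclusions (the concrete numbers are only illustrative):

```python
# Illustrative check of CAN scaling: neighbors = 2d, path length ~ O(d * n^(1/d)).
def neighbors(d: int) -> int:
    return 2 * d

def approx_path_length(n: int, d: int) -> float:
    return d * n ** (1.0 / d)

for d in (2, 4, 8):
    print(d, neighbors(d), round(approx_path_length(1_000_000, d), 1))
# For a million nodes: d=2 -> 4 neighbors, ~2000 hops; d=8 -> 16 neighbors, ~45 hops.
```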
63
CAN: latency latency stretch = (CAN routing delay) / (IP routing delay) To reduce latency, apply design improvements: – increase dimensions (d) – multiple nodes per zone (peer nodes) (p) – multiple "realities", i.e. coordinate spaces (r) – multiple hash functions (k) – use RTT-weighted routing – distributed binning – uniform partitioning – caching & replication (for "hot spots")
64
Overall improvement
65
CAN: Robustness Completely distributed – no single point of failure Not exploring database recovery Resilience of routing – can route around trouble
66
Routing resilience [figure: route from source to destination]
67
Routing resilience
68
Routing resilience [figure: destination still reachable after node failures]
69
Routing resilience
70
Routing resilience Node X::route(D) If (X cannot make progress to D) – check if any neighbor of X can make progress – if yes, forward message to one such neighbor
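A sketch of that fallback rule combined with the earlier greedy step: make progress yourself if you can, otherwise hand the message to a neighbor that reports it can (all structures and the callback are illustrative assumptions):

```python
# Illustrative resilient forwarding step (an assumption-level sketch, not the
# paper's code): greedy progress if possible, otherwise delegate to a neighbor
# that can still make progress toward the destination.
import math
from typing import Callable

def route_step(current: tuple[float, float],
               neighbors: dict[str, tuple[float, float]],
               dest: tuple[float, float],
               neighbor_can_progress: Callable[[str], bool]) -> str | None:
    here = math.dist(current, dest)
    closer = {n: c for n, c in neighbors.items() if math.dist(c, dest) < here}
    if closer:
        # Normal greedy case: hop to the neighbor closest to the destination.
        return min(closer, key=lambda n: math.dist(closer[n], dest))
    # X cannot make progress: forward to any neighbor that says it can.
    for name in neighbors:
        if neighbor_can_progress(name):
            return name
    return None   # no route found from this node
```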
72
Outline Introduction Design Design Improvements Evaluation Related Work Conclusion
73
Related Algorithms Distance Vector (DV) and Link State (LS) in IP routing – require widespread dissemination of topology – not well suited to dynamic topologies like file-sharing Hierarchical routing algorithms – too centralized, single points of stress Other DHTs – Plaxton, Tapestry, Pastry, Chord
74
Related Systems DNS OceanStore (uses Tapestry) Publius: Web-publishing system with high anonymity P2P File sharing systems – Napster, Gnutella, KaZaA, Freenet
75
Outline Introduction Design Design Improvements Evaluation Related Work Conclusion
76
Conclusion: Purpose CAN – an Internet-scale hash table – potential building block in Internet applications
77
Conclusion: Scalability O(d) per-node state O(d * n^(1/d)) path length For d = (log n)/2, path length and per-node state are O(log n), like other DHTs
78
Conclusion: Latency With the main design improvements, latency is less than twice the IP path latency
79
Conclusion: Robustness Decentralized, can route around trouble With certain design improvements (replication), high data availability
80
Weaknesses / Future Work Security – Denial of service attacks are hard to combat because a malicious node can act as a malicious server as well as a malicious client Mutable content Search