1 Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Shenker. A Scalable, Content-Addressable Network (CAN). ACIRI, U.C. Berkeley, Tahoe Networks

2 Slide Credits (for UO CIS 410/510): Ratnasamy et al., SIGCOMM '01; Ken Birman, CIS 514, Cornell

3 CAN: solution. Virtual d-dimensional Cartesian coordinate space; the entire space is partitioned amongst all the nodes –every node “owns” a zone in the overall space –point = node that owns the enclosing zone. State per node is O(d). Routing between nodes is O(d·n^(1/d)) for d dimensions and n nodes, assuming evenly distributed nodes

4 CAN Virtual Space. Virtual d-dimensional Cartesian coordinate system on a d-torus –Example: 2-d [0,1] x [0,1] –Note: coordinates can be rational numbers! –Infinitely expandable. Subspaces (zones) are dynamically partitioned among all nodes
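As an illustration of zones on the torus, here is a minimal sketch (the Zone representation and names are ours, not from the slides): a zone owns a half-open rectangle in the 2-d unit space, and containment wraps coordinates mod 1 because the space is a torus.

```python
# Illustrative sketch only: a rectangular zone in a 2-d CAN space [0,1) x [0,1).
from dataclasses import dataclass

@dataclass
class Zone:
    x_lo: float
    x_hi: float   # zone covers [x_lo, x_hi) on the x axis
    y_lo: float
    y_hi: float   # and [y_lo, y_hi) on the y axis

    def contains(self, x: float, y: float) -> bool:
        # Coordinates wrap on the torus, so reduce them mod 1 first.
        x, y = x % 1.0, y % 1.0
        return self.x_lo <= x < self.x_hi and self.y_lo <= y < self.y_hi

# The whole space, owned by the very first node to join.
whole = Zone(0.0, 1.0, 0.0, 1.0)
print(whole.contains(0.3, 0.7))   # True
print(whole.contains(1.3, 0.7))   # also True: 1.3 wraps to 0.3
```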

5 CAN: simple example 1

6 CAN: simple example (vertical split first) 12

7 CAN: simple example (horizontal split next) 1 2 3

8 CAN: simple example 1 2 3 4

9 CAN: simple example (figure-only slide)

10 Storing and Retrieving (K,V) in CAN. A pair (K,V) is stored by mapping key K to a point P in the space using a uniform hash function, and storing (K,V) at the node whose zone contains P. Retrieve (K,V) by applying the same hash function to map K to P, and fetching the entry from the node whose zone contains P –If P is not contained in the zone of the requesting node or its neighboring zones, route the request toward the neighbor whose zone is nearest P
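The key-to-point mapping can be sketched as follows (the choice of SHA-256 with per-axis salting is our assumption; the slides only require a uniform hash per dimension):

```python
# Sketch: hash key K to a point P = (a, b) in [0,1)^2.
import hashlib

def hash_to_unit(key: str, axis: str) -> float:
    # Derive an independent, roughly uniform value per axis by salting
    # the hash input with the axis name.
    digest = hashlib.sha256((axis + ":" + key).encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def key_to_point(key: str):
    return (hash_to_unit(key, "x"), hash_to_unit(key, "y"))

a, b = key_to_point("my-file.mp3")
# (K,V) is stored at the node whose zone contains (a, b);
# a retrieval recomputes the same point and routes there.
```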

11 CAN: simple example (figure: node I)

12 node I::insert(K,V)

13 node I::insert(K,V): (1) a = h_x(K)

14 node I::insert(K,V): (1) a = h_x(K), b = h_y(K)

15 node I::insert(K,V): (1) a = h_x(K), b = h_y(K); (2) route(K,V) -> (a,b)

16 node I::insert(K,V): (1) a = h_x(K), b = h_y(K); (2) route(K,V) -> (a,b); (3) the node owning (a,b) stores (K,V)

17 node J::retrieve(K): (1) a = h_x(K), b = h_y(K); (2) route “retrieve(K)” to (a,b)

18 Routing in a CAN. Each node maintains a table of the IP address and virtual coordinate zone of each of its neighbors. Follow a path through the Cartesian space from the source to the destination coordinates, using greedy forwarding to the neighbor closest to the destination. For a d-dimensional space partitioned into n equal zones, each node maintains 2d neighbors –Average routing path length: (d/4)·n^(1/d) hops
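A sketch of the greedy forwarding step (the node and table layout are assumed; only the rule "forward to the neighbor closest to the destination, measured on the torus" comes from the slide):

```python
# Illustrative greedy forwarding on the 2-d unit torus.

def torus_dist(p, q):
    # Per-axis distance with wraparound, then Euclidean norm.
    return sum(min(abs(a - b), 1 - abs(a - b)) ** 2 for a, b in zip(p, q)) ** 0.5

def next_hop(my_coord, neighbors, dest):
    """neighbors: dict mapping neighbor id -> that neighbor's zone-center coordinate.
    Returns the id of the neighbor to forward to, or None if no neighbor
    is closer to dest than we are (i.e. we own the destination point)."""
    best = min(neighbors, key=lambda n: torus_dist(neighbors[n], dest))
    if torus_dist(neighbors[best], dest) < torus_dist(my_coord, dest):
        return best
    return None

hop = next_hop((0.1, 0.1), {"J": (0.4, 0.1), "K": (0.1, 0.4)}, (0.5, 0.1))
# hop == "J": J's center is closest to the destination point
```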

19 CAN: routing table

20 CAN: routing (figure): a route through the space from the zone at (a,b) toward the point (x,y)

21 CAN: node insertion. 1) the new node contacts a bootstrap node to discover some node “I” already in the CAN

22 CAN: node insertion. 1) discover some node “I” already in the CAN (figure: node I)

23 CAN: node insertion. 2) the new node picks a random point (p,q) in the space

24 CAN: node insertion. 3) I routes to (p,q) and discovers node J, the owner of that zone

25 CAN: node insertion. 4) split J’s zone in half; the new node owns one half
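Step 4 above can be sketched as a simple zone split (the tuple-of-ranges representation is ours):

```python
# Illustrative sketch of the join-time zone split.
def split_zone(zone, axis):
    """zone: ((x_lo, x_hi), (y_lo, y_hi)); split in half along axis (0 = x, 1 = y).
    Returns (old_half, new_half): J keeps one half, the new node takes the other."""
    lo, hi = zone[axis]
    mid = (lo + hi) / 2
    old, new = list(zone), list(zone)
    old[axis] = (lo, mid)
    new[axis] = (mid, hi)
    return tuple(old), tuple(new)

# Example: J owned the right half of the space; the new node splits it along y.
j_zone, new_zone = split_zone(((0.5, 1.0), (0.0, 1.0)), axis=1)
# j_zone   == ((0.5, 1.0), (0.0, 0.5))
# new_zone == ((0.5, 1.0), (0.5, 1.0))
```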

26 CAN: node failures. Need to repair the space –recover the database: soft-state updates; use replication, rebuild the database from replicas –repair routing: takeover algorithm

27 CAN: takeover algorithm Simple failures –know your neighbor’s neighbors –when a node fails, one of its neighbors takes over its zone More complex failure modes –simultaneous failure of multiple adjacent nodes –scoped flooding to discover neighbors Only the failed node’s immediate neighbors are required for recovery

28 Evaluation Scalability Low-latency Load balancing Robustness

29 CAN: scalability. For a uniformly partitioned space with n nodes and d dimensions –per node, the number of neighbors is 2d –the average routing path is (d/4)·n^(1/d) hops –simulations show that the above results hold in practice. Can scale the network without increasing per-node state; unlimited growth possible. Chord/Plaxton/Tapestry/Buzz –log(n) neighbors with log(n) hops
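Plugging numbers into the slide's formulas makes the state/path-length trade-off concrete (illustrative arithmetic only):

```python
# Constant state (2d neighbors) buys O(d * n^(1/d)) hops; a log(n) DHT
# pays log(n) state for log(n) hops.
import math

def can_hops(n, d):
    return (d / 4) * n ** (1 / d)

n = 2**20  # about a million nodes
for d in (2, 3, 10):
    print(f"d={d}: {2*d} neighbors, ~{can_hops(n, d):.1f} hops")
# For comparison, a log(n)-style DHT (e.g. Chord) at this size keeps
# log2(n) = 20 neighbors and routes in about 20 hops.
print(round(math.log2(n)))  # 20
```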

30 CAN: low-latency. Problem –latency stretch = (CAN routing delay) / (IP routing delay) –application-level routing may lead to high stretch. Solution –increase dimensions –heuristics: RTT-weighted routing; multiple nodes per zone (peer nodes); deterministically replicate entries

31 CAN: low-latency (plot): latency stretch vs. number of nodes (16K to 131K), #dimensions = 2, with and without heuristics

32 CAN: low-latency (plot): latency stretch vs. number of nodes (16K to 131K), #dimensions = 10, with and without heuristics

33 CAN: load balancing Two pieces –Dealing with hot-spots popular (key,value) pairs nodes cache recently requested entries overloaded node replicates popular entries at neighbors –Uniform coordinate space partitioning uniformly spread (key,value) entries uniformly spread out routing load

34 Uniform Partitioning Added check –at join time, pick a zone –check neighboring zones –pick the largest zone and split that one
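The added check can be sketched as follows (the zone representation is assumed): rather than splitting the zone that happens to contain the random join point, split the largest zone among it and its neighbors.

```python
# Illustrative sketch of the uniform-partitioning check at join time.

def zone_volume(zone):
    # zone: tuple of (lo, hi) ranges, one per dimension
    vol = 1.0
    for lo, hi in zone:
        vol *= hi - lo
    return vol

def pick_zone_to_split(candidate, neighbors):
    """candidate: the zone containing the random join point;
    neighbors: list of its neighboring zones.
    Return the largest of these zones; that one gets split."""
    return max([candidate] + neighbors, key=zone_volume)

big = ((0.0, 0.5), (0.0, 1.0))     # volume 0.5
small = ((0.5, 0.75), (0.0, 0.5))  # volume 0.125
print(pick_zone_to_split(small, [big]) == big)  # True: split the larger zone
```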

35 Uniform Partitioning (plot): percentage of nodes vs. zone volume (V/16, V/8, V/4, V/2, V, 2V, 4V, 8V, where V = total volume / n), with and without the check; 65,000 nodes, 3 dimensions

36 CAN: Robustness Completely distributed –no single point of failure Not exploring database recovery Resilience of routing –can route around trouble

37 Routing resilience (figure): a route from source to destination

38–40 Routing resilience (figure-only slides: the route is repaired around failed nodes)

41 Routing resilience. Node X::route(D): if X cannot make progress toward D –check whether any neighbor of X can make progress –if yes, forward the message to one such neighbor
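The fallback rule on this slide can be sketched as follows (the data structures are assumed, and distances ignore torus wraparound for brevity):

```python
# Illustrative sketch: greedy forwarding with the slide's fallback rule.

def dist(p, q):
    # Plain Euclidean distance (torus wraparound omitted for brevity).
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def route_step(x_coord, neighbor_coords, neighbor_neighbors, dest):
    """neighbor_coords: id -> coordinate of each live neighbor of X;
    neighbor_neighbors: id -> {id -> coordinate} of that neighbor's own neighbors
    (each node knows its neighbors' neighbors, per the takeover slide)."""
    # Normal case: some neighbor is strictly closer to dest than X is.
    closer = [n for n, c in neighbor_coords.items()
              if dist(c, dest) < dist(x_coord, dest)]
    if closer:
        return min(closer, key=lambda n: dist(neighbor_coords[n], dest))
    # Fallback: forward to any neighbor that can itself make progress.
    for n, nbrs in neighbor_neighbors.items():
        if any(dist(c, dest) < dist(x_coord, dest) for c in nbrs.values()):
            return n
    return None  # no progress possible
```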

42–43 Routing resilience (plot): Pr(successful routing) vs. #dimensions; CAN size = 16K nodes, Pr(node failure) = 0.25

44 Routing resilience (plot): Pr(successful routing) vs. Pr(node failure); CAN size = 16K nodes, #dimensions = 10

45 Topologically Sensitive Overlay Construction for CAN: Distributed Binning. Idea –well-known set of landmark machines –each CAN node measures its RTT to each landmark –and orders the landmarks by increasing RTT –nodes are sorted into bins based on this landmark ordering. CAN construction –place nodes from the same bin close together in the CAN

46 Distributed Binning: basic binning example. 5 landmark machines: L1, L2, L3, L4, L5. A CAN node pings the 5 landmarks and orders them from closest to farthest by RTT; its bin label (e.g. 21453) is one of 5! possible orderings. The bin label is translated into CAN coordinates by dividing the space into 5! subspaces, cycling through the dimensions XYZXYZXYZ… and dividing by m, m-1, m-2, … portions in turn. Thus the X axis is divided into 5*2 portions, Y into 4*1, and Z into 3.

47 Sophisticated Distributed Binning. Divide RTTs into ranges –Level 0: [0,100) ms –Level 1: [100,200) ms, etc. A node’s bin includes both the ordering of the landmarks and their levels. E.g., node A’s distances to landmarks L1, L2, L3 are 232 ms, 51 ms, and 117 ms. Its bin ordering is L2, L3, L1, and the corresponding level ordering is 012.
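Node A's bin can be computed mechanically; a small sketch (the function name is ours, the example values are from the slide):

```python
# Illustrative sketch: compute a node's (ordering, levels) bin from its
# measured RTTs to the landmarks.
def bin_label(rtts_ms, level_width=100):
    """rtts_ms: dict landmark -> RTT in ms.
    Returns (ordering, levels): landmarks sorted by increasing RTT, and
    each sorted landmark's RTT level ([0,100) -> 0, [100,200) -> 1, ...)."""
    ordering = sorted(rtts_ms, key=rtts_ms.get)
    levels = "".join(str(rtts_ms[l] // level_width) for l in ordering)
    return ordering, levels

ordering, levels = bin_label({"L1": 232, "L2": 51, "L3": 117})
print(ordering)  # ['L2', 'L3', 'L1']
print(levels)    # '012'
```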

48 Distributed Binning (plot): latency stretch (5 to 20) vs. number of nodes (256 to 4K), with and without binning, for #dimensions = 2 and #dimensions = 4; 4 landmarks (placed 5 hops away from each other), naïve partitioning

49 Fault Tolerance: Multiple Hash Functions Improve data availability by using k hash functions to map a single key to k points in the coordinate space Replicate (K,V) and store at k distinct nodes (K,V) is only unavailable when all k replicas are simultaneously unavailable Authors suggest querying all k nodes in parallel to reduce average lookup latency
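Replication via k hash functions can be sketched by salting one hash with the replica index (an implementation assumption; the slide simply posits k independent hash functions):

```python
# Illustrative sketch: map one key to k distinct points in [0,1)^2;
# (K,V) is stored at the owner of each point.
import hashlib

def points_for_key(key: str, k: int = 3):
    pts = []
    for i in range(k):
        # Salting with the replica index i simulates k independent hash functions.
        d = hashlib.sha256(f"{i}:{key}".encode()).digest()
        x = int.from_bytes(d[:8], "big") / 2**64
        y = int.from_bytes(d[8:16], "big") / 2**64
        pts.append((x, y))
    return pts

# A lookup succeeds as long as at least one of the k owners is alive;
# querying all k in parallel reduces average lookup latency.
replicas = points_for_key("my-file.mp3")
```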

