
Scalable Content-Addressable Networks. Prepared by Kuhan Paramsothy, March 5, 2007.



2 ECE 1770 – Content-Addressable Networks
High-Level Overview
- Hash tables, which map keys to values, are heavily used in building software applications.
- A Content-Addressable Network (CAN) provides hash-table-like functionality at Internet-like scales.
- A CAN is:
  - scalable
  - robust and fault-tolerant
  - self-organizing
  - low-latency

3 Hash Tables and CAN
- A hash table is a data structure that efficiently maps keys onto values.
- CANs are a form of distributed, Internet-scale hash table.

4 What CAN Would Do for Us
- CAN would improve peer-to-peer (P2P) systems:
  - Napster: file location is centralized, so the central directory is expensive to scale and is a single point of failure.
  - Gnutella: file location is decentralized (the network self-organizes into an application-layer mesh), but requests are flooded, which does not scale and may fail to find content that exists.
  - Conclusion: P2P systems need a scalable indexing mechanism.
- CAN would improve large data repositories, which need efficient insertion and retrieval.
- CAN could support large-scale name resolution services whose naming scheme is decoupled from the name resolution process (unlike DNS), so names no longer need to be location-dependent.

5 Basic Operations Performed on CANs
- Basic operations: insertion, lookup, and deletion of (key, value) pairs.
- Each CAN node:
  1. stores a piece of the entire hash table, called a zone;
  2. holds information about a small number of adjacent zones in the table.
- Routing: a request for a key is forwarded by intermediate CAN nodes towards the node whose zone contains that key.
- The CAN design:
  - is distributed (requires no centralized control or coordination);
  - is scalable (each node holds only a small amount of state, which does not grow with the size of the network);
  - is fault-tolerant (nodes can route around failures);
  - does not require a naming hierarchy;
  - is implemented entirely at the application layer.
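
The three basic operations can be made concrete with a toy, single-process sketch. It is only an illustration under stated assumptions: the names `key_to_point`, `Node` and `ToyCAN` are hypothetical, SHA-256 stands in for the unspecified uniform hash function, the space is fixed at two dimensions with four hard-coded zones, and the owner is found by scanning the zone list where a real CAN would route the request through intermediate nodes.

```python
import hashlib
from dataclasses import dataclass, field


def key_to_point(key: str, d: int = 2) -> tuple:
    """Hash a key to a point in the unit cube [0, 1)^d (d <= 4 with SHA-256)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Take 8 bytes per coordinate and keep the top 53 bits so each value is exactly < 1.0.
    return tuple(
        (int.from_bytes(digest[8 * i:8 * i + 8], "big") >> 11) / 2**53 for i in range(d)
    )


@dataclass
class Node:
    name: str
    lo: tuple  # lower corner of this node's zone (inclusive)
    hi: tuple  # upper corner of this node's zone (exclusive)
    store: dict = field(default_factory=dict)  # this node's piece of the hash table

    def owns(self, p) -> bool:
        return all(a <= x < b for a, x, b in zip(self.lo, p, self.hi))


class ToyCAN:
    """Single-process stand-in: a real CAN would route hop by hop instead of scanning."""

    def __init__(self, nodes):
        self.nodes = nodes

    def _owner(self, key):
        p = key_to_point(key)
        return next(n for n in self.nodes if n.owns(p))

    def insert(self, key, value):
        self._owner(key).store[key] = value

    def lookup(self, key):
        return self._owner(key).store.get(key)

    def delete(self, key):
        self._owner(key).store.pop(key, None)


quadrants = [
    Node("A", (0.0, 0.0), (0.5, 0.5)), Node("B", (0.5, 0.0), (1.0, 0.5)),
    Node("C", (0.0, 0.5), (0.5, 1.0)), Node("D", (0.5, 0.5), (1.0, 1.0)),
]
can = ToyCAN(quadrants)
can.insert("song.mp3", "10.0.0.7")
print(can.lookup("song.mp3"))  # prints 10.0.0.7
can.delete("song.mp3")
print(can.lookup("song.mp3"))  # prints None
```

Because every node applies the same hash, insertion, lookup and deletion of a given key all end up at the same zone owner; only the way that owner is reached differs in a real deployment.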

6 CAN Design
- The design centers on a virtual d-dimensional Cartesian coordinate space on a d-torus.
- At any time, the entire coordinate space is dynamically partitioned among all the nodes in the system.
- Each node owns a distinct zone within the overall space.
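
A minimal way to model the partitioned d-torus, assuming the unit cube [0, 1)^d with coordinates that wrap around at 1.0, is to represent each zone as an axis-aligned box. The sketch below (hypothetical names `Zone`, `torus_distance`, `looks_like_partition`) checks, by total volume and random sampling rather than by proof, that a set of zones covers the space without overlaps, which is the invariant the slide describes.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    lo: tuple  # lower corner (inclusive)
    hi: tuple  # upper corner (exclusive)

    def contains(self, p) -> bool:
        return all(a <= x < b for a, x, b in zip(self.lo, p, self.hi))

    def volume(self) -> float:
        return math.prod(b - a for a, b in zip(self.lo, self.hi))


def torus_distance(p, q) -> float:
    """Euclidean distance on the unit d-torus: each coordinate wraps around at 1.0."""
    return math.sqrt(sum(min(abs(a - b), 1 - abs(a - b)) ** 2 for a, b in zip(p, q)))


def looks_like_partition(zones, d, samples=10_000, seed=0) -> bool:
    """Heuristic check: volumes sum to 1 and random points fall in exactly one zone."""
    if not math.isclose(sum(z.volume() for z in zones), 1.0):
        return False
    rng = random.Random(seed)
    for _ in range(samples):
        p = tuple(rng.random() for _ in range(d))
        if sum(z.contains(p) for z in zones) != 1:
            return False
    return True


zones = [Zone((0.0, 0.0), (0.5, 0.5)), Zone((0.5, 0.0), (1.0, 0.5)),
         Zone((0.0, 0.5), (0.5, 1.0)), Zone((0.5, 0.5), (0.75, 1.0)),
         Zone((0.75, 0.5), (1.0, 1.0))]
print(looks_like_partition(zones, d=2))          # True
print(torus_distance((0.05, 0.5), (0.95, 0.5)))  # about 0.1, via the wraparound
```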

7 CAN Design (2)
1. To store a pair (K1, V1), key K1 is deterministically mapped onto a point P in the coordinate space using a uniform hash function.
2. The pair is then stored at the node that owns the zone within which P lies.
3. To retrieve the entry for K1, any node applies the same hash function to map K1 to P and retrieves the value from the node that owns P; if P is not in the requesting node's own zone, the request is routed through the CAN until it reaches that node.
- Each node learns and maintains the IP addresses of the nodes that hold adjoining coordinate zones.
- Efficient routing is critical to a useful CAN.
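
The notion of "adjoining" zones can be pinned down with the neighbor rule from the CAN paper as we read it: two zones are neighbors if their coordinate spans overlap along d - 1 dimensions and abut along exactly one, with abutting allowed across the torus wraparound. The sketch below is illustrative; `Zone` and `are_neighbors` are hypothetical names, and the exact float comparisons assume zone boundaries produced by repeated halving, which are exactly representable.

```python
from typing import NamedTuple


class Zone(NamedTuple):
    lo: tuple  # lower corner (inclusive)
    hi: tuple  # upper corner (exclusive)


def _overlap(a_lo, a_hi, b_lo, b_hi) -> bool:
    # The two spans share a segment of positive length.
    return max(a_lo, b_lo) < min(a_hi, b_hi)


def _abut(a_lo, a_hi, b_lo, b_hi) -> bool:
    # The spans touch end to end; on the unit torus, 1.0 wraps around to 0.0.
    return (a_hi == b_lo or b_hi == a_lo
            or (a_hi == 1.0 and b_lo == 0.0) or (b_hi == 1.0 and a_lo == 0.0))


def are_neighbors(a: Zone, b: Zone) -> bool:
    """Neighbors overlap along d - 1 dimensions and abut along exactly one."""
    overlaps = abuts = 0
    for a_lo, a_hi, b_lo, b_hi in zip(a.lo, a.hi, b.lo, b.hi):
        if _overlap(a_lo, a_hi, b_lo, b_hi):
            overlaps += 1
        elif _abut(a_lo, a_hi, b_lo, b_hi):
            abuts += 1
    return overlaps == len(a.lo) - 1 and abuts == 1


zones = {
    "A": Zone((0.0, 0.0), (0.5, 0.5)), "B": Zone((0.5, 0.0), (1.0, 0.5)),
    "C": Zone((0.0, 0.5), (0.5, 1.0)), "D": Zone((0.5, 0.5), (0.75, 1.0)),
    "E": Zone((0.75, 0.5), (1.0, 1.0)),
}
for name, zone in zones.items():
    print(name, sorted(o for o, z in zones.items() if o != name and are_neighbors(zone, z)))
```

Zones that touch only at a corner (A and D here) are not neighbors, so each node's routing table stays small.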

8 Routing in a CAN
- Routing in a Content-Addressable Network follows the straight-line path through the Cartesian space from the source to the destination coordinates.
- Each CAN node maintains a coordinate routing table that holds the IP address and virtual coordinate zone of each of its immediate neighbors in the coordinate space; a node forwards a message to the neighbor whose coordinates are closest to the destination.
- For a d-dimensional space partitioned into n equal zones:
  - average path length = $\frac{d}{4}\, n^{1/d}$ hops;
  - individual nodes have 2d neighbors;
  - for fixed d, the average path length grows as $O(n^{1/d})$.
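
The path-length formula can be checked with a small simulation under idealized assumptions: n = k^d nodes own equal zones arranged as a grid with k even, each node forwards greedily to the neighbor whose zone is closest (squared torus distance) to the destination, and the average is taken over all ordered source/destination pairs, including pairs where source and destination coincide. The function names are hypothetical.

```python
import itertools
import math


def wrap_dist(a: int, b: int, k: int) -> int:
    """Distance between grid positions a and b on a ring of k positions."""
    diff = abs(a - b) % k
    return min(diff, k - diff)


def greedy_route(src: tuple, dst: tuple, k: int) -> int:
    """Forward to the neighbor closest to dst (squared torus distance); return hop count."""
    d = len(src)
    cur, hops = src, 0
    while cur != dst:
        # The 2d neighbors: one zone up or down along each dimension, wrapping at k.
        neighbors = [cur[:i] + ((cur[i] + step) % k,) + cur[i + 1:]
                     for i in range(d) for step in (-1, 1)]
        cur = min(neighbors, key=lambda q: sum(wrap_dist(q[i], dst[i], k) ** 2 for i in range(d)))
        hops += 1
    return hops


def average_path_length(d: int, k: int) -> float:
    """Mean hop count over all ordered (source, destination) pairs of the k**d zones."""
    zones = list(itertools.product(range(k), repeat=d))
    total = sum(greedy_route(s, t, k) for s in zones for t in zones)
    return total / len(zones) ** 2


for d, k in [(2, 8), (3, 4)]:
    n = k ** d
    simulated = average_path_length(d, k)
    formula = (d / 4) * n ** (1 / d)
    print(f"d={d} n={n}: simulated={simulated:.3f} formula={formula:.3f}",
          math.isclose(simulated, formula))
```

Both lines report matching simulated and formula values. In this idealized grid each greedy hop closes one zone-width of distance along one dimension, so the hop count equals the wraparound Manhattan distance, whose average per dimension is k/4.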

9 Construction of a CAN Overlay
- The entire CAN space is divided amongst the nodes currently in the system.
- Construction is incremental; a new node joins in three steps:
  1. The new node finds a node already in the CAN.
  2. Using the CAN routing mechanism, it finds the node whose zone will be split (the owner of a randomly chosen point P), and that zone is split in half between the two nodes.
  3. The neighbors of the split zone are notified so that routing can include the new node.
- Bootstrapping: a CAN has an associated DNS domain name that resolves to the IP addresses of one or more CAN bootstrap nodes.
- Node insertion affects only O(d) existing nodes, where d is the number of dimensions.
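
The zone split in step 2 can be sketched as follows. This is a hypothetical illustration, not the paper's code: SHA-256 stands in for the uniform hash, the newcomer is assumed to take the upper half, and the split dimension is the longest side of the zone with ties going to the lowest dimension (for zones produced by repeated halving, this cycles through the dimensions in a fixed order). Finding the zone via a random point and notifying the neighbors (steps 1 and 3) are not shown.

```python
import hashlib
from dataclasses import dataclass, field


def key_to_point(key: str, d: int) -> tuple:
    """Hash a key to a point in [0, 1)^d (SHA-256 is an illustrative choice; d <= 4)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return tuple((int.from_bytes(digest[8 * i:8 * i + 8], "big") >> 11) / 2**53 for i in range(d))


@dataclass
class Node:
    name: str
    lo: list  # lower corner of the zone (inclusive)
    hi: list  # upper corner of the zone (exclusive)
    store: dict = field(default_factory=dict)

    def owns(self, p) -> bool:
        return all(a <= x < b for a, x, b in zip(self.lo, p, self.hi))


def join(existing: Node, newcomer_name: str) -> Node:
    """Split `existing`'s zone in half; the newcomer takes the upper half and its keys."""
    d = len(existing.lo)
    # Split along the longest side; ties go to the lowest dimension index.
    dim = max(range(d), key=lambda i: (existing.hi[i] - existing.lo[i], -i))
    mid = (existing.lo[dim] + existing.hi[dim]) / 2
    new_lo = list(existing.lo)
    new_lo[dim] = mid
    newcomer = Node(newcomer_name, new_lo, list(existing.hi))
    existing.hi = list(existing.hi)
    existing.hi[dim] = mid
    # Hand over the (key, value) pairs whose points now fall in the newcomer's half.
    for key in [k for k in existing.store if newcomer.owns(key_to_point(k, d))]:
        newcomer.store[key] = existing.store.pop(key)
    return newcomer


a = Node("A", [0.0, 0.0], [1.0, 1.0], {f"file{i}": i for i in range(100)})
b = join(a, "B")  # A keeps x in [0, 0.5); B takes x in [0.5, 1)
c = join(b, "C")  # B's zone is next split along y
print(a.hi, b.lo, b.hi, c.lo, c.hi)
print(len(a.store) + len(b.store) + len(c.store))  # prints 100
```

Only the split node's zone and stored pairs change; the other existing nodes affected are the neighbors that must update their routing tables.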

10 Maintenance of a CAN Overlay
- Graceful departure: the node explicitly hands over its zone and the associated (key, value) database to one of its neighbors.
- Abrupt failure: an immediate takeover algorithm ensures that one of the failed node's neighbors takes over its zone.
- Under normal conditions, each node sends periodic update messages to each of its neighbors, giving its own zone coordinates and a list of its neighbors and their zone coordinates.
- A prolonged absence of update messages from a neighbor signals its failure.
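
The failure-detection and handover rules can be sketched in a few functions. Following the CAN paper as we understand it, on abrupt failure each neighbor starts a takeover timer proportional to its own zone volume, and the first TAKEOVER message suppresses neighbors with larger zones, so the smallest-volume neighbor takes over; the sketch models only that outcome, not the message exchange. For graceful departure it prefers a neighbor whose zone merges with the departing zone into a single box, falling back to the smallest zone. All names, the dictionary layout, and the name-based tie-break are hypothetical.

```python
import math


def detect_failures(last_update: dict, now: float, timeout: float) -> list:
    """Neighbors whose last periodic update is older than `timeout` are presumed failed."""
    return sorted(name for name, t in last_update.items() if now - t > timeout)


def volume(zone) -> float:
    lo, hi = zone
    return math.prod(b - a for a, b in zip(lo, hi))


def _smallest(names, zones) -> str:
    # Smallest zone volume wins; ties are broken by name (a hypothetical rule).
    return min(names, key=lambda n: (volume(zones[n]), n))


def takeover_winner(neighbor_zones: dict) -> str:
    """Abrupt failure: the neighbor with the smallest zone takes over."""
    return _smallest(neighbor_zones, neighbor_zones)


def can_merge(z1, z2) -> bool:
    """Two boxes form a single box if they agree on every dimension but one, where they abut."""
    (lo1, hi1), (lo2, hi2) = z1, z2
    diff = [i for i in range(len(lo1)) if (lo1[i], hi1[i]) != (lo2[i], hi2[i])]
    return len(diff) == 1 and (hi1[diff[0]] == lo2[diff[0]] or hi2[diff[0]] == lo1[diff[0]])


def handover_target(departing_zone, neighbor_zones: dict) -> str:
    """Graceful departure: prefer a neighbor whose zone merges cleanly, else the smallest."""
    mergeable = [n for n, z in neighbor_zones.items() if can_merge(departing_zone, z)]
    return _smallest(mergeable or list(neighbor_zones), neighbor_zones)


zones = {
    "B": ((0.5, 0.0), (1.0, 0.5)),
    "C": ((0.0, 0.5), (0.5, 1.0)),
    "E": ((0.75, 0.5), (1.0, 1.0)),
}
d_zone = ((0.5, 0.5), (0.75, 1.0))  # zone of the node "D" that fails or leaves
print(detect_failures({"B": 100.0, "C": 97.0, "D": 80.0}, now=101.0, timeout=10.0))  # ['D']
print(takeover_winner(zones))          # E: smallest neighbor zone
print(handover_target(d_zone, zones))  # E: merges with D into [0.5, 1) x [0.5, 1)
```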

11 Design Improvements
- The basic CAN algorithm provides:
  - low per-node state: O(d) for a d-dimensional space;
  - short path lengths: $O(d\, n^{1/d})$ hops for d dimensions and n nodes.
- The problem: these are application-layer hops, not IP-layer hops, so the latency of each hop might be substantial.

12 Design Improvements (2)
- Improvement: multi-dimensional coordinate spaces
  - Increasing the number of dimensions of the CAN coordinate space reduces the routing path length, and hence path latency, for only a small increase in the size of the coordinate routing table.
  - Path length scales as $O(d\, n^{1/d})$.
  - Fault tolerance improves, because each node has more neighbors and therefore more alternative next hops.
- Improvement: multiple coordinate spaces (a.k.a. multiple realities)
  - Maintain multiple independent coordinate spaces (each coordinate space is a "reality"), with each node in the system assigned a different zone in each one.
  - Fault tolerance and data availability improve, because the contents of the hash table are replicated in every reality.
  - Per-node state grows with the number of realities, since each node keeps a zone and a set of neighbors in every reality.
- Which is better? For the same increase in per-node state, increasing the dimensions shortens paths more than adding realities; multiple realities mainly improve data availability and fault tolerance.
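
Plugging numbers into the routing-slide formula shows the trade-off behind the first improvement. The helper names below are hypothetical, and the formula assumes n equal-sized zones. For n = 10^6 the estimate falls from 500 hops at d = 2 to about 15 hops at d = 6, while the number of neighbors per node only grows from 4 to 12.

```python
def can_path_length(n: int, d: int) -> float:
    """Estimated average CAN routing path length in hops: (d / 4) * n ** (1 / d)."""
    return (d / 4) * n ** (1 / d)


def dimension_tradeoff(n: int, dims) -> list:
    """Rows of (d, neighbors per node, estimated average hops) for each d in dims."""
    return [(d, 2 * d, can_path_length(n, d)) for d in dims]


for d, neighbors, hops in dimension_tradeoff(1_000_000, range(2, 11)):
    print(f"d={d:2d}  neighbors={neighbors:2d}  avg hops ~ {hops:6.1f}")
```

The estimate keeps falling until d is close to ln n (about 14 for n = 10^6), beyond which the leading d/4 factor dominates; each added dimension also costs two more neighbors of per-node state.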

13 Design Improvements (3)
- Improvement: better CAN routing metrics
  - Each node measures the network-level round-trip time (RTT) to each of its neighbors.
  - Messages are then forwarded to the neighbor with the best ratio of progress towards the destination to RTT.
  - This favors lower-latency paths and avoids unnecessarily long hops.
- Improvement: caching and replication
  - A CAN node can maintain a cache of the data keys it recently accessed.
  - A node that is overloaded by requests for a particular key can replicate that key at each of its neighboring nodes.
  - Cached and replicated entries need an associated time-to-live (TTL) so that they eventually expire.
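
Both improvements can be sketched briefly. The routing rule trades progress against measured RTT: among neighbors that move the message closer to the destination, it picks the one with the largest ratio of progress to RTT. Representing each neighbor by a single point (for example, its zone's centre) and the name `next_hop_rtt` are assumptions. The `TTLCache` class is a hypothetical illustration of cached entries expiring after a time-to-live; its clock is injectable so expiry can be demonstrated without waiting.

```python
import math
import time


def torus_distance(p, q) -> float:
    """Euclidean distance on the unit torus (coordinates wrap around at 1.0)."""
    return math.sqrt(sum(min(abs(a - b), 1 - abs(a - b)) ** 2 for a, b in zip(p, q)))


def next_hop_rtt(current, dest, neighbors: dict):
    """neighbors: name -> (representative point, RTT in ms). Returns a name or None."""
    here = torus_distance(current, dest)
    best, best_ratio = None, 0.0
    for name, (point, rtt) in neighbors.items():
        progress = here - torus_distance(point, dest)
        # Only neighbors that make positive progress are candidates.
        if progress > 0 and progress / rtt > best_ratio:
            best, best_ratio = name, progress / rtt
    return best


class TTLCache:
    """Cache of recently accessed keys; each entry expires `ttl` seconds after insertion."""

    def __init__(self, ttl: float, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._entries = {}

    def put(self, key, value):
        self._entries[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires = entry
        if self.clock() >= expires:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        return value


neighbors = {
    "near": ((0.4, 0.25), 5.0),   # little progress, but only 5 ms away
    "far": ((0.6, 0.25), 40.0),   # more progress, but 40 ms away
    "back": ((0.25, 0.1), 1.0),   # moves away from the destination
}
print(next_hop_rtt((0.25, 0.25), (0.75, 0.25), neighbors))  # near

fake_now = [0.0]
cache = TTLCache(ttl=10.0, clock=lambda: fake_now[0])
cache.put("song.mp3", "10.0.0.7")
fake_now[0] = 5.0
print(cache.get("song.mp3"))  # 10.0.0.7
fake_now[0] = 10.0
print(cache.get("song.mp3"))  # None (expired)
```

With plain greedy routing, the 40 ms neighbor would be chosen because it makes the most progress; weighting by RTT prefers the nearby, low-latency hop.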

14 Related Systems
- Domain Name System: CAN is more general than DNS, which closely ties the naming scheme to the manner in which a name is resolved to an IP address.
- Peer-to-peer: a simple example is a CAN key playing a role analogous to a URL; using a CAN would improve robustness.
  - The key difference is that content within a CAN can always be located by any other node: each piece of content has a clear "home" (a point) in the CAN, and every node knows, by hashing the key, where that home is and how to reach it.

15 Discussion
- Security: better or worse with CAN?
- Any other design improvements?
- Is the communication overhead significant?

16 References
- S. Ratnasamy et al., "A Scalable Content-Addressable Network," University of California, Berkeley, ACM SIGCOMM 2001. http://www.sigcomm.org/sigcomm2001/p13-ratnasamy.pdf

