Sylvia Ratnasamy (UC Berkeley Dissertation 2002), Paul Francis, Mark Handley, Richard Karp, Scott Shenker. A Scalable, Content-Addressable Network. Slides by the authors above with additions/modifications by Michael Isaacs (UCSB) and Tony Hannan (Georgia Tech).

Outline Introduction Design Design Improvements Evaluation Related Work Conclusion

Internet-scale hash tables
Hash tables – essential building block in software systems
Internet-scale distributed hash tables – equally valuable to large-scale distributed systems
– peer-to-peer systems: Napster, Gnutella, Groove, FreeNet, MojoNation…
– large-scale storage management systems: Publius, OceanStore, PAST, Farsite, CFS…
– mirroring on the Web

Content-Addressable Network (CAN)
CAN: Internet-scale hash table
Interface
– insert(key, value)
– value = retrieve(key)
Properties
– scalable
– operationally simple
– good performance
Related systems: Chord, Plaxton/Tapestry…
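As a rough illustration of this interface, here is a minimal node-local sketch (class and method names are assumed for illustration; routing to the node that owns hash(key) is elided):

class CANNode:
    """Sketch of the hash-table interface a CAN exposes; names are illustrative."""

    def __init__(self):
        self._store = {}  # (key -> value) pairs whose points fall in this node's zone

    def insert(self, key, value):
        # In a full CAN, (key, value) is first routed to the node owning hash(key).
        self._store[key] = value

    def retrieve(self, key):
        # Returns None if the key is not stored here.
        return self._store.get(key)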

Problem Scope
– provide the hash table interface
– scalability
– robustness
– performance

Not in scope
security
anonymity
higher-level primitives
– keyword searching
– mutable content

Outline Introduction Design Design Improvements Evaluation Related Work Conclusion

CAN: basic idea (figure: a distributed key–value table)

CAN: basic idea – insert(K1, V1)

CAN: basic idea – (K1, V1) is now stored in the table

CAN: basic idea – retrieve(K1)

CAN: solution
Virtual Cartesian coordinate space
The entire space is partitioned amongst all the nodes
– every node “owns” a zone in the overall space
Abstraction
– can store data at “points” in the space
– can route from one “point” to another
Point = the node that owns the enclosing zone

CAN: simple example (figure: the coordinate space as nodes 1, 2, and 3 join and split it into zones; node I is highlighted)

CAN: simple example – node I::insert(K,V)
(1) a = h_x(K), b = h_y(K)
(2) route (K,V) toward point (a,b)
(3) the node owning the zone containing (a,b) stores (K,V)

CAN: simple example – node J::retrieve(K)
(1) a = h_x(K), b = h_y(K)
(2) route “retrieve(K)” toward (a,b); the node storing (K,V) returns it
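A small sketch of how a key could be mapped to a point using two hash functions, one per coordinate (the concrete hash function and the unit-square scaling are assumptions for illustration, not fixed by the slides):

import hashlib

def point_for_key(key: str) -> tuple[float, float]:
    """Map a key to a point (a, b) in the unit square using two hash functions."""
    hx = int(hashlib.sha1(("x:" + key).encode()).hexdigest(), 16)  # h_x(K)
    hy = int(hashlib.sha1(("y:" + key).encode()).hexdigest(), 16)  # h_y(K)
    scale = 2 ** 160  # SHA-1 produces 160-bit values
    return (hx / scale, hy / scale)

a, b = point_for_key("K1")  # the (K, V) pair is then routed toward (a, b)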

CAN: data stored in the CAN is addressed by name (i.e., key), not location (i.e., IP address)

CAN: routing table

CAN: routing (figure: a message routed from the node at (x,y) toward the point (a,b) through neighboring zones)

CAN: routing – a node only maintains state for its immediate neighboring nodes
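A sketch of the greedy forwarding this implies: each hop hands the message to the neighbor whose zone lies closest to the destination point (the neighbor-table representation is assumed for illustration):

import math

def next_hop(neighbor_centers: dict[str, tuple[float, float]],
             dest: tuple[float, float]) -> str:
    """Pick the neighbor whose zone center is closest to the destination point."""
    return min(neighbor_centers, key=lambda n: math.dist(neighbor_centers[n], dest))

# Example: the current node knows only its immediate neighbors' zones.
neighbors = {"A": (0.25, 0.75), "B": (0.75, 0.25), "C": (0.25, 0.25)}
print(next_hop(neighbors, dest=(0.9, 0.1)))  # -> "B"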

CAN: node insertion
1) the new node discovers (via a bootstrap node) some node “I” already in the CAN

CAN: node insertion
2) the new node picks a random point (p,q) in the space

CAN: node insertion
3) I routes to (p,q) and discovers node J, the owner of that zone

CAN: node insertion
4) J’s zone is split in half; the new node owns one half

CAN: node insertion – inserting a new node affects only a single other node and its immediate neighbors
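A sketch of step 4, the zone split. The corner-pair zone representation and the rule of halving the longer side are assumptions made purely for this illustration; the slides do not fix which dimension is split:

def split_zone(zone):
    """Halve a 2-d zone ((x_lo, y_lo), (x_hi, y_hi)); the existing node keeps
    one half and the joining node takes over the other half (and the keys in it)."""
    (x_lo, y_lo), (x_hi, y_hi) = zone
    if (x_hi - x_lo) >= (y_hi - y_lo):           # split the longer side: x
        x_mid = (x_lo + x_hi) / 2
        return ((x_lo, y_lo), (x_mid, y_hi)), ((x_mid, y_lo), (x_hi, y_hi))
    y_mid = (y_lo + y_hi) / 2                    # otherwise split along y
    return ((x_lo, y_lo), (x_hi, y_mid)), ((x_lo, y_mid), (x_hi, y_hi))

old_half, new_half = split_zone(((0.0, 0.0), (0.5, 1.0)))  # J keeps old_half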

CAN: node failures
Need to repair the space
– recover database state: use replication, rebuild the database from replicas
– repair routing: takeover algorithm

CAN: takeover algorithm
Nodes periodically send update messages to their neighbors
An absent update message indicates a failure
– each node sends a recover message to the takeover node
– the takeover node takes over the failed node’s zone

CAN: takeover node
The takeover node is the node whose VID is numerically closest to the failed node’s VID.
The VID is a binary string indicating the path from the root to the zone in the binary partition tree.

CAN: VID predecessor / successor To handle multiple neighbors failing simultaneously, each node maintains its predecessor and successor in the VID ordering (idea borrowed from Chord). This guarantees that the takeover node can always be found.
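A sketch of picking the takeover node by VID. Comparing the binary path strings as binary fractions is an assumed convention here, so that VIDs of different lengths can be compared numerically:

def takeover_node(failed_vid: str, candidate_vids: list[str]) -> str:
    """Choose the takeover node as the candidate whose VID is numerically
    closest to the failed node's VID (VIDs are paths in the partition tree)."""
    def as_fraction(vid: str) -> float:
        return sum(int(bit) / 2 ** (i + 1) for i, bit in enumerate(vid))
    target = as_fraction(failed_vid)
    return min(candidate_vids, key=lambda v: abs(as_fraction(v) - target))

print(takeover_node("0110", ["010", "0111", "10"]))  # -> "0111"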

CAN: node failures – only the failed node’s immediate neighbors are required for recovery

Basic Design recap Basic CAN – completely distributed – self-organizing – nodes only maintain state for their immediate neighbors

Outline Introduction Design Design Improvements Evaluation Related Work Conclusion

Design Improvement Objectives
Reduce latency of CAN routing
– reduce CAN path length
– reduce per-hop latency (the IP path length/latency of each CAN hop)
Improve CAN robustness
– data availability
– routing

Design Improvement Tradeoffs
Routing performance and system robustness vs. increased per-node state and system complexity

Design Improvement Techniques
1. Increase number of dimensions in coordinate space
2. Multiple coordinate spaces (realities)
3. Better CAN routing metrics (measure & consider time)
4. Overloading coordinate zones (peer nodes)
5. Multiple hash functions
6. Geographic-sensitive construction of CAN overlay network (distributed binning)
7. More uniform partitioning (when new nodes enter)
8. Caching & replication techniques for “hot spots”
9. Background load balancing algorithm

1. Increase dimensions

2. Multiple realities
Multiple (r) coordinate spaces
Each node maintains a different zone in each reality
Contents of the hash table are replicated in each reality
Routing chooses the neighbor closest to the target in any reality (can make large jumps towards the target)
Advantages:
– greater data availability
– fewer hops

Multiple realities

Multiple dimensions vs realities

3. Time-sensitive routing
Each node measures the round-trip time (RTT) to each of its neighbors
A routing message is forwarded to the neighbor with the maximum ratio of progress to RTT
Advantage:
– reduces per-hop latency (25–40% depending on the number of dimensions)
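A sketch of that forwarding rule; the neighbor table holding a zone center and a measured RTT per neighbor is an assumed representation:

import math

def rtt_weighted_next_hop(neighbors, current, dest):
    """Among neighbors that make progress toward dest, pick the one with the
    largest progress/RTT ratio. neighbors: id -> (zone_center, rtt_ms)."""
    here = math.dist(current, dest)
    best, best_ratio = None, 0.0
    for node, (center, rtt_ms) in neighbors.items():
        progress = here - math.dist(center, dest)   # reduction in distance to dest
        if progress > 0 and progress / rtt_ms > best_ratio:
            best, best_ratio = node, progress / rtt_ms
    return best  # None if no neighbor makes progress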

4. Zone overloading
A node maintains a list of “peers” in the same zone
The zone is not split until MAXPEERS nodes share it
For each neighboring zone, only the lowest-RTT peer is kept as a neighbor
Data is replicated or split across the peers of a zone
Advantages
– reduced per-hop latency
– reduced path length (sharing zones has the same effect as reducing the total number of nodes)
– improved fault tolerance (a zone is vacant only when all of its peers fail)

Zone overloading (figure: averages over 2^8 to 2^18 nodes)

5. Multiple hash functions
A (key, value) pair is mapped to k different points (and nodes)
Queries can be executed in parallel along the k paths, or routed to the closest of the k points in the space
Advantages:
– improved data availability
– reduced path length
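A sketch of the “choose the closest of the k points” option; the point representation is assumed for illustration:

import math

def closest_replica(my_point, replica_points):
    """With k hash functions a key lives at k points; route to the one
    closest to this node's own coordinates (or query all k in parallel)."""
    return min(replica_points, key=lambda p: math.dist(my_point, p))

print(closest_replica((0.1, 0.2), [(0.9, 0.9), (0.2, 0.3), (0.5, 0.1)]))  # -> (0.2, 0.3)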

Multiple hash functions

6. Geographically distributed binning
Distributed binning of CAN nodes based on their relative distances from m IP landmarks (such as root DNS servers)
m! partitions of the coordinate space
A new node joins the partition associated with its landmark ordering
Topologically close nodes will be assigned to the same partition
Nodes in the same partition divide the space up randomly
Disadvantage: the coordinate space is no longer uniformly populated
– background load balancing could alleviate this problem

Distributed binning
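A sketch of how a node could compute its bin, i.e. its landmark ordering; the landmark names and RTT values are illustrative:

def landmark_bin(rtts_ms: dict[str, float]) -> tuple[str, ...]:
    """A node's bin is the ordering of the m landmarks from nearest to farthest,
    one of m! possible orderings; nearby nodes tend to land in the same bin."""
    return tuple(sorted(rtts_ms, key=rtts_ms.get))

print(landmark_bin({"L1": 12.0, "L2": 80.0, "L3": 45.0}))  # -> ('L1', 'L3', 'L2')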

7. More uniform partitioning
When adding a node, the existing occupant splits the zone with the largest volume among its own and its neighbors’ zones
Advantage
– 85% of nodes have the average load, as opposed to 40% without this technique, and the largest zone volume dropped from 8x to 2x the average

8. Caching & replication
A node can cache values of keys it has recently requested.
A node can push out “hot” keys to its neighbors.

9. Load balancing Merge and split zones to balance loads

Outline Introduction Design Design Improvements Evaluation Related Work Conclusion

Evaluation Metrics 1. Path length 2. Neighbor state 3. Latency 4. Volume 5. Routing fault tolerance 6. Data availability

CAN: scalability
For a uniformly partitioned space with n nodes and d dimensions
– per node, the number of neighbors is 2d
– the average routing path is O(d · n^(1/d)) hops
Simulations show that the above results hold in practice (Transit-Stub topology, GT-ITM generator)
Can scale the network without increasing per-node state
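A quick back-of-the-envelope check of these formulas (ignoring constant factors) for illustrative values of n and d:

n = 2 ** 20  # roughly a million nodes
for d in (2, 4, 10):
    print(f"d={d}: neighbors={2 * d}, ~path length={d * n ** (1 / d):.0f} hops")
# d=2:  4 neighbors, ~2048 hops
# d=4:  8 neighbors, ~128 hops
# d=10: 20 neighbors, ~40 hops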

CAN: latency
latency stretch = (CAN routing delay) / (IP routing delay)
To reduce latency, apply the design improvements:
– increase dimensions (d)
– multiple nodes per zone (peer nodes) (p)
– multiple “realities”, i.e. coordinate spaces (r)
– multiple hash functions (k)
– RTT-weighted routing
– distributed binning
– uniform partitioning
– caching & replication (for “hot spots”)

Overall improvement

CAN: Robustness
Completely distributed – no single point of failure
Database recovery is not explored here
Resilience of routing – can route around trouble

Routing resilience (figure: a message routed from source to destination; when nodes along the way fail, the message is routed around them)

Routing resilience – Node X::route(D)
If X cannot make progress toward D
– check if any neighbor of X can make progress
– if yes, forward the message to one such neighbor
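A sketch of this rule; the zones/neighbors/alive structures are assumed representations for illustration:

import math

def route_step(x, dest, zones, neighbors, alive):
    """Next hop chosen by node x for destination point dest.
    zones: id -> zone center, neighbors: id -> set of neighbor ids,
    alive: set of live node ids."""
    def d(n):
        return math.dist(zones[n], dest)
    live = [n for n in neighbors[x] if n in alive]
    closer = [n for n in live if d(n) < d(x)]
    if closer:                        # normal greedy forwarding
        return min(closer, key=d)
    for n in live:                    # X cannot make progress: try a neighbor that can
        if any(m in alive and d(m) < d(x) for m in neighbors[n]):
            return n
    return None                       # no way around the failure from here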

Outline Introduction Design Design Improvements Evaluation Related Work Conclusion

Related Algorithms
Distance Vector (DV) and Link State (LS) in IP routing
– require widespread dissemination of topology
– not well suited to a dynamic topology like file sharing
Hierarchical routing algorithms
– too centralized, single points of stress
Other DHTs
– Plaxton, Tapestry, Pastry, Chord

Related Systems DNS OceanStore (uses Tapestry) Publius: Web-publishing system with high anonymity P2P File sharing systems – Napster, Gnutella, KaZaA, Freenet

Outline Introduction Design Design Improvements Evaluation Related Work Conclusion

Conclusion: Purpose
CAN – an Internet-scale hash table
– a potential building block in Internet applications

Conclusion: Scalability
O(d) per-node state
O(d · n^(1/d)) path length
For d = (log n)/2, path length and per-node state are O(log n), like other DHTs
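A short check of that claim using the slide's own formulas: with d = (log₂ n)/2, per-node state is O(d) = O(log n), and path length is O(d · n^(1/d)) = O((log₂ n)/2 · n^(2/log₂ n)) = O((log₂ n)/2 · 4) = O(log n), since n^(2/log₂ n) = 2^2 = 4.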

Conclusion: Latency With the main design improvements, latency is less than twice the IP path latency

Conclusion: Robustness Decentralized, can route around trouble With certain design improvements (replication), high data availability

Weaknesses / Future Work
Security
– denial-of-service attacks are hard to combat because a malicious node can act as a malicious server as well as a malicious client
Mutable content
Search