1 CS 194: Distributed Systems Distributed Hash Tables  Scott Shenker and Ion Stoica  Computer Science Division, Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA

2 How Did it Start?  A killer application: Napster -Free music over the Internet  Key idea: share the content, storage, and bandwidth of individual (home) users

3 Model  Each user stores a subset of files  Each user has access to (i.e., can download) files from all users in the system

4 Main Challenge  Find where a particular file is stored  (figure: six machines storing files A–F; a query asks "where is E?")

5 Other Challenges  Scale: up to hundreds of thousands or millions of machines  Dynamicity: machines can come and go at any time

6 Napster  Assume a centralized index system that maps files (songs) to machines that are alive  How to find a file (song): -Query the index system → it returns a machine that stores the required file (ideally the closest/least-loaded machine) -ftp the file from that machine  Advantages: -Simplicity; easy to implement sophisticated search engines on top of the index system  Disadvantages: -Robustness, scalability (?)
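
A minimal sketch of this centralized-index idea. The names here (CentralIndex, register) are illustrative, not Napster's actual protocol:

```python
# Sketch of a Napster-style centralized index (illustrative, not the
# real Napster protocol).

class CentralIndex:
    def __init__(self):
        self.index = {}  # file name -> list of machines that store it

    def register(self, machine, files):
        # A machine announces which files it stores.
        for f in files:
            self.index.setdefault(f, []).append(machine)

    def query(self, file_name):
        # Return one machine that stores the file (here: the first one;
        # a real system would prefer the closest/least-loaded machine).
        machines = self.index.get(file_name)
        return machines[0] if machines else None

idx = CentralIndex()
idx.register("m5", ["E"])
print(idx.query("E"))  # -> "m5"; the client then fetches E directly from m5
```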

7 Napster: Example  (figure: machines m1–m6 store files A–F; the central index maps m1→A, m2→B, …, m6→F; a client asks the index "E?", is told m5, and downloads E directly from m5)

8 Gnutella  Distribute the file location  Idea: flood the request  How to find a file: -Send the request to all neighbors -Neighbors recursively multicast the request -Eventually a machine that has the file receives the request, and it sends back the answer  Advantages: -Totally decentralized, highly robust  Disadvantages: -Not scalable; the entire network can be swamped with requests (to alleviate this problem, each request has a TTL)
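
A sketch of the flooding search with a TTL. The topology and machine names below are made up for illustration:

```python
# Sketch of Gnutella-style flooding with a TTL (assumed toy topology).

neighbors = {"m1": ["m2", "m3"], "m2": [], "m3": ["m4", "m5"],
             "m4": [], "m5": [], "m6": []}
stores = {"m1": "A", "m2": "B", "m3": "C", "m4": "D", "m5": "E", "m6": "F"}

def flood(node, file_name, ttl, seen=None):
    # Each node answers if it has the file; otherwise it re-floods the
    # request to its neighbors until the TTL runs out.
    seen = seen if seen is not None else set()
    if node in seen or ttl < 0:
        return None
    seen.add(node)
    if stores[node] == file_name:
        return node
    for n in neighbors[node]:
        hit = flood(n, file_name, ttl - 1, seen)
        if hit:
            return hit
    return None

print(flood("m1", "E", ttl=3))  # -> "m5"
```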

9 Gnutella: Example  Assume: m1’s neighbors are m2 and m3; m3’s neighbors are m4 and m5; …  (figure: the request "E?" floods from m1 through its neighbors until it reaches m5, which returns E)

10 Distributed Hash Tables (DHTs)  Abstraction: a distributed hash-table data structure -insert(id, item); -item = query(id); (or lookup(id);) -Note: item can be anything: a data object, document, file, pointer to a file…  Proposals -CAN, Chord, Kademlia, Pastry, Tapestry, etc
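
As a rough sketch, the abstraction looks like an ordinary hash table. Below it is stubbed out on a single machine, whereas a real DHT partitions the (id, item) pairs across many nodes:

```python
# The DHT abstraction from the slide, stubbed out on one machine.

class DHT:
    def __init__(self):
        self.table = {}

    def insert(self, id, item):
        # item can be anything: a data object, document, file, pointer...
        self.table[id] = item

    def query(self, id):  # a.k.a. lookup(id)
        return self.table.get(id)

dht = DHT()
dht.insert(42, "pointer-to-file")
print(dht.query(42))  # -> "pointer-to-file"
```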

11 DHT Design Goals  Make sure that an item (file) that exists in the system is always found  Scales to hundreds of thousands of nodes  Handles rapid arrival and failure of nodes

12 Content Addressable Network (CAN)  Associate to each node and item a unique id in a d-dimensional Cartesian space on a d-torus  Properties -Routing table size O(d) -Guarantees that a file is found in at most d·n^(1/d) steps, where n is the total number of nodes
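
One plausible way to map a node or item id to a point in the space is to hash the key into d coordinates. The hash choice and side length below are assumptions for illustration, not CAN's exact scheme:

```python
import hashlib

# Sketch: derive d coordinates in [0, side) from a hash of the key.
def to_point(key, d=2, side=8):
    digest = hashlib.sha1(key.encode()).digest()
    return tuple(digest[i] % side for i in range(d))

# A deterministic point; the node whose zone contains it stores the item.
print(to_point("f1"))
```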

13 CAN Example: Two Dimensional Space  Space is divided between nodes  Together, the nodes cover the entire space  Each node covers either a square or a rectangular area of ratio 1:2 or 2:1  Example: -Node n1:(1, 2) is the first node that joins → it covers the entire space

14 CAN Example: Two Dimensional Space  Node n2:(4, 2) joins → the space is divided between n1 and n2

15 CAN Example: Two Dimensional Space  Node n3:(3, 5) joins → the space is further divided between n1 and n3

16 CAN Example: Two Dimensional Space  Nodes n4:(5, 5) and n5:(6, 6) join

17 CAN Example: Two Dimensional Space  Nodes: n1:(1, 2); n2:(4, 2); n3:(3, 5); n4:(5, 5); n5:(6, 6)  Items: f1:(2, 3); f2:(5, 1); f3:(2, 1); f4:(7, 5)

18 CAN Example: Two Dimensional Space  Each item is stored by the node that owns its mapping in the space

19 CAN: Query Example  Each node knows its neighbors in the d-space  Forward the query to the neighbor that is closest to the query id  Example: assume n1 queries f4  Can route around some failures
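
A sketch of this greedy forwarding rule, using the example coordinates above. The neighbor links are assumed for illustration, and torus wraparound is ignored for brevity:

```python
# Sketch of CAN's greedy routing: forward to the neighbor closest to the
# query point (coordinates from the slides; links are assumed).

def dist(p, q):
    # Euclidean distance in the 2-d space (no torus wraparound here).
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

nodes = {"n1": (1, 2), "n2": (4, 2), "n3": (3, 5), "n4": (5, 5), "n5": (6, 6)}
links = {"n1": ["n2", "n3"], "n2": ["n1", "n4"], "n3": ["n1", "n4"],
         "n4": ["n2", "n3", "n5"], "n5": ["n4"]}

def route(start, target):
    path, cur = [start], start
    while dist(nodes[cur], target) > 0:
        nxt = min(links[cur], key=lambda n: dist(nodes[n], target))
        if dist(nodes[nxt], target) >= dist(nodes[cur], target):
            break  # no neighbor is closer: cur's zone holds the target
        path.append(nxt)
        cur = nxt
    return path

# Query for f4:(7,5) starting at n1 ends at the zone owner.
print(route("n1", (7, 5)))  # -> ['n1', 'n3', 'n4', 'n5']
```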

20 CAN: Node Joining  1) The new node discovers some node “I” already in the CAN

21 CAN: Node Joining  2) The new node picks a random point (x,y) in the space

22 CAN: Node Joining  3) I routes to (x,y) and discovers node J, whose zone contains (x,y)

23 CAN: Node Joining  4) Split J’s zone in half… the new node owns one half
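
A sketch of the zone split in step 4, assuming zones are axis-aligned boxes split along their longer side (which preserves the 1:1, 1:2, or 2:1 aspect ratios mentioned earlier):

```python
# Sketch: split a 2-d zone in half; the new node takes the half that
# contains its chosen point (x,y), J keeps the other half.

def split_zone(zone, point):
    # zone: ((x_lo, x_hi), (y_lo, y_hi)); split along the longer side.
    (x_lo, x_hi), (y_lo, y_hi) = zone
    if (x_hi - x_lo) >= (y_hi - y_lo):
        mid = (x_lo + x_hi) / 2
        left, right = ((x_lo, mid), (y_lo, y_hi)), ((mid, x_hi), (y_lo, y_hi))
        return (left, right) if point[0] < mid else (right, left)
    mid = (y_lo + y_hi) / 2
    bottom, top = ((x_lo, x_hi), (y_lo, mid)), ((x_lo, x_hi), (mid, y_hi))
    return (bottom, top) if point[1] < mid else (top, bottom)

# The new node picked (x,y) = (6, 6) inside J's zone ((4,8), (4,8)).
new_half, j_half = split_zone(((4, 8), (4, 8)), (6, 6))
print(new_half, j_half)  # new node owns one half, J keeps the other
```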

24 Node Departure  A node explicitly hands over its zone and the associated (key, value) database to one of its neighbors  In case of network failure, this is handled by a take-over algorithm  Problem: the take-over mechanism does not provide regeneration of data  Solution: every node keeps a backup of its neighbors’ data

25 Chord  Associate to each node and item a unique id in a one-dimensional space 0..2^m-1  Goals -Scales to hundreds of thousands of nodes -Handles rapid arrival and failure of nodes  Properties -Routing table size O(log N), where N is the total number of nodes -Guarantees that a file is found in O(log N) steps

26 Identifier to Node Mapping Example  Node 8 maps [5,8]  Node 15 maps [9,15]  Node 20 maps [16, 20]  …  Node 4 maps [59, 4]  Each node maintains a pointer to its successor

27 Lookup  Each node maintains its successor  Route packet (ID, data) to the node responsible for ID using successor pointers  (figure: lookup(37) is routed along successor pointers to node 44)
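
A sketch of this successor-pointer lookup on a small ring. The node ids are taken from the examples on these slides, and the responsibility rule is the standard "ids in (predecessor, node]" convention:

```python
# Sketch of Chord's basic lookup: walk successor pointers until reaching
# the node responsible for the key.

ring = [4, 8, 15, 20, 35, 44, 58]  # sorted node ids (assumed example ring)

def successor(node):
    i = ring.index(node)
    return ring[(i + 1) % len(ring)]

def responsible(node, key):
    # node is responsible for keys in (predecessor, node] on the ring.
    pred = ring[ring.index(node) - 1]
    if pred < node:
        return pred < key <= node
    return key > pred or key <= node  # interval wraps past 0

def lookup(start, key):
    cur = start
    while not responsible(cur, key):
        cur = successor(cur)  # O(N) hops; finger tables cut this to O(log N)
    return cur

print(lookup(4, 37))  # -> 44, matching "lookup(37) -> node 44" on the slide
```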

28 Joining Operation  Each node A periodically sends a stabilize() message to its successor B  Upon receiving a stabilize() message, node B -returns its predecessor B’=pred(B) to A by sending a notify(B’) message  Upon receiving notify(B’) from B, -if B’ is between A and B, A updates its successor to B’ -otherwise, A does nothing

29 Joining Operation  Node with id=50 joins the ring  Node 50 needs to know at least one node already in the system -Assume the known node is 15  (figure: before the join — node 58: succ=4, pred=44; node 50: succ=nil, pred=nil; node 44: succ=58, pred=35)

30 Joining Operation  Node 50 asks node 15 to forward the join message  When join(50) reaches the destination (i.e., node 58), node 58: 1) updates its predecessor to 50, and 2) returns a notify message to node 50  Node 50 updates its successor to 58  (figure: node 58: succ=4, pred=50; node 50: succ=58, pred=nil; node 44: succ=58, pred=35)

31 Joining Operation (cont’d)  Node 44 sends a stabilize message to its successor, node 58  Node 58 replies with a notify(predecessor=50) message  Node 44 updates its successor to 50  (figure: node 44: succ=58 → succ=50)

32 Joining Operation (cont’d)  Node 44 sends a stabilize message to its new successor, node 50  Node 50 sets its predecessor to node 44

33 Joining Operation (cont’d)  This completes the joining operation!  (figure: final state — node 44: succ=50; node 50: pred=44, succ=58; node 58: pred=50)
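
A sketch of the stabilize()/notify() exchange replayed on the example above, using local method calls in place of the messages a real Chord node would send:

```python
# Sketch of the periodic stabilize/notify protocol from slide 28.

class Node:
    def __init__(self, id):
        self.id, self.succ, self.pred = id, None, None

def between(x, a, b):
    # True if x lies on the ring segment (a, b), handling wraparound.
    return a < x < b if a < b else x > a or x < b

def stabilize(a):
    # a asks its successor b for b's predecessor, adopting it if closer.
    b = a.succ
    if b.pred is not None and between(b.pred.id, a.id, b.id):
        a.succ = b.pred
    notify(a.succ, a)

def notify(b, a):
    # b learns about a and adopts a as predecessor if a is closer.
    if b.pred is None or between(a.id, b.pred.id, b.id):
        b.pred = a

# Replaying the example: node 50 joins between 44 and 58.
n44, n50, n58 = Node(44), Node(50), Node(58)
n44.succ, n58.pred = n58, n44
n50.succ = n58          # join(50) found its successor, node 58
stabilize(n50)          # node 58 adopts 50 as its predecessor
stabilize(n44)          # node 44 sets succ=50; node 50 sets pred=44
print(n44.succ.id, n50.pred.id, n58.pred.id)  # -> 50 44 50
```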

34 Achieving Efficiency: Finger Tables  Say m=7  The i-th entry at the peer with id n is the first peer with id >= (n + 2^i) mod 2^m  (figure: finger table ft[i] at node 80; e.g., entry i=6 points to the first peer with id >= (80 + 2^6) mod 2^7 = 16)
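
A sketch of finger-table construction from this rule; the ring of node ids below is assumed for illustration:

```python
# Sketch: ft[i] is the first peer with id >= (n + 2^i) mod 2^m.

m = 7
ring = sorted([16, 32, 45, 80, 96, 112])  # assumed node ids in [0, 2^m)

def first_peer_at_or_after(id):
    for node in ring:
        if node >= id:
            return node
    return ring[0]  # wrap around the ring

def finger_table(n):
    return [first_peer_at_or_after((n + 2**i) % 2**m) for i in range(m)]

print(finger_table(80))  # -> [96, 96, 96, 96, 96, 112, 16]
```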

35 Achieving Robustness  To improve robustness, each node maintains its k (k > 1) immediate successors instead of only one successor  In the notify() message, node A can send its k-1 successors to its predecessor B  Upon receiving the notify() message, B can update its successor list by concatenating A itself with the successor list received from A
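
A sketch of that successor-list update rule, with k=3 assumed:

```python
# Sketch: B's new successor list is A followed by A's k-1 successors.

def update_successor_list(a_id, a_succs, k=3):
    return ([a_id] + a_succs)[:k]

# Node 44 hears from its successor 50, whose successors are [58, 4].
print(update_successor_list(50, [58, 4]))  # -> [50, 58, 4]
```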

36 CAN/Chord Optimizations  Reduce latency -Choose the finger that reduces the expected time to reach the destination -Choose the closest node from the range [N+2^(i-1), N+2^i) as the successor entry  Accommodate heterogeneous systems -Multiple virtual nodes per physical node

37 Conclusions  Distributed Hash Tables are a key component of scalable and robust overlay networks  CAN: O(d) state, O(d·n^(1/d)) distance  Chord: O(log n) state, O(log n) distance  Both can achieve stretch < 2  Simplicity is key  Services built on top of distributed hash tables -persistent storage (OpenDHT, OceanStore) -p2p file storage, i3 (Chord) -multicast (CAN, Tapestry)