Peer to Peer and Distributed Hash Tables


Peer to Peer and Distributed Hash Tables CS 271

Distributed Hash Tables
Challenge: design and implement a robust and scalable distributed system composed of inexpensive, individually unreliable computers in unrelated administrative domains. (Partial thanks to Idit Keidar.)

Searching for Distributed Data
Goal: make billions of objects (e.g., music files) available to millions of concurrent users.
We need a distributed data structure to keep track of objects on different sites: a map from objects to locations.
Basic operations: Insert(key), Lookup(key).

Searching
[Figure: thousands of nodes (N1..N6) connected over the Internet; a publisher stores Key="title", Value=MP3 data… at some node, and a client issues Lookup("title"). The set of nodes may change.]

A Simple Solution: Napster
First there was Napster:
- Centralized server/database for lookup; only the file sharing is peer-to-peer, the lookup is not.
- Launched in 1999, peaked at 1.5 million simultaneous users, and was shut down in July 2001.
- Pros: low overhead, no duplicates, the server knows all nodes that have a certain file, 100% success rate.
- Cons: single point of failure; load on the central DB; requires heavy infrastructure; if the central DB is partitioned, the success rate is no longer 100%.

Napster: Publish
[Figure: a peer at 123.2.21.23 announces "I have X, Y, and Z!" to the central server, which records insert(X, 123.2.21.23), ...]

Napster: Search
[Figure: a client asks the central server "Where is file A?" (Query); the server replies search(A) --> 123.2.0.18 (Reply), and the client fetches the file directly from 123.2.0.18.]

Overlay Networks
A virtual structure imposed over the physical network (e.g., the Internet): a graph with hosts as nodes and some chosen edges. A node does not need to know all other nodes.
[Figure: hash functions map both keys and node IDs into the overlay network's ID space.]

Unstructured Approach: Gnutella
Build a decentralized, unstructured overlay: each node has several neighbors and holds several keys in its local database.
When asked to find a key X: check the local database; if X is known, return it; if not, ask your neighbors. This is flooding, so use a limiting threshold for propagation; in practice, limit the TTL of the flood (see the sketch below).
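
A minimal sketch of this flooded lookup in Python, assuming an in-memory overlay of Node objects (the class, its fields, and the TTL default are illustrative names for this sketch, not part of Gnutella itself):

```python
# Sketch of Gnutella-style lookup by flooding, bounded by a TTL.
# Node, neighbors, and local_db are hypothetical names for this sketch.

class Node:
    def __init__(self, name):
        self.name = name
        self.neighbors = []   # other Node objects in the unstructured overlay
        self.local_db = {}    # keys (files) this node holds locally

    def lookup(self, key, ttl=4, seen=None):
        """Return a node holding `key`, or None if the flood finds nothing."""
        seen = set() if seen is None else seen
        if self.name in seen:              # don't revisit nodes
            return None
        seen.add(self.name)
        if key in self.local_db:           # check the local database first
            return self
        if ttl == 0:                       # flood horizon reached
            return None
        for peer in self.neighbors:        # otherwise ask the neighbors
            hit = peer.lookup(key, ttl - 1, seen)
            if hit is not None:
                return hit
        return None
```

Real Gnutella queries fan out in parallel and replies travel back along the query path; the sequential recursion above only illustrates the TTL-bounded flooding idea.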

Gnutella: Search
[Figure: a query "Where is file A?" floods from neighbor to neighbor until a node replies "I have file A."]

Structured vs. Unstructured
The examples described so far are unstructured:
- There is no systematic rule for how edges are chosen; each node "knows some" other nodes.
- Any node can store any data, so the searched-for data might reside at any node.
In a structured overlay:
- The edges are chosen according to some rule.
- Data is stored at a pre-defined place.
- Tables define the next hop for a lookup.

Hashing
A data structure supporting the operations:
- void insert( key, item )
- item search( key )
The implementation uses a hash function to map keys to array cells. Expected search time is O(1), provided that there are few collisions.
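
For concreteness, a toy (non-distributed) version of this interface, with chaining for collisions; the bucket count is arbitrary:

```python
# Toy hash table matching the slide's insert/search interface.
NUM_BUCKETS = 64
table = [[] for _ in range(NUM_BUCKETS)]

def insert(key, item):
    table[hash(key) % NUM_BUCKETS].append((key, item))

def search(key):
    for k, item in table[hash(key) % NUM_BUCKETS]:  # scan one bucket
        if k == key:
            return item        # expected O(1) when collisions are few
    return None
```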

Distributed Hash Tables (DHTs)
Same spec as an ordinary (non-distributed) hash table, but distributed: nodes store the table entries, and lookup( key ) returns the location of the node currently responsible for that key.
We will mainly discuss Chord (Stoica, Morris, Karger, Kaashoek, and Balakrishnan, SIGCOMM 2001). Other examples: CAN (Berkeley), Tapestry (Berkeley), Pastry (Microsoft Cambridge), etc.

CAN [Ratnasamy et al.]
Map nodes and keys to coordinates in a multi-dimensional Cartesian space; each node owns a zone, and a query is routed from the source toward the key's zone along the shortest Euclidean path. For d dimensions, routing takes O(d·n^(1/d)) hops.

Chord Logical Structure (MIT)
m-bit ID space (2^m IDs), usually m = 160. Nodes are organized in a logical ring according to their IDs.
[Figure: a ring of nodes N1, N8, N10, N14, N21, N30, N38, N42, N48, N51, N56.]

DHT: Consistent Hashing
A key is stored at its successor: the node with the next-higher ID.
[Figure: a circular ID space with nodes N32, N90, N105 and keys K5, K20, K80; each key sits at the first node clockwise from its ID. Animation thanks to CMU.]
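
A minimal sketch of the successor rule, assuming SHA-1 (as Chord uses) truncated to a tiny m = 7-bit ring so the numbers resemble the figure; chord_id and successor are made-up names for this sketch:

```python
import bisect
import hashlib

M = 7                      # tiny ring (2^7 = 128 IDs), for illustration only
RING = 2 ** M

def chord_id(name):
    """Hash a node address or key name onto the circular ID space."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % RING

def successor(node_ids, key_id):
    """The node responsible for key_id: the first node at or after it."""
    ring = sorted(node_ids)
    i = bisect.bisect_left(ring, key_id)
    return ring[i % len(ring)]     # wrap past the top of the ID space

nodes = [32, 90, 105]              # the node IDs from the figure
print(successor(nodes, 20))        # -> 32
print(successor(nodes, 80))        # -> 90
print(successor(nodes, 120))       # -> 32 (wraps around the circle)
print(successor(nodes, chord_id("song.mp3")))
```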

Consistent Hashing Guarantees
For any set of N nodes and K keys:
- a node is responsible for at most (1 + ε)K/N keys;
- when an (N + 1)st node joins or leaves, responsibility for O(K/N) keys changes hands.

DHT: Chord Basic Lookup
Each node knows only its successor, so routing goes around the circle one node at a time: each hop just needs to make progress without overshooting the key. (Initialization and robustness are discussed later; first, how about speed?)
[Figure: "Where is key 80?" starts at N10 and is forwarded along the ring through N32 and N60 until "N90 has K80"; other nodes on the ring: N105, N120.]
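
A sketch of this successor-only routing; succ is a hypothetical map from each node ID to its successor's ID, and the ring size is illustration-sized:

```python
RING = 2 ** 7   # small ID space, for illustration only

def in_interval(x, a, b):
    """True if x lies in the circular half-open interval (a, b]."""
    return 0 < (x - a) % RING <= (b - a) % RING

def basic_lookup(start, key_id, succ):
    """Walk the ring one successor at a time -- O(N) hops.
    succ: dict mapping each node ID to its successor's ID."""
    n = start
    while not in_interval(key_id, n, succ[n]):
        n = succ[n]                 # make progress, never overshoot
    return succ[n]                  # the node storing key_id

# The ring from the figure: 10 -> 32 -> 60 -> 90 -> 105 -> 120 -> 10
succ = {10: 32, 32: 60, 60: 90, 90: 105, 105: 120, 120: 10}
print(basic_lookup(10, 80, succ))   # -> 90 ("N90 has K80")
```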

DHT: Chord “Finger Table”
Small tables, but multi-hop lookup. Table entries hold an IP address and a Chord ID; queries navigate the ID space, each hop moving closer to the key's successor. Tables have log(N) entries, and lookups take log(N) hops.
Entry i in the finger table of node n is the first node that succeeds or equals n + 2^i; in other words, the ith finger points 1/2^(m-i) of the way around the ring. Route to a document between ¼ and ½ …
[Figure: node N80's fingers point 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, 1/128 of the way around the ring.]
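
A sketch of finger-table construction under this 0-indexed convention (build_fingers is a made-up helper; real Chord stores the IP address alongside each ID):

```python
import bisect

M = 7
RING = 2 ** M   # illustration-sized ID space

def successor(ring_nodes, x):
    i = bisect.bisect_left(ring_nodes, x % RING)
    return ring_nodes[i % len(ring_nodes)]

def build_fingers(n, node_ids):
    """finger[i] = first node that succeeds or equals (n + 2^i) mod 2^m."""
    ring_nodes = sorted(node_ids)
    return [successor(ring_nodes, n + 2 ** i) for i in range(M)]

nodes = [10, 32, 60, 80, 90, 105, 120]
print(build_fingers(80, nodes))
# targets 81, 82, 84, 88, 96, 112, 16 -> [90, 90, 90, 90, 105, 120, 32]
```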

DHT: Chord Join
Assume an identifier space [0..8) (i.e., m = 3). Node n1 joins; every entry in its successor table initially points back to n1 itself.

Succ. table of n1:
  i   id+2^i   succ
  0      2        1
  1      3        1
  2      5        1

DHT: Chord Join
Node n2 joins; n1's first entry now points to n2.

Succ. table of n1:
  i   id+2^i   succ
  0      2        2
  1      3        1
  2      5        1

Succ. table of n2:
  i   id+2^i   succ
  0      3        1
  1      4        1
  2      6        1

DHT: Chord Join
Nodes n0 and n6 join.

Succ. table of n0:
  i   id+2^i   succ
  0      1        1
  1      2        2
  2      4        6

Succ. table of n1:
  i   id+2^i   succ
  0      2        2
  1      3        6
  2      5        6

Succ. table of n2:
  i   id+2^i   succ
  0      3        6
  1      4        6
  2      6        6

Succ. table of n6:
  i   id+2^i   succ
  0      7        0
  1      0        0
  2      2        2

DHT: Chord Join
Nodes: n1, n2, n0, n6. Items: f7, f1. Each item is stored at the successor of its key: f7 at n0, f1 at n1.

Succ. table of n0 (items: 7):
  i   id+2^i   succ
  0      1        1
  1      2        2
  2      4        6

Succ. table of n1 (items: 1):
  i   id+2^i   succ
  0      2        2
  1      3        6
  2      5        6

Succ. table of n2:
  i   id+2^i   succ
  0      3        6
  1      4        6
  2      6        6

Succ. table of n6:
  i   id+2^i   succ
  0      7        0
  1      0        0
  2      2        2
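
These tables can be reproduced mechanically; a sketch with m = 3 matching the example (succ and succ_table are made-up helpers):

```python
import bisect

RING = 8   # m = 3, the identifier space of the join example

def succ(node_ids, x):
    ring = sorted(node_ids)
    return ring[bisect.bisect_left(ring, x % RING) % len(ring)]

def succ_table(n, node_ids):
    """Rows (i, (n + 2^i) mod 8, successor), as on the slides."""
    return [(i, (n + 2 ** i) % RING, succ(node_ids, n + 2 ** i))
            for i in range(3)]

nodes = [0, 1, 2, 6]
for n in nodes:
    print(f"n{n}:", succ_table(n, nodes))
# Items: key 7 is stored at succ(7) = n0, key 1 at succ(1) = n1.
```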

DHT: Chord Routing
Upon receiving a query for item id, a node:
- checks whether it stores the item locally;
- if not, forwards the query to the largest node in its successor table that does not exceed id (see the sketch below).
[Figure: the ring and tables of the previous slide, with query(7) issued at n1 and forwarded via n6 to n0, which stores item 7.]
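
A sketch of this forwarding rule over the join example's tables. route, fingers, and items are made-up names; when no table entry precedes the key, the sketch falls back to the node's successor (its first table entry), which is what lets query(7) hop from n6 to n0:

```python
RING = 8   # m = 3, as in the join example

def dist(a, b):
    """Clockwise distance from a to b on the identifier circle."""
    return (b - a) % RING

def route(n, key_id, fingers, items):
    """Forward toward the furthest table entry that does not overshoot
    key_id; answer as soon as a node stores the item locally."""
    path = [n]
    while key_id not in items[n]:
        cur = n
        preceding = [f for f in fingers[cur]
                     if 0 < dist(cur, f) <= dist(cur, key_id)]
        # fall back to the successor (first table entry) if none precede
        n = (max(preceding, key=lambda f: dist(cur, f))
             if preceding else fingers[cur][0])
        path.append(n)
    return n, path

# Successor tables and items from the join slides:
fingers = {0: [1, 2, 6], 1: [2, 6, 6], 2: [6, 6, 6], 6: [0, 0, 2]}
items = {0: {7}, 1: {1}, 2: set(), 6: set()}
print(route(1, 7, fingers, items))   # -> (0, [1, 6, 0]): query(7) via n6
```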

Chord Data Structures
Each node maintains a finger table (whose first finger is its successor) and a predecessor pointer.
What if each node knew all other nodes? O(1) routing, but expensive updates.

Routing Time
Node n looks up a key stored at node p. Say p is in n's ith interval: p ∈ ((n + 2^(i-1)) mod 2^m, (n + 2^i) mod 2^m]. Then n contacts f = finger[i]; the interval is not empty, so f ∈ ((n + 2^(i-1)) mod 2^m, (n + 2^i) mod 2^m]. Hence:
- f is at least 2^(i-1) away from n;
- p is at most 2^(i-1) away from f.
The distance is thus halved at each hop: m steps in the worst case, O(log N) w.h.p. under load-balance assumptions.
[Figure: n, n + 2^(i-1), f = finger[i], p, and n + 2^i laid out along the ring.]
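
The halving step written out as one inequality chain (a sketch of the slide's argument; d(·,·) is the clockwise distance on the identifier circle, and d_t the distance bound after t hops):

```latex
d(n,f) \ge 2^{\,i-1} \quad\text{and}\quad d(n,p) \le 2^{\,i}
\;\Longrightarrow\;
d(f,p) \;=\; d(n,p) - d(n,f) \;\le\; 2^{\,i} - 2^{\,i-1} \;=\; 2^{\,i-1},
```
```latex
d_t \;\le\; 2^{\,m-t}
\;\Longrightarrow\;
\text{at most } m \text{ hops; } O(\log N) \text{ w.h.p.}
```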

Routing Time
Assuming uniform node distribution around the circle, the number of nodes in the search space is halved at each step, so the expected number of steps is log N. Note that m = 160, yet for 1,000,000 nodes log N ≈ 20.

P2P Lessons
- Decentralized architectures avoid centralization.
- Flooding can work.
- Logical overlay structures provide strong performance guarantees.
- Churn is a problem.
- P2P techniques are useful in many distributed contexts.