Distributed Hash Tables: Chord
Brad Karp (with many slides contributed by Robert Morris)
UCL Computer Science
CS M038 / GZ06
27th January, 2009

2 Today: DHTs, P2P
Distributed Hash Tables: a building block
Applications built atop them
Your task: “Why DHTs?”
–vs. centralized servers?
–vs. non-DHT P2P systems?

3 What Is a P2P System?
A distributed system architecture:
–No centralized control
–Nodes are symmetric in function
Large number of unreliable nodes
Enabled by technology improvements
(Diagram: many symmetric nodes connected across the Internet)

4 The Promise of P2P Computing
High capacity through parallelism:
–Many disks
–Many network connections
–Many CPUs
Reliability:
–Many replicas
–Geographic distribution
Automatic configuration
Useful in public and proprietary settings

5 What Is a DHT?
Single-node hash table:
  key = Hash(name)
  put(key, value)
  get(key) -> value
–Service: O(1) storage
How do I do this across millions of hosts on the Internet?
–Distributed Hash Table

6 What Is a DHT? (and why?)
Distributed Hash Table:
  key = Hash(data)
  lookup(key) -> IP address          (Chord)
  send-RPC(IP address, PUT, key, value)
  send-RPC(IP address, GET, key) -> value
Possibly a first step towards truly large-scale distributed systems
–a tuple in a global database engine
–a data block in a global file system
–rare.mp3 in a P2P file-sharing system
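To make the two-step pattern concrete, here is a minimal single-process sketch (not the paper's implementation): the node table, the 7-bit toy ID space, and the function names below are illustrative stand-ins for Chord routing and real storage RPCs.

    import hashlib

    # Toy stand-ins so the sketch runs in one process: NODES maps a node ID to a
    # dict acting as that node's storage. In a real deployment lookup() would run
    # the Chord protocol and the PUT/GET "RPCs" would cross the network.
    NODES = {nid: {} for nid in (10, 60, 105)}        # hypothetical node IDs

    def lookup(key: bytes) -> int:
        """Map a key to the node responsible for it (its successor on the ring)."""
        kid = int.from_bytes(key, "big") % 128        # 7-bit toy ID space
        candidates = sorted(NODES)
        return next((n for n in candidates if n >= kid), candidates[0])

    def dht_put(data: bytes) -> bytes:
        key = hashlib.sha1(data).digest()             # key = Hash(data)
        NODES[lookup(key)][key] = data                # send-RPC(addr, PUT, key, value)
        return key

    def dht_get(key: bytes) -> bytes:
        return NODES[lookup(key)][key]                # send-RPC(addr, GET, key)

    k = dht_put(b"rare.mp3 contents")
    assert dht_get(k) == b"rare.mp3 contents"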

7 DHT Factoring
(Layered diagram)
Distributed application
  put(key, data) / get(key) -> data
Distributed hash table (DHash)
  lookup(key) -> node IP address
Lookup service (Chord)
Application may be distributed over many nodes
DHT distributes data storage over many nodes

8 Why the put()/get() interface?
API supports a wide range of applications
–DHT imposes no structure/meaning on keys
Key/value pairs are persistent and global
–Can store keys in other DHT values
–And thus build complex data structures (see the sketch below)
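A small sketch of that idea: content-hash blocks whose values contain the keys of other blocks, so a single root key names a larger structure. The dict-backed put/get below is only a stand-in for the real DHT interface.

    import hashlib, json

    DHT = {}                                   # stand-in for the global key/value store

    def put(value: bytes) -> str:
        key = hashlib.sha1(value).hexdigest()  # content-hash key
        DHT[key] = value
        return key

    def get(key: str) -> bytes:
        return DHT[key]

    # Leaf blocks hold data; the root block holds the *keys* of the leaves.
    k1 = put(b"chunk one")
    k2 = put(b"chunk two")
    root = put(json.dumps({"children": [k1, k2]}).encode())

    # A reader holding only the root key can walk the whole structure.
    children = json.loads(get(root))["children"]
    assert b"".join(get(k) for k in children) == b"chunk onechunk two"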

9 Why Might DHT Design Be Hard?
Decentralized: no central authority
Scalable: low network traffic overhead
Efficient: find items quickly (latency)
Dynamic: nodes fail, new nodes join
General-purpose: flexible naming

10 The Lookup Problem
(Diagram: publisher and client among nodes N1–N6 spread across the Internet)
Publisher: Put(key=“title”, value=file data…)
Client: Get(key=“title”), but which node holds it?
At the heart of all DHTs

11 Motivation: Centralized Lookup (Napster)
(Diagram: central DB among nodes N1–N9; publisher N4, holding key=“title”, value=file data…, registers SetLoc(“title”, N4); the client asks the DB with Lookup(“title”))
Simple, but O(N) state and a single point of failure

12 Motivation: Flooded Queries (Gnutella)
(Diagram: the client floods Lookup(“title”) through nodes N1–N9 until it reaches the node holding key=“title”, value=file data…)
Robust, but worst case O(N) messages per lookup

13 Motivation: FreeDB, Routed DHT Queries (Chord, &c.)
(Diagram: the client’s Lookup(H(audio data)) is routed node by node to the publisher)
Key = H(audio data)
Value = {artist, album title, track title}

14 DHT Applications
They’re not just for stealing music anymore…
–global file systems [OceanStore, CFS, PAST, Pastiche, UsenetDHT]
–naming services [Chord-DNS, Twine, SFR]
–DB query processing [PIER, Wisc]
–Internet-scale data structures [PHT, Cone, SkipGraphs]
–communication services [i3, MCAN, Bayeux]
–event notification [Scribe, Herald]
–file sharing [OverNet]

15 Chord Lookup Algorithm Properties
Interface: lookup(key) → IP address
Efficient: O(log N) messages per lookup
–N is the total number of servers
Scalable: O(log N) state per node
Robust: survives massive failures
Simple to analyze

16 Chord IDs
Key identifier = SHA-1(key)
Node identifier = SHA-1(IP address)
SHA-1 distributes both uniformly
How to map key IDs to node IDs?
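A minimal sketch of this ID assignment (the key string and IP address below are made-up examples; m is 160 because SHA-1 is used):

    import hashlib

    M = 160                                     # SHA-1 gives a 160-bit circular ID space

    def chord_id(data: bytes) -> int:
        """Hash keys and node addresses into the same m-bit identifier circle."""
        return int.from_bytes(hashlib.sha1(data).digest(), "big") % (2 ** M)

    key_id  = chord_id(b"rare.mp3")             # key identifier  = SHA-1(key)
    node_id = chord_id(b"192.0.2.7")            # node identifier = SHA-1(IP address)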

17 Consistent Hashing [Karger 97]
A key is stored at its successor: the node with the next-higher ID
(Diagram: circular 7-bit ID space with nodes N32, N90, N105; keys K5 and K20 are stored at N32, K80 at N90)
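A sketch of the successor rule on a small ring, using the node and key IDs from the example above:

    import bisect

    def successor(node_ids, key_id):
        """Return the first node ID at or after key_id, wrapping around the ring."""
        node_ids = sorted(node_ids)
        i = bisect.bisect_left(node_ids, key_id)
        return node_ids[i % len(node_ids)]      # wrap past the highest ID back to the lowest

    nodes = [32, 90, 105]                       # nodes on the 7-bit ring
    assert successor(nodes, 5) == 32            # K5  -> N32
    assert successor(nodes, 20) == 32           # K20 -> N32
    assert successor(nodes, 80) == 90           # K80 -> N90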

18 Basic Lookup
(Diagram: ring of N10, N32, N60, N90, N105, N120; the question “Where is key 80?” is passed from successor to successor around the ring until the answer comes back: “N90 has K80”)

19 Simple lookup algorithm
  Lookup(my-id, key-id)
    n = my successor
    if my-id < n < key-id
      call Lookup(key-id) on node n   // next hop
    else
      return my successor             // done
Correctness depends only on successors
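As a runnable illustration (not the slide's exact recursion): an iterative version under simple assumptions, with an explicit interval test because the slide's “my-id < n < key-id” has to wrap around the circular ID space.

    class Node:
        def __init__(self, id):
            self.id, self.successor = id, None

    def in_interval(x, lo, hi):
        """True if x lies in the ring interval (lo, hi], handling wrap-around."""
        if lo < hi:
            return lo < x <= hi
        return x > lo or x <= hi                # interval wraps past zero

    def lookup(node, key_id):
        # Walk successors until key_id falls between a node and its successor.
        while not in_interval(key_id, node.id, node.successor.id):
            node = node.successor               # next hop
        return node.successor                   # that successor stores key_id

    ring = [Node(i) for i in (10, 32, 60, 90, 105, 120)]
    for a, b in zip(ring, ring[1:] + ring[:1]):
        a.successor = b
    assert lookup(ring[0], 80).id == 90         # “N90 has K80”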

20 “Finger Table” Allows log(N)-Time Lookups
(Diagram: from N80, fingers point ½, ¼, 1/8, 1/16, 1/32, 1/64, and 1/128 of the way around the ring)

21 Finger i Points to Successor of n + 2^i
(Diagram: the same fractions of the ring as above, with one of N80’s fingers pointing to N120, the successor of the corresponding n + 2^i point)
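A sketch of finger-table construction in an m-bit ID space (the 7-bit space and node IDs below are illustrative, echoing the slides' examples):

    import bisect

    M = 7                                           # 7-bit toy ID space, 128 IDs

    def successor(node_ids, point):
        point %= 2 ** M
        i = bisect.bisect_left(node_ids, point)
        return node_ids[i % len(node_ids)]

    def finger_table(n, node_ids):
        """finger[i] = successor(n + 2^i), for i = 0 .. m-1."""
        node_ids = sorted(node_ids)
        return [successor(node_ids, n + 2 ** i) for i in range(M)]

    nodes = [10, 32, 60, 80, 99, 110, 120]
    print(finger_table(80, nodes))                  # [99, 99, 99, 99, 99, 120, 32]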

22 Lookup with Fingers
  Lookup(my-id, key-id)
    look in local finger table for
      highest node n s.t. my-id < n < key-id
    if n exists
      call Lookup(key-id) on node n   // next hop
    else
      return my successor             // done
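The finger-table scan is the only new step; a self-contained sketch of it follows (node IDs and the 7-bit modulus are illustrative, and “highest” is measured as distance around the ring so the test wraps correctly):

    def between(x, lo, hi, modulus=128):
        """True if x lies strictly between lo and hi going clockwise around the ring."""
        lo, hi, x = lo % modulus, hi % modulus, x % modulus
        if lo < hi:
            return lo < x < hi
        return x > lo or x < hi                     # interval wraps past zero

    def next_hop(my_id, key_id, fingers, modulus=128):
        """Highest finger n with my_id < n < key_id on the ring, or None if done."""
        best = None
        for f in set(fingers):
            if between(f, my_id, key_id, modulus):
                if best is None or (f - my_id) % modulus > (best - my_id) % modulus:
                    best = f                        # furthest along, but still before the key
        return best

    # From N80 toward K19: of fingers {99, 120, 32}, N120 is the closest preceding node.
    print(next_hop(80, 19, [99, 99, 99, 99, 99, 120, 32]))   # -> 120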

23 Lookups Take O(log N) Hops
(Diagram: Lookup(K19) hops via fingers across a ring of nodes N5, N10, N20, N32, N60, N80, N99, N110, each hop roughly halving the remaining ID distance to K19)

24 Joining: Linked List Insert
(Diagram: N36 joins the ring between N25 and N40; N40 currently holds K30 and K38)
1. Lookup(36) (N36 finds its successor, N40)

25 Join (2)
(Diagram: N25, N36, N40; N40 holds K30 and K38)
2. N36 sets its own successor pointer (to N40)

26 Join (3)
(Diagram: K30 now also appears at N36; N40 still holds K30 and K38)
3. Copy keys from N40 to N36

27 Join (4)
(Diagram: N25 → N36 → N40; N36 holds K30, N40 holds K30 and K38)
4. Set N25’s successor pointer (to N36)
Predecessor pointer allows link to new host
Update finger pointers in the background
Correct successors produce correct lookups
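The four steps as one hedged sketch; the Node fields and helper names are illustrative (and the key-ownership test ignores ring wrap-around to keep it short), not the paper's RPC interface.

    class Node:
        def __init__(self, id):
            self.id, self.store = id, {}
            self.successor = self.predecessor = self

    def join(new, lookup):
        succ = lookup(new.id)                       # 1. Lookup(new.id) finds the successor
        new.successor = succ                        # 2. new node sets its successor pointer
        for k in [k for k in succ.store if k <= new.id]:
            new.store[k] = succ.store[k]            # 3. copy keys the new node now owns
        pred = succ.predecessor
        pred.successor = new                        # 4. set the predecessor's successor pointer
        new.predecessor, succ.predecessor = pred, new
        # Finger tables are then fixed up in the background; correct successors
        # already suffice for correct lookups.

    # Usage on the slide's example: a two-node ring N25 -> N40, then N36 joins.
    n25, n40 = Node(25), Node(40)
    n25.successor, n25.predecessor = n40, n40
    n40.successor, n40.predecessor = n25, n25
    n40.store = {30: b"K30", 38: b"K38"}
    n36 = Node(36)
    join(n36, lambda _id: n40)                      # stand-in for Lookup(36) returning N40
    assert n25.successor is n36 and n36.successor is n40 and 30 in n36.store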

28 Failures Might Cause Incorrect Lookup
(Diagram: N10 issues Lookup(90); N80’s successor N85 has failed, along with N102 and N113, leaving N120 as the next live node)
N80 doesn’t know its correct successor, so the lookup is incorrect

29 Solution: Successor Lists
Each node knows r immediate successors
After failure, will know first live successor
Correct successors guarantee correct lookups
Guarantee is with some probability

30 Choosing Successor List Length
Assume 1/2 of nodes fail
P(successor list all dead) = (1/2)^r
–i.e., P(this node breaks the Chord ring)
–Depends on independent failure
P(no broken nodes) = (1 – (1/2)^r)^N
–r = 2 log2(N) makes prob. ≈ 1 – 1/N
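A quick numeric check of that bound (a sketch; a deployment's exact choice of r may differ):

    import math

    # With r = 2*log2(N) successor-list entries and each node failing
    # independently with probability 1/2, the chance that *no* node loses its
    # entire successor list stays close to 1 - 1/N.
    for N in (100, 1_000, 10_000):
        r = 2 * math.log2(N)
        p_ring_intact = (1 - 0.5 ** r) ** N
        print(N, round(p_ring_intact, 4), round(1 - 1 / N, 4))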

31 Lookup with Fault Tolerance
  Lookup(my-id, key-id)
    look in local finger table and successor-list for
      highest node n s.t. my-id < n < key-id
    if n exists
      call Lookup(key-id) on node n   // next hop
      if call failed,
        remove n from finger table
        return Lookup(my-id, key-id)
    else
      return my successor             // done

32 Experimental Overview
Quick lookup in large systems
Low variation in lookup costs
Robust despite massive failure
Experiments confirm theoretical results

33 Chord Lookup Cost Is O(log N)
(Plot: average messages per lookup vs. number of nodes)
Constant is 1/2, i.e. roughly ½ log2(N) messages per lookup (≈5 for 1,000 nodes)

34 Failure Experimental Setup
Start 1,000 CFS/Chord servers
–Successor list has 20 entries
Wait until they stabilize
Insert 1,000 key/value pairs
–Five replicas of each
Stop X% of the servers
Immediately perform 1,000 lookups

35 DHash Replicates Blocks at r Successors
(Diagram: Block 17 is stored at its successor and replicated at the next few nodes around a ring of N5, N10, N20, N40, N50, N60, N68, N80, N99, N110)
Replicas are easy to find if the successor fails
Hashed node IDs ensure independent failure
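A sketch of that placement rule (node IDs match the diagram; r and the helper name are illustrative):

    import bisect

    def replica_nodes(node_ids, block_id, r):
        """The block's successor plus the r nodes that follow it on the ring."""
        node_ids = sorted(node_ids)
        start = bisect.bisect_left(node_ids, block_id) % len(node_ids)
        return [node_ids[(start + i) % len(node_ids)] for i in range(r + 1)]

    nodes = [5, 10, 20, 40, 50, 60, 68, 80, 99, 110]
    print(replica_nodes(nodes, 17, 3))          # Block 17 -> [20, 40, 50, 60]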

36 Massive Failures Have Little Impact
(Plot: failed lookups (percent) vs. failed nodes (percent))
(1/2)^6 ≈ 1.6%, the chance that every copy of a block is lost when half the nodes fail

37 DHash Properties
Builds key/value storage on Chord
Replicates blocks for availability
–What happens when the DHT partitions, then heals? Which (k, v) pairs do I need?
Caches blocks for load balance
Authenticates block contents

38 DHash Data Authentication
Two types of DHash blocks:
–Content-hash: key = SHA-1(data)
–Public-key: key is a public key, data are signed by that key
DHash servers verify before accepting
Clients verify the result of get(key)
Disadvantages?
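A minimal sketch of the client-side check for content-hash blocks (public-key blocks would instead verify a signature over the data; the function name here is illustrative):

    import hashlib

    def verified_get(key: bytes, data: bytes) -> bytes:
        """Accept a fetched block only if it matches its content-hash key."""
        if hashlib.sha1(data).digest() != key:
            raise ValueError("block does not match its content-hash key")
        return data

    block = b"some file data"
    key = hashlib.sha1(block).digest()          # content-hash: key = SHA-1(data)
    assert verified_get(key, block) == block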