CSE 486/586 Distributed Systems Distributed Hash Tables


CSE 486/586 Distributed Systems: Distributed Hash Tables
Steve Ko, Computer Sciences and Engineering, University at Buffalo

Last Time
- Evolution of peer-to-peer: central directory (Napster), query flooding (Gnutella), hierarchical overlay (Kazaa, modern Gnutella)
- BitTorrent: focuses on parallel download; prevents free-riding

Today’s Question
How do we organize the nodes in a distributed system?
- Up to the 90’s, the prevalent architecture was client-server (or master-slave): unequal responsibilities.
- Now, peer-to-peer has emerged as an architecture: equal responsibilities.
- Today: studying peer-to-peer as a paradigm.

What We Want
Functionality: lookup-response, e.g., Gnutella. A node looks up an item by name, and some peer that holds it responds.
[Figure: a set of peers (P) exchanging a lookup and its response]

What We Don’t Want
Cost (scalability) and no guarantee for lookup:
- Napster: cost is not balanced; too much falls on the server side.
- Gnutella: cost is still not balanced, just too much overall, and there is no guarantee for lookup. Also, we cannot control how the network is formed.
Per-node costs (memory, lookup latency, #messages for a lookup):
- Napster: O(1) at each client (O(N) at the server)
- Gnutella: O(N) (worst case)

What We Want
What data structure provides lookup-response? A hash table: a data structure that associates keys with values, as name-value (or key-value) pairs.
- E.g., “http://www.cnn.com/foo.html” → the Web page
- E.g., “BritneyHitMe.mp3” → “12.78.183.2”
[Figure: a table mapping each index to its values]

Hashing Basics
A hash function maps a large, possibly variable-sized datum into a small datum, often a single integer that serves to index an associative array. In short: it maps an n-bit datum into k buckets (k << 2^n), providing a time- and space-saving data structure for lookup.
Main goals:
- Low cost
- Deterministic
- Uniformity (load balanced)
E.g., mod: with k buckets (k << 2^n) and an n-bit datum d, compute b = d mod k. This distributes load uniformly only when the data itself is distributed uniformly.
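To make the mod example concrete, here is a minimal Python sketch (the bucket count k = 8 and the sample byte strings are made up for illustration):

    # Minimal modulo-hashing sketch: map an n-bit datum d into one of k buckets.
    k = 8  # number of buckets, k << 2^n

    def bucket(datum: bytes) -> int:
        d = int.from_bytes(datum, "big")  # treat the datum as an integer
        return d % k                      # b = d mod k

    for name in [b"foo.html", b"bar.mp3", b"baz.txt"]:
        print(name, "->", bucket(name))

As the slide notes, this balances load only if the data values themselves are uniform; that is why a uniform hash is usually applied first.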

DHT: Goal
Let’s build a distributed system with a hash table abstraction!
[Figure: peers (P) cooperating to serve lookup(key) → value]

Where to Keep the Hash Table
- Server-side → Napster
- Client-local → Gnutella
What are the requirements (think Napster and Gnutella)?
- Deterministic lookup
- Low lookup time (shouldn’t grow linearly with the system size)
- Should balance load even with node join/leave
What we’ll do: partition the hash table and distribute the partitions among the nodes in the system.
- We need to choose the right hash function.
- We also need to somehow partition the table and distribute the partitions with minimal relocation in the presence of join/leave.

Where to Keep the Hash Table
Consider the problem of data partitioning: given document X, choose one of k servers to store it. This is a two-level mapping:
- Hashing: map one (or more) data item(s) to a hash value (the distribution should be balanced).
- Partitioning: map a hash value to a server (each server’s load should be balanced even with node join/leave).
Let’s look at a simple approach and think about its pros and cons: hashing with mod, and partitioning with buckets.

Using Basic Hashing and Bucket Partitioning?
- Hashing: suppose we use modulo hashing.
- Partitioning: number the servers 0..k-1 and place X on server i = (X mod k).
Problem? The data may not be uniformly distributed.
[Figure: a mod-based table mapping indices to values across Server 0, Server 1, ..., Server 15]

Using Basic Hashing and Bucket Partitioning?
Instead, place X on server i = uniform_hash(X) mod k.
Problem? What happens if a server fails or joins (k → k±1)? Answer: (almost) all entries get remapped to new nodes!
[Figure: a hash + mod table mapping indices to values across Server 0, Server 1, ..., Server 15]
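A quick way to convince yourself of the remapping problem is to count it. The following Python sketch (the key count and server counts are arbitrary choices for the experiment) measures how many keys move when one server joins:

    import hashlib

    def server_for(key: str, k: int) -> int:
        h = int.from_bytes(hashlib.sha1(key.encode()).digest(), "big")
        return h % k  # i = uniform_hash(X) mod k

    keys = [f"key-{i}" for i in range(10_000)]
    before = {key: server_for(key, 16) for key in keys}
    after = {key: server_for(key, 17) for key in keys}  # one server joins
    moved = sum(1 for key in keys if before[key] != after[key])
    print(f"{moved / len(keys):.0%} of keys moved")  # roughly 94%, i.e., almost all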

CSE 486/586 Administrivia
- PA2-B due on Friday next week, 3/15
- (In-class) midterm on Wednesday, 3/13

Chord DHT
Chord is a distributed hash table system that uses consistent hashing, organizes nodes in a ring, and maintains neighbors for correctness and shortcuts for performance.
DHTs in general:
- DHT systems are “structured” peer-to-peer, as opposed to “unstructured” peer-to-peer such as Napster, Gnutella, etc.
- They are used as a base for other systems, e.g., many “trackerless” BitTorrent clients, Amazon Dynamo, distributed repositories, distributed file systems, etc.
- Chord is an example of principled design.

Chord Ring: Global Hash Table
- Represent the hash key space as a virtual ring (a ring representation instead of a table representation).
- Use a hash function that evenly distributes items over the hash space, e.g., SHA-1: Hash(name) → object_id, Hash(IP_address) → node_id.
- Map nodes (buckets) onto the same ring.
- Used in DHTs, memcached, etc.
[Figure: the id space, 0 to 2^128 - 1, represented as a ring]

Chord: Consistent Hashing
Partitioning: map each data item to its “successor” node, i.e., the first node whose id is at or after the item’s id on the ring (Hash(name) → object_id, Hash(IP_address) → node_id).
Advantages:
- Even distribution
- Few changes as nodes come and go…
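As a sketch of how successor-based partitioning can be implemented (the node addresses and the truncated 2^16 id space are illustrative; Chord uses SHA-1’s full 160-bit output):

    import bisect
    import hashlib

    M = 16  # id space [0, 2^M); Chord uses M = 160

    def h(s: str) -> int:
        return int.from_bytes(hashlib.sha1(s.encode()).digest(), "big") % (2 ** M)

    class Ring:
        def __init__(self, nodes):
            self.ids = sorted(h(n) for n in nodes)  # node ids placed on the ring

        def successor(self, key: str) -> int:
            kid = h(key)
            i = bisect.bisect_left(self.ids, kid)  # first node id >= key id
            return self.ids[i % len(self.ids)]     # wrap around past the top

    ring = Ring(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
    print(ring.successor("BritneyHitMe.mp3"))  # node id responsible for this key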

Chord: When Nodes Come and Go…
Small changes when nodes come and go: only the keys mapped to the node that comes or goes are affected (Hash(name) → object_id, Hash(IP_address) → node_id).
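Reusing the Ring sketch above, we can measure this directly: when a fourth node joins, only the keys that land on the new node’s arc move (the exact fraction depends on where the new node’s id falls):

    keys = [f"key-{i}" for i in range(10_000)]
    before = {k: ring.successor(k) for k in keys}
    bigger = Ring(["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"])
    after = {k: bigger.successor(k) for k in keys}
    moved = sum(1 for k in keys if before[k] != after[k])
    print(f"{moved / len(keys):.0%} of keys moved")  # only the new node's share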

Chord: Node Organization
Maintain a circularly linked list around the ring: every node keeps a pointer to its predecessor and its successor. There are separate join and leave protocols.
[Figure: pred ← node → succ]

Chord: Basic Lookup
Route hop by hop via successors:

    lookup (id):
      if ( id > pred.id && id <= my.id )
        return my.id;
      else
        return succ.lookup(id);

This takes O(n) hops to find the destination id.
[Figure: a lookup for an object id hopping from node to node around the ring]
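Here is a runnable Python version of this successor routing (a sketch, not the real protocol; note that the membership test must handle the ring’s wraparound, which the simplified condition above glosses over). The node ids 20, 80, 96, 114 follow the example used in the next slides:

    def between(x, a, b):
        """Is x in the half-open ring interval (a, b]?"""
        return a < x <= b if a < b else x > a or x <= b

    class Node:
        def __init__(self, id):
            self.id = id
            self.pred = self.succ = None  # wired up below

        def lookup(self, id):
            if between(id, self.pred.id, self.id):
                return self.id            # we are responsible for this id
            return self.succ.lookup(id)   # else forward; O(n) hops worst case

    nodes = [Node(i) for i in (20, 80, 96, 114)]
    for a, b in zip(nodes, nodes[1:] + nodes[:1]):
        a.succ, b.pred = b, a             # circularly linked list
    print(nodes[0].lookup(90))            # -> 96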

Chord: Efficient Lookup --- Fingers
The i-th finger table entry at a node with id n points to the first peer whose id >= n + 2^i (mod 2^m).

Finger table at N80 (m = 7 in this example; the ring contains N20, N80, N96, N114):

    i    n + 2^i     ft[i]
    0    80 + 2^0    96
    1    80 + 2^1    96
    2    80 + 2^2    96
    3    80 + 2^3    96
    4    80 + 2^4    96
    5    80 + 2^5    114
    6    80 + 2^6    20
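The table can be reproduced with a few lines of Python (a standalone sketch assuming the m = 7 id space and the node set from the figure):

    M = 7                          # id space [0, 128) in this example
    ring_nodes = [20, 80, 96, 114]

    def first_at_or_after(x):
        # first node with id >= x, wrapping past the top of the ring
        return min((n for n in ring_nodes if n >= x), default=min(ring_nodes))

    n = 80
    for i in range(M):
        print(i, first_at_or_after((n + 2 ** i) % 2 ** M))
    # prints ft[0..4] = 96, ft[5] = 114, ft[6] = 20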

Finger Table
Finding a <key, value> using fingers.
[Figure: e.g., N20 uses its finger at 20 + 2^6 to jump to N86, and N86 uses its finger at 86 + 2^4 to reach N102]

Chord: Efficient Lookup --- Fingers

    lookup (id):
      if ( id > pred.id && id <= my.id )
        return my.id;
      else
        // iterate over fingers() by decreasing distance
        for finger in fingers():
          if id >= finger.id
            return finger.lookup(id);
        return succ.lookup(id);

Route greedily via distant “finger” nodes; this takes O(log n) hops to find the destination id.
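A runnable version of the finger-based routing, extending the Node sketch from the basic-lookup slide (still a simplified model: the fingers list is assumed to be already populated, e.g., as computed in the finger-table sketch above):

    class FingerNode(Node):
        def __init__(self, id):
            super().__init__(id)
            self.fingers = []  # finger nodes, ft[0] nearest .. ft[m-1] farthest

        def lookup(self, id):
            if between(id, self.pred.id, self.id):
                return self.id
            # try fingers from farthest to nearest, without overshooting id
            for f in reversed(self.fingers):
                if between(f.id, self.id, id):
                    return f.lookup(id)   # big greedy jump toward id
            return self.succ.lookup(id)   # fall back to the successor

Each greedy jump at least halves the remaining ring distance to id, which is where the O(log n) bound comes from.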

Chord: Node Joins and Leaves
When a node joins:
- The node does a lookup on its own id and learns the node responsible for that id. That node becomes the new node’s successor.
- The new node can then learn that node’s predecessor, which becomes the new node’s predecessor.
Monitoring: each node monitors its neighbors; if a neighbor doesn’t respond for some time, find a new one.
Leaving:
- Clean (planned) leave: notify the neighbors.
- Unclean leave (failure): needs an extra mechanism to handle lost (key, value) pairs, e.g., as Dynamo does.
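A sketch of the join sequence in the same Python model (hedged: find_node is a hypothetical helper standing in for contacting the node with a given id over the network, and real Chord handles concurrent joins with a periodic stabilization protocol rather than this one-shot splice):

    def join(new, bootstrap, find_node):
        # 1. look up our own id to find the node currently responsible for it
        succ = find_node(bootstrap.lookup(new.id))  # becomes our successor
        pred = succ.pred                            # becomes our predecessor
        # 2. splice ourselves into the circularly linked list
        new.pred, new.succ = pred, succ
        pred.succ = new
        succ.pred = new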

Summary
DHT:
- Gives a hash table as an abstraction
- Partitions the hash table and distributes the partitions over the nodes
- “Structured” peer-to-peer
Chord DHT:
- Based on consistent hashing
- Balances hash table partitions over the nodes
- Basic lookup based on successors
- Efficient lookup through fingers

Acknowledgements These slides contain material developed and copyrighted by Indranil Gupta (UIUC), Michael Freedman (Princeton), and Jennifer Rexford (Princeton).