Distributed Hash Tables CPE 401 / 601 Computer Network Systems Modified from Ashwin Bharambe and Robert Morris.


P2P Search
• How do we find an item in a P2P network?
• Unstructured: query flooding
  - Not guaranteed
  - Designed for the common case: popular items
• Structured: DHTs, a kind of indexing
  - Guaranteed to find an item
  - Designed for both popular and rare items

The lookup problem
• [Figure: nodes N1 through N6 connected across the Internet; a Publisher calls Put(key="title", value=file data…) on one node, and a Client asks Get(key="title"); which node has it?]
• At the heart of all DHTs

Centralized lookup (Napster)
• [Figure: node N4 holds key="title", value=file data… and registers it with a central DB via SetLoc("title", N4); the Client sends Lookup("title") to the DB, then fetches the file from N4]
• Simple, but O(N) state and a single point of failure

Flooded queries (Gnutella)
• [Figure: the Client floods Lookup("title") to its neighbors, which forward it until it reaches the node holding key="title", value=file data…]
• Robust, but worst case O(N) messages per lookup

Routed queries (Freenet, Chord, etc.)
• [Figure: the Client's Lookup("title") is forwarded hop by hop through the overlay until it reaches the Publisher's node holding key="title", value=file data…]

Hash Table
• Name-value pairs (or key-value pairs)
  - E.g., "Mehmet Hadi Gunes" and an e-mail address
  - E.g., a URL and the corresponding Web page
  - E.g., "HitSong.mp3" and the address of a machine storing it
• Hash table
  - Data structure that associates keys with values: lookup(key) → value
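
A minimal hash-table sketch in Python (not from the original slides); the bucket count and the key/value strings are made up for the example:

# A hash table: an array of buckets indexed by hash(key) % size,
# with collisions handled by chaining (a list per bucket).
class HashTable:
    def __init__(self, size=101):
        self.buckets = [[] for _ in range(size)]

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def insert(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                     # overwrite an existing entry
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def lookup(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None

table = HashTable()
table.insert("HitSong.mp3", "some-node-address")    # hypothetical value
print(table.lookup("HitSong.mp3"))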

Distributed Hash Table
• Hash table spread over many nodes
  - Distributed over a wide area
• Main design goals
  - Decentralization: no central coordinator
  - Scalability: efficient even with a large number of nodes
  - Fault tolerance: tolerate nodes joining and leaving

Distributed hash table (DHT)
• [Figure: layered architecture; a distributed application calls put(key, data) and get(key) → data on the DHT, which in turn uses a lookup service: lookup(key) → node IP address]
• The application may be distributed over many nodes
• The DHT distributes data storage over many nodes
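
To make the layering in the figure concrete, a hedged sketch of the interfaces; the class and method names are illustrative, not from any particular system:

# Illustrative layering: application -> DHT storage -> lookup service.
class LookupService:
    """Maps a key to the IP address of the node responsible for it."""
    def lookup(self, key: str) -> str:
        raise NotImplementedError

class DHT:
    """Distributes (key, data) storage over many nodes via the lookup service."""
    def __init__(self, lookup_service: LookupService):
        self.lookup_service = lookup_service

    def put(self, key: str, data: bytes) -> None:
        node_ip = self.lookup_service.lookup(key)
        self._send(node_ip, "STORE", key, data)    # ship the pair to the responsible node

    def get(self, key: str) -> bytes:
        node_ip = self.lookup_service.lookup(key)
        return self._send(node_ip, "FETCH", key)   # ask the responsible node for the data

    def _send(self, node_ip, op, key, data=None):
        ...   # network transport omitted in this sketch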

Distributed Hash Table
• Two key design decisions
  - How do we map names onto nodes?
  - How do we route a request to that node?

Hash Functions
• Hashing
  - Transform the key into a number
  - And use the number to index an array
• Example hash function
  - Hash(x) = x mod 101, mapping to 0, 1, …, 100
• Challenges
  - What if there are more than 101 nodes? Fewer?
  - Which nodes correspond to each hash value?
  - What if nodes come and go over time?
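
A small sketch (not from the slides) of why plain mod-N hashing handles the last challenge badly: when the number of nodes changes, almost every key maps to a different node. The key set and node counts are made up:

# Going from 101 nodes to 102 moves nearly every key to a different node.
keys = range(10_000)                       # hypothetical integer keys

def node_for(key, num_nodes):
    return key % num_nodes                 # plain modulo hashing

moved = sum(1 for k in keys if node_for(k, 101) != node_for(k, 102))
print(f"{moved / len(keys):.0%} of keys change nodes")    # nearly all of them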

Consistent Hashing
• "view" = subset of hash buckets that are visible
  - For this conversation, a "view" is O(n) neighbors
  - But we don't need strong consistency on views
• Desired features
  - Balanced: in any one view, load is equal across buckets
  - Smoothness: little impact on hash bucket contents when buckets are added/removed
  - Spread: a small set of hash buckets may hold an object, regardless of views
  - Load: across views, the number of objects assigned to a hash bucket is small

Fundamental Design Idea - I
• Consistent Hashing
  - Map keys and nodes to an identifier space; responsibility is assigned implicitly
  - [Figure: identifier space with nodes A, B, C, D and a key placed on it]
  - Mapping performed using hash functions (e.g., SHA-1)
  - Spread nodes and keys uniformly throughout the space

Fundamental Design Idea - II
• Prefix / Hypercube routing
  - [Figure: each hop from the source matches a longer prefix of the destination identifier, "zooming in" on the destination]

Definition of a DHT
• Hash table
  - Supports two operations
  - insert(key, value)
  - value = lookup(key)
• Distributed
  - Map hash buckets to nodes
• Requirements
  - Uniform distribution of buckets
  - Cost of insert and lookup should scale well
  - Amount of local state (routing table size) should scale well

Chord
• Map nodes and keys to identifiers
  - Using randomizing hash functions
• Arrange them on a circle
  - [Figure: identifier circle showing a point x, its successor succ(x), and its predecessor pred(x)]

Consistent Hashing
• Construction
  - Assign each of C hash buckets to random points on a mod 2^n circle; hash key size = n
  - Map each object to a random position on the circle
  - Hash of object = closest clockwise bucket
• Desired features
  - Balanced: no bucket is responsible for a large number of objects
  - Smoothness: addition of a bucket does not cause major movement among existing buckets
  - Spread and load: a small set of buckets lie near an object
• Similar to the scheme later used in P2P Distributed Hash Tables (DHTs)
  - In DHTs, each node only has a partial view of its neighbors

Consistent Hashing
• Large, sparse identifier space (e.g., 128 bits)
• Hash a set of keys x uniformly to the large id space
• Hash nodes to the id space as well
  - Hash(name) → object_id
  - Hash(IP_address) → node_id
• Id space represented as a ring

Where to Store (Key, Value) Pair?
• Mapping keys in a load-balanced way
  - Store the key at one or more nodes
  - Nodes with identifiers "close" to the key, where distance is measured in the id space
• Advantages
  - Even distribution
  - Few changes as nodes come and go…
• Hash(name) → object_id
• Hash(IP_address) → node_id

Hash a Key to Successor
• [Figure: circular ID space with nodes N10, N32, N60, N80, N100; keys K5, K10 → N10; K11, K30 → N32; K33, K40, K52 → N60; K65, K70 → N80; K100 → N100]
• Successor: the node with the next-highest ID
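
A hedged Python sketch (not from the slides) of the successor rule: hash node addresses and keys onto the same ring and assign each key to the first node clockwise from it. The node addresses and key name are made up:

import hashlib
from bisect import bisect_left

M = 2 ** 160                                    # SHA-1 identifier space

def chord_id(name: str) -> int:
    """Hash a key name or node address onto the identifier ring."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % M

# Hypothetical node addresses
nodes = sorted(chord_id(ip) for ip in ["10.0.0.1", "10.0.0.2", "10.0.0.3"])

def successor(key: str) -> int:
    """Return the id of the first node at or after the key's id, clockwise."""
    kid = chord_id(key)
    i = bisect_left(nodes, kid)
    return nodes[i % len(nodes)]                # wrap around the ring

print(successor("HitSong.mp3"))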

Joins and Leaves of Nodes
• Maintain a circularly linked list around the ring
  - Every node has a predecessor and a successor node

Successor Lists Ensure Robust Lookup
• [Figure: ring of nodes N5, N10, N20, N32, N40, N60, N80, N99, N110; each node stores its next three successors, e.g., N5 stores (10, 20, 32) and N99 stores (110, 5, 10)]
• Each node remembers r successors
• Lookup can skip over dead nodes to find blocks
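
An illustrative sketch (not Chord's actual code) of a lookup skipping dead nodes via the successor list; the node table and the ping() stand-in are made up:

# Walk the ring via successor lists, skipping nodes that appear dead (r = 3).
successor_list = {
    5: [10, 20, 32], 10: [20, 32, 40], 20: [32, 40, 60],
}
alive = {5, 20, 32, 40, 60}              # node 10 has failed in this example

def ping(node_id: int) -> bool:
    return node_id in alive              # stand-in for a real liveness probe

def next_hop(current: int) -> int:
    """Return the first live successor of `current`, skipping dead nodes."""
    for succ in successor_list[current]:
        if ping(succ):
            return succ
    raise RuntimeError("all r successors appear dead")

print(next_hop(5))    # -> 20, because node 10 is down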

Joins and Leaves of Nodes
• When an existing node leaves
  - The node copies its (key, value) pairs to its successor, which becomes responsible for them
  - The leaving node's predecessor is updated to point to the leaving node's successor in the ring
• When a node joins
  - The node does a lookup on its own id
  - And learns the node responsible for that id
  - That node becomes the new node's successor
  - And the new node can learn that node's predecessor, which will become the new node's predecessor
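
A simplified join sketch in Python (illustrative only): the new node looks up its own id to find its successor, then links itself into the ring. Real Chord relies on periodic stabilization rather than the direct splice shown here; the linear-walk lookup is the strawman O(n) version:

class Node:
    def __init__(self, node_id: int):
        self.id = node_id
        self.successor = self
        self.predecessor = self

    def join(self, bootstrap: "Node") -> None:
        """Join the ring via any existing node."""
        succ = bootstrap.lookup(self.id)       # node currently responsible for our id
        self.successor = succ
        self.predecessor = succ.predecessor
        succ.predecessor.successor = self      # splice ourselves in
        succ.predecessor = self

    def lookup(self, key_id: int) -> "Node":
        """Linear walk around the ring: O(n) hops."""
        node = self
        while not _in_range(key_id, node.predecessor.id, node.id):
            node = node.successor
        return node

def _in_range(x, lo, hi):
    """True if x lies in the clockwise ring interval (lo, hi]."""
    return (lo < x <= hi) if lo < hi else (x > lo or x <= hi)

a = Node(10)
b = Node(50); b.join(a)
c = Node(30); c.join(a)
print(a.lookup(40).id)    # -> 50: node 50 is responsible for key 40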

Nodes Coming and Going
• Small changes when nodes come and go
  - Only affects the mapping of keys assigned to the node that comes or goes
• Hash(name) → object_id
• Hash(IP_address) → node_id

How to Find the Nearest Node?
• Need to find the closest node
  - To determine who should store a (key, value) pair
  - To direct a future lookup(key) query to that node
• Strawman solution: walk through the linked list
  - Circular linked list of nodes in the ring
  - O(n) lookup time with n nodes in the ring
• Alternative solution:
  - Jump further around the ring
  - "Finger" table of additional overlay links

Links in the Overlay Topology
• Trade-off between # of hops and # of neighbors
  - E.g., log(n) for both, where n is the number of nodes
  - E.g., overlay links 1/2, 1/4, 1/8, … of the way around the ring
  - Each hop traverses at least half of the remaining distance

Chord "Finger Table" Accelerates Lookups
• [Figure: node N80's finger pointers reach ½, ¼, 1/8, 1/16, 1/32, 1/64, and 1/128 of the way around the ring]

Chord lookups take O(log N) hops
• [Figure: ring of nodes N5, N10, N20, N32, N60, N80, N99, N110; a Lookup(K19) is forwarded along finger pointers, halving the remaining distance each hop, until it reaches N20, the successor of K19]
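
A runnable Python sketch of the lookup rule the figure illustrates (forward to the finger that most closely precedes the key). This is an illustration, not the paper's exact pseudocode; the 7-bit ring and node ids follow the figure:

M = 7                                     # identifier bits
RING = 2 ** M
nodes = sorted([5, 10, 20, 32, 60, 80, 99, 110])

def successor(ident):
    """First node at or after ident, wrapping around the ring."""
    return next((n for n in nodes if n >= ident), nodes[0])

def finger_table(n):
    """Finger i points to successor(n + 2^i), for i = 0 .. M-1."""
    return [successor((n + 2 ** i) % RING) for i in range(M)]

def between(x, lo, hi):
    """True if x lies strictly inside the clockwise interval (lo, hi)."""
    return (lo < x < hi) if lo < hi else (x > lo or x < hi)

def in_half_open(x, lo, hi):
    """True if x lies in the clockwise interval (lo, hi]."""
    return (lo < x <= hi) if lo < hi else (x > lo or x <= hi)

def lookup(start, key_id):
    """Forward via the closest preceding finger until the key's successor is found."""
    n, hops = start, 0
    while not in_half_open(key_id, n, successor((n + 1) % RING)):
        n = next(f for f in reversed(finger_table(n)) if between(f, n, key_id))
        hops += 1
    return successor((n + 1) % RING), hops

print(lookup(80, 19))    # -> (20, 2): N20 is responsible for K19, reached in 2 hops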

Chord Self-organization
• Node join
  - Set up finger i: route to succ(n + 2^i)
  - log(n) fingers → O(log^2 n) setup cost
• Node leave
  - Maintain a successor list for ring connectivity
  - Update successor lists and finger pointers
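
As a loose illustration of the self-organization idea (a simplified, assumed version of Chord's stabilize/notify routines; it reuses the Node class sketched after the "Joins and Leaves" slide):

def ring_between(x, lo, hi):
    """True if x lies strictly inside the clockwise interval (lo, hi)."""
    return (lo < x < hi) if lo < hi else (x > lo or x < hi)

def stabilize(node):
    """Adopt a closer successor if a new node slipped in, then announce ourselves."""
    x = node.successor.predecessor
    if x is not node and ring_between(x.id, node.id, node.successor.id):
        node.successor = x
    notify(node.successor, node)

def notify(node, candidate):
    """`candidate` believes it may be our predecessor."""
    if node.predecessor is node or ring_between(candidate.id, node.predecessor.id, node.id):
        node.predecessor = candidate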

Chord lookup algorithm properties
• Interface: lookup(key) → IP address
• Efficient: O(log N) messages per lookup
  - N is the total number of servers
• Scalable: O(log N) state per node
• Robust: survives massive failures
• Simple to analyze

Plaxton Trees
• Motivation
  - Access nearby copies of replicated objects
  - Time-space trade-off: Space = routing table size, Time = access hops

Plaxton Trees: Algorithm
• 1. Assign labels to objects and nodes, using randomizing hash functions
  - Each label has log_{2^b}(n) digits, e.g., object 9AE4 and node 247B (hexadecimal digits, i.e., b = 4)

Plaxton Trees: Algorithm
• 2. Each node knows about other nodes with varying prefix matches
  - [Figure: node 247B keeps routing entries grouped by the length of their prefix match with its own label: length 0, 1, 2, and 3]

Plaxton Trees: Object Insertion and Lookup
• Given an object, route successively towards nodes with greater prefix matches
  - [Figure: starting from node 247B, the route for object 9AE4 passes through nodes such as 9F10, 9A76, and 9AE2, each sharing a longer prefix with 9AE4]
• Store the object at each of these locations
• log(n) steps to insert or locate an object
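
An illustrative Python sketch of a single prefix-routing step (Plaxton/Pastry style). The node and object labels come from the example above, but the routing-table contents are made up:

def prefix_len(a: str, b: str) -> int:
    """Length of the common prefix of two labels."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def next_hop(my_id: str, object_id: str, routing_table: dict) -> str:
    """routing_table[i][d]: a known node whose label matches object_id in the
    first i digits and has digit d in position i."""
    i = prefix_len(my_id, object_id)
    return routing_table[i].get(object_id[i], my_id)    # stay put if no better hop known

# Hypothetical routing table at node 247B (4-digit hex id space)
table_247B = {0: {"9": "9F10"}, 1: {}, 2: {}, 3: {}}
print(next_hop("247B", "9AE4", table_247B))    # -> 9F10 (prefix match grows from 0 to 1)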

Plaxton Trees: Why is it a tree?
• [Figure: routes toward the object from different nodes (247B, 9F10, 9A76, 9AE2) converge, forming a tree rooted at the object's node]

Plaxton Trees: Network Proximity
• Overlay tree hops could be totally unrelated to the underlying network hops
  - [Figure: a single overlay hop may cross between the USA, Europe, and East Asia]
• Plaxton trees guarantee a constant-factor approximation!
  - Only when the topology is uniform in some sense

Pastry
• Based directly upon Plaxton Trees
• Exports a DHT interface
  - Stores an object only at the node whose ID is closest to the object ID
• In addition to the main routing table
  - Maintains a leaf set of nodes
  - The L closest nodes (in ID space), typically L = 2^(b+1), half to the left and half to the right

CAN [Ratnasamy, et al]
• Map nodes and keys to coordinates in a multi-dimensional Cartesian space
  - [Figure: the space is partitioned into zones, one per node; a query is routed from the source zone toward the key's coordinates]
• Routing through the shortest Euclidean path
• For d dimensions, routing takes O(d·n^(1/d)) hops
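
An illustrative greedy CAN-style routing step in Python: forward to the neighbor whose zone center is Euclidean-closest to the key's point. The 2-D coordinates are made up for the example:

import math

def next_hop(key_point, neighbors):
    """neighbors maps a neighbor's name to the center of its zone."""
    return min(neighbors, key=lambda name: math.dist(neighbors[name], key_point))

# Hypothetical neighbors of some node A in the unit square (d = 2)
neighbors_of_A = {"B": (0.75, 0.25), "C": (0.25, 0.75), "D": (0.75, 0.75)}
print(next_hop((0.9, 0.8), neighbors_of_A))    # -> "D", the zone closest to the key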

Symphony [Manku, et al]
• Similar to Chord: same mapping of nodes and keys
• 'k' long-distance links are constructed probabilistically!
  - [Figure: a link spanning ring distance x is chosen with probability P(x) = 1/(x ln n)]
• Expected routing guarantee: O((1/k) log^2 n) hops
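
A sketch (my own illustration, not from the slides) of drawing one Symphony-style long link: sample the link distance x from the harmonic density 1/(x ln n) on [1/n, 1] by inverse transform sampling, which gives x = n^(u-1) for u uniform in [0, 1]:

import random

def sample_link_distance(n: int) -> float:
    """Ring distance (as a fraction of the ring) to one long-distance neighbor."""
    u = random.random()
    return n ** (u - 1)                   # inverse CDF of the harmonic density

n = 1000                                  # hypothetical number of nodes
print([sample_link_distance(n) for _ in range(4)])   # k = 4 long links, mostly short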

SkipNet [Harvey, et al]
• Previous designs distribute data uniformly throughout the system
  - Good for load balancing
  - But my data can end up stored in Timbuktu!
  - Many organizations want stricter control over data placement
• What about the routing path?
  - Should a Microsoft → Microsoft end-to-end path pass through Sun?

SkipNet: Content and Path Locality
• Basic idea: probabilistic skip lists
  - [Figure: nodes arranged by height]
• Each node chooses a height at random
  - Height h is chosen with probability 1/2^h
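
A tiny sketch (not from the slides) of the height rule: flip a fair coin until it comes up tails, so height h occurs with probability 1/2^h:

import random

def choose_height() -> int:
    h = 1
    while random.random() < 0.5:          # heads: grow one level taller
        h += 1
    return h

print([choose_height() for _ in range(10)])    # mostly 1s, occasionally taller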

SkipNet: Content and Path Locality
• Nodes are lexicographically sorted by name
  - [Figure: machine1.cmu.edu and machine2.cmu.edu sit next to each other in the ordering, apart from machine1.berkeley.edu]
• Still an O(log n) routing guarantee!

Comparison of Different DHTs

                   # Links per node        Routing hops
  Pastry/Tapestry  O(2^b · log_{2^b} n)    O(log_{2^b} n)
  Chord            log n                   O(log n)
  CAN              d                       d·n^(1/d)
  SkipNet          O(log n)                O(log n)
  Symphony         k                       O((1/k) log^2 n)
  Koorde           d                       log_d n
  Viceroy          7                       O(log n)

What can DHTs do for us?
• Distributed object lookup
  - Based on object ID
• Decentralized file systems
  - CFS, PAST, Ivy
• Application-layer multicast
  - Scribe, Bayeux, SplitStream
• Databases
  - PIER

Where are we now?
• Many DHTs offering efficient and relatively robust routing
• Unanswered questions
  - Node heterogeneity
  - Network-efficient overlays vs. structured overlays: a conflict of interest!
  - What happens with a high user churn rate?
  - Security

Are DHTs a panacea?
• Useful primitive
• Tension between network-efficient construction and uniform key-value distribution
• Does every non-distributed application use only hash tables?
  - Many rich data structures cannot be built on top of hash tables alone
  - Exact-match lookups are not enough
• Does any P2P file-sharing system use a DHT?