Distributed Hash Tables
CPE 401 / 601 Computer Network Systems
Modified from Ashwin Bharambe, Robert Morris, Krishna Gummadi and Jinyang Li
Peer-to-peer systems
- Symmetric in node functionality: anybody can join and leave; everybody gives and takes
- Great potential: huge aggregate network capacity, storage, etc.; resilient to single points of failure and attack
- E.g., Napster, Gnutella, BitTorrent, Skype, Joost
The lookup problem
[Figure: a publisher does Put(key="title", value=file data...) and a client does Get(key="title") across 1000s of nodes on the Internet; the set of nodes may change]
- This lookup problem is at the heart of all DHTs
Centralized lookup (Napster)
[Figure: publisher registers SetLoc("title", N4) with a central DB; client sends Lookup("title") to the DB and fetches the file from N4]
- Simple, but O(N) state at the central server and a single point of failure
- O(N) state is hard to keep up to date
- Not scalable; the central service needs $$$
Improved #1: Hierarchical lookup
- Perform hierarchical lookup (like DNS)
- More scalable than a central site
- But the root of the hierarchy is still a single point of vulnerability
Flooded queries (Gnutella)
[Figure: client floods Lookup("title") to its neighbors, which forward it until it reaches the publisher holding key="title"]
- Robust, but worst case O(N) messages per lookup
- Too much overhead traffic; limited in scale
- No guarantee of results
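To see where the worst-case O(N) message cost comes from, here is a minimal sketch of TTL-limited query flooding; the toy overlay, the data structures, and the TTL value are illustrative assumptions, not Gnutella's actual protocol.

```python
# Minimal sketch of TTL-limited query flooding (Gnutella-style).
# The overlay graph and TTL value are illustrative assumptions.
from collections import deque

def flood_lookup(overlay, start, key, ttl=4):
    """Breadth-first flood of a lookup; returns nodes that hold `key`
    and the total number of query messages sent."""
    hits, messages = [], 0
    seen = {start}
    frontier = deque([(start, ttl)])
    while frontier:
        node, hops_left = frontier.popleft()
        if key in overlay[node]["store"]:
            hits.append(node)
        if hops_left == 0:
            continue
        for neighbor in overlay[node]["neighbors"]:
            if neighbor not in seen:          # avoid re-flooding the same node
                seen.add(neighbor)
                messages += 1                 # one query message per edge crossed
                frontier.append((neighbor, hops_left - 1))
    return hits, messages

# Tiny example overlay: in the worst case every node is contacted -> O(N) messages.
overlay = {
    "N3": {"neighbors": ["N6", "N7"], "store": set()},
    "N6": {"neighbors": ["N3", "N8"], "store": set()},
    "N7": {"neighbors": ["N3", "N9"], "store": set()},
    "N8": {"neighbors": ["N6"], "store": {"title"}},   # publisher
    "N9": {"neighbors": ["N7"], "store": set()},
}
print(flood_lookup(overlay, "N3", "title"))   # (['N8'], 4)
```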
Improved #2: super peers
- E.g., Kazaa, Skype
- Lookups flood only the super peers
- Still no guarantee of lookup results
DHT approach: routing
- Structured: put(key, value); get(key); maintain routing state for fast lookup
- Symmetric: all nodes have identical roles
- Many protocols: Chord, Kademlia, Pastry, Tapestry, CAN, Viceroy
Routed queries (Freenet, Chord, etc.)
[Figure: the client's Lookup("title") is routed hop by hop to the publisher holding key="title"]
- The key acts as a consistent rendezvous point between publisher and client
- Challenge: can we make it robust? Keep state small? Actually find stuff in a changing system?
Hash Table
- Name-value pairs (or key-value pairs)
  - E.g., "Mehmet Hadi Gunes" and mgunes@unr.edu
  - E.g., "http://cse.unr.edu/" and the Web page
  - E.g., "HitSong.mp3" and "12.78.183.2"
- Hash table: data structure that associates keys with values; lookup(key) returns the value
Distributed Hash Table
- Hash table spread over many nodes, distributed over a wide area
- Main design goals:
  - Decentralization: no central coordinator
  - Scalability: efficient even with a large # of nodes
  - Fault tolerance: tolerate nodes joining/leaving
Distributed hash table (DHT)
[Figure: layered view — a distributed application calls put(key, data) / get(key) on the DHT; the DHT calls lookup(key) on the lookup service, which returns the IP address of the responsible node]
- The application may be distributed over many nodes
- The DHT distributes data storage over many nodes
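As a sketch of this layering (the class and method names are hypothetical, chosen only for illustration), an application-level put/get can be built directly on top of a lookup(key) -> node service:

```python
# Sketch of the DHT layering: application-level put/get on top of lookup(key) -> node.
# All names here (LookupService, DHT, the node dicts) are illustrative, not a real API.

class LookupService:
    """Maps a key to the IP address of the node responsible for it."""
    def __init__(self, addresses):
        self.addresses = sorted(addresses)

    def lookup(self, key):
        # Toy placement rule: hash the key onto the sorted address list.
        return self.addresses[hash(key) % len(self.addresses)]

class DHT:
    """Distributes storage over many nodes; the application just calls put/get."""
    def __init__(self, addresses):
        self.lookup_service = LookupService(addresses)
        self.stores = {addr: {} for addr in addresses}   # stand-ins for remote peers

    def put(self, key, data):
        node = self.lookup_service.lookup(key)           # who is responsible?
        self.stores[node][key] = data                    # store it there

    def get(self, key):
        node = self.lookup_service.lookup(key)
        return self.stores[node].get(key)

# Usage: three "nodes" identified by IP address.
dht = DHT(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
dht.put("HitSong.mp3", "12.78.183.2")
print(dht.get("HitSong.mp3"))                            # 12.78.183.2
```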
DHT: basic idea
[Figure: many nodes, each storing (key, value) pairs; neighboring nodes are "connected" at the application level]
- Operation: take a key as input; route messages to the node holding that key
- insert(K1, V1) is routed to the responsible node, which stores (K1, V1)
- retrieve(K1) is routed the same way and returns V1
Distributed Hash Table
- Two key design decisions:
  - How do we map names onto nodes?
  - How do we route a request to that node?
Hash Functions
- Hashing: transform the key into a number, and use the number to index an array
- Example hash function: Hash(x) = x mod 101, mapping to 0, 1, ..., 100
- Challenges:
  - What if there are more than 101 nodes? Fewer?
  - Which nodes correspond to each hash value?
  - What if nodes come and go over time?
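To make the last challenge concrete, a small experiment (the key range and node counts are assumed for illustration) shows that with Hash(x) = x mod N, growing the system from 101 to 102 nodes remaps almost every key:

```python
# Why plain "mod N" hashing breaks when nodes come and go:
# changing N from 101 to 102 remaps almost every key.
keys = range(10_000)

before = {k: k % 101 for k in keys}   # Hash(x) = x mod 101
after  = {k: k % 102 for k in keys}   # one node added

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.1%} of keys changed nodes")   # ~99% move
```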
Consistent Hashing
- "View" = subset of hash buckets that are visible
  - For this conversation, a view is O(n) neighbors
  - But we don't need strong consistency on views
- Desired features:
  - Balanced: in any one view, load is equal across buckets
  - Smoothness: little impact on hash bucket contents when buckets are added/removed
  - Spread: regardless of views, only a small set of hash buckets may hold an object
  - Load: across views, the # of objects assigned to a hash bucket is small
Fundamental Design Idea I: Consistent Hashing
- Map keys and nodes to an identifier space; responsibility is assigned implicitly
[Figure: identifier space 0000000000 ... 1111111111 with nodes A, B, C, D and a key placed on it]
- Mapping is performed using hash functions (e.g., SHA-1)
- Spread nodes and keys uniformly throughout the space
Fundamental Design Idea II: Prefix / Hypercube Routing
[Figure: a message routed hop by hop from a source node to a destination node]
Definition of a DHT
- Hash table supports two operations: insert(key, value) and value = lookup(key)
- Distributed: map hash buckets to nodes
- Requirements:
  - Uniform distribution of buckets
  - Cost of insert and lookup should scale well
  - Amount of local state (routing table size) should scale well
DHT ideas
- Scalable routing needs a notion of "closeness"
- Ring structure [Chord]: distance measured clockwise around the identifier ring
Different notions of "closeness"
- Tree-like structure [Pastry, Kademlia, Tapestry]
[Figure: identifiers such as 01100100, 01100101, 01100110, 01100111 grouped under prefixes 011000**, 011001**, 011010**, 011011**]
- Distance measured as the length of the longest matching prefix with the lookup key
- Why different choices? What are the criteria for a good notion of closeness?
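A minimal sketch of this tree-like notion of closeness, using longest-common-prefix length as the distance and a greedy forwarding step; the bit strings and the forwarding rule are simplified assumptions, not any particular protocol's exact algorithm:

```python
# Tree-like closeness (Pastry/Kademlia/Tapestry-style): "closer" = longer shared prefix.
def prefix_len(a, b):
    """Length of the longest common prefix of two ID strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def next_hop(my_id, neighbors, key):
    """Greedy step: forward to a neighbor sharing a strictly longer
    prefix with the key than we do (illustrative rule only)."""
    best = max(neighbors, key=lambda n: prefix_len(n, key))
    return best if prefix_len(best, key) > prefix_len(my_id, key) else my_id

neighbors = ["10000000", "00100000", "01000000", "01101000", "01100111"]
print(next_hop("01100101", neighbors, "01100110"))   # '01100111' shares 7 bits
```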
Different notions of "closeness"
- Cartesian space [CAN]
[Figure: the unit square from (0,0) to (1,1) partitioned into rectangular zones, one per node]
- Distance measured as geometric distance
Chord
- Map nodes and keys to identifiers using randomizing hash functions
- Arrange them on an identifier circle
[Figure: identifier circle with a point x = 010110110, its predecessor pred(x) = 010110000, and its successor succ(x) = 010111110]
Consistent Hashing
- Construction:
  - Assign each of C hash buckets to random points on a mod 2^n circle; hash key size = n
  - Map each object to a random position on the circle
  - Hash of object = closest clockwise bucket
[Figure: circle with buckets at positions 4, 8, 12, 14]
- Desired features:
  - Balanced: no bucket is responsible for a large number of objects
  - Smoothness: adding a bucket does not cause major movement among existing buckets
  - Spread and load: only a small set of buckets lie near an object
Consistent Hashing
- Large, sparse identifier space (e.g., 128 bits)
  - Hash a set of keys x uniformly into the large id space
  - Hash nodes into the id space as well
- Id space represented as a ring of identifiers 0 ... 2^128 - 1
- Hash(name) -> object_id; Hash(IP_address) -> node_id
Where to Store a (Key, Value) Pair?
- Map keys in a load-balanced way: store the key at one or more nodes whose identifiers are "close" to the key, where distance is measured in the id space
- Advantages: even distribution; few changes as nodes come and go...
Hash a Key to Successor
- Successor: the node with the next-highest ID
[Figure: circular ID space where N10 holds K5, K10; N32 holds K11, K30; N60 holds K33, K40, K52; N80 holds K65, K70; N100 holds K100]
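A minimal consistent-hashing sketch along the lines of these slides: node addresses and key names are hashed with SHA-1 onto a 2^160 ring, and each key is assigned to its clockwise successor. The helper names are assumptions made for illustration.

```python
# Consistent hashing sketch: SHA-1 ids on a ring, key -> clockwise successor.
import hashlib
from bisect import bisect_right

RING = 2 ** 160                          # SHA-1 identifier space

def chash(s):
    """Hash a name or IP address to a point on the ring."""
    return int(hashlib.sha1(s.encode()).hexdigest(), 16) % RING

def successor(node_ids, key_id):
    """Node with the next-highest id (wrapping around the circle)."""
    ids = sorted(node_ids)
    i = bisect_right(ids, key_id)
    return ids[i % len(ids)]             # wrap to the smallest id if needed

nodes = {chash(ip): ip for ip in ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]}
for name in ["HitSong.mp3", "http://cse.unr.edu/", "Mehmet Hadi Gunes"]:
    owner = nodes[successor(nodes.keys(), chash(name))]
    print(f"{name!r} is stored at {owner}")
```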
Joins and Leaves of Nodes
- Maintain a circularly linked list around the ring
- Every node has a predecessor and a successor
How to Find the Nearest Node?
- Need to find the closest node
  - To determine who should store a (key, value) pair
  - To direct a future lookup(key) query to that node
- Strawman solution: walk through the circular linked list of nodes in the ring; O(n) lookup time with n nodes
- Alternative solution: jump further around the ring using a "finger" table of additional overlay links
Links in the Overlay Topology
- Trade-off between # of hops and # of neighbors
- E.g., log(n) for both, where n is the number of nodes: overlay links 1/2, 1/4, 1/8, ... of the way around the ring
- Each hop traverses at least half of the remaining distance
Chord "Finger Table" Accelerates Lookups
[Figure: node N80 keeps fingers 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, 1/128 of the way around the ring]
- O(log N) fingers means we don't have to track all nodes, and finger info is easy to maintain
Chord lookups take O(log N) hops
[Figure: Lookup(K19) starting at N80 follows fingers around the ring toward the key]
- Note: each finger points to the first node at or after the finger's position
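The following is a simplified, single-process sketch of Chord-style finger tables and greedy lookup (small integer IDs, no joins, stabilization, or RPC); it is meant to illustrate the O(log N) routing idea, not to reproduce the full protocol.

```python
# Simplified Chord sketch: finger tables and recursive lookup on a small ring.
# Omits joins, stabilization, and real RPC; ids are small ints for readability.
from bisect import bisect_left

M = 7                                   # id space of size 2^7 = 128
node_ids = sorted([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])

def successor(ident):
    """First node at or after `ident` on the ring (mod 2^M)."""
    i = bisect_left(node_ids, ident % (2 ** M))
    return node_ids[i % len(node_ids)]

def finger_table(n):
    """Finger i points at successor(n + 2^i): links 1/128, ..., 1/4, 1/2 of the ring."""
    return [successor(n + 2 ** i) for i in range(M)]

def in_interval(x, a, b):
    """True if x lies in the half-open ring interval (a, b]."""
    return (a < x <= b) if a < b else (x > a or x <= b)

def lookup(start, key, hops=0):
    """Route greedily via fingers; returns (node storing key, hop count)."""
    if in_interval(key, start, successor(start + 1)):
        return successor(start + 1), hops
    fingers = finger_table(start)
    # Closest preceding finger: the farthest finger that does not overshoot the key.
    nxt = next((f for f in reversed(fingers) if in_interval(f, start, key)), fingers[0])
    return lookup(nxt, key, hops + 1)

print(lookup(32, 19))    # key 19 is stored at node 21, reached in a couple of hops
```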
Successor Lists Ensure Robust Lookup
[Figure: each node on the ring stores its next three successors, e.g., N5 stores 10, 20, 32 and N110 stores 5, 10, 20]
- Each node remembers r successors
- Lookup can skip over dead nodes to find blocks
- All r successors have to fail before there is a problem; the list ensures we find the actual current successor
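A tiny sketch of the failover rule (the node names and liveness set are illustrative, and liveness would really be detected via timeouts): a node falls back to the first live entry in its successor list.

```python
# Sketch: with r successors cached, a lookup can skip over dead nodes.
successor_list = {              # each node remembers its next r = 3 successors
    "N5":  ["N10", "N20", "N32"],
    "N10": ["N20", "N32", "N40"],
    "N20": ["N32", "N40", "N60"],
}
alive = {"N5", "N20", "N32", "N40", "N60"}     # N10 has failed

def next_live_successor(node):
    """First entry in the successor list that still responds."""
    for succ in successor_list[node]:
        if succ in alive:
            return succ
    raise RuntimeError("all r successors failed")  # ring connectivity lost

print(next_live_successor("N5"))    # N20: the lookup skips the dead N10
```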
Nodes Coming and Going
- Small changes when nodes come and go
- Only affects the keys mapped to the node that comes or goes
Chord Self-organization
- Node join: set up log(n) fingers, at O(log^2 n) cost
- Node leave: maintain the successor list for ring connectivity; update successor lists and finger pointers
Chord lookup algorithm properties
- Interface: lookup(key) -> IP address
- Efficient: O(log N) messages per lookup, where N is the total number of servers
- Scalable: O(log N) state per node
- Robust: survives massive failures
- Simple to analyze
Plaxton Trees
- Motivation: access nearby copies of replicated objects
- Time-space trade-off: space = routing table size, time = access hops
Plaxton Trees - Algorithm
1. Assign labels to objects and nodes using randomizing hash functions
[Figure: an object labeled 9AE4 and a node labeled 247B]
Plaxton Trees - Algorithm
2. Each node knows about other nodes with varying prefix matches
[Figure: node 247B keeps neighbors with prefix matches of length 0 (e.g., 1..., 3...), length 1 (e.g., 23.., 25..), length 2 (e.g., 246., 248.), and length 3 (e.g., 247A, 247C)]
Plaxton Trees - Object Insertion and Lookup
- Given an object, route successively towards nodes with greater prefix matches
[Figure: from node 247B toward object 9AE4 via nodes 9F1., 9A76, 9AE2]
- Store the object at each of these locations
- log(n) steps to insert or locate an object
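A small sketch of Plaxton-style insertion under simplifying assumptions (hand-picked labels, and one prefix level improved per hop rather than a real per-digit routing table):

```python
# Sketch of Plaxton-style insertion: route toward nodes that match the object's
# label in ever-longer prefixes, leaving a pointer at each hop.
# The node set and labels are illustrative; real systems resolve one digit per hop.
nodes = ["247B", "9F10", "9A76", "9AE2", "9AE4"]

def match(a, b):
    """Length of the longest common prefix of two hex labels."""
    return next((i for i, (x, y) in enumerate(zip(a, b)) if x != y), min(len(a), len(b)))

def insert_path(start, obj):
    """Greedily hop to a node with a strictly longer prefix match until stuck."""
    path, current = [start], start
    while True:
        better = [n for n in nodes if match(n, obj) > match(current, obj)]
        if not better:
            return path                      # current node is the object's "root"
        current = min(better, key=lambda n: match(n, obj))   # improve one level per hop
        path.append(current)

# Pointers to object 9AE4 are stored at every node on this path.
print(insert_path("247B", "9AE4"))           # ['247B', '9F10', '9A76', '9AE2', '9AE4']
```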
Plaxton Trees - Why is it a tree?
[Figure: routing paths for the same object from different nodes (e.g., 247B, 9F1., 9A76, 9AE2) converge, forming a tree rooted at the object's root node]
Pastry
- Based directly upon Plaxton Trees; exports a DHT interface
- Stores an object only at the node whose ID is closest to the object ID
- In addition to the main routing table, maintains a leaf set of nodes: the closest L nodes (in ID space), typically L = 2^(b+1)
State and Neighbor Assignment in Pastry DHT
[Figure: binary tree over IDs 000 ... 111; nodes are the leaves]
- Nodes are leaves in a tree
- log N neighbors, in sub-trees of varying heights
Routing in Pastry DHT
[Figure: routing from node 010 to node 111 across the ID tree]
- At each step, route to the sub-tree containing the destination
CAN
- Map nodes and keys to coordinates in a multi-dimensional Cartesian space; each node owns a zone
[Figure: a source routes toward the zone containing the key]
- Routing follows the shortest Euclidean path
- For d dimensions, routing takes O(d n^(1/d)) hops
State Assignment in CAN
- The key space is a virtual d-dimensional Cartesian space
[Figure: as nodes 1, 2, 3, 4 join, the space is repeatedly split so that each node owns one zone]
CAN Topology and Route Selection
[Figure: source S routes toward the destination point (a,b)]
- Route by forwarding to the neighbor "closest" to the destination
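A sketch of this greedy rule on a 2-D key space; the zones, their centers, and the neighbor lists below are hard-coded assumptions purely for illustration.

```python
# Sketch of CAN-style greedy routing: forward to whichever neighbor's zone
# center is geometrically closest to the destination point.
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

zones = {                      # node -> center of its zone in the unit square
    "A": (0.25, 0.75), "B": (0.75, 0.75),
    "C": (0.25, 0.25), "D": (0.625, 0.375), "E": (0.625, 0.125),
}
neighbors = {
    "A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D", "E"],
    "D": ["B", "C", "E"], "E": ["C", "D"],
}

def route(start, dest):
    """Greedy hops until no neighbor is closer to `dest` than the current node."""
    path, current = [start], start
    while True:
        best = min(neighbors[current], key=lambda n: dist(zones[n], dest))
        if dist(zones[best], dest) >= dist(zones[current], dest):
            return path                     # current zone is closest to dest
        current = best
        path.append(current)

print(route("A", (0.7, 0.1)))               # ['A', 'C', 'E']
```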
Symphony
- Similar to Chord: same mapping of nodes and keys onto a ring
- k long-distance links are constructed probabilistically: a link spanning distance x is chosen with probability P(x) = 1/(x ln n)
- Expected routing guarantee: O((1/k) log^2 n) hops
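As a sketch of how such links could be drawn, inverse-transform sampling of the harmonic pdf p(x) = 1/(x ln n) on [1/n, 1] gives x = n^(u-1) for u uniform in [0, 1]; the network size and link budget below are assumed values.

```python
# Sketch of Symphony's probabilistic long links: each node draws k link
# distances x from the harmonic pdf p(x) = 1/(x ln n) on [1/n, 1].
# Inverse-transform sampling: x = n^(u - 1) for u uniform in [0, 1].
import random

def draw_link_distances(n, k):
    """Return k ring distances (as fractions of the ring) for long-range links."""
    return [n ** (random.random() - 1) for _ in range(k)]

n, k = 2 ** 16, 4                      # assumed network size and link budget
random.seed(1)
for x in draw_link_distances(n, k):
    print(f"link spans {x:.6f} of the ring")   # mostly short, occasionally long
```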
SkipNet
- Previous designs distribute data uniformly throughout the system
  - Good for load balancing, but my data could end up stored in Timbuktu!
  - Many organizations want stricter control over data placement
- What about the routing path? Should a Microsoft-to-Microsoft end-to-end path pass through Sun?
SkipNet Content and Path Locality
- Basic idea: probabilistic skip lists
[Figure: nodes arranged by height]
- Each node chooses a height at random: height h is chosen with probability 1/2^h
SkipNet Content and Path Locality
[Figure: skip-list levels over nodes machine1.berkeley.edu, machine1.unr.edu, machine2.unr.edu]
- Nodes are lexicographically sorted by name, which gives content and path locality
- Still an O(log n) routing guarantee!
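A small sketch of the height rule (node names reused from the figure; real SkipNet derives levels from random membership vectors):

```python
# Sketch of the probabilistic skip-list idea behind SkipNet: each node picks a
# height h with probability 1/2^h (flip a fair coin until it comes up tails).
import random

def pick_height(max_height=32):
    """Return h >= 1, where P(height = h) = 1/2^h."""
    h = 1
    while h < max_height and random.random() < 0.5:
        h += 1
    return h

random.seed(7)
nodes = sorted(["machine1.berkeley.edu", "machine1.unr.edu", "machine2.unr.edu"])
heights = {name: pick_height() for name in nodes}   # lexicographic order preserved
print(heights)      # about half the nodes reach each successive level
```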
What can DHTs do for us?
- Distributed object lookup based on object ID
- Decentralized file systems: CFS, PAST, Ivy
- Application-layer multicast: Scribe, Bayeux, SplitStream
- Databases: PIER
Where are we now?
- Many DHTs offer efficient and relatively robust routing
- Unanswered questions:
  - Node heterogeneity
  - Network-efficient overlays vs. structured overlays: a conflict of interest!
  - What happens with a high user churn rate?
  - Security
Are DHTs a panacea?
- A useful primitive, but there is tension between network-efficient construction and uniform key-value distribution
- Does every non-distributed application use only hash tables? Many rich data structures cannot be built on top of hash tables alone; exact-match lookups are not enough
- Does any P2P file-sharing system use a DHT?