1
Lecture 2 Distributed Hash Table
2
Information search in P2P
Suppose we have a P2P system with N nodes, and a file “F” is stored on one node. How can an arbitrary node find “F” in the system?
3
P2P: centralized index (the original “Napster” design)
1) When a peer connects, it informs the central directory server of its IP address and the content it holds.
2) Alice queries the directory server for “Hey Jude”.
3) Alice requests the file directly from Bob.
4
P2P: problems with centralized directory
- Single point of failure
- Performance bottleneck
- Copyright infringement: the “target” of a lawsuit is obvious
File transfer is decentralized, but locating content is highly centralized. Google shows that a single server farm can be reliable and scalable, so the main issue is legal.
5
Query flooding: fully distributed overlay network
- The overlay network is a graph: there is an edge between peers X and Y if they maintain a TCP connection.
- All active peers and edges form the overlay network; an edge is a virtual (not physical) link.
- A given peer is typically connected to fewer than 10 overlay neighbors.
- Fully distributed: no central server. Used by Gnutella.
- Each peer indexes only the files it makes available for sharing (and no other files).
- Gnutella is a public-domain file-sharing application; the protocol spec is minimal, leaving significant flexibility in how a client is implemented.
6
Query flooding: queries and file transfer
- Query messages are sent over the existing TCP connections, and peers forward Query messages to their neighbors.
- A peer-count (TTL) field in the Query message is set to a specific limit (say, 7), so the flooding has limited scope; this is what keeps the scheme scalable.
- The QueryHit message follows the reverse path of the Query, using the preexisting TCP connections; each peer needs the Query message ID and some state information to do so.
- The requester later sets up a direct TCP connection for the download using an HTTP GET. This transfer happens outside the overlay network and may run into trouble if the serving peer is behind a firewall.
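A minimal sketch of limited-scope flooding, assuming each peer remembers the query IDs it has already handled and the peer-count field acts as a TTL. The Peer class, peer names, and file names are illustrative, not the Gnutella wire format.

```python
class Peer:
    def __init__(self, name, files=()):
        self.name = name
        self.files = set(files)
        self.neighbors = []   # overlay edges (existing TCP connections)
        self.seen = set()     # query IDs already handled, to avoid re-flooding

    def query(self, qid, filename, ttl):
        """Handle a Query; return the names of peers that hold the file."""
        if qid in self.seen or ttl == 0:
            return []
        self.seen.add(qid)
        hits = [self.name] if filename in self.files else []
        for nbr in self.neighbors:
            # forward with the peer-count (TTL) field decremented;
            # QueryHits come back along the reverse path (modeled by the return)
            hits += nbr.query(qid, filename, ttl - 1)
        return hits

alice, bob, carol = Peer("Alice"), Peer("Bob", {"song.mp3"}), Peer("Carol")
alice.neighbors, carol.neighbors = [carol], [bob]
print(alice.query(qid=1, filename="song.mp3", ttl=7))   # ['Bob']
```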
7
Gnutella: Peer joining
1. The joining peer Alice must find another peer already in the Gnutella network. She uses a list of candidate peers: a peer maintains a list of IP addresses of peers that are often up (these sites are special), or alternatively Alice can contact a Gnutella site that maintains such a list (also a special site, although not referred to as a server).
2. Alice sequentially attempts TCP connections with candidate peers until a connection is set up with some peer Bob.
3. Flooding: Alice sends a Ping message to Bob; Bob forwards the Ping to his overlay neighbors, who forward it to their neighbors, and so on. The Ping message also carries a peer-count (TTL) field.
4. Peers receiving the Ping message respond to Alice with a Pong message.
5. Alice receives many Pong messages and can then set up additional TCP connections.
Peer leaving: (1) graceful leaving, or (2) abrupt leaving; either way, the left-behind neighbors need to find new connections.
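A sketch of the bootstrap step only: try candidate peers in order until one TCP connection succeeds. The function name and addresses are made up, and the default port is just the one commonly associated with Gnutella.

```python
import socket

def connect_to_network(candidates, port=6346, timeout=2.0):
    """Try candidate peers in order until one TCP connection succeeds."""
    for host in candidates:
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError:
            continue          # peer is down or unreachable; try the next one
    raise RuntimeError("no candidate peer reachable")

# Usage (addresses are placeholders):
# sock = connect_to_network(["198.51.100.7", "203.0.113.42"])
```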
8
Hierarchical Overlay
A hybrid of the centralized-index and query-flooding approaches:
- Each peer is either a super node (group leader) or assigned to a super node.
- There is a TCP connection between a peer and its super node, and TCP connections between some pairs of super nodes.
- A super node tracks the content of its children; typically, a group leader has a few hundred child peers.
- Each group leader is an ordinary peer, but with higher bandwidth and connectivity than other peers.
- The overlay network among group leaders is similar to Gnutella's: a group leader can forward its children's queries to other group leaders (possibly also with limited-scope flooding).
9
Distributed Hash Table (DHT)
A DHT is a distributed P2P database. The database holds (key, value) pairs, for example:
- key: social security number; value: person's name
- key: content name; value: IP address
Peers query the database with a key, and the database returns the values that match the key. Peers can also insert (key, value) pairs into the database. Finding such “needles” requires that the P2P system be structured.
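A toy, single-process sketch of the (key, value) service described above, with pairs spread over a handful of in-memory "nodes" by hashing the key. The ToyDHT class, node count, and example values are illustrative, not any real DHT implementation.

```python
import hashlib

class ToyDHT:
    def __init__(self, n_nodes=4):
        self.nodes = [dict() for _ in range(n_nodes)]   # one store per "node"

    def _node_for(self, key):
        h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
        return self.nodes[h % len(self.nodes)]          # node responsible for key

    def put(self, key, value):
        self._node_for(key).setdefault(key, []).append(value)

    def get(self, key):
        return self._node_for(key).get(key, [])

dht = ToyDHT()
dht.put("Led Zeppelin IV", "10.0.0.7")   # key: content name, value: IP address
print(dht.get("Led Zeppelin IV"))        # ['10.0.0.7']
```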
10
The Principle Of Distributed Hash Tables
A dynamic distribution of a hash table onto a set of cooperating nodes (e.g., nodes A, B, C, D), storing pairs such as:
  Key  Value
   1   Frozen
   9   Tangled
  11   Mulan
  12   Lion King
  21   Cinderella
  22   Doraemon
- Basic service: the lookup operation, i.e., key resolution from any node. For example, lookup(9) issued at any node is routed to the node that stores key 9 (node D in the figure).
- Each node has a routing table with pointers to some other nodes, typically a constant or logarithmic number of pointers (why? to keep per-node state small while still allowing efficient lookups).
11
DHT Desirable Properties
1. Keys are mapped evenly to all nodes in the network.
2. Each node maintains information about only a few other nodes.
3. A key can be found efficiently by querying the system.
4. Node arrivals and departures affect only a few nodes.
12
Chord Identifiers
- Assign an integer identifier to each peer in the range [0, 2^n - 1]; each identifier can be represented by n bits.
- Require each key to be an integer in the same range.
- To get integer keys, hash the original key, e.g., key = h(“Led Zeppelin IV”). This is why the database is called a distributed “hash” table.
- A SHA-1 hash is 160 bits (there are longer hashes); an MD5 hash is 128 bits.
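A small sketch of deriving such identifiers, assuming a SHA-1 digest reduced to m bits; the function name and m = 4 (chosen to match the small examples on the following slides) are illustrative.

```python
import hashlib

def chord_id(name, m=4):
    """Hash an arbitrary key (or a peer's address) to an m-bit identifier."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** m)   # value in [0, 2^m - 1]

print(chord_id("Led Zeppelin IV"))   # some integer between 0 and 15
```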
13
Each key must be stored in a node
Central issue: assigning (key, value) pairs to peers.
Rule: assign each key to the peer whose ID is closest to the key. The convention in this lecture: “closest” means the immediate successor of the key (or the peer whose ID equals the key).
Example with 4-bit identifiers and peers 1, 3, 4, 5, 8, 10, 12, 14:
- key = 13: successor peer = 14
- key = 15: successor peer = 1 (wrapping around the circle)
In practice, the identifier length n is 128 bits or more.
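A sketch of the successor rule using the slide's own numbers (4-bit identifiers, peers 1, 3, 4, 5, 8, 10, 12, 14); the function name is illustrative.

```python
def successor_peer(key, peers):
    """Return the peer responsible for key: its immediate successor (or equal)."""
    for p in sorted(peers):
        if p >= key:
            return p
    return min(peers)          # no peer ID >= key: wrap around the circle

peers = [1, 3, 4, 5, 8, 10, 12, 14]
print(successor_peer(13, peers))   # 14
print(successor_peer(15, peers))   # 1  (wraps around)
```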
14
Chord [MIT]
- Consistent hashing (SHA-1) assigns each node and each object an m-bit ID.
- IDs are ordered on an ID circle ranging from 0 to 2^m - 1.
- New nodes assume slots in the ID circle according to their ID.
- Key k is assigned to the first node whose ID ≥ k, i.e., to successor(k).
15
Consistent Hashing - Successor Nodes
[Figure: an identifier circle with identifiers 0–7 and nodes 0, 1, and 3. Keys are mapped to their successor nodes: successor(1) = 1, successor(2) = 3, successor(6) = 0.]
16
Consistent Hashing – Join and Departure
When a node n joins the network, certain keys previously assigned to n’s successor now become assigned to n. When node n leaves the network, all of its assigned keys are reassigned to n’s successor.
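A sketch of which keys move when a node joins, assuming the rule above: the keys in (predecessor, new node] shift from the successor to the new node. The node IDs, keys, and m = 3 are illustrative.

```python
def keys_moved_on_join(new_node, predecessor, keys, m=3):
    """Keys that shift from successor(new_node) to new_node when it joins."""
    dist = lambda x: (x - predecessor) % (2 ** m)     # clockwise distance
    return [k for k in keys if 0 < dist(k) <= dist(new_node)]

# Node 6 joins a small ring between node 3 and node 0: key 6 moves to it.
print(keys_moved_on_join(6, 3, keys=[1, 2, 6]))       # [6]
```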
17
Consistent Hashing – Node Join
[Figure: identifier circle (0–7) showing a node joining; it takes over the keys for which it is now the successor.]
18
Consistent Hashing – Node Dep.
[Figure: identifier circle (0–7) showing a node departing; its keys are reassigned to its successor.]
19
Consistent Hashing: another example
For n = 6, the number of identifiers is 2^6 = 64. The following DHT ring has 10 nodes and stores 5 keys; for instance, the successor of key 10 is node 14.
20
Circular DHT (1)
[Figure: a ring of peers with IDs 1, 3, 4, 5, 8, 10, 12, 15.]
- Each peer is aware only of its immediate successor and predecessor.
- IP addresses are included in query and reply packets, so that each peer can send directly to another peer.
21
Circular DHT (2)
[Figure: the same ring with peer IDs 0001, 0011, 0100, 0101, 1000, 1010, 1100, 1111. The query “Who is responsible for key 1110?” is forwarded hop by hop around the ring until peer 1111 answers “I am”.]
- It takes O(N) messages on average to resolve a query when there are N peers.
- “Closest” is defined as the closest successor of the key.
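A sketch of this successor-only lookup, assuming each peer knows just its immediate successor and the query is passed clockwise until the key falls between a node and its successor. The peer IDs match the figure; the function and dictionary names are illustrative.

```python
def ring_lookup(start, key, successor_of):
    """Resolve key by passing the query clockwise, one successor at a time."""
    node, hops = start, 0
    while True:
        succ = successor_of[node]
        # succ is responsible if key lies in the ring interval (node, succ]
        if (node < key <= succ) or (succ < node and (key > node or key <= succ)):
            return succ, hops + 1
        node, hops = succ, hops + 1

ring = sorted([0b0001, 0b0011, 0b0100, 0b0101, 0b1000, 0b1010, 0b1100, 0b1111])
successor_of = {n: ring[(i + 1) % len(ring)] for i, n in enumerate(ring)}
print(ring_lookup(0b0001, 0b1110, successor_of))      # peer 0b1111 is responsible
```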
22
Circular DHT with Shortcuts
[Figure: the same ring; peer 0001 resolves “Who is responsible for key 1110?” in fewer hops by following shortcut links. Note: the hop count given in the Kurose and Ross slide and textbook is wrong.]
- Each peer keeps track of the IP addresses of its predecessor, its successor, and its shortcuts.
- In the example, the query is reduced from 6 messages to 3.
- Shortcuts can be designed so that each peer has O(log N) neighbors and each query takes O(log N) messages.
23
Scalable Key Location – Finger Tables
[Figure: identifier circle with m = 3 (identifiers 0–7), nodes 0, 1, 3, and keys 1, 2, 6. Entry i of node n's finger table has start = (n + 2^i) mod 2^m and points to successor(start).]
Finger table of node 0 (stores key 6): start 1 → node 1, start 2 → node 3, start 4 → node 0.
Finger table of node 1 (stores key 1): start 2 → node 3, start 3 → node 3, start 5 → node 0.
Finger table of node 3 (stores key 2): start 4 → node 0, start 5 → node 0, start 7 → node 0.
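A sketch of building these finger tables, assuming entry i of node n points to successor((n + 2^i) mod 2^m); the three-node example (nodes 0, 1, 3 with m = 3) matches the slide.

```python
def successor(key, peers):
    """First peer with ID >= key, wrapping around the circle."""
    return next((p for p in sorted(peers) if p >= key), min(peers))

def finger_table(n, peers, m=3):
    table = []
    for i in range(m):
        start = (n + 2 ** i) % 2 ** m
        table.append((start, successor(start, peers)))
    return table

peers = [0, 1, 3]
print(finger_table(0, peers))   # [(1, 1), (2, 3), (4, 0)]
print(finger_table(1, peers))   # [(2, 3), (3, 3), (5, 0)]
print(finger_table(3, peers))   # [(4, 0), (5, 0), (7, 0)]
```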
24
Chord key location
To resolve a key, a node looks in its finger table for the furthest node that precedes the key and forwards the query there; each hop roughly halves the remaining distance on the ring, so a lookup takes O(log N) hops.
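A sketch of this lookup rule on the same three-node example, assuming the standard Chord intervals (closest preceding finger lies in (n, key); the answer is found once the key falls in (n, successor(n)]). The function names, the module-level tables, and the recursive, single-process form are simplifications.

```python
M = 3
NODES = sorted([0, 1, 3])

def successor(key):
    """First node with ID >= key (mod 2^M), wrapping around the circle."""
    key %= 2 ** M
    return next((p for p in NODES if p >= key), NODES[0])

def in_interval(x, a, b):
    """True if x lies in the half-open ring interval (a, b]."""
    return x != a and (x - a) % (2 ** M) <= (b - a) % (2 ** M)

# finger i of node n points at successor((n + 2^i) mod 2^M)
FINGERS = {n: [successor(n + 2 ** i) for i in range(M)] for n in NODES}

def closest_preceding_node(n, key):
    for f in reversed(FINGERS[n]):               # furthest finger first
        if f != key and in_interval(f, n, key):  # f strictly precedes the key
            return f
    return successor(n + 1)                      # fall back to the plain successor

def find_successor(n, key):
    if in_interval(key, n, successor(n + 1)):    # key in (n, successor(n)]
        return successor(n + 1)
    return find_successor(closest_preceding_node(n, key), key)

print(find_successor(1, 6))   # 0  (node 1 forwards to node 3, which answers 0)
```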
25
Peer Churn
- To handle peer churn, require each peer to know the IP addresses of its two successors.
- Each peer periodically pings its two successors to see if they are still alive.
- When a successor fails, other peers must also update their state; in the slide's example, peer 3 also has to update its second-successor information.
- This is a limited solution that handles a single join or a single failure; churn is highly disruptive to structured overlay networks.

Notes from Jay Liu about DHT churn: LimeWire's DHT (Mojito, based on Kademlia) distinguishes three types of nodes:
1. Passive leaf: does not participate in DHT routing, but can search the DHT.
2. Passive: does not participate in DHT routing (supernodes).
3. Active: full participation in the DHT.
A node goes into class (1) if it either cannot provide DHT services or does not yet qualify for (3). Class (2) is a special case: supernodes are excluded from the DHT so as not to burden them further. A peer is only in the DHT (class 3) if it exhibits “good uptime”; per the default configuration values in the source code, the average-uptime threshold is 2L*60L*60L*1000L ms (i.e., 2 hours) and the current uptime must also exceed 2 hours.
26
Consistent Hashing – Node Join
[Figure: identifier circle (0–7) showing a node joining; it takes over the keys for which it is now the successor.]
27
Consistent Hashing – Node Dep.
[Figure: identifier circle (0–7) showing a node departing; its keys are reassigned to its successor.]
28
Node Joins and Stabilizations
The most important piece of state is the successor pointer: if the successor pointer is kept up to date, that is sufficient to guarantee correctness of lookups, and the finger tables can then always be verified and repaired. Each node runs a “stabilization” protocol periodically in the background to update its successor pointer and finger table.
29
Node Joins and Stabilizations
The “stabilization” protocol consists of six functions: create(), join(), stabilize(), notify(), fix_fingers(), and check_predecessor(). When node n first starts, it calls n.join(n’), where n’ is any known Chord node; the join() function asks n’ to find the immediate successor of n.
30
Node Joins – stabilize()
Each time node n runs stabilize(), it asks its successor for that successor's predecessor p and decides whether p should be n's successor instead. stabilize() also notifies node n's successor of n's existence, giving the successor the chance to change its predecessor to n; the successor does this only if it knows of no closer predecessor than n.
31
Node Joins – Join and Stabilization
Walkthrough: node n joins between np (its predecessor-to-be) and ns (its successor-to-be).
1. n joins: its predecessor is nil; n acquires ns as its successor via some node n’.
2. n runs stabilize(): n notifies ns that it may be ns's new predecessor, and ns acquires n as its predecessor.
3. np runs stabilize(): np asks ns for its predecessor (now n), acquires n as its successor, and notifies n; n then acquires np as its predecessor.
4. All predecessor and successor pointers are now correct. Fingers still need to be fixed, but the old fingers will still work.
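A Python-flavoured sketch of the join-and-stabilize sequence above, loosely following the Chord paper's stabilize()/notify() pseudocode. The Node class, m = 4, the node IDs 1, 6, 8, and direct attribute access in place of RPCs are all simplifications.

```python
M = 4

def between(x, a, b):
    """True if x lies strictly inside the ring interval (a, b), identifiers mod 2^M."""
    return 0 < (x - a) % (2 ** M) < (b - a) % (2 ** M)

class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = self              # alone in the ring to start with
        self.predecessor = None

    def stabilize(self):
        """Ask the successor for its predecessor; adopt it if it sits between us."""
        x = self.successor.predecessor
        if x is not None and between(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)        # tell the successor we exist

    def notify(self, candidate):
        """candidate believes it may be our predecessor."""
        if self.predecessor is None or between(candidate.id, self.predecessor.id, self.id):
            self.predecessor = candidate

# node 6 (n) joins a two-node ring {1, 8}: np = 1, ns = 8
n1, n8, n6 = Node(1), Node(8), Node(6)
n1.successor, n8.successor = n8, n1
n1.predecessor, n8.predecessor = n8, n1
n6.successor = n8                          # join(): n6 only learns its successor
for _ in range(3):                         # a few periodic stabilization rounds
    for node in (n1, n6, n8):
        node.stabilize()
print(n1.successor.id, n6.successor.id, n8.predecessor.id)   # 6 8 6
```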
32
Node Failures
- The key step in failure recovery is maintaining correct successor pointers.
- To help achieve this, each node maintains a successor list of its r nearest successors on the ring.
- Successor lists are stabilized as follows: node n reconciles its list with its successor s by copying s's successor list, removing its last entry, and prepending s to it.
- If node n notices that its successor has failed, it replaces it with the first live entry in its successor list and reconciles its successor list with that new successor.
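A sketch of the reconciliation rule just stated: copy the successor's list, drop its last entry, and prepend the successor. The value r = 3 and the example IDs are illustrative.

```python
def reconcile(successor_id, successors_list_of_successor, r=3):
    """New successor list: the successor itself, then its list minus the last entry."""
    return ([successor_id] + successors_list_of_successor[:-1])[:r]

print(reconcile(14, [1, 3, 4]))   # [14, 1, 3]
```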
33
Handling failures: redundancy
Each node knows the IP addresses of the next r nodes on the ring, and each key is replicated at those next r nodes.
34
Evaluation results (10,000-node network)
35
Load distribution (probability density function)
36
Failure rate
37
Path length
38
Failed lookups vs churn rate
Start with 500 nodes
39
Chord main problem: no good solution for handling churn
Under churn, Chord merely achieves “correctness”. A Chord ring is defined to be correct when each node maintains its predecessor and successor pointers, which allows a query to eventually arrive at the key's location, but it may take up to O(N) hops to find the key, not the O(log N) the original design claimed.
40
Chord main problem
There is no good way to maintain finger tables that are both scalable and consistent under churn, so Chord is not practical for P2P systems that are highly dynamic. For a paper on achieving high consistency, see: Simon S. Lam and Huaiyu Liu, “Failure Recovery for Structured P2P Networks: Protocol Design and Performance Evaluation,” Proceedings of ACM SIGMETRICS 2004.
41
Chord problem 2: only good for exact search
Chord cannot support range searches or approximate searches.
42
BitTorrent's solution
- The trackers (servers), which are more reliable than ordinary peers, are organized as a DHT.
- Users query the trackers to get the locations of a file.
- The file sharing itself is not structured.
43
DHT in a cloud: architecture
- Servers are hosted in a cloud, and data are distributed among the servers.
- The user is a device outside the cloud; it sends a query for a key (a web page, a file, a data item, etc.) to the cloud.
- The query first arrives at an arbitrary server and is routed among the servers using the DHT until it reaches the server that holds the data.
- That server then replies to the user.
44
End of Lecture 2
Next paper (read and write a review): Vivaldi: A Decentralized Network Coordinate System, Frank Dabek, Russ Cox, Frans Kaashoek, and Robert Morris, Proceedings of SIGCOMM 2004.