
1 Structured P2P Overlays

2 Consistent Hashing – the Basis of Structured P2P
Intuition:
– We want to build a distributed hash table where the number of buckets stays constant, even if the number of machines changes
Properties:
– Requires a mapping from hash entries to nodes
– No need to re-hash everything when a node joins or leaves
– Only the mapping (and the allocation of buckets) needs to change when the number of nodes changes
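To make the intuition concrete, here is a minimal consistent-hashing sketch in Python (an illustration only: the SHA-1 hash, the 32-bit ring size and the node names are assumptions, not something the slide specifies). Keys and nodes are hashed onto the same circle, a key is served by the first node clockwise from its position, and adding or removing a node only remaps the keys in one arc:

```python
import hashlib
from bisect import bisect_right

def ring_position(name: str, m: int = 32) -> int:
    """Hash a name onto a circle with 2**m positions (SHA-1 truncated; assumed choice)."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

class ConsistentHashRing:
    def __init__(self, nodes):
        # Sorted (position, node) pairs form the ring; the "buckets" are ring arcs.
        self._ring = sorted((ring_position(n), n) for n in nodes)

    def node_for(self, key: str) -> str:
        """Return the first node clockwise from the key's position."""
        positions = [p for p, _ in self._ring]
        i = bisect_right(positions, ring_position(key)) % len(self._ring)  # wrap around
        return self._ring[i][1]

    def add_node(self, node):
        # Only the keys in the arc now owned by `node` need to move.
        self._ring = sorted(self._ring + [(ring_position(node), node)])

    def remove_node(self, node):
        # Only the departed node's arc is remapped (to its clockwise neighbour).
        self._ring = [(p, n) for p, n in self._ring if n != node]

ring = ConsistentHashRing(["node-A", "node-B", "node-C"])
print(ring.node_for("LetItBe.mp3"))  # the same key always lands on the same arc owner
```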

3 Classification of the P2P File Sharing Systems
Hybrid (broker-mediated)
– Unstructured + centralized; Ex.: Napster (closed)
– Unstructured + super-peer notion; Ex.: KaZaA, Morpheus (closed)
Unstructured, decentralized (or loosely controlled)
+ Files can be anywhere
+ Support for partial-name and keyword queries
– Inefficient search (some heuristics exist) and no guarantee of finding; Ex.: GTK-Gnutella, FrostWire
Structured (or tightly controlled, DHT)
+ Files are rigidly assigned to specific nodes
+ Efficient search and a guarantee of finding
– Lack of partial-name and keyword queries; Ex.: Chord, CAN, Pastry, Tapestry, Kademlia

4 Motivation
How to find data in a distributed file-sharing system?
o Lookup is the key problem
[Figure: a publisher stores (key = "LetItBe", value = MP3 data) somewhere among nodes N1–N5 in the Internet; a client issues Lookup("LetItBe") and must find it]

5 Centralized Solution
o Central server (Napster)
o Requires O(M) state
o Single point of failure
[Figure: the publisher and the client both contact a central DB server that maps keys to nodes]

6 Distributed Solution (1)
o Flooding (Gnutella, Morpheus, etc.)
o Worst case O(N) messages per lookup
[Figure: the client's Lookup("LetItBe") is flooded to all nodes N1–N5]

7 Distributed Solution (2)
o Routed messages (Freenet, Tapestry, Chord, CAN, etc.)
o Only exact matches
[Figure: the Lookup("LetItBe") is forwarded hop by hop through the overlay to the node that stores the key]

8 Distributed Hash Tables (DHT)
Distributed version of a hash table data structure
Stores (key, value) pairs
– The key is like a filename
– The value can be file contents
Goal: efficiently insert/lookup/delete (key, value) pairs
Each peer stores a subset of the (key, value) pairs in the system
Core operation: find the node responsible for a key (see the sketch below)
– Map key to node
– Efficiently route insert/lookup/delete requests to this node
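As a hedged illustration of the core operation (the node IDs, the hash truncation and the helper names below are assumptions, not taken from any particular DHT), this sketch shows how insert and lookup both reduce to finding the node responsible for a key:

```python
import hashlib

class DHTNode:
    """Toy single-process model of a peer: each peer holds a subset of the (key, value) pairs."""
    def __init__(self, node_id: int):
        self.node_id = node_id
        self.store = {}

def key_id(key: str, m: int = 16) -> int:
    return int.from_bytes(hashlib.sha1(key.encode()).digest(), "big") % (2 ** m)

def responsible_node(nodes, key: str) -> DHTNode:
    """Core operation: map a key to the peer responsible for it (first ID >= key ID, wrapping)."""
    kid = key_id(key)
    by_id = sorted(nodes, key=lambda n: n.node_id)
    return next((n for n in by_id if n.node_id >= kid), by_id[0])

def insert(nodes, key, value):
    responsible_node(nodes, key).store[key] = value     # route the request to the owner

def lookup(nodes, key):
    return responsible_node(nodes, key).store.get(key)  # delete would be routed the same way

nodes = [DHTNode(i) for i in (100, 2000, 40000)]
insert(nodes, "LetItBe", b"mp3 bytes")
print(lookup(nodes, "LetItBe"))
```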

9 Structured Overlays
Properties:
– Topology is tightly controlled: well-defined rules determine to which other nodes a node connects
– Files are placed at precisely specified locations: a hash function maps file names to nodes
– Scalable routing based on file attributes
In these systems:
– files are associated with a key (produced, e.g., by hashing the file name), and
– each node in the system is responsible for storing a certain range of keys

10 Document Routing
The core of these DHT systems is the routing algorithm.
The DHT nodes form an overlay network, with each node having several other nodes as neighbours.
When a lookup(key) is issued, the lookup is routed through the overlay network to the node responsible for that key.
The scalability of these DHT algorithms is tied directly to the efficiency of their routing algorithms.

11 Document Routing Algorithms
They take a key as input and, in response, route a message to the node responsible for that key
– The keys are strings of digits of some length
– Nodes have identifiers taken from the same space as the keys (i.e., the same number of digits)
Each node maintains a routing table consisting of a small subset of the nodes in the system
When a node receives a query for a key for which it is not responsible, it routes the query to the neighbour node that makes the most "progress" towards resolving the query
– The notion of progress differs from algorithm to algorithm, but in general it is defined in terms of some distance between the identifier of the current node and the identifier of the queried key

12 Content-Addressable Network (CAN)
A typical document routing method
A virtual Cartesian coordinate space is used
The entire space is partitioned amongst all the nodes
– every node "owns" a zone in the overall space
Abstraction
– can store data at "points" in the space
– can route from one "point" to another
Point = handled by the node that owns the enclosing zone

13 CAN Example: Two-Dimensional Space
The space is divided between the nodes
Together the nodes cover the entire space
Each node covers either a square or a rectangular area with side ratio 1:2 or 2:1
Example:
– Node n1:(1, 2) is the first node to join => it covers the entire space
[Figure: an 8×8 coordinate space (0–7 on each axis) entirely owned by n1]

14 CAN Example: Two-Dimensional Space
Node n2:(4, 2) joins => the space is divided between n1 and n2
[Figure: the space is split into two halves, one owned by n1 and one by n2]

15 CAN Example: Two-Dimensional Space
Node n3:(3, 5) joins => the space is divided between n1 and n3
[Figure: n1's zone is split further; n3 takes over part of it]

16 CAN Example: Two-Dimensional Space
Nodes n4:(5, 5) and n5:(6, 6) join
[Figure: the space is now partitioned into five zones, one per node n1–n5]

17 CAN Example: Two-Dimensional Space
Nodes: n1:(1, 2); n2:(4, 2); n3:(3, 5); n4:(5, 5); n5:(6, 6)
Items: f1:(2, 3); f2:(5, 1); f3:(2, 1); f4:(7, 5)
[Figure: the five zones with the four items placed at their coordinates]

18 CAN Example: Two-Dimensional Space
Each item is stored by the node that owns the zone enclosing the item's point in the space (see the sketch below)
[Figure: each item f1–f4 drawn inside the zone of the node that stores it]
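A small Python sketch of this placement rule (the zone rectangles below are a hypothetical layout, since the exact splits are only visible in the original figure; the item coordinates are the ones listed on slide 17):

```python
def owner(zones, point):
    """Return the node whose rectangular zone [x0, x1) x [y0, y1) contains the point."""
    x, y = point
    for node, (x0, x1, y0, y1) in zones.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return node
    raise ValueError("point not covered - the zones must partition the whole space")

# Hypothetical zone layout for n1..n5 in the 8x8 space (assumed, for illustration).
zones = {
    "n1": (0, 4, 0, 4), "n3": (0, 4, 4, 8),
    "n2": (4, 8, 0, 4), "n4": (4, 6, 4, 8), "n5": (6, 8, 4, 8),
}
items = {"f1": (2, 3), "f2": (5, 1), "f3": (2, 1), "f4": (7, 5)}   # points from slide 17
for name, pt in items.items():
    print(name, "is stored by", owner(zones, pt))
```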

19–22 CAN: Query Example
Each node knows its neighbours in the d-dimensional space
Forward the query to the neighbour that is closest to the query id (see the sketch below)
Example: assume node n1 queries file item f4
[Figure: animation over four slides showing the query travelling zone by zone from n1 towards the node that stores f4]
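A greedy-forwarding sketch in Python (the zone centres and neighbour links are assumed for illustration; real CAN forwards towards the target coordinates through zone neighbours in the same spirit):

```python
import math

# Hypothetical zone centres and neighbour links for n1..n5 (the true adjacency
# depends on the zone layout shown in the figures).
CENTRE = {"n1": (2, 2), "n2": (6, 2), "n3": (2, 6), "n4": (5, 6), "n5": (7, 6)}
NEIGHBOURS = {
    "n1": ["n2", "n3"],
    "n2": ["n1", "n4", "n5"],
    "n3": ["n1", "n4"],
    "n4": ["n2", "n3", "n5"],
    "n5": ["n2", "n4"],
}

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def route(start, target):
    """Forward the query, hop by hop, to the neighbour closest to the target point."""
    path, current = [start], start
    while True:
        best = min([current] + NEIGHBOURS[current], key=lambda n: dist(CENTRE[n], target))
        if best == current:          # no neighbour is closer: this node's zone holds the point
            return path
        path.append(best)
        current = best

print(route("n1", (7, 5)))           # e.g. the query for f4 travels n1 -> n2 -> n5
```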

23 Document Routing – CAN
Associate to each node and each item a unique id (nodeId and fileId) in a d-dimensional space
Goals
– Scales to hundreds of thousands of nodes
– Handles rapid arrival and failure of nodes
Properties
– Routing table size O(d)
– Guarantees that a file is found in at most d * n^(1/d) steps, where n is the total number of nodes (e.g., for d = 2 and n = 10,000 nodes, at most 2 * 100 = 200 steps)

24 Associative array
An associative array is an abstract data type
– It is composed of a collection of (key, value) pairs
A data structure is a way of storing and organizing data in a computer so that it can be used efficiently
The dictionary problem is the task of designing a data structure that implements an associative array
– One solution: the hash table
Binding: the association between a key and a value

25 Hashing
Any algorithm that maps data of variable length to data of a fixed length
It is used to generate fixed-length output data
– a shortened reference to the original data
In 1953, H. P. Luhn of IBM used the concept of hashing (an origin later documented by D. Knuth)
In 1973, R. Morris used the term "hashing" as formal terminology
– before that, it had appeared only in technical jargon

26 Distributed Hash Tables
Key identifies data uniquely
DHT balances keys and data across nodes
DHT replicates, caches, routes lookups, etc.
[Figure: layered view – distributed applications call Insert(key, data) and Lookup(key) on the distributed hash table, which maps them to the right node]

27 DHT Applications
Many services can be built on top of a DHT interface
– File sharing
– Archival storage
– Databases
– Chat service
– Rendezvous-based communication
– Publish/subscribe systems

28 Chord
It is a protocol and an algorithm
It was introduced in 2001 by I. Stoica, R. Morris, D. Karger, F. Kaashoek, and H. Balakrishnan
It is based on the SHA-1 hashing algorithm
– Secure Hash Standard
It uses m-bit identifiers (IDs)
– for the hashed IP addresses of the computers (nodes)
– for the hashed data items (keys)
A logical ring with positions numbered 0 to 2^m − 1 is formed among the nodes
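A minimal sketch of how such m-bit identifiers can be derived with SHA-1 in Python (the truncation to a small m and the sample inputs are assumptions for readability; the Chord paper uses the full 160-bit SHA-1 output):

```python
import hashlib

def chord_id(name: str, m: int = 8) -> int:
    """Hash a node address or a key name to an m-bit ID on the ring 0 .. 2**m - 1."""
    digest = hashlib.sha1(name.encode()).digest()      # SHA-1, as on the slide
    return int.from_bytes(digest, "big") % (2 ** m)    # keep m bits (small m for readability)

print(chord_id("192.168.0.7:4000"))   # node ID: hash of a node's IP address (example address)
print(chord_id("LetItBe"))            # key ID: hash of the data name, on the same ring
```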

29 P2P Ring
Nodes are arranged in a ring based on their IDs
– identifiers are arranged on an identifier circle modulo 2^m => the Chord ring
IDs are assigned randomly
Very large ID space
– m is large enough to make collisions improbable

30 Construction of the Chord ring
Data items (keys) also have IDs
Every node is responsible for a subset of the keys
– a key is assigned to the node whose ID (ID_node) is equal to or greater than the key's ID (ID_key)
– this node is called the successor of the key and is the first node clockwise from ID_key

31 Chord structure
Every node is responsible for a subset of the data
The routing algorithm locates data with small per-node routing state
Volunteer nodes join and leave the system at any time
All nodes have identical responsibilities
All communication is symmetric
[Figure: Route(d46a1c) is forwarded around a ring of node IDs (65a1fc, d13da3, d4213f, d462ba, d467c4, d471f1) until it reaches the node responsible for d46a1c]

32 Lookup with global knowledge
o Every node knows of every other node
o requires global information
o Routing tables are large: O(n)
o Lookups are fast: O(1)
[Figure: Hash("LetItBe") = K60; any node asked "Where is LetItBe?" answers directly that N90 has K60]

33 Successor and predecessor
Each node has a successor and a predecessor
The successor of a given node is the node whose ID is equal to or most closely follows the ID of the given node
If there are n nodes and k keys, then each node is responsible for roughly k/n keys
Basic case: each node knows only the location of its successor
Increasing the robustness: use more than one successor
– Each node knows its r immediate successors
– After a failure, it will know the first live successor
The predecessor is less important

34 Lookup with local knowledge
// ask node n to find the successor of id
n.find_successor(id)
  if (id ∈ (n, successor])
    return successor;
  else
    // forward the query around the circle
    return successor.find_successor(id);
Disadvantage
– the number of messages is linear in the number of nodes: O(n)
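The same linear lookup as a runnable Python sketch (the class, the interval helper and the three-node ring are ours, mirroring the pseudocode above rather than any real implementation):

```python
def in_half_open(x, a, b):
    """True if x lies in the ring interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b            # the interval wraps past zero

class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.successor = None         # filled in once the ring is built

    def find_successor(self, key_id):
        if in_half_open(key_id, self.id, self.successor.id):
            return self.successor
        return self.successor.find_successor(key_id)   # forwarded around the circle: O(n)

# Tiny ring 0 -> 1 -> 3 -> 0 in a 3-bit identifier space.
nodes = [Node(0), Node(1), Node(3)]
for a, b in zip(nodes, nodes[1:] + nodes[:1]):
    a.successor = b
print(nodes[0].find_successor(6).id)   # key 6 is stored at node 0
```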

35 Modulo operator
The modulo operation finds the remainder of the division of one number by another
x mod y = x − y * int(x / y)
Examples:
10 mod 8 = 2
5 mod 5 = 0

36 Lookup with routing (finger) table
Additional routing information to accelerate lookups
Each node maintains a routing table with up to m entries => the finger table
– m is the number of bits of the identifiers
– Every node knows m other nodes in the ring
The i-th entry of a given node N contains the address of successor((N + 2^(i-1)) mod 2^m)
– The distance covered by the fingers increases exponentially (see the sketch below)
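A sketch of filling in a finger table from this formula (the successor helper and the set of node IDs are assumptions; the IDs roughly follow the N80 example on the next slide):

```python
def successor(node_ids, x, m):
    """First node whose ID is equal to or follows x on the ring modulo 2**m."""
    ids = sorted(node_ids)
    return next((n for n in ids if n >= x), ids[0])    # wrap around to the smallest ID

def finger_table(n, node_ids, m):
    """finger[i] = successor((n + 2**(i-1)) mod 2**m) for i = 1..m."""
    return [successor(node_ids, (n + 2 ** (i - 1)) % 2 ** m, m) for i in range(1, m + 1)]

# A 7-bit ring with a handful of nodes, seen from node 80.
print(finger_table(80, [16, 80, 96, 112, 120], m=7))   # -> [96, 96, 96, 96, 96, 112, 16]
```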

37 Division of the distance by finger tables
[Figure: node N80's fingers cover 1/2, 1/4, 1/8, 1/16, 1/32, 1/64 and 1/128 of the ring ahead of it]

38 Routing with finger tables
o Finger i points to the successor of N + 2^i
o In this case N = 80, i = 5
Route as in a binary search:
– use the fingers first
– then the successors not covered by the fingers
Cost is O(log n) (see the sketch below)
[Figure: node N80's fingers 80 + 2^0 … 80 + 2^6 point to nodes N96, N112, N120 and N16 on the ring]
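A Python sketch of the finger-based lookup (the interval helper and the class are ours; the structure follows the find_successor / closest-preceding-finger idea described here, simplified into an iterative loop):

```python
def in_interval(x, a, b, inclusive_right=False):
    """Ring interval test: x in (a, b), or (a, b] when inclusive_right is set."""
    if a < b:
        return a < x < b or (inclusive_right and x == b)
    if a > b:                                   # the interval wraps past zero
        return x > a or x < b or (inclusive_right and x == b)
    return x != a or inclusive_right            # (a, a) is the whole ring except a

class FingerNode:
    def __init__(self, node_id):
        self.id = node_id
        self.successor = None
        self.fingers = []                       # finger[1..m], nearest first

    def closest_preceding_finger(self, key_id):
        for f in reversed(self.fingers):        # try the farthest useful finger first
            if in_interval(f.id, self.id, key_id):
                return f
        return self

    def find_successor(self, key_id):
        node = self
        while not in_interval(key_id, node.id, node.successor.id, inclusive_right=True):
            nxt = node.closest_preceding_finger(key_id)
            node = nxt if nxt is not node else node.successor   # always make progress
        return node.successor
```

Each accepted finger hop at least halves the remaining clockwise distance to the key, which is where the O(log n) bound comes from.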

39–48 Scalable node localization (lookup animation, one finger hop per slide)
Finger table: finger[i] = successor((N + 2^(i-1)) mod 2^m), for each node N

49 Scalable node localization
Important characteristics of this scheme:
– Each node stores information about only a small number of nodes (m)
– Each node knows more about the nodes closely following it than about nodes farther away
– A finger table generally does not contain enough information to directly determine the successor of an arbitrary key k

50 Another example for finger tables
m = 3; finger[i] = successor((N + 2^(i-1)) mod 2^m); int. = the interval in which node N searches for the successor
Ring positions 0–7, nodes 0, 1 and 3:
Node 0 – start: 1, 2, 4; int.: [1,2), [2,4), [4,0); succ.: 1, 3, 0; keys: 6
Node 1 – start: 2, 3, 5; int.: [2,3), [3,5), [5,1); succ.: 3, 3, 0; keys: 1
Node 3 – start: 4, 5, 7; int.: [4,5), [5,7), [7,3); succ.: 0, 0, 0; keys: 2
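These three tables can be checked with a few lines of Python (self-contained; the successor helper is ours):

```python
def successor(ids, x):
    ids = sorted(ids)
    return next((n for n in ids if n >= x), ids[0])        # wrap to the smallest ID

m, node_ids = 3, [0, 1, 3]
for n in node_ids:
    starts = [(n + 2 ** (i - 1)) % 2 ** m for i in range(1, m + 1)]
    print(n, [(s, successor(node_ids, s)) for s in starts])
# node 0: starts 1, 2, 4 -> successors 1, 3, 0
# node 1: starts 2, 3, 5 -> successors 3, 3, 0
# node 3: starts 4, 5, 7 -> successors 0, 0, 0
```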

51 Node joins with finger tables
Node 6 joins the ring; the finger entries that should now point to 6 are updated, and key 6 moves to the new node:
Node 0 – start: 1, 2, 4; int.: [1,2), [2,4), [4,0); succ.: 1, 3, 6; keys: –
Node 1 – start: 2, 3, 5; int.: [2,3), [3,5), [5,1); succ.: 3, 3, 6; keys: 1
Node 3 – start: 4, 5, 7; int.: [4,5), [5,7), [7,3); succ.: 6, 6, 0; keys: 2
Node 6 – start: 7, 0, 2; int.: [7,0), [0,2), [2,6); succ.: 0, 0, 3; keys: 6
int. = the interval in which node N searches for the successor

52 Node departures with finger tables
Node 1 leaves the ring; the finger entries that pointed to 1 are updated to its successor 3, and key 1 moves to node 3:
Node 0 – start: 1, 2, 4; int.: [1,2), [2,4), [4,0); succ.: 3, 3, 6; keys: –
Node 3 – start: 4, 5, 7; int.: [4,5), [5,7), [7,3); succ.: 6, 6, 0; keys: 1, 2
Node 6 – start: 7, 0, 2; int.: [7,0), [0,2), [2,6); succ.: 0, 0, 3; keys: 6

53 Joining the Chord ring
o Three-step process:
– Initialize all fingers of the new node
– Update the fingers of existing nodes
– Transfer keys from the successor to the new node

54 Joining the Chord ring – step 1
o Initialize the new node's finger table
o Locate any node N already in the ring
o Ask node N to look up the fingers of the new node N36
o Return the results to the new node N36
[Figure: N36 joins a ring containing N5, N20, N40, N60, N80 and N99 and issues Lookup(37, 38, 40, …, 100, 164) via an existing node]

55 Joining the Chord ring – step 2
o Update the fingers of existing nodes
– the new node calls an update function on existing nodes
– existing nodes can recursively update the fingers of other nodes
[Figure: N36 notifies the existing nodes N5, N20, N40, N60, N80, N99 so they can refresh their finger tables]

56 Joining the Chord ring – step 3
o Transfer keys from the successor node to the new node
– only keys in the new node's range are transferred
[Figure: keys in the range 21..36 (here K30) are copied from N40 to the new node N36; K38 stays at N40]

57 Joining the Chord ring
To ensure correct lookups, all successor pointers must be kept up to date
=> a stabilization protocol runs periodically in the background (see the sketch below)
It updates the finger tables and the successor pointers
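A simplified sketch of one stabilization round (the method names follow the stabilize/notify scheme from the Chord paper, but the code is an illustrative in-memory model, not the real protocol):

```python
def in_open_interval(x, a, b):
    """True if x lies strictly between a and b on the ring."""
    if a < b:
        return a < x < b
    return x > a or x < b                       # the interval wraps past zero

class RingNode:
    def __init__(self, node_id):
        self.id = node_id
        self.successor = self                   # a lone node is its own successor
        self.predecessor = None

    def stabilize(self):
        """Run periodically: check whether a new node has slipped in before our successor."""
        x = self.successor.predecessor
        if x is not None and in_open_interval(x.id, self.id, self.successor.id):
            self.successor = x                  # adopt the closer node as our successor
        self.successor.notify(self)

    def notify(self, candidate):
        """candidate believes it might be our predecessor."""
        if self.predecessor is None or in_open_interval(candidate.id, self.predecessor.id, self.id):
            self.predecessor = candidate
```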

58 Lookup Mechanism
o Lookups take O(log n) hops
o n is the total number of nodes
o Lookup: route to the closest predecessor
[Figure: Lookup(K19) is routed around the ring of nodes N5, N10, N20, N32, N60, N80, N99, N110 to N20, the successor of K19]

59 Measured lookup procedure
o Cost of a lookup is O(log n)
[Figure: plot of the average number of messages per lookup versus the number of nodes]

60 Handling failures: redundancy
Each node knows the IP addresses of the next r nodes
Each key is replicated at the next r nodes (see the sketch below)
[Figure: on the ring N5, N10, N20, N32, N40, N60, N80, N99, N110, key K19 is stored at its successor and replicated at the following nodes]
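A sketch of this replication scheme (the value of r, the helper names and the node IDs are assumptions; the IDs loosely follow the figure):

```python
def successors_of(node_ids, key_id, count):
    """The key's primary node followed by its next `count` successors, clockwise."""
    ids = sorted(node_ids)
    start = next((i for i, n in enumerate(ids) if n >= key_id), 0)
    return [ids[(start + i) % len(ids)] for i in range(count + 1)]

def store_with_replicas(stores, node_ids, key_id, value, r=2):
    for n in successors_of(node_ids, key_id, r):
        stores[n][key_id] = value               # primary copy plus r replicas

def lookup_with_failover(stores, node_ids, key_id, alive, r=2):
    for n in successors_of(node_ids, key_id, r):
        if n in alive:                          # skip failed nodes, try the next replica
            return stores[n].get(key_id)
    return None

node_ids = [5, 10, 20, 32, 40, 60, 80, 99, 110]
stores = {n: {} for n in node_ids}
store_with_replicas(stores, node_ids, 19, "value of K19")
alive = set(node_ids) - {20}                    # the primary N20 has failed
print(lookup_with_failover(stores, node_ids, 19, alive))   # served by the replica at N32
```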

61 Lookups find replicas
[Figure: a Lookup(K19) that reaches a failed node is retried via the successor list (steps 1–4) until a live replica of K19 is found]

62 Chord software
About 3,000 lines of C++ code
A library to be linked with the application
Provides a lookup(key) function that yields the IP address of the node responsible for the key
Notifies the application of changes in the set of keys the node is responsible for

