Structured P2P Overlays

Consistent Hashing – the Basis of Structured P2P
Intuition:
– We want to build a distributed hash table where the number of buckets stays constant, even if the number of machines changes
Properties:
– Requires a mapping from hash entries to nodes
– No need to re-hash everything if a node joins or leaves
– Only the mapping (and the allocation of buckets) needs to change when the number of nodes changes
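
A minimal Python sketch of this idea (illustrative, not from the slides): both node names and keys are hashed onto a fixed ring of 2^160 positions, and a key is stored at the first node clockwise from its position, so only the neighbours of a joining or leaving node are affected.

    import hashlib
    from bisect import bisect_left, insort

    RING_BITS = 160  # SHA-1 output size; the identifier space never changes

    def ring_id(name):
        # Hash a node name or a key onto the fixed identifier ring.
        return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** RING_BITS)

    class ConsistentHashRing:
        def __init__(self):
            self.node_ids = []   # sorted ring positions of the nodes
            self.nodes = {}      # ring position -> node name

        def add_node(self, node):
            nid = ring_id(node)
            insort(self.node_ids, nid)
            self.nodes[nid] = node

        def remove_node(self, node):
            nid = ring_id(node)
            self.node_ids.remove(nid)
            del self.nodes[nid]

        def node_for(self, key):
            # First node clockwise from the key's position (its successor).
            kid = ring_id(key)
            idx = bisect_left(self.node_ids, kid) % len(self.node_ids)
            return self.nodes[self.node_ids[idx]]

    ring = ConsistentHashRing()
    for n in ("node-A", "node-B", "node-C"):
        ring.add_node(n)
    print(ring.node_for("LetItBe"))   # some node owns the key...
    ring.remove_node("node-B")        # ...and only keys that mapped to node-B
    print(ring.node_for("LetItBe"))   # change owner when it leaves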

Classification of P2P File Sharing Systems
Hybrid (broker-mediated)
– Unstructured + centralized. Ex.: Napster (closed)
– Unstructured + super-peer notion. Ex.: KaZaA, Morpheus (closed)
Unstructured, decentralized (or loosely controlled)
+ Files can be anywhere
+ Support for partial-name and keyword queries
– Inefficient search (some heuristics exist) and no guarantee of finding a file
Ex.: GTK-Gnutella, Frostwire
Structured (or tightly controlled; DHT)
+ Files are rigidly assigned to specific nodes
+ Efficient search and a guarantee of finding
– Lack of partial-name and keyword queries
Ex.: Chord, CAN, Pastry, Tapestry, Kademlia

Motivation
How to find data in a distributed file-sharing system?
o Lookup is the key problem
[Figure: a publisher stores Key="LetItBe", Value=MP3 data somewhere among nodes N1–N5; a client asks Lookup("LetItBe") – which node has it?]

Centralized Solution
o Central server (Napster)
o Requires O(M) state
o Single point of failure
[Figure: the publisher registers Key="LetItBe", Value=MP3 data at a central DB; the client's Lookup("LetItBe") goes to the central server]

Distributed Solution (1)
o Flooding (Gnutella, Morpheus, etc.)
o Worst case O(N) messages per lookup
[Figure: the client's Lookup("LetItBe") is flooded to all nodes N1–N5]

Distributed Solution (2)
o Routed messages (Freenet, Tapestry, Chord, CAN, etc.)
o Only exact matches
[Figure: the client's Lookup("LetItBe") is routed hop by hop towards the node storing the key]

Distributed Hash Tables (DHT)
Distributed version of a hash table data structure
Stores (key, value) pairs
– The key is like a filename
– The value can be file contents
Goal: efficiently insert/lookup/delete (key, value) pairs
Each peer stores a subset of the (key, value) pairs in the system
Core operation: find the node responsible for a key
– Map key to node
– Efficiently route insert/lookup/delete requests to this node

Structured Overlays
Properties
– Topology is tightly controlled: well-defined rules determine to which other nodes a node connects
– Files are placed at precisely specified locations: a hash function maps file names to nodes
– Scalable routing based on file attributes
In these systems:
– files are associated with a key (produced, e.g., by hashing the file name), and
– each node in the system is responsible for storing a certain range of keys

Document Routing
The core of these DHT systems is the routing algorithm.
The DHT nodes form an overlay network, with each node having several other nodes as neighbours.
When a lookup(key) is issued, the lookup is routed through the overlay network to the node responsible for that key.
The scalability of these DHT algorithms is tied directly to the efficiency of their routing algorithms.

Document Routing Algorithms
They take a key as input and, in response, route a message to the node responsible for that key
– The keys are strings of digits of some length
– Nodes have identifiers taken from the same space as the keys (i.e., the same number of digits)
Each node maintains a routing table consisting of a small subset of the nodes in the system
When a node receives a query for a key for which it is not responsible, it routes the query to the neighbour node that makes the most "progress" towards resolving the query
– The notion of progress differs from algorithm to algorithm, but in general it is defined in terms of some distance between the identifier of the current node and the identifier of the queried key

Content-Addressable Network (CAN)
A typical document-routing method
A virtual Cartesian coordinate space is used
The entire space is partitioned amongst all the nodes
– every node "owns" a zone in the overall space
Abstraction
– can store data at "points" in the space
– can route from one "point" to another
Point = the node that owns the enclosing zone

CAN Example: Two-Dimensional Space
The space is divided between the nodes; together, all nodes cover the entire space
Each node covers either a square or a rectangular area with a side ratio of 1:2 or 2:1
Example:
– Node n1:(1, 2) is the first node that joins → it covers the entire space

CAN Example: Two-Dimensional Space
Node n2:(4, 2) joins → the space is divided between n1 and n2

CAN Example: Two-Dimensional Space
Node n3:(3, 5) joins → n1's zone is split between n1 and n3

CAN Example: Two-Dimensional Space
Nodes n4:(5, 5) and n5:(6, 6) join

CAN Example: Two-Dimensional Space
Nodes: n1:(1, 2); n2:(4, 2); n3:(3, 5); n4:(5, 5); n5:(6, 6)
Items: f1:(2, 3); f2:(5, 1); f3:(2, 1); f4:(7, 5)

CAN Example: Two-Dimensional Space
Each item is stored by the node that owns the zone enclosing the item's point in the space

CAN: Query Example
Each node knows its neighbours in the d-dimensional space
Forward the query to the neighbour that is closest to the query ID
Example: assume node n1 queries file item f4
[Figure: the query travels across neighbouring zones from n1 towards the node that owns f4]

Document Routing – CAN
Associate with each node and each item a unique ID (nodeId and fileId) in a d-dimensional space
Goals
– Scales to hundreds of thousands of nodes
– Handles rapid arrival and failure of nodes
Properties
– Routing table size O(d)
– Guarantees that a file is found in at most d·n^(1/d) steps, where n is the total number of nodes
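
A hypothetical sketch of CAN-style greedy routing for the two-dimensional example above. The 8×8 coordinate space, the zone rectangles (derived from the join order n1…n5), the neighbour lists, and the "distance to zone centre" metric are all illustrative assumptions, not the CAN paper's exact implementation.

    import math

    # Zones (x_min, x_max, y_min, y_max) matching the slides' join order in an
    # assumed 8x8 space, and neighbour lists of zones that share a border.
    zones = {
        "n1": (0, 4, 0, 4), "n2": (4, 8, 0, 4), "n3": (0, 4, 4, 8),
        "n4": (4, 6, 4, 8), "n5": (6, 8, 4, 8),
    }
    neighbours = {
        "n1": ["n2", "n3"], "n2": ["n1", "n4", "n5"],
        "n3": ["n1", "n4"], "n4": ["n2", "n3", "n5"], "n5": ["n2", "n4"],
    }

    def owns(node, point):
        x0, x1, y0, y1 = zones[node]
        return x0 <= point[0] < x1 and y0 <= point[1] < y1

    def centre(node):
        x0, x1, y0, y1 = zones[node]
        return ((x0 + x1) / 2, (y0 + y1) / 2)

    def route(start, point):
        # Greedy forwarding: hand the query to the neighbour whose zone centre
        # is closest to the target point, until the owning zone is reached.
        node, path = start, [start]
        while not owns(node, point):
            node = min(neighbours[node], key=lambda n: math.dist(centre(n), point))
            path.append(node)
        return path

    print(route("n1", (7, 5)))   # query from n1 for item f4 at (7, 5) -> ['n1', 'n2', 'n5']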

Associative array
An associative array is an abstract data type
– It is composed of a collection of (key, value) pairs
A data structure is a way of storing and organizing data in a computer so that it can be used efficiently
The dictionary problem is the task of designing a data structure for an associative array
– One solution: the hash table
Binding: the association between a key and a value

Hashing
Any algorithm that maps data of variable length to data of a fixed length
It is used to generate fixed-length output data
– a shortened reference to the original data
In 1953, H. P. Luhn of IBM used the concept of hashing (as noted by D. Knuth)
In 1973, R. Morris used the term "hashing" in formal terminology
– before that, it had been used only as technical jargon

Distributed Hash Tables
A key identifies data uniquely
The DHT balances keys and data across nodes
The DHT replicates, caches, routes lookups, etc.
[Diagram: distributed applications sit on top of the distributed hash table layer; applications call Insert(key, data) and Lookup(key), and the DHT returns the data / the responsible node]
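
A toy sketch of this layering (assumed names; the key-to-node mapping here is a simple modulo placeholder rather than consistent hashing or Chord): the DHT layer answers "which node is responsible for this key?", and the application-level insert and lookup are built on top of it.

    import hashlib

    class Node:
        def __init__(self, name):
            self.name = name
            self.store = {}    # this node's share of the (key, value) pairs

    class DHT:
        def __init__(self, nodes):
            self.nodes = nodes

        def lookup_node(self, key):
            # DHT layer: which node is responsible for this key?
            # (simple modulo placeholder, not consistent hashing / Chord)
            h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
            return self.nodes[h % len(self.nodes)]

        # Application layer, built on top of lookup_node()
        def insert(self, key, value):
            self.lookup_node(key).store[key] = value

        def lookup(self, key):
            return self.lookup_node(key).store.get(key)

    dht = DHT([Node("node-%d" % i) for i in range(5)])
    dht.insert("LetItBe", b"MP3 data")
    print(dht.lookup_node("LetItBe").name, dht.lookup("LetItBe"))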

DHT Applications
Many services can be built on top of a DHT interface
– File sharing
– Archival storage
– Databases
– Chat services
– Rendezvous-based communication
– Publish/subscribe systems

Chord
It is a protocol and an algorithm
It was introduced in 2001 by I. Stoica, R. Morris, D. Karger, F. Kaashoek, and H. Balakrishnan
It is based on the SHA-1 hashing algorithm
– Secure Hash Standard
It uses m-bit identifiers (IDs)
– for the hashed IP addresses of the computers (nodes)
– for the hashed data (keys)
A logical ring with positions numbered 0 to 2^m − 1 is formed among the nodes
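
For illustration, a small sketch of how node addresses and keys can be hashed into the same m-bit identifier space (the value of m and the sample inputs are made up):

    import hashlib

    M = 6  # identifier bits; a real deployment would use e.g. m = 160 (SHA-1)

    def chord_id(name):
        # Hash a node's IP address or a key name into the ring [0, 2^m - 1].
        return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** M)

    print(chord_id("192.168.0.17:8000"))   # a node ID
    print(chord_id("LetItBe"))             # a key ID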

P2P Ring
Nodes are arranged in a ring based on their IDs
– identifiers are arranged on an identifier circle modulo 2^m ⇒ the Chord ring
IDs are assigned randomly
Very large ID space
– m is large enough to make collisions improbable

Construction of the Chord ring
Data items (keys) also have IDs
Every node is responsible for a subset of the keys
– a key is assigned to the node whose ID (ID_node) is equal to or follows the key's ID (ID_key)
– this node is called the successor of the key and is the first node clockwise from ID_key

Chord structure
Every node is responsible for a subset of the data
The routing algorithm locates data with a small amount of per-node routing state
Volunteer nodes join and leave the system at any time
All nodes have identical responsibilities
All communication is symmetric
[Figure: Route(d46a1c) – a lookup for key d46a1c is routed around a ring of nodes with IDs 65a1fc, d13da3, d4213f, d462ba, d467c4, d471f1, ending at the key's successor]

Lookup with global knowledge
o Every node knows of every other node
– requires global information
o Routing tables are large: O(n)
o Lookups are fast: O(1)
[Figure: on a ring of nodes N10, N32, N55, N90, N123, …, Hash("LetItBe") = K60; any node asked "Where is LetItBe?" can answer directly: "N90 has K60"]

Successor and predecessor
Each node has a successor and a predecessor
The successor of a given node is the node whose ID is equal to or follows the ID of the given node
If there are n nodes and k keys, then each node is responsible for roughly k/n keys
Basic case: each node knows only the location of its successor
Increasing the robustness: use more than one successor
– Each node knows its r immediate successors
– After a failure, it will know the first live successor
The predecessor is less important

Lookup with local knowledge

    // ask node n to find the successor of id
    n.find_successor(id)
      if (id ∈ (n, successor])
        return successor;
      else
        // forward the query around the circle
        return successor.find_successor(id);

Disadvantage
– The number of messages is linear in the number of nodes: O(n)
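
A runnable Python sketch of this linear lookup (assumed names, not the authors' code); the interval test has to handle the wrap-around past position 0:

    M = 6                     # identifier bits
    RING = 2 ** M

    def in_interval(x, a, b):
        # True if x lies in the half-open ring interval (a, b].
        if a < b:
            return a < x <= b
        return x > a or x <= b   # the interval wraps past position 0

    class Node:
        def __init__(self, ident):
            self.id = ident
            self.successor = self        # set properly once the ring is built

        def find_successor(self, key_id):
            if in_interval(key_id, self.id, self.successor.id):
                return self.successor
            return self.successor.find_successor(key_id)   # O(n) hops worst case

    # Build a small ring with nodes 1, 8, 14, 32, 42, 51.
    ids = [1, 8, 14, 32, 42, 51]
    nodes = [Node(i) for i in ids]
    for a, b in zip(nodes, nodes[1:] + nodes[:1]):
        a.successor = b

    print(nodes[0].find_successor(38).id)   # -> 42
    print(nodes[0].find_successor(55).id)   # -> 1 (wraps around the ring)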

Modulo operator
The modulo operation finds the remainder of the division of one number by another
x mod y ↔ x − y · int(x / y)
Examples:
10 mod 8 = 2
5 mod 5 = 0

Lookup with a routing (finger) table
Additional routing information to accelerate lookups
Each node keeps a routing table with up to m entries ⇒ the finger table
– m is the number of bits in the identifiers
– Every node knows m other nodes in the ring
The i-th entry of a given node N contains the address of successor((N + 2^(i−1)) mod 2^m)
– The distance covered by the entries increases exponentially
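
A small sketch of how such a finger table could be computed for one node, using made-up node IDs on a 2^6-position ring:

    M = 6
    RING = 2 ** M

    def successor(node_ids, x):
        # First node ID clockwise from position x (wrapping past 0).
        ring = sorted(node_ids)
        return next((n for n in ring if n >= x), ring[0])

    def finger_table(n, node_ids):
        # finger[i] = successor((n + 2^(i-1)) mod 2^m), for i = 1 .. m
        return [successor(node_ids, (n + 2 ** (i - 1)) % RING) for i in range(1, M + 1)]

    node_ids = [1, 8, 14, 32, 42, 51]
    print(finger_table(8, node_ids))   # successors of 9,10,12,16,24,40 -> [14, 14, 14, 32, 32, 42]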

Division of the distance by finger tables
[Figure: from node N80, successive fingers cover ½, ¼, 1/8, 1/16, 1/32, 1/64, and 1/128 of the ring]

Routing with finger tables
o Finger i points to the successor of N + 2^i
o In this example, N = 80 and i = 5, so the finger points to the successor of 80 + 32 = 112
Route via a binary-search-like process
– Use the fingers first
– Then use successors not covered by the fingers
Cost is O(log n)
[Figure: node N80's finger points at N112; nodes N96, N112, N120 lie ahead on the ring]
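
A self-contained sketch of finger-based routing under the same assumptions (m = 6, made-up node IDs): at every hop the query jumps to the closest finger preceding the key, so the remaining distance roughly halves and the lookup finishes in O(log n) hops.

    M = 6
    RING = 2 ** M

    def between(x, a, b, inclusive_right=False):
        # Ring interval test for (a, b) or, if inclusive_right, (a, b].
        if a == b:
            return True
        if a < b:
            return a < x < b or (inclusive_right and x == b)
        return x > a or x < b or (inclusive_right and x == b)

    class Node:
        def __init__(self, ident):
            self.id = ident
            self.successor = self
            self.fingers = []       # finger k points to successor((id + 2^k) mod 2^m)

        def closest_preceding_finger(self, key_id):
            for f in reversed(self.fingers):       # farthest fingers first
                if between(f.id, self.id, key_id):
                    return f
            return self

        def find_successor(self, key_id):
            node, hops = self, 0
            while not between(key_id, node.id, node.successor.id, inclusive_right=True):
                node, hops = node.closest_preceding_finger(key_id), hops + 1
            return node.successor, hops

    # Build the ring and the finger tables for nodes 1, 8, 14, 32, 42, 51.
    ids = sorted([1, 8, 14, 32, 42, 51])
    nodes = {i: Node(i) for i in ids}
    def succ_of(x):
        return nodes[next((n for n in ids if n >= x), ids[0])]
    for i in ids:
        nodes[i].successor = succ_of((i + 1) % RING)
        nodes[i].fingers = [succ_of((i + 2 ** k) % RING) for k in range(M)]

    owner, hops = nodes[8].find_successor(54)
    print(owner.id, hops)   # key 54 wraps around to node 1, reached in 2 hops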

Scalable node localization
Finger table: finger[i] = successor((N + 2^(i−1)) mod 2^m), for N = 0, …, n

Scalable node localization
Important characteristics of this scheme:
– Each node stores information about only a small number of nodes (m)
– Each node knows more about nodes closely following it than about nodes farther away
– A finger table generally does not contain enough information to directly determine the successor of an arbitrary key k

Another example of finger tables (m = 3)
finger[i].start = (N + 2^(i−1)) mod 2^m; "int." is the interval in which node N searches for the successor of the start
Ring with nodes 0, 1, 3 and keys 1, 2, 6:
Node 0 – start: 1, 2, 4; int.: [1,2), [2,4), [4,0); succ.: 1, 3, 0; keys stored: 6
Node 1 – start: 2, 3, 5; int.: [2,3), [3,5), [5,1); succ.: 3, 3, 0; keys stored: 1
Node 3 – start: 4, 5, 7; int.: [4,5), [5,7), [7,3); succ.: 0, 0, 0; keys stored: 2

Node joins with finger tables
Continuing the m = 3 example: node 6 joins (its finger intervals are [7,0), [0,2), [2,6))
– Node 6 builds its own finger table (the successors of 7, 0 and 2)
– Existing nodes update any finger entries that should now point to node 6
– Key 6 moves from node 0 to its new successor, node 6

Node departures with finger tables
Continuing the example: when a node leaves the ring
– the keys it stored move to its successor, and
– the finger entries of other nodes that pointed to the departed node are updated to point to its successor

Joining the Chord ring
o A three-step process:
1. Initialize all fingers of the new node
2. Update the fingers of existing nodes
3. Transfer keys from the successor to the new node

Joining the Chord ring – step 1
o Initialize the new node's finger table
– Locate any node N already in the ring
– Ask node N to look up the fingers of the new node N36
– Return the results to the new node N36
[Figure: on a ring with N5, N20, N40, N60, N80, N99, the new node N36 issues Lookup(37,38,40,…,100,164) through node N to fill its finger table]

Joining the Chord ring – step 2
o Update the fingers of existing nodes
– the new node calls an update function on existing nodes
– existing nodes can recursively update the fingers of other nodes
[Figure: existing nodes N5, N20, N40, N60, N80, N99 update the finger entries that should now point to N36]

Joining the Chord ring – step 3
o Transfer keys from the successor node to the new node
– only the keys in the new node's range are transferred
[Figure: keys K30 and K38 were stored at N40; after N36 joins, the keys in N36's range are copied from N40 to N36]
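
A sketch of the key-transfer rule in step 3 (assumed data layout; N20 is taken to be N36's predecessor, as in the figure): only the keys in (20, 36] move from the old successor N40 to the new node N36.

    def in_interval(x, a, b):
        # Half-open ring interval (a, b], with wrap-around.
        if a < b:
            return a < x <= b
        return x > a or x <= b

    def transfer_keys(new_id, predecessor_id, successor_store):
        # Keys in (predecessor, new] move to the new node; the rest stay put.
        moved = {k: v for k, v in successor_store.items()
                 if in_interval(k, predecessor_id, new_id)}
        kept = {k: v for k, v in successor_store.items() if k not in moved}
        return moved, kept

    n40_store = {30: "data for K30", 38: "data for K38"}
    moved, kept = transfer_keys(new_id=36, predecessor_id=20, successor_store=n40_store)
    print(moved)   # {30: 'data for K30'}  -> K30 is copied to N36
    print(kept)    # {38: 'data for K38'}  -> K38 stays at N40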

Joining the Chord ring
To ensure correct lookups, all successor pointers must be up to date
⇒ a stabilization protocol runs periodically in the background
It updates finger tables and successor pointers
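
A simplified sketch of the stabilization idea (assumed method names; fix_fingers and failure handling omitted): each node periodically asks its successor for that node's predecessor, adopts it as a better successor if it lies in between, and then notifies it.

    def between(x, a, b):
        # Strictly between a and b on the ring (exclusive), with wrap-around.
        if a < b:
            return a < x < b
        return x > a or x < b

    class Node:
        def __init__(self, ident):
            self.id = ident
            self.successor = self
            self.predecessor = None

        def stabilize(self):
            # Ask the successor for its predecessor; adopt it if it slipped in
            # between us, then tell the (possibly new) successor about ourselves.
            x = self.successor.predecessor
            if x is not None and between(x.id, self.id, self.successor.id):
                self.successor = x
            self.successor.notify(self)

        def notify(self, candidate):
            if self.predecessor is None or between(candidate.id, self.predecessor.id, self.id):
                self.predecessor = candidate

    # N36 has just joined with successor N40, but N20 still points to N40.
    n20, n36, n40 = Node(20), Node(36), Node(40)
    n20.successor, n36.successor, n40.predecessor = n40, n40, n36
    n20.stabilize()
    print(n20.successor.id)   # -> 36: N20 has learned about the new node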

Lookup Mechanism
o Lookups take O(log n) hops
– n is the total number of nodes
o Lookup: route via the closest preceding node
[Figure: on a ring with N5, N10, N20, N32, N60, N80, N99, N110, a Lookup(K19) is routed towards K19's successor]

Measured lookup procedure
o The cost of a lookup is O(log n)
[Plot: average messages per lookup vs. number of nodes]

Handling failures: redundancy
Each node knows the IP addresses of the next r nodes
Each key is replicated at the next r nodes
[Figure: key K19, whose successor is N20, is also replicated at the next nodes on the ring (e.g., N32, N40)]
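
A small sketch of this replication rule (illustrative helper, made-up node IDs): the key's successor plus the next r − 1 nodes hold the replicas.

    def successor_list(node_ids, key_id, r):
        # The r nodes, clockwise from the key's position, that hold a replica.
        ring = sorted(node_ids)
        start = next((i for i, n in enumerate(ring) if n >= key_id), 0)
        return [ring[(start + k) % len(ring)] for k in range(r)]

    nodes = [5, 10, 20, 32, 60, 80, 99, 110]
    print(successor_list(nodes, 19, r=3))   # K19 lives at N20 and is replicated at N32 and N60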

Lookups find replicas
[Figure: a Lookup(K19) is routed around the ring; even if the key's primary node has failed, the lookup still reaches one of the live replica nodes]

Chord software
3000 lines of C++ code
A library to be linked with the application
– provides a lookup(key) function: yields the IP address of the node responsible for the key
– notifies the node of changes in the set of keys the node is responsible for