Somdas Bandyopadhyay 09305032 Anirban Basumallik 09305008.

Somdas Bandyopadhyay Anirban Basumallik

• What are Peer-to-Peer systems?
• Examples of Peer-to-Peer systems
• What are Distributed Hash Tables?
• DHT example - CHORD
• DHT example - CAN
• An application of DHT - PIER

• Distributed and decentralized architecture
• Peers make a portion of their resources directly available to other network participants, without the need for central coordination by servers
• Peers are equally privileged, equipotent participants in the application
• Peers are both suppliers and consumers, in contrast to the traditional client–server model

All in the application layer

• P2P file-sharing system with a central index
• A central server stores the index of all the files available on the network
• To retrieve a file, the central server is contacted to obtain the location of the desired file
• The central server is not scalable
• Single point of failure

• On start-up, a client contacts a few other nodes; these become its neighbors
• Search: ask neighbors, who ask their neighbors, and so on... when/if found, reply to the sender
• A TTL limits propagation
• No single point of failure
• Search might not succeed even if the file is present in the network

• First, each client connects to a hub
• Each hub maintains the list of connected users
• To search for a file, the client sends a command of the form $Search <ip>:<port> <search string> to the hub, where <ip> is the IP address of the client and <port> is a UDP port on which the client is listening for responses
• The hub must forward this message unmodified to all the other users; every other user with one or more matching files must send a UDP packet to <ip>:<port>
• Matching files are found using some kind of hashing

• Each client contacts a web server with a file name; the web server returns a torrent file that contains information such as the length of the file, the piece size, and the URL of the tracker
• The client then talks to the tracker, which returns the list of peers that are currently downloading/uploading the file
• The file is divided into pieces
• The client then downloads pieces from its peers; simultaneously, it also uploads the pieces that it has already downloaded

• In any P2P system, the file transfer process itself is inherently scalable
• However, the indexing scheme which maps file names to locations is crucial for scalability
• Solution: Distributed Hash Tables

• Traditional name and location services provide a direct mapping between keys and values
• What are examples of values? A value can be an address, a document, or an arbitrary data item
• Distributed hash tables such as CAN and Chord implement a distributed service for storing and retrieving key/value pairs

DNS
• Provides a host name to IP address mapping
• Relies on a set of special root servers
• Names reflect administrative boundaries
• Is specialized to finding named hosts or services

DHT
• Can provide the same service: name = key, value = IP address
• Requires no special servers: a distributed service
• Imposes no naming structure: flat naming
• Can also be used to find data objects that are not tied to particular machines

CHORD

CHORD is a distributed hash table implementation

Addresses a fundamental problem in P2P
• Efficient location of the node that stores a desired data item
• One operation: given a key, map it onto a node
• Data location by associating a key with each data item

Adapts efficiently
• Dynamic, with frequent node arrivals and departures
• Automatically adjusts internal tables to ensure availability

Uses consistent hashing
• Load balancing in assigning keys to nodes
• Little movement of keys when nodes join and leave

Efficient routing
• Distributed routing table
• Maintains information about only O(log N) nodes
• Resolves lookups via O(log N) messages

Scalable
• Communication cost and state maintained at each node scale logarithmically with the number of nodes

Flexible naming
• Flat key-space gives applications flexibility to map their own names to Chord keys

Each node and key is assigned an m-bit identifier using SHA-1 as the hash function. A node's identifier is chosen by hashing the node's IP address, while a key's identifier is chosen by hashing the key. Identifiers are ordered on an identifier circle modulo 2^m. Key K is assigned to the first node whose identifier is equal to or follows the identifier of K on the identifier circle. This node is called the successor of key K.
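A minimal sketch of this assignment rule (toy parameters: m = 6 instead of the real 160-bit SHA-1 space, and the helper names chord_id/successor are illustrative, not the paper's):

```python
import hashlib

M = 6  # identifier bits for this toy example; the real system uses m = 160 (SHA-1)

def chord_id(value: str) -> int:
    """Hash a node's IP address or a key onto the 2^m identifier circle."""
    digest = hashlib.sha1(value.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** M)

def successor(key_id: int, node_ids: list) -> int:
    """First node identifier that is equal to or follows key_id on the circle."""
    ring = sorted(node_ids)
    for n in ring:
        if n >= key_id:
            return n
    return ring[0]  # wrap around past zero

# Key K is stored at successor(chord_id(K)) among the current node identifiers.
nodes = [chord_id(ip) for ip in ("10.0.0.1:4000", "10.0.0.2:4000", "10.0.0.3:4000")]
print(successor(chord_id("my-file.mp3"), nodes))
```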

CHORD RING

1) Each node only knows about its successor 2) Minimum state required (a single pointer per node) 3) Maximum path length (a lookup may traverse all N nodes)
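A sketch of this successor-only lookup; SimpleNode and in_interval are illustrative names, and the walk can take up to N hops:

```python
def in_interval(x: int, a: int, b: int) -> bool:
    """True if x lies in the half-open circular interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b  # the interval wraps around zero

class SimpleNode:
    def __init__(self, node_id: int):
        self.id = node_id
        self.successor = None  # the only routing state each node keeps

    def find_successor(self, key_id: int):
        # Walk the ring one hop at a time: up to N hops in an N-node network.
        node = self
        while not in_interval(key_id, node.id, node.successor.id):
            node = node.successor
        return node.successor
```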

• Lookups are accelerated by maintaining additional routing information
• Each node maintains a routing table with (at most) m entries, called the finger table (where 2^m is the size of the identifier space)
• The i-th entry in the table at node n contains the identity of the first node, s, that succeeds n by at least 2^(i-1) on the identifier circle (clarification on the next slide)
• s = successor(n + 2^(i-1)) (all arithmetic mod 2^m)
• s is called the i-th finger of node n, denoted by n.finger(i).node

• Each node stores information about only a small number of other nodes, and knows more about nodes closely following it than about nodes farther away
• A node's finger table generally does not contain enough information to determine the successor of an arbitrary key k
• Repeated queries to nodes that more and more closely precede the given key eventually lead to the key's successor
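Continuing the sketch above (in_interval as defined earlier), a possible rendering of the finger-table lookup along the lines of the paper's pseudocode:

```python
M = 6  # identifier bits; the i-th finger of n is successor(n + 2^(i-1)), i = 1..M

class Node:
    def __init__(self, node_id: int):
        self.id = node_id
        self.finger = [None] * (M + 1)  # finger[1..M]; finger[1] is the immediate successor

    def find_successor(self, key_id: int):
        return self.find_predecessor(key_id).finger[1]

    def find_predecessor(self, key_id: int):
        # Hop to nodes that precede the key ever more closely, roughly halving
        # the remaining ID-space distance at each step: O(log N) hops.
        n = self
        while not in_interval(key_id, n.id, n.finger[1].id):
            n = n.closest_preceding_finger(key_id)
        return n

    def closest_preceding_finger(self, key_id: int):
        # Scan fingers from farthest to nearest for the closest node preceding key_id.
        for i in range(M, 0, -1):
            f = self.finger[i]
            if f is not None and f.id != key_id and in_interval(f.id, self.id, key_id):
                return f
        return self
```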

[Figure: example Chord ring with each node's finger table (start, succ.) and the keys it stores]

• A basic "stabilization" protocol is used to keep nodes' successor pointers up to date, which is sufficient to guarantee correctness of lookups
• Those successor pointers can then be used to verify and correct the finger table entries
• Every node runs stabilize periodically to find newly joined nodes

Scenario: a new node n joins the ring between n_p and n_s (initially succ(n_p) = n_s and pred(n_s) = n_p).

n joins
• n's predecessor = nil
• n acquires n_s as its successor via some existing node n'
• n notifies n_s that it may be n_s's new predecessor
• n_s acquires n as its predecessor

n_p runs stabilize
• n_p asks n_s for its predecessor (now n)
• n_p acquires n as its successor
• n_p notifies n
• n acquires n_p as its predecessor

Now pred(n_s) = n and succ(n_p) = n: all predecessor and successor pointers are correct. Fingers still need to be fixed, but old fingers will still work.
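Continuing the same sketch, a hedged rendering of this join/stabilize/notify exchange; the class and method bodies are illustrative, not the paper's exact pseudocode:

```python
class StabilizingNode(Node):
    def __init__(self, node_id: int):
        super().__init__(node_id)
        self.predecessor = None

    def join(self, existing):
        # A joining node learns only its successor; predecessors and fingers are fixed lazily.
        self.predecessor = None
        self.finger[1] = existing.find_successor(self.id)

    def stabilize(self):
        # Run periodically: has some node slipped in between us and our successor?
        x = self.finger[1].predecessor
        if x is not None and x.id != self.finger[1].id and in_interval(x.id, self.id, self.finger[1].id):
            self.finger[1] = x
        self.finger[1].notify(self)

    def notify(self, candidate):
        # candidate believes it may be our predecessor.
        if self.predecessor is None or in_interval(candidate.id, self.predecessor.id, self.id):
            self.predecessor = candidate
```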

For a lookup before stabilization has finished:
1. Case 1: All finger table entries involved in the lookup are reasonably current; the lookup then finds the correct successor in O(log N) steps
2. Case 2: Successor pointers are correct, but finger pointers are inaccurate; this yields correct lookups, but they may be slower
3. Case 3: Successor pointers are incorrect, or keys have not yet migrated to newly joined nodes; the lookup may fail, with the option of retrying after a quick pause, during which stabilization fixes the successor pointers

• After stabilization completes, a node join has no effect on lookups other than increasing the value of N in the O(log N) bound
• Before stabilization is complete, finger table entries may be incorrect
• This does not significantly affect lookup speed, since the distance-halving property depends only on ID-space distance
• Lookup speed is affected only if the new nodes' IDs fall between the target's predecessor and the target

[Figure: the finger tables (start, succ.) and key assignments after the node join]

Problem: what if a node does not know who its new successor is after the failure of its old successor?
• The new successor may lie in a gap in the finger table; Chord would be stuck!
Solution: maintain a successor list of size r, containing the node's first r successors
• If the immediate successor does not respond, substitute the next entry in the successor list
• A modified version of the stabilize protocol maintains the successor list
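Continuing the sketch, one possible shape for the successor-list fallback; r = 3 and the is_alive() check are assumptions for illustration:

```python
R = 3  # keep the R nearest successors; the paper suggests a list of length O(log N)

class FaultTolerantNode(StabilizingNode):
    def __init__(self, node_id: int):
        super().__init__(node_id)
        self.successor_list = []  # first R successors, refreshed during stabilization

    def live_successor(self):
        # If the immediate successor has failed, fall back to the next live
        # entry in the successor list instead of getting stuck.
        for s in [self.finger[1], *self.successor_list]:
            if s is not None and s.is_alive():  # is_alive(): assumed liveness probe (e.g. a ping)
                return s
        raise RuntimeError("all known successors appear to have failed")
```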

• closest_preceding_node is modified to search not only the finger table but also the successor list for the most immediate predecessor

Voluntary node departures
• Transfer keys to the successor before departing
• Notify the predecessor p and the successor s before leaving

The implementation uses the iterative style (the alternative is the recursive style)
• The node resolving a lookup initiates all communication, unlike the recursive style, where intermediate nodes forward the request

Optimizations
• During stabilization, a node updates its immediate successor and one other entry in its successor list or finger table
• Each entry out of k unique entries thus gets refreshed once every k stabilization rounds
• When its predecessor changes, a node immediately notifies the old predecessor so it can update its successor, without waiting for the next stabilization round

Tests the ability of consistent hashing to allocate keys to nodes evenly
• The number of keys per node exhibits large variations, which increase linearly with the number of keys
Associating keys with virtual nodes
• Makes the number of keys per node more uniform and significantly improves load balance
• Does not affect the asymptotic value of the query path length much
• Does not increase the routing state maintained by much
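One way the virtual-node mapping could look, reusing chord_id and successor from the first sketch (the "#i" suffix scheme is an assumed convention, not the paper's):

```python
def virtual_ids(ip: str, v: int) -> list:
    """Give one physical node v identifiers on the ring (v virtual nodes)."""
    return [chord_id(f"{ip}#{i}") for i in range(v)]  # "#i" suffix is an assumed convention

# Keys are assigned to virtual identifiers as before; each physical node is
# responsible for the union of its virtual nodes' keys, which evens out the load.
ring = {vid: ip for ip in ("10.0.0.1", "10.0.0.2") for vid in virtual_ids(ip, 8)}
owner_ip = ring[successor(chord_id("some-key"), list(ring))]
print(owner_ip)
```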

In the absence of virtual nodes: the mean and the 1st and 99th percentiles of the number of keys stored per node in a 10,000-node network.

In the presence of virtual nodes: the 1st and the 99th percentiles of the number of keys per node as a function of the number of virtual nodes mapped to a real node. The network has 10^4 real nodes and stores 10^6 keys.

The number of nodes that must be visited to resolve a query is measured as the query path length
• The number of nodes that must be contacted to find a successor in an N-node network is O(log N)
• The mean query path length increases logarithmically with the number of nodes
Simulation setup: a network with N = 2^k nodes and 100 x 2^k keys; k is varied from 3 to 16 and the path length is measured

• Number of nodes: 1000
• Size of the successor list: 20
• The 1st and the 99th percentiles are in parentheses
• The mean path length does not increase by much

CAN

• CAN is a distributed infrastructure that provides hash-table-like functionality
• CAN is composed of many individual nodes
• Each CAN node stores a chunk (zone) of the entire hash table
• A request for a particular key is routed by intermediate CAN nodes towards the node whose zone contains that key

• Involves a virtual d-dimensional Cartesian co-ordinate space
• The co-ordinate space is completely logical
• Lookup keys are hashed into this space
• The co-ordinate space is partitioned into zones among all the nodes in the system
• Every node in the system owns a distinct zone

• To store (key, value) pairs, keys are mapped deterministically onto a point P in the co-ordinate space using a hash function
• The (key, value) pair is then stored at the node which owns the zone containing P
• To retrieve the entry corresponding to key K, the same hash function is applied to map K to the point P
• The retrieval request is routed from the requestor node to the node owning the zone containing P
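A small sketch of this mapping for d = 2; the per-axis hashing and the zone layout are assumptions for illustration, not the CAN paper's exact construction:

```python
import hashlib

D = 2  # dimensions of the virtual co-ordinate space, normalised here to [0, 1)^D

def to_point(key: str) -> tuple:
    """Deterministically hash a key onto a point P in the co-ordinate space."""
    coords = []
    for axis in range(D):
        h = hashlib.sha1(f"{axis}:{key}".encode()).digest()
        coords.append(int.from_bytes(h[:8], "big") / 2 ** 64)
    return tuple(coords)

def zone_owner(point, zones):
    """zones maps node -> per-dimension (lo, hi) bounds; return the node whose zone contains the point."""
    for node, bounds in zones.items():
        if all(lo <= c < hi for c, (lo, hi) in zip(point, bounds)):
            return node
    raise KeyError("point not covered by any zone")

# put(K, v) stores v at zone_owner(to_point(K), zones); get(K) routes a request to the same node.
zones = {"nodeA": ((0.0, 0.5), (0.0, 1.0)), "nodeB": ((0.5, 1.0), (0.0, 1.0))}
print(zone_owner(to_point("my-file.mp3"), zones))
```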

• Every CAN node holds the IP address and virtual co-ordinates of each of its neighbours
• Every message to be routed holds the destination co-ordinates
• Using its neighbours' co-ordinate sets, a node routes a message towards the neighbour with co-ordinates closest to the destination co-ordinates
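A sketch of that greedy forwarding rule, assuming each node knows a representative co-ordinate (e.g. the zone centre) for every neighbour:

```python
import math

def next_hop(neighbours: dict, destination: tuple) -> str:
    """Greedy CAN forwarding: pick the neighbour whose co-ordinates are closest
    to the destination point. neighbours maps IP address -> that neighbour's
    representative co-ordinates (assumed here to be its zone centre)."""
    return min(neighbours, key=lambda ip: math.dist(neighbours[ip], destination))

print(next_hop({"10.0.0.7": (0.6, 0.3), "10.0.0.9": (0.1, 0.8)}, destination=(0.9, 0.4)))
```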

• For a d-dimensional space partitioned into n equal zones, the routing path length is O(d * n^(1/d)) hops
• As the number of nodes increases, the routing path length grows as O(n^(1/d))
• Every node has 2d neighbours
• As the number of nodes increases, the per-node state does not change

PIER

• PIER: Peer-to-Peer Information Exchange and Retrieval
• A query engine that scales up to thousands of participating nodes and can work on various kinds of data
• Runs on top of a P2P network
• A step up to distributed query processing at a larger scale
• A way toward massive distribution: querying heterogeneous data
• The architecture combines traditional database query processing with recent peer-to-peer technologies

Routing Layer API
• lookup(key) → ipaddr
• join(landmarkNode)
• leave()
• locationMapChange()

Storage Manager API
• store(key, item)
• retrieve(key) → item
• remove(key)

Provider API
• get(namespace, resourceID) → item
• put(namespace, resourceID, instanceID, item, lifetime)
• renew(namespace, resourceID, instanceID, lifetime) → bool
• multicast(namespace, resourceID, item)
• lscan(namespace) → items
• CALLBACK: newData(namespace, item)

Each object in the DHT has a namespace, a resourceID and an instanceID; DHT key = hash(namespace, resourceID)
• namespace - the application or group of objects, e.g. a table
• resourceID - identifies what the object is, e.g. the primary key or any other attribute
• instanceID - an integer, used to separate items with the same namespace and resourceID
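A sketch of how such a key could be derived (the hash function and the separator are assumptions; PIER's actual encoding may differ):

```python
import hashlib

def dht_key(namespace: str, resource_id: str) -> int:
    """DHT key = hash(namespace, resourceID); the instanceID is kept alongside
    the stored item to tell apart objects that share the same key."""
    digest = hashlib.sha1(f"{namespace}/{resource_id}".encode()).digest()  # separator is assumed
    return int.from_bytes(digest, "big")

# e.g. a tuple of relation R keyed on the value of its join attribute:
print(dht_key("R", "join-attribute-value-42"))
```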

• Let N_R and N_S be two namespaces storing the tuples of relations R and S respectively
• lscan() is performed on N_R and N_S to locate all local tuples of R and S
• All tuples that satisfy the WHERE-clause conditions are put into a new namespace N_Q
• The value of the join attribute is made the resourceID
• Tuples are tagged with their source table names

• Each node registers with the DHT to receive a newData callback
• When a tuple arrives, a get is issued to N_Q (this stays local)
• Matches are concatenated to the probe tuple and the output tuples are generated (which are sent to the next stage in the query or to another DHT namespace)
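A local sketch of this rehash-and-probe flow with the DHT namespace N_Q stubbed out by an in-memory dictionary; all function and variable names are illustrative:

```python
from collections import defaultdict

# Stand-in for the DHT namespace N_Q: join-attribute value -> tagged tuples.
n_q = defaultdict(list)

def rehash(source_table: str, tuples: list, join_attr: str) -> None:
    # lscan() phase: rehash qualifying tuples into N_Q, keyed on the join attribute
    # and tagged with the source table name.
    for t in tuples:
        n_q[t[join_attr]].append((source_table, t))

def probe(new_tuple: dict, source_table: str, join_attr: str) -> list:
    # newData callback phase: probe N_Q locally and concatenate matches from the other table.
    return [
        {**other, **new_tuple}
        for tag, other in n_q[new_tuple[join_attr]]
        if tag != source_table
    ]

rehash("R", [{"a": 1, "x": "r1"}], join_attr="a")
print(probe({"a": 1, "y": "s1"}, source_table="S", join_attr="a"))
```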

Questions