INTERNET TECHNOLOGIES Week 10 Peer to Peer Paradigm 1
Introduction: Client-server will be discussed next week; peer-to-peer is this week's topic. Diagram labels: Server, Clients, Simultaneous Server/Clients. 2
Introduction: The first instance of peer-to-peer file sharing dates back to December 1987, when Wayne Bell created WWIVnet. Still exists: Other systems now exist. 3
P2P Networks: Internet users that are ready to share their resources become peers and form a network. When a peer in the network has a file to share, it makes it available to the rest of the peers. An interested peer can connect to the computer where the file is stored and download it. 4
Centralised Network (Hybrid P2P Network): The directory system (listing peers and what they offer) is located on a central server (client-server paradigm); storage and downloading occur via the P2P paradigm. A peer queries the central server; the server sends the IP addresses of the nodes holding the file; the peer then downloads the file from those nodes. The directory is constantly updated as nodes join and leave the network. 5
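The directory lookup described on this slide can be sketched in a few lines of Python. This is only a minimal in-memory illustration, not how any real system implements it: the class name `Directory`, its method names, and the peer addresses are all hypothetical, and the actual file transfer between peers is left out.

```python
from collections import defaultdict


class Directory:
    """Hypothetical central index: file name -> set of peers offering it."""

    def __init__(self):
        self._holders = defaultdict(set)

    def join(self, peer_addr, files):
        # A peer joining the network registers every file it offers.
        for name in files:
            self._holders[name].add(peer_addr)

    def leave(self, peer_addr):
        # A departing peer is removed from every listing.
        for name in list(self._holders):
            self._holders[name].discard(peer_addr)
            if not self._holders[name]:
                del self._holders[name]

    def lookup(self, name):
        # The server only returns addresses; the download itself is peer to peer.
        return sorted(self._holders.get(name, ()))


directory = Directory()
directory.join("10.0.0.1:6000", ["song.mp3", "notes.txt"])  # made-up addresses
directory.join("10.0.0.2:6000", ["song.mp3"])
print(directory.lookup("song.mp3"))
directory.leave("10.0.0.1:6000")
print(directory.lookup("song.mp3"))
```

Everything depends on the single `Directory` object, which is the weakness of the centralised design: if it is unavailable, no lookup can succeed.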
Centralised Network: Maintenance of the directory is very simple. Drawbacks: the directory is vulnerable to attack, and the whole system fails if the servers go down. The original Napster used a centralised network, which made it liable for copyright breaches; the new Napster is a legal pay-per-music site. 6
Figure 29.1: Centralised network 7
Decentralised Network: Peers arrange themselves into an overlay network, a logical network on top of the physical network. Can be classified as unstructured networks or structured networks. 8
Unstructured Network: Nodes are linked randomly. Queries need to flood the network, which can result in high traffic (i.e. not efficient). Examples include Gnutella and Freenet. 9
Gnutella: Unstructured decentralised P2P network; the directory is randomly distributed between nodes. Node A sends a query (request for file location) to a known neighbour node (e.g. W). If Node W knows the location of the requested data, it sends the location back to Node A. If Node W does not know, it sends queries to all its known neighbours. Eventually the information gets back to A (if it exists) and Node A can get a copy of the file. 10
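A rough simulation of this flooding behaviour is sketched below. The topology, the `holders` table, and the function name `flood_query` are invented for illustration. The hop limit `ttl` is an addition not mentioned on the slide (real Gnutella queries do carry a time-to-live), and the sketch assumes a node never sends the query straight back to the neighbour it came from.

```python
from collections import deque


def flood_query(neighbours, holders, start, filename, ttl=4):
    """Flood a query outwards from `start`.

    Returns (nodes found to hold the file, number of query messages sent).
    """
    seen = {start}
    frontier = deque([(start, None, ttl)])  # (node, sender, hops left)
    found = []
    messages = 0
    while frontier:
        node, sender, hops_left = frontier.popleft()
        if filename in holders.get(node, ()):
            found.append(node)   # this node answers instead of forwarding
            continue
        if hops_left == 0:
            continue             # hop limit reached, query dies here
        for nxt in neighbours.get(node, ()):
            if nxt == sender:
                continue         # do not send the query back where it came from
            messages += 1        # duplicates to already-visited nodes still cost a message
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, node, hops_left - 1))
    return found, messages


# Hypothetical overlay: A knows only W; Z holds the file.
neighbours = {
    "A": ["W"],
    "W": ["A", "X", "Y"],
    "X": ["W", "Z"],
    "Y": ["W", "Z"],
    "Z": ["X", "Y"],
}
holders = {"Z": {"Liberty"}}
print(flood_query(neighbours, holders, "A", "Liberty"))  # (['Z'], 5)
```

Even in this five-node overlay one query costs five messages, one of them a duplicate delivered to Z, which hints at why flooding scales poorly.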
Gnutella: Queries flood the network and can cause a large amount of traffic. NB each node must have at least one neighbour: on initial software install, a list of peers is included; later the commands 'ping' and 'pong' are used to check whether nodes are 'alive'. Unstructured networks do not scale well. Gnutella uses a tiered system (ultra nodes and leaves) as well as the Query Routing Protocol and Dynamic Querying to reduce overhead. 11
Structured Network: A predefined set of rules links the nodes, so queries are resolved effectively and efficiently. The Distributed Hash Table (DHT) is the most common technique used. Used in, for example, the Domain Name System (DNS) and BitTorrent. 12
Distributed Hash Table (DHT): Distributes data among a set of nodes according to some predefined rules. Each peer in a DHT-based network becomes responsible for a range of data items. DHT-based networks allow each peer to have partial knowledge about the whole network, which avoids the flooding overhead found in unstructured P2P networks. 13
Address Space: Each data item and responsible peer is mapped to a point in a large address space of size 2^m. Uses modular arithmetic: points in the address space are distributed evenly on a circle with 2^m points (from 0 to 2^m - 1). Most DHT implementations use m = 160 (~1.5 x 10^48 points). The textbook uses m = 5 (2^5 = 32 points) in examples for simplification. 14
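The modular arithmetic on the ring can be checked with a few lines of Python. The helper name `clockwise_distance` is made up for this sketch; the value m = 5 follows the textbook examples.

```python
M = 5             # exponent used in the textbook examples
SIZE = 2 ** M     # 32 points, numbered 0..31


def clockwise_distance(a, b, size=SIZE):
    """Number of steps from point a to point b going clockwise round the ring."""
    return (b - a) % size


print(SIZE)                       # 32
print(clockwise_distance(30, 2))  # 4  (30 -> 31 -> 0 -> 1 -> 2)
print(clockwise_distance(2, 30))  # 28
print(f"{2 ** 160:.2e}")          # 1.46e+48, the size of the ring for m = 160
```

Because the ring wraps around at 2^m, the distance from 30 to 2 is 4 rather than a negative number.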
Figure 29.2: Address space 15
Hashing Identifiers: Peers are added to the address space ring, usually using a hash function to encode the IP address. (A hash function is any function that can be used to map digital data of arbitrary size to digital data of fixed size.) node ID = hash(Peer IP address). The name of the object (e.g. filename) is also hashed and added to the address space ring: key = hash(Object name). 16
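As a sketch of the two formulas on this slide, the snippet below hashes a peer address and an object name onto the ring. The slides do not name a hash function; SHA-1 is assumed here because it produces exactly 160 bits, matching m = 160. The function name `ring_id` and the IP address are hypothetical, and the resulting numbers will not reproduce the small key values used in the textbook figures.

```python
import hashlib


def ring_id(text, m=160):
    """Hash a string with SHA-1 (an assumed choice) and reduce it to a point in [0, 2^m)."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()  # 20 bytes = 160 bits
    return int.from_bytes(digest, "big") % (2 ** m)


node_id = ring_id("192.0.2.10")      # node ID = hash(peer IP address); address is made up
key = ring_id("Liberty")             # key = hash(object name)
small_key = ring_id("Liberty", m=5)  # same hash squeezed onto a 32-point ring
print(hex(node_id))
print(hex(key))
print(small_key)
```

Peers and objects end up in the same identifier space, which is what lets a key be assigned to a nearby node.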
Storing Objects: Two strategies. Direct: the object is stored on the peer whose ID is closest to the key. Indirect: the original peer keeps the object, and a reference to the object is stored on the peer whose ID is closest to the key; this is the most common strategy. 17
Example 29.1: For Figure 29.3, assume several peers have already joined. Node N5 (IP address ) has the file 'Liberty' to share with its peers. The node hashes the filename 'Liberty' to get key = 14. The closest node to key 14 is node N17. N5 creates a reference containing the filename (key), its IP address, the port number, etc., then sends the reference to be stored in node N17. I.e. the file is stored in N5, the key of the file is k14 (a point in the DHT ring), but the reference to the file is stored in node N17. 18
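The example can be replayed in Python under two assumptions. First, only N5 and N17 are taken from the example; the other node IDs in the list are hypothetical stand-ins for the peers in the figure. Second, "closest node" is interpreted as the first node at or after the key going clockwise. The stored reference is reduced to a name and an owner label, since a real reference would carry the owner's IP address and port.

```python
from bisect import bisect_left


def responsible_node(key, node_ids, m=5):
    """First node ID at or after `key` going clockwise (assumed meaning of 'closest')."""
    ring = sorted(node_ids)
    i = bisect_left(ring, key % (2 ** m))
    return ring[i % len(ring)]  # wrap round to the smallest ID past the top of the ring


node_ids = [5, 10, 17, 20, 25]             # N5 and N17 from the example, the rest made up
references = {n: {} for n in node_ids}     # reference table held by each node

key = 14                                   # hash of 'Liberty' as given in the example
target = responsible_node(key, node_ids)
references[target][key] = {"name": "Liberty", "owner": "N5"}

print(target)              # 17
print(references[17])      # {14: {'name': 'Liberty', 'owner': 'N5'}}
print(references[5])       # {} : N5 keeps the file itself, not the reference
```

A peer looking for 'Liberty' would hash the name to 14, reach N17, read the reference, and then contact N5 directly for the file.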
Figure 29.3: Example
Distributed Hash Table (DHT): The main function is to route a query to the node responsible for storing the reference to an object. Different routing strategies are used by different systems; all rely on nodes that have only partial knowledge of the ring forwarding the query to a node closer to the responsible node. All implementations need to handle departures and arrivals of peers in their networks. 20
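To make "routing with partial knowledge" concrete, here is a deliberately naive sketch in which every node knows only its immediate successor on the ring and passes the query along until the responsible node is found. This is not the routing algorithm of any particular protocol; real systems keep extra pointers so that far fewer hops are needed. The node IDs, the function names, and the rule that a key belongs to the first node at or after it are all assumptions made for illustration.

```python
def in_interval(x, a, b, size):
    """True if x lies in the clockwise ring interval (a, b]."""
    if a == b:
        return True  # a single node owns the whole ring
    return 0 < (x - a) % size <= (b - a) % size


def route(key, start, successor, m=5):
    """Forward a query successor by successor.

    Returns (responsible node, list of nodes the query passed through).
    """
    size = 2 ** m
    node = start
    path = [node]
    # Stop once the key falls between the current node and its successor.
    while not in_interval(key, node, successor[node], size):
        node = successor[node]
        path.append(node)
    return successor[node], path


ids = [5, 10, 17, 20, 25]  # hypothetical node IDs on a 32-point ring
successor = {n: ids[(i + 1) % len(ids)] for i, n in enumerate(ids)}

print(route(14, 5, successor))   # (17, [5, 10])
print(route(14, 20, successor))  # (17, [20, 25, 5, 10])
```

The second call shows the cost of knowing so little: the query has to travel most of the way round the ring.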
P2P Networks: Three P2P protocols that use DHT. Chord protocol: simple and elegant approach to routing queries. Pastry protocol: more complex than Chord. Kademlia protocol: similar to Pastry, with a different way of measuring distance. 21
Chord: Published by Stoica et al. in 2001. Used in several applications: Collaborative File System (CFS), ConChord, Distributive Domain Name System (DDNS). 22
Pastry: Another popular protocol in the P2P paradigm, designed by Rowstron and Druschel in 2001. Uses a DHT. There are some fundamental differences between Pastry and Chord in the identifier space and the routing process. 23
Pastry: Used in some applications. PAST: distributed file system. SCRIBE: decentralised publish/subscribe system. 24
Kademlia: Another DHT peer-to-peer network, designed by Maymounkov and Mazières in 2002. Similar to Pastry, it routes messages based on the distance between nodes. The address space is based on a binary tree. The distance between two identifiers is measured with the bitwise XOR function. 25
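The XOR distance is easy to try out. In the sketch below the 5-bit node IDs and the helper names are hypothetical; only the use of bitwise XOR as the distance comes from the slide.

```python
def xor_distance(a, b):
    """Kademlia-style distance: the bitwise XOR of two identifiers, read as an integer."""
    return a ^ b


def closest_nodes(target, node_ids, k=1):
    """The k node IDs with the smallest XOR distance to `target`."""
    return sorted(node_ids, key=lambda n: xor_distance(n, target))[:k]


node_ids = [5, 10, 17, 20, 25]  # made-up 5-bit identifiers

print(xor_distance(10, 14))            # 4   (01010 XOR 01110 = 00100)
print(xor_distance(17, 14))            # 31  (10001 XOR 01110 = 11111)
print(closest_nodes(14, node_ids))     # [10]
print(closest_nodes(14, node_ids, 2))  # [10, 5]
```

Identifiers that share a long common prefix are close under XOR, so 10 is nearer to 14 than 17 is, even though 17 is only three steps away numerically. The metric is also symmetric: the distance from a to b equals the distance from b to a.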
Kademlia 26
BitTorrent: Designed by Bram Cohen (2001) for sharing large files among a set of peers. Its trackerless form uses a DHT based on Kademlia. Sharing differs from other file-sharing protocols: instead of one peer allowing another peer to download the whole file, a group of peers takes part in the process to give all peers in the group a copy of the file. File sharing is a collaborative process called a torrent. 27
BitTorrent with a Tracker: The original BitTorrent. Another entity in a torrent, called 'the tracker', is a central server that tracks the seeds and peers in the swarm. Seed: a peer with the whole file. Leech: a peer with part of the data (still downloading more). 28
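The tracker's bookkeeping role can be illustrated with a toy class. This is a simplification and not the real tracker protocol: the class `Tracker`, its methods, and the peer names are hypothetical, and each peer is assumed to report the exact set of pieces it holds.

```python
class Tracker:
    """Toy tracker: remembers which pieces each peer in the swarm reports having."""

    def __init__(self, total_pieces):
        self.total_pieces = total_pieces
        self.have = {}  # peer name -> set of piece indices

    def announce(self, peer, pieces):
        # A peer reports (or updates) the pieces it currently holds.
        self.have[peer] = set(pieces)

    def seeds(self):
        # Seeds hold every piece of the file.
        return sorted(p for p, s in self.have.items() if len(s) == self.total_pieces)

    def leeches(self):
        # Leeches hold only part of the file and are still downloading.
        return sorted(p for p, s in self.have.items() if len(s) < self.total_pieces)

    def peers_for(self, requester):
        # Everyone else in the swarm that the requester could contact.
        return sorted(p for p in self.have if p != requester)


tracker = Tracker(total_pieces=4)
tracker.announce("peer-a", [0, 1, 2, 3])
tracker.announce("peer-b", [0, 2])
tracker.announce("peer-c", [])
print(tracker.seeds())              # ['peer-a']
print(tracker.leeches())            # ['peer-b', 'peer-c']
print(tracker.peers_for("peer-c"))  # ['peer-a', 'peer-b']
```

The tracker never handles file data itself; it only tells peers about each other, and the pieces are exchanged directly between them.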
Figure 29.12: Example of a torrent 29
Trackerless BitTorrent: In the original BitTorrent design, if the tracker fails, new peers cannot connect to the network and updating is interrupted. New implementations of BitTorrent eliminate the need for a centralised tracker. 30
End 31