Introduction to Peer-to-Peer Networks
Z.M. Joseph
CSE 6392 – DB Exploration, Spring 2006
CSE, UT Arlington
Overview
What is a peer-to-peer network?
Centralized: Napster
Decentralized
  – Unstructured: blind and informed search
  – Structured: distributed hash tables (DHTs)
Benefits of peer-to-peer networks
Peer-to-Peer Networks
A decentralized, distributed system
All nodes are equivalent (peers)
Data could be at any node on the network
Nodes join and leave the network at any time
The network is resilient to these changes
No dependence on central resources
[Figure: nodes connected to one another through the Internet]
Centralized Network: the Napster Model
Nodes register their contents with a central server
The central server handles all searches
File access is done directly between peers
– Poor scalability
– Single point of failure
[Figure: a client sends a query to the server, receives a reply, and transfers the file directly from another client]
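The Napster division of labor (server indexes, peers transfer) can be sketched as below. This is a minimal illustration under stated assumptions: `CentralIndex` and `download` are hypothetical names, not Napster's protocol, and peers are modeled as in-memory dictionaries.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical Napster-style directory: it stores only who has which file."""

    def __init__(self) -> None:
        self._owners: dict[str, set[str]] = defaultdict(set)

    def register(self, peer: str, files: list[str]) -> None:
        # A node announces its shared files to the central server.
        for name in files:
            self._owners[name].add(peer)

    def unregister(self, peer: str) -> None:
        # A node leaving the network is removed from every entry.
        for owners in self._owners.values():
            owners.discard(peer)

    def search(self, name: str) -> list[str]:
        # Queries are answered by the server alone; no peer is contacted.
        return sorted(self._owners.get(name, set()))


def download(peers: dict[str, dict[str, bytes]], owner: str, name: str) -> bytes:
    # The file itself moves directly between peers, bypassing the server.
    return peers[owner][name]
```

Every search goes through the single `CentralIndex` object, which is exactly why the model scales poorly and has a single point of failure: if that object is gone, peers still hold the files but nobody can find them.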
Unstructured Blind Search: Gnutella
[Figure: Gnutella query flooding as a breadth-first search (BFS). Legend: source, forwarded query, processed query, found result, forwarded response]
Unstructured Blind Search: Gnutella (continued)
A node (peer) connects to a set of Gnutella neighbors
Queries are forwarded to all neighbors
Any peer that has the requested information responds
The network is flooded, with a time-to-live (TTL) counter to terminate the search
+ Results are complete (for all peers within TTL hops)
– Bandwidth is wasted
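Flooding with a TTL is a breadth-first search cut off at depth TTL. The sketch below is a simplified model, not the Gnutella wire protocol: `flood_search` is a hypothetical name, the graph is a dictionary of neighbor lists, and duplicate suppression is modeled with a `seen` set. It also counts messages, which makes the bandwidth cost visible.

```python
from collections import deque


def flood_search(graph, files, source, target, ttl):
    """Hypothetical Gnutella-style flood: BFS from `source`, limited to `ttl` hops.

    graph: dict mapping each node to a list of neighbours.
    files: dict mapping each node to the set of files it shares.
    Returns (nodes that hold `target`, number of query messages sent).
    """
    hits = []
    messages = 0
    seen = {source}                  # duplicate queries are dropped
    frontier = deque([(source, ttl)])
    while frontier:
        node, remaining = frontier.popleft()
        if target in files.get(node, set()):
            hits.append(node)        # this peer answers the query
        if remaining == 0:
            continue                 # TTL exhausted: stop forwarding
        for neighbour in graph[node]:
            messages += 1            # every forward costs bandwidth, even to seen nodes
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, remaining - 1))
    return hits, messages
```

On a line of four peers A–B–C–D with the file at D, a TTL of 3 finds it with 5 messages, while a TTL of 2 sends 3 messages and finds nothing: results are complete only inside the TTL horizon.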
Random Walkers: Improved Unstructured Blind Search
Similar structure to Gnutella
Each node forwards the query (called a walker) to a random subset of its neighbors
+ Reduced bandwidth requirements
– Incomplete results
[Figure: network of peer nodes]
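One common formulation of random walkers is sketched below as an assumption, not necessarily the exact scheme on the slide: the source releases k walkers, and at each step a walker moves to one uniformly random neighbor, for at most `ttl` steps. `random_walk_search` is a hypothetical name.

```python
import random


def random_walk_search(graph, files, source, target, walkers, ttl, rng=None):
    """Hypothetical k-random-walker search.

    The source releases `walkers` independent walkers; at each step a walker
    moves to one neighbour chosen uniformly at random, for at most `ttl` steps.
    Returns (set of nodes found holding `target`, number of messages sent).
    """
    rng = rng or random.Random()
    found = set()
    messages = 0
    for _ in range(walkers):
        node = source
        for _ in range(ttl):
            neighbours = graph[node]
            if not neighbours:
                break                # dead end: this walker stops
            node = rng.choice(neighbours)
            messages += 1
            if target in files.get(node, set()):
                found.add(node)
                break                # this walker terminates on success
    return found, messages
```

Message cost is bounded by `walkers * ttl` regardless of network size, which is the bandwidth saving; the price is that nodes no walker happens to visit are never searched, hence incomplete results.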
Unstructured Informed Networks
Zero in on the target using information about the query and about the neighbors
Intelligent routing
+ Reduces the number of messages
+ Not complete, but more accurate
– Cost: the network must first be flooded to gather this information
Informed Searches: Local Indices
Each node keeps track of the information available within a radius of r hops around it
Queries are sent to neighbors just beyond the r-hop radius
+ Flooding is limited to a bounded part of the network
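A local index can be built with a depth-limited BFS, as in this sketch. The helpers `nodes_within`, `build_local_index`, and `boundary` are hypothetical names; files are assumed to be identified by name only.

```python
from collections import deque


def nodes_within(graph, start, radius):
    """Return {node: hop distance} for every node at most `radius` hops from `start`."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue                 # do not expand past the radius
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def build_local_index(graph, files, node, r):
    """Hypothetical local index: file name -> nodes within r hops that share it."""
    index = {}
    for peer in nodes_within(graph, node, r):
        for name in files.get(peer, set()):
            index.setdefault(name, set()).add(peer)
    return index


def boundary(graph, node, r):
    """Nodes exactly r + 1 hops away: where the query goes when the index misses."""
    return {n for n, d in nodes_within(graph, node, r + 1).items() if d == r + 1}
```

A node answers from its own index for everything within r hops and only forwards to the boundary set, so flooding never covers the inner region twice.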
Routing Indices
For each query, calculate the goodness of each neighbor
Calculating goodness:
  – Categorize or separate the query into themes
  – Rank the best neighbors for a given theme by the number of matching documents reachable through them
Follow the chain of neighbors expected to yield the best results
Backtracking is possible
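The goodness ranking and backtracking can be sketched as a depth-first search. Assumptions: goodness is taken to be the sum of per-theme document counts (one simple estimator among several), and `goodness` and `routed_search` are hypothetical names.

```python
def goodness(counts, themes):
    """Estimated matching documents behind one neighbour: sum of its per-theme counts."""
    return sum(counts.get(theme, 0) for theme in themes)


def routed_search(indices, local_docs, node, themes, visited=None):
    """Depth-first search guided by routing indices, with backtracking.

    indices[node][neighbour] -> {theme: documents reachable via that neighbour}
    local_docs[node]         -> {theme: documents stored at node}
    Returns the path to the first node holding a match, or None.
    """
    if visited is None:
        visited = set()
    visited.add(node)
    if goodness(local_docs.get(node, {}), themes) > 0:
        return [node]
    # Try neighbours from best to worst goodness for this query.
    ranked = sorted(indices.get(node, {}).items(),
                    key=lambda item: goodness(item[1], themes), reverse=True)
    for neighbour, counts in ranked:
        if neighbour in visited or goodness(counts, themes) == 0:
            continue                 # skip loops and neighbours with nothing to offer
        path = routed_search(indices, local_docs, neighbour, themes, visited)
        if path is not None:
            return [node] + path
    return None                      # dead end: caller backtracks to its next-best neighbour
```

If the best-ranked branch turns out to be a dead end (for example, because its index is stale), the search returns to the previous node and tries the next-best neighbor instead of flooding.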
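Bloom Filters
A Bloom filter is a bit pattern built by hashing each item into several bit positions
It indicates the likelihood of a match: a negative answer is certain, but a positive answer may be a false positive
Can be used to estimate the degree of similarity between sets
Also known as a lossy distributed index
Attenuated Bloom filters:
  – Maintain downstream Bloom filters for each neighbor
  – Reduce the weight of distant nodes when choosing neighbors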
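A basic Bloom filter fits in a few lines. This is a minimal sketch under stated assumptions: the filter size `m`, the hash count `k`, the salted-SHA-1 position scheme, and the bit-overlap `similarity` measure are illustrative choices, not part of the slide.

```python
import hashlib


class BloomFilter:
    """Bit array of `m` bits; each item sets `k` hash-derived positions."""

    def __init__(self, m: int = 64, k: int = 3) -> None:
        self.m, self.k = m, k
        self.bits = 0  # Python int used as a bit array

    def _positions(self, item: str):
        # Derive k positions by salting SHA-1 with the hash index (illustrative choice).
        for i in range(self.k):
            digest = hashlib.sha1(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "probably present".
        return all(self.bits >> pos & 1 for pos in self._positions(item))

    def similarity(self, other: "BloomFilter") -> float:
        # Fraction of shared set bits: a rough degree-of-similarity signal.
        union = bin(self.bits | other.bits).count("1")
        return bin(self.bits & other.bits).count("1") / union if union else 1.0
```

An item that was added always tests positive; an item that was never added usually tests negative, and when it tests positive that is the false-positive "loss" in lossy index.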
Bloom Filters (continued)
[Figure: a requesting node, a hash value, and its N neighbors]
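Attenuated Bloom filters keep, for each neighbor, a list of filters by distance. The sketch below assumes level i summarizes documents i + 1 hops away through that neighbor and weights a match at level i by 1/2^i. The weighting and the helper names (`positions`, `make_filter`, `matches`, `score`, `choose_neighbour`) are hypothetical.

```python
import hashlib


def positions(item: str, m: int = 64, k: int = 3) -> list[int]:
    """k bit positions for `item` in an m-bit filter (salted SHA-1, an illustrative choice)."""
    return [int.from_bytes(hashlib.sha1(f"{i}:{item}".encode()).digest()[:8], "big") % m
            for i in range(k)]


def make_filter(items, m: int = 64, k: int = 3) -> int:
    """Build one Bloom filter (as an int bit array) from a collection of items."""
    bits = 0
    for item in items:
        for p in positions(item, m, k):
            bits |= 1 << p
    return bits


def matches(bits: int, item: str, m: int = 64, k: int = 3) -> bool:
    return all(bits >> p & 1 for p in positions(item, m, k))


def score(attenuated: list[int], item: str) -> float:
    """Level i summarises documents i + 1 hops away through one neighbour;
    a match at level i contributes 1 / 2**i, so distant nodes weigh less."""
    return sum(1 / 2 ** i for i, level in enumerate(attenuated) if matches(level, item))


def choose_neighbour(filters: dict[str, list[int]], item: str):
    """Forward the query to the neighbour with the highest score (None if all are zero)."""
    best = max(filters, key=lambda n: score(filters[n], item), default=None)
    return best if best is not None and score(filters[best], item) > 0 else None
```

With the item one hop away through neighbor B and two hops away through neighbor C, B scores 1 and C scores 0.5, so the query goes to B: the nearer copy wins.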
Structured P2P Networks
Self-organizing
Load-balanced and resilient
Fault-tolerant
Guaranteed bound on the number of hops needed to answer a query
Based on a distributed hash table (DHT)
Properties of a DHT
Keys are mapped evenly to all nodes in the network
Each node maintains information about only a few other nodes
Messages are routed efficiently to nodes
Inserting or deleting a node affects only a few other nodes
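The last property follows from consistent hashing, the successor rule Chord uses. The sketch below uses small hypothetical integer IDs on a 100-slot ring instead of real hashes, so the effect is easy to check by hand. `owner` and `assignment` are hypothetical names.

```python
import bisect


def owner(node_ids: list[int], key: int) -> int:
    """Successor rule on a ring: first node ID >= key, wrapping to the smallest ID."""
    ids = sorted(node_ids)
    i = bisect.bisect_left(ids, key)
    return ids[i % len(ids)]


def assignment(node_ids, keys):
    """Map every key to the node responsible for it."""
    return {key: owner(node_ids, key) for key in keys}


before = assignment([10, 40, 70], range(100))
after = assignment([10, 40, 55, 70], range(100))   # node 55 joins
moved = {k for k in before if before[k] != after[k]}
```

Only keys 41 to 55 change hands, and they all come from node 70, the new node's successor. Every other key stays where it was, so a join touches just two nodes.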
Chord
Chord provides a P2P hash lookup: Lookup(key) → IP address of the node storing the key
Uses a hash function:
  – Key identifier = SHA-1(key)
  – Node identifier = SHA-1(IP address)
  – Both are uniformly distributed
  – Both exist in the same m-bit ID space
How are key IDs mapped to node IDs?
  – A key is stored at its successor: the node with the same or next higher ID, wrapping around modulo 2^m
[Figure: circular ID space with nodes N1 and N10 and keys K0, K4, K7, K11]
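The key-to-node mapping can be sketched as follows. Assumptions: a tiny hypothetical ID length `M = 4` (16 IDs) for readability, where real Chord uses the full 160-bit SHA-1 output, and hypothetical helper names `chord_id` and `successor`.

```python
import bisect
import hashlib

M = 4  # hypothetical identifier length in bits: IDs live in [0, 2**M)


def chord_id(text: str, m: int = M) -> int:
    """SHA-1 of a key (or of a node's IP address), reduced to the m-bit ID space."""
    return int.from_bytes(hashlib.sha1(text.encode()).digest(), "big") % (2 ** m)


def successor(node_ids: list[int], key_id: int, m: int = M) -> int:
    """A key is stored at the first node whose ID is >= the key ID, wrapping modulo 2**m."""
    ids = sorted(node_ids)
    i = bisect.bisect_left(ids, key_id % (2 ** m))
    return ids[i % len(ids)]
```

With only nodes N1 and N10 on a 16-ID ring, K0 goes to N1, K4 and K7 go to N10, and K11 wraps around past the highest node back to N1. A real lookup would call `successor(nodes, chord_id("some-key"))`.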
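Chord (continued): Finger Tables and Lookup(K19)
[Figure: Chord ring (m = 7, IDs 0 to 127) with nodes N5, N10, N20, N32, N60, N80, N99, N110 and key K19]
Finger table of N99 (start_k = 99 + 2^k mod 128; interval_k = [start_k, start_k+1)):
start   interval    succ
100     [100,101)   110
101     [101,103)   110
103     [103,107)   110
107     [107,115)   110
115     [115,3)     5
3       [3,35)      5
35      [35,99)     60
Finger table of N5 (excerpt):
start   interval    succ
………
9       [9,13)      10
13      [13,21)     20
Lookup(K19) from N99: 19 lies in [3,35), so the query goes to N5; at N5, 19 lies in [13,21) and 19 ≤ 20, so K19 is stored at N20.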
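The finger tables and the interval walk for Lookup(K19) can be reproduced with this sketch. Assumptions: the ring and m = 7 are taken from the figure, the helper names are hypothetical, and `lookup` follows the slide's interval-based walk. The Chord paper's `find_successor` uses closest-preceding-finger routing instead and may visit different intermediate nodes.

```python
import bisect

M = 7  # 2**7 = 128 identifiers, as in the N99 finger table


def successor(ring: list[int], ident: int) -> int:
    """First node ID >= ident on the sorted ring, wrapping around."""
    i = bisect.bisect_left(ring, ident % 2 ** M)
    return ring[i % len(ring)]


def finger_table(ring, n):
    """Rows (start, interval end, succ) with start_k = (n + 2**k) mod 2**M, k = 0..M-1."""
    starts = [(n + 2 ** k) % 2 ** M for k in range(M)]
    ends = starts[1:] + [n]          # the last interval ends at the node itself
    return [(s, e, successor(ring, s)) for s, e in zip(starts, ends)]


def in_interval(x, start, end):
    """x in the half-open circular interval [start, end)."""
    if start < end:
        return start <= x < end
    return x >= start or x < end


def lookup(ring, node, key):
    """Interval walk from the slide: returns (node storing key, path of nodes visited)."""
    path = [node]
    while key != node:
        start, _, succ = next(row for row in finger_table(ring, node)
                              if in_interval(key, row[0], row[1]))
        if in_interval(key, start, succ + 1):   # key in [start, succ]: succ stores it
            return succ, path + [succ]
        node = succ                             # otherwise jump closer and repeat
        path.append(node)
    return node, path


RING = [5, 10, 20, 32, 60, 80, 99, 110]
```

`lookup(RING, 99, 19)` visits N99, then N5, and ends at N20, matching the walk described for K19. Each node keeps only m = 7 fingers, yet every hop covers a large share of the remaining distance.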
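Analysis of Chord
In a system with N nodes and K keys, with high probability:
  – Each node manages about K/N keys, and no node manages more than O(log N) · K/N
  – Each node stores routing information about only O(log N) other nodes
  – Lookups are resolved in O(log N) hops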
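To make these bounds concrete, here is a worked example with hypothetical sizes chosen only for round numbers (N = 2^10 nodes, K = 2^20 keys):

```latex
\begin{aligned}
N &= 2^{10} = 1024, \qquad K = 2^{20} = 1\,048\,576 \\
\frac{K}{N} &= 2^{10} = 1024 \ \text{keys per node on average} \\
\log_2 N &= 10 \ \text{(order of the number of hops per lookup)}
\end{aligned}
```

By contrast, if each node knew only its immediate successor, a lookup could take up to N − 1 = 1023 hops.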
Benefits of P2P Networks
Ideally, a P2P network:
  – Allows peers anywhere to share information and/or resources dynamically
  – Is decentralized
  – Is resilient to failures and network changes
  – Uses resources located closer to the requesting nodes