PeerCluster: A Cluster-Based Peer-to-Peer System
Xin-Mao Huang, Cheng-Yue Chang, and Ming-Syan Chen, Fellow, IEEE
ECE 6102
Qiyu Liu, Ethan Trewhitt

2 Agenda
- Background
- Structure
- Functional Protocols
- Structural Protocols
- Scaling
- Performance

3 Background – Existing P2P Systems
- Centralized system - Napster
  - Pro: Low cost to resolve queries
  - Con: Single point of failure
- Decentralized/unstructured - Gnutella
  - Pro: Fault-tolerant, resilient to joins/leaves
  - Con: Search mechanism scales poorly
- Decentralized/structured - PeerCluster
  - Same benefits as decentralized/unstructured
  - Cluster structure reduces broadcast flooding

4 Background – PeerCluster
- Principle of interest grouping
  - A given user has few interests
  - Queries relate to interests
- How to exploit?
  - Logically group users with similar topics
  - Increases query efficiency

5 Background – PeerCluster

6 Background – Query Resolution
A node receives a query:
    if (query topic == present cluster’s interest topic) {
        broadcast to all nodes in present cluster    // intracluster broadcasting
    } else {
        route to responsible node in corresponding interest cluster    // intercluster broadcasting
    }
- Intra/intercluster broadcasting are the main operations in query resolution
- How to implement?

7 Structure – Hypercube
- Three interests can be implemented with a 5-D hypercube
- Nodes & edges are virtual
- One hypercube address → one computer
- However, one computer → multiple hypercube addresses

8 Structure – Clusters
- Interest-based
- Realized with hypercubes within the overall system hypercube
- Initial size based on popularity, Huffman coding

9 Structure – Tree Creation
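- Assume an n-dimensional hypercube with k different interest topics
- $I_j$: jth interest topic, where $0 \le j \le k - 1$
- $\mathrm{pop}[I_j]$: popularity of $I_j$, with $0 < \mathrm{pop}[I_j] < 1$ and $\sum_{j=0}^{k-1} \mathrm{pop}[I_j] = 1$
- Construct Huffman tree based on $\mathrm{pop}[I_j]$
- Cluster size $= 2^{\,n - \mathrm{length}(\mathrm{prefix}[I_j])}$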
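The tree-creation rule above can be tried out with a short Python sketch. It builds a Huffman tree over topic popularities and derives each cluster's size from the length of its prefix. The topic names, popularity values, and function names (`huffman_prefixes`, `cluster_sizes`) are hypothetical and chosen only for illustration. Which child receives bit 0 or 1 is also an assumption the slides do not specify.

```python
import heapq
from itertools import count


def huffman_prefixes(pop):
    """Return {topic: bit-string prefix} from a Huffman tree built on pop."""
    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(p, next(tiebreak), {topic: ""}) for topic, p in pop.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least popular subtrees, extending their prefixes.
        p0, _, left = heapq.heappop(heap)
        p1, _, right = heapq.heappop(heap)
        merged = {t: "0" + code for t, code in left.items()}
        merged.update({t: "1" + code for t, code in right.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]


def cluster_sizes(pop, n):
    """Cluster size = 2 ** (n - length(prefix)) for each topic."""
    prefixes = huffman_prefixes(pop)
    return {t: 2 ** (n - len(code)) for t, code in prefixes.items()}


# Hypothetical popularities for three topics in a 5-D hypercube.
pop = {"music": 0.5, "movies": 0.3, "games": 0.2}
sizes = cluster_sizes(pop, n=5)
```

With these example values the most popular topic gets a 1-bit prefix and half of the 32 addresses, and the cluster sizes together fill the whole hypercube.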

10 Structure – Routing Table
- Routing table created for each computer
- Must keep track of the mapping of neighboring computers to send messages
- addr(A): addresses owned by computer A
- NH(A): neighboring hypercube addresses
  $\mathrm{NH}(A) = \bigcup_{a_i \in \mathrm{addr}(A)} \mathrm{Ne}(a_i) - \mathrm{addr}(A)$
  where $\mathrm{Ne}(a_i)$ is the set of hypercube addresses adjacent to address $a_i$
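As a small illustration of the NH(A) definition, the following Python sketch computes the neighboring addresses of a computer that owns several hypercube addresses. It assumes that two addresses are adjacent when they differ in exactly one bit. The function names `ne` and `nh` and the example addresses are made up for this sketch.

```python
def ne(addr, n):
    """Hypercube addresses adjacent to addr: flip each of the n bits."""
    return {addr ^ (1 << i) for i in range(n)}


def nh(owned, n):
    """NH(A): union of Ne(a_i) over a_i in addr(A), minus addr(A)."""
    neighbours = set()
    for a in owned:
        neighbours |= ne(a, n)
    return neighbours - set(owned)


# Hypothetical computer A owning two addresses in a 3-D hypercube.
addr_a = {0b000, 0b001}
print(sorted(format(x, "03b") for x in nh(addr_a, 3)))
```

Subtracting addr(A) removes links between A's own addresses, so the routing table only needs entries for addresses held by other computers.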

11 Structure – Assigned Tree
- Assigned tree records the number of free addresses in every cluster
- Root address is the lowest address
- Parent and child addresses differ by 1 bit only
- Child address is longer than parent address
- Present address manages assignment of child addresses
- Every address records the number of free addresses of all its children
- Initial number of free addresses of children = total number of subtrees
- When a parent address wants to assign a free address to a joining request, it checks the number of free addresses starting from the lowest address

12 Functional Protocol – Broadcast
Proc_Broadcast(subq, msg, node_addr, step)
    for (i = step to subq - 1) {
        dest_addr = node_addr xor 2^i;
        send(subq, msg, dest_addr, i + 1);
    }
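The broadcast procedure above can be simulated in a few lines of Python. This is a sketch under some assumptions: `subq` is taken to be the number of dimensions of the cluster's hypercube, the cluster occupies the low-order bits of the address, and `send` is a stand-in that delivers the message and immediately runs the procedure on the receiver. The `received` list is added only to observe the result.

```python
def proc_broadcast(subq, msg, node_addr, step, received):
    """Forward msg along dimensions step .. subq-1 of the cluster hypercube."""
    for i in range(step, subq):
        dest_addr = node_addr ^ (1 << i)  # flip bit i, i.e. xor with 2^i
        send(subq, msg, dest_addr, i + 1, received)


def send(subq, msg, dest_addr, step, received):
    """Stand-in for the network: deliver, then let the receiver continue."""
    received.append(dest_addr)
    proc_broadcast(subq, msg, dest_addr, step, received)


# Broadcast from address 000 in a 3-D cluster.
received = []
proc_broadcast(3, "query", 0b000, 0, received)
```

Because each receiver only forwards along dimensions above the one it was reached on, every other node in the 3-D cluster gets the message exactly once, for 7 messages in total.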

13 Functional Protocol – Route
Proc_Route(msg, dest_addr, node_addr)
    if (dest_addr != node_addr) {
        i = Compare(dest_addr, node_addr);    // index of a bit where the two addresses differ
        send(msg, dest_addr, node_addr xor 2^i);
    }
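To see how the routing procedure converges, here is a minimal Python sketch that follows a message hop by hop. It assumes `Compare` returns the index of the lowest bit in which the two addresses differ; the slides do not say which differing bit is chosen. `route_path` is a hypothetical helper that collects the addresses visited.

```python
def compare(dest_addr, node_addr):
    """Index of the lowest bit where the two addresses differ (assumed rule)."""
    diff = dest_addr ^ node_addr
    return (diff & -diff).bit_length() - 1


def route_path(dest_addr, node_addr):
    """Addresses visited when each hop corrects one differing bit."""
    path = [node_addr]
    while node_addr != dest_addr:
        i = compare(dest_addr, node_addr)
        node_addr ^= 1 << i  # next hop: node_addr xor 2^i
        path.append(node_addr)
    return path


# Route from 00011 to 10110 in a 5-D hypercube.
path = route_path(0b10110, 0b00011)
```

Each hop fixes one differing bit, so the number of hops equals the Hamming distance between source and destination, three in this example.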

14 JOIN Protocol
1. Joining computer A finds any computer B in the system
2. Ask computer B to find computer C with the same major interest
3. Ask computer C to find computer D that holds an available alias address*
4. Take the available address and notify neighbors
5. Computer D notifies its parent nodes of one less available address
*If there are no available addresses, a cluster expansion must be performed

15 LEAVE Protocol
1. Leaving computer A finds the root node B (smallest address) of the cluster
2. A donates its address (and aliases) to computer B
3. Computer B notifies its neighbors that A has left

16 SEARCH Protocol
1. Searching computer A wants to find something
2. A queries computer B, the computer in the corresponding interest cluster that has the same postfix
3. Computer B broadcasts the query to its cluster
4. Computers in the queried cluster respond directly to A with relevant results

17 Cluster Expansion
- Runs whenever a computer wants to join but the cluster is full
- Query the utilization rates of neighboring clusters
- Choose a neighboring cluster
- The neighboring cluster splits and loans the upper half of its addresses
- Upper-half addresses rejoin at the lower half

18 Cluster Expansion Issues
- Expansion and splitting cause partitions
- Clusters are no longer a single hypercube
- System restoration consolidates clusters
- If the cluster can’t be expanded or the system is full, the system must be expanded

19 System Expansion
- Easier than cluster expansion
- Addresses gain an additional bit, entire system doubles in size
- Each node becomes two
- Each cluster doubles in size
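One way to picture system expansion is the Python sketch below. It rests on two assumptions the slides do not state: the additional bit is appended at the low-order end, so cluster prefixes stay unchanged, and the computer that owned an address initially owns both resulting addresses. The `owner` mapping and the `expand_system` function are hypothetical.

```python
def expand_system(owner, n):
    """Double an n-D address space: each address a becomes a0 and a1."""
    expanded = {}
    for addr, computer in owner.items():
        expanded[addr << 1] = computer        # a followed by 0
        expanded[(addr << 1) | 1] = computer  # a followed by 1
    return expanded, n + 1


# Hypothetical 2-D system; the high bit is the cluster prefix.
owner = {0b00: "A", 0b01: "B", 0b10: "C", 0b11: "C"}
new_owner, new_n = expand_system(owner, 2)
```

Under these assumptions the prefix length stays fixed while n grows by one, so every cluster doubles in size, matching the cluster-size formula from tree creation.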

20 Performance Setup
- Uses data from the Open Directory Project
- Compares Gnutella and PeerCluster
- Determined the “query efficiency”, which is the ratio of files found to query messages sent
- Varied the Search Limit (SL), which acts like a TTL value
- Also varied the number of interest clusters
- Base 4 vs. base 2

21 Performance

22 Questions?

