Improving Search in Peer-to-Peer Networks. Beverly Yang, Hector Garcia-Molina. Presented by Shreeram Sahasrabudhe.
Goals: Three search techniques: 1. Iterative Deepening 2. Directed BFS 3. Local Indices. Evaluation and extensive measurements of these techniques on the Gnutella network. Ready-to-use results and recommendations. In short: reduce the number of nodes that handle a query.
Current Techniques: Gnutella: Breadth-First Search (BFS) with depth limit D (typically 7). Disadvantages: wastes resources, inefficient. Freenet: Depth-First Search (DFS). Disadvantage: poor response time.
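Iterative Deepening: Requires a system-wide policy P = {a, b, c} and a time W between successive iterations. Example with source S and P = {a, b, c}: S sends the query with TTL a, and the nodes at depth a freeze it. S waits for W; if the query is still unsatisfied, S sends Resend [(TTL a) + query_id], and the frozen nodes forward the query with TTL b-a so that it reaches depth b.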
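The iteration over the policy depths can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the graph is an in-memory adjacency dict, `results_at` is a hypothetical map from node to the results it holds, `wanted` stands for the desired number of results, and the wait W between iterations is not simulated.

```python
from collections import deque


def nodes_by_depth(graph, source, max_depth):
    """Breadth-first search returning {depth: [nodes]} up to max_depth."""
    depth_of = {source: 0}
    levels = {0: [source]}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if depth_of[node] == max_depth:
            continue
        for neighbor in graph[node]:
            if neighbor not in depth_of:
                depth_of[neighbor] = depth_of[node] + 1
                levels.setdefault(depth_of[neighbor], []).append(neighbor)
                queue.append(neighbor)
    return levels


def iterative_deepening(graph, source, policy, results_at, wanted):
    """Search to each depth in `policy` in turn until `wanted` results arrive.

    Returns (results, processed): the results found and the number of nodes
    that processed the query.
    """
    levels = nodes_by_depth(graph, source, max(policy))
    results, processed, previous_depth = [], 0, 0
    for depth in policy:
        # Nodes up to previous_depth already processed the query; the frozen
        # frontier forwards it only to the new depths previous_depth+1..depth.
        for d in range(previous_depth + 1, depth + 1):
            for node in levels.get(d, []):
                processed += 1
                results.extend(results_at.get(node, []))
        if len(results) >= wanted:
            break  # satisfied: deeper nodes never see the query
        previous_depth = depth
    return results, processed
```

In this sketch a query that is satisfied at the first policy depth is processed only by the nodes at that depth, which is the saving the technique aims for.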
Directed BFS: Send queries to only a subset of nodes. The subset is selected by heuristics such as: select the node … that has returned the highest number of results for previous queries; whose response messages have taken the lowest average number of hops; that has forwarded the most messages to our client; that has the shortest message queue.
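The heuristics listed above can be sketched as scoring functions over per-neighbor statistics. This is an illustrative sketch only: the `NeighborStats` fields, the heuristic names, and `select_neighbors` are hypothetical names chosen for the example, not part of Gnutella or of the original work.

```python
from dataclasses import dataclass


@dataclass
class NeighborStats:
    """Per-neighbor statistics a client might keep (hypothetical fields)."""
    name: str
    results_returned: int = 0       # results returned for earlier queries
    avg_response_hops: float = 0.0  # average hops taken by its responses
    messages_forwarded: int = 0     # messages it has forwarded to us
    queue_length: int = 0           # length of its message queue


# Each heuristic maps a neighbor to a score; the highest score wins.
HEURISTICS = {
    "most_results": lambda n: n.results_returned,
    "fewest_hops": lambda n: -n.avg_response_hops,
    "most_forwarded": lambda n: n.messages_forwarded,
    "shortest_queue": lambda n: -n.queue_length,
}


def select_neighbors(neighbors, heuristic, count=1):
    """Return the `count` best neighbors under the named heuristic."""
    score = HEURISTICS[heuristic]
    return sorted(neighbors, key=score, reverse=True)[:count]
```

Keeping the heuristics in a table makes it easy to compare them against each other on the same statistics.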
Local Indices: Each node n maintains an index over the data of all nodes within r hops, so a node can process a query on behalf of every node within r hops. A small r means less storage (e.g. 70 KB for r = 1). A policy P lists the depths at which the query is processed (example: source S, P = {1, 5}).
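A node's r-hop index can be sketched as follows. This is a minimal sketch under assumptions: the network is an adjacency dict, `files` is a hypothetical map from node to the file names it shares, and a query is a plain substring match, which is simpler than a real client's matching.

```python
from collections import deque


def within_hops(graph, start, r):
    """Return all nodes at most r hops from start (including start)."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == r:
            continue
        for neighbor in graph[node]:
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return set(hops)


def build_local_index(graph, files, node, r):
    """Map each file name to the set of nodes within r hops that hold it."""
    index = {}
    for owner in within_hops(graph, node, r):
        for name in files.get(owner, []):
            index.setdefault(name, set()).add(owner)
    return index


def process_query(index, keyword):
    """Answer a query from the local index alone: {file name: owners}."""
    return {name: owners for name, owners in index.items() if keyword in name}
```

With r = 1, a node answers for itself and its direct neighbors, so those neighbors do not need to process the query themselves.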
More work: Node join: the joining node sends a Join message with a TTL of r, containing metadata over its collection. A node receiving a Join message sends back a Join message with its own metadata. Periodic refreshes. Cost?? QueryJoinRatio = average ratio of queries to join messages. QueryUpdateRatio = average ratio of queries to update messages.
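The join exchange can be sketched as below. This is a simplified sketch, not the protocol as specified: `indices` is a hypothetical per-node map from peer to metadata, the message count assumes one Join and one return Join per receiver (forwarding hops are ignored), and the sketch does not update the indices of existing nodes that become closer to each other through the newcomer.

```python
from collections import deque


def reachable(graph, start, ttl):
    """Nodes a message with the given TTL reaches from start (start excluded)."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == ttl:
            continue
        for neighbor in graph[node]:
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    del hops[start]
    return set(hops)


def join(graph, metadata, indices, newcomer, links, r):
    """Connect `newcomer` to `links`, then exchange Join messages within r hops.

    Returns the number of Join messages counted under the assumption above.
    """
    graph[newcomer] = list(links)
    for peer in links:
        graph[peer].append(newcomer)
    indices[newcomer] = {newcomer: metadata[newcomer]}
    receivers = reachable(graph, newcomer, r)
    for peer in receivers:
        indices[peer][newcomer] = metadata[newcomer]  # Join message
        indices[newcomer][peer] = metadata[peer]      # return Join message
    return 2 * len(receivers)
```

Dividing such a per-join message count by QueryJoinRatio would give one rough way to charge the join overhead to each query; that reading of the ratio is an assumption here.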
Experiment: Data collection: observed Gnutella network traffic for 1 month and determined general statistics such as the average number of files shared per user, query strings, etc. Iterative Deepening: for each query Q sent, log the response messages arriving within 2 min; send Ping messages to all neighbors to record hops and IP addresses. The same data is used for Local Indices. Directed BFS: same as above, but each query is sent to a single node.
Cost: Bandwidth cost in BFS, and processing cost. Quantities used: nodes at depth n; redundant edges between depths n-1 and n; size of the query message; total records; response messages from nodes at depth n; size of the header; size of a record.
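One plausible way to combine the quantities above into a bandwidth figure is sketched below. The exact form is an assumption made for illustration: every node and every redundant edge at depth n is charged one copy of the query, and every response from depth n is charged n hops on its way back. The `DepthStats` field names are hypothetical, and processing cost is not modeled.

```python
from dataclasses import dataclass


@dataclass
class DepthStats:
    """Per-depth measurements for one query (field names are hypothetical)."""
    nodes: int            # nodes at depth n
    redundant_edges: int  # redundant edges between depths n-1 and n
    responses: int        # response messages from nodes at depth n
    records: int          # total records in those responses


def bfs_bandwidth_cost(depths, query_size, header_size, record_size):
    """Bandwidth in bytes for one BFS query; depths[0] describes depth 1.

    Assumed form: query copies are counted once per node and per redundant
    edge at each depth, and response bytes from depth n are multiplied by n.
    """
    total = 0
    for n, stats in enumerate(depths, start=1):
        query_bytes = query_size * (stats.nodes + stats.redundant_edges)
        response_bytes = header_size * stats.responses + record_size * stats.records
        total += query_bytes + n * response_bytes
    return total
```

Truncating `depths` to the first few entries gives the cost of a shallower search under the same assumption, which is how savings from a smaller depth could be estimated.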
Results: Iterative Deepening. Neighbors = 8. Desired number of results Z = 50. Policies P = {P_d = {d, d+1, …, D} for d = 1, 2, 3, …, D}. Both d and W affect cost (including "overshooting"), and both affect time. Chart: cost.
Directed BFS: Studied 8 heuristics. "Random neighbor" is the baseline for comparison. Chart: cost.
Local Indices
Conclusions: Three new search techniques specified and tested. Recommendation: Local Indices with r = 1. Savings: 61% bandwidth, 49% processing.