Published by Stanley Daniels. Modified over 9 years ago.
1
IR Techniques For P2P Networks
Information Retrieval Techniques For Peer-To-Peer Networks
Demetrios Zeinalipour-Yazti, Vana Kalogeraki and Dimitrios Gunopulos
Presented by Ranjan Dash
2
Layout
- Introduction
- P2P Network IR Techniques
- PeerWare Infrastructure and experiments
3
Introduction
Major challenge: efficiently searching the content of other peers
Definition: a large number of peers collaborate dynamically, in an ad hoc manner, and share information in large-scale distributed environments without centralized coordination
P2P environment characteristics:
- Each peer has a database or collection of documents
- A query contains a set of keywords
- A reply message contains pointers to matching documents
Different from static data environments:
- No central repository
- Nodes join and leave dynamically, in an ad hoc manner
4
P2P Network IR Techniques
- Breadth-First Search (BFS)
- Random Breadth-First Search (RBFS)
- Intelligent Search Mechanism (ISM)
- Directed BFS and >RES
- Random Walker Searches
- Randomized Gossiping
- Local Routing Indices
- Centralized Approaches
- Searching Object Identifiers
- Distributed IR
5
P2P Network IR Techniques
Breadth-First Search (BFS)
- Widely used in file-sharing systems
- A node propagates the query to all of its neighbors except the sender
- The QueryHit message (number of documents, bandwidth info) travels back along the same path
- Simple, and guarantees a high hit rate
- Poor performance and network utilization
- A low-bandwidth node becomes a bottleneck
- Can be improved by attaching a time-to-live (TTL) to each query
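The flooding behavior described above can be sketched as follows. This is a minimal illustration, not code from the presentation: the function name `bfs_flood`, the adjacency-list representation, and the sample topology are all assumptions. Duplicate deliveries are counted as messages but not forwarded again, which is one common way to model flooding.

```python
from collections import deque


def bfs_flood(graph, source, ttl):
    """Flood a query from `source`; return (peers reached, messages sent).

    `graph` is a hypothetical adjacency list: peer -> list of neighbors.
    """
    reached = {source}
    messages = 0
    queue = deque([(source, None, ttl)])  # (peer, sender, remaining TTL)
    while queue:
        peer, sender, remaining = queue.popleft()
        if remaining == 0:
            continue  # TTL expired: the query is not forwarded further
        for neighbor in graph[peer]:
            if neighbor == sender:
                continue  # never send the query back to its sender
            messages += 1
            if neighbor not in reached:  # duplicates cost a message but stop here
                reached.add(neighbor)
                queue.append((neighbor, peer, remaining - 1))
    return reached, messages


# Illustrative topology (made up for this sketch).
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
reached, messages = bfs_flood(graph, "A", ttl=3)
```

Lowering `ttl` shrinks both the set of peers reached and the message count, which is the trade-off the TTL bullet refers to.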
6
P2P Network IR Techniques
Random Breadth-First Search (RBFS)
- Dramatic improvement over BFS
- A node forwards the query only to a fraction of its peers, selected at random
- Needs no global knowledge; each node makes a fast, local decision
- Probabilistic: the query might not reach some large network segments
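A minimal sketch of the random forwarding rule, under the same assumptions as any simple flooding model: the name `rbfs`, the `fraction` parameter, the rounding-up of the subset size, and the star topology are illustrative choices, not details from the slides.

```python
import math
import random
from collections import deque


def rbfs(graph, source, ttl, fraction=0.5, rng=None):
    """Forward a query to a random fraction of each peer's neighbors.

    Returns (peers reached, messages sent). `graph` is a hypothetical
    adjacency list; `fraction` is rounded up to a whole number of peers.
    """
    rng = rng or random.Random()
    reached = {source}
    messages = 0
    queue = deque([(source, None, ttl)])  # (peer, sender, remaining TTL)
    while queue:
        peer, sender, remaining = queue.popleft()
        if remaining == 0:
            continue
        candidates = [n for n in graph[peer] if n != sender]
        count = math.ceil(fraction * len(candidates))
        for neighbor in rng.sample(candidates, count):  # purely local, random choice
            messages += 1
            if neighbor not in reached:
                reached.add(neighbor)
                queue.append((neighbor, peer, remaining - 1))
    return reached, messages


# Illustrative star topology: one hub with four leaves.
star = {"hub": ["a", "b", "c", "d"], "a": ["hub"], "b": ["hub"], "c": ["hub"], "d": ["hub"]}
reached, messages = rbfs(star, "hub", ttl=1, fraction=0.5, rng=random.Random(0))
```

With `fraction=0.5` the hub contacts only two of its four leaves, so two leaves are never reached in this run; that is the segment-miss risk noted in the last bullet.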
7
P2P Network IR Techniques
Intelligent Search Mechanism (ISM)
- Quick and efficient, with low communication cost
- Propagates a query only to the peers most likely to reply
- Consists of two components that run in each peer:
  - Profile mechanism
  - Relevance rank
- Works well when queries exhibit locality
- Always forwarding to the same neighbors can starve new peers
- Solution: add a small random subset of peers to the most relevant set
8
P2P Network IR Techniques
Profile mechanism
- Each peer builds a profile for each of its neighboring peers
- The profile keeps the T most recent queries and QueryHits, along with the number of results
- A least-recently-used (LRU) replacement policy keeps only the most recent queries
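One way to picture such a profile is a small bounded map with LRU eviction. The class name `NeighborProfile`, its method names, and the choice to key entries by the query string are assumptions made for this sketch.

```python
from collections import OrderedDict


class NeighborProfile:
    """Keeps the T most recent queries a neighbor answered (hypothetical API)."""

    def __init__(self, capacity):
        self.capacity = capacity       # T in the slide's notation
        self.entries = OrderedDict()   # query text -> number of results returned

    def record_hit(self, query, num_results):
        """Store a QueryHit; evict the least recently used entry when full."""
        if query in self.entries:
            self.entries.move_to_end(query)  # refresh recency of a repeated query
        self.entries[query] = num_results
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop the oldest entry


profile = NeighborProfile(capacity=2)
profile.record_hit("linux kernel", 1)
profile.record_hit("jazz music", 2)
profile.record_hit("linux kernel", 5)   # refreshes "linux kernel"
profile.record_hit("p2p search", 3)     # evicts "jazz music"
```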
9
P2P Network IR Techniques
Relevance rank
- Ranks the neighbors to decide which ones a query should be forwarded to
- Computes a rank for each peer Pi with respect to a query q
- Qsim is the cosine similarity between two queries
- When the exponent applied to Qsim (α) is 0, only the number of results returned in the past matters, as in >RES
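The ranking formula itself did not survive in this transcript. The sketch below assumes a commonly cited form: the rank of a peer is the sum, over the past queries it answered, of Qsim(past query, q) raised to a power α, multiplied by the number of results it returned for that past query. The function names, the whitespace tokenization, and the sample profile are assumptions for illustration only.

```python
import math
from collections import Counter


def qsim(query_a, query_b):
    """Cosine similarity between two keyword queries (term-frequency vectors)."""
    a = Counter(query_a.lower().split())
    b = Counter(query_b.lower().split())
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def relevance_rank(profile, query, alpha=1.0):
    """Assumed form: sum of Qsim(past, query)**alpha * results over past queries.

    `profile` is a list of (past query, number of results) pairs for one peer.
    """
    return sum(qsim(past, query) ** alpha * results for past, results in profile)


# Hypothetical profile of one neighbor.
profile_pairs = [("linux kernel", 10), ("jazz music", 50)]
rank_similar = relevance_rank(profile_pairs, "linux kernel source", alpha=1.0)
rank_counts_only = relevance_rank(profile_pairs, "linux kernel source", alpha=0)
```

With `alpha=0` every similarity term becomes 1 (Python evaluates `0.0 ** 0` as `1.0`), so the rank collapses to the total number of past results, which matches the >RES-like behavior the slide describes.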
10
P2P Network IR Techniques
Directed BFS and >RES
- Each node forwards a query to a subset of its peers, chosen from aggregated statistics
- >RES sends the query to the k peers that returned the most results for the last m queries
- With k = 1 and m = 10, BFS turns into a depth-first search (DFS)
- Similar to ISM, but simpler
- Does not try to find the nodes whose content is related to the query
- Performs well because it routes queries toward the larger network segments
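The selection rule for >RES can be sketched in a few lines. The function name, the shape of `history` (peer mapped to a list of result counts, oldest first), and the alphabetical tie-break are assumptions; the sketch also assumes m is at least 1.

```python
def most_results_neighbors(history, k, m):
    """Pick the k peers with the most results over their last m answered queries.

    `history` maps peer -> list of result counts, oldest first (hypothetical
    layout). Ties are broken alphabetically so the output is deterministic.
    """
    totals = {peer: sum(counts[-m:]) for peer, counts in history.items()}
    return sorted(totals, key=lambda peer: (-totals[peer], peer))[:k]


# Made-up statistics for three neighbors.
history = {"A": [5, 0, 0], "B": [1, 1, 1], "C": [0, 0, 4]}
recent_best = most_results_neighbors(history, k=2, m=2)
overall_best = most_results_neighbors(history, k=2, m=3)
```

Note that the rule never looks at what the query is about, only at how many results each peer returned, which is the sense in which it is quantitative rather than content-aware.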
11
P2P Network IR Techniques
Random-Walker Searches
- Each node forwards a query message, called a walker, to one randomly chosen peer
- Can be extended from 1 walker to k walkers
- Resembles RBFS, but the number of messages grows linearly rather than exponentially
- Like RBFS, it does not use content relevance to guide the query
- Adaptive Probabilistic Search (APS) is similar, but uses feedback from previous searches to guide future walkers probabilistically
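A minimal k-walker sketch follows. The function name, the `has_match` callback, and the two-node topology are hypothetical; walkers here take a fixed number of steps and do not coordinate with each other.

```python
import random


def k_walker_search(graph, source, has_match, k, ttl, rng=None):
    """Launch k independent walkers, each taking up to `ttl` random steps.

    Returns (peers with a match, messages sent). `has_match(peer)` is a
    hypothetical callback that tells whether a peer holds matching documents.
    """
    rng = rng or random.Random()
    hits = set()
    messages = 0
    for _walker in range(k):
        peer = source
        for _step in range(ttl):
            if not graph[peer]:
                break  # dead end: this walker stops
            peer = rng.choice(graph[peer])  # forward to one random neighbor
            messages += 1
            if has_match(peer):
                hits.add(peer)
    return hits, messages


# Two-node topology: the first step from A always lands on B.
pair = {"A": ["B"], "B": ["A"]}
hits, messages = k_walker_search(pair, "A", lambda peer: peer == "B", k=3, ttl=4)
```

The message count is bounded by `k * ttl`, which is the linear growth the slide contrasts with RBFS.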
12
P2P Network IR Techniques
Randomized Gossiping: PlanetP
- A global inverted index is constructed in parts: each node summarizes its local index as a Bloom filter
- Each node propagates its Bloom filter to the rest of the network through gossiping
- Advantages of the Bloom filter: smaller messages and savings in network I/O
- PlanetP has a scalability problem
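To make the Bloom filter idea concrete, here is a small generic sketch. It is not PlanetP's implementation: the class name, the bit-array size, the number of hash functions, and the use of salted SHA-256 are all assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Compact, lossy summary of a set of terms (generic sketch)."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, term):
        # Derive several bit positions by salting the term with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{term}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, term):
        for pos in self._positions(term):
            self.bits |= 1 << pos

    def might_contain(self, term):
        """False means definitely absent; True means possibly present."""
        return all((self.bits >> pos) & 1 for pos in self._positions(term))


summary = BloomFilter()
for term in ["peer", "gossip", "index"]:
    summary.add(term)
```

A node would gossip `summary.bits` (a fixed 256 bits here) instead of its full term list. A filter can report false positives but never false negatives, so a remote peer can safely skip nodes whose filter says a term is absent.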
13
P2P Network IR Techniques
Local Routing Indices, by Arturo Crespo and Hector Garcia-Molina
- A hybrid technique that uses local indices containing the "direction" toward the documents
- Three techniques: compound routing index (CRI), hop-count routing index (HRI), and exponentially aggregated routing index (ERI)
- Good for topologies in which only a few nodes have very large numbers of neighbors (tree, tree with cycles)
- The routing indices are similar to the routing tables used by the Bellman–Ford routing algorithm
- CRI: a node q maintains statistics for each neighbor that indicate how many documents are reachable through that neighbor
- HRI: a CRI kept for each of k hops; the storage cost is prohibitive for large k
- ERI: addresses the storage issue of HRI by aggregating the per-hop counts with a cost formula
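The CRI idea can be illustrated with a per-neighbor table of document counts. This is a simplified stand-in: the table layout, the function name, and the single-topic ranking are assumptions, and the sketch does not reproduce the goodness estimator from the original routing-indices work.

```python
def rank_neighbors(routing_index, topic):
    """Order neighbors by how many documents on `topic` are reachable through them.

    `routing_index` maps neighbor -> {topic: reachable document count}
    (hypothetical layout). Ties are broken alphabetically.
    """
    return sorted(
        routing_index,
        key=lambda neighbor: (-routing_index[neighbor].get(topic, 0), neighbor),
    )


# Made-up statistics held by one node about its three neighbors.
routing_index = {
    "B": {"databases": 20, "networks": 10},
    "C": {"databases": 0, "networks": 50},
    "D": {"databases": 30},
}
db_order = rank_neighbors(routing_index, "databases")
net_order = rank_neighbors(routing_index, "networks")
```

A query about databases would be sent toward D first, while a query about networks would go toward C, which is the "direction" the index provides.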
14
P2P Network IR Techniques
Centralized Approaches
- Maintain an inverted index over all the documents in the participating hosts' collections (Google, Yahoo, Napster)
- Each joining peer A uploads an index of all its shared documents to the central repository R
- A querying node B searches A's documents through R
- B can then communicate with A directly, using an out-of-band protocol such as HTTP
- Kazaa is slightly different: it uses a set of more powerful peers that act as central repositories
- A different kind of animal from the rest of the techniques
- Simple, robust, shorter search time, guaranteed to find all results
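The upload-then-search flow through a central repository can be sketched with a plain inverted index. The class name, method names, whitespace tokenization, and the AND semantics for multi-keyword queries are assumptions made for illustration.

```python
from collections import defaultdict


class CentralRepository:
    """Central inverted index R over all peers' documents (hypothetical API)."""

    def __init__(self):
        self.index = defaultdict(set)  # keyword -> {(peer, document name)}

    def upload(self, peer, documents):
        """A joining peer uploads an index of its shared documents."""
        for name, text in documents.items():
            for word in text.lower().split():
                self.index[word].add((peer, name))

    def search(self, query):
        """Return (peer, document) pointers that match every query keyword."""
        terms = query.lower().split()
        if not terms:
            return set()
        results = set(self.index.get(terms[0], set()))
        for term in terms[1:]:
            results &= self.index.get(term, set())
        return results


repo = CentralRepository()
repo.upload("A", {"a.txt": "peer to peer search"})
repo.upload("C", {"c.txt": "web search engine"})
pointers = repo.search("peer search")
```

The result only tells B where the documents live; fetching `a.txt` from peer A would happen directly between B and A, outside the repository.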
15
P2P Network IR Techniques
Searching Object Identifiers
- Distributed file indexing systems: Chord, OceanStore, Content-Addressable Network (CAN), and Freenet
- Searches are efficient because they use object identifiers (a hash code of a file's name) rather than keywords
- An object lookup operation returns the address (an IP address) of the node that stores the object
- Object retrieval is optimized by minimizing the number of messages and hops required
- Disadvantage: these systems can search only for object identifiers, so they cannot capture the relevance of a document
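The lookup by object identifier can be sketched with a hash ring. This is a simplified illustration in the spirit of Chord-style successor lookup, assuming the full node list is known locally; real systems resolve the same mapping through multi-hop routing. The function names, the 16-bit identifier space, and the node identifiers are assumptions.

```python
import bisect
import hashlib


def object_id(name, bits=16):
    """Hash a file name into a fixed identifier space (illustrative size)."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


def responsible_node(node_ids, key):
    """Return the first node id at or after `key` on the ring, wrapping around."""
    ring = sorted(node_ids)
    index = bisect.bisect_left(ring, key)
    return ring[index % len(ring)]


nodes = [10, 200, 4000]  # made-up node identifiers
key = object_id("song.mp3")
owner = responsible_node(nodes, key)
```

Because the key is a hash of the exact name, a query for a slightly different name or for a keyword inside the file maps to an unrelated identifier, which is why relevance cannot be captured this way.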
16
P2P Network IR Techniques
Distributed IR
- With distributed databases, the main IR problem is deciding which databases are most likely to contain the most relevant documents
- Good results are achievable for conceptually separated collections
- However, the approach assumes that the querying party has some statistical knowledge of each database's contents (word frequencies in documents), and therefore must have a global view of the system
17
PeerWare Infrastructure and experiments
Evaluation metrics:
- Recall rate: the fraction of documents each search mechanism retrieves
- Efficiency: the number of messages needed to find the results
Only algorithms that require local knowledge when searching for documents were implemented:
- BFS (the baseline)
- RBFS, >RES (k = 0.5 * d and m = 100, where d is the degree of a node), and ISM
These three techniques forward query messages to half of the neighbors that BFS contacts. >RES and ISM use previous knowledge to decide which peers to forward the query to.
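The two metrics can be written down as small helper functions. The function names and the sample values are hypothetical and are not measurements from the experiments; the sketch assumes recall is measured against a known set of relevant documents and that efficiency is reported relative to a baseline message count.

```python
def recall_rate(retrieved, relevant):
    """Fraction of the relevant documents that a search mechanism retrieved."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)


def message_ratio(messages, baseline_messages):
    """Messages used by a mechanism relative to the baseline (e.g. BFS)."""
    return messages / baseline_messages


# Illustrative values only.
recall = recall_rate({"d1", "d2"}, {"d1", "d2", "d3", "d4"})
ratio = message_ratio(400, 1000)
```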
18
PeerWare Infrastructure and experiments
BFS requires almost 2.5 times as many messages as its competitors.
19
PeerWare Infrastructure and experiments
- ISM found the most documents
- ISM achieved almost a 90 percent recall rate while using only 38 percent of the messages BFS required
- ISM improves its knowledge over time
- Both >RES and ISM started out with a low recall rate (around 40 to 50 percent) because they initially choose their neighbors at random