“A Local Search Mechanism for Peer-to-Peer Networks”

1 “A Local Search Mechanism for Peer-to-Peer Networks”
Vana Kalogeraki, Dimitrios Gunopulos & Demetris Zeinalipour (University of California – Riverside). CIKM 2002 – Eleventh International Conference on Information and Knowledge Management, November 4-9, McLean, VA

2 Presentation Outline
Introduction: Information Retrieval (I.R.) in Peer-to-Peer networks. Techniques for Distributed I.R.: Breadth-First Search, Random Breadth-First Search, Intelligent Search with profiling. Experimental Evaluation. Related Work. Conclusions & Future Work.

3 Introduction to Peer-to-Peer
Peer-to-Peer computing definition: “Sharing of computer resources and information through direct exchange.” A virtual P2P topology is layered over the physical topology. Clients (downloaders) are also servers. Clients may join or leave the network at any time, which makes the system highly fault-tolerant, but at a cost: increased network activity, large scale, heterogeneous resources and geographic distribution. Searches are done within the virtual network, while actual downloads are done offline (with HTTP).

4 Introduction to Peer-to-Peer
Peer-to-Peer (P2P) systems are becoming increasingly popular. P2P file-sharing systems such as Gnutella, Napster and Freenet realize a distributed infrastructure for sharing files. Traditionally, files were shared using the client-server model (e.g. HTTP), which is not scalable because the services are centralized. P2P systems offer new advantages in simplicity of use, robustness, self-organization and scalability. Advancements in networks and multimedia.

5 Information Retrieval in P2P
Problem: “How to efficiently retrieve information in P2P systems where each node shares a collection of documents?” I.R.: build a system that retrieves documents that users are likely to find relevant. Documents consist of keywords. The problem resembles classical Information Retrieval, but the resources are now distributed. Primary data structures such as global inverted indexes can’t be maintained efficiently.

6 Solutions for P2P Information Retrieval
1) Centralized approaches: centralized indexes, e.g. Napster. Steps: (1) upload index to the centralized index, (2) Query/QueryHit, (3) download (offline). 2) Purely distributed approaches: each node has only local knowledge, and I.R. is done using brute-force mechanisms, e.g. Gnutella, FastTrack (Kazaa). Steps: (1) connect, (2) Query/QueryHit, (3) download (offline). 3) Hybrid approaches: one or more peers have partial indexes of the contents of others, e.g. LimeWire's Ultrapeers. Steps: (1) connect, (2) intelligent Query/QueryHit, (3) download (offline).

7 Motivation
On June 1st we crawled the Gnutella P2P network for 5 hours with 17 workstations. We analyzed 15,153,524 query messages. Observation: high locality of specific queries. Can we exploit this property for more efficient searches?

9 Techniques for Distributed I.R.
1. Breadth-First Search (Gnutella). Each Query message is propagated along all outgoing links of a peer using a TTL (time-to-live). The TTL is decremented on each forward until it reaches 0. This is the technique used for I.R. in P2P systems such as Gnutella. Results? The physical network is brought to its knees, and search results arrive with long delays. [Figure: Peer q sends a QUERY (1) through P2P network N; Peer d returns a QUERYHIT (2).]
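The flooding behaviour described on this slide can be sketched in a few lines. This is a minimal illustration and not the Gnutella implementation: the adjacency-dict `graph`, the function name `bfs_flood`, and the rule that a peer does not send the query back over the link it arrived on are all assumptions made for the example.

```python
from collections import deque

def bfs_flood(graph, source, ttl):
    """Flood a query from `source`; return (peers reached, messages sent).

    `graph` is a hypothetical adjacency dict: peer -> list of neighbors.
    """
    seen = {source}                          # peers that already processed the query
    frontier = deque([(source, None, ttl)])  # (peer, sender, remaining TTL)
    messages = 0
    while frontier:
        peer, sender, remaining = frontier.popleft()
        if remaining == 0:                   # TTL exhausted: stop forwarding
            continue
        for neighbor in graph[peer]:
            if neighbor == sender:           # assumption: never echo back to the sender
                continue
            messages += 1                    # every forwarded copy costs one message
            if neighbor not in seen:         # duplicate copies are received but dropped
                seen.add(neighbor)
                frontier.append((neighbor, peer, remaining - 1))
    return seen, messages

# Toy overlay: q is the querying peer, d is two hops behind c.
toy = {"q": ["a", "b"], "a": ["q", "c"], "b": ["q", "c"],
       "c": ["a", "b", "d"], "d": ["c"]}
reached, sent = bfs_flood(toy, "q", ttl=3)
```

Even on this toy overlay some messages are duplicates (peer c receives the query twice), which is the overhead the slide refers to.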

10 Techniques for Distributed I.R.
2. Modified Random BFS. Each Query message is forwarded to only a fraction of a peer's outgoing links (e.g. ½ of them). The TTL is again decremented on each forward until it reaches 0. Results? Fewer messages, but possibly fewer results. The algorithm is probabilistic, so some segments of the network may become unreachable. [Figure: Peer q sends a QUERY (1) through P2P network N via a subset of links; Peer d returns a QUERYHIT (2); one segment is unreachable.]
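A sketch of the random variant follows, under stated assumptions: the adjacency-dict `graph`, the name `random_bfs`, rounding the fraction up, and excluding the incoming link before sampling are choices made for this example and are not taken from the slides.

```python
import math
import random
from collections import deque

def random_bfs(graph, source, ttl, fraction=0.5, rng=None):
    """Forward the query to a random `fraction` of each peer's links.

    Returns (peers reached, messages sent). `graph` is a hypothetical
    adjacency dict: peer -> list of neighbors.
    """
    rng = rng or random.Random()
    seen = {source}
    frontier = deque([(source, None, ttl)])  # (peer, sender, remaining TTL)
    messages = 0
    while frontier:
        peer, sender, remaining = frontier.popleft()
        if remaining == 0:
            continue
        links = [n for n in graph[peer] if n != sender]
        k = math.ceil(len(links) * fraction)  # assumption: round the fraction up
        for neighbor in rng.sample(links, k): # random subset of the outgoing links
            messages += 1
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, peer, remaining - 1))
    return seen, messages

# Star overlay: with fraction=0.5 only 3 of the 6 leaves are ever contacted.
star = {"q": ["a", "b", "c", "d", "e", "f"],
        **{leaf: ["q"] for leaf in "abcdef"}}
reached, sent = random_bfs(star, "q", ttl=1, rng=random.Random(0))
```

Which leaves are reached depends on the random choice, which is why some peers holding matching documents may never see the query.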

11 Techniques for Distributed I.R.
3. Intelligent Search Mechanism (ISM). Idea: each Query message is forwarded intelligently, based on which queries a peer answered in the past. Components of ISM (for each node u): a Profile Mechanism, maintained for each neighbor in N(u); a Peer Ranking Mechanism, for ranking peers locally and sending a search query only to the ones most likely to answer; a Similarity Function, for finding similar search queries; a Search Mechanism, for propagating queries based on local indexes. [Figure: Peer q consults its profiles to decide where to send the QUERY (1); Peer d returns a QUERYHIT (2).]

12 Techniques for Distributed I.R.
3. Intelligent Search Mechanism (ISM) a) Profile Mechanism. Maintains a list of past queries routed through that host. Every time a QueryHit is received, the table is updated. The profile manager uses a Least Recently Used policy to keep the most recent queries in the repository. Profiles are kept for neighbors only, so the cost of maintaining them is O(Td), where T is the limit on entries per profile and d is the degree of the node (total size: T*d). The profile table has the columns Query, GUID, Connection and Timestamp; example rows: “Elections Bush Clinton”, G439ID, Socket1; “Super Bowl San Diego”, F549QL, ---; “Italy earthquake disaster”, PN329D, Socket5.
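One way to picture the profile mechanism is a bounded, least-recently-used table per neighbor. The sketch below is illustrative only: the class names `NeighborProfile` and `ProfileManager`, the method names, and the choice to key entries by query text are assumptions, and timestamps are left out because ordering inside an `OrderedDict` already encodes recency.

```python
from collections import OrderedDict

class NeighborProfile:
    """Bounded LRU list of queries one neighbor has answered (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity       # T: max entries kept per profile
        self._entries = OrderedDict()  # query text -> GUID, oldest first

    def record_hit(self, query, guid):
        if query in self._entries:
            self._entries.move_to_end(query)    # refresh recency on a repeat hit
        self._entries[query] = guid
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)   # evict the least recently used

    def queries(self):
        return list(self._entries)

class ProfileManager:
    """One profile per neighbor connection, so total size is at most T * d."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.profiles = {}             # connection id -> NeighborProfile

    def on_query_hit(self, connection, query, guid):
        profile = self.profiles.setdefault(connection, NeighborProfile(self.capacity))
        profile.record_hit(query, guid)

pm = ProfileManager(capacity=2)
pm.on_query_hit("Socket1", "elections bush clinton", "G439ID")
pm.on_query_hit("Socket1", "super bowl san diego", "F549QL")
pm.on_query_hit("Socket1", "elections bush clinton", "G439ID")    # refreshed
pm.on_query_hit("Socket1", "italy earthquake disaster", "PN329D") # evicts "super bowl..."
```

The example routes all three queries through one connection purely to show the eviction; the query strings and GUIDs are reused from the slide's table as sample data.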

13 Techniques for Distributed I.R.
3. Intelligent Search Mechanism (ISM) b) Peer Ranking Mechanism. Before forwarding a Query message, a peer performs an on-the-fly ranking of its peers to determine the best paths. We use the Aggregate Similarity of peer Pi to a query q, computed by a peer Pk as: $\mathrm{Psim}_{P_k}(P_i, q) = \sum_{q_j\ \text{answered by}\ P_i} \mathrm{Sim}(q_j, q)^{\alpha}$, where α is a constant that allows us to add more weight to the most similar queries. Example: assume host Pk needs to forward a query q = “italy disaster” to two of its three peers {P1, P2, P3}. Pk maintains queries {q1, q2, ..., q5} in its profile, with Sim(q, q1) = 0.8, Sim(q, q2) = 0.6, Sim(q, q3) = 0.5, Sim(q, q4) = 0.4 and Sim(q, q5) = 0.4. The resulting scores are Psim(P1, q) = 0.8^1 = 0.8, Psim(P2, q) = 1.1 and Psim(P3, q) = 0.7, so Pk forwards q to P2 and P1.
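The effect of the exponent α is easiest to see in code. The sketch below assumes the aggregate similarity is the sum of Sim(q_j, q)^α over the queries a peer answered; the function names and the sample similarity values are hypothetical and deliberately differ from the slide's example.

```python
def aggregate_similarity(similarities, alpha=1.0):
    """Sum of Sim(q_j, q)**alpha over the past queries one peer answered."""
    return sum(s ** alpha for s in similarities)

def rank_peers(answered, alpha=1.0):
    """Return [(peer, score), ...] sorted from best to worst.

    `answered` maps a peer to the similarities between the new query and
    each past query that peer answered (hypothetical input layout).
    """
    scores = {peer: aggregate_similarity(sims, alpha)
              for peer, sims in answered.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical profile contents, not the slide's data.
answered = {"P1": [0.8], "P2": [0.6, 0.5], "P3": [0.4, 0.2]}

by_sum = rank_peers(answered, alpha=1.0)     # P2 first: two moderate matches add up
by_peak = rank_peers(answered, alpha=3.0)    # P1 first: one strong match dominates
```

With α = 1 every matching query counts in proportion to its similarity; a larger α shifts the ranking toward peers that answered a few very similar queries.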

14 Techniques for Distributed I.R.
3. Intelligent Search Mechanism (ISM) c) Similarity Function: the cosine similarity. Assume that L is the set of all words (in the Profile Manager), e.g. L = {elections, bush, clinton, super, bowl, san, diego, …, italy, earthquake, disaster}. We define an |L|-dimensional space in which each query is a vector. If q = “italy disaster”, then the vector of q is [0,0,0,…,1,0,1]. Recall that we have a vector for each qi stored in the Profile Manager. The similarity of q and qi is the cosine of the angle between their vectors: $\mathrm{Sim}(q, q_i) = \frac{\vec{q} \cdot \vec{q_i}}{|\vec{q}|\,|\vec{q_i}|}$.
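The vector construction and the cosine can be worked through on the slide's own vocabulary. This is a sketch under assumptions: the vocabulary below contains only the words the slide lists (the elided ones are omitted), queries are split on whitespace, and the helper names `to_vector` and `cosine_similarity` are made up for the example.

```python
import math

# Only the words listed on the slide; the elided part of L is not filled in.
VOCABULARY = ["elections", "bush", "clinton", "super", "bowl",
              "san", "diego", "italy", "earthquake", "disaster"]

def to_vector(query, vocabulary=VOCABULARY):
    """Binary term vector: 1 where the vocabulary word occurs in the query."""
    words = set(query.lower().split())
    return [1 if term in words else 0 for term in vocabulary]

def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

q = to_vector("italy disaster")
related = cosine_similarity(q, to_vector("Italy earthquake disaster"))
unrelated = cosine_similarity(q, to_vector("Super Bowl San Diego"))
```

Here `related` works out to 2 / (√2 · √3) ≈ 0.82, because two of the three words in the stored query match, while `unrelated` is 0 because the queries share no words.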

15 Techniques for Distributed I.R.
3. Intelligent Search Mechanism (ISM) d) Search Mechanism. Utilizes the Peer Ranking Mechanism to forward queries to the nodes that are likely to contain the information we are looking for. [Figure: Peer q consults its profiles to choose where to send the QUERY (1) on the way to Peer d.]

17 Experimental Evaluation
We use a decentralized newspaper application built on top of the REUTERS dataset (22,531 documents grouped by 84 countries). Random network of 100 peers. Each peer has documents from 3 countries. The average degree of a node is 7 ≈ log2(100) (connected graph). [Figure: architecture of a Data-Peer (e.g. usa): PDOM-XML Manager, P2P Network Module, XML data files queried with XQL, and Routing Structures (Profiles); other example peers: u.k, germany, france, mexico, argentina, china, india, greece, italy.]

18 Experimental Evaluation
We perform 400 sequential queries with a delay of 4 sec. We compare Doc. Ratio (recall rate) vs. Num. of messages for three techniques: BFS (Gnutella message flooding; forward to degree nodes); Modified BFS (randomly forward to degree/2 nodes); Intelligent Search Mechanism (forward to the M=3 highest-ranked nodes + 1 random).
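The ISM forwarding rule used in the experiments (the M highest-ranked neighbors plus one random neighbor) can be sketched as follows. The function name `select_neighbors`, its parameters, and the sample scores are assumptions for illustration; the scores stand in for aggregate similarities computed by the ranking mechanism.

```python
import random

def select_neighbors(scores, m=3, extra_random=1, rng=None):
    """Pick the `m` best-scored neighbors plus `extra_random` random others.

    `scores` maps neighbor -> aggregate similarity for the current query.
    """
    rng = rng or random.Random()
    ranked = sorted(scores, key=scores.get, reverse=True)
    top, rest = ranked[:m], ranked[m:]
    # The random picks come only from neighbors that were not already chosen.
    extras = rng.sample(rest, min(extra_random, len(rest)))
    return top + extras

# Hypothetical scores for a peer of degree 7.
scores = {"n1": 1.1, "n2": 0.8, "n3": 0.7, "n4": 0.3,
          "n5": 0.0, "n6": 0.0, "n7": 0.0}
targets = select_neighbors(scores, m=3, extra_random=1, rng=random.Random(42))
```

With degree 7 this forwards to 4 neighbors per hop, close to the degree/2 used by the random variant, which makes the message counts of the two techniques comparable.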

19 Experimental Evaluation
We measure Doc. Ratio (recall rate) vs. Num. of messages with Time-to-Live (TTL) = 4. BFS (Gnutella) uses ~763 messages w/ recall rate 100%. Random BFS (degree/2) uses ~120 (16%) msgs w/ recall rate 42%. Intelligent Search uses ~131 (17%) msgs w/ recall rate ~55%. The recall rate improves over time with Intelligent Search, since the peer profiles accumulate more knowledge.
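The percentages on this slide are the message counts expressed relative to BFS. A quick check of that arithmetic, using only the approximate counts quoted above (the helper name `share_of_bfs` is made up for the example):

```python
BFS_MESSAGES = 763  # approximate BFS message count at TTL=4, from the slide

def share_of_bfs(messages, baseline=BFS_MESSAGES):
    """Messages used, as a percentage of the BFS baseline."""
    return 100.0 * messages / baseline

random_bfs_share = share_of_bfs(120)   # about 15.7, quoted as 16%
ism_share = share_of_bfs(131)          # about 17.2, quoted as 17%
```

So at TTL = 4 Intelligent Search spends roughly one percentage point more messages than Random BFS and returns ~55% instead of 42% of the documents.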

20 Experimental Evaluation
We again measure Doc. Ratio (recall rate) vs. Num. of messages, now with Time-to-Live (TTL) increased to 5. BFS (TTL=4) uses ~763 messages w/ recall rate 100%. Random BFS (degree/2) uses ~28% of the msgs w/ recall rate ~72%. Intelligent Search uses ~35% (of BFS msgs) w/ recall rate ~90%. With BFS, a large number of peers receive unnecessary messages: we get almost identical recall (90%) with only 35% of the msgs.

22 Related Work
Improving Search in P2P, B. Yang et al. (Stanford): Iterative Deepening, until Z results are returned; Directed BFS, based on aggregate statistics (e.g. number of results a peer returned, shortest queue, forwarded the most data); Local Indexes, where each node maintains an index over the data of peers r hops away. Routing Indices for P2P, Crespo et al. (Stanford): Compound Indices, where each node sends a clustered summary of its topics to its neighbors (e.g. 100 databases, 4 theory, 10 OS). This might be too costly for highly dynamic P2P systems.

23 Related Work
Freenet (Clarke et al.): search by identifiers. It uses SHA1 hashes of resources, and information is retrieved based on key closeness in a DFS manner. Others, such as Chord, focus on scalable object location, which becomes feasible by hashing and distributing objects in the P2P system. These systems search by identifier, whereas we search using keywords.

24 Conclusions
P2P systems offer several advantages such as scalability, robustness and simplicity of use. Efficient P2P Information Retrieval is not feasible with the current search algorithms. We propose an Intelligent Search Mechanism that uses local knowledge to improve Information Retrieval in P2P. Our mechanism achieves a 90% recall rate while using only 35% of the messages used by BFS.

25 Future Work
We plan to deploy our middleware infrastructure on a larger P2P network with more queries. We want to probe different network topologies, such as ASMap with PowerLaws. We want to probe different peer-profile maintenance policies at peers. We also want to compare the performance of our method with other proposed algorithms (iterative deepening, local indexes, etc.).

