Published by Juniper Wilkinson. Modified over 9 years ago.
1
A Measure for Cluster Cohesion in Semantic Overlay Networks
Paraskevi Raftopoulou (1,2) and Euripides G.M. Petrakis (2)
(1) Max-Planck Institute for Informatics, Saarbruecken, Germany, http://www.mpi-inf.mpg.de/
(2) Technical University of Crete, Chania, Greece, http://www.intelligence.tuc.gr/
2
Workshop on Large-Scale Distributed Systems for Information Retrieval, Napa Valley, California, 30 October, 2008
Paraskevi Raftopoulou, Max-Planck Institute for Informatics & Technical University of Crete
Outline
  Motivation & Related work
  Distributed resource sharing
  iCluster architecture
  Measuring clustering quality
  Experimental evaluation
  Conclusion
3
Motivation & Related work
4
Motivation
Resource sharing is at the core of today’s computing (Web, P2P, Grid)
Information retrieval functionality is needed
Overlay networks are a good technology to build on
Measures are used for evaluating network organisation and retrieval efficiency
5
Related Work
Semantic Overlay Networks
  Initial approaches include: [KJ04], [SMZ03], [PMW07]
  Based on the idea of small-world networks: [Smi04], [LLS04], [VSI06], DESENT
Concepts & measures quantifying network organisation
  (Generalised) clustering coefficient: [WS98], [HAH07]
  Extensions/modifications: [FHJS02], [BGW08], [RMJ07], [FH06]
6
Distributed resource sharing
7
Semantic overlay networks
Self-organising overlay networks
The idea: Peers that are semantically, thematically, or socially close (i.e., sharing similar interests or resources) are organised into groups. Queries are routed to the appropriate group.
Peers hold routing indices with links to other peers
Peers connected to each other are called neighbours
Support rich data models and expressive query languages
8
Rewiring strategies
Techniques for self-organising peers:
  abandon old connections and create new ones
  periodic process
Inspired by the ‘small world effect’:
  reach anybody in a small number of routing hops
9
Small-world networks
Most peers are not neighbours of one another, yet every peer can be reached from every other peer in a small number of hops
Main characteristics:
  1. small average shortest path length: most pairs of peers are connected by at least one short path
  2. high clustering coefficient: there are cliques and subgraphs that are characterised by connections between almost any two peers within them
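The first characteristic, average shortest path length, can be illustrated with a small sketch. It assumes the overlay is an undirected graph stored as a dict mapping each peer to the set of its neighbours; the six-peer ring and the extra long-range link are hypothetical examples, not data from the talk.

```python
from collections import deque

def hop_distances(graph, source):
    """Breadth-first search: hop count from source to every reachable peer."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        peer = queue.popleft()
        for nb in graph[peer]:
            if nb not in dist:
                dist[nb] = dist[peer] + 1
                queue.append(nb)
    return dist

def average_shortest_path_length(graph):
    """Mean hop count over all ordered pairs of distinct, connected peers."""
    total, pairs = 0, 0
    for source in graph:
        for target, d in hop_distances(graph, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0

# Hypothetical example: a ring of six peers, then the same ring with one
# long-range link between peers 0 and 3.
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
shortcut = {i: set(nbs) for i, nbs in ring.items()}
shortcut[0].add(3)
shortcut[3].add(0)

print(average_shortest_path_length(ring))
print(average_shortest_path_length(shortcut))
```

In this toy graph a single long-range link already lowers the average, which is the effect small-world rewiring relies on.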
10
iCluster architecture
11
iCluster basics
(i) intelligent + (Cluster) clustering = iClusterDL
Contributions:
  Architecture and protocols to support IR functionality
    seamless and easy integration of peers
    scalable fast query processing
  Self-organising peers based on SONs
    support rich query models
    benefit from loosely-connected peers
12
iCluster Protocols
  Peer join/leave
  Peer rewiring
  Query processing
  Document retrieval
13
Peer rewiring
A peer p
  1. computes its intra-cluster similarity (average similarity with its neighbours)
  2. initiates rewiring if similarity < threshold θ
  3. sends a message (msg) with its interest to m neighbours
All peers receiving msg append their interest and forward msg to m neighbours
The message is sent back to p when TTL τ_R = 0
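The rewiring steps on this slide can be sketched roughly as follows. This is a minimal simulation under stated assumptions, not the iCluster implementation: interests are plain numeric vectors, similarity is cosine similarity, the m neighbours are picked at random, and peers already recorded in msg do not forward it again. The slide does not say how p uses the returned message, so the sketch stops at collecting the interests. All function names and the four-peer example are hypothetical.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity of two interest vectors (an assumed similarity function)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def intra_cluster_similarity(peer, neighbours, interests):
    """Step 1: average similarity between a peer and its current neighbours."""
    nbs = neighbours[peer]
    if not nbs:
        return 0.0
    return sum(cosine(interests[peer], interests[n]) for n in nbs) / len(nbs)

def rewiring_walk(peer, neighbours, interests, m, ttl, rng):
    """Step 3 onwards: forward msg to m neighbours per hop until the TTL is 0.

    Returns the interests gathered in msg, keyed by peer id, as they would
    arrive back at the initiating peer.
    """
    msg = {peer: interests[peer]}
    frontier = [peer]
    while ttl > 0 and frontier:
        next_frontier = []
        for sender in frontier:
            candidates = sorted(neighbours[sender])
            # Assumption: the m receivers are chosen uniformly at random.
            for receiver in rng.sample(candidates, min(m, len(candidates))):
                if receiver not in msg:
                    msg[receiver] = interests[receiver]  # receiver appends its interest
                    next_frontier.append(receiver)
        frontier = next_frontier
        ttl -= 1
    return msg

def maybe_rewire(peer, neighbours, interests, theta=0.9, m=2, ttl=4, seed=0):
    """Step 2: start a rewiring walk only if the similarity is below theta."""
    if intra_cluster_similarity(peer, neighbours, interests) >= theta:
        return None
    return rewiring_walk(peer, neighbours, interests, m, ttl, random.Random(seed))

# Hypothetical four-peer chain a - b - c - d.
interests = {"a": [1, 0], "b": [0, 1], "c": [1, 0.1], "d": [0.9, 0]}
neighbours = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}

collected = maybe_rewire("a", neighbours, interests)  # a is dissimilar to b, so it rewires
unchanged = maybe_rewire("d", neighbours, interests)  # d is similar to c, so nothing happens
```

The defaults for theta, m and ttl follow the parameter values quoted later in the deck (θ = 0.9, m = 2, τ_R = 4).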
14
Query processing
A peer p
  1. compares q against its interests & selects the interest int most similar to q
  2. if similarity ≥ threshold θ, forwards a message (msg) including q to all its neighbours with TTL τ_b
  3. if similarity < threshold θ, forwards msg to the m of its neighbours most similar to q
All peers receiving msg do the same process
The message is forwarded until TTL τ_f = 0
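A rough sketch of this query-forwarding logic is given below. It is one possible reading of the slide, with several assumptions: similarity is cosine similarity over numeric vectors, each peer knows its neighbours' interests through its routing index, a message starts with hop budget τ_f, the first matching peer resets the budget to τ_b and switches to broadcast, and duplicate deliveries are dropped. The function names and the six-peer chain are hypothetical.

```python
import math
from collections import deque

def cosine(a, b):
    """Cosine similarity of two vectors (an assumed similarity function)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def best_similarity(query, peer_interests):
    """Step 1: similarity of the peer's interest that is closest to the query."""
    return max(cosine(query, interest) for interest in peer_interests)

def process_query(start, query, neighbours, interests,
                  theta=0.9, m=2, ttl_f=6, ttl_b=2):
    """Return the set of peers whose interests match the query (similarity >= theta)."""
    matched = set()
    seen = {start}
    queue = deque([(start, ttl_f, False)])  # (peer, remaining hops, broadcasting?)
    while queue:
        peer, ttl, broadcasting = queue.popleft()
        if best_similarity(query, interests[peer]) >= theta:
            # Step 2: a similar peer broadcasts to all of its neighbours.
            matched.add(peer)
            if not broadcasting:
                ttl, broadcasting = ttl_b, True
            targets = sorted(neighbours[peer])
        else:
            # Step 3: fixed forwarding to the m neighbours most similar to q.
            ranked = sorted(sorted(neighbours[peer]),
                            key=lambda n: best_similarity(query, interests[n]),
                            reverse=True)
            targets = ranked[:m]
        if ttl == 0:
            continue
        for nb in targets:
            if nb not in seen:  # assumption: duplicate messages are dropped
                seen.add(nb)
                queue.append((nb, ttl - 1, broadcasting))
    return matched

# Hypothetical chain 0 - 1 - 2 - 3 - 4 - 5; peers 2..5 share the query's topic.
neighbours = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
interests = {0: [[0, 1]], 1: [[0, 1]], 2: [[1, 0]],
             3: [[1, 0]], 4: [[1, 0]], 5: [[1, 0]]}

hits = process_query(0, [1, 0], neighbours, interests)
```

In this example the query is forwarded through peers 0 and 1, reaches the cluster at peer 2, and is then broadcast two hops into the cluster, so peer 5 is a similar peer that the query does not reach.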
15
Measuring clustering quality
16
Clustering coefficient
The ratio of the number of links between the peers within p_i's neighborhood to the number of links that could possibly exist between them
[Figure: four example neighborhoods of p_i, with c_i = 1/6, c_i = 1/2, c_i = 1 and c_i = 0]
Takes values in the interval [0, 1]
  if c_i = 1, every peer connected to p_i is also connected to every other peer within the neighborhood
  if c_i = 0, no peer that is connected to p_i connects to any other peer connected to p_i
Takes into account only the immediate neighbours of the peer
Takes high values when there are cliques
Loses the general view of the network
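As a minimal sketch of this definition, the local clustering coefficient can be computed as below, assuming an undirected overlay stored as a dict of neighbour sets. The four-neighbour graphs are hypothetical examples constructed to reproduce the values 0, 1/6, 1/2 and 1; they are not necessarily the graphs drawn in the figure.

```python
from itertools import combinations

def clustering_coefficient(peer, neighbours):
    """Links among the peer's neighbours divided by the links that could exist."""
    nbs = neighbours[peer]
    k = len(nbs)
    if k < 2:
        return 0.0  # convention assumed here: undefined cases count as 0
    links = sum(1 for u, v in combinations(nbs, 2) if v in neighbours[u])
    return links / (k * (k - 1) / 2)

def undirected(edges):
    """Build a dict-of-sets adjacency structure from a list of edges."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph

# Peer "p" with four neighbours a, b, c, d, so at most 6 links among them.
star = [("p", n) for n in "abcd"]
g_zero = undirected(star)                                            # 0 of 6 links
g_sixth = undirected(star + [("a", "b")])                            # 1 of 6 links
g_half = undirected(star + [("a", "b"), ("b", "c"), ("c", "d")])     # 3 of 6 links
g_one = undirected(star + list(combinations("abcd", 2)))             # 6 of 6 links

for g in (g_zero, g_sixth, g_half, g_one):
    print(clustering_coefficient("p", g))
```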
17
Clustering efficiency
A new measure that quantifies network organisation and reflects retrieval effectiveness
Based on the network organisation and on the query processing protocols
Consider that a peer p_i's neighborhood consists of all peers within radius τ_b around p_i
18
Clustering efficiency
The number of peers similar to p_i that can be reached from p_i within τ_b hops divided by the total number of similar peers
[Figure: example neighborhood of p_i with c_i = 0 and κ_i = 1]
Takes values in the interval [0, 1]
  if κ_i = 1, the neighborhood of p_i contains all peers similar to p_i
  if κ_i = 0, the neighborhood of p_i contains no peer similar to p_i
Gives information about the underlying network organisation involving more than just the immediate neighbors
Looks at how the network is organised at a larger scale
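The definition of clustering efficiency can be sketched as a hop-limited breadth-first search. This assumes an undirected overlay stored as a dict of neighbour sets, and that the set of peers similar to p_i is supplied by the caller (for instance, all peers whose similarity to p_i is at least θ, which is an assumption about how "similar" is decided). The small tree-shaped example is hypothetical.

```python
from collections import deque

def peers_within(peer, neighbours, radius):
    """All peers reachable from `peer` in at most `radius` hops (excluding itself)."""
    dist = {peer: 0}
    queue = deque([peer])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not expand beyond the radius
        for v in neighbours[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist) - {peer}

def clustering_efficiency(peer, neighbours, similar, radius=2):
    """Similar peers reachable within `radius` hops over all similar peers."""
    if not similar:
        return 0.0  # convention assumed here when no similar peer exists
    reachable = peers_within(peer, neighbours, radius)
    return len(reachable & similar) / len(similar)

# Hypothetical overlay: p - a - c and p - b - d - e. No two neighbours of p
# are linked, so the clustering coefficient of p is 0.
overlay = {
    "p": {"a", "b"},
    "a": {"p", "c"},
    "b": {"p", "d"},
    "c": {"a"},
    "d": {"b", "e"},
    "e": {"d"},
}

kappa_all = clustering_efficiency("p", overlay, {"a", "b", "c", "d"})
kappa_far = clustering_efficiency("p", overlay, {"a", "b", "c", "d", "e"})
```

With radius 2 (the broadcast TTL τ_b quoted in the deck), `kappa_all` is 1 even though the clustering coefficient of p is 0, and `kappa_far` drops because the similar peer e is three hops away.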
19
Experimental evaluation
20
Experimental Evaluation
Used different parameters:
  Data corpus
  Similarity threshold
  Query TTL
  Forwarding strategies

Parameter               Symbol   Value
peers                   N        2,000
short-range links       s        8
long-range links        l        4
similarity threshold    θ        0.9
rewiring TTL            τ_R      4
fixed forwarding TTL    τ_f      6
broadcast TTL           τ_b      2
message fanout          m        2

Data corpora:
  OHSUMED TREC: 30,000 medical articles, 10 categories
  TREC-6: 556,000 documents, 100 categories
Rewiring:
  the start of the rewiring is randomly chosen from the time interval [0, 4K]
  the periodicity is randomly selected from a normal distribution of 2K
Looked into the:
  Network organisation
  Recall
The better the network organisation is, the better the performance of retrievals should be!
The experiments are intended to:
  associate the performance of retrievals with the quality of network organisation
  recommend the clustering measure that better represents this association
21
Experimental Evaluation
[Figure: clustering coefficient c_i for different forwarding strategies]
22
Experimental Evaluation
[Figure: clustering efficiency κ_i for different forwarding strategies]
23
Experimental Evaluation
[Figure: retrieval]
24
Outlook
25
Conclusion
The idea
  focus on IR on top of SON
  look at how the network is organised at a large scale
Clustering efficiency
  quantifies the underlying (dynamic) P2P structure
  reflects retrieval effectiveness
The results indicate that the clustering efficiency measure models network clustering quality better than other existing measures