A Measure for Cluster Cohesion in Semantic Overlay Networks

Paraskevi Raftopoulou (1,2) and Euripides G.M. Petrakis (2)
1 Max-Planck Institute for Informatics, Saarbruecken, Germany, http://www.mpi-inf.mpg.de/
2 Technical University of Crete, Chania, Greece, http://www.intelligence.tuc.gr/

Workshop on Large-Scale Distributed Systems for Information Retrieval, Napa Valley, California, 30 October 2008
Paraskevi Raftopoulou, Max-Planck Institute for Informatics & Technical University of Crete

Outline
- Motivation & Related work
- Distributed resource sharing
- iCluster architecture
- Measuring clustering quality
- Experimental evaluation
- Conclusion

Motivation & Related work

Motivation
- Resource sharing is at the core of today's computing (Web, P2P, Grid)
- Information retrieval functionality is needed
- Overlay networks are a good technology to build on
- Measures are used for evaluating network organisation and retrieval efficiency

Related Work
- Semantic Overlay Networks
  - Initial approaches include: [KJ04], [SMZ03], [PMW07]
  - Based on the idea of small-world networks: [Smi04], [LLS04], [VSI06], DESENT
- Concepts & measures quantifying network organisation
  - (Generalised) clustering coefficient: [WS98], [HAH07]
  - Extensions/modifications: [FHJS02], [BGW08], [RMJ07], [FH06]

Distributed resource sharing

Semantic overlay networks
- Self-organising overlay networks
- The idea: peers that are semantically, thematically, or socially close (i.e., sharing similar interests or resources) are organised into groups. Queries are routed to the appropriate group.
- Peers hold routing indices with links to other peers
- Peers connected to each other are called neighbours
- Support rich data models and expressive query languages

Rewiring strategies
- Techniques for self-organising peers:
  - abandon old connections and create new ones
  - periodic process
- Inspired by the 'small world effect'
  - reach anybody in a small number of routing hops

Small-world networks
- Most peers are not neighbours of one another
- Any peer can be reached from every other peer in a small number of hops
- Main characteristics:
  1. small average shortest path length: most pairs of peers are connected by at least one short path
  2. high clustering coefficient: there are cliques and subgraphs characterised by connections between almost any two peers within them
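The first small-world characteristic, the average shortest path length, can be illustrated with a short sketch. It assumes an undirected, unweighted overlay stored as a dict mapping each peer to the set of its neighbours; this representation and the function names `hops_from` and `average_shortest_path_length` are illustrative choices, not part of the slides.

```python
from collections import deque


def hops_from(graph, source):
    """Breadth-first search: hop count from source to every reachable peer."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def average_shortest_path_length(graph):
    """Mean hop count over all ordered pairs of distinct, mutually reachable peers."""
    total = 0
    pairs = 0
    for source in graph:
        for target, hops in hops_from(graph, source).items():
            if target != source:
                total += hops
                pairs += 1
    # Assumption: a graph with no connected pairs is reported as 0.0.
    return total / pairs if pairs else 0.0
```

Unreachable pairs are simply skipped here, which is one possible convention; a small-world overlay would show a low value from this function together with a high clustering coefficient.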

iCluster architecture

iCluster basics
- (i) intelligent + (Cluster) clustering = iClusterDL
Contributions:
- Architecture and protocols to support IR functionality
  - seamless and easy integration of peers, scalable
  - fast query processing
- Self-organising peers based on SONs
  - support rich query models
  - benefits from loosely-connected peers

iCluster Protocols
- Peer join/leave
- Peer rewiring
- Query processing
- Document retrieval

Peer rewiring
A peer p
1. computes its intra-cluster similarity (average similarity with its neighbours)
2. initiates rewiring if similarity < threshold θ
3. sends a message (msg) with its interest to m neighbours
- All peers receiving msg append their interest and forward msg to m neighbours
- The message is sent back to p when TTL τ_R = 0
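Steps 1 and 2 of the rewiring procedure above can be sketched as follows. The slides do not say how similarity between interests is computed, so this sketch assumes interests are sparse term-weight vectors compared by cosine similarity; the function names are hypothetical, and the default θ = 0.9 is taken from the experimental parameter table.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two sparse term-weight vectors given as dicts.

    Assumption: the slides do not name the similarity function.
    """
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def intra_cluster_similarity(interest, neighbour_interests):
    """Step 1: average similarity between a peer's interest and its neighbours'."""
    if not neighbour_interests:
        return 0.0  # assumption: a peer without neighbours counts as unclustered
    total = sum(cosine(interest, other) for other in neighbour_interests)
    return total / len(neighbour_interests)


def should_rewire(interest, neighbour_interests, theta=0.9):
    """Step 2: initiate rewiring when the average similarity falls below theta."""
    return intra_cluster_similarity(interest, neighbour_interests) < theta
```

A peer whose neighbours all share its interest stays put, while a peer with half of its neighbours in an unrelated topic averages 0.5 and would start rewiring under θ = 0.9.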

Query processing
A peer p
1. compares q against its interests and selects the interest int most similar to q
2. if similarity ≥ threshold θ, forwards a message (msg) including q to all its neighbours with TTL τ_b
3. if similarity < threshold θ, forwards msg to the m of its neighbours most similar to q
- All peers receiving msg do the same process
- The message is forwarded until TTL τ_f = 0
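The forwarding decision in steps 2 and 3 above can be sketched as a single function. It assumes the peer already knows how similar each neighbour is to q (for example from the interests stored in its routing index); the function name `forwarding_targets` and its arguments are hypothetical, and the default values for θ, m, τ_b and τ_f come from the experimental parameter table.

```python
def forwarding_targets(sim_to_query, neighbour_sims, theta=0.9, m=2, tau_b=2, tau_f=6):
    """Choose where one peer forwards a query, and with which TTL.

    sim_to_query:   similarity between q and the peer's best-matching interest.
    neighbour_sims: dict mapping neighbour id -> similarity of that neighbour to q
                    (assumed to be available locally).
    Returns (list of neighbour ids, TTL).
    """
    if sim_to_query >= theta:
        # Similar cluster reached: broadcast to all neighbours with TTL tau_b.
        return sorted(neighbour_sims), tau_b
    # Otherwise: fixed forwarding to the m neighbours most similar to q.
    ranked = sorted(neighbour_sims, key=neighbour_sims.get, reverse=True)
    return ranked[:m], tau_f
```

In a full protocol each receiving peer would repeat the same decision and decrement the TTL; only the per-peer choice is shown here.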

Measuring clustering quality

Clustering coefficient
- The ratio of the number of links between the peers in p_i's neighbourhood to the number of links that could possibly exist between them
[Figure: four example neighbourhoods of p_i, with c_i = 1/6, c_i = 1/2, c_i = 1 and c_i = 0]
- Takes values in the interval [0, 1]
  - If c_i = 1, every peer connected to p_i is also connected to every other peer within the neighbourhood
  - If c_i = 0, no peer that is connected to p_i connects to any other peer connected to p_i
- Takes into account only the immediate neighbours of the peer
- Takes high values when there are cliques
- Loses the general view of the network
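The definition above translates into a few lines of code. This is a minimal sketch assuming an undirected overlay stored as a dict mapping each peer to the set of its neighbours; the function name and the choice to return 0 for peers with fewer than two neighbours are assumptions, not taken from the slides.

```python
from itertools import combinations


def clustering_coefficient(graph, p):
    """Local clustering coefficient c_i of peer p.

    Links that exist between p's neighbours, divided by the number of links
    that could exist between them, k * (k - 1) / 2 for k neighbours.
    """
    neighbours = graph[p]
    k = len(neighbours)
    if k < 2:
        return 0.0  # assumption: no neighbour pair means nothing to count
    existing = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
    return existing / (k * (k - 1) / 2)
```

For a peer with four neighbours there are six possible links among them, so one existing link gives 1/6 and three give 1/2. Only the immediate neighbourhood of p is inspected, which is exactly the limitation the slide points out.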

Clustering efficiency
- A new measure that
  - quantifies network organisation and
  - reflects retrieval effectiveness
- Based on the network organisation and on the query processing protocols
- Consider that a peer p_i's neighbourhood consists of all peers within radius τ_b around p_i

Clustering efficiency
- The number of peers similar to p_i that can be reached from p_i within τ_b hops, divided by the total number of similar peers
[Figure: example neighbourhood of p_i with c_i = 0 and κ_i = 1]
- Takes values in the interval [0, 1]
  - If κ_i = 1, the neighbourhood of p_i contains all peers similar to p_i
  - If κ_i = 0, the neighbourhood of p_i contains no peer similar to p_i
- Gives information about the underlying network organisation involving more than just the immediate neighbours
- Looks at how the network is organised at a larger scale
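The ratio described above can be sketched with a hop-limited breadth-first search. The sketch assumes the same dict-of-sets graph representation as an undirected overlay and takes the set of peers similar to p_i as an input, since the slides do not specify how that set is determined; the function name and the handling of an empty set are also assumptions.

```python
from collections import deque


def clustering_efficiency(graph, similar, p, tau_b=2):
    """Clustering efficiency kappa_i of peer p.

    Fraction of the peers similar to p that lie within tau_b hops of p.
    similar: collection of peers regarded as similar to p (supplied by the caller).
    """
    similar = set(similar) - {p}
    if not similar:
        return 0.0  # assumption: the ratio is undefined without similar peers
    reached = {p}
    queue = deque([(p, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == tau_b:
            continue  # do not expand beyond the broadcast radius
        for neighbour in graph[node]:
            if neighbour not in reached:
                reached.add(neighbour)
                queue.append((neighbour, hops + 1))
    return len(similar & reached) / len(similar)
```

A peer with a single neighbour that in turn links to all similar peers has a clustering coefficient of 0 but a clustering efficiency of 1 for τ_b = 2, which is the kind of situation the measure is meant to capture.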

Experimental evaluation

Experimental Evaluation
- Used different parameters:
  - Data corpus
  - Similarity threshold
  - Query TTL
  - Forwarding strategies
- Looked into the:
  - Network organisation
  - Recall
- The better the network organisation is, the better the performance of retrievals should be!
- The experiments are intended to:
  - associate the performance of retrievals with the quality of network organisation
  - recommend the clustering measure that better represents this association

Parameter               Symbol  Value
peers                   N       2,000
short-range links       s       8
long-range links        l       4
similarity threshold    θ       0.9
rewiring TTL            τ_R     4
fixed forwarding TTL    τ_f     6
broadcast TTL           τ_b     2
message fanout          m       2

Data corpora:
- OHSUMED TREC: 30,000 medical articles, 10 categories
- TREC-6: 556,000 documents, 100 categories

Rewiring schedule:
- the start of the rewiring is randomly chosen from the time interval [0, 4K]
- the periodicity is randomly selected from a normal distribution of 2K

Experimental Evaluation
[Figure: clustering coefficient c_i for different forwarding strategies]

Experimental Evaluation
[Figure: clustering efficiency κ_i for different forwarding strategies]

Experimental Evaluation
[Figure: retrieval]

Outlook

Conclusion
- The idea
  - focus on IR on top of SON
  - look at how the network is organised at a large scale
- Clustering efficiency
  - quantifies the underlying (dynamic) P2P structure
  - reflects retrieval effectiveness
- The results indicate that the clustering efficiency measure models network clustering quality better than other existing measures
