Relative Validity Criteria for Community Mining Evaluation

Presentation transcript:

Relative Validity Criteria for Community Mining Evaluation
Reihaneh Rabbany, Mansoreh Takaffoli, Justin Fagnan, Osmar R. Zaïane and Ricardo J. G. B. Campello
Department of Computing Science, University of Alberta, Edmonton, Canada
ASONAM 2012, Aug 2012

Motivation
Applications in different domains: sociology, criminology
Module identification in biological networks
  Clusters in protein-protein interaction networks: protein complexes and parts of pathways
  Clusters in a protein similarity network: protein families
  (R Guimerà et al., Functional cartography of complex metabolic networks, Nature 433, 2005)
Prerequisite of further analysis: targeted advertising, link prediction, recommendation
Social networks: personalized news feed, easier privacy settings
  Gmail's "Don't Forget Bob!" and "Got the Wrong Bob?" features (M Roth et al., Suggesting Friends Using the Implicit Social Graph, KDD 2010)
Citation network of scholars
  Paper and collaborator recommendation, network visualization and navigation; e.g. CiteULike, Arnet Miner and Microsoft Academic
Hyperlinks between web pages (WWW)
  Detecting groups of closely related topics to refine search results (J Chen et al., An Unsupervised Approach to Cluster Web Search Results Based on Word Sense Communities, Web Intelligence 2008)

Community
Loosely defined as a group of nodes that have relatively more links between themselves than to the rest of the network. More specific definitions:
  Nodes that have structural similarity (SCAN, Xu et al. 2007)
  Nodes that are connected with cliques (CFinder by Palla et al. 2005)
  Nodes within which a random walk is likely to get trapped (MCL by van Dongen, Walktrap by Pons and Latapy)
  Nodes that follow the same leader (TopLeaders, 2010)
  Nodes that make the graph compress efficiently (Infomap, Infomod, Rosvall and Bergstrom, 2011)
  Nodes that are separated from the rest by min cut or conductance (flow based methods, e.g. Kernighan-Lin (KL), betweenness of Newman)
  Nodes with more links between them than expected by chance (Newman's Q modularity, FastModularity, Blondel et al.’s Louvain)
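To make the "more links than expected by chance" definition concrete, here is a minimal sketch of Newman's Q modularity in its standard per-community form, Q = sum over communities of (l_c / m) - (d_c / 2m)^2, where l_c is the number of internal edges, d_c the total degree of the community and m the number of edges. The toy graph and the function name are illustrative assumptions and do not come from the slides; the graph is assumed to be undirected, unweighted and without self-loops.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Newman's Q for an undirected, unweighted graph without self-loops."""
    m = len(edges)
    internal = defaultdict(int)    # edges with both endpoints in the community
    degree_sum = defaultdict(int)  # total degree of the community's nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    # Observed fraction of internal edges minus the fraction expected by chance.
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

# Hypothetical toy graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
single = {node: "all" for node in range(6)}

print(modularity(edges, split))   # the two triangles as communities: 5/14
print(modularity(edges, single))  # everything in one community: 0
```

Splitting the toy graph into its two triangles gives a positive Q, while putting every node into one community gives zero, which matches the intuition that only the first grouping has more internal links than chance would predict.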

Evaluation: overlooked
Internal evaluation
  Predefined quality/structure for the communities
  Graph partitioning measures (density, conductance)
External evaluation
  Agreement between the results and a given known ground truth
  Clustering similarity/agreement indexes: Rand Index, Jaccard
  Benchmarks with ground truth: GN (2002), LFR (2008)
The community structure is not known beforehand
  No ground truth
  No large data set with known ground truth
  The synthetic benchmarks disagree with some real network characteristics
[Figure panels: Karate, GN, LFR]
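As an illustration of external evaluation, the sketch below computes the Rand Index and the Jaccard index by counting node pairs that two clusterings agree or disagree on. These are the textbook pair-counting definitions; the label vectors are made-up examples, and returning 1.0 for the Jaccard index when neither clustering groups any pair is a convention assumed here.

```python
from itertools import combinations

def pair_counts(labels_a, labels_b):
    """Classify every node pair by whether each clustering groups it together."""
    both = only_a = only_b = neither = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both += 1
        elif same_a:
            only_a += 1
        elif same_b:
            only_b += 1
        else:
            neither += 1
    return both, only_a, only_b, neither

def rand_index(labels_a, labels_b):
    """Fraction of pairs on which the two clusterings agree."""
    both, only_a, only_b, neither = pair_counts(labels_a, labels_b)
    return (both + neither) / (both + only_a + only_b + neither)

def jaccard_index(labels_a, labels_b):
    """Like the Rand Index, but ignoring pairs separated in both clusterings."""
    both, only_a, only_b, _ = pair_counts(labels_a, labels_b)
    denom = both + only_a + only_b
    return both / denom if denom else 1.0  # assumed convention for the empty case

# Hypothetical ground truth and a result that misplaces node 2.
truth = [0, 0, 0, 1, 1, 1]
found = [0, 0, 1, 1, 1, 1]
print(rand_index(truth, found), jaccard_index(truth, found))
```

Both functions need the ground-truth labels as input, which is exactly what is missing for most real networks.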

Relative Validity Criteria
Validity criteria defined for clustering evaluation; they compare different clusterings of the same data set
We altered the criteria:
  Generalized distance: graph distance measures
  Generalized mean/centroid notion: averaging vs. medoid
  e.g. Variance Ratio Criterion (VRC)
  Same for: Dunn index, Silhouette Width Criterion (SWC), Alternative Silhouette, PBM, C-Index, Z-Statistics, Point-Biserial (PB)
Distance alternatives: Edge Path (ED), Shortest Path Distance (SPD), Adjacency Relation Distance (ARD), Neighbour Overlap Distance (NOD), Pearson Correlation Distance (PCD), ICloseness Distance (ICD)
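The following sketch shows one way the two generalizations could fit together for VRC: hop-count shortest path distance replaces Euclidean distance, and a medoid replaces the centroid. The exact formula on the slide is not preserved in this transcript, so the form used here (unsquared distances, medoid ties broken by node order, scaling by (N - k) / (k - 1)) is an assumption, as are the function names and the toy graph. It assumes a connected graph and at least two communities.

```python
from collections import deque

def shortest_path_lengths(adj, source):
    """Hop counts from source to every reachable node (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def medoid(nodes, d):
    """Member with the smallest total distance to the other members."""
    return min(nodes, key=lambda c: sum(d[c][x] for x in nodes))

def vrc_medoid(adj, communities):
    """VRC-style ratio of between-community to within-community dispersion."""
    d = {u: shortest_path_lengths(adj, u) for u in adj}
    n, k = len(adj), len(communities)
    global_medoid = medoid(sorted(adj), d)
    between = within = 0.0
    for members in communities:
        c = medoid(sorted(members), d)
        between += len(members) * d[c][global_medoid]
        within += sum(d[i][c] for i in members)
    return (between / within) * ((n - k) / (k - 1))

# Hypothetical toy graph: two 4-cliques joined by the bridge edge (3, 4).
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7), (3, 4)]
adj = {node: [] for node in range(8)}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

good = vrc_medoid(adj, [{0, 1, 2, 3}, {4, 5, 6, 7}])
mixed = vrc_medoid(adj, [{0, 1, 4, 5}, {2, 3, 6, 7}])
print(good, mixed)
```

On this toy graph the clique-aligned clustering scores higher than the mixed one. No ground truth is needed, which is what makes the criterion relative: it only ranks alternative clusterings of the same graph.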

Correlation with External Index
Correlation of relative criteria and external scores on different clusterings of the same data set
  Random clusterings that range from very close to very far from the ground truth
[Figure: example for the Karate network]
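A minimal sketch of this comparison step: given one external score and one relative-criterion score per clustering, compute their Pearson and Spearman correlations. The score lists below are placeholder numbers invented purely for illustration and are not results from the paper; the helper names are likewise assumptions. Both input lists are assumed to be non-constant.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Placeholder scores for five clusterings, ordered from closest to the
# ground truth to farthest from it (invented for illustration only).
external_scores = [1.0, 0.8, 0.5, 0.3, 0.1]
criterion_scores = [9.0, 7.5, 7.0, 2.0, 1.0]

print(pearson(external_scores, criterion_scores))
print(spearman(external_scores, criterion_scores))
```

A criterion that ranks the clusterings in the same order as the external index gets a Spearman correlation of 1 even when the relationship is not linear, which is one reason the choice of correlation measure can change the ranking of criteria.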

Ranking of Criteria on Real World Benchmarks
[Tables and figures: Difficulty Analysis; Data set statistics; Overall Ranking]

Ranking of Criteria on Synthetic Benchmarks
[Tables and figures: Ranking for well separated communities; Data set statistics; Overall ranking for very mixed communities]

Ranking varies
Criteria ranking is affected by:
  Choice of benchmarks, synthetic generator and its parameters
  Choice of external agreement index: ARI, NMI, AMI, Jaccard
  Choice of correlation measure: Pearson and Spearman correlation
  Choice of clustering randomization
Get the ranking in your setting: www.cs.ualberta.ca/~rabbanyk/criteriaComparison

Future Works
Evaluation issues
  Community mining specific agreement measure
  Realistic synthetic benchmarks
Extensions of criteria
  Incorporating attributes: combine clustering and community mining for cases in which we have both attributes and relations
  Incorporating uncertainty and edges with probability
  ...

End
Questions?

Alternative Distances
Edge Path (ED), Shortest Path Distance (SPD), Adjacency Relation Distance (ARD), Neighbour Overlap Distance (NOD), Pearson Correlation Distance (PCD), ICloseness Distance (ICD)

Relative criteria
Variance Ratio Criterion (VRC), Dunn index, Silhouette Width Criterion (SWC), Alternative Silhouette, PBM, Davies-Bouldin, C-Index, Point-Biserial (PB)