ICDE 2014 LinkSCAN*: Overlapping Community Detection Using the Link-Space Transformation Sungsu Lim †, Seungwoo Ryu ‡, Sejeong Kwon§, Kyomin Jung ¶, and.

Presentation transcript:

LinkSCAN*: Overlapping Community Detection Using the Link-Space Transformation
ICDE 2014
Sungsu Lim†, Seungwoo Ryu‡, Sejeong Kwon§, Kyomin Jung¶, and Jae-Gil Lee†
† Dept. of Knowledge Service Engineering, KAIST
‡ Samsung Advanced Institute of Technology
§ Graduate School of Cultural Technology, KAIST
¶ Dept. of Electrical and Computer Engineering, SNU

Contents
- Motivation
- Link-Space Transformation
- Proposed Algorithm: LinkSCAN*
- Experiment Evaluation
- Conclusions

Community Detection
- Network communities: sets of nodes such that nodes in the same set are similar (more internal links) and nodes in different sets are dissimilar (fewer external links)
- Also called communities, clusters, modules, groups, etc.
- Non-overlapping community detection: finding a good partition of the nodes, so clusters do NOT overlap

Overlapping Community Detection
- A person (node) can belong to multiple communities, e.g., family, friends, colleagues
- Overlapping community detection allows a node to be included in several groups

Existing Methods
- Node-based: a node overlaps if more than one of its belonging coefficients is larger than some threshold
  - Label Propagation (COPRA) [Gregory 2010, Subelj and Bajec 2011]
  - Update rule: $f(i,c) = \mathrm{mean}_{j \in \mathrm{nbr}(i)} f(j,c)$
  - Example: f(i,c1)=0.35, f(i,c2)=0.05, f(i,c3)=0.4, … with $\tau = 0.3$
- Structure-based: a node overlaps if it participates in multiple base structures with different memberships
  - Clique Percolation (CPM) [Palla et al. 2005, Derenyi et al. 2005]; base structure: cliques of size $k$ (example: $k = 4$)
  - Link Partition [Evans and Lambiotte 2009, Ahn et al. 2010]; base structure: links

Limitations of Existing Methods
The existing methods do not perform well for
1. networks with many highly overlapping nodes (example: node i is overlapping, but with f(i,c1)=0.2, f(i,c2)=0.15, f(i,c3)=0.25, f(i,c4)=0.2, … and $\tau = 0.3$, COPRA fails),
2. networks with various base structures (example: node i is non-overlapping, but with $k \geq 3$, CPM fails), and
3. networks with many weak ties (example: with a weak tie, link partition fails)
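
The threshold rule of the node-based methods, and the way it misses a highly overlapping node, can be sketched as follows. This is only an illustration of the rule as stated on the slides, not COPRA itself; the function names are hypothetical, and the coefficient values are the ones listed on the two slides above.

```python
def memberships_above_threshold(coeffs, tau):
    """Return the communities whose belonging coefficient is larger than tau."""
    return sorted(c for c, f in coeffs.items() if f > tau)


def is_overlapping(coeffs, tau):
    """A node overlaps if more than one coefficient is larger than tau."""
    return len(memberships_above_threshold(coeffs, tau)) > 1


# Values from the "Existing Methods" slide: two coefficients pass tau = 0.3.
node_a = {"c1": 0.35, "c2": 0.05, "c3": 0.4}
# Values from the "Limitations" slide: the node belongs to many communities,
# so every listed coefficient is diluted below tau = 0.3.
node_b = {"c1": 0.2, "c2": 0.15, "c3": 0.25, "c4": 0.2}

print(memberships_above_threshold(node_a, 0.3), is_overlapping(node_a, 0.3))
print(memberships_above_threshold(node_b, 0.3), is_overlapping(node_b, 0.3))
```

With a fixed threshold, the more communities a node belongs to, the smaller each coefficient becomes, which is why the second node is not reported as overlapping.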

Our Solution
- We propose a new framework, the link-space transformation, which transforms a given graph into the link-space graph
- We develop an algorithm that performs non-overlapping clustering on the link-space graph, which enables us to discover overlapping clusters
- Pipeline: Original Graph → (Link-Space Transformation) → Link-Space Graph → (Non-overlapping Clustering) → Link Communities → (Membership Translation) → Overlapping Communities

Overall Procedure
- We propose an overlapping clustering algorithm using the link-space transformation
- Pipeline: Original Graph → (Link-Space Transformation) → Link-Space Graph → (Non-overlapping Clustering) → Link Communities → (Membership Translation) → Overlapping Communities

Link-Space Transformation
- Topological structure
  - Each link of the original graph maps to a node of the link-space graph
  - Two nodes of the link-space graph are adjacent if the corresponding two links of the original graph are incident
- Weights
  - The weight of a link of the link-space graph is calculated from the similarity of the corresponding links of the original graph: $w(v_{ik}, v_{jk}) = \sigma(e_{ik}, e_{jk})$
[Figure: an original graph and its link-space graph; links $e_{ik}$ and $e_{jk}$ map to nodes $v_{ik}$ and $v_{jk}$]
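
The transformation can be sketched in a few lines of Python. The slides do not define the similarity $\sigma$, so the sketch assumes a Jaccard similarity between the closed neighborhoods of the two non-shared endpoints $i$ and $j$, in the spirit of link-partition methods; that choice, the function name `link_space_graph`, and the edge-list input format are all assumptions made for illustration, and the graph is assumed to be simple and undirected.

```python
from itertools import combinations


def link_space_graph(edges):
    """Return {(link1, link2): weight} for every pair of incident links."""
    links = sorted({tuple(sorted(e)) for e in edges})

    neighbours = {}   # node -> set of adjacent nodes
    incident = {}     # node -> list of links touching that node
    for u, v in links:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
        incident.setdefault(u, []).append((u, v))
        incident.setdefault(v, []).append((u, v))

    def sigma(i, j):
        # ASSUMPTION: Jaccard similarity of closed neighbourhoods of i and j.
        a = neighbours[i] | {i}
        b = neighbours[j] | {j}
        return len(a & b) / len(a | b)

    weights = {}
    for k, inc in incident.items():
        # Every pair of links sharing endpoint k becomes a link-space edge.
        for e1, e2 in combinations(sorted(inc), 2):
            i = e1[0] if e1[1] == k else e1[1]   # endpoint of e1 other than k
            j = e2[0] if e2[1] == k else e2[1]   # endpoint of e2 other than k
            weights[(e1, e2)] = sigma(i, j)
    return weights


# A path 1-2-3 has two links that are incident at node 2.
for pair, w in link_space_graph([(1, 2), (2, 3)]).items():
    print(pair, round(w, 3))
```

Each key of the returned dictionary is a pair of original links, that is, an edge of the link-space graph; in a triangle all three link pairs get weight 1 under the assumed similarity, because the non-shared endpoints have identical closed neighborhoods.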

Clustering on Link-Space Graph
- Apply a non-overlapping clustering algorithm to the link-space graph
- We use structural clustering, which can classify a node as a hub or an outlier (neutral membership)
[Figure: an original graph with links 03, 13, 34, 12, 23, 35, 45 and the non-overlapping clustering of its link-space graph; the labeled link-space weights are 1/2 and 1, and all other weights are less than 1/3]
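
A rough idea of how a structural clustering can leave some link-space nodes without a cluster is sketched below. This is a simplified SCAN-style procedure written for illustration, not the authors' implementation: the parameters `eps` (minimum weight of a strong connection) and `mu` (minimum size of a core's strong neighborhood, counting the node itself) are assumptions, and the toy weights are made up.

```python
from collections import deque


def structural_clustering(weights, eps, mu):
    """Cluster a weighted graph {(a, b): w}; unclustered nodes get None."""
    strong = {}
    for (a, b), w in weights.items():
        strong.setdefault(a, set())
        strong.setdefault(b, set())
        if w >= eps:
            strong[a].add(b)
            strong[b].add(a)

    # A core has at least mu strong neighbours, counting itself.
    cores = {v for v, s in strong.items() if len(s) + 1 >= mu}

    label = {v: None for v in strong}
    next_id = 0
    for seed in sorted(cores):
        if label[seed] is not None:
            continue
        label[seed] = next_id
        queue = deque([seed])
        while queue:
            v = queue.popleft()
            for u in strong[v]:
                if label[u] is None:
                    label[u] = next_id
                    if u in cores:          # only cores keep expanding
                        queue.append(u)
        next_id += 1
    return label


# Hypothetical link-space graph: two tight triangles, one weak bridge,
# and one node "g" that is only weakly attached.
toy = {
    ("a", "b"): 1.0, ("a", "c"): 1.0, ("b", "c"): 1.0,
    ("d", "e"): 1.0, ("d", "f"): 1.0, ("e", "f"): 1.0,
    ("c", "d"): 0.2, ("a", "g"): 0.1,
}
print(structural_clustering(toy, eps=0.5, mu=3))
```

Nodes that keep the label `None` play the role of the hubs and outliers mentioned on the slide: they are not forced into any cluster.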

Membership Translation
- Memberships of nodes of the link-space graph map to memberships of links of the original graph
- The memberships of a node of the original graph are derived from the memberships of its incident links
[Figure: the non-overlapping clustering of the link-space graph (links 03, 13, 34, 12, 23, 35, 45) and its translation to memberships in the original graph]
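
Membership translation amounts to collecting, for each node, the communities of its incident links. The sketch below uses the link names from the figure, but the community assignment is a hypothetical one chosen for illustration (it is not read off the slide), and `translate_memberships` is an assumed name.

```python
def translate_memberships(link_labels):
    """Map {(u, v): community or None} to {node: set of communities}."""
    node_memberships = {}
    for (u, v), community in link_labels.items():
        for node in (u, v):
            node_memberships.setdefault(node, set())
            if community is not None:      # None = hub/outlier link
                node_memberships[node].add(community)
    return node_memberships


# ASSUMPTION: illustrative labels for the links 03, 13, 34, 12, 23, 35, 45.
link_labels = {
    (1, 2): "A", (1, 3): "A", (2, 3): "A",
    (3, 4): "B", (3, 5): "B", (4, 5): "B",
    (0, 3): None,
}
memberships = translate_memberships(link_labels)
overlapping = sorted(n for n, cs in memberships.items() if len(cs) > 1)
print(memberships)
print(overlapping)
```

Under this assumed labeling only node 3 has incident links in both communities, so it is the only overlapping node, while node 0 ends up with no membership.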

Advantages of Link-Space Graph
- Finding disjoint communities in the link-space graph enables us to find overlapping communities in the original graph
- The original structure is preserved, since the similarity properly reflects the structure of the original graph
- Link-space graph: easier to find overlapping communities while preserving the original structure

LinkSCAN*
- We propose an efficient overlapping clustering algorithm using the link-space transformation
- For a massive graph, the link-space graph may be dense, so we add a sampling process
- Pipeline: Original Graph → (Link-Space Transformation) → Link-Space Graph → (Link Sampling) → Sampled Graph → (Structural Clustering) → Link Communities → (Membership Translation) → Overlapping Communities

Link Sampling
- Sampling strategy: for each node $v$, we sample $n_v$ incident links of $v$, where $n_v = \min\{d_v,\ \alpha + \beta \ln d_v\}$ and $d_v$ is the degree of $v$
- Thm 1 guarantees that sampling errors are not significant even when $n_v$ is small
- For real networks, the sampled graph and the link-space graph are close (NMI > 0.9), while the sampling rate is small (~0.1)
- Thm 1 (Error bound): by the Chernoff bound, the estimation error of selecting core nodes decreases exponentially as the $n_v$'s increase
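
The sampling rule can be sketched as follows. The slide gives only the formula for $n_v$, so several details here are assumptions: the value is rounded up to an integer, links are drawn uniformly at random without replacement, and the graph is passed as a hypothetical adjacency dictionary.

```python
import math
import random


def sample_size(degree, alpha, beta):
    """n_v = min(d_v, alpha + beta * ln(d_v)); rounding up is an assumption."""
    if degree == 0:
        return 0
    return min(degree, math.ceil(alpha + beta * math.log(degree)))


def sample_incident_links(adjacency, alpha, beta, seed=0):
    """For each node v, keep n_v of its neighbours (i.e. incident links)."""
    rng = random.Random(seed)
    sampled = {}
    for v, nbrs in adjacency.items():
        n_v = sample_size(len(nbrs), alpha, beta)
        # ASSUMPTION: uniform sampling without replacement.
        sampled[v] = rng.sample(sorted(nbrs), n_v)
    return sampled


# A star graph: hub 0 with 50 leaves.
star = {0: set(range(1, 51))}
star.update({leaf: {0} for leaf in range(1, 51)})
sampled = sample_incident_links(star, alpha=2, beta=1)
print(len(sampled[0]), sampled[1])
```

Because the sample size grows only logarithmically with the degree, high-degree nodes contribute far fewer links than they have, which is what keeps the sampled graph sparse.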

Network Datasets
- Synthetic networks: LFR benchmark networks [Lancichinetti and Fortunato 2009]
- Real networks: social and information networks [snap.stanford.edu/data/ and www.nd.edu/~networks/resources.htm]
Network        # nodes      # links      Avg. degree   Clust. coeff.
DBLP           1,068,037    3,800,963    7.50          0.19
Amazon         334,863      925,872      5.53          0.21
Enron-email    36,692       183,831      10.02         0.08
Brightkite     58,228       214,078      7.35          0.11
Facebook       63,392       816,886      25.77         0.15
WWW            325,729      1,090,108    6.69          0.09

Performance Evaluation
- When the ground truth is known
  - NMI for overlapping clustering [Lancichinetti et al. 2009]
  - F-score (performance of identifying overlapping nodes)
- When the ground truth is unknown
  - Quality (Mov): modularity for overlapping clustering [Lazar et al. 2010]
  - Coverage (CC): clustering coverage [Ahn et al. 2010]
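
As an illustration of the second ground-truth measure, the sketch below computes an F-score for identifying overlapping nodes. It assumes the standard F1 definition (harmonic mean of precision and recall) with the overlapping nodes as the positive class; the slides do not spell out the exact formula, and the function name and node sets are hypothetical.

```python
def overlap_f_score(true_overlapping, predicted_overlapping):
    """F1 score for the set of nodes reported as overlapping."""
    truth = set(true_overlapping)
    predicted = set(predicted_overlapping)
    true_positives = len(truth & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: 4 truly overlapping nodes, 3 predicted, 2 correct.
print(round(overlap_f_score({1, 2, 3, 4}, {3, 4, 5}), 4))
```

Here precision is 2/3 and recall is 1/2, so the score is 4/7.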

Problem 1 For networks with many highly overlapping nodes, LinkSCAN* outperforms the existing methods.

Problem 2 For networks with various base-structures, our method performs well compared to the existing methods

Problem 3 For networks with many weak ties, the existing methods fail on toy networks, whereas LinkSCAN* detects all the clusters well.

Real Networks For real network datasets, the normalized measure of (Quality + Coverage) indicates that LinkSCAN* is better than the existing methods.

Link Sampling
- Comparing the use of the link-space graph (LinkSCAN) with the use of sampled graphs (LinkSCAN*) shows that LinkSCAN* improves efficiency with small errors
- Enron-email network: # nodes = 37K, # links = 184K
- Parameters: $\alpha = 0.5d \sim 16d$, $\beta = 1$

Scalability
- The running time of LinkSCAN* on a set of LFR benchmark networks shows that LinkSCAN* has near-linear scalability
- LFR benchmark networks: # nodes = 1K to 1M, # links = 10K to 10M
- Parameters: $\alpha = 2d$, $\beta = 1$

Conclusions
- We propose the notion of the link-space transformation and develop a new overlapping clustering algorithm, LinkSCAN*, that satisfies membership neutrality
- LinkSCAN* outperforms existing algorithms on networks with many highly overlapping nodes and on networks with various base structures

Acknowledgement
This research was supported by the National Research Foundation of Korea.

Thank You!