DEMON: A Local-first Discovery Method For Overlapping Communities

Giulio Rossetti (2,1), Michele Coscia (3), Fosca Giannotti (2), Dino Pedreschi (2,1)
1 Computer Science Dep., University of Pisa, Italy
2 ISTI-CNR KDDLab, Pisa, Italy
3 Harvard Kennedy School, Cambridge, MA, US
April 23rd, 2013

Outline
- Problem Definition: What is a community? Community Discovery
- Communities and complex (social) networks: A matter of perspective
- DEMON: Algorithm(s), Properties, Experiments, Extension
- Conclusions

What is a community? Unfortunately, there is no universally shared definition of what a community is. A general idea is that a community represents:
“A set of entities where each entity is closer, in the network sense, to the other entities within the community than to the entities outside it.”
or
“A set of nodes more tightly connected to each other than to nodes belonging to other sets.”
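The definitions above are informal. One common way to make the second one precise is the "strong community" condition of Radicchi et al.: every member has more neighbours inside the set than outside it. This is our choice of formalization for illustration, not necessarily the criterion DEMON uses. A minimal sketch, assuming an undirected graph stored as a dict mapping each node to its set of neighbours (`is_strong_community` is a hypothetical name):

```python
def is_strong_community(adj, members):
    """True if every member has more neighbours inside `members` than outside
    it (the "strong community" condition; adj: dict node -> set of neighbours)."""
    members = set(members)
    for node in members:
        internal = len(adj[node] & members)
        external = len(adj[node]) - internal
        if internal <= external:
            return False
    return True


# Two triangles joined by a single bridge edge (3-4).
adj = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}
print(is_strong_community(adj, {1, 2, 3}))     # True
print(is_strong_community(adj, {1, 2, 3, 4}))  # False: node 4 has 1 link in, 2 out
```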

Community Discovery
The aim of CD algorithms is to identify the communities hidden in a complex network structure.
Why Community Discovery? To “cluster” homogeneous nodes relying on topological information (clustering networked entities).
Major problems:
- Each algorithm models different properties of real-world communities
- Comparing and evaluating different methodologies is not trivial
- Finding an acceptable compromise between the number of communities and their sizes
- Context dependency

Community Discovery Approaches
Given the complexity of the problem, many different types of approaches have been proposed, analyzing:
- Directed/Undirected edges
- Weighted/Unweighted edges
- Top-Down/Bottom-Up partitioning
- Multidimensionality
- Overlap among communities
- Hierarchical communities
- …
DEMON: Undirected, Bottom-Up, Overlapping (with Directed, Weighted, Hierarchical extensions)


Communities in (Social) Networks
Communities can be seen as the basic building blocks of a (social) network. In simple, small networks, it is easy to identify them by looking at the structure…

…but real-world networks are not “simple”: we cannot easily identify the different communities, because there are too many nodes and edges.

Are they two different phenomena? No!

A Matter of Perspective
The only difference is in the scale: locally, for each node, the structure makes sense; globally, we are tangled in complex overlaps.
Idea: a bottom-up approach!


Reducing the complexity
Real networks are complex objects. Can we make them “simpler”? Ego-networks: networks built upon a focal node (the “ego”), the nodes to which the ego is directly connected (the “alters”), plus the ties, if any, among the alters.
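A minimal sketch of ego-network extraction, assuming the same dict-of-sets adjacency representation; the function name `ego_network` and the `include_ego` flag are hypothetical. With `include_ego=False` it returns the "ego-minus-ego" network, i.e. the ego network after the ego itself has been removed:

```python
def ego_network(adj, ego, include_ego=True):
    """Return the ego network of `ego` as a dict-of-sets adjacency: the ego
    (optionally), its alters, and every tie among those nodes."""
    alters = set(adj[ego])
    nodes = (alters | {ego}) if include_ego else alters
    # Keep only edges whose two endpoints both belong to the ego network.
    return {u: adj[u] & nodes for u in nodes}


# A triangle 1-2-3 with a pendant path 3-4-5.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
full = ego_network(adj, 3)
without_ego = ego_network(adj, 3, include_ego=False)
```

Here `full` contains nodes 1, 2, 3 and 4 (node 5 is not an alter of 3), while `without_ego` leaves alters 1 and 2 connected and alter 4 isolated: removing the ego is what exposes the separate groups around it.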

DEMON Algorithm
For each node n:
1. Extract the ego network of n
2. Remove n from the ego network
3. Perform a Label Propagation (1)
4. Insert n in each community found
5. Update the raw community set C
For each raw community c in C:
1. Merge it with “similar” ones in the set, given a threshold ε (i.e. merge iff at most ε% of the smaller one is not included in the bigger one)
(1) Usha N. Raghavan, Réka Albert, and Soundar Kumara. Near linear time algorithm to detect community structures in large-scale networks. Physical Review E
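The merge step can be sketched as follows, reading ε as a fraction (0.25 for 25%). The function names, the default threshold and the fixed-point loop (keep merging until no pair qualifies) are our assumptions for illustration; the pairwise scan makes each pass quadratic in the number of raw communities.

```python
def should_merge(c1, c2, epsilon):
    """True iff at most a fraction `epsilon` of the smaller community is
    not contained in the bigger one (epsilon=0.25 means 25%)."""
    smaller, bigger = (c1, c2) if len(c1) <= len(c2) else (c2, c1)
    if not smaller:
        return True
    return len(smaller - bigger) / len(smaller) <= epsilon


def merge_communities(raw, epsilon=0.25):
    """Merge similar raw communities until no pair qualifies any more.
    Each pass compares all pairs, i.e. O(|C|^2) comparisons."""
    communities = [set(c) for c in raw]
    merged = True
    while merged:
        merged = False
        for i in range(len(communities)):
            for j in range(i + 1, len(communities)):
                if should_merge(communities[i], communities[j], epsilon):
                    communities[i] |= communities.pop(j)
                    merged = True
                    break
            if merged:
                break  # restart the scan on the updated list
    return communities


raw = [{1, 2, 3}, {1, 2, 3, 4}, {5, 6, 7}, {6, 7, 8, 9}]
print([sorted(c) for c in merge_communities(raw, 0.25)])
# [[1, 2, 3, 4], [5, 6, 7], [6, 7, 8, 9]]
print([sorted(c) for c in merge_communities(raw, 0.34)])
# [[1, 2, 3, 4], [5, 6, 7, 8, 9]]
```

With ε = 0.25 the community {5, 6, 7} stays separate because one third of it lies outside {6, 7, 8, 9}; raising ε to 0.34 lets the two merge.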

Label Propagation – The idea
Each node starts with a unique label (i.e. its id).
In the first (setup) iteration, each node, with probability α, changes its label to one of the labels of its neighbors.
At each subsequent iteration, each node adopts the label shared (at the end of the previous iteration) by the majority of its neighbors.
We iterate until consensus is reached.
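A minimal sketch of the propagation scheme just described, with synchronous updates (every node reads its neighbours' labels from the previous iteration). Assumptions not taken from the slides: the parameter names `alpha`, `max_iter` and `seed`, the iteration cap, and ties broken by the smallest label (instead of at random) so that runs are reproducible.

```python
import random
from collections import Counter


def label_propagation(adj, alpha=0.5, max_iter=100, seed=None):
    """Synchronous label propagation on an undirected graph given as a
    dict node -> set of neighbours. Returns a dict node -> final label."""
    rng = random.Random(seed)
    labels = {node: node for node in adj}  # unique label = node id

    # Setup iteration: with probability alpha, take a random neighbour's label.
    setup = dict(labels)
    for node, neighbours in adj.items():
        if neighbours and rng.random() < alpha:
            setup[node] = labels[rng.choice(sorted(neighbours))]
    labels = setup

    # Later iterations: adopt the label held by most neighbours at the end
    # of the previous iteration (ties -> smallest label, for reproducibility).
    for _ in range(max_iter):
        new_labels = {}
        for node, neighbours in adj.items():
            if not neighbours:
                new_labels[node] = labels[node]  # isolated node keeps its label
                continue
            counts = Counter(labels[v] for v in neighbours)
            best = max(counts.values())
            new_labels[node] = min(lab for lab, c in counts.items() if c == best)
        if new_labels == labels:  # consensus: no label changed
            break
        labels = new_labels
    return labels


def group_by_label(labels):
    """Turn a node -> label mapping into a list of communities (node sets)."""
    groups = {}
    for node, label in labels.items():
        groups.setdefault(label, set()).add(node)
    return list(groups.values())


# Two disconnected triangles.
triangles = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}, 4: {5, 6}, 5: {4, 6}, 6: {4, 5}}
print(sorted(map(sorted, group_by_label(label_propagation(triangles, alpha=0.0)))))
# [[1, 2, 3], [4, 5, 6]]
```

On a path 1-2-3 with `alpha=0.0`, the synchronous updates keep swapping between two labelings and never reach consensus; only `max_iter` stops the loop, which is one instance of the ping-pong effect.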

Label Propagation – Discussion
Why Label Propagation? It is a quasi-linear algorithm, and it shares our idea of what a community is.
Problem: the ping-pong effect (the algorithm is non-deterministic).
Solution: multiple labels are allowed (we need overlapping communities after all…).
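One way to realize the "multilabel allowed" fix, sketched under our own assumptions since the slides do not spell out the exact rule: when several labels tie for the majority among a node's neighbours, the node keeps all of them instead of picking one at random, so a node sitting between two groups ends up in both. `majority_labels` is a hypothetical helper.

```python
from collections import Counter


def majority_labels(neighbour_label_sets):
    """Return every label that ties for the highest count among a node's
    neighbours. Each neighbour may itself carry several labels."""
    counts = Counter(
        label for labels in neighbour_label_sets for label in labels
    )
    if not counts:
        return set()
    best = max(counts.values())
    return {label for label, count in counts.items() if count == best}


# A node with two neighbours in group "a" and two in group "b" keeps both.
both = majority_labels([{"a"}, {"a"}, {"b"}, {"b"}])
# Here "a" is carried by three neighbours and "b" by one, so only "a" wins.
only_a = majority_labels([{"a"}, {"a", "b"}, {"a"}])
```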

DEMON - Two nice properties
Incrementality: Given a graph G, an initial set of communities C and an incremental update ∆G consisting of new nodes and new edges added to G, where ∆G contains the entire ego networks of all new nodes and of all the preexisting nodes reached by new links, then
$\mathrm{DEMON}(\Delta G \cup G, C) = \mathrm{DEMON}(\Delta G, \mathrm{DEMON}(G, C))$
Compositionality: Consider any partition of a graph G into two subgraphs G1, G2 such that, for any node v of G, the entire ego network of v in G is fully contained either in G1 or in G2. Then, given an initial set of communities C:
$\mathrm{DEMON}(G_1 \cup G_2, C) = \mathrm{Max}(\mathrm{DEMON}(G_1, C), \mathrm{DEMON}(G_2, C))$
These properties make the algorithm highly parallelizable: it can run independently on different fragments of the overall network, with relatively little combination work.
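Incrementality means that, after an update, only the ego networks touched by ∆G need to be processed again. A hypothetical helper (`affected_egos` is our name, not from the slides) that finds those egos for a batch of new edges, assuming the dict-of-sets representation. Note that a new edge u–v also changes the ego network of every common neighbour of u and v, because the edge becomes a tie among that node's alters.

```python
def affected_egos(adj, new_edges):
    """Nodes whose ego network changes when `new_edges` are added to the
    undirected graph `adj` (dict node -> set of neighbours). Only these egos
    need re-processing; raw communities from the other egos can be reused."""
    # Apply the update to a copy; new nodes simply appear as new keys.
    updated = {u: set(vs) for u, vs in adj.items()}
    for u, v in new_edges:
        updated.setdefault(u, set()).add(v)
        updated.setdefault(v, set()).add(u)
    affected = set()
    for u, v in new_edges:
        affected.update((u, v))                   # each endpoint gains an alter
        affected.update(updated[u] & updated[v])  # common neighbours gain a tie
    return affected


# Triangle 1-2-3 plus the edge 3-4.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(sorted(affected_egos(adj, [(1, 4)])))  # [1, 3, 4]: ego 2 is untouched
print(sorted(affected_egos(adj, [(4, 5)])))  # [4, 5]: node 5 is new
```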

Experiments
Networks (with metadata):
- Congress (nodes: US politicians, connected if they co-sponsor the same bills)
- IMDb (nodes: actors, connected if they play in the same movies)
- Amazon (nodes: products, connected if they were purchased together)
Compared algorithms:
- Infomap, non-overlapping state of the art: Rosvall and Bergstrom, “Maps of random walks on complex networks reveal community structure”, PNAS, 2008
- HLC, overlapping state of the art: Ahn, Bagrow and Lehmann, “Link communities reveal multiscale complexity in networks”, Nature, 2010

Quality Evaluation – Community size
[Figure: number of communities and average community size, Amazon network]

Quality Evaluation - Label Prediction
Multilabel classifier (BRL, Binary Relevance Learner): the community memberships of a node are the known attributes, and its real-world labels (qualitative attributes) are the target to be predicted.
[Figures: IMDb, Congress]

Quality Evaluation - Community Cohesion
How good is our community partition at describing real-world knowledge about the clustered entities? “Similar nodes share more qualitative attributes than dissimilar nodes.” If and only if CQ(P) > 1, we are grouping similar nodes together.
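The slide does not give the formula for CQ(P). The sketch below assumes one plausible reading consistent with the "> 1" threshold: the average number of qualitative attributes shared by node pairs that co-occur in at least one community, divided by the same average over pairs that never co-occur. This definition and the name `community_cohesion` are our assumptions, not the measure used in the experiments.

```python
from itertools import combinations


def community_cohesion(partition, attributes):
    """Hypothetical CQ(P): mean number of shared attributes for node pairs
    that co-occur in some community, divided by the same mean for pairs that
    never do. `partition`: list of node sets (overlaps allowed);
    `attributes`: dict node -> set of qualitative attributes."""
    same_sum = same_n = diff_sum = diff_n = 0
    for u, v in combinations(sorted(attributes), 2):
        shared = len(attributes[u] & attributes[v])
        if any(u in c and v in c for c in partition):
            same_sum, same_n = same_sum + shared, same_n + 1
        else:
            diff_sum, diff_n = diff_sum + shared, diff_n + 1
    if same_n == 0 or diff_n == 0:
        raise ValueError("need node pairs both inside and across communities")
    same_mean, diff_mean = same_sum / same_n, diff_sum / diff_n
    return same_mean / diff_mean if diff_mean else float("inf")


partition = [{1, 2}, {3, 4}]
attributes = {1: {"a", "b"}, 2: {"a", "b"}, 3: {"c"}, 4: {"a", "c"}}
print(community_cohesion(partition, attributes))  # 3.0, i.e. > 1
```

In the toy example, co-members share 1.5 attributes on average and non-co-members 0.5, so under this reading the partition groups similar nodes together.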

HDemon – Hierarchical merge
Why a hierarchical merge?
1. The classic DEMON merge function did not scale well: a complexity issue (~$O(|C|^2)$) and a bottleneck for huge networks (such as social graphs).
2. We need to find the right granularity for the communities.
Extensions needed for the Label Propagation algorithm: weighted networks, directed networks (not yet used).

HDemon – Hierarchical merge

HDemon(Graph G)
    Cc ← |connectedComponents(G)|
    C ← ExtractCommunities(G)
    while (|C| > Cc)
        N ← ∅; E ← ∅
        for c in C:
            N ← N ∪ make_node(c)
        for (n, m) in N × N:
            if n shares nodes with m:
                E ← E ∪ (n, m)
        C ← ExtractCommunities(new Graph(N, E))

ExtractCommunities(Graph G)
    C ← ∅
    Egos ← EgoNetworks(G)
    for e in Egos:
        C ← C ∪ LabelPropagation(e)
    return C
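The HDemon pseudocode can be turned into a small runnable Python sketch. Several details are our own assumptions rather than part of the slides: asynchronous label propagation with deterministic tie-breaking (to avoid the ping-pong effect and keep runs reproducible), ego removal and re-insertion as in the DEMON steps, integer ids for the meta-nodes produced by `make_node`, mapping each merged group of communities back to the union of its original nodes, keeping communities that overlap with no other one, and a `max_levels`/no-progress guard so the loop always terminates.

```python
from collections import Counter


def label_propagation(adj, max_iter=100):
    """Asynchronous LPA: nodes update in place, in sorted order. A node keeps
    its label if it is among the most frequent around it, else it takes the
    smallest most-frequent label. Returns a list of node sets."""
    labels = {n: n for n in adj}
    for _ in range(max_iter):
        changed = False
        for n in sorted(adj):
            if not adj[n]:
                continue  # isolated node keeps its own label
            counts = Counter(labels[v] for v in adj[n])
            best = max(counts.values())
            winners = {lab for lab, c in counts.items() if c == best}
            if labels[n] not in winners:
                labels[n] = min(winners)
                changed = True
        if not changed:
            break
    groups = {}
    for n, lab in labels.items():
        groups.setdefault(lab, set()).add(n)
    return list(groups.values())


def extract_communities(adj):
    """For each ego: LPA on its ego-minus-ego network, then add the ego back."""
    found = set()
    for ego, alters in adj.items():
        sub = {u: adj[u] & alters for u in alters}
        for community in label_propagation(sub):
            found.add(frozenset(community | {ego}))
    return found


def count_components(adj):
    """Number of connected components (iterative depth-first search)."""
    seen, count = set(), 0
    for start in adj:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            u = stack.pop()
            for v in adj[u] - seen:
                seen.add(v)
                stack.append(v)
    return count


def hdemon(adj, max_levels=10):
    cc = count_components(adj)
    communities = extract_communities(adj)
    for _ in range(max_levels):
        if len(communities) <= cc:
            break
        index = list(communities)  # meta-node i stands for community index[i]
        meta = {i: set() for i in range(len(index))}
        for i in range(len(index)):
            for j in range(i + 1, len(index)):
                if index[i] & index[j]:  # the two communities share nodes
                    meta[i].add(j)
                    meta[j].add(i)
        groups = extract_communities(meta)
        # Isolated meta-nodes are nobody's alter: keep them as they are.
        groups |= {frozenset([i]) for i, nbrs in meta.items() if not nbrs}
        # Map each group of meta-nodes back to the union of its communities.
        merged = {frozenset().union(*(index[i] for i in g)) for g in groups}
        if len(merged) >= len(communities):
            break  # no progress: stop climbing the hierarchy
        communities = merged
    return communities


# Two triangles sharing node 3 (a "bowtie").
bowtie = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(sorted(map(sorted, extract_communities(bowtie))))  # [[1, 2, 3], [3, 4, 5]]
print(sorted(map(sorted, hdemon(bowtie))))               # [[1, 2, 3, 4, 5]]
```

The first level finds the two overlapping triangles; since they share node 3 they become adjacent meta-nodes, and the next level merges them into one community, matching the single connected component.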


Future works – Framework structure
Different scenarios may require alternative community “definitions”: a framework for bottom-up (and overlapping) CD, in a regular and a hierarchical version.

HFDemon(Graph G)
    Cc ← |connectedComponents(G)|
    C ← ExtractCommunities(G)
    while (|C| > Cc)
        N ← ∅; E ← ∅
        for c in C:
            N ← N ∪ make_node(c)
        for (n, m) in N × N:
            if n shares nodes with m:
                E ← E ∪ (n, m)
        C ← ExtractCommunities(new Graph(N, E))

ExtractCommunities(Graph G)
    C ←
    return C

FDemon(Graph G)
    C ←
    forall c in C(v):
        C ← Merge(C, c, merging_function)
    return C

Future Works – Social Community Evolution
Thesis proposal: “Evolution in Social Networks”
Idea: social networks are not static objects. Nodes and edges can appear and disappear, the same interaction can occur multiple times, and communities change accordingly.
Major problems:
1. The size and granularity of the communities heavily influence evolutionary models → Hierarchical merging?
2. Which nodes are prone to leave/join a community? → Role identification
3. How “strong” is a community? → Community strength measure, community life-cycle

Conclusions
DEMON approaches the community discovery problem through the analysis of simple network sub-structures (ego-networks).
The overlapping and hierarchical algorithms are guided by a social perspective.
In the reported experiments, DEMON outperforms the compared state-of-the-art methods (Infomap and HLC).
A parallel implementation is possible, giving high scalability.


Thanks! Questions? Code (extensions coming soon!)