COMMUNITY DISCOVERY PART 1: A (BRIEF) INTRODUCTION Giulio Rossetti WMA - 4 May 2015.

Community Discovery
The aim of Community Discovery algorithms is to identify communities hidden within complex network structures.
Why Community Discovery?
● To “cluster” homogeneous nodes relying on topological information (clustering networked entities).
Major problems:
● Community Discovery is an ill-posed problem
● Each algorithm models different properties of communities
● Comparison of different approaches
● Context dependency

Community Characteristics
Given the complexity of the problem, a number of different types of approaches were proposed in order to:
● Analyze: directed/undirected graphs, weighted/unweighted graphs, multidimensional graphs, …
● Follow: top-down/bottom-up partitioning, …
● Produce: overlapping communities, hierarchical communities, …

But… what exactly is a community?
Unfortunately, there is no completely shared definition of what a community is…
A general idea is that a community represents:
“A set of entities where each entity is closer, in the network sense, to the other entities within the community than to the entities outside it.”
or
“A set of nodes more tightly connected with each other than with nodes belonging to other sets.”

Communities in Complex Networks
Communities can be seen as the basic building blocks of a network. In simple, small networks it is easy to identify them by looking at the structure…

A first example… Zachary’s Karate Club
Communities emerge from the breakup of the club.

…however, real-world networks are not often that “simple”…
We can’t easily identify the different communities: there are too many nodes and edges.

Communities: some Hypotheses
● H1: The community structure is uniquely encoded in the wiring diagram of the overall network
● H2: A community corresponds to a connected subgraph
● H3: Communities are locally dense neighborhoods of a network

COMMUNITY DISCOVERY PART 2: THE NIGHTMARE OF AN ILL-POSED PROBLEM

Can we classify CD algorithms according to a taxonomy of community definitions?
Which kind of community do you like? We can divide CD algorithms by the community definition they adopt:
● Internal density
● Bridge detection
● Feature distance
● Percolation
● Entity closeness
● Structure definition
● Link communities
● No a priori definition

Internal Density
“A community in a complex network is a set of entities that are densely connected.”
Each community must have a number of edges significantly higher than the expected number of edges in a random graph.

Internal density (cont’d)
How can we ensure high density?
General idea: define a quality function measuring the density of a community, and then try to maximize it.
Popular concept: modularity optimization. Example: Louvain.
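As an illustration of such a quality function, the following is a minimal sketch of the Newman-Girvan modularity of a partition, assuming an undirected, unweighted graph given as an edge list. For each community it compares the fraction of edges that fall inside it with the fraction expected if edges were placed at random while preserving node degrees. The function name and the input format are choices made for this example, and the sketch only evaluates a given partition; it is not the Louvain optimization itself.

```python
from collections import defaultdict


def modularity(edges, communities):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges:       list of (u, v) pairs, each undirected edge listed once
    communities: list of disjoint node sets covering all nodes
    """
    m = len(edges)
    membership = {node: idx
                  for idx, nodes in enumerate(communities)
                  for node in nodes}

    internal = defaultdict(int)      # edges with both endpoints in the community
    total_degree = defaultdict(int)  # sum of node degrees in the community
    for u, v in edges:
        total_degree[membership[u]] += 1
        total_degree[membership[v]] += 1
        if membership[u] == membership[v]:
            internal[membership[u]] += 1

    # Q = sum over communities of (observed internal fraction - expected fraction)
    return sum(internal[c] / m - (total_degree[c] / (2 * m)) ** 2
               for c in total_degree)


# Hypothetical toy graph: two triangles joined by a single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(modularity(edges, [{0, 1, 2}, {3, 4, 5}]))   # two dense groups
print(modularity(edges, [{0, 1, 2, 3, 4, 5}]))     # everything in one group
```

Splitting the toy graph into its two triangles scores higher than keeping all nodes together, which is the behavior a modularity optimizer exploits.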

Bridge detection
Partitioning approaches. Example: Girvan-Newman (edge betweenness).
“A community is a component of the network obtained by removing from the structure all the sparse bridges that connect the dense parts of the network.”
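The bridge-removal idea can be sketched as follows, assuming an undirected, unweighted graph stored as a dictionary of neighbor sets. The sketch computes edge betweenness with a breadth-first accumulation in the style of Brandes' algorithm, then repeatedly removes the edge with the highest score until the number of connected components grows. It performs only a single split, whereas the full Girvan-Newman procedure continues to build a hierarchy; all function names are chosen for this example.

```python
from collections import defaultdict, deque


def edge_betweenness(adj):
    """Number of shortest paths crossing each edge (undirected, unweighted)."""
    score = defaultdict(float)
    for source in adj:
        dist = {source: 0}
        paths = defaultdict(float)   # number of shortest paths from source
        paths[source] = 1.0
        preds = defaultdict(list)
        order = []
        queue = deque([source])
        while queue:                 # breadth-first search from source
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        dependency = defaultdict(float)
        for w in reversed(order):    # accumulate from the farthest nodes back
            for v in preds[w]:
                share = paths[v] / paths[w] * (1.0 + dependency[w])
                score[frozenset((v, w))] += share
                dependency[v] += share
    # every unordered pair of endpoints was counted from both sides
    return {edge: value / 2 for edge, value in score.items()}


def components(adj):
    """Connected components as a list of node sets."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        result.append(comp)
    return result


def split_once(adj):
    """Remove highest-betweenness edges until one more component appears."""
    adj = {node: set(nbrs) for node, nbrs in adj.items()}  # work on a copy
    target = len(components(adj)) + 1
    while len(components(adj)) < target:
        scores = edge_betweenness(adj)
        if not scores:               # no edges left to remove
            break
        u, v = max(scores, key=scores.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)
```

On two triangles joined by a single edge, that joining edge carries every shortest path between the two sides, so it is removed first and the triangles are returned as separate components.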

Density vs Bridges
These two definitions seem very similar... Are they equivalent?
● For some networks, yes;
● for very dense networks there are no clear bridges.

Density vs Bridges (cont’d)
Moreover, for very sparse networks a density definition will fail even if we can detect some bridges.

Feature distance
Once a distance measure based on the values of the selected features has been defined, the entities within a community are very close to each other, closer than they are to entities outside the community.
Clustering approach:
● It considers any kind of vertex feature, not only adjacencies (when only adjacencies are used, this definition maps onto the density one).
“A community is a set of entities that share a precise set of features.”

Percolation
Example: Label Propagation, DEMON
“A community is a set of nodes that are grouped together by the propagation of the same property, action or information in the network.”

Label Propagation
● Each node has a unique label (i.e. its id).
● In the first (setup) iteration each node, with probability α, changes its label to one of the labels of its neighbors.
● At each subsequent iteration each node adopts as its label the one shared (at the end of the previous iteration) by the majority of its neighbors.
● We iterate until consensus is reached.
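The procedure above can be sketched in a few lines. This is a simplified variant under stated assumptions, not the exact scheme on the slide: it skips the probabilistic setup step with α, updates nodes one at a time in random order (asynchronously), keeps a node's current label when it is already among the most frequent ones, and breaks remaining ties at random. The `seed` and `max_sweeps` parameters are additions for reproducibility and termination.

```python
import random
from collections import Counter


def label_propagation(adj, seed=0, max_sweeps=100):
    """Asynchronous label propagation on a dict {node: set_of_neighbors}."""
    rng = random.Random(seed)
    labels = {node: node for node in adj}   # every node starts with its own id
    nodes = list(adj)
    for _ in range(max_sweeps):
        rng.shuffle(nodes)                  # visit nodes in random order
        changed = False
        for node in nodes:
            if not adj[node]:
                continue                    # isolated nodes keep their label
            counts = Counter(labels[nbr] for nbr in adj[node])
            top = max(counts.values())
            best = [lab for lab, cnt in counts.items() if cnt == top]
            if labels[node] not in best:    # adopt a majority label
                labels[node] = rng.choice(best)
                changed = True
        if not changed:                     # consensus: nobody wants to switch
            break
    return labels


def communities_from_labels(labels):
    """Group nodes that ended up sharing a label."""
    groups = {}
    for node, lab in labels.items():
        groups.setdefault(lab, set()).add(node)
    return list(groups.values())
```

Nodes that share a label at the end form one community. Because of the random visiting order and tie-breaking, different seeds can yield different partitions on the same graph.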

DEMON: A Matter of Perspective…
Locally, each node is able to identify its communities. Globally, we are tangled in complex overlaps.
Idea: a bottom-up approach!

Reducing the complexity
Real networks are complex objects. Can we make them “simpler”?
Ego-networks: networks built upon a focal node, the “ego”, and the nodes to which the ego is directly connected (the “alters”), plus the ties, if any, among the alters.
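A minimal sketch of ego-network extraction, assuming the graph is stored as a dictionary mapping each node to the set of its neighbors. The `include_ego` flag is an addition for this example; setting it to `False` gives the ego network with the focal node removed.

```python
def ego_network(adj, ego, include_ego=True):
    """Subgraph induced by the ego's neighbors (and optionally the ego)."""
    alters = set(adj[ego])
    nodes = (alters | {ego}) if include_ego else alters
    # keep only the ties whose endpoints are both inside the ego network
    return {node: adj[node] & nodes for node in nodes}


# Hypothetical toy graph: node 4 is two steps away from the ego, node 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0, 4}, 4: {3}}
print(ego_network(adj, 0))
print(ego_network(adj, 0, include_ego=False))
```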

DEMON Algorithm
For each node n:
1. Extract the ego network of n
2. Remove n from the ego network
3. Perform a Label Propagation
4. Insert n in each community found
5. Update the raw community set C
For each raw community c in C:
1. Merge with “similar” ones in the set, given a threshold (i.e. merge iff at most ε% of the smaller one is not included in the bigger one)
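The merge rule in the last step can be sketched as below. This covers only the merging of raw communities, not the full algorithm, and it assumes that ε is expressed as a fraction between 0 and 1 and that communities are non-empty sets. The function names and the greedy merge order are choices made for this example.

```python
def should_merge(a, b, epsilon):
    """True iff at most a fraction epsilon of the smaller set lies outside the bigger."""
    small, big = sorted((a, b), key=len)
    return len(small - big) / len(small) <= epsilon


def merge_communities(raw_communities, epsilon=0.25):
    """Greedily absorb each raw community into the similar ones already kept."""
    merged = []
    for community in raw_communities:
        community = set(community)
        absorbed = True
        while absorbed:              # a grown community may match further ones
            absorbed = False
            for other in merged:
                if should_merge(community, other, epsilon):
                    merged.remove(other)
                    community |= other
                    absorbed = True
                    break
        merged.append(community)
    return merged


raw = [{1, 2, 3, 4}, {2, 3, 4, 5}, {7, 8, 9}]
print(merge_communities(raw, epsilon=0.25))
print(merge_communities(raw, epsilon=0.0))
```

With ε = 0 a community is merged only when it is fully contained in another one; larger values of ε produce fewer, bigger communities.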

Personal Facebook Communities
1. Log out from Facebook and clean your browser cookies
2. Visit: kddsna.isti.cnr.it:
3. Log in with Facebook
4. Select one of the two options:
   1. “Visualize your network”
   2. “Demon Communities”
5. Wait for the data to be collected and displayed
6. Zoom in/out and drag communities with your mouse

Entity closeness
Example: Random Walks (conductance analysis)
“A community is a set of nodes that can reach any member of their group by crossing a very low number of edges, significantly lower than the average shortest path in the network.”
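As a small illustration of the conductance analysis mentioned above, the sketch below computes the conductance of a candidate node set: the number of edges leaving the set divided by the smaller of the total degree inside and outside the set. A low value means a random walk that starts inside the set is unlikely to leave it in one step. The graph representation (a dictionary of neighbor sets) and the function name are assumptions of this example.

```python
def conductance(adj, community):
    """Conductance of a node set in an undirected, unweighted graph."""
    community = set(community)
    # edges with exactly one endpoint inside the community
    cut = sum(1 for u in community for v in adj[u] if v not in community)
    volume_in = sum(len(adj[u]) for u in community)
    volume_out = sum(len(adj[u]) for u in adj if u not in community)
    smaller = min(volume_in, volume_out)
    return cut / smaller if smaller else 0.0


# Hypothetical toy graph: two triangles joined by the edge (2, 3).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(conductance(adj, {0, 1, 2}))  # a whole triangle: few edges leave it
print(conductance(adj, {0, 1}))     # part of a triangle: many edges leave it
```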

Structure definition
Example: k-cliques
“A community is a set of nodes with a precise number of edges among them, distributed in a very precise topology defined by a number of rules.”

Example: Clique percolation
● A very popular structure definition algorithm: k-cliques
● This case, too, differs from the density definition: node 7 is in some sense “dense” (it is in a triangle), but outside any community
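A brute-force sketch of clique percolation for small graphs, assuming a dictionary of neighbor sets with sortable node ids. It enumerates all k-cliques, links two cliques when they share k-1 nodes, and returns the union of each linked group as a community. The enumeration is exponential and meant only to illustrate the definition; the toy graph is hypothetical and does not reproduce the figure on the slide.

```python
from itertools import combinations


def k_clique_communities(adj, k):
    """Communities as unions of k-cliques chained through k-1 shared nodes."""
    cliques = [frozenset(group)
               for group in combinations(sorted(adj), k)
               if all(v in adj[u] for u, v in combinations(group, 2))]

    parent = list(range(len(cliques)))      # union-find over clique indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return list(groups.values())


# Two overlapping 4-cliques {0,1,2,3} and {1,2,3,4}, plus a triangle {4,5,6}.
adj = {0: {1, 2, 3}, 1: {0, 2, 3, 4}, 2: {0, 1, 3, 4}, 3: {0, 1, 2, 4},
       4: {1, 2, 3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
print(k_clique_communities(adj, 4))
```

With k = 4, nodes 5 and 6 sit in a triangle yet belong to no community, which mirrors the point made on the slide about a node that is locally dense but outside every community.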

Link communities
“A community is a set of nodes which share a number of relations clustered together since they belong to a particular relational environment.”
It is the relation that belongs to a community; the nodes then belong to the communities of their connections.

No a Priori Definitions
“Communities are sets which present a number of particular features defined by an analyst.”

COMMUNITY DISCOVERY PART 3: EVOLUTIONARY COMMUNITY DISCOVERY

Are we missing something?
Real-world networks evolve quickly:
● Social interactions
● Buyer-seller
● Stock exchanges
● …
In these scenarios a QSSA (Quasi Steady State Assumption) rarely holds:
● The network cannot be “frozen in time”
● Nodes and edges rise and fall, producing perturbations on the whole topology
The reduction to static scenarios through temporal discretization is not always a good idea:
● How can we choose the temporal threshold?
● To what extent can we trust the obtained results?
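The temporal discretization questioned above can be sketched as follows, assuming interactions arrive as (u, v, t) triples with integer timestamps. Each interaction falls into the snapshot indexed by t divided by the window size, so the choice of window directly shapes the static graphs on which communities are then searched. The function name and the sample data are hypothetical.

```python
from collections import defaultdict


def snapshots(timed_edges, window):
    """Split timestamped interactions into static snapshots of a given width."""
    buckets = defaultdict(set)
    for u, v, t in timed_edges:
        buckets[t // window].add(frozenset((u, v)))   # undirected edge
    # snapshots in chronological order; empty windows are skipped
    return [buckets[index] for index in sorted(buckets)]


timed_edges = [("a", "b", 0), ("b", "c", 1), ("c", "d", 4), ("a", "b", 5)]
for window in (1, 10):
    result = snapshots(timed_edges, window)
    print(window, [len(edges) for edges in result])
```

The same interaction stream yields many tiny graphs with a narrow window and a single aggregated graph with a wide one, which is why results obtained on snapshots depend on the chosen threshold.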

Community life-cycle
As time goes by, the rise of novel nodes and edges (as well as the vanishing of old ones) leads to network perturbations. Communities can be deeply affected by such changes.

Conclusions
Nowadays Community Discovery is, perhaps, the hottest topic in complex network analysis.
Major issues:
● Problem definition
● Community evaluation
Problem specializations:
● Multidimensional Community Discovery
● Evolutionary Community Discovery: how do communities evolve in dynamic networks?
● …

Bibliography
● S. Fortunato. Community detection in graphs. Physics Reports 486 (Feb. 2010).
● M. Coscia, F. Giannotti, and D. Pedreschi. A classification for community discovery methods in complex networks. Statistical Analysis and Data Mining 4, 5 (2011), 512–546.