Statistical properties of network community structure

Slides:



Advertisements
Similar presentations
Class 12: Communities Network Science: Communities Dr. Baruch Barzel.
Advertisements

An Evaluation of Community Detection Algorithms on Large-Scale Traffic 1 An Evaluation of Community Detection Algorithms on Large-Scale Traffic.
Jure Leskovec (Stanford) Kevin Lang (Yahoo! Research) Michael Mahoney (Stanford)
The Connectivity and Fault-Tolerance of the Internet Topology
Community Detection Laks V.S. Lakshmanan (based on Girvan & Newman. Finding and evaluating community structure in networks. Physical Review E 69,
Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research.
Online Social Networks and Media. Graph partitioning The general problem – Input: a graph G=(V,E) edge (u,v) denotes similarity between u and v weighted.
Models of Network Formation Networked Life NETS 112 Fall 2013 Prof. Michael Kearns.
Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta, Michael Mahoney Yahoo! Research.
CS 599: Social Media Analysis University of Southern California1 The Basics of Network Analysis Kristina Lerman University of Southern California.
CMU SCS KDD 2006Leskovec & Faloutsos1 ??. CMU SCS KDD 2006Leskovec & Faloutsos2 Sampling from Large Graphs poster# 305 Jurij (Jure) Leskovec Christos.
Using Structure Indices for Efficient Approximation of Network Properties Matthew J. Rattigan, Marc Maier, and David Jensen University of Massachusetts.
Peer-to-Peer and Grid Computing Exercise Session 3 (TUD Student Use Only) ‏
CS Lecture 6 Generative Graph Models Part II.
Sampling from Large Graphs. Motivation Our purpose is to analyze and model social networks –An online social network graph is composed of millions of.
Community Structure in Large Social and Information Networks Michael W. Mahoney Stanford University (Joint work with Kevin Lang and Anirban Dasgupta of.
Power Laws Otherwise known as any semi- straight line on a log-log plot.
Community Structure in Large Social and Information Networks Michael W. Mahoney (Joint work at Yahoo with Kevin Lang and Anirban Dasgupta, and also Jure.
Web Projections Learning from Contextual Subgraphs of the Web Jure Leskovec, CMU Susan Dumais, MSR Eric Horvitz, MSR.
Approximation Algorithms as “Experimental Probes” of Informatics Graphs Michael W. Mahoney Stanford University ( For more info, see: cs.stanford.edu/people/mmahoney/
Network analysis and applications Sushmita Roy BMI/CS 576 Dec 2 nd, 2014.
Patterns And A Generative Model Jan 24, 2014 Authors: Jianwei Niu, Wanjiun Liao, Jing Peng, Chao Tong Presenter: Guoming Wang Published: Performance Computing.
Jure Leskovec Computer Science Department Cornell University / Stanford University Joint work with: Eric Horvitz, Michael Mahoney,
Jure Leskovec Computer Science Department Cornell University / Stanford University Joint work with: Jon Kleinberg (Cornell), Christos.
Self-Similarity of Complex Networks Maksim Kitsak Advisor: H. Eugene Stanley Collaborators: Shlomo Havlin Gerald Paul Zhenhua Wu Yiping Chen Guanliang.
Jure Leskovec Kevin J. Lang Anirban Dasgupta Michael W. Mahoney WWW’ 2008 Statistical Properties of Community Structure in Large Social and Information.
Performance Evaluation Lecture 1: Complex Networks Giovanni Neglia INRIA – EPI Maestro 10 December 2012.
Community structure in graphs Santo Fortunato. More links “inside” than “outside” Graphs are “sparse” “Communities”
Network Partition –Finding modules of the network. Graph Clustering –Partition graphs according to the connectivity. –Nodes within a cluster is highly.
Selected Topics in Data Networking Explore Social Networks:
Algorithms and Computational Biology Lab, Department of Computer Science and & Information Engineering, National Taiwan University, Taiwan Network Biology.
Dynamic Network Analysis Case study of PageRank-based Rewiring Narjès Bellamine-BenSaoud Galen Wilkerson 2 nd Second Annual French Complex Systems Summer.
Graph clustering to detect network modules
Network (graph) Models
Yu Wang1, Gao Cong2, Guojie Song1, Kunqing Xie1
Random Walk for Similarity Testing in Complex Networks
Cohesive Subgraph Computation over Large Graphs
Shan Lu, Jieqi Kang, Weibo Gong, Don Towsley UMASS Amherst
Groups of vertices and Core-periphery structure
Lecture 1: Complex Networks
Topics In Social Computing (67810)
Community detection in graphs
Overview of Communities in networks
Hierarchical clustering approaches for high-throughput data
Effective Social Network Quarantine with Minimal Isolation Costs
Models of Network Formation
Lecture 13 Network evolution
Community Detection: Overlapping Communities
Models of Network Formation
Generative models preserving community structure
Peer-to-Peer and Social Networks Fall 2017
Models of Network Formation
Small World Networks Scotty Smith February 7, 2007.
Department of Computer Science University of York
Network Science: A Short Introduction i3 Workshop
Approximating the Community Structure of the Long Tail
Models of Network Formation
CS 201 Fundamental Structures of Computer Science
Overcoming Resolution Limits in MDL Community Detection
3.3 Network-Centric Community Detection
A Introduction to Computing II Lecture 13: Trees
Section 8.3: Degree Distribution
Lecture 21 Network evolution
Network Science: A Short Introduction i3 Workshop
Detecting Important Nodes to Community Structure
Science ACT Prep Class Research Summary.
Affiliation Network Models of Clusters in Networks
Shan Lu, Jieqi Kang, Weibo Gong, Don Towsley UMASS Amherst
Network Models Michael Goodrich Some slides adapted from:
Advisor: Dr.vahidipour Zahra salimian Shaghayegh jalali Dec 2017
Presentation transcript:

Statistical properties of network community structure Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Network communities Communities: Assumption: Sets of nodes with lots of connections inside and few to outside (the rest of the network) Assumption: Networks are (hierarchically) composed of communities (modules) Our question: Are large networks really like this? Communities, clusters, groups, modules

Community score (quality) How community like is a set of nodes? Need a natural intuitive measure Conductance (normalized cut) S’ Φ(S) = # edges cut / # edges inside Small Φ(S) corresponds to more community-like sets of nodes

Community score (quality) What is “best” community of 5 nodes? Score: Φ(S) = # edges cut / # edges inside

Community score (quality) What is “best” community of 5 nodes? Bad community Φ=5/6 = 0.83 Score: Φ(S) = # edges cut / # edges inside

Community score (quality) What is “best” community of 5 nodes? Bad community Φ=5/7 = 0.7 Better community Φ=2/5 = 0.4 Score: Φ(S) = # edges cut / # edges inside

Community score (quality) What is “best” community of 5 nodes? Bad community Φ=5/7 = 0.7 Best community Φ=2/8 = 0.25 Better community Φ=2/5 = 0.4 Score: Φ(S) = # edges cut / # edges inside

Network Community Profile Plot We define: Network community profile (NCP) plot Plot the score of best community of size k Search over all subsets of size k and find best: Φ(k=5) = 0.25 NCP plot is intractable to compute Use approximation algorithm

NCP plot: Small Social Network Dolphin social network Two communities of dolphins NCP plot Network

NCP plot: Zachary’s karate club Zachary’s university karate club social network During the study club split into 2 The split (squares vs. circles) corresponds to cut B Network NCP plot

NCP plot: Network Science Collaborations between scientists in Networks Network NCP plot

Geometric and Hierarchical graphs Geometric (grid-like) network – Small social networks – Geometric and – Hierarchical network have downward NCP plot Hierarchical network

Our work: Large networks Previously researchers examined community structure of small networks (~100 nodes) We examined more than 70 different large social and information networks Large real-world networks look very different!

Example of our findings Typical example: General relativity collaboration network (4,158 nodes, 13,422 edges)

NCP: LiveJournal (N=5M, E=42M) Community score Community size Better and better communities Best communities get worse and worse Best community has 100 nodes

Explanation: Downward part NCP plot Whiskers are responsible for downward slope of NCP plot Largest whisker Whisker is a set of nodes connected to the network by a single edge

Explanation: Upward part NCP plot Each new edge inside the community costs more Φ=1/3 = 0.33 Φ=2/4 = 0.5 Φ=8/6 = 1.3 Φ=64/14 = 4.5 Each node has twice as many children

Suggested network structure Denser and denser core of the network Core contains 60% node and 80% edges Whiskers are responsible for good communities Network structure: Core-periphery (jellyfish, octopus)

Caveat: Bag of whiskers What if we allow cuts that give disconnected communities? Cut all whiskers Compose communities out of whiskers How good “community” do we get?

Communities made of whiskers We get better community scores when composing disconnected sets of whiskers Community score Connected communities Bag of whiskers Community size

Comparison to rewire network Take a real network G Rewire edges for a long time We obtain a random graph with same degree distribution as the real network G

Comparison to a rewired network Rewired network: random network with same degree distribution

What is a good model? What is a good model that explains such network structure? None of the existing models work Pref. attachment Flat Flat and Down Down and Flat Small World Geometric Pref. Attachment

Forest Fire model works Forest Fire: connections spread like a fire New node joins the network Selects a seed node Connects to some of its neighbors Continue recursively As community grows it blends into the core of the network

Forest Fire NCP plot rewired network Bag of whiskers

Conclusion and connections Whiskers: Largest whisker has ~100 nodes Independent of network size Dunbar number: a person can maintain social relationship to at most 150 people Bond vs. identity communities Core: Core has little structure (hard to cut) Still more structure than the random network

Conclusion and connections NCP plot is a way to analyze network community structure Our results agree with previous work on small networks (that are commonly used for testing community finding algorithms) But large networks are different Large networks Whiskers + Core structure Small well isolated communities blend into the core of the networks as they grow