Jure Leskovec (CMU); Kevin Lang, Anirban Dasgupta, Michael Mahoney (Yahoo! Research)


• Big data
• Study emerging behaviors
• How are small networks different from large ones?

• Communities (groups, clusters, modules): sets of nodes with many connections inside and few to the outside (the rest of the network)

• Nodes represent proteins
• Edges represent interactions/associations
• Proteins with the same function interact more
• Can use the network to discover functional groups
Figure: Yeast transcriptional regulatory modules [Bar-Joseph et al., 2003]

• Clusters correspond to social communities, organizational units (e.g., departments)
Figure: Zachary’s Karate club network. During the study the club split into 2. The split corresponds to the min-cut (● vs. ■).

Figure: Democrat vs. Republican blogs [Adamic-Glance 2005]

Figure: Citations; Collaborations [Newman 2003]

• Nested communities: the modular structure of networks is hierarchically organized
Figure labels: University; Science, Arts; CS, Math, Drama, Music

• Recursive hierarchical network
Figure: (a) N=5, E=8; (b) N=25, E=56; (c) N=125, E=344

• Intuition: find nodes that can be easily separated from the rest of the network
• Various objective functions: min-cut, normalized cut, centrality, modularity
• Various algorithms: spectral clustering (random walks), Girvan-Newman (centrality), Metis (contraction based)
Girvan-Newman: 1) Betweenness centrality: the number of shortest paths passing through an edge. 2) Remove edges in order of decreasing centrality.
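The two Girvan-Newman steps can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes an unweighted, undirected simple graph, computes edge betweenness with a Brandes-style BFS, and recomputes betweenness after every removal (the usual formulation of the method). The function names and the toy graph are hypothetical.

```python
from collections import defaultdict, deque

def edge_betweenness(adj):
    """Number of shortest paths through each edge (unweighted, undirected)."""
    bet = defaultdict(float)
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        dist, sigma, preds, order = {s: 0}, defaultdict(float), defaultdict(list), []
        sigma[s] = 1.0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Push path counts back from the farthest nodes toward s.
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                bet[frozenset((v, w))] += c
                delta[v] += c
    # Every node pair was seen from both endpoints, so halve the totals.
    return {e: b / 2.0 for e, b in bet.items()}

def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w not in comp:
                    comp.add(w)
                    stack.append(w)
        seen |= comp
        comps.append(comp)
    return comps

def girvan_newman_split(edges):
    """Remove highest-betweenness edges until the graph falls apart."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    adj = dict(adj)
    while len(components(adj)) == 1:
        bet = edge_betweenness(adj)
        if not bet:  # no edges left to remove
            break
        u, v = max(bet, key=bet.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)

# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
print(girvan_newman_split(toy_edges))
```

On the toy graph the joining edge carries all nine shortest paths between the two triangles, so it is removed first and the triangles come out as the two groups.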

Statistical properties of community structure
• Instead of searching for communities, we measure how well expressed communities are
Questions:
• What is the community structure of real-world networks?
• How can we measure and quantify it?
• What does this tell us about network structure?
• What is a good model (intuition)?
• What are the consequences for clustering/partitioning algorithms?

• How community-like is a set of nodes?
• Need a natural, intuitive measure
• Conductance (normalized cut): Φ(S) = # edges cut / # edges inside
• Small Φ(S) corresponds to more community-like sets of nodes
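A minimal sketch of the score exactly as written on the slide (edges cut divided by edges inside). The function name and the toy graph are hypothetical. Note that conductance is more commonly normalized by the volume of S (the sum of its node degrees) rather than by the count of internal edges; the code below follows the slide's ratio.

```python
def slide_score(edges, S):
    """Phi(S) = (# edges cut) / (# edges inside S), the ratio used on the slide."""
    S = set(S)
    cut = sum(1 for u, v in edges if (u in S) != (v in S))
    inside = sum(1 for u, v in edges if u in S and v in S)
    # A set with no internal edges gets the worst possible score.
    return cut / inside if inside else float("inf")

# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
print(slide_score(toy_edges, {0, 1, 2}))  # 1 cut edge over 3 inside edges
print(slide_score(toy_edges, {2, 3}))     # 4 cut edges over 1 inside edge
```

The triangle scores far lower (better) than the pair straddling the bridge, which matches the intuition that small Φ means more community-like.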

Score: Φ(S) = # edges cut / # edges inside. What is the “best” community of 5 nodes?

Score: Φ(S) = # edges cut / # edges inside. Bad community: Φ=5/6 = 0.83. What is the “best” community of 5 nodes?

Score: Φ(S) = # edges cut / # edges inside. Better community: Φ=5/7 = 0.7. Bad community: Φ=2/5 = 0.4. What is the “best” community of 5 nodes?

Score: Φ(S) = # edges cut / # edges inside. Better community: Φ=5/7 = 0.7. Bad community: Φ=2/5 = 0.4. Best community: Φ=2/8 = 0.25. What is the “best” community of 5 nodes?

• We define: Network community profile (NCP) plot. Plot the score of the best community of size k.
• Search over all subsets of size k and find the best: Φ(k=5) = 0.25
• NCP plot is intractable to compute
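To make the definition concrete, here is a brute-force sketch that tries every subset of size k and keeps the best score. It only works on tiny graphs, since the number of subsets grows combinatorially, which is why the exact NCP plot is intractable. The names and the toy graph are hypothetical, and the score is the slide's ratio of cut edges to inside edges.

```python
from itertools import combinations

def slide_score(edges, S):
    """Phi(S) = (# edges cut) / (# edges inside S)."""
    cut = sum(1 for u, v in edges if (u in S) != (v in S))
    inside = sum(1 for u, v in edges if u in S and v in S)
    return cut / inside if inside else float("inf")

def ncp_brute_force(nodes, edges, max_k):
    """Phi(k) = best (lowest) score over all node subsets of size k."""
    profile = {}
    for k in range(1, max_k + 1):
        profile[k] = min(slide_score(edges, set(S))
                         for S in combinations(nodes, k))
    return profile

# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
for k, phi in ncp_brute_force(range(6), toy_edges, 3).items():
    print(k, phi)
```

Plotting log Φ(k) against log k for such a profile gives the NCP plot; on real networks an approximation algorithm has to stand in for the exhaustive search.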

• We define: Network community profile (NCP) plot. Plot the score of the best community of size k.
Figure: x-axis: community size, log k; y-axis: log Φ(k). Marked points: k=5, Φ(k)=0.25; k=7, Φ(k)=0.18.

Figure: NCP plot. x-axis: community size, log k; y-axis: community score, log Φ(k).

• Local spectral clustering algorithm:
  • Pick a seed node
  • Slowly diffuse mass around it (via a PageRank-like random walk)
  • Find the bottleneck
• Repeat many times:
  • Many seed nodes for very local walks
  • Fewer seed nodes for more global (longer) walks
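The seed, diffuse, and bottleneck steps can be sketched as a personalized PageRank followed by a sweep cut. This is an illustrative stand-in, not the specific algorithm used in the talk. Assumptions: the graph has no isolated nodes, `alpha` (teleport probability) and `iters` are hypothetical parameters, and the bottleneck is scored with volume-normalized conductance, cut / min(vol(S), vol(rest)), rather than the slide's simplified ratio.

```python
def personalized_pagerank(adj, seed, alpha=0.15, iters=100):
    """Power iteration for a walk that jumps back to `seed` with prob. alpha."""
    p = {v: 0.0 for v in adj}
    p[seed] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        nxt[seed] += alpha
        for v, mass in p.items():
            share = (1.0 - alpha) * mass / len(adj[v])
            for w in adj[v]:
                nxt[w] += share
        p = nxt
    return p

def sweep_cut(adj, p):
    """Order nodes by p[v]/deg(v) and return the best-scoring prefix."""
    total_vol = sum(len(adj[v]) for v in adj)
    order = sorted(adj, key=lambda v: p[v] / len(adj[v]), reverse=True)
    S, vol, cut = set(), 0, 0
    best, best_phi = None, float("inf")
    for v in order[:-1]:  # the full node set is not a valid cut
        S.add(v)
        vol += len(adj[v])
        inside = sum(1 for w in adj[v] if w in S)
        # Edges from v to S stop being cut; v's other edges become cut.
        cut += len(adj[v]) - 2 * inside
        phi = cut / min(vol, total_vol - vol)
        if phi < best_phi:
            best, best_phi = set(S), phi
    return best, best_phi

# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
toy_adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
           3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
community, phi = sweep_cut(toy_adj, personalized_pagerank(toy_adj, seed=0))
print(community, phi)
```

Repeating this from many seeds, with smaller `alpha` for longer and more global walks, yields candidate communities at many sizes; the lowest score found at each size approximates the NCP plot.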

• Dolphin social network
• Two communities of dolphins
Figure panels: NCP plot; Network

• Zachary’s university karate club social network
• During the study the club split into 2
• The split (squares vs. circles) corresponds to cut B
Figure panels: NCP plot; Network

• Collaborations between scientists in Networks
Figure panels: NCP plot; Network

• Manifold learning dataset (Hands)
Figure panels: NCP plot; Network

• Eastern US power grid

Small social networks, geometric networks, and hierarchical networks have downward-sloping NCP plots. What about large networks?
Figure panels: NCP plot; Network

• Previously, researchers examined the community structure of small networks (~100 nodes)
• We examined more than 70 different large networks
Large real-world networks look very different!

• Typical example: General relativity collaboration network (4,158 nodes, 13,422 edges)

Figure: NCP plot (community score vs. community size). Annotations: better and better communities; best communities get worse and worse; best community has 100 nodes.

• Whiskers are responsible for the downward slope of the NCP plot
A whisker is a set of nodes connected to the network by a single edge.
Figure: NCP plot; largest whisker
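One way to find whiskers under this definition is to locate bridges (edges whose removal disconnects the graph), treat the largest bridge-free piece as the core, and take whatever hangs off it. This is a sketch under assumptions: a connected, simple, undirected graph; the "largest bridge-free component is the core" rule is an illustrative choice; the names and toy graph are hypothetical. The recursive DFS is only suitable for small graphs.

```python
from collections import defaultdict

def find_bridges(adj):
    """DFS low-link search: (v, w) is a bridge if nothing below w reaches v or above."""
    disc, low, bridges = {}, {}, set()

    def dfs(v, parent):
        disc[v] = low[v] = len(disc)
        for w in adj[v]:
            if w == parent:
                continue
            if w in disc:
                low[v] = min(low[v], disc[w])
            else:
                dfs(w, v)
                low[v] = min(low[v], low[w])
                if low[w] > disc[v]:
                    bridges.add(frozenset((v, w)))

    for s in adj:
        if s not in disc:
            dfs(s, None)
    return bridges

def components(adj, allowed, skip_edges=frozenset()):
    """Connected components restricted to `allowed` nodes, ignoring `skip_edges`."""
    seen, comps = set(), []
    for s in allowed:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if (w in allowed and w not in comp
                        and frozenset((v, w)) not in skip_edges):
                    comp.add(w)
                    stack.append(w)
        seen |= comp
        comps.append(comp)
    return comps

def core_and_whiskers(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    nodes = set(adj)
    bridges = find_bridges(adj)
    # Core: the largest piece that survives deleting every bridge.
    core = max(components(adj, nodes, bridges), key=len)
    # Whiskers: what remains once the core nodes are taken out.
    return core, components(adj, nodes - core)

# Hypothetical toy graph: a 4-cycle core with two whiskers attached.
toy_edges = [(0, 1), (1, 2), (2, 3), (3, 0), (3, 4), (4, 5), (1, 6)]
core, whiskers = core_and_whiskers(toy_edges)
print(core, whiskers)
```

Each whisker found this way attaches to the core through exactly one edge, so its cut size is 1 regardless of how many nodes it contains.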

• Each new edge inside the community costs more
Figure (NCP plot; each node has twice as many children): Φ=2/4 = 0.5; Φ=8/6 = 1.3; Φ=64/14 = 4.5; Φ=1/3 =

• Take a real network G
• Rewire edges for a long time
• We obtain a random graph with the same degree distribution as the real network G
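A common way to realize this rewiring is the double edge swap: pick two edges (a, b) and (c, d) and replace them with (a, d) and (c, b), which leaves every node's degree unchanged. The sketch below is one such procedure under assumptions (simple undirected graph, at least two edges, swaps that would create self-loops or duplicate edges are rejected); the function names, the attempt cap, and the toy ring are hypothetical.

```python
import random
from collections import Counter

def degrees(edges):
    """Degree of every node appearing in the edge list."""
    return Counter(x for e in edges for x in e)

def rewire(edges, n_swaps, seed=0):
    """Degree-preserving randomization by repeated double edge swaps."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:  # randomize which endpoints get exchanged
            c, d = d, c
        # Proposed swap: (a, b), (c, d) -> (a, d), (c, b)
        if a == d or c == b:
            continue  # would create a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges

# Hypothetical toy graph: a ring of 10 nodes.
ring = [(i, (i + 1) % 10) for i in range(10)]
shuffled = rewire(ring, n_swaps=20)
print(degrees(shuffled) == degrees(ring))
```

Running many more swaps than there are edges is what "rewire for a long time" amounts to; the result keeps the degree sequence but loses any other structure of G.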

Rewired network: random network with the same degree distribution

Whiskers in real networks are larger than expected

Whiskers in real networks are non-trivial (richer than trees)
Figure label: edge to cut

What if we allow cuts that give disconnected communities?
• Cut all whiskers
• Compose communities out of whiskers
• How good are the “communities” we get?

Figure: NCP plot (community score vs. community size). Curves: connected communities; bag of whiskers.
We get better community scores when composing disconnected sets of whiskers.

Nothing happens! Now we have 2-edge connected whiskers to deal with.

Figure curves: connected communities; bag of whiskers; rewired network

Network structure: core-periphery (jellyfish, octopus)
• Whiskers are responsible for good communities
• Denser and denser core of the network
• Core contains 60% of the nodes and 80% of the edges

• (Sparse) random graph:
  • Start with N nodes
  • Pick pairs of nodes uniformly at random and connect them
NCP plot: flat (long random connections). Theorem (works for any degree distribution). Sparsity does not explain our observation.
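The construction above amounts to drawing M distinct node pairs uniformly at random. A minimal sketch follows; the function name and parameters are hypothetical, and it assumes M is at most N(N-1)/2 so the loop terminates.

```python
import random

def sparse_random_graph(n, m, seed=0):
    """Connect m distinct node pairs chosen uniformly at random among n nodes."""
    rng = random.Random(seed)
    edges = set()
    while len(edges) < m:
        u, v = rng.sample(range(n), 2)       # two different nodes
        edges.add((min(u, v), max(u, v)))    # store each pair once
    return sorted(edges)

# Hypothetical sizes: 1000 nodes with average degree about 4.
graph = sparse_random_graph(1000, 2000)
print(len(graph))
```

Because every pair is equally likely, edges are "long" in the sense that they ignore any notion of locality, which is the reason given for the flat NCP plot.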

• Preferential attachment [Price 1965, Albert & Barabasi 1999]:
  • Add a new node, create m out-links
  • Probability of linking to a node i is proportional to its degree k_i
  • Based on Herbert Simon’s result
  • Power laws arise from “rich get richer” (cumulative advantage)
NCP plot: flat (connections to hubs, no locality)
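The growth rule can be sketched with the usual trick of keeping a list in which every node appears once per incident edge, so a uniform draw from the list selects a node with probability proportional to its degree. Assumptions in this sketch: the process starts from a small clique of m+1 nodes, and each new node links to m distinct targets; the function name is hypothetical.

```python
import random

def preferential_attachment(n, m, seed=0):
    """Grow a graph where each new node links to m nodes chosen by degree."""
    rng = random.Random(seed)
    # Assumed starting point: a clique on m+1 nodes, so all degrees are nonzero.
    edges = [(u, v) for u in range(m + 1) for v in range(u)]
    # Each node is listed once per incident edge: uniform draw = degree-biased draw.
    endpoints = [x for e in edges for x in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints += [new, t]
    return edges

# Hypothetical sizes: 1000 nodes, 2 out-links per new node.
graph = preferential_attachment(1000, 2)
print(len(graph))
```

High-degree nodes occupy more slots in `endpoints` and so keep attracting links, which is the "rich get richer" mechanism named on the slide.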

• Let’s exploit local connections
NCP plot: down (locally the network looks like a mesh) and flat (at large scale the network looks random)

• Geometric preferential attachment:
  • Place nodes at random in 2D
  • Pick a node
  • Pick nodes in a radius
  • Connect preferentially
NCP plot: flat (locally the network is random) and down (globally the network is a mesh, a union of local expanders)
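A rough sketch of the four steps, with several details filled in by assumption since the slide does not specify them: nodes are placed uniformly in the unit square, nodes are added one at a time, candidates are the earlier nodes within `radius`, and each candidate is weighted by degree + 1 so that nodes with no links yet can still be chosen. The function name and all parameters are hypothetical.

```python
import math
import random

def geometric_pa(n, m, radius, seed=0):
    """Preferential attachment restricted to nodes within `radius` in 2D."""
    rng = random.Random(seed)
    pos = [(rng.random(), rng.random()) for _ in range(n)]  # random 2D placement
    degree = [0] * n
    edges = set()
    for v in range(1, n):
        # Candidates: earlier nodes that lie inside the radius around v.
        near = [u for u in range(v) if math.dist(pos[u], pos[v]) <= radius]
        if not near:
            continue
        # Assumed weighting: degree + 1 (so degree-0 nodes are reachable).
        weights = [degree[u] + 1 for u in near]
        for u in set(rng.choices(near, weights=weights, k=m)):
            edges.add((u, v))
            degree[u] += 1
            degree[v] += 1
    return pos, sorted(edges)

# Hypothetical parameters: 500 nodes, up to 3 links each, radius 0.1.
pos, graph = geometric_pa(500, 3, 0.1)
print(len(graph))
```

Within a neighborhood the choice is degree-driven and effectively random, while across the square only nearby regions connect, which matches the "flat locally, down globally" description.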

• Forest Fire: connections spread like a fire
  • New node joins the network
  • Selects a seed node
  • Connects to some of its neighbors
  • Continue recursively
As a community grows it blends into the core of the network
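A simplified sketch of the burning process described above. It is not the full Forest Fire model: here the graph is undirected and each neighbor of a burned node catches fire independently with probability `p_burn`, which is an assumption made to keep the example short. The function name and parameters are hypothetical.

```python
import random

def forest_fire(n, p_burn, seed=0):
    """Simplified Forest Fire growth on an undirected graph."""
    rng = random.Random(seed)
    adj = {0: set()}
    for new in range(1, n):
        start = rng.randrange(new)          # the seed node the newcomer picks
        burned = {start}
        frontier = [start]
        while frontier:                     # spread recursively through neighbors
            v = frontier.pop()
            for w in adj[v]:
                if w not in burned and rng.random() < p_burn:
                    burned.add(w)
                    frontier.append(w)
        adj[new] = set(burned)              # link to every burned node
        for v in burned:
            adj[v].add(new)
    return adj

# Hypothetical parameters: 500 nodes, burning probability 0.3.
graph = forest_fire(500, 0.3)
print(sum(len(nbrs) for nbrs in graph.values()) // 2)
```

Small values of `p_burn` keep fires local, so new nodes form small peripheral groups; larger values let fires reach deep into the existing graph, which is how a growing community ends up blending into the core.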

Figure curves: rewired network; bag of whiskers

• Whiskers:
  • Largest whisker has ~100 nodes
  • Independent of network size
  • Dunbar number: a person can maintain social relationships with at most 150 people
• Core:
  • Core has little structure (hard to cut)
  • Still more structure than the random network

• Other researchers examined small networks, so they did not hit Dunbar’s limit
• Some evidence:
  • 400k-node Amazon co-purchasing network [Clauset et al. 2004]
    ▪ Largest community has 50% of all nodes
    ▪ It was labeled “Miscellaneous”
  • Karate club has no significant community structure [Newman et al. 2007]

• Bond vs. identity communities
• Multiple hierarchies that blur the community boundaries

• Ground truth
• Yes, use attributes, better link semantics

• NCP plot is a way to analyze network community structure
• Our results agree with previous work on small networks (that are commonly used for testing community-finding algorithms)
• But large networks are different
• Large networks: whiskers + core structure
• Small, well-isolated communities blend into the core of the network as they grow