Published by Darrell Allison. Modified over 9 years ago.
1
Jure Leskovec (jure@cs.stanford.edu)
Computer Science Department, Cornell University / Stanford University
Joint work with: Jon Kleinberg (Cornell), Christos Faloutsos (CMU), Michael Mahoney (Stanford), Kevin Lang (Yahoo), Anirban Dasgupta (Yahoo)
2
Large on-line computing applications have detailed records of human activity:
  On-line communities: Facebook (120 million)
  Communication: Instant Messenger (~1 billion)
  News and social media: Blogging (250 million)
We model the data as a network (an interaction graph).
We can observe and study phenomena at scales not possible before.
[Figure: communication network]
3
Community (cluster) structure of networks
[Figure: collaborations in NetSci (N=380), a tiny part of a large social network]
What is the structure of the network? How can we model that?
4
Conductance (normalized cut): How community-like is a set of nodes?
Idea: Use approximation algorithms for NP-hard graph partitioning problems as experimental probes of network structure.
Small Φ(S) means a more community-like set of nodes S.
[Figure: node set S and the rest of the network S’]
[w/ Mahoney, Lang, Dasgupta, WWW ’08]
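The slide does not spell out the conductance formula. As a minimal sketch, the code below assumes the common definition Φ(S) = cut(S) / min(vol(S), vol(rest)), where cut(S) counts edges leaving S and vol is the sum of node degrees. The helper names and the toy graph are hypothetical, chosen only for illustration.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def conductance(adj, S):
    """Return cut(S) / min(vol(S), vol(rest)) for node set S (assumed definition)."""
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)   # edges leaving S
    vol_S = sum(len(adj[u]) for u in S)                      # sum of degrees inside S
    vol_rest = sum(len(adj[u]) for u in adj if u not in S)   # sum of degrees outside S
    denom = min(vol_S, vol_rest)
    return cut / denom if denom > 0 else float("inf")


# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = build_adjacency(edges)
print(conductance(adj, {0, 1, 2}))  # one triangle: a single cut edge
print(conductance(adj, {0, 3}))     # two unrelated nodes: every incident edge is cut
```

The triangle scores far lower than the scattered pair, which matches the reading that small Φ(S) marks a community-like set.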
5
We define: Network community profile (NCP) plot
Plot the score of the best community of size k.
[Figure: NCP plot; x-axis: community size, log k; y-axis: log Φ(k); example points Φ(5)=0.25 at k=5 and Φ(7)=0.18 at k=7]
[w/ Mahoney, Lang, Dasgupta, WWW ’08]
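To make the NCP definition concrete, here is a brute-force sketch that computes Φ(k) as the minimum conductance over all node sets of size k. It enumerates every subset, so it only works on tiny graphs; the talk itself relies on approximation algorithms, which this sketch does not reproduce. The conductance definition (cut over the smaller volume) and all function names are assumptions.

```python
from itertools import combinations


def conductance(adj, S):
    """cut(S) / min(vol(S), vol(rest)); adj maps node -> set of neighbours."""
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    vol_S = sum(len(adj[u]) for u in S)
    vol_rest = sum(len(adj[u]) for u in adj if u not in S)
    denom = min(vol_S, vol_rest)
    return cut / denom if denom > 0 else float("inf")


def ncp_brute_force(adj):
    """Return {k: best (lowest) conductance over all node sets of size k}."""
    nodes = sorted(adj)
    profile = {}
    # Sets larger than half the graph mirror their complements, so stop at n // 2.
    for k in range(1, len(nodes) // 2 + 1):
        profile[k] = min(conductance(adj, S) for S in combinations(nodes, k))
    return profile


# Hypothetical toy graph: two triangles joined by the single edge (2, 3).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
for k, phi in ncp_brute_force(adj).items():
    print(k, phi)
```

On this toy graph the profile decreases with k until a whole triangle is selected, the same "best community of size k" curve the NCP plot draws on log-log axes.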
6
Collaborations between scientists in Networks [Newman, 2005]
[Figure: NCP plot; x-axis: community size, log k; y-axis: conductance, log Φ(k)]
[w/ Mahoney, Lang, Dasgupta, WWW ’08]
7
Typical example: General relativity collaboration network (4,158 nodes, 13,422 edges)
[w/ Mahoney, Lang, Dasgupta, WWW ’08]
9
[Figure: NCP plot; x-axis: k (community size); y-axis: Φ(k) (conductance)]
Figure annotations: better and better communities; then communities get worse and worse.
Best community has ~100 nodes.
[w/ Mahoney, Lang, Dasgupta, WWW ’08]
10
[Figure: scatter plot; each dot is a different network]
Practically constant!
[w/ Mahoney, Lang, Dasgupta, WWW ’08]
11
Core-periphery (jellyfish, octopus)
  Small good communities
  Denser and denser core of the network
  Core contains ~60% of the nodes and ~80% of the edges
So, what’s a good model?
12
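The Kronecker product of matrices A (N x M) and B (K x L) is the (N*K) x (M*L) matrix given by
$$ C = A \otimes B = \begin{pmatrix} a_{1,1}B & \cdots & a_{1,M}B \\ \vdots & \ddots & \vdots \\ a_{N,1}B & \cdots & a_{N,M}B \end{pmatrix} $$
We define the Kronecker product of two graphs as the Kronecker product of their adjacency matrices.
[w/ Chakrabarti-Kleinberg-Faloutsos, PKDD ’05]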
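A small sketch of the Kronecker product on plain nested lists may help fix the index arithmetic: entry (i, j) of A scales a full copy of B placed at block position (i, j). The function name `kron` and the example matrices are illustrative assumptions, not taken from the talk.

```python
def kron(A, B):
    """Kronecker product of matrices given as lists of rows.

    A is N x M and B is K x L; the result is (N*K) x (M*L).
    """
    n, m = len(A), len(A[0])
    k, l = len(B), len(B[0])
    C = [[0] * (m * l) for _ in range(n * k)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                for q in range(l):
                    # Block (i, j) of C is the copy of B scaled by A[i][j].
                    C[i * k + p][j * l + q] = A[i][j] * B[p][q]
    return C


# Hypothetical adjacency matrices of two tiny graphs.
A = [[1, 1],
     [1, 0]]
B = [[0, 1],
     [1, 0]]
for row in kron(A, B):
    print(row)
```

Applied to two adjacency matrices, the result is the adjacency matrix of the Kronecker product graph, with one node for every pair (node of A, node of B).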
13
Kronecker graph: a growing sequence of graphs obtained by iterating the Kronecker product
Each Kronecker multiplication multiplies the number of nodes by the size of the initiator, so the graph grows exponentially with the number of iterations.
One can easily use multiple initiator matrices (G_1’, G_1’’, G_1’’’) that can be of different sizes.
[w/ Chakrabarti-Kleinberg-Faloutsos, PKDD ’05]
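The growth of the sequence can be checked with a short sketch that repeatedly multiplies by a single initiator. The 3 x 3 initiator below (a three-node chain with self-loops) is a hypothetical example, and `kronecker_power` is an illustrative name.

```python
def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    k, l = len(B), len(B[0])
    return [
        [A[i][j] * B[p][q] for j in range(len(A[0])) for q in range(l)]
        for i in range(len(A))
        for p in range(k)
    ]


def kronecker_power(G1, k):
    """Return the k-th Kronecker power of the initiator G1 (k >= 1)."""
    G = G1
    for _ in range(k - 1):
        G = kron(G, G1)
    return G


# Hypothetical initiator: a 3-node chain with self-loops.
G1 = [[1, 1, 0],
      [1, 1, 1],
      [0, 1, 1]]
for k in (1, 2, 3):
    Gk = kronecker_power(G1, k)
    nodes = len(Gk)
    nonzeros = sum(map(sum, Gk))
    print(k, nodes, nonzeros)
```

With an initiator of N_1 nodes and E_1 nonzero entries, the k-th power has N_1^k nodes and E_1^k nonzero entries, here 3, 9, 27 nodes and 7, 49, 343 entries.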
14
Kronecker graphs mimic real networks:
Theorem: Power-law degree distribution, densification, shrinking/stabilizing diameter, spectral properties
Starting intuition: recursion and self-similarity
[Figure: initiator (3x3) and its Kronecker powers (9x9) and (27x27); entry p_ij is an edge probability]
[w/ Chakrabarti, Kleinberg, Faloutsos, PKDD ’05]
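When the entries p_ij are read as edge probabilities, one way to draw a graph is to build the full probability matrix and flip an independent coin per entry. The sketch below does exactly that; it is quadratic in the number of nodes and is meant only as an illustration, not as the generation procedure used in the cited work. The initiator values and function names are assumptions.

```python
import random


def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    k, l = len(B), len(B[0])
    return [
        [A[i][j] * B[p][q] for j in range(len(A[0])) for q in range(l)]
        for i in range(len(A))
        for p in range(k)
    ]


def kronecker_power(theta, k):
    """k-th Kronecker power of the probability initiator theta (k >= 1)."""
    P = theta
    for _ in range(k - 1):
        P = kron(P, theta)
    return P


def sample_stochastic_kronecker(theta, k, seed=None):
    """Sample a 0/1 adjacency matrix; entry (i, j) is 1 with probability P[i][j]."""
    rng = random.Random(seed)
    P = kronecker_power(theta, k)
    return [[1 if rng.random() < p else 0 for p in row] for row in P]


# Hypothetical 2 x 2 initiator of edge probabilities.
theta = [[0.9, 0.5],
         [0.5, 0.1]]
A = sample_stochastic_kronecker(theta, k=3, seed=0)
print(len(A), sum(map(sum, A)))  # number of nodes, number of sampled entries
```

The expected number of sampled entries equals the sum of the probability matrix, which is the initiator's entry sum raised to the k-th power.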
16
Initiator matrix $G_1 = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is a similarity matrix.
Node u is described with k binary attributes: u_1, u_2, ..., u_k
Probability of a link between nodes u, v:
$$ P(u,v) = \prod_{i=1}^{k} G_1[u_i, v_i] $$
Example: u = (0,1,1,0), v = (1,1,0,1) gives P(u,v) = b∙d∙c∙b
Given a real graph, how do we estimate the initiator G_1?
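The product formula can be checked directly against the slide's example. In this sketch the numeric values chosen for a, b, c, d are hypothetical, and `link_probability` is an illustrative name.

```python
def link_probability(G1, u, v):
    """P(u, v) as the product of G1[u_i][v_i] over the binary attributes of u and v."""
    if len(u) != len(v):
        raise ValueError("u and v must have the same number of attributes")
    p = 1.0
    for ui, vi in zip(u, v):
        p *= G1[ui][vi]
    return p


# Hypothetical values for the initiator entries a, b, c, d.
a, b, c, d = 0.9, 0.5, 0.4, 0.1
G1 = [[a, b],
      [c, d]]

u = (0, 1, 1, 0)
v = (1, 1, 0, 1)
# Attribute pairs (0,1), (1,1), (1,0), (0,1) select the entries b, d, c, b.
print(link_probability(G1, u, v), b * d * c * b)
```

Reading the attribute vectors as binary digits of the node indices, this is the same value as the corresponding entry of the k-th Kronecker power of G_1.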
17
Want to generate realistic networks: given a real network, generate a synthetic network and compare graph properties, e.g., degree distribution.
How to estimate the initiator matrix $G_1 = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$:
  Method of moments [Owen ’09]: compare counts of subgraphs and solve
  Maximum likelihood [Leskovec&Faloutsos, ’07]: arg max over G_1 of P(G | G_1), where G is the given real network
  SVD [VanLoan&Pitsianis ’93]: can solve using SVD
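As a rough illustration of the maximum-likelihood idea, the sketch below evaluates log P(G | G_1) for a 2 x 2 initiator, treating every ordered node pair as an independent coin flip. It assumes the node labels of G are already aligned with the Kronecker indexing, which a real fitting procedure cannot take for granted, and it scans all node pairs, so it is only practical for very small graphs. All names and values are hypothetical.

```python
import math


def edge_probability(G1, u, v, k):
    """Product of G1 entries selected by the k binary digits of node indices u and v."""
    p = 1.0
    for t in range(k):
        p *= G1[(u >> t) & 1][(v >> t) & 1]
    return p


def log_likelihood(A, G1, k):
    """log P(A | G1) for a 2^k x 2^k 0/1 adjacency matrix A; needs 0 < P(u, v) < 1."""
    n = 2 ** k
    ll = 0.0
    for u in range(n):
        for v in range(n):
            p = edge_probability(G1, u, v, k)
            ll += math.log(p) if A[u][v] else math.log(1.0 - p)
    return ll


# Hypothetical observed graph on 2 nodes (k = 1) and two candidate initiators.
A = [[1, 1],
     [1, 0]]
candidate_1 = [[0.9, 0.9], [0.9, 0.1]]
candidate_2 = [[0.5, 0.5], [0.5, 0.5]]
print(log_likelihood(A, candidate_1, 1))
print(log_likelihood(A, candidate_2, 1))
```

The candidate whose entries track the observed edges gets the higher log-likelihood; an estimator would search over initiators (and node orderings) to maximize this quantity.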
18
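What do estimated parameters tell us about the network structure?
[Figure: initiator $G_1 = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ read as edge counts, with regions labeled "a edges", "b edges", "c edges", "d edges"]
[w/ Dasgupta-Lang-Mahoney, WWW ’08]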
19
What do estimated parameters tell us about the network structure?
Core-periphery (jellyfish, octopus)
[Figure: core with 0.9 edges, periphery with 0.1 edges, and 0.5 edges between them; initiator entries shown: 0.9, 0.5, 0.1]
[w/ Dasgupta-Lang-Mahoney, WWW ’08]
20
Small and large networks are very different:
  Collaboration network (N=4,158, E=13,422)
  Scientific collaborations (N=397, E=914)
Estimated initiator matrices G_1, in the order extracted from the slide:
  0.99 0.54 / 0.49 0.13
  0.99 0.17 / 0.82 (one entry not recoverable)
21
Computational tools as probes into the structure of large networks
Community structure of large networks:
  Core-periphery structure
  Scale to natural community size: Dunbar number
Model: Kronecker graphs
  Analytically tractable: provable properties
  Can efficiently estimate parameters from data
Implications:
  No large clusters: no/little hierarchical structure
  Can’t be well embedded: no underlying geometry
22
Why are networks the way they are?
  Only recently have basic properties been observed on a large scale
  Confirms social science intuitions; calls others into question
What are good tractable network models?
  Builds intuition and understanding
Benefits of working with large data
  Observe structures not visible at smaller scales
24
Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations, by J. Leskovec, J. Kleinberg, C. Faloutsos, KDD 2005
Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication, by J. Leskovec, D. Chakrabarti, J. Kleinberg, C. Faloutsos, PKDD 2005
Scalable Modeling of Real Graphs using Kronecker Multiplication, by J. Leskovec, C. Faloutsos, ICML 2007
Statistical Properties of Community Structure in Large Social and Information Networks, by J. Leskovec, K. Lang, A. Dasgupta, M. Mahoney, WWW 2008
Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters, by J. Leskovec, K. Lang, A. Dasgupta, M. Mahoney, arXiv 2008