CMU SCS Large Graph Mining: Power Tools and a Practitioner's Guide. Christos Faloutsos, Gary Miller, Charalampos (Babis) Tsourakakis. CMU

CMU SCS Outline
Reminders
Adjacency matrix
– Intuition behind eigenvectors: e.g., bipartite graphs
– Walks of length k
Laplacian
– Connected Components
– Intuition: Adjacency vs. Laplacian
– Cheeger Inequality and Sparsest Cut: derivation, intuition
Example
Normalized Laplacian

CMU SCS Matrix Representations of G(V,E)
Associate a matrix to a graph:
Adjacency matrix
Laplacian
Normalized Laplacian
Main focus: the Laplacian and its normalized variant

CMU SCS Recall: Intuition. A as a vector transformation: A x = x' (multiplying a vector x by A transforms it into a new vector x').

CMU SCS Intuition. By definition, eigenvectors remain parallel to themselves ('fixed points'): A v_1 = 3.62 · v_1, i.e., λ_1 = 3.62 in this example.

CMU SCS Intuition. By definition, eigenvectors remain parallel to themselves ('fixed points'), and they are orthogonal to each other.

CMU SCS Keep in mind! For the rest of the slides we will be talking about square n×n matrices that are also symmetric, i.e., M = M^T.

CMU SCS Outline
Reminders
Adjacency matrix
– Intuition behind eigenvectors: e.g., bipartite graphs
– Walks of length k
Laplacian
– Connected Components
– Intuition: Adjacency vs. Laplacian
– Cheeger Inequality and Sparsest Cut: derivation, intuition
Example
Normalized Laplacian

CMU SCS Adjacency matrix. (Figure: an example undirected graph and its adjacency matrix A.)

CMU SCS Adjacency matrix. (Figure: an example undirected, weighted graph and its adjacency matrix A.)

CMU SCS Adjacency matrix. (Figure: an example directed graph and its adjacency matrix A.) Observation: if G is undirected, then A = A^T.
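To make the representations concrete, here is a minimal MATLAB/Octave sketch; the 4-node graph is a made-up example, not the slide's (lost) figure:
% hypothetical graph: triangle 1-2-3 plus the edge 3-4
A = [0 1 1 0;
     1 0 1 0;
     1 1 0 1;
     0 0 1 0];          % undirected, so A is symmetric
isequal(A, A')           % 1 (true): A = A^T
Adir = triu(A);          % orient every edge low -> high: now directed
isequal(Adir, Adir')     % 0 (false): a directed A need not be symmetric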

CMU SCS Spectral Theorem. Theorem [Spectral Theorem]: If M = M^T, then M = Σ_i λ_i x_i x_i^T, with real eigenvalues λ_i and orthonormal eigenvectors x_i. Reminder 1: x_i and x_j are orthogonal for i ≠ j (x_i^T x_j = 0; figure: two perpendicular axes x_1, x_2).

CMU SCS Spectral Theorem. Theorem [Spectral Theorem]: If M = M^T, then M = Σ_i λ_i x_i x_i^T. Reminder 2: x_i is the i-th principal axis; λ_i is the length of the i-th principal axis.
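A quick numeric check of the theorem, on a small made-up symmetric matrix:
M = [2 1; 1 2];          % symmetric: M = M'
[X, Lam] = eig(M);       % columns of X: orthonormal eigenvectors x_i
norm(M - X*Lam*X')       % ~0: M = sum_i lambda_i * x_i * x_i'
X' * X                   % ~identity: the x_i are mutually orthogonal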

CMU SCS Outline
Reminders
Adjacency matrix
– Intuition behind eigenvectors: e.g., bipartite graphs
– Walks of length k
Laplacian
– Connected Components
– Intuition: Adjacency vs. Laplacian
– Cheeger Inequality and Sparsest Cut: derivation, intuition
Example
Normalized Laplacian

CMU SCS Eigenvectors give groups. Specifically, for bipartite graphs, we recover each of the two sets of nodes. Details next.

CMU SCS Bipartite Graphs (example: K_{3,3}). Any graph with no cycles of odd length is bipartite. Q1: Can we check whether a graph is bipartite via its spectrum? Q2: Can we recover the partition of the vertices into the two sets?

CMU SCS Bipartite Graphs. Adjacency matrix of K_{3,3}: A = [0 B; B^T 0], where B is the 3×3 all-ones block. Eigenvalues: Λ = [3, -3, 0, 0, 0, 0].

CMU SCS Bipartite Graphs. Why λ_1 = 3 and λ_2 = -3? Recall A x = λ x for an eigenvalue-eigenvector pair (λ, x).

CMU SCS Bipartite Graphs. A·1 = 3·1. Give each node a score (e.g., enthusiasm about a product); multiplying by A replaces each node's score by the sum of its neighbors' scores, and in K_{3,3} every node has exactly 3 neighbors.

CMU SCS Bipartite Graphs. A·1 = 3·1: the all-ones vector remains unchanged in direction; it just grows by a factor of 3 = λ_1.

CMU SCS Bipartite Graphs. A·1 = 3·1. Which other vector remains unchanged (in direction)?

CMU SCS Bipartite Graphs. A·v = (-3)·v for v = (1,1,1,-1,-1,-1)^T: +1 on one side of the bipartition, -1 on the other.

CMU SCS Bipartite Graphs. Observation: u_2 gives the partition of the nodes into the two sets S and V−S! Theorem: λ_2 = −λ_1 iff G is bipartite, and u_2 gives the partition. Question: were we just "lucky"? Answer: no.
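Both questions can be checked numerically; a minimal MATLAB/Octave sketch for K_{3,3} (for a connected graph: bipartite iff the most negative eigenvalue equals −λ_1):
B = ones(3);
A = [zeros(3) B; B zeros(3)];   % adjacency matrix of K_{3,3}
lam = sort(eig(A), 'descend')   % [3 0 0 0 0 -3]: -lambda_1 is an eigenvalue
[X, D] = eig(A);
[~, k] = min(diag(D));          % position of the -3 eigenpair
sign(X(:, k))                   % +1 on one side, -1 on the other: the partition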

CMU SCS Outline
Reminders
Adjacency matrix
– Intuition behind eigenvectors: e.g., bipartite graphs
– Walks of length k
Laplacian
– Connected Components
– Intuition: Adjacency vs. Laplacian
– Cheeger Inequality and Sparsest Cut: derivation, intuition
Example
Normalized Laplacian

CMU SCS Walks. A walk of length r in a directed graph is a sequence of nodes v_0, v_1, ..., v_r with an edge from each node to the next; a node can be used more than once. A walk is closed when v_0 = v_r. (Figures: an example walk and an example closed walk.)

CMU SCS Walks. Theorem: Let G(V,E) be a directed graph with adjacency matrix A. The number of walks of length r from node u to node v is (A^r)_{uv}. Proof: induction on r; see Doyle-Snell, p. 165.

CMU SCS Walks. Theorem (restated): the number of walks of length r from u to v is (A^r)_{uv}. Proof sketch: a walk of length r from i to j is an edge (i, i_1) followed by a walk of length r−1 from i_1 to j; unrolling gives the edge sequence (i, i_1), (i_1, i_2), ..., (i_{r−1}, j), and summing over the intermediate nodes is exactly the matrix product.
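The theorem in code, on a small made-up directed graph where node 4 is a sink (echoing the figures below):
A = [0 1 1 0;
     0 0 1 0;
     1 0 0 1;
     0 0 0 0];    % edges 1->2, 1->3, 2->3, 3->1, 3->4; node 4: no out-edges
Ar = A^2;         % (A^2)(u,v) = number of walks of length 2 from u to v
Ar(2,4)           % 1: the single walk 2 -> 3 -> 4
Ar(4,:)           % all zeros: no walk can start at the sink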

CMU SCS Walks. (Figures: entries of powers of A for an example directed graph. The (i,j) = (2,4) entry counts walks from node 2 to node 4; the (3,3) entry counts closed walks from node 3; row 4 is always 0 because node 4 is a sink, with no outgoing edges.)

CMU SCS Walks. Corollary: If A is the adjacency matrix of an undirected graph G(V,E) with no self-loops, e edges and t triangles, then: a) trace(A) = 0; b) trace(A^2) = 2e; c) trace(A^3) = 6t.

CMU SCS Walks. Corollary (restated): a) trace(A) = 0; b) trace(A^2) = 2e; c) trace(A^3) = 6t. Caveat: computing A^r may be expensive!
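The traces make cheap sanity checks; a sketch on the same made-up 4-node undirected graph (triangle plus pendant edge):
A = [0 1 1 0;
     1 0 1 0;
     1 1 0 1;
     0 0 1 0];    % e = 4 edges, t = 1 triangle (nodes 1-2-3)
trace(A)           % 0: no self-loops
trace(A^2) / 2     % 4 = e
trace(A^3) / 6     % 1 = t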

CMU SCS Remark: virus propagation. The earlier result makes sense now: the higher the first eigenvalue, the more paths are available, so it is easier for a virus to survive.

CMU SCS Outline
Reminders
Adjacency matrix
– Intuition behind eigenvectors: e.g., bipartite graphs
– Walks of length k
Laplacian
– Connected Components
– Intuition: Adjacency vs. Laplacian
– Cheeger Inequality and Sparsest Cut: derivation, intuition
Example
Normalized Laplacian

CMU SCS Main upcoming result: the second eigenvector of the Laplacian (u_2) gives a good cut. Nodes with positive scores go to one group, and the rest to the other.

CMU SCS Laplacian. L = D − A, where D is the diagonal degree matrix (D_ii = d_i, the degree of node i) and A is the adjacency matrix. (Figure: an example graph with its Laplacian.)

CMU SCS Weighted Laplacian. For a weighted graph, L = D − W, where W is the symmetric weight matrix and D_ii = Σ_j w_ij.

CMU SCS Outline
Reminders
Adjacency matrix
– Intuition behind eigenvectors: e.g., bipartite graphs
– Walks of length k
Laplacian
– Connected Components
– Intuition: Adjacency vs. Laplacian
– Cheeger Inequality and Sparsest Cut: derivation, intuition
Example
Normalized Laplacian

CMU SCS Connected Components. Lemma: Let G be a graph with n vertices and c connected components. If L is the Laplacian of G, then rank(L) = n − c. Proof: see p. 279, Godsil-Royle.

CMU SCS Connected Components. (Figure: an example graph G(V,E), its Laplacian L, and eig(L).) The number of zero eigenvalues equals the number of connected components.
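A sketch of the component counter the lemma implies; the two-clique graph is a made-up example:
A = blkdiag(ones(3)-eye(3), ones(4)-eye(4));  % disjoint K_3 and K_4: c = 2
L = diag(sum(A)) - A;                         % Laplacian, L = D - A
lam = eig(L);
sum(abs(lam) < 1e-9)                          % 2 zero eigenvalues = 2 components
rank(L)                                       % 5 = n - c = 7 - 2, as the lemma says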

CMU SCS Connected Components. (Same figure.) A second eigenvalue close to zero indicates a "good cut": the graph is nearly disconnected.

CMU SCS Outline
Reminders
Adjacency matrix
– Intuition behind eigenvectors: e.g., bipartite graphs
– Walks of length k
Laplacian
– Connected Components
– Intuition: Adjacency vs. Laplacian
– Cheeger Inequality and Sparsest Cut: derivation, intuition
Example
Normalized Laplacian

CMU SCS Adjacency vs. Laplacian: Intuition. Let x be the indicator vector of a set S (x_k = 1 if k ∈ S, else 0), and consider y = Lx. The k-th coordinate is y_k = d_k x_k − Σ_{j:(k,j)∈E} x_j = Σ_{j:(k,j)∈E} (x_k − x_j), i.e., for k ∈ S, y_k counts the edges from k that leave S.

CMU SCS Adjacency vs. Laplacian: Intuition. (Figures: the coordinates of y = Lx for a random graph G_{30,0.5} and an indicator vector of a set S, plotted against the node index k.) Takeaway: the Laplacian captures connectivity; the adjacency matrix counts #paths.

CMU SCS Outline
Reminders
Adjacency matrix
– Intuition behind eigenvectors: e.g., bipartite graphs
– Walks of length k
Laplacian
– Connected Components
– Intuition: Adjacency vs. Laplacian
– Sparsest Cut and Cheeger Inequality: derivation, intuition
Example
Normalized Laplacian

CMU SCS Why Sparse Cuts? Clustering, community detection; and more: telephone network design, VLSI layout, sparse Gaussian elimination, parallel computation. (Figure: a graph with a sparse cut.)

CMU SCS Quality of a Cut. Isoperimetric number φ of a cut S: φ(S) = E(S, V−S) / min(|S|, |V−S|), i.e., the number of edges across the cut divided by the number of nodes in the smaller side.
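The definition translates directly into a MATLAB/Octave function; a sketch (iso_number is our hypothetical name, saved as iso_number.m):
function phi = iso_number(A, x)
% phi(S) = (#edges across the cut) / min(|S|, |V-S|)
% A: symmetric 0/1 adjacency matrix; x: 0/1 indicator vector of S
x = x(:);
cut = x' * A * (1 - x);               % each crossing edge counted once
phi = cut / min(sum(x), sum(1 - x));
end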

CMU SCS Quality of a Cut. Isoperimetric number φ of a graph = score of the best cut: φ(G) = min_S φ(S), and thus φ(G) ≤ φ(S) for every cut S.

CMU SCS Quality of a Cut. The best cut is hard to find, BUT Cheeger's inequality gives bounds in terms of λ_2, which plays the major role. Let's see the intuition behind λ_2.

CMU SCS Laplacian and cuts: overview. A cut corresponds to an indicator vector (i.e., 0/1 scores for the nodes). Relaxing the 0/1 scores to real numbers eventually gives an alternative, variational definition of the eigenvalues and eigenvectors.

CMU SCS Why λ_2? Let x be the characteristic vector of S (1 on S, 0 on V−S). Then x^T L x = Σ_{(u,v)∈E} (x_u − x_v)^2 = the number of edges across the cut (S, V−S).

CMU SCS Why λ_2? Example (figure): for the cut with x = [1,1,1,1,0,0,0,0,0]^T, we get x^T L x = 2: two edges cross the cut.
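The slide's 9-node figure is not recoverable, but the identity is easy to verify on our running 4-node example:
A = [0 1 1 0;
     1 0 1 0;
     1 1 0 1;
     0 0 1 0];
L = diag(sum(A)) - A;
x = [1; 1; 1; 0];    % S = {1,2,3}, V-S = {4}
x' * L * x           % 1: exactly one edge, (3,4), crosses this cut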

CMU SCS Why λ_2? Ratio cut: for an indicator vector x of S, x^T L x / x^T x = (#edges across) / |S|. Finding the sparsest ratio cut is NP-hard. Relax the constraint: allow x ∈ R^n with x ⊥ 1. Normalize: x^T x = 1. What is the minimum of x^T L x now?

CMU SCS Why λ_2? The minimum of the relaxed problem is λ_2, by the Courant-Fischer theorem applied to L: λ_2 = min_{x ⊥ 1, x ≠ 0} (x^T L x) / (x^T x).

CMU SCS Why λ_2? Matrix viewpoint: picture each node as a ball of unit mass with displacement x_k, for k = 1, ..., n; the balls oscillate. Definition of eigenvector: L x = λ x.

CMU SCS Why λ_2? Physics viewpoint: the same picture as a spring-mass system. The force on ball k due to its neighbors is proportional to the displacement differences, with the edge weights as Hooke's constants: F_k = −Σ_j w_kj (x_k − x_j).

CMU SCS Why λ_2? For the first eigenvector (λ_1 = 0): all nodes get the same displacement (the same eigenvector value), so the plot of eigenvector value vs. node id is flat.

CMU SCS Why λ_2? (Figure: eigenvector value vs. node id for the second mode of vibration.)

CMU SCS Why λ_2? Fundamental mode of vibration: "along" the separator.

CMU SCS Cheeger Inequality: λ_2 / 2 ≤ φ(G) ≤ sqrt(2 · d_max · λ_2). Here φ(G) is the score of the best cut (hard to compute), d_max is the maximum degree, and λ_2 is the 2nd-smallest Laplacian eigenvalue (easy to compute).

CMU SCS Cheeger Inequality and a graph partitioning heuristic. Step 1: sort the vertices in non-decreasing order of their value in the second eigenvector. Step 2: decide where to cut. Two common heuristics: bisection (cut at the median) and best ratio cut (sweep all prefixes, keep the best score).
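A sketch of the heuristic with the best-ratio-cut rule (a "sweep cut"); A is any symmetric adjacency matrix, e.g., the running 4-node example:
n = size(A, 1);
L = diag(sum(A)) - A;
[X, D] = eig(full(L));
[~, idx] = sort(diag(D));
u2 = X(:, idx(2));                 % eigenvector of the 2nd-smallest eigenvalue
[~, order] = sort(u2);             % Step 1: sort vertices by their u2 value
best = inf;
for k = 1:n-1                      % Step 2: sweep every prefix cut
  x = zeros(n, 1);
  x(order(1:k)) = 1;               % S = first k vertices in sorted order
  phi = (x' * A * (1 - x)) / min(k, n - k);
  if phi < best
    best = phi;
    S = order(1:k);                % best sweep cut found so far
  end
end
best                               % score of the best sweep cut
Bisection is the special case k = floor(n/2).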

CMU SCS Outline
Reminders
Adjacency matrix
Laplacian
– Connected Components
– Intuition: Adjacency vs. Laplacian
– Sparsest Cut and Cheeger Inequality: derivation, intuition
Example
Normalized Laplacian

CMU SCS Example: Spectral Partitioning
K_500 dumbbell graph (two 500-node cliques; in social network analysis, such clusters are called communities):
A = zeros(1000);
A(1:500, 1:500) = ones(500) - eye(500);
A(501:1000, 501:1000) = ones(500) - eye(500);
% the single edge joining the two cliques (see the weighted-edge slide
% below) is added off-slide
myrandperm = randperm(1000);
B = A(myrandperm, myrandperm);

CMU SCS Example: Spectral Partitioning
This is how the adjacency matrix of B looks (figure): spy(B)

CMU SCS Example: Spectral Partitioning
This is what the 2nd eigenvector of B looks like (figure):
L = diag(sum(B)) - B;
[u v] = eigs(L, 2, 'SM');
plot(u(:,1), 'x')
Not so much information yet…

CMU SCS Example: Spectral Partitioning
This is what the 2nd eigenvector looks like once sorted (figure):
[ign ind] = sort(u(:,1));
plot(u(ind,1), 'x')
But now we see the two communities!

CMU SCS Example: Spectral Partitioning
This is how the adjacency matrix of B looks after sorting (figure): spy(B(ind,ind)) shows Community 1 and Community 2 as dense blocks, with the place to cut between them.
Observation: both heuristics are equivalent for the dumbbell.

CMU SCS Outline
Reminders
Adjacency matrix
Laplacian
– Connected Components
– Intuition: Adjacency vs. Laplacian
– Sparsest Cut and Cheeger Inequality
Normalized Laplacian

CMU SCS Why the Normalized Laplacian? K_500 dumbbell, where the edge joining the cliques is the only weighted edge. Comparing cuts by the isoperimetric number (values in the lost figure): the balanced cut through the weighted edge scores worse (larger φ) than an unbalanced cut, so φ is not good here…

CMU SCS Why the Normalized Laplacian? Same example. Fix: optimize the Cheeger constant h(G), which favors balanced cuts: h(G) = min_S E(S, V−S) / min(vol(S), vol(V−S)), where vol(S) = Σ_{v∈S} d_v, the total degree (or edge weight) inside S.
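A sketch of the volume-normalized score and the normalized Laplacian, reusing the 4-node A from the earlier sketches (assumes no isolated nodes):
d = sum(A, 2);                        % degrees; vol(S) = total degree inside S
x = [1; 1; 1; 0];                     % indicator of a candidate S
hS = (x' * A * (1 - x)) / min(d' * x, d' * (1 - x))   % Cheeger-style score
Dm = diag(1 ./ sqrt(d));
Lsym = eye(length(d)) - Dm * A * Dm;  % normalized Laplacian I - D^-1/2 A D^-1/2
sort(eig(Lsym))                       % its lambda_2 drives normalized cuts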

CMU SCS Extensions
Normalized Laplacian:
– Ng, Jordan, Weiss spectral clustering
– Laplacian eigenmaps for manifold learning
– Computer vision, and many more applications…
Standard reference: Spectral Graph Theory, monograph by Fan Chung Graham.

CMU SCS Conclusions
The spectrum tells us a lot about the graph:
Adjacency matrix: #paths
Laplacian: sparse cuts
Normalized Laplacian: normalized cuts, which tend to avoid unbalanced cuts

CMU SCS References
– Fan R. K. Chung: Spectral Graph Theory. AMS.
– Chris Godsil and Gordon Royle: Algebraic Graph Theory. Springer.
– Bojan Mohar and Svatopluk Poljak: Eigenvalues in Combinatorial Optimization. IMA Preprint Series #939.
– Gilbert Strang: Introduction to Applied Mathematics. Wellesley-Cambridge Press.