CMU SCS, Graph Analytics wkshp, C. Faloutsos (CMU)

Graph Analytics Workshop: Tools
Christos Faloutsos, CMU


Welcome!

Time          Tue             Wed            Thu
9:00-10:30    Tools           Laplacians     Parallelism
11:00-12:30   NELL            Rich graphs    Communities
1:30-3:00     Exercises       Panel          Scalability
3:30-5:00     Graph. models   Posters        Graph 'Laws'
Reception

Roadmap
- Introduction - Motivation
- Task 1: Node importance
- Task 2: Community detection
- Task 3: Mining graphs over time - Tensors
- Task 4: Theory - intro to Laplacians
- Conclusions

Graphs - why should we care?
[Figures: Internet Map [lumeta.com]; Food Web [Martinez '91]; >$10B revenue; >0.5B users]
- IR: bi-partite graphs (doc-terms), with documents D_1 ... D_N and terms T_1 ... T_M
- 'NELL': 'merkel', 'chancellor', 'germany' - facts -> tensors
- web: hyper-text graph
- ... and more:
- 'viral' marketing
- web-log ('blog') news propagation
- computer network security: /IP traffic and anomaly detection
- ...
- Any M:N relationship -> Graph
- Any subject-verb-object construct -> Graph/Tensor

Graphs and matrices
- Closely related
- Powerful tools from matrix algebra, for graph mining

Examples of Matrices
- Graph - social network: rows and columns are people (John, Peter, Mary, Nick, ...)
- Market basket (as in Association Rules): rows are customers (John, Peter, Mary, Nick, ...), columns are products (milk, bread, choc., wine, ...)
- Documents and terms: rows are documents (Paper#1, Paper#2, Paper#3, Paper#4), columns are terms (data, mining, classif., tree, ...)
- Authors and terms: rows are authors (John, Peter, Mary, Nick, ...), columns are terms (data, mining, classif., tree, ...)
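The matrix values from these slides did not survive, so here is a minimal sketch of the graph-to-matrix step using NumPy. The edge list is hypothetical (it is not from the slides); only the four names are.

```python
import numpy as np

# The four people named on the slide; the edges below are made up for illustration.
names = ["John", "Peter", "Mary", "Nick"]
edges = [("John", "Peter"), ("John", "Mary"), ("Peter", "Mary"), ("Nick", "John")]

index = {name: i for i, name in enumerate(names)}
A = np.zeros((len(names), len(names)), dtype=int)
for src, dst in edges:
    A[index[src], index[dst]] = 1  # row = from, column = to

out_degree = A.sum(axis=1)  # number of edges leaving each node
in_degree = A.sum(axis=0)   # number of edges entering each node
```

Once the graph is in this form, row and column sums already give simple importance measures, and the matrix tools in the rest of the talk apply directly.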


Node importance - Motivation
Given a graph (e.g., web pages containing the desired query word)
Q: Which node is the most important?
A1: HITS (SVD = Singular Value Decomposition)
A2: eigenvector (PageRank)
'I am important, if my friends are important' -> Fixed point / eigenvector
SVD and eigenvector analysis: very closely related


Task 1 - SVD - Detailed outline
- Motivation
- Definition - properties
- Interpretation
- Complexity
- Case Studies
  - HITS
  - PageRank

SVD - Motivation
- problem #1: text - LSI: find 'concepts'
  - also: customer-product matrix, for a recommendation system, with products (bread, lettuce, tomatoes, beef, chicken) and customer groups (vegetarians, meat eaters)
- problem #2: compression / dim. reduction
  - Problem - specs: visualize customers


SVD - Definition
$A = U \Sigma V^T$, or with dimensions: $A_{[n \times m]} = U_{[n \times r]} \, \Sigma_{[r \times r]} \, (V_{[m \times r]})^T$
- A: n x m matrix (e.g., n documents, m terms)
- U: n x r matrix (n documents, r concepts)
- $\Sigma$: r x r diagonal matrix (strength of each 'concept'); r is the rank of the matrix
- V: m x r matrix (m terms, r concepts)

SVD - Properties
THEOREM [Press+92]: it is always possible to decompose a matrix A into $A = U \Sigma V^T$, where
- U, $\Sigma$, V: unique (*)
- U, V: column orthonormal (i.e., columns are unit vectors, orthogonal to each other): $U^T U = I$; $V^T V = I$ (I: identity matrix)
- $\Sigma$: singular values are positive, and sorted in decreasing order
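A minimal sketch of the definition and the properties above, using NumPy's `np.linalg.svd`. The document-term counts are hypothetical (the slide's own numbers were lost); the term names follow the example slides.

```python
import numpy as np

# Hypothetical document-term counts: rows = documents, columns = terms
# (data, inf., retrieval, brain, lung). Not the slide's actual values.
A = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the r non-zero singular values (r = rank of A).
r = int(np.sum(s > 1e-10))
U, s, Vt = U[:, :r], s[:r], Vt[:r, :]

reconstruction = U @ np.diag(s) @ Vt  # should equal A
```

Here `r` comes out as 2: one 'concept' for the first three terms and one for the last two.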

SVD - Example
$A = U \Sigma V^T$ - example:
[Figure: document-term matrix A with term columns data, inf., retrieval, brain, lung and document groups CS and MD, shown as the product U x $\Sigma$ x $V^T$]
- U: doc-to-concept similarity matrix (columns: CS-concept, MD-concept)
- $\Sigma$: its first diagonal entry is the 'strength' of the CS-concept
- $V^T$: term-to-concept similarity matrix (first row: CS-concept)

Task 1 - SVD - Detailed outline
- Motivation
- Definition - properties
- Interpretation
- Complexity
- Case studies
- Additional properties

SVD - Interpretation #1
'documents', 'terms' and 'concepts':
- U: document-to-concept similarity matrix
- V: term-to-concept sim. matrix
- $\Sigma$: its diagonal elements: 'strength' of each concept

Q: if A is the document-to-term matrix, what is $A^T A$?
A: term-to-term ([m x m]) similarity matrix
Q: $A A^T$?
A: document-to-document ([n x n]) similarity matrix
(Copyright: Faloutsos, Tong (2009), ICDE'09)

SVD properties
- V are the eigenvectors of the covariance matrix $A^T A$ (a covariance matrix, up to scaling, when the columns of A are zero-mean)
- U are the eigenvectors of the Gram (inner-product) matrix $A A^T$
Further reading:
1. Ian T. Jolliffe, Principal Component Analysis (2nd ed), Springer.
2. Gilbert Strang, Linear Algebra and Its Applications (4th ed), Brooks Cole, 2005.
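The two eigenvector properties can be checked numerically. This is a small sketch on a random matrix (the matrix and the seed are arbitrary choices, not from the slides).

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))  # arbitrary 6 x 4 'document-term' matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T

# Columns of V are eigenvectors of A^T A, with eigenvalues s**2.
lhs_terms = A.T @ A @ V
rhs_terms = V * s ** 2  # scales column i of V by s[i]**2

# Columns of U are eigenvectors of A A^T, with the same eigenvalues.
lhs_docs = A @ A.T @ U
rhs_docs = U * s ** 2

# Eigenvalues of A^T A, largest first, for comparison with s**2.
eigvals_desc = np.linalg.eigvalsh(A.T @ A)[::-1]
```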

SVD - Interpretation #2
- best axis to project on ('best' = min sum of squares of projection errors)
- SVD gives the best axis to project on (minimum RMS error): $v_1$, the first singular vector
- in the example $A = U \Sigma V^T$:
  - $v_1$ is the first row of $V^T$
  - the first singular value gives the variance ('spread') on the $v_1$ axis
  - $U \Sigma$ gives the coordinates of the points in the projection axis

SVD - Interpretation #2: more details
Q: how exactly is dim. reduction done?
A: set the smallest singular values to zero; the product of the remaining factors then approximates A
[Figures: the example decomposition with the smallest singular value zeroed, then with the corresponding column of U and row of $V^T$ dropped]
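As a sketch of "set the smallest singular values to zero", the function below keeps only the k strongest singular triplets. The helper name `truncated_svd` and the matrix values are assumptions for illustration, not the slides' own.

```python
import numpy as np

def truncated_svd(A, k):
    """Return the k strongest singular triplets of A (hypothetical helper)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

# Hypothetical document-term counts (two 'concepts').
A = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
], dtype=float)

U1, s1, Vt1 = truncated_svd(A, k=1)
A1 = U1 @ np.diag(s1) @ Vt1          # rank-1 approximation of A
coords = U1 * s1                     # U * Sigma: coordinates along v1
error = np.linalg.norm(A - A1)       # Frobenius norm of what was dropped
```

With k = 1 the approximation keeps the stronger concept and zeroes the weaker one; the error equals the dropped singular value.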

SVD - Interpretation #2
Exactly equivalent: 'spectral decomposition' of the matrix:
$A = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \dots$
- A is n x m; each $u_i$ is n x 1 and each $v_i^T$ is 1 x m
- r terms

SVD - Interpretation #2
approximation / dim. reduction: by keeping the first few terms of $A = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \dots$ (assume $\sigma_1 \ge \sigma_2 \ge \dots$)
Q: how many?
A (heuristic - [Fukunaga]): keep 80-90% of 'energy' (= sum of squares of the $\sigma_i$'s)
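One way to apply the energy heuristic is sketched below. The function name `rank_for_energy` and the default threshold are assumptions; the slide only gives the 80-90% range.

```python
import numpy as np

def rank_for_energy(singular_values, fraction=0.9):
    """Smallest k whose top-k singular values hold `fraction` of the energy.

    Energy = sum of squared singular values. Assumes the values are sorted
    in decreasing order, as np.linalg.svd returns them.
    """
    energy = np.cumsum(np.asarray(singular_values, dtype=float) ** 2)
    target = fraction * energy[-1]
    # First index whose cumulative energy reaches the target, as a count.
    return int(np.searchsorted(energy, target) + 1)

# Hypothetical singular values with energies 93 and 28 (total 121).
s = np.sqrt(np.array([93.0, 28.0]))
k_low = rank_for_energy(s, fraction=0.75)   # the first term alone holds ~77%
k_high = rank_for_energy(s, fraction=0.90)  # both terms are needed
```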

Pictorially: matrix form of SVD
- Best rank-k approximation in L2: $A_{[n \times m]} \approx U \Sigma V^T$

Pictorially: spectral form of SVD
- Best rank-k approximation in L2: $A_{[n \times m]} \approx \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \dots$

Task 1 - SVD - Detailed outline
- Motivation
- Definition - properties
- Interpretation
  - #1: documents/terms/concepts
  - #2: dim. reduction
  - #3: picking non-zero, rectangular 'blobs'
- Complexity
- Case studies
- Additional properties

SVD - Interpretation #3
- finds non-zero 'blobs' in a data matrix
- = 'communities' (bi-partite cores, here)
[Figure: the example decomposition, with one blob spanning Row 1 to Row 4 and Col 1 to Col 3, and a second blob spanning Row 5 to Row 7 and starting at Col 4]
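A small sketch of the 'blob' interpretation: for a matrix made of two non-overlapping rectangular blocks of ones, the non-zero entries of each singular vector pair mark the rows and columns of one block. The 7 x 5 layout mirrors the figure labels (0-based here); the all-ones values are an assumption.

```python
import numpy as np

A = np.zeros((7, 5))
A[0:4, 0:3] = 1.0  # blob 1: rows 0-3, columns 0-2
A[4:7, 3:5] = 1.0  # blob 2: rows 4-6, columns 3-4

U, s, Vt = np.linalg.svd(A, full_matrices=False)

def support(vec, tol=1e-8):
    """Indices where a singular vector is (numerically) non-zero."""
    return set(np.flatnonzero(np.abs(vec) > tol).tolist())

blob1_rows, blob1_cols = support(U[:, 0]), support(Vt[0])
blob2_rows, blob2_cols = support(U[:, 1]), support(Vt[1])
```

The clean separation relies on the blocks not overlapping and having different singular values; on real data the vectors are only approximately sparse.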


SVD - Complexity
- O(n * m * m) or O(n * n * m) (whichever is less)
- less work, if we just want singular values, or if we want the first k singular vectors, or if the matrix is sparse [Berry]
- Implemented in any linear algebra package (LINPACK, matlab, Splus, mathematica, ...)

SVD - conclusions so far
- SVD: $A = U \Sigma V^T$: unique (*)
- U: document-to-concept similarities
- V: term-to-concept similarities
- $\Sigma$: strength of each concept
- dim. reduction: keep the first few strongest singular values (80-90% of 'energy')
  - SVD: picks up linear correlations
- SVD: picks up non-zero 'blobs'


Kleinberg's algo (HITS)
Kleinberg, Jon (1998). Authoritative sources in a hyperlinked environment. Proc. 9th ACM-SIAM Symposium on Discrete Algorithms.

Recall: problem dfn
Given a graph (e.g., web pages containing the desired query word)
Q: Which node is the most important?

Kleinberg's algorithm
Problem dfn: given the web and a query, find the most 'authoritative' web pages for this query
- Step 0: find all pages containing the query terms
- Step 1: expand by one move forward and backward
- on the resulting graph, give high score (= 'authorities') to nodes that many important nodes point to
- give high importance score ('hubs') to nodes that point to good 'authorities'

Kleinberg's algorithm - observations
- recursive definition!
- each node (say, the i-th node) has both an authoritativeness score $a_i$ and a hubness score $h_i$

Kleinberg's algorithm
Let E be the set of edges and A be the adjacency matrix: entry (i,j) is 1 if the edge from i to j exists.
Let h and a be [n x 1] vectors with the 'hubness' and 'authoritativeness' scores. Then:
- if nodes k, l and m point to node i: $a_i = h_k + h_l + h_m$, that is, $a_i = \sum_j h_j$ over all j such that edge (j,i) exists, or $a = A^T h$
- symmetrically, for the 'hubness', if node i points to nodes n, p and q: $h_i = a_n + a_p + a_q$, that is, $h_i = \sum_j a_j$ over all j such that edge (i,j) exists, or $h = A a$

In conclusion, we want vectors h and a such that: $h = A a$ and $a = A^T h$
SVD properties: $A_{[n \times m]} \, v_{1\,[m \times 1]} = \sigma_1 \, u_{1\,[n \times 1]}$ and $u_1^T A = \sigma_1 v_1^T$

In short, the solutions to $h = A a$, $a = A^T h$ are (up to scaling) the left- and right-singular-vectors of the adjacency matrix A. Starting from a random a' and iterating, we'll eventually converge.
Q: to which of all the singular-vectors? why?
A: to the ones of the strongest singular-value: $(A^T A)^k v' \approx (\text{constant}) \, v_1$
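The iteration can be sketched as follows. The normalization after each step, the all-ones start, the iteration count, and the five-node example graph are assumptions added for illustration; the slides only state $a = A^T h$ and $h = A a$.

```python
import numpy as np

def hits(A, iterations=100):
    """Hub and authority scores by repeated a = A^T h, h = A a (a sketch)."""
    n = A.shape[0]
    h = np.ones(n) / np.sqrt(n)
    a = np.zeros(n)
    for _ in range(iterations):
        a = A.T @ h
        a /= np.linalg.norm(a)  # rescale so the scores do not blow up
        h = A @ a
        h /= np.linalg.norm(h)
    return h, a

# Hypothetical graph: nodes 0 and 1 point to 2 and 3; node 4 points to 3.
A = np.zeros((5, 5))
for src, dst in [(0, 2), (0, 3), (1, 2), (1, 3), (4, 3)]:
    A[src, dst] = 1.0

h, a = hits(A)

# Compare with the strongest singular vectors of A.
U, s, Vt = np.linalg.svd(A)
```

Node 3 gets the top authority score, and nodes 0 and 1 tie as the best hubs; both vectors match the first singular vectors up to sign.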

Kleinberg's algorithm - results
E.g., for the query 'java': java.sun.com ("the java developer")

Kleinberg's algorithm - discussion
'authority' score can be used to find 'similar pages' (how?)


PageRank (google)
Brin, Sergey and Lawrence Page (1998). Anatomy of a Large-Scale Hypertextual Web Search Engine. 7th Intl World Wide Web Conf.
[Photos: Larry Page, Sergey Brin]

Problem: PageRank
Given a directed graph, find its most interesting/central node
A node is important, if it is connected with important nodes (recursive, but OK!)

Problem: PageRank - solution
Proposed solution: random walk; spot the most 'popular' node (-> steady state prob. (ssp))
A node has high ssp, if it is connected with high ssp nodes (recursive, but OK!)

(Simplified) PageRank algorithm
Let A be the adjacency matrix; let B be the transition matrix: transpose, column-normalized. Then $B p = p$.
[Figure: example graph and its to-from matrix B]

Definitions
- A: adjacency matrix (from-to)
- D: degree matrix = diag(d_1, d_2, ..., d_n)
- B: transition matrix: to-from, column normalized, $B = A^T D^{-1}$

(Simplified) PageRank algorithm
$B p = 1 \cdot p$, thus p is the eigenvector that corresponds to the highest eigenvalue (= 1, since the matrix is column-normalized)
Why does such a p exist?
- p exists if B is n x n, nonnegative, irreducible [Perron-Frobenius theorem]
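A sketch of the simplified algorithm on a hypothetical three-node graph (the slides' example graph was a figure and is not reproduced here). It builds $B = A^T D^{-1}$ and reads p off the eigenvector with eigenvalue 1; it assumes every node has at least one outgoing edge.

```python
import numpy as np

# Hypothetical strongly connected graph: 0->1, 0->2, 1->2, 2->0.
A = np.array([
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
], dtype=float)

d = A.sum(axis=1)  # out-degrees (must all be non-zero)
B = A.T / d        # divides column j by d[j], i.e. B = A^T D^-1

eigvals, eigvecs = np.linalg.eig(B)
k = int(np.argmin(np.abs(eigvals - 1.0)))  # eigenvalue closest to 1
p = np.real(eigvecs[:, k])
p = p / p.sum()    # rescale into probabilities
```

For this graph the steady state works out by hand as well: $p_0 = p_2$ and $p_1 = p_0 / 2$, giving p = (0.4, 0.2, 0.4).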

(Simplified) PageRank algorithm
In short: imagine a particle randomly moving along the edges; compute its steady-state probabilities (ssp)
Full version of algo: with occasional random jumps
Why? To make the matrix irreducible

Full Algorithm
With probability 1-c, fly-out to a random node. Then, we have
$p = c B p + \frac{1-c}{n} \mathbf{1} \;\Rightarrow\; p = \frac{1-c}{n} [I - c B]^{-1} \mathbf{1}$

Alternative notation
M: modified transition matrix, $M = c B + \frac{1-c}{n} \mathbf{1} \mathbf{1}^T$
Then $p = M p$
That is: the steady state probabilities = PageRank scores form the first eigenvector of the 'modified transition matrix'
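Both forms of the full algorithm can be sketched in a few lines. The value c = 0.85, the function names, and the three-node graph are assumptions (the slides do not fix c); the graph is assumed to have no dangling nodes.

```python
import numpy as np

def pagerank_closed_form(B, c=0.85):
    """p = (1-c)/n * (I - cB)^-1 * 1, via a linear solve instead of an inverse."""
    n = B.shape[0]
    return (1.0 - c) / n * np.linalg.solve(np.eye(n) - c * B, np.ones(n))

def pagerank_power(B, c=0.85, iterations=200):
    """Iterate p <- c B p + (1-c)/n * 1, starting from the uniform vector."""
    n = B.shape[0]
    p = np.ones(n) / n
    for _ in range(iterations):
        p = c * (B @ p) + (1.0 - c) / n
    return p

# Hypothetical graph: 0->1, 0->2, 1->2, 2->0; B is column-normalized.
A = np.array([[0, 1, 1], [0, 0, 1], [1, 0, 0]], dtype=float)
B = A.T / A.sum(axis=1)

c = 0.85
n = B.shape[0]
p_exact = pagerank_closed_form(B, c)
p_iter = pagerank_power(B, c)
M = c * B + (1.0 - c) / n * np.ones((n, n))  # modified transition matrix
```

The closed form is fine for small n; on large sparse graphs the iteration is the usual choice, since it never forms a dense n x n matrix.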

Parenthesis: intuition behind eigenvectors

Formal definition
If A is a (n x n) square matrix, $(\lambda, x)$ is an eigenvalue/eigenvector pair of A if $A x = \lambda x$
CLOSELY related to singular values:

Property #1: Eigen- vs singular-values
if $B_{[n \times m]} = U_{[n \times r]} \, \Sigma_{[r \times r]} \, (V_{[m \times r]})^T$ then $A = B^T B$ is symmetric and $B^T B v_i = \sigma_i^2 v_i$, i.e., $v_1, v_2, \dots$ are eigenvectors of $A = B^T B$

Property #2
If $A_{[n \times n]}$ is a real, symmetric matrix, then it has n real eigenvalues (if A is not symmetric, some eigenvalues may be complex)

Property #3
If $A_{[n \times n]}$ is a real, symmetric matrix, then it has n real eigenvalues, and they agree with its n singular values, except possibly for the sign
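Properties #2 and #3 are easy to check on two tiny matrices. Both matrices below are arbitrary examples chosen for this sketch.

```python
import numpy as np

# Real symmetric matrix with eigenvalues 5 and -1.
S = np.array([[2.0, -3.0],
              [-3.0, 2.0]])
sym_eigvals = np.linalg.eigvalsh(S)            # real, ascending order
sym_singvals = np.linalg.svd(S, compute_uv=False)

# Non-symmetric matrix (a 90-degree rotation): eigenvalues are +i and -i.
R = np.array([[0.0, -1.0],
              [1.0, 0.0]])
rot_eigvals = np.linalg.eigvals(R)
```

The singular values of S are 5 and 1: the same magnitudes as its eigenvalues, with the sign of -1 dropped.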

Intuition
- A as vector transformation: $x' = A x$
- By defn., eigenvectors remain parallel to themselves ('fixed points'): $A v_1 = \lambda_1 v_1$, with $\lambda_1 = 3.62$ in the example

Convergence
Usually, fast: depends on the ratio $\lambda_1 : \lambda_2$
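To see how the ratio of the two largest eigenvalues drives convergence, the sketch below counts power-iteration steps on two diagonal matrices. The matrices, the tolerance, and the stopping rule are assumptions for illustration.

```python
import numpy as np

def power_iteration_steps(A, tol=1e-10, max_steps=10_000):
    """Repeat x <- A x / |A x| until x stops changing; return (steps, x)."""
    rng = np.random.default_rng(0)
    x = rng.standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    for step in range(1, max_steps + 1):
        y = A @ x
        y /= np.linalg.norm(y)
        if np.linalg.norm(y - x) < tol:
            return step, y
        x = y
    return max_steps, x

# Same top eigenvalue (1.0), different second eigenvalue.
well_separated = np.diag([1.0, 0.1, 0.05])   # ratio 1 : 0.1
poorly_separated = np.diag([1.0, 0.9, 0.05])  # ratio 1 : 0.9

fast_steps, v_fast = power_iteration_steps(well_separated)
slow_steps, v_slow = power_iteration_steps(poorly_separated)
```

The error shrinks by roughly a factor of $\lambda_2 / \lambda_1$ per step, so the second matrix needs many more iterations to reach the same tolerance.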

Kleinberg/google - conclusions
SVD helps in graph analysis:
- hub/authority scores: strongest left- and right-singular-vectors of the adjacency matrix
- random walk on a graph: steady state probabilities are given by the strongest eigenvector of the (modified) transition matrix

Conclusions
SVD: a valuable tool
- given a document-term matrix, it finds 'concepts' (LSI)
- ... and can find fixed-points or steady-state probabilities (google / Kleinberg / Markov Chains)

Conclusions cont'd
(We didn't discuss/elaborate, but) SVD
- ... can reduce dimensionality (KL)
- ... and can find rules (PCA; RatioRules)
- ... and can optimally solve over- and under-constrained linear systems (least squares / query feedbacks)

References
- Berry, Michael:
- Brin, S. and L. Page (1998). Anatomy of a Large-Scale Hypertextual Web Search Engine. 7th Intl World Wide Web Conf.
- Faloutsos, C. Searching Multimedia Databases by Content, Springer (App. D).
- Fukunaga, K. (1990). Introduction to Statistical Pattern Recognition, Academic Press.
- Jolliffe, I. T. (2002). Principal Component Analysis (2nd ed.), Springer.
- Kleinberg, J. (1998). Authoritative sources in a hyperlinked environment. Proc. 9th ACM-SIAM Symposium on Discrete Algorithms.
- Press, W. H., S. A. Teukolsky, et al. (1992). Numerical Recipes in C, Cambridge University Press.

PART 2: Communities


Task 2 - Communities - Detailed outline
- Motivation
- Hard clustering - k pieces
- Hard clustering - optimal # pieces
- Observations

Problem
Given a graph, and k: break it into k (disjoint) communities
[Figure: example with k = 2]

Solution #1: METIS
- Arguably, the best algorithm
- Open source (and *many* related papers, at same url)
- Main idea: coarsen the graph; partition; un-coarsen
G. Karypis and V. Kumar. METIS 4.0: Unstructured graph partitioning and sparse matrix ordering system. TR, Dept. of CS, Univ. of Minnesota, 1998.

Solution #2 (problem: hard clustering, k pieces)
Spectral partitioning:
- Consider the eigenvector of the 2nd smallest eigenvalue of the (normalized) Laplacian
- See details in 'Task 4' (intro to Laplacians), later
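A minimal sketch of spectral partitioning for k = 2. For simplicity it uses the unnormalized Laplacian L = D - A rather than the normalized one the slide mentions, and splits nodes by the sign of the eigenvector of the 2nd smallest eigenvalue. The six-node graph (two triangles joined by one edge) is hypothetical.

```python
import numpy as np

# Hypothetical undirected graph: triangle {0,1,2}, triangle {3,4,5}, bridge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6

A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

L = np.diag(A.sum(axis=1)) - A          # unnormalized Laplacian D - A
eigvals, eigvecs = np.linalg.eigh(L)    # eigenvalues in ascending order
fiedler = eigvecs[:, 1]                 # eigenvector of the 2nd smallest one

part_a = set(np.flatnonzero(fiedler >= 0).tolist())
part_b = set(range(n)) - part_a
```

The split separates the two triangles, cutting only the bridge edge. Which triangle ends up in `part_a` depends on the arbitrary sign of the eigenvector.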

Solutions #3, ...
Many more ideas:
- Clustering on $A^2$ (square of adjacency matrix) [Zhou, Woodruff, PODS'04]
- Minimum cut / maximum flow [Flake+, KDD'00]
- ...


Cross-association
Desiderata:
- Simultaneously discover row and column groups
- Fully Automatic: No "magic numbers"
- Scalable to large matrices
Reference: Chakrabarti et al. Fully Automatic Cross-Associations, KDD'04

What makes a cross-association "good"?
[Figure: two alternative arrangements of the same matrix into row groups and column groups]
Why is this better?
- simpler; easier to describe
- easier to compress!

Problem definition: given an encoding scheme
- decide on the # of col. and row groups k and l
- and reorder rows and columns, to achieve best compression

Main Idea
Good Compression, Better Clustering
Total Encoding Cost = $\sum_i \text{size}_i \cdot H(x_i)$ (Code Cost) + Cost of describing cross-associations (Description Cost)
Minimize the total cost (# bits) for lossless compression
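The code-cost term can be sketched as below, reading $x_i$ as the fraction of ones in block i and H as the binary entropy. That reading, the function names, and the 4 x 4 example are assumptions; the description cost is left out entirely, so this is not the full objective from the paper.

```python
import numpy as np

def binary_entropy(p):
    """Entropy in bits of a 0/1 source with P(1) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return float(-p * np.log2(p) - (1.0 - p) * np.log2(1.0 - p))

def code_cost(A, row_groups, col_groups):
    """Sum over blocks of size_i * H(density_i), for given group labels."""
    row_groups = np.asarray(row_groups)
    col_groups = np.asarray(col_groups)
    total = 0.0
    for r in np.unique(row_groups):
        for c in np.unique(col_groups):
            block = A[np.ix_(row_groups == r, col_groups == c)]
            total += block.size * binary_entropy(block.mean())
    return total

# Hypothetical binary matrix with two clean 2 x 2 blobs.
A = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
])

cost_good = code_cost(A, [0, 0, 1, 1], [0, 0, 1, 1])  # blocks are pure
cost_bad = code_cost(A, [0, 0, 0, 0], [0, 0, 0, 0])   # one mixed block
```

The grouping that matches the blobs makes every block all-ones or all-zeros, so its code cost is 0 bits, versus 16 bits for a single half-full block.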

Algorithm
[Figure: search sequence k=1, l=2 -> k=2, l=2 -> k=2, l=3 -> k=3, l=3 -> k=3, l=4 -> k=4, l=4 -> k=4, l=5, ending at k = 5 row groups and l = 5 col groups]

Experiments
"CLASSIC": 3,893 documents; 4,303 words; 176,347 "dots"
Combination of 3 sources: MEDLINE (medical), CISI (info. retrieval), CRANFIELD (aerodynamics)
[Figure: documents x words matrix]

Experiments
"CLASSIC" graph of documents & words: k=15, l=19
[Figure: documents x words matrix after cross-association, with document groups labeled MEDLINE (medical), CISI (Information Retrieval) and CRANFIELD (aerodynamics)]
Example word groups:
- MEDLINE (medical): insipidus, alveolar, aortic, death, prognosis, intravenous; blood, disease, clinical, cell, tissue, patient
- CISI (Information Retrieval): providing, studying, records, development, students, rules; abstract, notation, works, construct, bibliographies
- CRANFIELD (aerodynamics): shape, nasa, leading, assumed, thin
- paint, examination, fall, raise, leave, based

Algorithm
- Code for cross-associations (matlab): tgz
- Variations and extensions: 'Autopart' [Chakrabarti, PKDD'04]
- Hadoop implementation [ICDM'08]: Spiros Papadimitriou, Jimeng Sun: DisCo: Distributed Co-clustering with Map-Reduce: A Case Study towards Petabyte-Scale End-to-End Mining. ICDM 2008

Task 2 – Communities: detailed outline: Motivation; Hard clustering – k pieces; Hard clustering – optimal # pieces; Observations.

Observation #1: skewed degree distributions. There are nodes with huge degree (>O(10^4)) in Facebook/LinkedIn popularity contests!

Observation #2: maybe there are no good cuts: "jellyfish" shape [Tauro+'01], [Siganos+'06]; strange behavior of cuts [Chakrabarti+'04], [Leskovec+'08].

Jellyfish model [Tauro+]: A Simple Conceptual Model for the Internet Topology, L. Tauro, C. Palmer, G. Siganos, M. Faloutsos, Global Internet, November 25-29, 2001. Jellyfish: A Conceptual Model for the AS Internet Topology, G. Siganos, Sudhir L. Tauro, M. Faloutsos, J. of Communications and Networks, Vol. 8, No. 3, pp , Sept

Strange behavior of min cuts: 'negative dimensionality' (!). NetMine: New Mining Tools for Large Graphs, by D. Chakrabarti, Y. Zhan, D. Blandford, C. Faloutsos and G. Blelloch, in the SDM 2004 Workshop on Link Analysis, Counter-terrorism and Privacy. Statistical Properties of Community Structure in Large Social and Information Networks, J. Leskovec, K. Lang, A. Dasgupta, M. Mahoney. WWW 2008.

"Min-cut" plot: do min-cuts recursively, plotting log(mincut-size / #edges) against log(# edges). Example: N nodes, mincut size = sqrt(N).

"Min-cut" plot: each new min-cut in the recursion adds another point to the plot.

"Min-cut" plot: for this example the slope is -0.5. For a d-dimensional grid, the slope is -1/d.

"Min-cut" plot: for a d-dimensional grid, the slope is -1/d; for a random graph, the slope is 0. [Axes in both panels: log(# edges) vs log(mincut-size / #edges).]
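A short worked derivation of the -1/d slope, as a sketch under simplifying assumptions: a d-dimensional grid with N nodes and equal side lengths, a min-cut that slices the grid in half perpendicular to one axis, and boundary effects ignored. The symbols follow the slides; the approximations are not taken from them.

```latex
\begin{align*}
\text{side length} &= N^{1/d}, \\
\text{mincut size} &\approx \left(N^{1/d}\right)^{d-1} = N^{1 - 1/d}, \\
\#\text{edges} &\approx d\,N, \\
\log\frac{\text{mincut size}}{\#\text{edges}} &\approx -\frac{1}{d}\log N - \log d
  = -\frac{1}{d}\log(\#\text{edges}) + \text{const}.
\end{align*}
```

For d = 2 this gives a mincut size of sqrt(N) and a slope of -0.5, matching the example with N nodes.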

"Min-cut" plot: what does it look like for a real-world graph?

Experiments: datasets: Google Web Graph: 916,428 nodes and 5,105,039 edges. Lucent Router Graph: undirected graph of network routers, 112,969 nodes and 181,639 edges. User-Website Clickstream Graph: 222,704 nodes and 952,580 edges. Source: NetMine: New Mining Tools for Large Graphs, by D. Chakrabarti, Y. Zhan, D. Blandford, C. Faloutsos and G. Blelloch, in the SDM 2004 Workshop on Link Analysis, Counter-terrorism and Privacy.

Experiments: used the METIS algorithm [Karypis, Kumar, 1995]. Google Web graph: slope ~ -0.4, which corresponds to a 2.5-dimensional grid! Values along the y-axis are averaged; there is a "lip" for large numbers of edges. [Axes: log(# edges) vs log(mincut-size / #edges).]

Experiments: same results for other graphs too. [Figure: min-cut plots, log(# edges) vs log(mincut-size / #edges), for the Lucent Router graph and the Clickstream graph; one panel is labeled slope ~ -0.45.]

Task 2 – Communities: conclusions (practitioner's guide): Hard clustering, k pieces: METIS. Hard clustering, optimal # pieces: cross-associations. Observations: 'jellyfish', no good cuts.

PART 3: Tensors

Roadmap: Introduction – Motivation; Task 1: Node importance; Task 2: Community detection; Task 3: Mining graphs over time – Tensors; Task 4: Theory – intro to Laplacians; Conclusions.

Task 3 – Tensors: detailed roadmap: Motivation; Definitions: PARAFAC; Case study: web mining.

Examples of matrices: authors and terms. [Table: rows are authors (John, Peter, Mary, Nick, ...); columns are terms (data, mining, classif., tree, ...).]

But what if it changes over time? A: treat it as a 'tensor': one authors × terms matrix each for KDD'07, KDD'08, KDD'09.

Motivation: why tensors? Q: what is a tensor? A: an N-D generalization of a matrix, e.g. the authors × terms matrices for KDD'07, KDD'08, KDD'09 stacked together.

Tensors are useful for 3 or more modes. Terminology: 'mode' (or 'aspect'). [Figure: mode #1, mode #2, mode #3.]

Notice: the 3rd mode does not need to be time, and we can have more than 3 modes, e.g. IP source, IP destination, dest. port.

Background: tensors. Tensors (= multi-dimensional arrays) are everywhere: sensor streams (time, location, type); predicates (subject, verb, object) in a knowledge base, e.g. "Barack Obama is the president of U.S." and "Eric Clapton plays guitar". NELL (Never Ending Language Learner) data: mode sizes labeled 26M and 48M in the figure, nonzeros = 144M.

Task 3 – Tensors: detailed roadmap: Motivation; Definitions: PARAFAC; Case study: web mining.

Tensor basics: multi-mode extensions of SVD. Recall that:

Reminder: SVD: best rank-k approximation in L2: $A_{m \times n} \approx U \Sigma V^T$.

Reminder: SVD: best rank-k approximation in L2: $A_{m \times n} \approx \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \dots$

Extension to (>=) 3 modes.

Main points: 2 major types of tensor decompositions: PARAFAC and Tucker (not examined here). Both can be solved with "alternating least squares" (ALS).
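For reference, the usual way to write a 3-mode PARAFAC model and one ALS update is sketched below. The notation is an assumption borrowed from the Kolda and Bader survey, not from these slides: R is the number of components, $\circ$ is the outer product, $\odot$ the Khatri-Rao product, $\ast$ the elementwise product, $X_{(1)}$ the mode-1 unfolding, and $\dagger$ the pseudo-inverse.

```latex
\begin{align*}
\mathcal{X} &\approx \sum_{r=1}^{R} \lambda_r \, a_r \circ b_r \circ c_r,
\qquad
x_{ijk} \approx \sum_{r=1}^{R} \lambda_r \, a_{ir} \, b_{jr} \, c_{kr}, \\
A &\leftarrow X_{(1)} \, (C \odot B) \, \left( C^{T} C \ast B^{T} B \right)^{\dagger}
\quad \text{(ALS update for } A \text{ with } B, C \text{ fixed).}
\end{align*}
```

ALS cycles through the analogous updates for B and C, normalizing the columns of each factor matrix and storing the norms in $\lambda_r$. Compare with the SVD reminder: for 2 modes the model reduces to a sum of rank-1 matrices $\sigma_r u_r v_r^T$.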

Task 3 – Tensors: detailed outline: Motivation; Definitions: PARAFAC; Case study: web mining.

Discoveries: problem definition: what are the most important concepts and synonyms? NELL (Never Ending Language Learner) data: mode sizes labeled 26M and 48M in the figure, nonzeros = 144M.

A1: Concept discovery in a knowledge base.

A2.1: Concept discovery.

A2: Synonym discovery in a knowledge base. [Figure: factor vectors a_1, a_2, ..., a_R; for a (given) noun phrase, (discovered) synonym 1 and (discovered) synonym 2.]

A2: Synonym discovery.

GigaTensor: Scaling Tensor Analysis Up By 100 Times – Algorithms and Discoveries. U Kang, Evangelos Papalexakis, Abhay Harpale, Christos Faloutsos. KDD 2012.

Experiments: GigaTensor solves a 100x larger problem. [Figure: tensor with modes I, J, K and number of nonzeros = I / 50; the Tensor Toolbox runs out of memory, GigaTensor handles sizes 100x larger.]

Conclusions: real data may have multiple aspects (modes). Tensors provide elegant theory and algorithms: PARAFAC (and Tucker) discover groups. GigaTensor scales up (Hadoop/PEGASUS).

References: T. G. Kolda, B. W. Bader and J. P. Kenny. Higher-Order Web Link Analysis Using Multilinear Algebra. In: ICDM 2005, Pages , November. Jimeng Sun, Spiros Papadimitriou, Philip Yu. Window-based Tensor Analysis on High-dimensional and Multi-aspect Streams, Proc. of the Int. Conf. on Data Mining (ICDM), Hong Kong, China, Dec 2006.

Resources: see the tutorial on tensors, KDD'07 (w/ Tamara Kolda and Jimeng Sun):

Tensor tools - resources: Toolbox from Tamara Kolda: csmr.ca.sandia.gov/~tgkolda/TensorToolbox. T. G. Kolda and B. W. Bader. Tensor Decompositions and Applications. SIAM Review, Volume 51, Number 3, September 2009: csmr.ca.sandia.gov/~tgkolda/pubs/bibtgkfiles/TensorReview-preprint.pdf. T. Kolda and J. Sun: Scalable Tensor Decomposition for Multi-Aspect Data Mining (ICDM 2008). Copyright: Faloutsos, Tong (2009), ICDE'09.

PART 4: Theory

Roadmap: Introduction – Motivation; Task 1: Node importance; Task 2: Community detection; Task 3: Mining graphs over time – Tensors; Task 4: Theory – intro to Laplacians; Conclusions.

Task 4 – Theory: detailed roadmap: Adjacency matrix; Laplacian (connected components; intuition: 2nd smallest eigenvalue -> 'good cut').

Adjacency matrix: A = [matrix figure].

Adjacency matrix: powers of A count the step-away paths. [Matrix figure.]

Adjacency matrix: obvious extensions for the directed and/or weighted cases.
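A small Python sketch of the path-counting property: entry (i, j) of A^k is the number of k-step walks from i to j (walks may revisit nodes, which is how "paths" is read here). The helper names and the 4-node path graph are illustrative assumptions, not from the slides.

```python
def mat_mul(x, y):
    """Multiply two square matrices given as lists of lists."""
    n = len(x)
    return [[sum(x[i][t] * y[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(a, k):
    """k-th power of a square matrix by repeated multiplication."""
    n = len(a)
    result = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
    for _ in range(k):
        result = mat_mul(result, a)
    return result

# Undirected path graph 0 - 1 - 2 - 3 (assumed toy example).
A = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
A2 = mat_pow(A, 2)
A3 = mat_pow(A, 3)
print(A2[1][1])  # 2-step walks from node 1 back to itself: 1-0-1 and 1-2-1
print(A3[0][3])  # 3-step walks from node 0 to node 3: only 0-1-2-3
```

For a directed graph the same code works with a non-symmetric A; for a weighted graph the entries become sums of products of edge weights along each walk.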

Task 4 – Theory: detailed roadmap: Adjacency matrix; Laplacian (connected components; intuition: 2nd smallest eigenvalue -> 'good cut').

Main upcoming result: the eigenvector of the second smallest eigenvalue of the Laplacian (u_2) gives a good cut: nodes with positive scores should go to one group, and the rest to the other.

Laplacian: L = D - A, where D is the diagonal matrix with d_ii = d_i (the degree of node i).

Task 4 – Theory: detailed roadmap: Adjacency matrix; Laplacian (connected components; intuition: 2nd smallest eigenvalue -> 'good cut').

Connected components. Lemma: let G be a graph with n vertices and c connected components. If L is the Laplacian of G, then rank(L) = n - c. Proof: see p. 279, Godsil-Royle.
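The lemma can be checked numerically on a toy graph. The Python sketch below builds L = D - A, computes its rank by exact Gaussian elimination, and counts components with a depth-first search. The function names and the example graph (a triangle plus a separate edge) are assumptions made for illustration.

```python
from fractions import Fraction

def laplacian(adj):
    """L = D - A for an undirected graph without self-loops."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0) - adj[i][j] for j in range(n)]
            for i in range(n)]

def matrix_rank(matrix):
    """Rank via Gaussian elimination over exact fractions."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(rank + 1, len(rows)):
            factor = rows[i][col] / rows[rank][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

def count_components(adj):
    """Number of connected components, by depth-first search."""
    n, seen, count = len(adj), set(), 0
    for start in range(n):
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            u = stack.pop()
            for v in range(n):
                if adj[u][v] and v not in seen:
                    seen.add(v)
                    stack.append(v)
    return count

# Triangle on nodes 0, 1, 2 plus a separate edge 3 - 4: n = 5, c = 2.
adj = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
]
n = len(adj)
c = count_components(adj)
r = matrix_rank(laplacian(adj))
print(n, c, r)
```

Since rank(L) = n - c, the Laplacian has exactly c zero eigenvalues, which is the "#zeros = #components" observation.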

Connected components: for a graph G(V,E) with Laplacian L, the number of zeros in eig(L) equals the number of components. [Two example graphs, each with its L and eig(L).]

Connected components: [Example: graph G(V,E), its L and eig(L); the 2nd smallest eigenvalue indicates a "good cut".]

Task 4 – Theory: detailed roadmap: Reminders; Adjacency matrix; Laplacian (connected components; intuition: 2nd smallest eigenvalue -> 'good cut').

Example: spectral partitioning. K_500 dumbbell graph. [Figure: Montagues and Capulets, linked through Romeo and Juliet.]

Example: spectral partitioning. This is how the adjacency matrix of B looks: spy(B)

Example: spectral partitioning. 2nd eigenvector u_2 of the Laplacian L of B (L u_2 = λ_2 u_2): L = diag(sum(B))-B; [u v] = eigs(L,2,'SM'); plot(u(:,1),'x'). Not so much information yet... [Axes: node-id i vs u_{2,i} score.]

Example: spectral partitioning. 2nd eigenvector after sorting on the u_{2,i} score: [ign ind] = sort(u(:,1)); plot(u(ind),'x'). But now we see the two communities! [Axes: node-id i vs u_{2,i} score.]

Example: spectral partitioning. This is how the adjacency matrix of B looks now: spy(B(ind,ind))
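The same experiment can be sketched in plain Python on a much smaller dumbbell (two 4-cliques joined by one edge, instead of K_500). This is not the MATLAB eigs call from the slides: it swaps in power iteration on (shift * I - L) with the constant vector projected out, which converges to the eigenvector of the second smallest Laplacian eigenvalue. The function names, the shift, and the iteration count are assumptions.

```python
def dumbbell(clique_size):
    """Adjacency matrix of two cliques joined by a single edge."""
    n = 2 * clique_size
    adj = [[0] * n for _ in range(n)]
    for start in (0, clique_size):
        for i in range(start, start + clique_size):
            for j in range(start, start + clique_size):
                if i != j:
                    adj[i][j] = 1
    a, b = clique_size - 1, clique_size  # the two bridge endpoints
    adj[a][b] = adj[b][a] = 1
    return adj

def fiedler_vector(adj, iterations=500):
    """Approximate u_2 of L = D - A by deflated power iteration."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    shift = 2 * max(degree)  # upper bound on the largest eigenvalue of L
    x = [float(i) for i in range(n)]
    for _ in range(iterations):
        mean = sum(x) / n
        x = [v - mean for v in x]  # project out the constant eigenvector
        # y = (shift * I - L) x, with (L x)_i = d_i x_i - sum_j A_ij x_j
        y = [shift * x[i] - (degree[i] * x[i]
                             - sum(adj[i][j] * x[j] for j in range(n)))
             for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    return x

B = dumbbell(4)
u2 = fiedler_vector(B)
ind = sorted(range(len(B)), key=lambda i: u2[i])  # like [ign ind] = sort(u(:,1))
group_pos = {i for i in range(len(B)) if u2[i] > 0}
group_neg = set(range(len(B))) - group_pos
print(ind, group_pos, group_neg)
```

Splitting on the sign of u_2 recovers the two cliques, and reordering the rows and columns of B by ind makes the two dense blocks visible, as spy(B(ind,ind)) does.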

Why λ_2? Each ball has 1 unit of mass; the balls, with displacements x_1, ..., x_n, oscillate. Matrix viewpoint: definition of eigenvector, L x = λ x.

Why λ_2? Each ball has 1 unit of mass; the balls, with displacements x_1, ..., x_n, oscillate. Physics viewpoint: the force on a ball is due to its neighbors' displacement, scaled by Hooke's constant.
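The link between the matrix viewpoint and the physics viewpoint can be spelled out as a short derivation. It is a sketch under stated assumptions: one spring with Hooke's constant k along every edge, unit masses, one-dimensional displacements, and N(i) denoting the neighbors of node i; this notation is not taken from the slides.

```latex
\begin{align*}
\ddot{x}_i &= -k \sum_{j \in N(i)} (x_i - x_j) = -k \, (L x)_i
  && \text{(unit mass, Hooke's constant } k\text{)} \\
x(t) &= u \cos(\omega t) \;\Rightarrow\; -\omega^2 u = -k \, L u
  \;\Rightarrow\; L u = \frac{\omega^2}{k} \, u .
\end{align*}
```

Each oscillation mode is therefore an eigenvector of L with eigenvalue ω²/k. The eigenvalue 0 is the mode where all balls share the same displacement, and λ_2 is the slowest genuine oscillation, in which two loosely connected groups move in opposite directions.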

Why λ_2? For the first eigenvector: all nodes have the same displacement (= value). [Axes: eigenvector value vs node id.]

Why λ_2? [Figure: eigenvector value vs node id for the oscillating balls.]

Conclusions: the spectrum tells us a lot about the graph: adjacency: #paths; Laplacian: sparse cut.

References: Fan R. K. Chung: Spectral Graph Theory (AMS). Chris Godsil and Gordon Royle: Algebraic Graph Theory (Springer). Bojan Mohar and Svatopluk Poljak: Eigenvalues in Combinatorial Optimization, IMA Preprint Series #939. Gilbert Strang: Introduction to Applied Mathematics (Wellesley-Cambridge Press).

PART 5: Conclusions

Summary: Task 1: Node importance -> SVD, PageRank, HITS. Task 2: Community detection -> METIS; 'no good cuts'. Task 3: Mining graphs over time -> Tensors. Task 4: Spectral graph theory -> Laplacians.

Acknowledgements: Funding: IIS , IIS , DBI , CNS

THANK YOU! Christos Faloutsos