Powerful Tools for Data Mining: Fractals, power laws, SVD. C. Faloutsos, Carnegie Mellon University.


Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)2 Part I: Singular Value Decomposition (SVD) = Eigenvalue decomposition

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)3 SVD - outline Motivation Definition - properties Interpretation; more properties Complexity Applications - success stories Conclusions

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)4 SVD - Motivation problem #1: text - LSI: find ‘concepts’ problem #2: compression / dim. reduction

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)5 SVD - Motivation problem #1: text - LSI: find ‘concepts’

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)6 SVD - Motivation problem #2: compress / reduce dimensionality

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)7 Problem - specs ~10**6 rows; ~10**3 columns; no updates; Compress / extract features

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)8 SVD - Motivation

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)9 SVD - Motivation

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)10 SVD - outline Motivation Definition - properties Interpretation; more properties Complexity Applications - success stories Conclusions

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)11-15 SVD - Definition (reminder of matrix multiplication: a [3 x 2] matrix times a [2 x 1] vector gives a [3 x 1] vector)

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)16 SVD - notation Conventions: bold capitals -> matrix (e.g., A, U, Σ, V); bold lower-case -> column vector (e.g., x, v1, u3); regular lower-case -> scalars (e.g., λ1, r)

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)17 SVD - Definition A[n x m] = U[n x r] Σ[r x r] (V[m x r])^T A: n x m matrix (e.g., n documents, m terms) U: n x r matrix (n documents, r concepts) Σ: r x r diagonal matrix (strength of each 'concept') (r: rank of the matrix) V: m x r matrix (m terms, r concepts)
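A minimal numpy sketch of this definition, on made-up data (numpy is our choice here, not something the slides prescribe):

```python
import numpy as np

n, m = 6, 4                        # n documents, m terms (assumed sizes)
A = np.random.rand(n, m)           # hypothetical document-term matrix

# full_matrices=False gives the 'economy' SVD: U is n x k, Vt is k x m,
# with k = min(n, m); for a full-rank A this k equals the rank r.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Sigma = np.diag(s)                 # diagonal matrix of singular values ('strengths')

# A is recovered (up to floating-point error) as U . Sigma . V^T
assert np.allclose(A, U @ Sigma @ Vt)
```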

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)18 SVD - Definition A = U Σ V^T - example:

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)19 SVD - Properties THEOREM [Press+92]: it is always possible to decompose a matrix A into A = U Σ V^T, where U, Σ, V: unique (*) U, V: column orthonormal (i.e., columns are unit vectors, orthogonal to each other) –U^T U = I; V^T V = I (I: identity matrix) Σ: diagonal entries ('eigenvalues') are positive, and sorted in decreasing order

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)20-25 SVD - Example A = U Σ V^T - example: a document-term matrix with terms (data, inf. retrieval, brain, lung) and two groups of documents (CS, MD). The slides annotate each factor in turn: U is the doc-to-concept similarity matrix, with one column per concept (CS-concept, MD-concept); the diagonal of Σ gives the 'strength' of each concept (e.g., of the CS-concept); V is the term-to-concept similarity matrix. (The numeric matrices on the slides are omitted here.)

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)26 SVD - outline Motivation Definition - properties Interpretation; more properties Complexity Applications - success stories Conclusions

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)27 SVD - Interpretation #1 'documents', 'terms' and 'concepts': [Foltz+92] U: document-to-concept similarity matrix V: term-to-concept sim. matrix Σ: its diagonal elements give the 'strength' of each concept

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)28 SVD - Interpretation #1 'documents', 'terms' and 'concepts': Q: if A is the document-to-term matrix, what is A^T A? Q: A A^T?

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)29 SVD - Interpretation #1 'documents', 'terms' and 'concepts': Q: if A is the document-to-term matrix, what is A^T A? A: the term-to-term ([m x m]) similarity matrix Q: A A^T? A: the document-to-document ([n x n]) similarity matrix

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)30 SVD - Interpretation #2 best axis to project on: (‘best’ = min sum of squares of projection errors)

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)31 SVD - Motivation

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)32 SVD - interpretation #2 minimum RMS error SVD: gives best axis to project v1

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)33 SVD - Interpretation #2

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)34 SVD - Interpretation #2 A = U Σ V^T - example (figure: the data points and the best projection axis v1; matrices omitted)

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)35 SVD - Interpretation #2 A = U Σ V^T - example: variance ('spread') on the v1 axis (matrices omitted)

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)36 SVD - Interpretation #2 A = U Σ V^T - example: U Σ gives the coordinates of the points on the projection axis (matrices omitted)

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)37 SVD - Interpretation #2 More details Q: how exactly is dim. reduction done?

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)38 SVD - Interpretation #2 More details Q: how exactly is dim. reduction done? A: set the smallest eigenvalues to zero (matrices omitted)
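A sketch of that truncation step on made-up data: zero the smallest singular values and rebuild the matrix.

```python
import numpy as np

A = np.random.rand(100, 20)          # assumed data
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 5                                # keep only the k strongest 'concepts'
s_trunc = s.copy()
s_trunc[k:] = 0.0                    # 'set the smallest eigenvalues to zero'

A_k = U @ np.diag(s_trunc) @ Vt      # best rank-k approximation in the
                                     # least-squares (Frobenius) sense
```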

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)39-42 SVD - Interpretation #2 After zeroing the small eigenvalues, A ≈ U Σ V^T with those eigenvalues, and the corresponding columns of U and V, dropped (figures omitted).

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)43-46 SVD - Interpretation #2 Exactly equivalent: 'spectral decomposition' of the matrix: A = λ1 u1 v1^T + λ2 u2 v2^T + ... where A is n x m, each term is an [n x 1] column u_i times a [1 x m] row v_i^T, and there are r terms in total.
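A sketch of this rank-1 expansion on made-up data:

```python
import numpy as np

A = np.random.rand(6, 4)                       # assumed n x m data
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# sum of the r rank-1 'spectral' terms: sigma_i * u_i * v_i^T
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s)))
assert np.allclose(A, A_rebuilt)               # the terms add back up to A
```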

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)47 SVD - Interpretation #2 approximation / dim. reduction: keep only the first few terms of A = λ1 u1 v1^T + λ2 u2 v2^T + ... (Q: how many?) (assume λ1 >= λ2 >= ...)

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)48 SVD - Interpretation #2 A (heuristic - [Fukunaga]): keep 80-90% of the 'energy' (= sum of squares of the λi's) (assume λ1 >= λ2 >= ...)
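A sketch of the energy heuristic, choosing k on made-up singular values:

```python
import numpy as np

# singular values only (compute_uv=False); the matrix is assumed data
s = np.linalg.svd(np.random.rand(200, 50), compute_uv=False)

energy = np.cumsum(s**2) / np.sum(s**2)        # fraction of 'energy' kept by the first i terms
k = int(np.searchsorted(energy, 0.90)) + 1     # smallest k reaching ~90% of the energy
print(k, energy[k - 1])
```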

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)49 SVD - outline Motivation Definition - properties Interpretation –#1: documents/terms/concepts –#2: dim. reduction –#3: picking non-zero, rectangular ‘blobs’ –#4: fixed points - graph interpretation Complexity...

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)50-53 SVD - Interpretation #3 finds the non-zero, rectangular 'blobs' in a data matrix (figures: block-structured example matrices and their factors, omitted)

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)54 SVD - Interpretation #4 A as a vector transformation: x' = A x

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)55 SVD - Interpretation #4 if A is symmetric, then, by defn., its eigenvectors remain parallel to themselves ('fixed points'): A v1 = 3.62 * v1 (the slide's numeric example)

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)56 SVD - Interpretation #4 fixed point operation (A^T A is symmetric) –A^T A v_i = λ_i^2 v_i (i = 1, ..., r) [Property 1] –e.g., A: transition matrix of a Markov Chain; then v1 is related to the steady-state probability distribution (= a particle, doing a random walk on the graph) convergence: for (almost) any v': –(A^T A)^k v' ~ (constant) v1 [Property 2] –where v1: strongest 'right' eigenvector
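A sketch of [Property 2] on a made-up matrix: repeatedly applying A^T A to an arbitrary start vector lines it up with v1.

```python
import numpy as np

A = np.random.rand(8, 5)             # assumed data
M = A.T @ A                          # symmetric m x m matrix

v = np.random.rand(5)                # (almost) any starting vector v'
for _ in range(50):
    v = M @ v
    v /= np.linalg.norm(v)           # renormalize so the iterate does not blow up

v1 = np.linalg.svd(A)[2][0]          # strongest right eigenvector, for comparison
print(abs(v @ v1))                   # ~1.0: same direction, up to sign
```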

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)57 SVD - Interpretation #4 convergence - illustration:

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)58 SVD - Interpretation #4 convergence - illustration:

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)59 SVD - Interpretation #4 v1 ... vr: 'right' eigenvectors symmetrically: u1 ... ur: 'left' eigenvectors for matrix A (= U Σ V^T): A v_i = λ_i u_i and u_i^T A = λ_i v_i^T (right eigenvectors -> 'authority' scores in Kleinberg's algo; left ones -> 'hub' scores)

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)60 SVD - outline Motivation Definition - properties Interpretation Complexity Applications - success stories Conclusions

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)61 SVD - Complexity O( n * m * m) or O( n * n * m) (whichever is less) less work, if we just want eigenvalues... or if we want first k eigenvectors... or if the matrix is sparse [Berry] Implemented: in any linear algebra package (LINPACK, matlab, Splus, mathematica...)
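When only the first k eigenvectors are needed, or the matrix is sparse, a truncated SVD is much cheaper. A sketch using scipy (one standard package choice, assumed here rather than named on the slide):

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# assumed sparse matrix: ~10**4 rows, ~10**3 columns, 0.1% non-zeros
A = sparse_random(10_000, 1_000, density=0.001, format='csr')

U, s, Vt = svds(A, k=10)             # only the 10 strongest singular values/vectors
print(s[::-1])                       # svds returns them in ascending order
```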

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)62 SVD - conclusions so far SVD: A = U Σ V^T: unique (*) U: document-to-concept similarities V: term-to-concept similarities Σ: strength of each concept

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)63 SVD - conclusions so far dim. reduction: keep the first few strongest eigenvalues (80-90% of ‘energy’) SVD: picks up linear correlations SVD: picks up non-zero ‘blobs’ v 1 : fixed point (-> steady-state prob.)

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)64 SVD - outline Motivation Definition - properties Interpretation Complexity Applications - success stories Conclusions

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)65 SVD - Case studies... Applications - success stories –multi-lingual IR; LSI queries –compression –PCA - 'ratio rules' –Karhunen-Loève transform –Kleinberg/Google algorithm

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)66 Case study - LSI Q1: How to do queries with LSI? Q2: multi-lingual IR (English query, on Spanish text?)

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)67 Case study - LSI Q1: How to do queries with LSI? Problem: e.g., find documents containing 'data' (same document-term matrix over the terms data, inf. retrieval, brain, lung; matrices omitted)

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)68-71 Case study - LSI Q1: How to do queries with LSI? A: map the query vector into 'concept space' – how? Treat the query as a term vector q^T over (data, inf. retrieval, brain, lung), and take the inner product (cosine similarity) of q with each 'concept' vector v_i (e.g., q · v1). (Figures with the term1/term2 axes and the vectors q, v1, v2 are omitted.)

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)72 Case study - LSI compactly, we have: q_concept^T = q^T V. E.g., with q^T over the terms (data, inf. retrieval, brain, lung) and V the term-to-concept similarity matrix, the product gives the query's weight on the CS-concept. (Matrices omitted.)

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)73 Case study - LSI Drill: how would the document ('information', 'retrieval') be handled by LSI?

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)74 Case study - LSI Drill: how would the document ('information', 'retrieval') be handled by LSI? A: the SAME way: d_concept^T = d^T V. E.g., with d^T over the terms (data, inf. retrieval, brain, lung) and V the term-to-concept similarity matrix, the product gives the document's weight on the CS-concept. (Matrices omitted.)

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)75 Case study - LSI Observation: the document ('information', 'retrieval') will be retrieved by the query ('data'), although it does not contain 'data'!! (Both q^T and d^T map to a similar weight on the CS-concept; matrices omitted.)
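A sketch of the whole LSI query pipeline on a made-up document-term matrix (the numeric values below are assumptions for illustration, not the slide's):

```python
import numpy as np

# rows = documents (3 CS, 2 MD), columns = terms: data, inf., retrieval, brain, lung
A = np.array([[1., 1., 1., 0., 0.],
              [2., 2., 2., 0., 0.],
              [1., 1., 1., 0., 0.],
              [0., 0., 0., 1., 1.],
              [0., 0., 0., 2., 2.]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
V_k = Vt[:2].T                       # term-to-concept matrix, keeping 2 concepts

q = np.array([1., 0., 0., 0., 0.])   # query: just 'data'
d = np.array([0., 1., 1., 0., 0.])   # document: 'information', 'retrieval'

q_concept = q @ V_k                  # q^T V: query in concept space
d_concept = d @ V_k                  # d^T V: document in concept space

cos = q_concept @ d_concept / (np.linalg.norm(q_concept) * np.linalg.norm(d_concept))
print(cos)                           # ~1.0: retrieved, although d never mentions 'data'
```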

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)76 Case study - LSI Q1: How to do queries with LSI? Q2: multi-lingual IR (English query, on Spanish text?)

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)77 Case study - LSI Problem: –given many documents, translated to both languages (eg., English and Spanish) –answer queries across languages

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)78-79 Case study - LSI Solution: ~ LSI, on the combined document-term matrix whose columns hold both the English terms (data, inf. retrieval, brain, lung) and the Spanish terms (datos, informacion) for the translated documents (CS, MD). (Matrices omitted.)

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)80 SVD - outline - Case studies... Applications - success stories –multi-lingual IR; LSI queries –compression –PCA - 'ratio rules' –Karhunen-Loève transform –Kleinberg/Google algorithm

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)81 Case study: compression [Korn+97] Problem: given a matrix compress it, but maintain ‘random access’

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)82 Problem - specs ~10**6 rows; ~10**3 columns; no updates; random access to any cell(s) ; small error: OK

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)83 Idea

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)84 SVD - reminder: space savings 2:1, with minimum RMS error (figure omitted)

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)85 Performance - scaleup (figures: space and error, omitted)

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)86 SVD - outline - Case studies... Applications - success stories –multi-lingual IR; LSI queries –compression –PCA - 'ratio rules' –Karhunen-Loève transform –Kleinberg/Google algorithm

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)87 PCA - 'Ratio Rules' [Korn+00] Typically: 'Association Rules' (e.g., {bread, milk} -> {butter}). But: –which set of rules is 'better'? –how to reconstruct missing/corrupted values?

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)88 PCA - 'Ratio Rules' Idea: try to find 'concepts': eigenvectors dictate rules about ratios, e.g., bread:milk:butter = 2:4:3 (figure: axes '$ on bread', '$ on milk', '$ on butter', with the 2:4:3 direction)
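A sketch of the idea on synthetic basket data whose rows are built to follow bread:milk:butter = 2:4:3 (the data and noise level are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
amounts = rng.uniform(1, 10, size=(500, 1))        # total spending level per customer
X = amounts * np.array([[2.0, 4.0, 3.0]])          # $ on bread, milk, butter (ratio 2:4:3)
X += rng.normal(scale=0.1, size=X.shape)           # small noise

U, s, Vt = np.linalg.svd(X, full_matrices=False)
v1 = Vt[0]                                         # first eigenvector = the 'ratio rule'
print(v1 / np.abs(v1).max())                       # proportional to [2, 4, 3], up to sign
```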

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)89 PCA - 'Ratio Rules' [Korn+00] Typically: 'Association Rules' (e.g., {bread, milk} -> {butter}). But: –how to reconstruct missing/corrupted values? –which set of rules is 'better'?

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)90 PCA - ‘Ratio Rules’ Moreover: visual data mining, for free:

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)91 PCA - Ratio Rules no Gaussian clusters; Zipf-like distribution (figures: phonecalls and stocks datasets, omitted)

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)92 PCA - Ratio Rules NBA dataset ~500 players; ~30 attributes

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)93 PCA - Ratio Rules PCA: get eigenvectors v1, v2,... ignore entries with small abs. value try to interpret the rest

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)94 PCA - Ratio Rules NBA dataset - V matrix (term to ‘concept’ similarities) v1

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)95 Ratio Rules - example RR1: minutes:points = 2:1 corresponding concept? v1

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)96 Ratio Rules - example RR1: minutes:points = 2:1 corresponding concept? A: ‘goodness’ of player

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)97 Ratio Rules - example RR2: points: rebounds negatively correlated(!)

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)98 Ratio Rules - example RR2: points: rebounds negatively correlated(!) - concept? v2

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)99 Ratio Rules - example RR2: points: rebounds negatively correlated(!) - concept? A: position: offensive/defensive

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)100 SVD - Case studies... Applications - success stories –multi-lingual IR; LSI queries –compression –PCA - 'ratio rules' –Karhunen-Loève transform –Kleinberg/Google algorithm

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)101-104 K-L transform [Duda & Hart]; [Fukunaga] A subtle point: SVD will give vectors (like v1) that go through the origin. Q: how to find the better axis v1'? A: 'centered' PCA, i.e., move the origin to the center of gravity, and THEN do SVD. (Figures with v1 and v1' omitted.)
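A sketch of the centered version on made-up points that sit far from the origin:

```python
import numpy as np

# assumed data: a cloud whose center of gravity is far from the origin
X = np.random.rand(100, 3) + np.array([5.0, -2.0, 7.0])

X_centered = X - X.mean(axis=0)      # move the origin to the center of gravity
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)

v1_prime = Vt[0]                     # principal direction through the cloud's center,
print(v1_prime)                      # not forced through the original origin
```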

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)105 SVD - outline Motivation Definition - properties Interpretation Complexity Applications - success stories Conclusions

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)106 SVD - Case studies... Applications - success stories –multi-lingual IR; LSI queries –compression –PCA - 'ratio rules' –Karhunen-Loève transform –Kleinberg/Google algorithm

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)107 Kleinberg’s algorithm Problem dfn: given the web and a query find the most ‘authoritative’ web pages for this query Step 0: find all pages containing the query terms Step 1: expand by one move forward and backward

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)108 Kleinberg’s algorithm Step 1: expand by one move forward and backward

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)109 Kleinberg's algorithm on the resulting graph, give a high score (= 'authorities') to nodes that many important nodes point to; give a high importance score ('hubs') to nodes that point to good 'authorities' (figure: hubs and authorities)

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)110 Kleinberg's algorithm observations recursive definition! each node (say, the i-th node) has both an authoritativeness score a_i and a hubness score h_i

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)111 Kleinberg’s algorithm Let A be the adjacency matrix: the (i,j) entry is 1 if the edge from i to j exists Let h and a be [n x 1] vectors with the ‘hubness’ and ‘authoritativiness’ scores. Then:

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)112 Kleinberg's algorithm Then: a_i = h_k + h_l + h_m; that is, a_i = Sum(h_j) over all j such that the edge (j, i) exists; or a = A^T h (figure: nodes k, l, m pointing to node i)

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)113 Kleinberg's algorithm symmetrically, for the 'hubness': h_i = a_n + a_p + a_q; that is, h_i = Sum(a_j) over all j such that the edge (i, j) exists; or h = A a (figure: node i pointing to nodes n, p, q)

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)114 Kleinberg's algorithm In conclusion, we want vectors h and a such that: h = A a and a = A^T h. That is: a = A^T A a

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)115 Kleinberg’s algorithm a is a right- eigenvector of the adjacency matrix A (by dfn!) Starting from random a’ and iterating, we’ll eventually converge (Q: to which of all the eigenvectors? why?)

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)116 Kleinberg's algorithm (Q: to which of all the eigenvectors? why?) A: to the one with the strongest eigenvalue, because of [Property 2]: (A^T A)^k v' ~ (constant) v1
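A sketch of the iteration on a made-up 4-node graph (the graph is an assumption for illustration, not from the slides):

```python
import numpy as np

# adjacency matrix: entry (i, j) = 1 if the edge i -> j exists
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 0]], dtype=float)

a = np.ones(A.shape[0])              # start from an arbitrary a'
for _ in range(100):
    h = A @ a                        # hub scores:       h = A a
    a = A.T @ h                      # authority scores: a = A^T h
    a /= np.linalg.norm(a)
    h /= np.linalg.norm(h)

print(a)                             # authorities: nodes that many good hubs point to
print(h)                             # hubs: nodes that point to good authorities
```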

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)117 Kleinberg’s algorithm - results Eg., for the query ‘java’: java.sun.com (“the java developer”)

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)118 Kleinberg’s algorithm - discussion ‘authority’ score can be used to find ‘similar pages’ (how?) closely related to ‘citation analysis’, social networks / ‘small world’ phenomena

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)119 Google / PageRank algorithm closely related: imagine a particle randomly moving along the edges (*); compute its steady-state probabilities (*) with occasional random jumps

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)120 SVD - outline Motivation Definition - properties Interpretation Complexity Applications - success stories Conclusions

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)121 SVD - conclusions SVD: a valuable tool - two major properties: (A) it gives the optimal low-rank approximation (= least-squares best hyperplane) –it finds 'concepts' = principal components = features; (linear) correlations –dim. reduction; visualization –(solves over- and under-constraint problems)

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)122 SVD - conclusions - cont'd (B) eigenvectors: fixed points A^T A v_i = λ_i^2 v_i –steady-state prob. in Markov Chains / graphs –hubs/authorities

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)123 Conclusions - in detail: given a (document-term) matrix, it finds ‘concepts’ (LSI)... and can reduce dimensionality (KL)... and can find rules (PCA; RatioRules)

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)124 Conclusions cont’d... and can find fixed-points or steady-state probabilities (Kleinberg / google/ Markov Chains) (... and can solve optimally over- and under- constraint linear systems - see Appendix, (Eq. C1) - Moore/Penrose pseudo-inverse)

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)125 References Berry, Michael: Brin, S. and L. Page (1998). Anatomy of a Large- Scale Hypertextual Web Search Engine. 7th Intl World Wide Web Conf. Chen, C. M. and N. Roussopoulos (May 1994). Adaptive Selectivity Estimation Using Query Feedback. Proc. of the ACM-SIGMOD, Minneapolis, MN.

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)126 References Duda, R. O. and P. E. Hart (1973). Pattern Classification and Scene Analysis. New York, Wiley. Faloutsos, C. (1996). Searching Multimedia Databases by Content, Kluwer Academic Inc. Foltz, P. W. and S. T. Dumais (Dec. 1992). "Personalized Information Delivery: An Analysis of Information Filtering Methods." Comm. of ACM (CACM) 35(12):

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)127 References Fukunaga, K. (1990). Introduction to Statistical Pattern Recognition, Academic Press. Jolliffe, I. T. (1986). Principal Component Analysis, Springer Verlag. Kleinberg, J. (1998). Authoritative sources in a hyperlinked environment. Proc. 9th ACM-SIAM Symposium on Discrete Algorithms.

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)128 References Korn, F., H. V. Jagadish, et al. (May 13-15, 1997). Efficiently Supporting Ad Hoc Queries in Large Datasets of Time Sequences. ACM SIGMOD, Tucson, AZ. Korn, F., A. Labrinidis, et al. (1998). Ratio Rules: A New Paradigm for Fast, Quantifiable Data Mining. VLDB, New York, NY.

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)129 References Korn, F., A. Labrinidis, et al. (2000). "Quantifiable Data Mining Using Ratio Rules." VLDB Journal 8(3-4): Press, W. H., S. A. Teukolsky, et al. (1992). Numerical Recipes in C, Cambridge University Press. Strang, G. (1980). Linear Algebra and Its Applications, Academic Press.

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)130 Appendix: SVD properties (A): obvious (B): less obvious (C): least obvious (and most powerful!)

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)131 Properties - by defn.: A(0): A[n x m] = U[n x r] Σ[r x r] V^T[r x m] A(1): U^T[r x n] U[n x r] = I[r x r] (identity matrix) A(2): V^T[r x m] V[m x r] = I[r x r] A(3): Σ^k = diag(λ1^k, λ2^k, ..., λr^k) (k: ANY real number) A(4): A^T = V Σ U^T

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)132 Less obvious properties A(0): A[n x m] = U[n x r] Σ[r x r] V^T[r x m] B(1): A[n x m] (A^T)[m x n] = ??

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)133 Less obvious properties A(0): A[n x m] = U[n x r] Σ[r x r] V^T[r x m] B(1): A[n x m] (A^T)[m x n] = U Σ^2 U^T symmetric; Intuition?

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)134 Less obvious properties A(0): A[n x m] = U[n x r] Σ[r x r] V^T[r x m] B(1): A[n x m] (A^T)[m x n] = U Σ^2 U^T symmetric; Intuition? 'document-to-document' similarity matrix B(2): symmetrically, for 'V': (A^T)[m x n] A[n x m] = V Σ^2 V^T Intuition?

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)135 Less obvious properties A: term-to-term similarity matrix B(3): ((A^T)[m x n] A[n x m])^k = V Σ^(2k) V^T and B(4): (A^T A)^k ~ v1 λ1^(2k) v1^T for k >> 1, where v1: [m x 1] first column (eigenvector) of V, λ1: strongest eigenvalue

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)136 Less obvious properties B(4): (A^T A)^k ~ v1 λ1^(2k) v1^T for k >> 1 B(5): (A^T A)^k v' ~ (constant) v1, i.e., for (almost) any v', it converges to a vector parallel to v1 Thus, useful to compute the first eigenvector/value (as well as the next ones, too...)

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)137 Less obvious properties - repeated: A(0): A[n x m] = U[n x r] Σ[r x r] V^T[r x m] B(1): A[n x m] (A^T)[m x n] = U Σ^2 U^T B(2): (A^T)[m x n] A[n x m] = V Σ^2 V^T B(3): ((A^T)[m x n] A[n x m])^k = V Σ^(2k) V^T B(4): (A^T A)^k ~ v1 λ1^(2k) v1^T B(5): (A^T A)^k v' ~ (constant) v1

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)138 Least obvious properties A(0): A[n x m] = U[n x r] Σ[r x r] V^T[r x m] C(1): for A[n x m] x[m x 1] = b[n x 1], let x0 = V Σ^(-1) U^T b: if under-specified, x0 gives the 'shortest' solution; if over-specified, it gives the 'solution' with the smallest least-squares error (see Num. Recipes, p. 62)
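A sketch of property C(1): numpy's pinv computes this V Σ^(-1) U^T pseudo-inverse (inverting only the non-zero singular values), shown here on the two toy systems of the next slides:

```python
import numpy as np

# under-specified: 1*w + 2*z = 4 has infinitely many solutions;
# the pseudo-inverse picks the shortest one
A_under = np.array([[1.0, 2.0]])
b_under = np.array([4.0])
print(np.linalg.pinv(A_under) @ b_under)     # [0.8, 1.6]: the minimum-norm solution

# over-specified: 3*w = 1 and 2*w = 2 cannot both hold;
# the pseudo-inverse gives the least-squares compromise
A_over = np.array([[3.0], [2.0]])
b_over = np.array([1.0, 2.0])
print(np.linalg.pinv(A_over) @ b_over)       # [7/13] ~ 0.538: smallest squared error
```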

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)139 Least obvious properties Illustration: under-specified, e.g., [1 2] [w z]^T = 4 (i.e., 1 w + 2 z = 4): among all possible solutions in the (w, z) plane, x0 is the shortest-length one (figure omitted)

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)140 Least obvious properties Illustration: over-specified, e.g., [3 2]^T [w] = [1 2]^T (i.e., 3 w = 1; 2 w = 2): the reachable points (3w, 2w) cannot hit the desirable point b exactly (figure omitted)

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)141 Least obvious properties - cont'd A(0): A[n x m] = U[n x r] Σ[r x r] V^T[r x m] C(2): A[n x m] v1[m x 1] = λ1 u1[n x 1], where v1, u1 are the first (column) vectors of V, U (v1 == right-eigenvector) C(3): symmetrically: u1^T A = λ1 v1^T (u1 == left-eigenvector) Therefore:

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)142 Least obvious properties - cont'd A(0): A[n x m] = U[n x r] Σ[r x r] V^T[r x m] C(4): A^T A v1 = λ1^2 v1 (fixed point - the usual dfn of eigenvector for a symmetric matrix)

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)143 Least obvious properties - altogether A(0): A[n x m] = U[n x r] Σ[r x r] V^T[r x m] C(1): if A[n x m] x[m x 1] = b[n x 1], then x0 = V Σ^(-1) U^T b: shortest, actual or least-squares solution C(2): A[n x m] v1[m x 1] = λ1 u1[n x 1] C(3): u1^T A = λ1 v1^T C(4): A^T A v1 = λ1^2 v1

Carnegie Mellon MDIC-04Copyright: C. Faloutsos (2004)144 Properties - conclusions A(0): A[n x m] = U[n x r] Σ[r x r] V^T[r x m] B(5): (A^T A)^k v' ~ (constant) v1 C(1): if A[n x m] x[m x 1] = b[n x 1], then x0 = V Σ^(-1) U^T b: shortest, actual or least-squares solution C(4): A^T A v1 = λ1^2 v1