Slide 1: Powerful Tools for Data Mining: Fractals, Power Laws, SVD. C. Faloutsos, Carnegie Mellon University

Slide 2: Part I: Singular Value Decomposition (SVD) = eigenvalue decomposition. www.cs.cmu.edu/~christos/

Slide 3: SVD - outline: Motivation; Definition - properties; Interpretation, more properties; Complexity; Applications - success stories; Conclusions.

Slide 4: SVD - Motivation. Problem #1: text - LSI: find 'concepts'. Problem #2: compression / dimensionality reduction.

Slide 5: SVD - Motivation. Problem #1: text - LSI: find 'concepts'.

Slide 6: SVD - Motivation. Problem #2: compress / reduce dimensionality.

Slide 7: Problem - specs: ~10^6 rows; ~10^3 columns; no updates; compress / extract features.

Slides 8-9: SVD - Motivation. [figures]

Slide 10: SVD - outline: Motivation; Definition - properties; Interpretation, more properties; Complexity; Applications - success stories; Conclusions.

Slide 11: SVD - Definition. (Reminder: matrix multiplication, [3 x 2] x [2 x 1] = ?)

Slides 12-14: SVD - Definition. (Reminder: matrix multiplication, [3 x 2] x [2 x 1] = [3 x 1], built up step by step.)

Slide 15: SVD - Definition. (Reminder: matrix multiplication, in the general case.)

Slide 16: SVD - notation. Conventions: bold capitals -> matrix (e.g., A, U, Σ, V); bold lower-case -> column vector (e.g., x, v1, u3); regular lower-case -> scalars (e.g., λ1, r).

Slide 17: SVD - Definition. A[n x m] = U[n x r] Σ[r x r] (V[m x r])^T, where A: n x m matrix (e.g., n documents, m terms); U: n x r matrix (n documents, r concepts); Σ: r x r diagonal matrix (strength of each 'concept'; r: rank of the matrix); V: m x r matrix (m terms, r concepts).
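
To make the shapes concrete, here is a minimal numpy sketch of this definition (the orthonormality checks at the end anticipate slide 19); the toy matrix values are made up for illustration, not the deck's example:

```python
# Minimal sketch of A = U Sigma V^T on a hypothetical 3-docs x 4-terms matrix.
import numpy as np

A = np.array([[1., 1., 0., 0.],
              [2., 2., 0., 0.],
              [0., 0., 1., 1.]])          # n=3 documents, m=4 terms

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-10))                # numerical rank: here r=2 'concepts'
U, s, Vt = U[:, :r], s[:r], Vt[:r, :]

print(U.shape, s.shape, Vt.shape)         # (3, 2) (2,) (2, 4)
print(np.allclose(A, U @ np.diag(s) @ Vt))        # True: A = U Sigma V^T
print(np.allclose(U.T @ U, np.eye(r)),            # U, V column-orthonormal
      np.allclose(Vt @ Vt.T, np.eye(r)))
print(np.all(s[:-1] >= s[1:]) and np.all(s > 0))  # sorted, positive
```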

Slide 18: SVD - Definition. A = U Σ V^T - example: [figure]

Slide 19: SVD - Properties. THEOREM [Press+92]: it is always possible to decompose a matrix A into A = U Σ V^T, where: U, Σ, V are unique (*); U, V are column-orthonormal (i.e., their columns are unit vectors, orthogonal to each other): U^T U = I and V^T V = I (I: identity matrix); the diagonal entries of Σ (the 'eigenvalues', strictly the singular values) are positive and sorted in decreasing order.

Slide 20: SVD - Example. A = U Σ V^T: a document-term matrix over the terms (data, inf., retrieval, brain, lung), with documents from two groups (CS, MD). [figure: A = U x Σ x V^T]

Slide 21: SVD - Example. A = U Σ V^T: the two columns of U correspond to a CS-concept and an MD-concept. [figure]

Slide 22: SVD - Example. A = U Σ V^T: U is the doc-to-concept similarity matrix. [figure]

Slide 23: SVD - Example. A = U Σ V^T: the first diagonal entry of Σ gives the 'strength' of the CS-concept. [figure]

Slides 24-25: SVD - Example. A = U Σ V^T: V is the term-to-concept similarity matrix (first concept: the CS-concept). [figures]

Slide 26: SVD - outline: Motivation; Definition - properties; Interpretation, more properties; Complexity; Applications - success stories; Conclusions.

Slide 27: SVD - Interpretation #1. 'Documents', 'terms' and 'concepts' [Foltz+92]: U: document-to-concept similarity matrix; V: term-to-concept similarity matrix; Σ: its diagonal elements give the 'strength' of each concept.

Slide 28: SVD - Interpretation #1. 'Documents', 'terms' and 'concepts'. Q: if A is the document-to-term matrix, what is A^T A? Q: what is A A^T?

Slide 29: SVD - Interpretation #1. Q: if A is the document-to-term matrix, what is A^T A? A: the term-to-term ([m x m]) similarity matrix. Q: A A^T? A: the document-to-document ([n x n]) similarity matrix.
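
A quick numpy check of these two answers, on a hypothetical document-term matrix:

```python
import numpy as np

A = np.array([[1., 1., 0., 0.],
              [2., 2., 0., 0.],
              [0., 0., 1., 1.]])       # n=3 docs (rows), m=4 terms (columns)

term_sim = A.T @ A                     # [m x m]: co-occurrence of term pairs
doc_sim  = A @ A.T                     # [n x n]: term overlap between doc pairs
print(term_sim.shape, doc_sim.shape)   # (4, 4) (3, 3)
```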

Slide 30: SVD - Interpretation #2: the best axis to project on ('best' = minimum sum of squares of projection errors). [figure]

Slide 31: SVD - Motivation. [figure]

Slide 32: SVD - Interpretation #2: SVD gives the best axis to project on (v1), i.e., minimum RMS error. [figure]

Slide 33: SVD - Interpretation #2. [figure]

Slide 34: SVD - Interpretation #2. A = U Σ V^T - example: v1 is the first projection axis. [figure]

Slide 35: SVD - Interpretation #2. A = U Σ V^T - example: the first diagonal entry of Σ gives the variance ('spread') on the v1 axis. [figure]

Slide 36: SVD - Interpretation #2. A = U Σ V^T - example: U Σ gives the coordinates of the points on the projection axes. [figure]

Slide 37: SVD - Interpretation #2, more details. Q: how exactly is dimensionality reduction done? [figure]

Slide 38: SVD - Interpretation #2, more details. Q: how exactly is dimensionality reduction done? A: set the smallest eigenvalues to zero. [figure]

Slides 39-42: SVD - Interpretation #2 (animation: zeroing the smallest eigenvalues leaves a low-rank approximation, A ~ U Σ' V^T). [figures]

Slide 43: SVD - Interpretation #2. Exactly equivalent: the 'spectral decomposition' of the matrix. [figure: A = U x Σ x V^T]

Slide 44: SVD - Interpretation #2. Exactly equivalent: the 'spectral decomposition' of the matrix, with columns u1, u2 of U, diagonal entries λ1, λ2 of Σ, and columns v1, v2 of V. [figure]

Slide 45: SVD - Interpretation #2. Exactly equivalent: the 'spectral decomposition' of the [n x m] matrix: A = λ1 u1 v1^T + λ2 u2 v2^T + ...

Slide 46: SVD - Interpretation #2. A = λ1 u1 v1^T + λ2 u2 v2^T + ..., where each term is an [n x 1] column times a [1 x m] row; there are r terms.

Slide 47: SVD - Interpretation #2. Approximation / dim. reduction: keep only the first few terms (Q: how many?), assuming λ1 >= λ2 >= ...

Slide 48: SVD - Interpretation #2. A (heuristic, [Fukunaga]): keep 80-90% of the 'energy' (= sum of squares of the λi's), assuming λ1 >= λ2 >= ...
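
A sketch of the recipe from slides 45-48: expand A into its spectral terms and keep the smallest k that covers 90% of the energy. The data matrix is random, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((100, 30))                # hypothetical data matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)
energy = np.cumsum(s**2) / np.sum(s**2)  # 'energy' = sum of squared lambda_i
k = int(np.searchsorted(energy, 0.90)) + 1   # smallest k covering >= 90%

# keep only the first k spectral terms lambda_i * u_i * v_i^T
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(k, np.sqrt(np.mean((A - A_k) ** 2)))   # rank kept, RMS error
```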

Slide 49: SVD - outline: Motivation; Definition - properties; Interpretation (#1: documents/terms/concepts; #2: dim. reduction; #3: picking non-zero, rectangular 'blobs'; #4: fixed points - graph interpretation); Complexity; ...

Slides 50-53: SVD - Interpretation #3: finds the non-zero, rectangular 'blobs' in a data matrix. [figures]

Slide 54: SVD - Interpretation #4: A as a vector transformation, x' = A x. [figure]

Slide 55: SVD - Interpretation #4: if A is symmetric then, by definition, its eigenvectors remain parallel to themselves ('fixed points'): A v1 = 3.62 * v1. [figure]

Slide 56: SVD - Interpretation #4. Fixed-point operation (A^T A is symmetric): A^T A vi = λi^2 vi (i = 1, ..., r) [Property 1]. E.g., if A is the transition matrix of a Markov chain, then v1 is related to the steady-state probability distribution (= a particle doing a random walk on the graph). Convergence: for (almost) any v', (A^T A)^k v' ~ (constant) v1 [Property 2], where v1 is the strongest 'right' eigenvector.
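
[Property 2] is exactly the power method; a small numpy sketch of the convergence, on a hypothetical matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((50, 10))
M = A.T @ A                      # symmetric, so eigenvectors are fixed points

v = rng.random(10)               # (almost) any starting vector v'
for _ in range(100):             # (A^T A)^k v', renormalized each step
    v = M @ v
    v /= np.linalg.norm(v)

v1 = np.linalg.svd(A)[2][0]      # strongest 'right' eigenvector of A
print(np.isclose(abs(v @ v1), 1.0))  # True: parallel to v1 (up to sign)
```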

Slides 57-58: SVD - Interpretation #4: convergence - illustration. [figures]

Slide 59: SVD - Interpretation #4. v1 ... vr: 'right' eigenvectors; symmetrically, u1 ... ur: 'left' eigenvectors. For a matrix A (= U Σ V^T): A vi = λi ui and ui^T A = λi vi^T. (Right eigenvectors -> 'authority' scores in Kleinberg's algorithm; left ones -> 'hub' scores.)

Slide 60: SVD - outline: Motivation; Definition - properties; Interpretation; Complexity; Applications - success stories; Conclusions.

Slide 61: SVD - Complexity: O(n * m * m) or O(n * n * m), whichever is less. Less work if we just want the eigenvalues, or only the first k eigenvectors, or if the matrix is sparse [Berry]. Implemented in any linear algebra package (LINPACK, matlab, Splus, mathematica, ...).
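
For the sparse / first-k case, a hedged sketch using scipy's truncated solver (assuming scipy is available; the matrix is a random stand-in with the sizes from the problem specs):

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

A = sparse_random(10**4, 10**3, density=0.01, random_state=0)
U, s, Vt = svds(A, k=10)            # only the 10 strongest components
print(U.shape, np.sort(s)[::-1])    # note: svds returns s in ascending order
```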

Slide 62: SVD - conclusions so far. SVD: A = U Σ V^T, unique (*). U: document-to-concept similarities; V: term-to-concept similarities; Σ: strength of each concept.

Slide 63: SVD - conclusions so far. Dim. reduction: keep the first few strongest eigenvalues (80-90% of the 'energy'). SVD picks up linear correlations. SVD picks up the non-zero 'blobs'. v1: fixed point (-> steady-state probability).

Slide 64: SVD - outline: Motivation; Definition - properties; Interpretation; Complexity; Applications - success stories; Conclusions.

Slide 65: SVD - Case studies. Applications - success stories: multi-lingual IR; LSI queries; compression; PCA - 'ratio rules'; Karhunen-Loève transform; Kleinberg/google algorithm.

Slide 66: Case study - LSI. Q1: how to do queries with LSI? Q2: multi-lingual IR (English query, on Spanish text?).

Slide 67: Case study - LSI. Q1: how to do queries with LSI? Problem: e.g., find documents containing 'data'. [figure: A = U Σ V^T over the terms (data, inf., retrieval, brain, lung) and the CS/MD documents]

Slide 68: Case study - LSI. Q1: how to do queries with LSI? A: map query vectors into 'concept space' - how? [figure]

Slide 69: Case study - LSI. A: map query vectors into 'concept space' - how? q^T = the query's vector over the terms (data, inf., retrieval, brain, lung). [figure: q plotted with v1, v2 in term space (term1, term2)]

Slide 70: Case study - LSI. A: take the inner product (cosine similarity) of q with each 'concept' vector vi. [figure]

Slide 71: Case study - LSI. A: inner product (cosine similarity) with each 'concept' vector vi, e.g., q . v1 for the first concept. [figure]

Slide 72: Case study - LSI. Compactly: q_concept^T = q^T V. E.g., the query vector for 'data', times the term-to-concept similarity matrix V, gives the query's strength on the CS-concept. [figure]
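
A sketch of this formula; the V below is a hypothetical term-to-concept matrix in the spirit of the deck's CS/MD example (the transcript does not preserve the actual numbers):

```python
import numpy as np

terms = ['data', 'inf.', 'retrieval', 'brain', 'lung']
V = np.array([[0.58, 0.  ],        # term-to-concept similarities:
              [0.58, 0.  ],        # col 0 ~ CS-concept, col 1 ~ MD-concept
              [0.58, 0.  ],
              [0.  , 0.71],
              [0.  , 0.71]])

q = np.array([1., 0., 0., 0., 0.]) # query: 'data'
print(q @ V)                       # strong on the CS-concept, zero on MD
```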

Slide 73: Case study - LSI. Drill: how would the document ('information', 'retrieval') be handled by LSI?

Slide 74: Case study - LSI. Drill: how would the document ('information', 'retrieval') be handled by LSI? A: the SAME way: d_concept^T = d^T V, with d the document's term vector and V the term-to-concept similarity matrix. [figure]

Slide 75: Case study - LSI. Observation: the document ('information', 'retrieval') will be retrieved by the query ('data'), although it does not contain 'data'(!), because both map onto the CS-concept. [figure]

Slide 76: Case study - LSI. Q1: how to do queries with LSI? Q2: multi-lingual IR (English query, on Spanish text?).

Slide 77: Case study - LSI. Problem: given many documents, translated into both languages (e.g., English and Spanish), answer queries across languages.

Slides 78-79: Case study - LSI. Solution: ~ LSI, with the English and Spanish terms (data, inf., retrieval, brain, lung, datos, informacion) in one combined vocabulary over the CS/MD documents. [figures]

Slide 80: SVD - outline - Case studies. Applications - success stories: multi-lingual IR; LSI queries; compression; PCA - 'ratio rules'; Karhunen-Loève transform; Kleinberg/google algorithm.

Slide 81: Case study: compression [Korn+97]. Problem: given a matrix, compress it, but maintain 'random access'.

Slide 82: Problem - specs: ~10^6 rows; ~10^3 columns; no updates; random access to any cell(s); small error: OK.

Slide 83: Idea. [figure]

Slide 84: SVD - reminder: space savings 2:1, with the minimum RMS error. [figure]

Slide 85: Performance - scaleup. [plots: space vs. error]

Slide 86: SVD - outline - Case studies. Applications - success stories: multi-lingual IR; LSI queries; compression; PCA - 'ratio rules'; Karhunen-Loève transform; Kleinberg/google algorithm.

Slide 87: PCA - 'Ratio Rules' [Korn+00]. Typically: 'association rules' (e.g., {bread, milk} -> {butter}). But: which set of rules is 'better'? How to reconstruct missing/corrupted values?

Slide 88: PCA - 'Ratio Rules'. Idea: try to find 'concepts': eigenvectors dictate rules about ratios, e.g., bread:milk:butter = 2:4:3. [figure: axes '$ on bread', '$ on milk', '$ on butter'; points along the direction (2, 4, 3)]

Slide 89: PCA - 'Ratio Rules' [Korn+00]. Typically: 'association rules' (e.g., {bread, milk} -> {butter}). But: how to reconstruct missing/corrupted values? Which set of rules is 'better'?

Slide 90: PCA - 'Ratio Rules'. Moreover: visual data mining, for free. [figure]

Slide 91: PCA - Ratio Rules: no Gaussian clusters; Zipf-like distributions. [figures: phonecalls and stocks datasets]

Slide 92: PCA - Ratio Rules. NBA dataset: ~500 players; ~30 attributes.

Slide 93: PCA - Ratio Rules. Recipe: PCA: get the eigenvectors v1, v2, ...; ignore entries with small absolute value; try to interpret the rest.
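
A sketch of this recipe on synthetic basket data generated to follow the earlier bread:milk:butter = 2:4:3 rule (all values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
spend = rng.random((200, 1)) * np.array([2., 4., 3.])  # $ on bread, milk, butter
spend += rng.normal(0.0, 0.05, spend.shape)            # small noise

_, _, Vt = np.linalg.svd(spend - spend.mean(axis=0), full_matrices=False)
v1 = Vt[0].copy()
v1[np.abs(v1) < 0.1] = 0.0        # ignore entries with small abs. value
print(v1 / np.abs(v1).max())      # ~ 0.5 : 1 : 0.75, i.e., 2:4:3 (up to sign)
```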

Slide 94: PCA - Ratio Rules. NBA dataset - the V matrix (term-to-'concept' similarities); first column: v1. [table]

Slide 95: Ratio Rules - example. RR1: minutes:points = 2:1. Corresponding concept? [v1]

Slide 96: Ratio Rules - example. RR1: minutes:points = 2:1. Corresponding concept? A: 'goodness' of a player.

Slide 97: Ratio Rules - example. RR2: points and rebounds are negatively correlated(!).

Slide 98: Ratio Rules - example. RR2: points and rebounds are negatively correlated(!) - concept? [v2]

Slide 99: Ratio Rules - example. RR2: points and rebounds are negatively correlated(!) - concept? A: position: offensive/defensive.

Slide 100: SVD - Case studies. Applications - success stories: multi-lingual IR; LSI queries; compression; PCA - 'ratio rules'; Karhunen-Loève transform; Kleinberg/google algorithm.

Slide 101: K-L transform [Duda & Hart], [Fukunaga]. A subtle point: SVD will give vectors that go through the origin. [figure: v1]

Slide 102: K-L transform. A subtle point: SVD will give vectors that go through the origin. Q: how to find v1'? [figure: v1 vs. v1']

Slide 103: K-L transform. A subtle point: SVD will give vectors that go through the origin. Q: how to find v1'? A: 'centered' PCA, i.e., move the origin to the center of gravity. [figure]

Slide 104: K-L transform. A subtle point: SVD will give vectors that go through the origin. Q: how to find v1'? A: 'centered' PCA, i.e., move the origin to the center of gravity, and THEN do SVD. [figure: v1']
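
A sketch of the difference, on hypothetical 2-d data offset from the origin:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 2)) @ np.array([[3., 1.], [0., 0.3]]) + np.array([10., 5.])

v1_raw = np.linalg.svd(X, full_matrices=False)[2][0]  # axis forced through origin
Xc = X - X.mean(axis=0)                               # move origin to the centroid
v1_kl = np.linalg.svd(Xc, full_matrices=False)[2][0]  # ... and THEN do SVD
print(v1_raw, v1_kl)   # v1_raw points at the offset; v1_kl at the true spread
```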

Slide 105: SVD - outline: Motivation; Definition - properties; Interpretation; Complexity; Applications - success stories; Conclusions.

Slide 106: SVD - Case studies. Applications - success stories: multi-lingual IR; LSI queries; compression; PCA - 'ratio rules'; Karhunen-Loève transform; Kleinberg/google algorithm.

Slide 107: Kleinberg's algorithm. Problem definition: given the web and a query, find the most 'authoritative' web pages for this query. Step 0: find all pages containing the query terms. Step 1: expand by one move forward and backward.

Slide 108: Kleinberg's algorithm. Step 1: expand by one move forward and backward. [figure]

Slide 109: Kleinberg's algorithm. On the resulting graph, give a high score (= 'authority') to nodes that many important nodes point to; give a high importance score ('hub') to nodes that point to good 'authorities'. [figure: hubs -> authorities]

Slide 110: Kleinberg's algorithm - observations: a recursive definition! Each node (say, the i-th node) has both an authoritativeness score a_i and a hubness score h_i.

Slide 111: Kleinberg's algorithm. Let A be the adjacency matrix: the (i,j) entry is 1 if the edge from i to j exists. Let h and a be [n x 1] vectors with the 'hubness' and 'authoritativeness' scores. Then:

Slide 112: Kleinberg's algorithm. Then a_i = h_k + h_l + h_m (for the nodes k, l, m that point to i); that is, a_i = sum of h_j over all j such that the edge (j,i) exists; or: a = A^T h.

Slide 113: Kleinberg's algorithm. Symmetrically, for the 'hubness': h_i = a_n + a_p + a_q (for the nodes n, p, q that i points to); that is, h_i = sum of a_j over all j such that the edge (i,j) exists; or: h = A a.

Slide 114: Kleinberg's algorithm. In conclusion, we want vectors h and a such that: h = A a and a = A^T h. That is: a = A^T A a.
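
A sketch that iterates exactly these two update rules on a tiny hypothetical 4-node graph:

```python
import numpy as np

A = np.array([[0, 1, 1, 0],   # A[i, j] = 1 iff edge i -> j exists
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)

a = np.ones(4)
for _ in range(50):
    h = A @ a                 # hubs point to good authorities
    a = A.T @ h               # authorities are pointed to by good hubs
    a /= np.linalg.norm(a)    # renormalize so the scores stay bounded
    h /= np.linalg.norm(h)
print(np.round(a, 3), np.round(h, 3))  # node 2 (pointed to by 0, 1, 3)
                                       # emerges as the main authority
```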

Slide 115: Kleinberg's algorithm. So a is a 'right' eigenvector of the adjacency matrix A (by definition!), i.e., an eigenvector of A^T A. Starting from a random a' and iterating, we'll eventually converge. (Q: to which of all the eigenvectors? why?)

Slide 116: Kleinberg's algorithm. (Q: to which of all the eigenvectors? why?) A: to the one with the strongest eigenvalue, because of [Property 2]: (A^T A)^k v' ~ (constant) v1.

Slide 117: Kleinberg's algorithm - results. E.g., for the query 'java': 0.328 www.gamelan.com; 0.251 java.sun.com; 0.190 www.digitalfocus.com ('the java developer').

Slide 118: Kleinberg's algorithm - discussion. The 'authority' score can be used to find 'similar pages' (how?). Closely related to 'citation analysis' and to social networks / 'small world' phenomena.

Slide 119: google/page-rank algorithm. Closely related: imagine a particle randomly moving along the edges (*); compute its steady-state probabilities. ((*) with occasional random jumps.)

Slide 120: SVD - outline: Motivation; Definition - properties; Interpretation; Complexity; Applications - success stories; Conclusions.

Slide 121: SVD - conclusions. SVD: a valuable tool, with two major properties. (A) It gives the optimal low-rank approximation (= the least-squares best hyper-plane): it finds 'concepts' = principal components = features, and (linear) correlations; dim. reduction; visualization; (it also solves over- and under-constrained problems).

Slide 122: SVD - conclusions, cont'd. (B) Eigenvectors: fixed points, A^T A vi = λi^2 vi: steady-state probabilities in Markov chains / graphs; hubs/authorities.

Slide 123: Conclusions - in detail: given a (document-term) matrix, SVD finds 'concepts' (LSI); ... and can reduce dimensionality (K-L); ... and can find rules (PCA; Ratio Rules).

Slide 124: Conclusions cont'd: ... and can find fixed points or steady-state probabilities (Kleinberg / google / Markov chains); (... and can optimally solve over- and under-constrained linear systems - see Appendix, Eq. C1 - the Moore-Penrose pseudo-inverse).

Slide 125: References: Berry, Michael: http://www.cs.utk.edu/~lsi/ ; Brin, S. and L. Page (1998). The Anatomy of a Large-Scale Hypertextual Web Search Engine. 7th Intl. World Wide Web Conf.; Chen, C. M. and N. Roussopoulos (May 1994). Adaptive Selectivity Estimation Using Query Feedback. Proc. ACM SIGMOD, Minneapolis, MN.

Slide 126: References: Duda, R. O. and P. E. Hart (1973). Pattern Classification and Scene Analysis. New York, Wiley; Faloutsos, C. (1996). Searching Multimedia Databases by Content. Kluwer Academic Inc.; Foltz, P. W. and S. T. Dumais (Dec. 1992). "Personalized Information Delivery: An Analysis of Information Filtering Methods." Comm. of the ACM (CACM) 35(12): 51-60.

Slide 127: References: Fukunaga, K. (1990). Introduction to Statistical Pattern Recognition. Academic Press; Jolliffe, I. T. (1986). Principal Component Analysis. Springer Verlag; Kleinberg, J. (1998). Authoritative sources in a hyperlinked environment. Proc. 9th ACM-SIAM Symposium on Discrete Algorithms.

Slide 128: References: Korn, F., H. V. Jagadish, et al. (May 1997). Efficiently Supporting Ad Hoc Queries in Large Datasets of Time Sequences. ACM SIGMOD, Tucson, AZ; Korn, F., A. Labrinidis, et al. (1998). Ratio Rules: A New Paradigm for Fast, Quantifiable Data Mining. VLDB, New York, NY.

Slide 129: References: Korn, F., A. Labrinidis, et al. (2000). "Quantifiable Data Mining Using Ratio Rules." VLDB Journal 8(3-4): 254-266; Press, W. H., S. A. Teukolsky, et al. (1992). Numerical Recipes in C. Cambridge University Press; Strang, G. (1980). Linear Algebra and Its Applications. Academic Press.

Slide 130: Appendix: SVD properties. (A): obvious; (B): less obvious; (C): least obvious (and most powerful!).

Slide 131: Properties - by definition: A(0): A[n x m] = U[n x r] Σ[r x r] V^T[r x m]. A(1): U^T[r x n] U[n x r] = I[r x r] (identity matrix). A(2): V^T[r x m] V[m x r] = I[r x r]. A(3): Σ^k = diag(λ1^k, λ2^k, ..., λr^k) (k: any real number). A(4): A^T = V Σ U^T.

Slide 132: Less obvious properties. A(0): A[n x m] = U[n x r] Σ[r x r] V^T[r x m]. B(1): A[n x m] (A^T)[m x n] = ??

Slide 133: Less obvious properties. A(0): as above. B(1): A[n x m] (A^T)[m x n] = U Σ^2 U^T, which is symmetric. Intuition?

Slide 134: Less obvious properties. B(1): A A^T = U Σ^2 U^T: the 'document-to-document' similarity matrix. B(2): symmetrically, for V: (A^T)[m x n] A[n x m] = V Σ^2 V^T. Intuition?

Slide 135: Less obvious properties. A (for B(2)): the term-to-term similarity matrix. B(3): ((A^T)[m x n] A[n x m])^k = V Σ^(2k) V^T. And B(4): (A^T A)^k ~ v1 λ1^(2k) v1^T for k >> 1, where v1: [m x 1] first column (eigenvector) of V; λ1: strongest eigenvalue.

Slide 136: Less obvious properties. B(4): (A^T A)^k ~ v1 λ1^(2k) v1^T for k >> 1. B(5): (A^T A)^k v' ~ (constant) v1, i.e., for (almost) any v', it converges to a vector parallel to v1. Thus, useful to compute the first eigenvector/eigenvalue (as well as the next ones, too...).

Slide 137: Less obvious properties - repeated: A(0): A[n x m] = U[n x r] Σ[r x r] V^T[r x m]. B(1): A A^T = U Σ^2 U^T. B(2): A^T A = V Σ^2 V^T. B(3): (A^T A)^k = V Σ^(2k) V^T. B(4): (A^T A)^k ~ v1 λ1^(2k) v1^T. B(5): (A^T A)^k v' ~ (constant) v1.

Slide 138: Least obvious properties. A(0): A[n x m] = U[n x r] Σ[r x r] V^T[r x m]. C(1): for A[n x m] x[m x 1] = b[n x 1], let x0 = V Σ^(-1) U^T b; if the system is under-specified, x0 gives the 'shortest' solution; if over-specified, it gives the 'solution' with the smallest least-squares error (see Num. Recipes, p. 62).

Slide 139: Least obvious properties - illustration: under-specified, e.g., [1 2] [w z]^T = 4 (i.e., 1w + 2z = 4). [figure: the w-z plane; the line of all possible solutions; x0 is the shortest-length solution]

Slide 140: Least obvious properties - illustration: over-specified, e.g., [3 2]^T [w] = [1 2]^T (i.e., 3w = 1; 2w = 2). [figure: the reachable points (3w, 2w) vs. the desirable point b]
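
A sketch of C(1) on the deck's two illustrations; numpy's pinv computes exactly V Σ^(-1) U^T:

```python
import numpy as np

# under-specified: 1w + 2z = 4 -> the shortest-length solution
A1, b1 = np.array([[1., 2.]]), np.array([4.])
print(np.linalg.pinv(A1) @ b1)    # [0.8 1.6]: satisfies the equation, min norm

# over-specified: 3w = 1 and 2w = 2 -> the least-squares 'solution'
A2, b2 = np.array([[3.], [2.]]), np.array([1., 2.])
print(np.linalg.pinv(A2) @ b2)    # [0.538...] = (3*1 + 2*2) / (9 + 4)
```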

Slide 141: Least obvious properties - cont'd. A(0): A[n x m] = U[n x r] Σ[r x r] V^T[r x m]. C(2): A[n x m] v1[m x 1] = λ1 u1[n x 1], where v1, u1 are the first (column) vectors of V, U (v1 == right eigenvector). C(3): symmetrically, u1^T A = λ1 v1^T (u1 == left eigenvector). Therefore:

Slide 142: Least obvious properties - cont'd. A(0): as above. C(4): A^T A v1 = λ1^2 v1 (a fixed point - the usual definition of an eigenvector, for a symmetric matrix).

Slide 143: Least obvious properties - altogether. A(0): A[n x m] = U[n x r] Σ[r x r] V^T[r x m]. C(1): if A[n x m] x[m x 1] = b[n x 1], then x0 = V Σ^(-1) U^T b is the shortest, actual or least-squares solution. C(2): A[n x m] v1[m x 1] = λ1 u1[n x 1]. C(3): u1^T A = λ1 v1^T. C(4): A^T A v1 = λ1^2 v1.

Slide 144: Properties - conclusions. A(0): A[n x m] = U[n x r] Σ[r x r] V^T[r x m]. B(5): (A^T A)^k v' ~ (constant) v1. C(1): if A[n x m] x[m x 1] = b[n x 1], then x0 = V Σ^(-1) U^T b is the shortest, actual or least-squares solution. C(4): A^T A v1 = λ1^2 v1.

