1
10-603/15-826A: Multimedia Databases and Data Mining
SVD - part I (definitions)
C. Faloutsos
2
Multi DB and D.M. Copyright: C. Faloutsos (2001)
Outline
Goal: ‘Find similar / interesting things’
– Intro to DB
– Indexing - similarity search
– Data Mining
3
Indexing - Detailed outline
– primary key indexing
– secondary key / multi-key indexing
– spatial access methods
– fractals
– text
– Singular Value Decomposition (SVD)
– multimedia...
4
SVD - Detailed outline
– Motivation
– Definition - properties
– Interpretation
– Complexity
– Case studies
– Additional properties
5
SVD - Motivation
– problem #1: text - LSI (Latent Semantic Indexing): find ‘concepts’
– problem #2: compression / dim. reduction
6
SVD - Motivation
– problem #1: text - LSI: find ‘concepts’
7
SVD - Motivation
– problem #2: compress / reduce dimensionality
8
Problem - specs
– ~10**6 rows; ~10**3 columns; no updates
– random access to any cell(s); small error: OK
9
SVD - Motivation [figure]
10
SVD - Motivation [figure]
11
SVD - Detailed outline
– Motivation
– Definition - properties
– Interpretation
– Complexity
– Case studies
– Additional properties
12
SVD - Definition
(reminder: matrix multiplication)
$[3 \times 2] \times [2 \times 1] = [3 \times 1]$
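As a concrete instance of the dimension rule in the reminder, the sketch below multiplies a 3 x 2 matrix by a 2 x 1 matrix in plain Python. The matrix values and the helper name `matmul` are made up for illustration; they do not come from the slides.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("columns of a must equal rows of b")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


# Illustrative values only: [3 x 2] times [2 x 1] gives [3 x 1].
A = [[1, 2], [3, 4], [5, 6]]
x = [[1], [1]]
y = matmul(A, x)

print(len(y), "x", len(y[0]))  # shape of the result
print(y)
```

The inner dimensions (2 and 2) must agree; the outer ones (3 and 1) give the shape of the result.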
17
SVD - Definition
$A_{[n \times m]} = U_{[n \times r]} \, \Lambda_{[r \times r]} \, (V_{[m \times r]})^T$
– A: n x m matrix (e.g., n documents, m terms)
– U: n x r matrix (n documents, r concepts)
– Λ: r x r diagonal matrix (strength of each ‘concept’) (r: rank of the matrix)
– V: m x r matrix (m terms, r concepts)
18
SVD - Definition
$A = U \Lambda V^T$ - example: [figure]
19
SVD - Properties
THEOREM [Press+92]: it is always possible to decompose matrix A into $A = U \Lambda V^T$, where
– U, Λ, V: unique (*)
– U, V: column orthonormal (i.e., columns are unit vectors, orthogonal to each other)
  – $U^T U = I$; $V^T V = I$ (I: identity matrix)
– Λ: its diagonal entries, the singular values (square roots of the eigenvalues of $A^T A$), are positive and sorted in decreasing order
20
SVD - Example
$A = U \Lambda V^T$ - example: [figure: document-term matrix with term columns ‘data’, ‘inf.’, ‘retrieval’, ‘brain’, ‘lung’ and document rows grouped as CS and MD, written as the product $U \times \Lambda \times V^T$]
– U: doc-to-concept similarity matrix (CS-concept, MD-concept)
– Λ: ‘strength’ of CS-concept
– V: term-to-concept similarity matrix (CS-concept)
26
SVD - Detailed outline
– Motivation
– Definition - properties
– Interpretation
– Complexity
– Case studies
– Additional properties
27
SVD - Interpretation #1
‘documents’, ‘terms’ and ‘concepts’:
– U: document-to-concept similarity matrix
– V: term-to-concept similarity matrix
– Λ: its diagonal elements give the ‘strength’ of each concept
28
SVD - Interpretation #2
best axis to project on (‘best’ = min sum of squares of projection errors)
29
SVD - Motivation [figure]
30
SVD - Interpretation #2
SVD gives the best axis to project on, $v_1$: minimum RMS error [figure]
31
SVD - Interpretation #2 [figure]
32
SVD - Interpretation #2
$A = U \Lambda V^T$ - example: [figure: the product $U \times \Lambda \times V^T$, with $v_1$ marked]
33
SVD - Interpretation #2
$A = U \Lambda V^T$ - example: [figure: the product $U \times \Lambda \times V^T$]
– Λ: variance (‘spread’) on the $v_1$ axis
34
SVD - Interpretation #2
$A = U \Lambda V^T$ - example: [figure: the product $U \times \Lambda \times V^T$]
– $U \Lambda$ gives the coordinates of the points in the projection axis
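The coordinates along the projection axes can be checked numerically: projecting the rows of A onto the columns of V (that is, computing A V) gives U scaled by the diagonal matrix of strengths, because the columns of V are orthonormal. A minimal sketch in plain Python, using a made-up 2 x 2 factorization (all numbers and helper names are assumptions, not taken from the slides):

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


# Hypothetical factors: U and V have orthonormal columns, lam is diagonal.
U = [[0.6, -0.8], [0.8, 0.6]]
lam = [[5.0, 0.0], [0.0, 1.0]]
V = [[0.8, -0.6], [0.6, 0.8]]

A = matmul(matmul(U, lam), transpose(V))  # data matrix, one point per row

projected = matmul(A, V)   # coordinates of each row of A on the axes v1, v2
expected = matmul(U, lam)  # the same coordinates read off the factorization

for p_row, e_row in zip(projected, expected):
    print([round(p, 6) for p in p_row], [round(e, 6) for e in e_row])
```

The first column of the result holds the coordinates on the $v_1$ axis, which is all that is kept when projecting onto a single axis.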
35
SVD - Interpretation #2
More details
Q: how exactly is dim. reduction done?
A: set the smallest singular values to zero
[figure: $A \approx U \Lambda V^T$ with the smallest singular value set to zero]
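A minimal sketch of this step on a made-up 2 x 2 example (the factors, values and helper names are assumptions for illustration): zero the smallest diagonal entry, multiply the factors back together, and compare with the original matrix. For this kind of truncation, the sum of squared errors equals the sum of squares of the dropped diagonal values.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


# Hypothetical factors with orthonormal columns and strengths 5 and 1.
U = [[0.6, -0.8], [0.8, 0.6]]
lam = [[5.0, 0.0], [0.0, 1.0]]
V = [[0.8, -0.6], [0.6, 0.8]]
A = matmul(matmul(U, lam), transpose(V))

lam_trunc = [[5.0, 0.0], [0.0, 0.0]]  # smallest diagonal entry set to zero
A_approx = matmul(matmul(U, lam_trunc), transpose(V))

squared_error = sum((a - b) ** 2
                    for row_a, row_b in zip(A, A_approx)
                    for a, b in zip(row_a, row_b))
print(A_approx)
print(squared_error)  # close to 1.0, the square of the dropped value
```

Once an entry is zeroed, the matching columns of U and V no longer contribute, so they need not be stored.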
41
SVD - Interpretation #2
Exactly equivalent: ‘spectral decomposition’ of the matrix:
$A = u_1 \lambda_1 v_1^T + u_2 \lambda_2 v_2^T + \dots$
– $u_i$, $v_i$: the i-th columns of U and V; $\lambda_i$: the i-th diagonal entry of Λ
– A: n x m; each $u_i$: n x 1; each $v_i^T$: 1 x m
– r terms
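The sum of rank-one terms can be written out directly. The sketch below assumes two made-up triplets (unit vectors and strengths chosen for illustration, not taken from the slides) and the hypothetical helpers `outer` and `spectral_sum`; summing all r terms gives back the matrix, and stopping after k terms gives an approximation.

```python
def outer(u, v):
    """Outer product of an n-vector and an m-vector: an n x m matrix."""
    return [[ui * vj for vj in v] for ui in u]


def spectral_sum(us, lams, vs, k):
    """Sum of the first k rank-one terms lam_i * u_i * v_i^T."""
    n, m = len(us[0]), len(vs[0])
    total = [[0.0] * m for _ in range(n)]
    for u, lam, v in list(zip(us, lams, vs))[:k]:
        term = outer(u, v)
        for i in range(n):
            for j in range(m):
                total[i][j] += lam * term[i][j]
    return total


# Hypothetical triplets: each u_i and v_i is a unit vector, lams is decreasing.
us = [[0.6, 0.8], [-0.8, 0.6]]
lams = [5.0, 1.0]
vs = [[0.8, 0.6], [-0.6, 0.8]]

full = spectral_sum(us, lams, vs, k=2)   # all r = 2 terms: the matrix itself
first = spectral_sum(us, lams, vs, k=1)  # first term only: rank-one approximation
print(full)
print(first)
```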
45
SVD - Interpretation #2
approximation / dim. reduction: by keeping the first few terms (Q: how many?)
$A = u_1 \lambda_1 v_1^T + u_2 \lambda_2 v_2^T + \dots$
assume: $\lambda_1 \ge \lambda_2 \ge \dots$
46
SVD - Interpretation #2
A (heuristic - [Fukunaga]): keep 80-90% of ‘energy’ (= sum of squares of the $\lambda_i$’s)
$A = u_1 \lambda_1 v_1^T + u_2 \lambda_2 v_2^T + \dots$
assume: $\lambda_1 \ge \lambda_2 \ge \dots$
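The energy heuristic can be sketched as a small function. The name `components_to_keep`, the 0.9 default and the sample values are assumptions for illustration; the function returns the smallest number of leading terms whose squared strengths reach the requested share of the total.

```python
def components_to_keep(singular_values, energy_fraction=0.9):
    """Smallest k whose leading values hold the requested share of 'energy'.

    Expects the values sorted in decreasing order.
    """
    total = sum(s * s for s in singular_values)
    running = 0.0
    for k, s in enumerate(singular_values, start=1):
        running += s * s
        if running >= energy_fraction * total:
            return k
    return len(singular_values)


# Hypothetical strengths, sorted in decreasing order.
lams = [10.0, 5.0, 1.0, 0.5]
k = components_to_keep(lams, 0.9)
print(k)
```

Here the total energy is 126.25; the first term alone holds 100 of it (about 79%), and the first two hold 125 (about 99%), so two terms are kept at the 90% level.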
47
SVD - Detailed outline
– Motivation
– Definition - properties
– Interpretation
  – #1: documents/terms/concepts
  – #2: dim. reduction
  – #3: picking non-zero, rectangular ‘blobs’
– Complexity
– Case studies
– Additional properties
48
SVD - Interpretation #3
finds non-zero ‘blobs’ in a data matrix [figure]
50
SVD - Interpretation #3
Drill: find the SVD, ‘by inspection’! [figure: the data matrix, with its factors left as ‘??’]
Q: rank = ??
51
SVD - Interpretation #3
A: rank = 2 (2 linearly independent rows/cols) [figure]
52
SVD - Interpretation #3
A: rank = 2 (2 linearly independent rows/cols)
orthogonal?? [figure]
53
SVD - Interpretation #3
column vectors: are orthogonal - but not unit vectors [figure]
54
SVD - Interpretation #3
and the singular values are: [figure]
55
SVD - Interpretation #3
Q: How to check we are correct? [figure]
56
SVD - Interpretation #3
A: SVD properties:
– matrix product should give back matrix A
– matrix U should be column-orthonormal, i.e., columns should be unit vectors, orthogonal to each other
– ditto for matrix V
– matrix Λ should be diagonal, with positive values
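These checks can be run mechanically. The drill matrix itself is not preserved in this transcript, so the sketch below assumes a hypothetical ‘blob’ matrix (a 3 x 3 block and a 2 x 2 block of ones) together with factors found by inspection; the helper names are also made up for illustration.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def is_close(a, b, tol=1e-9):
    """True if two equally shaped matrices agree entry by entry."""
    return all(abs(x - y) <= tol
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


# Hypothetical 'blob' matrix: two non-zero rectangular blocks.
A = [[1, 1, 1, 0, 0],
     [1, 1, 1, 0, 0],
     [1, 1, 1, 0, 0],
     [0, 0, 0, 1, 1],
     [0, 0, 0, 1, 1]]

a, b = 1 / math.sqrt(3), 1 / math.sqrt(2)   # scale block indicators to unit length
U = [[a, 0], [a, 0], [a, 0], [0, b], [0, b]]
lam = [[3, 0], [0, 2]]
V = [row[:] for row in U]  # for this symmetric example the same vectors serve as V

gives_back_A = is_close(matmul(matmul(U, lam), transpose(V)), A)
U_orthonormal = is_close(matmul(transpose(U), U), identity(2))
V_orthonormal = is_close(matmul(transpose(V), V), identity(2))
lam_ok = (all(lam[i][j] == 0 for i in range(2) for j in range(2) if i != j)
          and all(lam[i][i] > 0 for i in range(2)))
print(gives_back_A, U_orthonormal, V_orthonormal, lam_ok)
```

Each column of U and V picks out one block, and the matching diagonal value (3 and 2 here) grows with the size of that block.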
57
SVD - Detailed outline
– Motivation
– Definition - properties
– Interpretation
– Complexity
– Case studies
– Additional properties
58
SVD - Complexity
– O(n * m * m) or O(n * n * m) (whichever is less)
– less work, if we just want the singular values
  – or if we want the first k singular vectors
  – or if the matrix is sparse [Berry]
– Implemented: in any linear algebra package (LINPACK, MATLAB, S-PLUS, Mathematica...)
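To illustrate why asking only for the leading vectors can be cheaper, here is a sketch of power iteration. This is a standard technique chosen for illustration; the slides do not say which algorithm the listed packages use. Each step multiplies by A and then by its transpose, which costs on the order of n * m operations. The function name, the iteration count and the test matrix are assumptions.

```python
import math


def top_singular_triplet(A, iterations=100):
    """Approximate the largest singular value of A and its two singular vectors.

    Assumes A is non-zero and that the start vector is not orthogonal
    to the leading right singular vector.
    """
    n, m = len(A), len(A[0])
    v = [1.0 / math.sqrt(m)] * m  # arbitrary unit-length start vector
    for _ in range(iterations):
        Av = [sum(A[i][j] * v[j] for j in range(m)) for i in range(n)]
        w = [sum(A[i][j] * Av[i] for i in range(n)) for j in range(m)]  # A^T (A v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Av = [sum(A[i][j] * v[j] for j in range(m)) for i in range(n)]
    sigma = math.sqrt(sum(x * x for x in Av))  # length of A v
    u = [x / sigma for x in Av]
    return u, sigma, v


# Made-up matrix whose two singular values are 5 and 1.
A = [[2.88, 1.16], [2.84, 2.88]]
u1, sigma1, v1 = top_singular_triplet(A)
print(sigma1, u1, v1)
```

Convergence is fast here because the two singular values are well separated; when they are close, many more iterations are needed.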
59
SVD - conclusions so far
– SVD: $A = U \Lambda V^T$: unique (*)
– U: document-to-concept similarities
– V: term-to-concept similarities
– Λ: strength of each concept
– dim. reduction: keep the first few strongest singular values (80-90% of ‘energy’)
  – SVD: picks up linear correlations
– SVD: picks up non-zero ‘blobs’
60
References
– Berry, Michael: http://www.cs.utk.edu/~lsi/
– Fukunaga, K. (1990). Introduction to Statistical Pattern Recognition, Academic Press.
– Press, W. H., S. A. Teukolsky, et al. (1992). Numerical Recipes in C, Cambridge University Press.