Singular Value Decomposition and Data Management


1 Singular Value Decomposition and Data Management

2 SVD - Detailed outline
Motivation
Definition - properties
Interpretation
Complexity
Case studies
Additional properties

3 SVD - Motivation problem #1: text - LSI: find ‘concepts’
problem #2: compression / dim. reduction

4 SVD - Motivation problem #1: text - LSI: find ‘concepts’

5 SVD - Motivation problem #2: compress / reduce dimensionality

6 Problem - specs
~10^6 rows; ~10^3 columns; no updates;
random access to any cell(s); small error: OK

7 SVD - Motivation

8 SVD - Motivation

9 SVD - Definition
A[n x m] = U[n x r] L[r x r] (V[m x r])T
A: n x m matrix (e.g., n documents, m terms)
U: n x r matrix (n documents, r concepts)
L: r x r diagonal matrix (strength of each ‘concept’) (r: rank of the matrix)
V: m x r matrix (m terms, r concepts)

10 SVD - Properties
THEOREM [Press+92]: it is always possible to decompose a matrix A into A = U L VT, where
U, L, V: unique (*)
U, V: column-orthonormal (i.e., columns are unit vectors, orthogonal to each other): UT U = I; VT V = I (I: identity matrix)
L: diagonal, holding the singular values, non-negative and sorted in decreasing order
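A minimal sketch of these properties in code (assuming Python with NumPy, which the slides do not prescribe; the matrix values are made up for illustration):

```python
import numpy as np

# A: n x m data matrix (e.g., n documents, m terms); values made up for illustration
A = np.array([[1., 1., 1., 0., 0.],
              [2., 2., 2., 0., 0.],
              [1., 1., 1., 0., 0.],
              [0., 0., 0., 1., 1.],
              [0., 0., 0., 2., 2.]])

# economy-size SVD: U is n x r', s holds the singular values, Vt is r' x m
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# singular values: non-negative and sorted in decreasing order
print(np.all(s >= 0) and np.all(np.diff(s) <= 0))      # True

# U, V column-orthonormal: U^T U = I and V^T V = I
print(np.allclose(U.T @ U, np.eye(U.shape[1])))        # True
print(np.allclose(Vt @ Vt.T, np.eye(Vt.shape[0])))     # True

# the product U L V^T gives back A
print(np.allclose(U @ np.diag(s) @ Vt, A))             # True
```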

11 SVD - Example
A = U L VT - example: [figure: a document-term matrix with CS documents and MD documents over the terms ‘data’, ‘inf.’, ‘retrieval’, ‘brain’, ‘lung’, shown as the product of U, L and VT]

12 SVD - Example
A = U L VT - example: [same figure, with the two concepts labeled: the CS-concept and the MD-concept]

13 SVD - Example
A = U L VT - example: [same figure; U is the doc-to-concept similarity matrix]

14 SVD - Example
A = U L VT - example: [same figure; the diagonal of L gives the ‘strength’ of each concept, e.g. of the CS-concept]

15 SVD - Example
A = U L VT - example: [same figure; V is the term-to-concept similarity matrix]

16 SVD - Example
A = U L VT - example: [same figure; V is the term-to-concept similarity matrix, e.g. the column for the CS-concept]

17 SVD - Detailed outline
Motivation
Definition - properties
Interpretation
Complexity
Case studies
Additional properties

18 SVD - Interpretation #1
‘documents’, ‘terms’ and ‘concepts’:
U: document-to-concept similarity matrix
V: term-to-concept similarity matrix
L: its diagonal elements give the ‘strength’ of each concept
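To make this reading concrete, here is a small sketch in the spirit of the CS/MD example above (NumPy assumed; the counts below are hypothetical, not the figures from the slides):

```python
import numpy as np

terms = ['data', 'inf.', 'retrieval', 'brain', 'lung']
# hypothetical document-term counts: rows 0-3 are CS-like docs, rows 4-6 are MD-like docs
A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(np.round(U[:, :2], 2))   # doc-to-concept: CS docs load on concept 0, MD docs on concept 1
print(np.round(s[:2], 2))      # 'strength' of the CS-concept and of the MD-concept
print(np.round(Vt[:2], 2))     # term-to-concept similarities

# mapping a query into concept space with V^T (here: a query containing only 'retrieval')
q = np.array([0., 0., 1., 0., 0.])
print(np.round(Vt[:2] @ q, 2))  # large magnitude on the CS-concept, ~0 on the MD-concept
```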

19 SVD - Interpretation #2 best axis to project on: (‘best’ = min sum of squares of projection errors)

20 SVD - Motivation

21 SVD - Interpretation #2
SVD gives the best axis (v1) to project on: minimum RMS error

22 SVD - Interpretation #2

23 SVD - Interpretation #2 A = U L VT - example: [figure: the data points and the projection axis v1]

24 SVD - Interpretation #2
A = U L VT - example: [figure, annotated with the variance (‘spread’) on the v1 axis]

25 SVD - Interpretation #2
A = U L VT - example: U L gives the coordinates of the points on the projection axis

26 SVD - Interpretation #2 - more details
Q: how exactly is dim. reduction done?

27 SVD - Interpretation #2 - more details
Q: how exactly is dim. reduction done?
A: set the smallest singular values to zero
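A sketch of that step (NumPy assumed; the matrix and the cutoff k = 1 are made up for illustration):

```python
import numpy as np

# hypothetical data matrix in the spirit of the running example
A = np.array([[1., 1., 1., 0., 0.],
              [2., 2., 2., 0., 0.],
              [1., 1., 1., 0., 0.],
              [0., 0., 0., 2., 2.],
              [0., 0., 0., 3., 3.]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 1                        # number of concepts to keep
s_trunc = s.copy()
s_trunc[k:] = 0.0            # set the smallest singular values to zero

A_k = U @ np.diag(s_trunc) @ Vt     # the resulting low-rank approximation of A
coords = U[:, :k] * s[:k]           # U L: coordinates of the points on the kept axis
print(np.round(A_k, 2))
print(np.linalg.norm(A - A_k))      # small residual error
```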

28 SVD - Interpretation #2 [figure: the truncated factors and the resulting approximation]

29 SVD - Interpretation #2 [figure continued]

30 SVD - Interpretation #2 [figure continued]

31 SVD - Interpretation #2 [figure: the resulting low-rank approximation of A]

32 SVD - Interpretation #2
Equivalent: ‘spectral decomposition’ of the matrix
[figure: A written as the product U L VT]

33 SVD - Interpretation #2
Equivalent: ‘spectral decomposition’ of the matrix: the columns of U are u1, u2, ..., the diagonal of L holds l1, l2, ..., and the columns of V are v1, v2, ...

34 SVD - Interpretation #2
Equivalent: ‘spectral decomposition’ of the (n x m) matrix: A = u1 l1 v1T + u2 l2 v2T + ...

35 SVD - Interpretation #2
‘spectral decomposition’ of the (n x m) matrix: A = u1 l1 v1T + u2 l2 v2T + ... (r terms; each ui is n x 1, each viT is 1 x m)
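Written out in code, the same identity (NumPy assumed, with an arbitrary random matrix):

```python
import numpy as np

A = np.random.default_rng(0).random((6, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# A = u1 l1 v1^T + u2 l2 v2^T + ...   (one rank-1 term per singular value)
A_sum = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s)))
print(np.allclose(A, A_sum))   # True
```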

36 SVD - Interpretation #2
approximation / dim. reduction: keep only the first few terms (Q: how many?)
A ~ u1 l1 v1T + u2 l2 v2T + ... (assume l1 >= l2 >= ...)
To do the mapping into the reduced space you use VT: X' = VT X

37 SVD - Interpretation #2
A (heuristic - [Fukunaga]): keep 80-90% of the ‘energy’ (= sum of squares of the li’s)
A ~ u1 l1 v1T + u2 l2 v2T + ... (assume l1 >= l2 >= ...)
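A sketch of that heuristic (NumPy assumed; the helper name and the 90% threshold are illustrative choices):

```python
import numpy as np

def choose_k(singular_values, energy_fraction=0.9):
    """Smallest k whose leading singular values keep the given fraction of the energy."""
    energy = np.cumsum(singular_values ** 2) / np.sum(singular_values ** 2)
    return int(np.searchsorted(energy, energy_fraction) + 1)

s = np.array([10.0, 6.0, 1.0, 0.5, 0.1])   # assumed sorted: l1 >= l2 >= ...
print(choose_k(s))                         # 2: the first two terms carry over 90% of the energy
```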

38 SVD - Interpretation #3 finds non-zero ‘blobs’ in a data matrix [figure]

39 SVD - Interpretation #3 finds non-zero ‘blobs’ in a data matrix [figure]

40 SVD - Interpretation #3
Drill: find the SVD, ‘by inspection’!
Q: rank = ??
[figure: the drill matrix and its unknown factors]

41 SVD - Interpretation #3
A: rank = 2 (2 linearly independent rows/cols)
[figure: the factorization with entries still to be determined]

42 SVD - Interpretation #3
A: rank = 2 (2 linearly independent rows/cols)
[figure: a candidate factorization; are the columns orthogonal?]

43 SVD - Interpretation #3 column vectors: orthogonal, but not unit vectors [figure]

44 SVD - Interpretation #3 and the singular values are: [figure]

45 SVD - Interpretation #3
A: SVD properties:
the matrix product should give back matrix A
matrix U should be column-orthonormal, i.e., its columns should be unit vectors, orthogonal to each other
ditto for matrix V
matrix L should be diagonal, with positive values

46 SVD - Complexity
O(n * m * m) or O(n * n * m) (whichever is less)
less work, if we just want the singular values, or only the first k left singular vectors, or if the matrix is sparse [Berry]
Implemented in any linear algebra package (LINPACK, matlab, Splus, mathematica, ...)
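For instance, a sketch of the "first k singular vectors of a sparse matrix" case, assuming SciPy is available (the slides only name LINPACK, matlab, Splus, mathematica):

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# a large, sparse matrix (random, for illustration only)
A = sparse_random(10_000, 1_000, density=0.001, random_state=0).tocsr()

# compute only the k strongest singular triplets instead of the full SVD
U, s, Vt = svds(A, k=10)
print(np.sort(s)[::-1])   # note: svds returns the singular values in ascending order
```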

47 Optimality of SVD
Def: the Frobenius norm of an n x m matrix M is ||M||F = sqrt( sum over i, j of M[i,j]^2 )
(reminder) The rank of a matrix M is the number of linearly independent rows (or columns) of M
Let A = U L VT and Ak = Uk Lk VkT (the rank-k SVD approximation of A); Ak is an n x m matrix, Uk is n x k, Lk is k x k, and Vk is m x k
Theorem [Eckart and Young]: among all n x m matrices C of rank at most k, Ak minimizes the error: ||A - Ak||F <= ||A - C||F
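A small numerical illustration of the theorem (NumPy assumed; C is just one arbitrary rank-k competitor, not an exhaustive search):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((8, 6))
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]     # SVD approximation of rank k

C = rng.random((8, k)) @ rng.random((k, 6))  # some other matrix of rank at most k
err_svd = np.linalg.norm(A - A_k, 'fro')
err_C = np.linalg.norm(A - C, 'fro')
print(err_svd <= err_C)                      # True: no rank-k matrix beats A_k (Eckart-Young)

# the optimal error equals sqrt(l_{k+1}^2 + ... + l_r^2)
print(np.isclose(err_svd, np.sqrt(np.sum(s[k:] ** 2))))   # True
```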

48 Kleinberg’s Algorithm
Main idea: in many cases, when you search the web with some terms, the most relevant pages may not contain those terms (or contain them only a few times); e.g., ‘Harvard’, or ‘search engines’ (yahoo, google, altavista)
Authorities and hubs

49 Kleinberg’s algorithm
Problem dfn: given the web and a query, find the most ‘authoritative’ web pages for this query
Step 0: find all pages containing the query terms (root set)
Step 1: expand by one move forward and backward (base set)

50 Kleinberg’s algorithm
Step 1: expand by one move forward and backward

51 Kleinberg’s algorithm
on the resulting graph, give high scores (‘authorities’) to nodes that many important nodes point to, and give high importance scores (‘hubs’) to nodes that point to good ‘authorities’
[figure: hubs pointing to authorities]

52 Kleinberg’s algorithm
observations: recursive definition! each node (say, the i-th node) has both an authoritativeness score ai and a hubness score hi

53 Kleinberg’s algorithm
Let E be the set of edges and A be the adjacency matrix: entry (i, j) is 1 if the edge from i to j exists.
Let h and a be [n x 1] vectors with the ‘hubness’ and ‘authoritativeness’ scores. Then:

54 Kleinberg’s algorithm
Then: ai = hk + hl + hm, that is, ai = Sum(hj) over all j such that the edge (j, i) exists, or a = AT h
[figure: nodes k, l, m pointing to node i]

55 Kleinberg’s algorithm
symmetrically, for the ‘hubness’: hi = an + ap + aq, that is, hi = Sum(aj) over all j such that the edge (i, j) exists, or h = A a
[figure: node i pointing to nodes n, p, q]

56 Kleinberg’s algorithm
In conclusion, we want vectors h and a such that: h = A a and a = AT h
Recall the properties: C(2): A [n x m] v1 [m x 1] = l1 u1 [n x 1]; C(3): u1T A = l1 v1T

57 Kleinberg’s algorithm
In short, the solutions to h = A a, a = AT h are the left and right singular vectors of the adjacency matrix A (the strongest eigenvectors of A AT and AT A, respectively). Starting from a random a’ and iterating, we’ll eventually converge (Q: to which of all the eigenvectors? why?)

58 Kleinberg’s algorithm
(Q: to which of all the eigenvectors? why?)
A: to the ones with the strongest eigenvalue, because of property B(5): (AT A)k v’ ~ (constant) v1
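A sketch of this hub/authority iteration (NumPy assumed; the tiny adjacency matrix is made up):

```python
import numpy as np

# adjacency matrix of a made-up base set: A[i, j] = 1 if the edge i -> j exists
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 0],
              [0, 1, 1, 0]], dtype=float)

n = A.shape[0]
a = np.ones(n)            # authoritativeness scores a_i
h = np.ones(n)            # hubness scores h_i

for _ in range(50):       # iterate a = A^T h, h = A a, normalizing each time
    a = A.T @ h
    a /= np.linalg.norm(a)
    h = A @ a
    h /= np.linalg.norm(h)

print(np.round(a, 3))     # node 2 gets the top authority score (pointed to by the good hubs)
print(np.round(h, 3))     # nodes 0 and 3 get the top hub scores (they point to both authorities)
```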

59 Kleinberg’s algorithm - results
E.g., for the query ‘java’: 0.251 java.sun.com (“the java developer”)

60 Kleinberg’s algorithm - discussion
the ‘authority’ score can be used to find ‘similar pages’ to a page p
closely related to ‘citation analysis’ and to social networks / ‘small world’ phenomena

61 google/page-rank algorithm
closely related: the Web is a directed graph of connected nodes; imagine a particle randomly moving along the edges (*) and compute its steady-state probabilities. That gives the PageRank of each page (the importance of this page).
(*) with occasional random jumps

62 PageRank Definition
Assume a page A and pages T1, T2, …, Tm that point to A. Let d be a damping factor, PR(A) the PageRank of A, and C(A) the out-degree of A. Then:
PR(A) = (1 - d) + d * ( PR(T1)/C(T1) + ... + PR(Tm)/C(Tm) )

63 google/page-rank algorithm
Computing the PR of each page is an (almost) identical problem: given a Markov Chain, compute the steady-state probabilities p1 ... p5
[figure: a 5-node example graph]

64 Computing PageRank
Iterative procedure. Also: navigate the web by randomly following links or, with probability p, jumping to a random page.
Let A be the adjacency matrix (n x n) and di the out-degree of page i. Then:
Prob(Ai -> Aj) = p * (1/n) + (1 - p) * (1/di) * Aij
A'[i, j] = Prob(Ai -> Aj)
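A sketch of the resulting power iteration (NumPy assumed; the 5-node graph and p = 0.15 are made up for illustration):

```python
import numpy as np

# adjacency matrix of a made-up 5-node graph: A[i, j] = 1 if page i links to page j
A = np.array([[0, 1, 1, 0, 0],
              [0, 0, 1, 0, 0],
              [1, 0, 0, 1, 0],
              [0, 0, 0, 0, 1],
              [0, 0, 1, 1, 0]], dtype=float)

n = A.shape[0]
p = 0.15                              # probability of a random jump
d = A.sum(axis=1)                     # out-degrees d_i (all non-zero in this example)

# A'[i, j] = Prob(A_i -> A_j) = p/n + (1 - p) * A[i, j] / d_i
A_prime = p / n + (1 - p) * A / d[:, None]

pr = np.ones(n) / n                   # start from the uniform distribution
for _ in range(100):                  # power iteration: pr <- A'^T pr
    pr = A_prime.T @ pr

print(np.round(pr, 3))                # steady-state probabilities = PageRank of each page
```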

65 google/page-rank algorithm
Let A' be the transition matrix (= adjacency matrix, row-normalized: the sum of each row = 1)
[figure: the 5-node example graph and its row-normalized transition matrix]

66 google/page-rank algorithm
A p = p
[figure: the 5-node example graph and the corresponding matrix equation]

67 google/page-rank algorithm
A p = p: thus, p is the eigenvector that corresponds to the highest eigenvalue (= 1, since the matrix is row-normalized). More precisely, p is a left eigenvector of the row-normalized transition matrix A': pT A' = pT.
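The same steady state can be read off an eigendecomposition directly (NumPy sketch; the row-normalized transition matrix below is made up):

```python
import numpy as np

# a made-up row-normalized transition matrix (each row sums to 1)
A_prime = np.array([[0.0, 0.5, 0.5],
                    [0.3, 0.0, 0.7],
                    [0.5, 0.5, 0.0]])

# steady state: p^T A' = p^T, i.e. p is the eigenvector of A'^T with eigenvalue 1
vals, vecs = np.linalg.eig(A_prime.T)
p = np.real(vecs[:, np.argmax(np.real(vals))])   # eigenvector for the largest eigenvalue (= 1)
p = p / p.sum()                                  # rescale into a probability distribution
print(np.round(p, 3))
print(np.allclose(p @ A_prime, p))               # True: p is indeed the steady state
```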

68 Kleinberg/google - conclusions
SVD helps in graph analysis:
hub/authority scores: the strongest left and right singular vectors of the adjacency matrix
random walk on a graph: the steady-state probabilities are given by the strongest eigenvector of the transition matrix

69 Conclusions - so far
SVD: a valuable tool
given a document-term matrix, it finds ‘concepts’ (LSI)
... and can reduce dimensionality (KL)

70 Conclusions cont’d
... and can find fixed points or steady-state probabilities (google / Kleinberg / Markov Chains)
... and can solve over- and under-constrained linear systems optimally (least squares)

