Singular Value Decomposition and Data Management


1 Singular Value Decomposition and Data Management

2 SVD - Detailed outline
Motivation
Definition - properties
Interpretation
Complexity
Case studies
Additional properties

3 SVD - Motivation problem #1: text - LSI: find ‘concepts’
problem #2: compression / dim. reduction

4 SVD - Motivation problem #1: text - LSI: find ‘concepts’

5 SVD - Motivation problem #2: compress / reduce dimensionality

6 Problem - specs
~10^6 rows; ~10^3 columns; no updates;
random access to any cell(s); small error: OK

7 SVD - Motivation

8 SVD - Motivation

9 SVD - Definition
A[n x m] = U[n x r] L[r x r] (V[m x r])T
A: n x m matrix (e.g., n documents, m terms)
U: n x r matrix (n documents, r concepts)
L: r x r diagonal matrix (strength of each ‘concept’) (r: rank of the matrix)
V: m x r matrix (m terms, r concepts)

10 SVD - Properties
THEOREM [Press+92]: it is always possible to decompose a matrix A into A = U L VT, where
U, L, V: unique (*)
U, V: column-orthonormal (i.e., columns are unit vectors, orthogonal to each other): UT U = I; VT V = I (I: identity matrix)
L: diagonal, holding the singular values, non-negative and sorted in decreasing order
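A minimal sketch of these properties in code (assuming Python with NumPy, which the slides do not prescribe; the matrix values are made up for illustration):

```python
import numpy as np

# A: n x m data matrix (e.g., n documents, m terms); values made up for illustration
A = np.array([[1., 1., 1., 0., 0.],
              [2., 2., 2., 0., 0.],
              [1., 1., 1., 0., 0.],
              [0., 0., 0., 1., 1.],
              [0., 0., 0., 2., 2.]])

# economy-size SVD: U is n x r', s holds the singular values, Vt is r' x m
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# singular values: non-negative and sorted in decreasing order
print(np.all(s >= 0) and np.all(np.diff(s) <= 0))      # True

# U, V column-orthonormal: U^T U = I and V^T V = I
print(np.allclose(U.T @ U, np.eye(U.shape[1])))        # True
print(np.allclose(Vt @ Vt.T, np.eye(Vt.shape[0])))     # True

# the product U L V^T gives back A
print(np.allclose(U @ np.diag(s) @ Vt, A))             # True
```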

11 SVD - Example
A = U L VT - example: [figure: a document-term matrix with CS documents and MD documents over the terms ‘data’, ‘inf.’, ‘retrieval’, ‘brain’, ‘lung’, shown as the product of U, L and VT]

12 SVD - Example
A = U L VT - example: [same figure, with the two concepts labeled: the CS-concept and the MD-concept]

13 SVD - Example
A = U L VT - example: [same figure; U is the doc-to-concept similarity matrix]

14 SVD - Example
A = U L VT - example: [same figure; the diagonal of L gives the ‘strength’ of each concept, e.g. of the CS-concept]

15 SVD - Example
A = U L VT - example: [same figure; V is the term-to-concept similarity matrix]

16 SVD - Example
A = U L VT - example: [same figure; V is the term-to-concept similarity matrix, e.g. the column for the CS-concept]

17 SVD - Detailed outline
Motivation
Definition - properties
Interpretation
Complexity
Case studies
Additional properties

18 SVD - Interpretation #1
‘documents’, ‘terms’ and ‘concepts’:
U: document-to-concept similarity matrix
V: term-to-concept similarity matrix
L: its diagonal elements give the ‘strength’ of each concept
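To make this reading concrete, here is a small sketch in the spirit of the CS/MD example above (NumPy assumed; the counts below are hypothetical, not the figures from the slides):

```python
import numpy as np

terms = ['data', 'inf.', 'retrieval', 'brain', 'lung']
# hypothetical document-term counts: rows 0-3 are CS-like docs, rows 4-6 are MD-like docs
A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(np.round(U[:, :2], 2))   # doc-to-concept: CS docs load on concept 0, MD docs on concept 1
print(np.round(s[:2], 2))      # 'strength' of the CS-concept and of the MD-concept
print(np.round(Vt[:2], 2))     # term-to-concept similarities

# mapping a query into concept space with V^T (here: a query containing only 'retrieval')
q = np.array([0., 0., 1., 0., 0.])
print(np.round(Vt[:2] @ q, 2))  # large magnitude on the CS-concept, ~0 on the MD-concept
```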

19 SVD - Interpretation #2 best axis to project on: (‘best’ = min sum of squares of projection errors)

20 SVD - Motivation

21 SVD - Interpretation #2
SVD gives the best axis (v1) to project on: minimum RMS error

22 SVD - Interpretation #2

23 SVD - Interpretation #2 A = U L VT - example: [figure: the data points and the projection axis v1]

24 SVD - Interpretation #2
A = U L VT - example: [figure, annotated with the variance (‘spread’) on the v1 axis]

25 SVD - Interpretation #2
A = U L VT - example: U L gives the coordinates of the points on the projection axis

26 SVD - Interpretation #2 - more details
Q: how exactly is dim. reduction done?

27 SVD - Interpretation #2 - more details
Q: how exactly is dim. reduction done?
A: set the smallest singular values to zero
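A sketch of that step (NumPy assumed; the matrix and the cutoff k = 1 are made up for illustration):

```python
import numpy as np

# hypothetical data matrix in the spirit of the running example
A = np.array([[1., 1., 1., 0., 0.],
              [2., 2., 2., 0., 0.],
              [1., 1., 1., 0., 0.],
              [0., 0., 0., 2., 2.],
              [0., 0., 0., 3., 3.]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 1                        # number of concepts to keep
s_trunc = s.copy()
s_trunc[k:] = 0.0            # set the smallest singular values to zero

A_k = U @ np.diag(s_trunc) @ Vt     # the resulting low-rank approximation of A
coords = U[:, :k] * s[:k]           # U L: coordinates of the points on the kept axis
print(np.round(A_k, 2))
print(np.linalg.norm(A - A_k))      # small residual error
```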

28 SVD - Interpretation #2 [figure: the truncated factors and the resulting approximation]

29 SVD - Interpretation #2 [figure continued]

30 SVD - Interpretation #2 [figure continued]

31 SVD - Interpretation #2 [figure: the resulting low-rank approximation of A]

32 SVD - Interpretation #2
Equivalent: ‘spectral decomposition’ of the matrix
[figure: A written as the product U L VT]

33 SVD - Interpretation #2
Equivalent: ‘spectral decomposition’ of the matrix: the columns of U are u1, u2, ..., the diagonal of L holds l1, l2, ..., and the columns of V are v1, v2, ...

34 SVD - Interpretation #2
Equivalent: ‘spectral decomposition’ of the (n x m) matrix: A = u1 l1 v1T + u2 l2 v2T + ...

35 SVD - Interpretation #2
‘spectral decomposition’ of the (n x m) matrix: A = u1 l1 v1T + u2 l2 v2T + ... (r terms; each ui is n x 1, each viT is 1 x m)
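Written out in code, the same identity (NumPy assumed, with an arbitrary random matrix):

```python
import numpy as np

A = np.random.default_rng(0).random((6, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# A = u1 l1 v1^T + u2 l2 v2^T + ...   (one rank-1 term per singular value)
A_sum = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s)))
print(np.allclose(A, A_sum))   # True
```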

36 SVD - Interpretation #2
approximation / dim. reduction: keep only the first few terms (Q: how many?)
A ~ u1 l1 v1T + u2 l2 v2T + ... (assume l1 >= l2 >= ...)
To do the mapping into the reduced space you use VT: X' = VT X

37 SVD - Interpretation #2
A (heuristic - [Fukunaga]): keep 80-90% of the ‘energy’ (= sum of squares of the li’s)
A ~ u1 l1 v1T + u2 l2 v2T + ... (assume l1 >= l2 >= ...)
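A sketch of that heuristic (NumPy assumed; the helper name and the 90% threshold are illustrative choices):

```python
import numpy as np

def choose_k(singular_values, energy_fraction=0.9):
    """Smallest k whose leading singular values keep the given fraction of the energy."""
    energy = np.cumsum(singular_values ** 2) / np.sum(singular_values ** 2)
    return int(np.searchsorted(energy, energy_fraction) + 1)

s = np.array([10.0, 6.0, 1.0, 0.5, 0.1])   # assumed sorted: l1 >= l2 >= ...
print(choose_k(s))                         # 2: the first two terms carry over 90% of the energy
```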

38 SVD - Interpretation #3 finds non-zero ‘blobs’ in a data matrix [figure]

39 SVD - Interpretation #3 finds non-zero ‘blobs’ in a data matrix [figure]

40 SVD - Interpretation #3
Drill: find the SVD, ‘by inspection’!
Q: rank = ??
[figure: the drill matrix and its unknown factors]

41 SVD - Interpretation #3
A: rank = 2 (2 linearly independent rows/cols)
[figure: the factorization with entries still to be determined]

42 SVD - Interpretation #3
A: rank = 2 (2 linearly independent rows/cols)
[figure: a candidate factorization; are the columns orthogonal?]

43 SVD - Interpretation #3 column vectors: orthogonal, but not unit vectors [figure]

44 SVD - Interpretation #3 and the singular values are: [figure]

45 SVD - Interpretation #3
A: SVD properties:
the matrix product should give back matrix A
matrix U should be column-orthonormal, i.e., its columns should be unit vectors, orthogonal to each other
ditto for matrix V
matrix L should be diagonal, with positive values

46 SVD - Complexity
O(n * m * m) or O(n * n * m) (whichever is less)
less work, if we just want the singular values, or only the first k left singular vectors, or if the matrix is sparse [Berry]
Implemented in any linear algebra package (LINPACK, matlab, Splus, mathematica, ...)
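For instance, a sketch of the "first k singular vectors of a sparse matrix" case, assuming SciPy is available (the slides only name LINPACK, matlab, Splus, mathematica):

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# a large, sparse matrix (random, for illustration only)
A = sparse_random(10_000, 1_000, density=0.001, random_state=0).tocsr()

# compute only the k strongest singular triplets instead of the full SVD
U, s, Vt = svds(A, k=10)
print(np.sort(s)[::-1])   # note: svds returns the singular values in ascending order
```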

47 Optimality of SVD
Def: the Frobenius norm of an n x m matrix M is ||M||F = sqrt( sum over i, j of M[i,j]^2 )
(reminder) The rank of a matrix M is the number of linearly independent rows (or columns) of M
Let A = U L VT and Ak = Uk Lk VkT (the rank-k SVD approximation of A); Ak is an n x m matrix, Uk is n x k, Lk is k x k, and Vk is m x k
Theorem [Eckart and Young]: among all n x m matrices C of rank at most k, Ak minimizes the error: ||A - Ak||F <= ||A - C||F
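A small numerical illustration of the theorem (NumPy assumed; C is just one arbitrary rank-k competitor, not an exhaustive search):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((8, 6))
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]     # SVD approximation of rank k

C = rng.random((8, k)) @ rng.random((k, 6))  # some other matrix of rank at most k
err_svd = np.linalg.norm(A - A_k, 'fro')
err_C = np.linalg.norm(A - C, 'fro')
print(err_svd <= err_C)                      # True: no rank-k matrix beats A_k (Eckart-Young)

# the optimal error equals sqrt(l_{k+1}^2 + ... + l_r^2)
print(np.isclose(err_svd, np.sqrt(np.sum(s[k:] ** 2))))   # True
```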

48 Kleinberg’s Algorithm
Main idea: in many cases, when you search the web with some terms, the most relevant pages may not contain those terms (or contain them only a few times); e.g., ‘Harvard’, or ‘search engines’ (yahoo, google, altavista)
Authorities and hubs

49 Kleinberg’s algorithm
Problem dfn: given the web and a query, find the most ‘authoritative’ web pages for this query
Step 0: find all pages containing the query terms (root set)
Step 1: expand by one move forward and backward (base set)

50 Kleinberg’s algorithm
Step 1: expand by one move forward and backward

51 Kleinberg’s algorithm
on the resulting graph, give high scores (‘authorities’) to nodes that many important nodes point to, and give high importance scores (‘hubs’) to nodes that point to good ‘authorities’
[figure: hubs pointing to authorities]

52 Kleinberg’s algorithm
observations: recursive definition! each node (say, the i-th node) has both an authoritativeness score ai and a hubness score hi

53 Kleinberg’s algorithm
Let E be the set of edges and A be the adjacency matrix: entry (i, j) is 1 if the edge from i to j exists.
Let h and a be [n x 1] vectors with the ‘hubness’ and ‘authoritativeness’ scores. Then:

54 Kleinberg’s algorithm
Then: ai = hk + hl + hm, that is, ai = Sum(hj) over all j such that the edge (j, i) exists, or a = AT h
[figure: nodes k, l, m pointing to node i]

55 Kleinberg’s algorithm
symmetrically, for the ‘hubness’: hi = an + ap + aq, that is, hi = Sum(aj) over all j such that the edge (i, j) exists, or h = A a
[figure: node i pointing to nodes n, p, q]

56 Kleinberg’s algorithm
In conclusion, we want vectors h and a such that: h = A a and a = AT h
Recall the properties: C(2): A [n x m] v1 [m x 1] = l1 u1 [n x 1]; C(3): u1T A = l1 v1T

57 Kleinberg’s algorithm
In short, the solutions to h = A a, a = AT h are the left and right singular vectors of the adjacency matrix A (the strongest eigenvectors of A AT and AT A, respectively). Starting from a random a’ and iterating, we’ll eventually converge (Q: to which of all the eigenvectors? why?)

58 Kleinberg’s algorithm
(Q: to which of all the eigenvectors? why?)
A: to the ones with the strongest eigenvalue, because of property B(5): (AT A)k v’ ~ (constant) v1
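A sketch of this hub/authority iteration (NumPy assumed; the tiny adjacency matrix is made up):

```python
import numpy as np

# adjacency matrix of a made-up base set: A[i, j] = 1 if the edge i -> j exists
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 0],
              [0, 1, 1, 0]], dtype=float)

n = A.shape[0]
a = np.ones(n)            # authoritativeness scores a_i
h = np.ones(n)            # hubness scores h_i

for _ in range(50):       # iterate a = A^T h, h = A a, normalizing each time
    a = A.T @ h
    a /= np.linalg.norm(a)
    h = A @ a
    h /= np.linalg.norm(h)

print(np.round(a, 3))     # node 2 gets the top authority score (pointed to by the good hubs)
print(np.round(h, 3))     # nodes 0 and 3 get the top hub scores (they point to both authorities)
```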

59 Kleinberg’s algorithm - results
E.g., for the query ‘java’: 0.251 java.sun.com (“the java developer”)

60 Kleinberg’s algorithm - discussion
the ‘authority’ score can be used to find ‘similar pages’ to a page p
closely related to ‘citation analysis’ and to social networks / ‘small world’ phenomena

61 google/page-rank algorithm
closely related: the Web is a directed graph of connected nodes; imagine a particle randomly moving along the edges (*) and compute its steady-state probabilities. That gives the PageRank of each page (the importance of this page).
(*) with occasional random jumps

62 PageRank Definition
Assume a page A and pages T1, T2, …, Tm that point to A. Let d be a damping factor, PR(A) the PageRank of A, and C(A) the out-degree of A. Then:
PR(A) = (1 - d) + d * ( PR(T1)/C(T1) + ... + PR(Tm)/C(Tm) )

63 google/page-rank algorithm
Computing the PR of each page is an (almost) identical problem: given a Markov Chain, compute the steady-state probabilities p1 ... p5
[figure: a 5-node example graph]

64 Computing PageRank
Iterative procedure. Also: navigate the web by randomly following links or, with probability p, jumping to a random page.
Let A be the adjacency matrix (n x n) and di the out-degree of page i. Then:
Prob(Ai -> Aj) = p * (1/n) + (1 - p) * (1/di) * Aij
A'[i, j] = Prob(Ai -> Aj)
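A sketch of the resulting power iteration (NumPy assumed; the 5-node graph and p = 0.15 are made up for illustration):

```python
import numpy as np

# adjacency matrix of a made-up 5-node graph: A[i, j] = 1 if page i links to page j
A = np.array([[0, 1, 1, 0, 0],
              [0, 0, 1, 0, 0],
              [1, 0, 0, 1, 0],
              [0, 0, 0, 0, 1],
              [0, 0, 1, 1, 0]], dtype=float)

n = A.shape[0]
p = 0.15                              # probability of a random jump
d = A.sum(axis=1)                     # out-degrees d_i (all non-zero in this example)

# A'[i, j] = Prob(A_i -> A_j) = p/n + (1 - p) * A[i, j] / d_i
A_prime = p / n + (1 - p) * A / d[:, None]

pr = np.ones(n) / n                   # start from the uniform distribution
for _ in range(100):                  # power iteration: pr <- A'^T pr
    pr = A_prime.T @ pr

print(np.round(pr, 3))                # steady-state probabilities = PageRank of each page
```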

65 google/page-rank algorithm
Let A' be the transition matrix (= adjacency matrix, row-normalized: the sum of each row = 1)
[figure: the 5-node example graph and its row-normalized transition matrix]

66 google/page-rank algorithm
A p = p
[figure: the 5-node example graph and the corresponding matrix equation]

67 google/page-rank algorithm
A p = p: thus, p is the eigenvector that corresponds to the highest eigenvalue (= 1, since the matrix is row-normalized). More precisely, p is a left eigenvector of the row-normalized transition matrix A': pT A' = pT.
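The same steady state can be read off an eigendecomposition directly (NumPy sketch; the row-normalized transition matrix below is made up):

```python
import numpy as np

# a made-up row-normalized transition matrix (each row sums to 1)
A_prime = np.array([[0.0, 0.5, 0.5],
                    [0.3, 0.0, 0.7],
                    [0.5, 0.5, 0.0]])

# steady state: p^T A' = p^T, i.e. p is the eigenvector of A'^T with eigenvalue 1
vals, vecs = np.linalg.eig(A_prime.T)
p = np.real(vecs[:, np.argmax(np.real(vals))])   # eigenvector for the largest eigenvalue (= 1)
p = p / p.sum()                                  # rescale into a probability distribution
print(np.round(p, 3))
print(np.allclose(p @ A_prime, p))               # True: p is indeed the steady state
```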

68 Kleinberg/google - conclusions
SVD helps in graph analysis:
hub/authority scores: the strongest left and right singular vectors of the adjacency matrix
random walk on a graph: the steady-state probabilities are given by the strongest eigenvector of the transition matrix

69 Conclusions - so far
SVD: a valuable tool
given a document-term matrix, it finds ‘concepts’ (LSI)
... and can reduce dimensionality (KL)

70 Conclusions cont’d
... and can find fixed points or steady-state probabilities (google / Kleinberg / Markov Chains)
... and can solve over- and under-constrained linear systems optimally (least squares)

