The Terms that You Have to Know!
- Basis, linear independence, orthogonality
- Column space, row space, rank
- Linear combination
- Linear transformation
- Inner product
- Eigenvalue, eigenvector
- Projection

Least Squares Problem
- Let A be a matrix with full column rank and consider min_x ||Ax - b||.
- The normal equation for the LS problem: A^T A x = A^T b.
- Solving the LS problem amounts to finding the projection of b onto the column space of A.
- The projection matrix: P = A (A^T A)^{-1} A^T.
- If A has orthonormal columns, the LS problem becomes easy: x = A^T b (think of an orthonormal axis system).
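As a quick illustration, here is a minimal NumPy sketch of the normal equation, the projection matrix, and the orthonormal-columns shortcut; the matrix A and vector b are invented example data, not from the slides.

```python
# Minimal sketch of the least-squares projection described above.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))          # full column rank (with probability 1)
b = rng.normal(size=6)

# Normal equation: A^T A x = A^T b
x = np.linalg.solve(A.T @ A, A.T @ b)

# Projection matrix P = A (A^T A)^{-1} A^T, and the projection of b
P = A @ np.linalg.inv(A.T @ A) @ A.T
print(np.allclose(P @ b, A @ x))          # True: A x is the projection of b

# With orthonormal columns the problem is trivial: the projection is Q (Q^T b)
Q, _ = np.linalg.qr(A)
print(np.allclose(Q @ (Q.T @ b), P @ b))  # same projection
```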

Matrix Factorization
- LU-Factorization: A = LU
  - Very useful for solving systems of linear equations.
  - Some row exchanges may be required, giving PA = LU with a permutation matrix P.
- QR-Factorization: A = QR
  - Every matrix A with linearly independent columns can be factored into A = QR. The columns of Q are orthonormal, and R is upper triangular and invertible.
  - When m = n, all matrices are square and Q becomes an orthogonal matrix (Q^T = Q^{-1}).
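A small sketch checking both factorizations numerically; the 3 x 3 example matrix is arbitrary, and SciPy's `lu` and NumPy's `qr` are used only for illustration.

```python
# Verify A = P L U (LU with row exchanges) and A = Q R on a toy matrix.
import numpy as np
from scipy.linalg import lu

A = np.array([[2., 1., 1.],
              [4., 3., 3.],
              [8., 7., 9.]])

# LU with row exchanges: A = P L U, P is the permutation matrix
P, L, U = lu(A)
print(np.allclose(A, P @ L @ U))          # True

# QR: A = Q R, Q has orthonormal columns, R is upper triangular
Q, R = np.linalg.qr(A)
print(np.allclose(A, Q @ R))              # True
print(np.allclose(Q.T @ Q, np.eye(3)))    # Q^T Q = I (Q is orthogonal since A is square)
```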

QR Factorization Simplifies the Least Squares Problem
- The normal equation for the LS problem: A^T A x = A^T b. Substituting A = QR gives R^T R x = R^T Q^T b, i.e., R x = Q^T b, which is solved by back substitution.
- Note: the orthogonal matrix Q constructs (spans) the column space of matrix A.
- LS problem: finding the projection of b onto the column space of A.
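A hedged sketch of this route: solve min_x ||Ax - b|| by QR plus back substitution and compare with NumPy's least-squares solver; A and b are arbitrary example data.

```python
# Solve the LS problem via R x = Q^T b (back substitution).
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(1)
A = rng.normal(size=(8, 3))
b = rng.normal(size=8)

Q, R = np.linalg.qr(A)               # reduced QR: Q is 8x3, R is 3x3 upper triangular
x_qr = solve_triangular(R, Q.T @ b)  # back substitution on R x = Q^T b

x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_qr, x_ref))      # True: both give the LS solution
```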

Motivation for Computing QR of the Term-by-Document Matrix
- The basis vectors of the column space of A (the columns of Q) can be used to describe the semantic content of the corresponding text collection.
- Let theta_j be the angle between a query q and the document vector a_j. Since a_j = Q r_j and ||Q r_j|| = ||r_j||, we have cos theta_j = a_j^T q / (||a_j|| ||q||) = r_j^T (Q^T q) / (||r_j|| ||q||).
- That means we can keep Q and R instead of A.
- QR can also be applied to dimension reduction.
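A toy sketch of this point: the 5-term by 3-document matrix and the query below are made up, but the check shows the query/document cosines computed from Q and R match those computed from A.

```python
# Cosines between a query and documents, from A and from (Q, R).
import numpy as np

A = np.array([[1., 0., 1.],
              [1., 1., 0.],
              [0., 1., 1.],
              [0., 0., 1.],
              [1., 0., 0.]])          # term-by-document matrix (invented)
q = np.array([1., 1., 0., 0., 0.])    # query vector (invented)

Q, R = np.linalg.qr(A)

# cos(theta_j) directly from A
cos_direct = (A.T @ q) / (np.linalg.norm(A, axis=0) * np.linalg.norm(q))

# cos(theta_j) using only R and Q^T q  (||A e_j|| = ||R e_j|| since Q has orthonormal columns)
cos_qr = (R.T @ (Q.T @ q)) / (np.linalg.norm(R, axis=0) * np.linalg.norm(q))

print(np.allclose(cos_direct, cos_qr))  # True
```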

Recall Matrix Notations
- Random vector x = [X_1, X_2, …, X_n]^T, where each X_i is a random variable describing the value of the i-th attribute.
- Expectation: E[x] = μ; covariance: E[(x - μ)(x - μ)^T] = Σ.
- Expectation of a projection: E[w^T x] = E[∑_i w_i X_i] = ∑_i w_i E[X_i] = w^T E[x] = w^T μ.
- Variance of a projection: Var(w^T x) = E[(w^T x - w^T μ)^2] = E[(w^T x - w^T μ)(w^T x - w^T μ)] = E[w^T (x - μ)(x - μ)^T w] = w^T E[(x - μ)(x - μ)^T] w = w^T Σ w.
  (Here w^T is 1 × n and x is n × 1.)
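A quick numerical sanity check of Var(w^T x) = w^T Σ w; the covariance matrix, mean, and weight vector below are invented for illustration.

```python
# Compare the empirical variance of w^T x with w^T Sigma w.
import numpy as np

rng = np.random.default_rng(2)
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])                         # chosen covariance matrix
mu = np.array([1.0, -2.0])
X = rng.multivariate_normal(mu, Sigma, size=200_000)   # rows are samples of x

w = np.array([0.6, -0.3])
empirical = np.var(X @ w)                              # sample variance of w^T x
theoretical = w @ Sigma @ w                            # w^T Sigma w
print(empirical, theoretical)                          # the two values are close
```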

Principal Components Analysis (PCA)
- PCA is unsupervised: it does not use the output information.
- Find a mapping from the inputs in the original n-dimensional space to a new k-dimensional space (k < n) such that when x is projected there, information loss is minimized.
- The projection of x on the direction of w is: z = w^T x.
- Find w such that Var(z) is maximized (after the projection, the differences between the sample points become most apparent).
- For a unique solution, require ||w|| = 1.
  (Figure: a point x and its projection w^T x onto the unit vector w.)

The 1st Principal Component
- Maximize Var(z) = w_1^T Σ w_1 subject to ||w_1|| = 1, i.e., maximize w_1^T Σ w_1 - α(w_1^T w_1 - 1).
- Taking the derivative w.r.t. w_1 and setting it to 0, we have Σ w_1 = α w_1. That is, w_1 is an eigenvector of Σ and α is the corresponding eigenvalue.
- Also, Var(z) = w_1^T Σ w_1 = α w_1^T w_1 = α, so we choose the largest eigenvalue for Var(z) to be maximum.
- The 1st principal component is the eigenvector of the covariance matrix of the input sample with the largest eigenvalue, λ_1 = α.
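A minimal sketch of this result on synthetic 2-D data: the leading eigenvector of the sample covariance is the first principal component, and the variance of the projection equals the largest eigenvalue (up to sampling error).

```python
# First principal component as the leading eigenvector of the covariance.
import numpy as np

rng = np.random.default_rng(3)
X = rng.multivariate_normal([0, 0], [[3.0, 1.5], [1.5, 1.0]], size=5000)

S = np.cov(X, rowvar=False)            # sample covariance (estimator of Sigma)
eigvals, eigvecs = np.linalg.eigh(S)   # eigh returns eigenvalues in ascending order
w1 = eigvecs[:, -1]                    # eigenvector with the largest eigenvalue

z = X @ w1                             # projections z = w1^T x
print(np.var(z), eigvals[-1])          # approximately equal
```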

The 2nd Principal Component
- Maximize Var(z_2) = w_2^T Σ w_2 subject to ||w_2|| = 1 and w_2 orthogonal to w_1, i.e., maximize w_2^T Σ w_2 - α(w_2^T w_2 - 1) - β(w_2^T w_1 - 0).
- Taking the derivative w.r.t. w_2 and setting it to 0, we have 2 Σ w_2 - 2α w_2 - β w_1 = 0.
- Premultiplying by w_1^T gives 2 w_1^T Σ w_2 - 2α w_1^T w_2 - β w_1^T w_1 = 0. Note that w_1^T w_2 = 0, and w_1^T Σ w_2 is a scalar, equal to its transpose w_2^T Σ w_1 = λ_1 w_2^T w_1 = 0; therefore β = 0.
- We are left with Σ w_2 = α w_2. That is, w_2 is the eigenvector of Σ with the second largest eigenvalue, λ_2 = α, and so on.

Recall from Linear Algebra
- Theorem: For a real symmetric matrix, eigenvectors associated with different eigenvalues are orthogonal to each other.
- Theorem: A real symmetric matrix A can be transformed into a diagonal matrix by P^{-1} A P = D, where P has the eigenvectors of A as its columns.
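A small check of both theorems on an arbitrary symmetric example matrix (not from the slides).

```python
# Diagonalization P^{-1} A P = D and orthogonality of the eigenvectors.
import numpy as np

A = np.array([[4., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])              # real symmetric

eigvals, P = np.linalg.eigh(A)            # columns of P are orthonormal eigenvectors
D = np.linalg.inv(P) @ A @ P
print(np.allclose(D, np.diag(eigvals)))   # True: P^{-1} A P is diagonal
print(np.allclose(P.T, np.linalg.inv(P))) # True: eigenvectors orthogonal, P^T = P^{-1}
```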

Recall from Linear Algebra (cont.)
- Def: Positive definite bilinear form: f(x, x) > 0 for all x ≠ 0. E.g., f(x, y) = x^T A y; if x^T A x > 0 for all x ≠ 0, then the n × n matrix A is called a positive definite matrix.
- Def: Positive semidefinite bilinear form: f(x, x) ≥ 0 for all x. E.g., if x^T A x ≥ 0 for all x, then A is called a positive semidefinite matrix.
- Theorem: A symmetric matrix A is positive definite if and only if all the eigenvalues of A are positive.
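A hedged sketch of the eigenvalue test stated in the theorem; the helper function and the two example matrices are invented for illustration.

```python
# Test positive definiteness of a symmetric matrix via its eigenvalues.
import numpy as np

def is_positive_definite(A, tol=1e-12):
    """True if the symmetric matrix A has all eigenvalues > 0 (up to tol)."""
    return bool(np.all(np.linalg.eigvalsh(A) > tol))

A_pd  = np.array([[2., -1.], [-1., 2.]])   # x^T A x > 0 for all x != 0
A_psd = np.array([[1.,  1.], [ 1., 1.]])   # eigenvalues 2 and 0: only semidefinite

print(is_positive_definite(A_pd))    # True
print(is_positive_definite(A_psd))   # False (one eigenvalue is 0)
```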

What PCA Does
- Consider the R^n → R^k transformation z = W^T (x - m), where the k columns of W are the k leading eigenvectors of S (the estimator of Σ) and m is the sample mean.
- Note: if k = n, then W W^T = W^T W = I, so W^{-1} = W^T; if k < n, only W^T W = I_{k×k}.
- The transformation centers the data at the origin and rotates the axes to those eigenvectors; the variances over the new dimensions are equal to the eigenvalues.
- z = W^T (x - m) (just like z_1 = w_1^T x, z_2 = w_2^T x, …).
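A compact sketch of the full transformation on synthetic 3-D data, keeping k = 2 components; the mean and covariance used to generate the data are invented.

```python
# PCA transformation z = W^T (x - m) with the k leading eigenvectors of S.
import numpy as np

rng = np.random.default_rng(4)
X = rng.multivariate_normal([5, -1, 2],
                            [[4.0, 1.0, 0.5],
                             [1.0, 2.0, 0.3],
                             [0.5, 0.3, 1.0]], size=10_000)
k = 2

m = X.mean(axis=0)                       # sample mean
S = np.cov(X, rowvar=False)              # sample covariance
eigvals, eigvecs = np.linalg.eigh(S)     # ascending order
W = eigvecs[:, ::-1][:, :k]              # k leading eigenvectors as columns

Z = (X - m) @ W                          # z = W^T (x - m) for every sample
print(Z.mean(axis=0).round(3))           # ~[0, 0]: data centered at the origin
print(np.cov(Z, rowvar=False).round(2))  # ~diag of the two largest eigenvalues
```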

Singular Value Decomposition (SVD)
- A = U Σ V^T, where U and V have orthonormal columns and Σ is diagonal with the singular values on its diagonal.
- The columns of U are eigenvectors of A A^T and the columns of V are eigenvectors of A^T A.
- The singular values are the square roots of the nonzero eigenvalues of both A A^T and A^T A.
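A numerical check of these relations on a random example matrix: the singular values are the square roots of the eigenvalues of A^T A, and the columns of V satisfy the eigenvector equation of A^T A.

```python
# Relate the SVD of A to the eigendecomposition of A^T A.
import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(5, 3))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Singular values = square roots of the (nonzero) eigenvalues of A^T A
eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]          # descending
print(np.allclose(s, np.sqrt(eigvals)))              # True

# Columns of V are eigenvectors of A^T A: (A^T A) V = V diag(s^2)
print(np.allclose((A.T @ A) @ Vt.T, Vt.T * s**2))    # True
```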

Singular Value Decomposition (SVD)

Latent Semantic Indexing (LSI)
- Basic idea: explore the correlation between words and documents.
- Two words are correlated when they co-occur many times.
- Two documents are correlated when they share many words.

Latent Semantic Indexing (LSI)
- Computation: use the singular value decomposition (SVD) of the term-by-document matrix, X ≈ U_m Σ_m V_m^T, where m is the number of concepts/topics.
- U_m: representation of the concepts in term space.
- V_m: representation of the concepts in document space.
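A toy LSI sketch keeping m = 2 concepts. The term-by-document counts and the query are invented, and the query-folding step q_hat = Σ_m^{-1} U_m^T q is the standard LSI recipe rather than something taken from these slides.

```python
# Truncated SVD of a term-by-document matrix and query ranking in concept space.
import numpy as np

# rows = terms, columns = documents (invented counts)
X = np.array([[2., 0., 1., 0.],
              [1., 1., 0., 0.],
              [0., 2., 0., 1.],
              [0., 1., 2., 1.],
              [1., 0., 0., 2.]])

U, s, Vt = np.linalg.svd(X, full_matrices=False)
m = 2
Um, Sm, Vmt = U[:, :m], np.diag(s[:m]), Vt[:m, :]

# Documents in the m-dimensional concept space
docs_concept = Sm @ Vmt                  # shape (m, number of documents)

# Fold a query into the same concept space: q_hat = Sm^{-1} Um^T q
q = np.array([1., 1., 0., 0., 0.])
q_concept = np.linalg.inv(Sm) @ Um.T @ q

# Rank documents by cosine similarity in concept space
cos = (docs_concept.T @ q_concept) / (
    np.linalg.norm(docs_concept, axis=0) * np.linalg.norm(q_concept))
print(np.argsort(-cos))                  # documents ordered by relevance
```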

SVD: Example (m = 2) (figure).

SVD: Eigenvalues
- Determining m (the number of concepts to keep) is usually difficult.

SVD: Orthogonality
- The left singular vectors are mutually orthogonal: u_1 · u_2 = 0.
- The right singular vectors are mutually orthogonal: v_1 · v_2 = 0.

SVD: Properties
- rank(S): the maximum number of linearly independent row (or column) vectors within matrix S.
- SVD produces the best low-rank approximation. In the example, X has rank(X) = 9, while the approximation X' has rank(X') = 2.
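A sketch of the low-rank-approximation property (the Eckart-Young result): truncating the SVD to rank 2 gives an approximation whose Frobenius error equals the energy in the dropped singular values. The random matrix below stands in for the rank-9 example from the slide.

```python
# Rank-2 truncated SVD as the best low-rank approximation.
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(10, 9))                 # generically rank 9

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
X2 = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-2 approximation X'

print(np.linalg.matrix_rank(X2))             # 2
print(np.linalg.norm(X - X2, 'fro'))         # approximation error
print(np.sqrt(np.sum(s[k:] ** 2)))           # equals sqrt of the dropped sigma_i^2
```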

SVD: Visualization of X = U Σ V^T (figure).

SVD tries to preserve the Euclidean distances between document vectors.