Principal Components Analysis
Gang Ren
2015.11.25
Background
Advances in data collection and storage capabilities during the past decades have led to an information overload in most sciences. The growing number of datasets, and high-dimensional datasets in particular, presents many challenges as well as some opportunities for analysis. One problem with high-dimensional datasets is that, in many cases, not all the measured features are "important" for understanding the underlying phenomena of interest. We should therefore apply dimension reduction.
Dimension reduction
In machine learning and statistics, dimensionality reduction (or dimension reduction) is the process of reducing the number of random variables under consideration. Formally, it is a mapping $f: x \to y$, where the dimension of $y$ is lower than that of $x$.
Main dimension reduction techniques
Principal component analysis (PCA)
Linear discriminant analysis (LDA)
Locally linear embedding (LLE)
Laplacian eigenmaps
PCA
PCA is probably one of the most widely used and best-known dimension reduction methods, introduced by Pearson (1901) and Hotelling (1933). In essence, PCA seeks to reduce the dimension of the data by finding a few orthogonal linear combinations (the principal components, PCs) of the original variables with the largest variance. PCA is a useful statistical technique that has found application in fields such as face recognition, gene expression analysis and image compression.
Background mathematics
Before describing PCA, we first introduce the mathematical concepts it uses:
Standard deviation
Variance
Covariance
The covariance matrix
Eigenvectors and eigenvalues
Standard deviation
The centroid of the points is defined by the mean of each dimension, $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$. But the mean doesn't tell us much about the data beyond a sort of middle point. Here's an example: A = [0, 8, 12, 20], B = [8, 9, 11, 12]. These two data sets have exactly the same mean (10), but are obviously quite different.
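Standard deviation
The standard deviation (SD) of a data set is a measure of how spread out the data is. The sample standard deviation is $s = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}}$, where $\bar{X}$ is the mean. As expected, the first set (A = [0, 8, 12, 20]) has a much larger standard deviation than the second (B = [8, 9, 11, 12]), because its data is much more spread out from the mean.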
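Variance
Variance is another measure of the spread of the data. It is the square of the standard deviation: $s^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}$, where $\bar{X}$ is the mean.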
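As a quick numerical check of the two example sets A and B, the sketch below computes their mean, standard deviation and variance. It assumes the sample (n − 1) form of the standard deviation; `sample_std` is a helper name chosen for this illustration only.

```python
import numpy as np

A = np.array([0, 8, 12, 20], dtype=float)
B = np.array([8, 9, 11, 12], dtype=float)

def sample_std(x):
    """Sample standard deviation (assumption: divide by n - 1)."""
    return np.sqrt(np.sum((x - x.mean()) ** 2) / (len(x) - 1))

for name, data in (("A", A), ("B", B)):
    sd = sample_std(data)
    # The variance is the square of the standard deviation.
    print(f"{name}: mean={data.mean():.1f}, sd={sd:.4f}, variance={sd**2:.4f}")

# A: mean=10.0, sd=8.3267, variance=69.3333
# B: mean=10.0, sd=1.8257, variance=3.3333
```

Both sets share the mean 10, yet the spread of A is several times that of B, which is exactly what the mean alone cannot show.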
Covariance
However, many data sets have more than one dimension, and the aim of the statistical analysis of these data sets is usually to see whether there is any relationship between the dimensions. Standard deviation and variance each describe a single dimension, whereas covariance is always measured between 2 dimensions. If you calculate the covariance between one dimension and itself, you get the variance: $\mathrm{cov}(X, X) = \mathrm{var}(X)$.
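Covariance
If X and Y are two different dimensions, the formula for covariance is:
$\mathrm{cov}(X, Y) = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{n-1}$
where $\mathrm{cov}(X, Y)$ is the covariance of variables X and Y, the sum runs over all n objects, $\bar{X}$ is the mean of variable X, and $\bar{Y}$ is the mean of variable Y. (Some treatments divide by n instead of n − 1; this rescales the value but does not change its sign.)
The covariance measures the linear relationship between X and Y:
cov(X,Y) = 0: X and Y are uncorrelated (independent variables have zero covariance, but zero covariance alone does not imply independence)
cov(X,Y) > 0: X and Y tend to move in the same direction
cov(X,Y) < 0: X and Y tend to move in opposite directions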
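The sign behaviour of the covariance can be illustrated with a small sketch. The data values and the helper name `covariance` are made up for this example, and the n − 1 denominator is an assumption. The last case shows a pair of variables whose covariance is zero even though one is fully determined by the other, so zero covariance on its own should not be read as independence.

```python
import numpy as np

def covariance(x, y):
    """Sample covariance of two equally long sequences (assumption: n - 1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)

# Hypothetical data, chosen only to show the three sign cases.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_up = [2.0, 4.0, 6.0, 8.0, 10.0]     # rises with x
y_down = [10.0, 8.0, 6.0, 4.0, 2.0]   # falls as x rises

cov_up = covariance(x, y_up)       # positive: same direction
cov_down = covariance(x, y_down)   # negative: opposite directions
cov_self = covariance(x, x)        # covariance with itself is the variance

# Zero covariance without independence: v is exactly u squared.
u = [-2.0, -1.0, 0.0, 1.0, 2.0]
v = [4.0, 1.0, 0.0, 1.0, 4.0]
cov_zero = covariance(u, v)

print(cov_up, cov_down, cov_self, cov_zero)
```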
The covariance matrix
Recall that covariance is always measured between 2 dimensions. If we have a data set with more than 2 dimensions, there is more than one covariance measurement that can be calculated. So, the definition of the covariance matrix for a set of data with n dimensions is: $C^{n \times n} = (c_{i,j})$ with $c_{i,j} = \mathrm{cov}(\mathrm{Dim}_i, \mathrm{Dim}_j)$, where $\mathrm{Dim}_i$ is the i-th dimension.
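The covariance matrix
An example: we'll write out the covariance matrix for an imaginary 3-dimensional data set, using the usual dimensions x, y and z. The covariance matrix then has 3 rows and 3 columns, and its entries are:
$C = \begin{pmatrix} \mathrm{cov}(x,x) & \mathrm{cov}(x,y) & \mathrm{cov}(x,z) \\ \mathrm{cov}(y,x) & \mathrm{cov}(y,y) & \mathrm{cov}(y,z) \\ \mathrm{cov}(z,x) & \mathrm{cov}(z,y) & \mathrm{cov}(z,z) \end{pmatrix}$
The diagonal entries are the variances of x, y and z, and the matrix is symmetric because cov(a, b) = cov(b, a).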
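A minimal sketch of the 3-dimensional case, using NumPy's `np.cov` on randomly generated data (the data is purely illustrative). By default `np.cov` treats each row as one variable and divides by n − 1.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: rows are the dimensions x, y, z; columns are 50 observations.
data = rng.normal(size=(3, 50))

C = np.cov(data)  # 3 x 3 covariance matrix

print(C.shape)                 # (3, 3)
print(np.allclose(C, C.T))     # symmetric: cov(a, b) == cov(b, a)
# The diagonal holds the variance of each dimension.
print(np.allclose(np.diag(C), data.var(axis=1, ddof=1)))
```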
Eigenvectors and eigenvalues
In linear algebra, an eigenvector or characteristic vector of a square matrix is a nonzero vector that does not change its direction under the associated linear transformation. In other words, if x is a nonzero vector, then it is an eigenvector of a square matrix A if Ax is a scalar multiple of x. This condition can be written as the equation $Ax = \lambda x$, where λ is a number (also called a scalar) known as the eigenvalue or characteristic value associated with the eigenvector x.
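Eigenvectors and eigenvalues
The eigenvalue equation $Ax = \lambda x$ can be rewritten as $(A - \lambda I)x = 0$, where $I$ is the identity matrix.
Solution: a nonzero x exists only when $\det(A - \lambda I) = 0$. Solve this characteristic equation for the eigenvalues λ, then solve $(A - \lambda I)x = 0$ for the eigenvectors belonging to each λ.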
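The defining properties can be checked numerically. The sketch below uses `np.linalg.eig` on a small symmetric matrix picked for illustration (it is not taken from the slides) and verifies both Ax = λx and det(A − λI) = 0 for each eigenpair.

```python
import numpy as np

# Illustrative matrix; its eigenvalues are 1 and 3.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eig returns the eigenvalues and a matrix whose COLUMNS are the eigenvectors.
eigenvalues, eigenvectors = np.linalg.eig(A)

for lam, x in zip(eigenvalues, eigenvectors.T):
    # A x is a scalar multiple of x ...
    assert np.allclose(A @ x, lam * x)
    # ... and lam is a root of the characteristic equation.
    assert np.isclose(np.linalg.det(A - lam * np.eye(2)), 0.0)
    print(f"lambda = {lam:.1f}, eigenvector = {x}")
```

NumPy returns unit-length eigenvectors, and their sign is arbitrary: any nonzero multiple of an eigenvector is still an eigenvector.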
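Change of basis
In linear algebra, a basis for a vector space of dimension n is a sequence of n vectors (α1, …, αn) with the property that every vector in the space can be expressed uniquely as a linear combination of the basis vectors. A change of basis can be written as a matrix product:
$\underbrace{\begin{pmatrix} p_1 \\ \vdots \\ p_R \end{pmatrix}}_{A} \underbrace{\begin{pmatrix} a_1 & \cdots & a_M \end{pmatrix}}_{B} = \underbrace{\begin{pmatrix} p_1 a_1 & \cdots & p_1 a_M \\ \vdots & \ddots & \vdots \\ p_R a_1 & \cdots & p_R a_M \end{pmatrix}}_{AB}$
where $p_i$ is a row vector denoting the i-th basis vector and $a_j$ is a column vector denoting the j-th original data record. Each entry $p_i a_j$ is the inner product of the i-th basis vector with the j-th record, so when the basis vectors are orthonormal, column j of AB holds the coordinates of $a_j$ in the new basis.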
Change of basis
Example: the point (3,2). Basis: (1,0) and (0,1), so (3,2) = 3·(1,0) + 2·(0,1).
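To see what changes when the basis changes, the sketch below re-expresses the point (3,2) in a second orthonormal basis. The rotated basis (1/√2, 1/√2), (−1/√2, 1/√2) and the variable names are assumptions made for this illustration. Basis vectors are stacked as rows and data records as columns, so one matrix product converts all records at once.

```python
import numpy as np

s = 1.0 / np.sqrt(2.0)

# Rows are the basis vectors.
standard_basis = np.array([[1.0, 0.0],
                           [0.0, 1.0]])
rotated_basis = np.array([[s, s],      # assumed new basis, rotated by 45 degrees
                          [-s, s]])

# Columns are data records; the first one is the point (3, 2).
records = np.array([[3.0, 1.0],
                    [2.0, 1.0]])

in_standard = standard_basis @ records   # unchanged: (3, 2) stays (3, 2)
in_rotated = rotated_basis @ records     # (3, 2) becomes (5/sqrt(2), -1/sqrt(2))

print(in_standard[:, 0])
print(in_rotated[:, 0])
```

The second record, (1,1), lies exactly along the first rotated basis vector, so its new coordinates are (√2, 0): a single number is enough to describe it in that basis.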
Example for 2-D to 1-D
Example: mean centering (subtract the mean of each dimension from every point, so that the data is centered at the origin).
Example for 2-D to 1-D
Desirable outcome: How can we get it? Key observation: project onto the direction along which the variance of the data is largest.
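Example for 2-D to 1-D
Process: compute the covariance matrix of the centered data. For this dataset, $C = \begin{pmatrix} 6/5 & 4/5 \\ 4/5 & 6/5 \end{pmatrix}$.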
Example for 2-D to 1-D , Eigenvector Eigenvalue
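Example for 2-D to 1-D
Result: the centered data projected onto the eigenvector with the largest eigenvalue, giving a 1-D representation.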
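Example for 2-D to 1-D
For this dataset: $\lambda_1 = 2$, $\lambda_2 = 2/5$; $c_1 = [1, 1]'$, $c_2 = [-1, 1]'$. Since $\lambda_1 > \lambda_2$, the direction $c_1$ (normalized to $[1/\sqrt{2}, 1/\sqrt{2}]'$) carries the largest variance and is the one to project onto.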
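The quoted eigenvalues can be checked numerically. The data table for this example is not reproduced in the text, so the five mean-centered points below are an assumption, chosen because they give λ1 = 2 and λ2 = 2/5 when the covariance is normalized by the number of records (1/m rather than 1/(m − 1)).

```python
import numpy as np

# ASSUMED data (not from the slides): 2 dimensions (rows) x 5 records (columns),
# already mean-centered, so each row sums to zero.
X = np.array([[-1.0, -1.0, 0.0, 2.0, 0.0],
              [-2.0,  0.0, 0.0, 1.0, 1.0]])
m = X.shape[1]

# Covariance matrix of centered data (assumption: divide by m).
C = X @ X.T / m

# eigh is meant for symmetric matrices; eigenvalues come back in ascending order.
eigvals, eigvecs = np.linalg.eigh(C)

# Eigenvector of the largest eigenvalue; proportional to [1, 1] up to sign.
p1 = eigvecs[:, -1]

# 1-D representation: project every record onto p1.
y = p1 @ X

print(C)        # entries 6/5 on the diagonal and 4/5 off the diagonal
print(eigvals)  # approximately [0.4, 2.0]
print(y)
```

The variance of the projected data equals the largest eigenvalue, which is why choosing the top eigenvector maximizes the retained variance.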
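Steps of PCA
1. Let $\bar{X}$ be the mean vector (the mean of each row of X, where each row is one dimension and each column is one data record).
2. Adjust the original data by the mean: $X' = X - \bar{X}$.
3. Compute the covariance matrix C of the adjusted data X'.
4. Find the eigenvectors and eigenvalues of C.
5. Form a matrix P whose rows are the k eigenvectors with the largest eigenvalues, ordered by decreasing eigenvalue.
6. $Y = PX'$ is the reduced, k-dimensional data we want.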
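The steps above can be sketched as a short NumPy function. This is an illustration under stated assumptions rather than a reference implementation: the function name `pca`, its return values, the layout (rows are dimensions, columns are records) and the division by the number of records are all choices made here.

```python
import numpy as np

def pca(X, k):
    """Project d-dimensional records (columns of X) onto k principal components."""
    X = np.asarray(X, dtype=float)
    n = X.shape[1]

    # Steps 1-2: mean of each row (dimension), then center the data.
    mean = X.mean(axis=1, keepdims=True)
    Xc = X - mean

    # Step 3: covariance matrix (assumption: divide by n, not n - 1).
    C = Xc @ Xc.T / n

    # Step 4: eigen-decomposition; eigh suits symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(C)

    # Step 5: keep the k eigenvectors with the largest eigenvalues as rows of P.
    order = np.argsort(eigvals)[::-1][:k]
    P = eigvecs[:, order].T

    # Step 6: reduced data.
    Y = P @ Xc
    return Y, P, eigvals[order]

# Usage on synthetic data: 3 dimensions, 100 records, reduced to 2 dimensions.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(3, 100))
Y_demo, P_demo, top_eigvals = pca(X_demo, 2)
print(Y_demo.shape)  # (2, 100)
```

Dividing by n − 1 instead of n would scale all eigenvalues by the same factor and leave the eigenvectors, and therefore P and Y, unchanged.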
What are the assumptions of PCA?
PCA assumes that the relationships among variables are LINEAR: the cloud of points in p-dimensional space has linear dimensions that can be effectively summarized by the principal axes. If the structure in the data is NONLINEAR (the cloud of points twists and curves its way through p-dimensional space), the principal axes will not be an efficient and informative summary of the data.
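A small sketch of the nonlinear case, using synthetic points on a circle (an example chosen here, not taken from the slides). The points are fully described by one parameter, the angle, yet no single straight axis summarizes them: both eigenvalues of the covariance matrix are equal, so the best 1-D linear projection keeps only half of the variance.

```python
import numpy as np

# Synthetic nonlinear structure: 200 points evenly spaced on the unit circle.
theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
X = np.vstack([np.cos(theta), np.sin(theta)])   # rows: dimensions, columns: points

# Center and compute the covariance matrix (assumption: divide by n).
X = X - X.mean(axis=1, keepdims=True)
C = X @ X.T / X.shape[1]

eigvals = np.linalg.eigvalsh(C)        # ascending order
explained = eigvals[-1] / eigvals.sum()

print(eigvals)     # both approximately 0.5
print(explained)   # approximately 0.5
```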
Thanks for the Attention!