Dynamic Graphics, Principal Component Analysis
Ker-Chau Li, UCLA Department of Statistics
11/26/2018
Xlisp-stat (demo)
(plot-points x y)
(scatterplot-matrix (list x y z u w))
(spin-plot (list x y z))
Link, remove, select, rescale
Examples: (1) simulated data (2) Iris data (3) Boston Housing data
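The demo itself runs in XLISP-STAT. For readers without it, here is a rough matplotlib/pandas sketch of the same three displays; the data and variable names are made up for illustration, and the interactive linking, removing, selecting, and rescaling of the demo are not reproduced.

```python
# Rough Python stand-ins for the XLISP-STAT displays above
# (plot-points, scatterplot-matrix, spin-plot); data are illustrative only.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 100
x, y, z, u, w = rng.standard_normal((5, n))

# (plot-points x y): a simple scatterplot
plt.figure()
plt.scatter(x, y)
plt.xlabel("x"); plt.ylabel("y")

# (scatterplot-matrix (list x y z u w)): all pairwise scatterplots
pd.plotting.scatter_matrix(pd.DataFrame({"x": x, "y": y, "z": z, "u": u, "w": w}))

# (spin-plot (list x y z)): a rotatable 3-D point cloud
ax = plt.figure().add_subplot(projection="3d")
ax.scatter(x, y, z)

plt.show()
```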
PCA (principal component analysis)
A fundamental tool for reducing dimensionality by finding the projections with the largest variance
(1) Data version
(2) Population version
Each has a number of variations
(3) Let's begin with an illustration using (pca-model (list x y z))
Find a 2-D plane in 4-D space
Generate 100 cases of u from uniform(0,1)
Generate 100 cases of v from uniform(0,1)
Define x = u + v, y = u - v
Apply pca-model to (x, y, u, v); demo
It still works with small errors (e ~ N(0,1)) present: x = u + v + .01 e_1; y = u - v + .01 e_2
Define x = u + v^2, y = u - v^2, z = v^2
Apply pca-model to (x, y, z, u); works fine
But not so well with a nonlinear manifold; try (pca-model (list x y u v))
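A NumPy sketch of the first construction above (a stand-in for the pca-model demo, not its actual code): the sample covariance matrix of (x, y, u, v) has only two eigenvalues clearly away from zero, with or without the small errors.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
u = rng.uniform(0, 1, n)
v = rng.uniform(0, 1, n)

# Exact 2-D plane in 4-D: x = u + v, y = u - v
x = u + v
y = u - v
data = np.column_stack([x, y, u, v])
eigvals = np.linalg.eigvalsh(np.cov(data, rowvar=False))[::-1]   # descending
print("exact plane:      ", np.round(eigvals, 4))   # two eigenvalues are (numerically) zero

# Same plane with small errors: x = u + v + .01 e_1, y = u - v + .01 e_2
e1, e2 = rng.standard_normal((2, n))
data_noisy = np.column_stack([u + v + .01 * e1, u - v + .01 * e2, u, v])
eigvals_noisy = np.linalg.eigvalsh(np.cov(data_noisy, rowvar=False))[::-1]
print("with small errors:", np.round(eigvals_noisy, 4))          # two eigenvalues remain tiny
```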
Other examples
1-D structure from 2-D: rings
Ying and Yang
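For the ring example, a small NumPy sketch of why PCA misses this kind of 1-D structure: points on a circle give two eigenvalues of comparable size, so there is no low-variance direction to drop even though the data lie on a one-dimensional curve.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
theta = rng.uniform(0, 2 * np.pi, n)

# Points on a circle: a 1-D curve embedded in 2-D
ring = np.column_stack([np.cos(theta), np.sin(theta)])
print(np.round(np.linalg.eigvalsh(np.cov(ring, rowvar=False)), 3))
# Both eigenvalues are about 0.5: no single projection captures the ring,
# even though the data are really one-dimensional (in the angle theta).
```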
Data version
1. Construct the sample variance-covariance matrix
2. Find the eigenvectors
3. Projection: use each eigenvector to form a linear combination of the original variables
4. The larger, the better: the k-th principal component is the projection with the k-th largest eigenvalue
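A minimal NumPy sketch of steps 1-4 (a stand-in for the XLISP-STAT pca-model, not its actual code):

```python
import numpy as np

def pca(data):
    """Steps 1-4 above; data is an n-by-p array (rows = cases, columns = variables)."""
    # 1. Sample variance-covariance matrix
    cov = np.cov(data, rowvar=False)
    # 2. Eigenvectors (eigh: symmetric matrix, eigenvalues in ascending order)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 4. Sort so the k-th component has the k-th largest eigenvalue
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 3. Projection: linear combinations of the (centered) original variables
    scores = (data - data.mean(axis=0)) @ eigvecs
    return eigvals, eigvecs, scores

# Example: the (x, y, u, v) data from the plane-in-4-D slide
# eigvals, eigvecs, scores = pca(np.column_stack([x, y, u, v]))
```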
Data version (alternative view)
1-D data matrix: rank 1
2-D data matrix: rank 2
k-D data matrix: rank k
Eigenvectors for 1-D sample covariance matrix: rank 1
Eigenvectors for 2-D sample covariance matrix: rank 2
Eigenvectors for k-D sample covariance matrix: rank k
Adding i.i.d. noise
Connection with automatic basis curve finding (to be discussed later)
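A quick numerical check of the rank statements, under the assumption that a "k-D data matrix" means n cases built from k underlying variables: the sample covariance matrix then has rank k, and adding i.i.d. noise makes it full rank while leaving p - k eigenvalues tiny.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, p = 100, 2, 5
latent = rng.standard_normal((n, k))      # k underlying variables
mixing = rng.standard_normal((k, p))
data = latent @ mixing                    # p observed variables, all linear in the k latents

cov = np.cov(data, rowvar=False)
print("rank without noise:", np.linalg.matrix_rank(cov))     # k
noisy = data + .01 * rng.standard_normal((n, p))              # add i.i.d. noise
print("rank with i.i.d. noise:",
      np.linalg.matrix_rank(np.cov(noisy, rowvar=False)))     # p, but p - k eigenvalues are tiny
```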
Population version
Let the sample size tend to infinity
The sample covariance matrix converges to the population covariance matrix (by the law of large numbers)
The rest of the steps remain the same
We shall use the population version for theoretical discussion
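A sketch of the law-of-large-numbers statement, using a population covariance matrix chosen only for illustration: the sample covariance matrix gets closer to it as the sample size grows.

```python
import numpy as np

rng = np.random.default_rng(4)
pop_cov = np.array([[2.0, 0.8],
                    [0.8, 1.0]])          # population covariance matrix (illustrative)

for n in (100, 10_000, 1_000_000):
    sample = rng.multivariate_normal(mean=[0, 0], cov=pop_cov, size=n)
    err = np.abs(np.cov(sample, rowvar=False) - pop_cov).max()
    print(f"n = {n:>9}: max entrywise error = {err:.4f}")   # shrinks toward 0
```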
Some basic facts
Variance of a linear combination of random variables:
var(a x + b y) = a^2 var(x) + b^2 var(y) + 2 a b cov(x, y)
Easier if using matrix representation:
(B.1) var(m' X) = m' Cov(X) m
where m is a p-vector and X consists of p random variables (x_1, …, x_p)'
From (B.1), it follows that …
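Before continuing, a numerical check of (B.1) with simulated data and an arbitrary vector m (p = 3 is only for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 1_000, 3
X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))   # correlated variables
m = np.array([1.0, -2.0, 0.5])

lhs = np.var(X @ m, ddof=1)                # var(m'X), computed directly
rhs = m @ np.cov(X, rowvar=False) @ m      # m' Cov(X) m
print(lhs, rhs)                            # identical up to rounding
```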
Basic facts (cont.)
Maximizing var(m'X) subject to ||m|| = 1 is the same as maximizing m' Cov(X) m subject to ||m|| = 1 (here ||m|| denotes the length of the vector m)
Eigenvalue decomposition of M = Cov(X):
(B.2) M v_i = λ_i v_i, where λ_1 ≥ λ_2 ≥ … ≥ λ_p
Basic linear algebra tells us that the first eigenvector will do:
the solution of max m' M m subject to ||m|| = 1 must satisfy M m = λ_1 m
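A quick check of the last claim: the first eigenvector attains the maximum of m' M m over unit vectors, and random unit vectors never exceed λ_1.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 4))
M = A @ A.T                                   # a symmetric positive semi-definite M (like a covariance matrix)

vals, vecs = np.linalg.eigh(M)                # ascending eigenvalues
v1, lam1 = vecs[:, -1], vals[-1]              # largest eigenpair
print("v1' M v1 =", v1 @ M @ v1, "  lambda_1 =", lam1)

# No random unit vector m does better than lambda_1
m = rng.standard_normal((4, 1000))
m = m / np.linalg.norm(m, axis=0)             # normalize columns so ||m|| = 1
print("best random m' M m:", (m * (M @ m)).sum(axis=0).max())
```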
Basic facts (cont.)
The covariance matrix is degenerate (i.e., some eigenvalues are zero) if the data are confined to a lower-dimensional space S
Rank of the covariance matrix = number of non-zero eigenvalues = dimension of the space S
This explains why PCA works for our first example
Why can small errors be tolerated?
Large i.i.d. errors are fine too
Heterogeneity is harmful; so are correlated errors
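A hedged sketch of the error discussion at the population level: adding i.i.d. errors changes the covariance matrix to Σ + σ²I, which shifts every eigenvalue by σ² and leaves the eigenvectors (hence the principal components) unchanged; a general correlated error covariance moves the eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(7)
B = rng.standard_normal((4, 2))
sigma = B @ B.T                              # rank-2 population covariance: data confined to a 2-D plane

def top_two(cov):
    vals, vecs = np.linalg.eigh(cov)
    return np.abs(vecs[:, -2:])              # eigenvectors of the two largest eigenvalues (up to sign)

# i.i.d. errors: covariance becomes sigma + s^2 I, so the eigenvectors are unchanged
print(np.allclose(top_two(sigma), top_two(sigma + 0.5 * np.eye(4))))       # True

# correlated errors: a general error covariance changes the eigenvectors
E = rng.standard_normal((4, 4))
print(np.allclose(top_two(sigma), top_two(sigma + 0.5 * E @ E.T)))         # False
```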
Further discussion
No guarantee of finding nonlinear structure such as clusters, curves, etc.
In fact, sampling properties for PCA are mostly developed for normal data
Still useful
Scaling problem: the principal components change when the variables are rescaled
Projection pursuit: guided; random
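A sketch of the scaling problem: changing the units of one variable changes the leading eigenvector, which is one reason the correlation matrix (standardized variables) is often used instead.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 500
data = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=n)

def first_pc(d):
    vals, vecs = np.linalg.eigh(np.cov(d, rowvar=False))
    return vecs[:, -1]                        # eigenvector with the largest eigenvalue

print("original units:    ", np.round(first_pc(data), 3))

# Rescale the first variable (say, centimeters -> millimeters)
rescaled = data * np.array([10.0, 1.0])
print("after rescaling x1:", np.round(first_pc(rescaled), 3))   # a different direction
```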