Dynamic graphics, Principal Component Analysis
Ker-Chau Li, UCLA Department of Statistics
Xlisp-stat (demo)
(plot-points x y)
(scatterplot-matrix (list x y z u w))
(spin-plot (list x y z))
Link, remove, select, rescale
Examples: (1) simulated data (2) Iris data (3) Boston Housing data
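The commands above are XLISP-STAT calls. As a rough, static analogue for readers working in Python instead (a sketch only; the interactive link/remove/select/rescale tools have no direct counterpart here, and the variables are simulated placeholders):

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    x, y, z, u, w = rng.normal(size=(5, 100))        # stand-ins for the demo variables

    plt.scatter(x, y)                                # ~ (plot-points x y)
    pd.plotting.scatter_matrix(pd.DataFrame(dict(x=x, y=y, z=z, u=u, w=w)))
                                                     # ~ (scatterplot-matrix (list x y z u w))
    ax = plt.figure().add_subplot(projection='3d')
    ax.scatter(x, y, z)                              # ~ (spin-plot (list x y z)), minus the spinning
    plt.show()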
PCA (principal component analysis)
A fundamental tool for reducing dimensionality by finding the projections with the largest variance.
(1) Data version
(2) Population version
Each has a number of variations.
(3) Let's begin with an illustration using (pca-model (list x y z)).
Find a 2-D plane in 4-D space
Generate 100 cases of u from uniform(0,1) and 100 cases of v from uniform(0,1).
Define x = u + v, y = u - v.
Apply pca-model to (x, y, u, v); demo.
It still works with small errors (e_1, e_2 ~ N(0,1)) present: x = u + v + .01 e_1; y = u - v + .01 e_2.
Define x = u + v^2, y = u - v^2, z = v^2.
Apply pca-model to (x, y, z, u); works fine.
But not so well with a nonlinear manifold; try (pca-model (list x y u v)).
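A NumPy sketch of this construction (my own illustration, not the course code; the 0.01 noise scale follows the slide):

    import numpy as np

    rng = np.random.default_rng(0)
    u = rng.uniform(size=100)
    v = rng.uniform(size=100)
    x = u + v + 0.01 * rng.normal(size=100)          # small errors, as on the slide
    y = u - v + 0.01 * rng.normal(size=100)

    S = np.cov(np.column_stack([x, y, u, v]), rowvar=False)
    print(np.linalg.eigvalsh(S)[::-1])               # two sizable eigenvalues, two near zero:
                                                     # the data sit (almost) on a 2-D plane in 4-D
    # For the slide's second construction, adding z = v**2 keeps (x, y, z, u) linearly
    # degenerate, but (x, y, u, v) with x = u + v**2 lies on a curved manifold,
    # which PCA cannot flatten.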
Other examples
1-D structure from 2-D data: rings; Yin and Yang
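A sketch of the ring case (an assumed construction, just to show the eigenvalues):

    import numpy as np

    rng = np.random.default_rng(0)
    theta = rng.uniform(0, 2 * np.pi, size=500)
    ring = np.column_stack([np.cos(theta), np.sin(theta)])   # a 1-D curve sitting in 2-D
    print(np.linalg.eigvalsh(np.cov(ring, rowvar=False)))
    # both eigenvalues are close to 0.5, so PCA sees no direction worth dropping
    # even though the data are intrinsically one-dimensional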
Data version
1. Construct the sample variance-covariance matrix.
2. Find its eigenvectors.
3. Projection: use each eigenvector to form a linear combination of the original variables.
4. The larger, the better: the k-th principal component is the projection with the k-th largest eigenvalue.
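A minimal NumPy sketch of these four steps (an illustration under my own naming, not the pca-model source):

    import numpy as np

    def pca(X):
        """PCA of a data matrix X whose rows are observations."""
        S = np.cov(X, rowvar=False)                    # 1. sample variance-covariance matrix
        eigvals, eigvecs = np.linalg.eigh(S)           # 2. eigenvectors (ascending order)
        order = np.argsort(eigvals)[::-1]              #    reorder so the largest comes first
        eigvals, eigvecs = eigvals[order], eigvecs[:, order]
        scores = (X - X.mean(axis=0)) @ eigvecs        # 3. projections: linear combinations
                                                       #    of the (centered) original variables
        return eigvals, eigvecs, scores                # 4. the k-th column of scores is the k-th
                                                       #    principal component, variance eigvals[k-1]

For instance, pca(np.column_stack([x, y, u, v])) corresponds roughly to what (pca-model (list x y u v)) reports in the demo, up to signs.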
Data version (alternative view)
1-D data matrix: rank 1; 2-D data matrix: rank 2; k-D data matrix: rank k.
Sample covariance matrix for 1-D data: rank 1; for 2-D data: rank 2; for k-D data: rank k (so k eigenvectors with nonzero eigenvalues).
Adding i.i.d. noise.
Connection with automatic basis curve finding (to be discussed later).
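A sketch of the rank view on toy data (assumed construction): the centered data matrix of variables generated from k underlying ones has rank k, its nonzero singular values match the nonzero eigenvalues of the sample covariance matrix up to the 1/(n-1) factor, and i.i.d. noise makes the remaining singular values small but nonzero.

    import numpy as np

    rng = np.random.default_rng(0)
    u, v = rng.uniform(size=(2, 100))
    X = np.column_stack([u + v, u - v, u, v])          # generated from k = 2 variables
    Xc = X - X.mean(axis=0)
    print(np.linalg.matrix_rank(Xc))                   # 2
    print(np.linalg.svd(Xc, compute_uv=False))         # two sizable singular values, two ~ 0

    Xn = Xc + 0.05 * rng.normal(size=Xc.shape)         # add i.i.d. noise
    print(np.linalg.svd(Xn, compute_uv=False))         # now full rank, but still two dominant values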
Population version
Let the sample size tend to infinity.
The sample covariance matrix converges to the population covariance matrix (by the law of large numbers).
The rest of the steps remain the same.
We shall use the population version for theoretical discussion.
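A small numerical illustration of this convergence (an assumed bivariate normal population):

    import numpy as np

    Sigma = np.array([[2.0, 0.5],
                      [0.5, 1.0]])                     # population covariance matrix
    rng = np.random.default_rng(0)
    for n in (50, 500, 50_000):
        X = rng.multivariate_normal([0.0, 0.0], Sigma, size=n)
        err = np.max(np.abs(np.cov(X, rowvar=False) - Sigma))
        print(n, err)                                  # largest entrywise error shrinks as n grows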
Some basic facts
Variance of a linear combination of random variables:
var(a x + b y) = a^2 var(x) + b^2 var(y) + 2 a b cov(x, y)
Easier with matrix notation:
(B.1) var(m' X) = m' Cov(X) m, where m is a p-vector and X consists of p random variables (x_1, ..., x_p)'.
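A quick numerical check of (B.1) on simulated data (assumed covariance; the two printed numbers agree because the identity also holds for the sample versions):

    import numpy as np

    rng = np.random.default_rng(0)
    Sigma = [[2.0, 0.5, 0.0],
             [0.5, 1.0, 0.3],
             [0.0, 0.3, 1.5]]
    X = rng.multivariate_normal([0, 0, 0], Sigma, size=10_000)
    m = np.array([0.2, -1.0, 0.7])
    print(np.var(X @ m, ddof=1))                       # sample variance of the projection m'X
    print(m @ np.cov(X, rowvar=False) @ m)             # m' Cov(X) m -- same number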
Basic facts (cont.)
From (B.1), maximizing var(m'X) subject to ||m|| = 1 is the same as maximizing m' Cov(X) m subject to ||m|| = 1 (here ||m|| denotes the length of the vector m).
Eigenvalue decomposition: (B.2) M v_i = λ_i v_i, where λ_1 ≥ λ_2 ≥ … ≥ λ_p.
Basic linear algebra tells us that the first eigenvector will do: the solution of max m' M m subject to ||m|| = 1 must satisfy M m = λ_1 m.
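A sketch checking both claims on a random positive semi-definite matrix M (assumed setup): no unit vector beats the first eigenvector in m' M m, and that eigenvector satisfies M m = λ_1 m.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(5, 5))
    M = A @ A.T                                        # a covariance-like (PSD) matrix
    eigvals, eigvecs = np.linalg.eigh(M)
    lam1, v1 = eigvals[-1], eigvecs[:, -1]             # eigh sorts ascending: last = largest

    m = rng.normal(size=(1000, 5))
    m /= np.linalg.norm(m, axis=1, keepdims=True)      # 1000 random unit vectors
    quad = np.einsum('ij,jk,ik->i', m, M, m)           # m' M m for each of them
    print(quad.max() <= lam1 + 1e-9)                   # True: none exceeds the top eigenvalue
    print(np.allclose(M @ v1, lam1 * v1))              # True: the maximizer satisfies M m = lambda_1 m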
Basic facts (cont.)
The covariance matrix is degenerate (i.e., some eigenvalues are zero) if the data are confined to a lower-dimensional space S.
Rank of the covariance matrix = number of non-zero eigenvalues = dimension of the space S.
This explains why PCA works for our first example.
Why can small errors be tolerated? Large i.i.d. errors are fine too.
Heterogeneity is harmful, and so are correlated errors.
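A sketch of the error-structure point at the level of the population covariance matrix (assumed construction, reusing the 2-D-plane example): i.i.d. errors add the same constant to every eigenvalue and leave the leading eigen-subspace untouched, whereas strongly heterogeneous noise tilts it.

    import numpy as np

    s = 1.0 / 12.0                                     # variance of uniform(0,1)
    B = np.array([[1., 1.], [1., -1.], [1., 0.], [0., 1.]])
    Sigma0 = s * (B @ B.T)                             # cov of (x, y, u, v): rank 2, data on a plane

    def top2(M):                                       # projection onto the leading 2-D eigen-subspace
        V = np.linalg.eigh(M)[1][:, -2:]
        return V @ V.T

    iid    = Sigma0 + 0.01 * np.eye(4)                 # i.i.d. errors
    hetero = Sigma0 + np.diag([0.5, 0.01, 0.01, 0.01]) # one coordinate much noisier than the rest
    print(np.linalg.matrix_rank(Sigma0))               # 2: degenerate covariance matrix
    print(np.linalg.norm(top2(Sigma0) - top2(iid)))    # ~ 0: leading subspace preserved
    print(np.linalg.norm(top2(Sigma0) - top2(hetero))) # clearly > 0: leading subspace tilted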
Further discussion
No guarantee of finding nonlinear structure like clusters, curves, etc.
In fact, sampling properties for PCA are mostly developed for normal data.
Still useful.
Scaling problem.
Projection pursuit: guided; random.
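A sketch of the scaling problem (assumed data): expressing one variable in much larger units makes it dominate the first principal component; standardizing the variables (PCA on the correlation matrix) removes that dependence on units.

    import numpy as np

    rng = np.random.default_rng(0)
    R = [[1.0, 0.8, 0.8],
         [0.8, 1.0, 0.8],
         [0.8, 0.8, 1.0]]
    X = rng.multivariate_normal([0, 0, 0], R, size=1000)
    X[:, 0] *= 100.0                                   # same data, first variable in other units

    print(np.linalg.eigh(np.cov(X, rowvar=False))[1][:, -1])    # first PC ~ variable 1 alone

    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)           # standardized variables
    print(np.linalg.eigh(np.cov(Xs, rowvar=False))[1][:, -1])   # now all three load comparably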