1
Multivariate Statistical Methods
Principal Components Analysis (PCA)
By Jen-pei Liu, PhD
Division of Biometry, Department of Agronomy, National Taiwan University, and Division of Biostatistics and Bioinformatics, National Health Research Institutes
2019/2/24 Copyright by Jen-pei Liu, PhD
2
Principal Components Analysis
- Introduction
- Procedures
- Properties
- Examples
- Summary
3
Introduction
- Described by K. Pearson (1901); computing methods by Hotelling (1933)
- Objective: to transform the original variables X1, …, Xp into index variables Z1, …, Zp
  - Z1, …, Zp are linear combinations of X1, …, Xp
  - Z1, …, Zp are uncorrelated and ordered by importance
- Purpose: to describe the variation in the data
4
Introduction
- Lack of correlation: the index variables measure different dimensions (domains) of the data
- Lack of correlation: only the variances of the index variables need to be considered; their covariances do not
- Ordering: Var(Z1) ≥ Var(Z2) ≥ … ≥ Var(Zp)
- The index variables Z are called the principal components
5
Introduction
- Most of the variation in the full data set can be adequately described by the first few index variables Z
- Dimension reduction: from a two-digit number of variables to just 2 to 4 principal components
- Requires high correlations among the original variables
7
Introduction
Correlations of Female Sparrows (5 × 5 correlation matrix of X1 to X5; numerical entries not recoverable)
- X1: total length
- X2: alar length
- X3: length of beak and head
- X4: length of humerus
- X5: length of keel of sternum
8
Introduction
Coefficients for Components (table with columns Component, Variance, X1, X2, X3, X4, X5; numerical entries not recoverable)
9
Introduction
- Z1 = 0.452X1 + 0.462X2 + 0.451X3 + 0.471X4 + 0.398X5
- The variance of Z1 is 3.62
- Z1 accounts for approximately 72.3% (3.62/5.00) of the total variation
- All coefficients of Z1 are smaller than 1, and the sum of squares of these coefficients is equal to 1
- Because the coefficients are nearly equal, Z1 is in effect an average (or sum) of X1, X2, X3, X4, and X5
- Z1 can be interpreted as an index of the size of the sparrow
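The two numerical claims on this slide can be checked directly. The sketch below assumes the rounded Z1 coefficients quoted in the sparrow example and a total variance of 5 (five standardized variables, each with variance 1); it is only an arithmetic check, not the original computation.

```python
import numpy as np

# Coefficients of Z1 as quoted on the slides (rounded to three decimals).
a1 = np.array([0.452, 0.462, 0.451, 0.471, 0.398])

# Unit-length constraint: the squared coefficients should sum to about 1
# (not exactly 1, because the coefficients are rounded).
sum_of_squares = float(np.sum(a1 ** 2))

# Five standardized variables each contribute variance 1, so the total is 5.
var_z1 = 3.62
total_variance = 5.0
proportion = var_z1 / total_variance

print(f"sum of squared coefficients = {sum_of_squares:.3f}")
print(f"proportion of variation explained by Z1 = {proportion:.3f}")
```

With the rounded variance 3.62 the ratio comes out slightly above the quoted 72.3%, which is consistent with the slide figure having been computed from an unrounded variance.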
10
Procedures
Data Structure
Case  X1   X2   …  Xp
1     x11  x12  …  x1p
2     x21  x22  …  x2p
⋮
n     xn1  xn2  …  xnp
11
Procedures
The First Component
- The first component is a linear combination of X1, X2, …, Xp:
  Z1 = a11X1 + a12X2 + … + a1pXp
- Var(Z1) is as large as possible subject to the condition that
  a11² + a12² + … + a1p² = 1
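The constrained maximization above has a well-known solution: the coefficient vector of Z1 is the unit-length eigenvector of the covariance matrix with the largest eigenvalue. The following is a minimal numerical sketch of that fact. The data set, the `mixing` matrix, and the random trial vectors are hypothetical and only serve the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 cases on 3 correlated variables (not from the slides).
mixing = np.array([[2.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.5, 0.5, 0.5]])
X = rng.normal(size=(200, 3)) @ mixing
C = np.cov(X, rowvar=False)

# eigh returns eigenvalues of a symmetric matrix in ascending order.
eigvals, eigvecs = np.linalg.eigh(C)
a1 = eigvecs[:, -1]       # unit-length coefficients of Z1
var_z1 = a1 @ C @ a1      # Var(Z1) = a1' C a1

# Compare with many random unit-length coefficient vectors.
trials = rng.normal(size=(1000, 3))
trials /= np.linalg.norm(trials, axis=1, keepdims=True)
trial_vars = np.einsum("ij,jk,ik->i", trials, C, trials)

print("Var(Z1):", var_z1)
print("largest variance among random unit vectors:", trial_vars.max())
```

No random unit-length vector gives a larger variance than the leading eigenvector, and Var(Z1) equals the largest eigenvalue.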
13
Procedures
The Second Component
- The second component is also a linear combination of X1, X2, …, Xp:
  Z2 = a21X1 + a22X2 + … + a2pXp
- Var(Z2) is as large as possible subject to the conditions that
  - a21² + a22² + … + a2p² = 1
  - Z1 and Z2 are not correlated
- Var(Z2) is therefore the second largest variance
14
Procedures
The Third Component
- The third component is also a linear combination of X1, X2, …, Xp:
  Z3 = a31X1 + a32X2 + … + a3pXp
- Var(Z3) is as large as possible subject to the conditions that
  - a31² + a32² + … + a3p² = 1
  - Z1, Z2, and Z3 are not correlated
- Var(Z3) is therefore the third largest variance
15
Procedures
- Continue until all p principal components are computed
- The computation is based on the covariance matrix C of the p variables, where cjj is the variance of Xj and cij is the covariance of Xi and Xj
18
Procedures
- Different variables may have different units and magnitudes
- PCA can be influenced by these units and magnitudes
- Remedy: standardize each variable to have zero mean and unit variance
- The covariance matrix of the standardized variables is the correlation matrix of the original variables
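The last point can be confirmed numerically. This is a small sketch on hypothetical data with deliberately different scales; the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: three variables measured on very different scales.
X = rng.normal(size=(50, 3)) * np.array([1.0, 10.0, 1000.0])
X[:, 1] += 5.0 * X[:, 0]  # make two of the variables correlated

# Standardize to zero mean and unit (sample) variance.
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

cov_of_standardized = np.cov(X_std, rowvar=False)
corr_of_original = np.corrcoef(X, rowvar=False)

print(np.allclose(cov_of_standardized, corr_of_original))
```

Note that `ddof=1` is used in the standardization so that it matches the sample covariance computed by `np.cov`.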
19
Procedures
Steps of PCA
- Standardize the variables X1, X2, …, Xp to have zero means and unit variances, unless the importance of the variables is reflected in their variances
- Calculate the covariance matrix (this is the correlation matrix if the variables have been standardized)
20
Procedures
Steps of PCA (continued)
- Find the eigenvalues λ1, λ2, …, λp and their corresponding eigenvectors a1, a2, …, ap
- The coefficients of the ith principal component Zi are the elements of ai, and λi is the variance of Zi
- Discard any components that account for only a small proportion of the variation in the data
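The steps of PCA can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the software used for the examples in these slides: the function name `pca`, its `standardize` flag, the demo data, and the 80% cutoff used for discarding components are all choices made here for illustration.

```python
import numpy as np

def pca(X, standardize=True):
    """Return eigenvalues, eigenvectors (as columns), and component scores."""
    X = np.asarray(X, dtype=float)
    # Step 1: center, and optionally scale to unit variance.
    Xc = X - X.mean(axis=0)
    if standardize:
        Xc = Xc / X.std(axis=0, ddof=1)
    # Step 2: covariance matrix (the correlation matrix if standardized).
    C = np.cov(Xc, rowvar=False)
    # Step 3: eigenvalues and eigenvectors, sorted from largest to smallest.
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Step 4: component scores Z = Xc A, one column per component.
    scores = Xc @ eigvecs
    return eigvals, eigvecs, scores

# Hypothetical data: 40 cases on 5 correlated variables.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(40, 5)) @ rng.normal(size=(5, 5))
eigvals, eigvecs, scores = pca(X_demo)

# Step 5: keep the smallest number of components reaching 80% of the variation.
cumulative = np.cumsum(eigvals) / eigvals.sum()
kept = int(np.sum(cumulative < 0.80)) + 1
print("eigenvalues:", np.round(eigvals, 3))
print("components kept:", kept)
```

The covariance matrix of the scores is diagonal with the eigenvalues on the diagonal, which is the "uncorrelated and ordered" property required of the components.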
22
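Properties
Let A be the matrix whose ith row is the eigenvector ai, μ the mean vector of X, and C the covariance matrix of X with diagonal elements cjj.
- E(Z) = Aμ
- V(Z) = ACA′ = diag{λi, i = 1, …, p}
- Cov(Zi, Xj) = aijλi
- Corr(Zi, Xj) = aij√λi / √cjj
- Corr(Zi, Xj) = aij√λi, if the correlation matrix is used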
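The correlation property for the standardized case, Corr(Zi, Xj) = aij√λi, can be verified numerically. The sketch below uses hypothetical data; the array names are illustrative. Note that NumPy stores eigenvectors as columns, so `A[j, i]` holds the coefficient aij of variable Xj in component Zi.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: 100 cases on 4 correlated variables.
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Eigen-decomposition of the correlation matrix, largest eigenvalue first.
R = np.corrcoef(X, rowvar=False)
lam, A = np.linalg.eigh(R)
lam, A = lam[::-1], A[:, ::-1]

# Component scores.
Z = X_std @ A

# Empirical correlations: rows index components Zi, columns index variables Xj.
full = np.corrcoef(Z, X_std, rowvar=False)
empirical = full[:4, 4:]

# Predicted correlations from the property: a_ij * sqrt(lambda_i).
predicted = (A * np.sqrt(lam)).T

print(np.allclose(empirical, predicted))
```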
23
Examples
Determination of the number of principal components
- Depends upon the needs of practitioners
- The proportion of the total variation explained by the selected principal components should be high, e.g., at least 80%
- If the correlation matrix is used, select the principal components with variance greater than 1, because each accounts for more variation than any single original variable (variance = 1)
- Use a scree plot
24
Examples
Evaluation of a Statistics Course
- 16 students rated 11 items (variables)
- Evaluation scale: 1 (poor or not at all) to 5 (excellent, strongly, or difficult)
- The first two principal components explain 76.0% of the total variation, and the last four principal components explain only 2.2%
26
Examples
Test scores of 10 students in 4 subjects (table with columns Student, Chinese (X1), English (X2), Math (X3), Social (X4); numerical entries not recoverable)
Source: Shen (1998)
27
Examples
Correlation Matrix (4 × 4 matrix of X1 to X4; numerical entries not recoverable)
28
Examples
Eigenvalues and Eigenvectors (table with columns Eigenvalue, Proportion, Cumulative Proportion, and eigenvector coefficients for X1 to X4; numerical entries not recoverable)
29
Examples
- Because the first two principal components account for 94.14% of the total variation, these two components are sufficient
- The first principal component can be interpreted as an index for the sum of Chinese, English, and math
- The second principal component can be interpreted as social science
30
Examples
- The above results can also be obtained by inspecting the correlation matrix
- Correlations among Chinese, English, and math exceed 0.8
- Correlations of Chinese, English, and math with social science are below 0.3
31
Examples
Correlations between the first principal component and the original variables
- Corr(Z1, X1) = a11√λ1 = 0.5897√λ1 = 0.9692
- Corr(Z1, X2) = a12√λ1 = 0.5682√λ1 = 0.9339
- Corr(Z1, X3) = a13√λ1 = 0.5657√λ1 = 0.9298
- Corr(Z1, X4) = a14√λ1 = 0.1592 (value of a14 not recoverable)
32
Examples
Correlations between the second principal component and the original variables (λ2 = 1.0638)
- Corr(Z2, X1) = a21√λ2 = 0.1254 × √1.0638 = 0.1294
- Corr(Z2, X2) = a22√λ2 = a22 × √1.0638 (values not recoverable)
- Corr(Z2, X3) = a23√λ2 = a23 × √1.0638 (values not recoverable)
- Corr(Z2, X4) = a24√λ2 = a24 × √1.0638 = 0.9856 (value of a24 not recoverable)
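The arithmetic behind these correlations can be reproduced from the figures that survive on the slides. In the sketch below, λ2 = 1.0638 and the coefficients are taken from the slides. The value of λ1 is not quoted directly; it is inferred here, as an assumption, from the statement that the first two components explain 94.14% of a total variance of 4 (four standardized variables).

```python
import math

# Quoted on the slides.
lambda2 = 1.0638
a21 = 0.1254
a11, a12, a13 = 0.5897, 0.5682, 0.5657

# Assumption: lambda1 inferred from the 94.14% cumulative proportion
# and a total variance of 4; it is not stated explicitly on the slides.
lambda1 = 0.9414 * 4 - lambda2

corr_z2_x1 = a21 * math.sqrt(lambda2)
corr_z1 = [a * math.sqrt(lambda1) for a in (a11, a12, a13)]

print(f"inferred lambda1 = {lambda1:.4f}")
print(f"Corr(Z2, X1) = {corr_z2_x1:.4f}")
print("Corr(Z1, X1..X3) =", [round(c, 4) for c in corr_z1])
```

The computed values agree with the quoted correlations to within the rounding of the coefficients.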
33
Examples
Component scores (table with columns Student, 1st Component, 2nd Component; numerical entries not recoverable)
34
Examples
Correlations of Female Sparrows (5 × 5 correlation matrix of X1 to X5; numerical entries not recoverable)
- X1: total length
- X2: alar length
- X3: length of beak and head
- X4: length of humerus
- X5: length of keel of sternum
35
Examples
Coefficients for Components (table with columns Component, Variance, X1, X2, X3, X4, X5; numerical entries not recoverable)
36
Examples
- The first principal component:
  Z1 = 0.452X1 + 0.462X2 + 0.451X3 + 0.471X4 + 0.398X5
  An index of bird size
- The second principal component:
  Z2 = -0.051X1 + 0.300X2 + 0.325X3 + 0.185X4 - 0.877X5
  An index of bird shape
37
Examples
- The value of the first principal component for the first bird (using its standardized measurements):
  Z1 = 0.452(-0.542) + 0.462(0.725) + 0.451(0.177) + 0.471(0.055) + 0.398(-0.33) = 0.064
- The value of the second principal component for the first bird:
  Z2 = -0.051(-0.542) + 0.300(0.725) + 0.325(0.177) + 0.185(0.055) + (-0.877)(-0.33) = 0.602
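The two scores are dot products of the coefficient vectors with the standardized measurements of the bird. A short sketch of the same arithmetic, using only the numbers quoted above (the variable names are illustrative):

```python
import numpy as np

# Standardized measurements of the first bird (X1..X5), as quoted above.
x = np.array([-0.542, 0.725, 0.177, 0.055, -0.33])

# Coefficients of the first two principal components, as quoted above.
a1 = np.array([0.452, 0.462, 0.451, 0.471, 0.398])
a2 = np.array([-0.051, 0.300, 0.325, 0.185, -0.877])

z1 = float(a1 @ x)
z2 = float(a2 @ x)

print(round(z1, 3), round(z2, 3))  # prints 0.064 0.602
```

For a whole data set, the same computation is a single matrix product of the standardized data matrix with the matrix of coefficient vectors.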
38
Examples
Means and standard deviations for survivors and nonsurvivors (table with columns Mean: Survivor, Nonsurvivor and Standard Deviation: Survivor, Nonsurvivor; numerical entries not recoverable)
39
Examples
Employment in European Countries (correlation matrix of the nine variables AGR, MIN, MAN, PS, CON, SER, FIN, SPC, TC; numerical entries not recoverable)
40
Examples
- 9 eigenvalues: 3.112 (34.6%), 1.809 (20.1%), 1.496 (16.6%), 1.063 (11.8%), 0.710 (7.9%), 0.311 (3.5%), 0.293 (3.3%), 0.204 (2.3%), and 0 (0.0%)
- The employment percentages for each country sum to a constant (100%)
- The columns of the correlation matrix are therefore linearly dependent
- As a result, the last eigenvalue is 0
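The zero eigenvalue is a general consequence of variables that sum to a constant, and it can be reproduced with made-up data. The sketch below uses hypothetical percentages for 30 cases across 9 categories, not the European employment data.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical compositional data: each row is rescaled to sum to 100.
raw = rng.uniform(1.0, 10.0, size=(30, 9))
pct = 100.0 * raw / raw.sum(axis=1, keepdims=True)

# Eigenvalues of the correlation matrix, largest first.
R = np.corrcoef(pct, rowvar=False)
eigvals = np.linalg.eigvalsh(R)[::-1]

print(np.round(eigvals, 3))
print("smallest eigenvalue is numerically zero:", abs(eigvals[-1]) < 1e-8)
```

Because the nine centered columns always add up to zero, one linear combination of the variables has no variance, so at most eight components carry variation.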
41
Examples
- Selecting the principal components with eigenvalues greater than 1 gives the first 4 principal components, which explain about 83% (34.6% + 20.1% + 16.6% + 11.8%) of the total variation in the data
- The first two principal components alone account for only 55% of the total variation
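The selection rules can be applied mechanically to the eigenvalues listed for this example. The sketch below uses the rounded eigenvalues as quoted; the 80% threshold is the illustrative cutoff mentioned earlier in the slides, and the variable names are chosen here.

```python
import numpy as np

# Rounded eigenvalues quoted for the European employment example.
eigvals = np.array([3.112, 1.809, 1.496, 1.063, 0.710,
                    0.311, 0.293, 0.204, 0.0])

proportion = eigvals / eigvals.sum()
cumulative = np.cumsum(proportion)

# Rule 1: keep components with eigenvalue greater than 1.
n_eigen_gt_1 = int(np.sum(eigvals > 1.0))

# Rule 2: keep the smallest number of components reaching 80%.
n_for_80 = int(np.searchsorted(cumulative, 0.80)) + 1

print("cumulative %:", np.round(100 * cumulative, 1))
print("eigenvalue > 1 rule keeps:", n_eigen_gt_1)
print("80% rule keeps:", n_for_80)
```

With these rounded values both rules keep four components, and the cumulative percentages are computed from the eigenvalues rather than taken from the slide text.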
42
Examples
- The first principal component:
  Z1 = 0.51(AGR) + 0.37(MIN) - 0.25(MAN) - 0.31(PS) - 0.22(CON) - 0.38(SER) - 0.13(FIN) - 0.42(SPS) - 0.21(TC)
- A contrast between AGR (agriculture, forestry, and fishing) and MIN (mining and quarrying) versus the other sectors
43
Examples
- The second principal component:
  Z2 = -0.02(AGR) + 0.00(MIN) + 0.43(MAN) + 0.11(PS) - 0.24(CON) - 0.41(SER) - 0.55(FIN) + 0.05(SPS) + 0.52(TC)
- A contrast between MAN (manufacturing) and TC (transport and communication) versus CON (construction), SER (service industry), and FIN (finance)
46
Summary
- Principal components are linear combinations of the original variables
- PCA tries to reduce a large number of variables to a few index variables
- The index variables are uncorrelated and ordered by the magnitude of their variation
- The method was illustrated with real examples