CS623: Introduction to Computing with Neural Nets (Lecture 16)
Pushpak Bhattacharyya
Computer Science and Engineering Department
IIT Bombay
Principal Component Analysis
Example: IRIS data (only 3 of the 150 rows were shown)
Attributes: Petal Length (a1), Petal Width (a2), Sepal Length (a3), Sepal Width (a4)
Classification: Iris-setosa, Iris-versicolor, or Iris-virginica
Training and Testing Data
Training: 80% of the data; 40 from each class: total 120
Testing: the remaining 30
Do we have to consider all 4 attributes for classification? Do we need 4 neurons in the input layer?
Fewer neurons in the input layer reduce the overall size of the network and thereby the training time.
This is also likely to improve generalization (Occam's Razor: a simpler hypothesis, i.e., a smaller neural net, generalizes better).
The multivariate data: n samples (rows) on p variables (columns)

X1    X2    X3    X4    X5    ...   Xp
x11   x12   x13   x14   x15   ...   x1p
x21   x22   x23   x24   x25   ...   x2p
x31   x32   x33   x34   x35   ...   x3p
x41   x42   x43   x44   x45   ...   x4p
...
xn1   xn2   xn3   xn4   xn5   ...   xnp
Some preliminaries
Sample mean of the i-th variable: µ_i = (Σ_{j=1..n} x_ij) / n
Sample variance of the i-th variable: σ_i² = [Σ_{j=1..n} (x_ij − µ_i)²] / (n − 1)
Sample covariance of variables a and b: c_ab = [Σ_{j=1..n} (x_aj − µ_a)(x_bj − µ_b)] / (n − 1)
Covariance measures how strongly two variables vary together; normalizing it gives the correlation coefficient r_ab = c_ab / (σ_a σ_b)
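A minimal NumPy sketch of these estimators (the function name and data layout are illustrative, not from the lecture):

```python
import numpy as np

def sample_stats(X):
    """Mean, variance, covariance, and correlation for an (n, p) data matrix."""
    n = X.shape[0]
    mu = X.mean(axis=0)                          # sample mean of each variable
    var = ((X - mu) ** 2).sum(axis=0) / (n - 1)  # unbiased sample variance
    C = (X - mu).T @ (X - mu) / (n - 1)          # sample covariance matrix, C[a, b] = c_ab
    sd = np.sqrt(var)
    R = C / np.outer(sd, sd)                     # correlation: r_ab = c_ab / (sd_a * sd_b)
    return mu, var, C, R
```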
Standardize the variables
For each variable, replace the values by y_ij = (x_ij − µ_i) / σ_i, so that every variable has mean 0 and variance 1
The covariance matrix of the standardized data is then the correlation matrix R
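As a quick sanity check of this step (a sketch; the data here is a random placeholder shaped like the 49 × 5 bird example below, not real measurements), standardizing and then taking the covariance reproduces NumPy's own correlation matrix:

```python
import numpy as np

X = np.random.default_rng(0).normal(size=(49, 5))    # placeholder data

Y = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)     # y_ij = (x_ij - mu_i) / sigma_i
R = np.cov(Y, rowvar=False)                          # covariance of standardized data...
assert np.allclose(R, np.corrcoef(X, rowvar=False))  # ...equals the correlation matrix
```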
Short digression: Eigenvalues and Eigenvectors
AX = λX, written out:
a_11 x_1 + a_12 x_2 + a_13 x_3 + ... + a_1p x_p = λ x_1
a_21 x_1 + a_22 x_2 + a_23 x_3 + ... + a_2p x_p = λ x_2
...
a_p1 x_1 + a_p2 x_2 + a_p3 x_3 + ... + a_pp x_p = λ x_p
Here the λs are the eigenvalues, and the solution X for each λ is the corresponding eigenvector
Short digression: To find the eigenvalues and eigenvectors
Solve the characteristic equation det(A − λI) = 0, where λI is the diagonal matrix with λ on the diagonal
Example: A = [ −9  4 ; 7  −6 ]
Characteristic equation: (−9 − λ)(−6 − λ) − 28 = 0, i.e., λ² + 15λ + 26 = 0
Real eigenvalues: −13, −2
Eigenvector for eigenvalue −13: (−1, 1)
Eigenvector for eigenvalue −2: (4, 7)
Verify: A(−1, 1)ᵀ = (13, −13)ᵀ = −13·(−1, 1)ᵀ and A(4, 7)ᵀ = (−8, −14)ᵀ = −2·(4, 7)ᵀ
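The worked example can be checked mechanically with NumPy (a sketch; note that eig returns unit-length eigenvectors, i.e., scaled versions of (−1, 1) and (4, 7)):

```python
import numpy as np

A = np.array([[-9.0, 4.0],
              [7.0, -6.0]])

vals, vecs = np.linalg.eig(A)            # eigenvalues -13 and -2 (order may vary)

for lam, v in zip(vals, vecs.T):         # columns of vecs are the eigenvectors
    assert np.allclose(A @ v, lam * v)   # verify A v = lambda v
```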
Next step in finding the PCs Find the eigenvalues and eigenvectors of R
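A sketch of this step, again on placeholder data (with the real correlation matrix R of the bird data below, the first entry of `explained` would come out near 0.72, as the next slides show):

```python
import numpy as np

X = np.random.default_rng(0).normal(size=(49, 5))  # placeholder for the 49 x 5 measurements
R = np.corrcoef(X, rowvar=False)                   # p x p correlation matrix

eigvals, eigvecs = np.linalg.eigh(R)               # eigh, since R is symmetric
order = np.argsort(eigvals)[::-1]                  # sort by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()                # eigvals.sum() == trace(R) == p
```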
Example (from Multivariate Statistical Methods: A Primer by Bryan F. J. Manly, 3rd edition, 2004)
49 birds: 21 survived a storm and 28 died; 5 body characteristics are given:
X1: body length; X2: alar extent; X3: beak and head length; X4: humerus length; X5: keel length
Could we have predicted the fate of a bird from its body characteristics?
Eigenvalues and Eigenvectors of R

Component   Eigenvalue   Eigenvector
1           ≈ 3.6        V1 ≈ (0.45, 0.46, 0.45, 0.47, 0.39)
2           ≈ 0.53       V2
3           ≈ 0.39       V3
4           ≈ 0.30       V4
5           ≈ 0.17       V5
Which principal components are important?
Total variance in the data = λ1 + λ2 + λ3 + λ4 + λ5 = sum of the diagonal entries of R = 5 (each standardized variable has variance 1)
First eigenvalue ≈ 3.6, i.e., ≈ 72% of the total variance of 5
Second ≈ 10.6%, Third ≈ 7.7%, Fourth ≈ 6.0%, and Fifth ≈ 3.3%
The first PC is the most important, and is sufficient for studying the classification
Forming the PCs
Z1 = 0.451X1 + 0.46X2 + 0.45X3 + 0.47X4 + 0.39X5 (coefficients = components of V1; the X_i are used in standardized form, as shown on the next slide)
Z2 is formed the same way from the second eigenvector V2
For all 49 birds, find the first two principal components
This becomes the new data; classify using it (see the sketch below)
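Projecting the standardized data onto the first two eigenvectors gives the reduced data set; a sketch under the same placeholder-data assumption as above:

```python
import numpy as np

X = np.random.default_rng(0).normal(size=(49, 5))   # placeholder for the 49 x 5 bird data

Y = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)    # standardize each variable

eigvals, V = np.linalg.eigh(np.corrcoef(X, rowvar=False))
V = V[:, np.argsort(eigvals)[::-1]]                 # columns ordered by decreasing eigenvalue

Z = Y @ V[:, :2]                                    # (49, 2): Z1, Z2 scores for every bird
```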
For the first bird
X1 = 156, X2 = 245, X3 = 31.6, X4 = 18.5, X5 = 20.5
After standardizing (subtracting each variable's mean and dividing by its standard deviation):
Y1 = (156 − µ1)/3.65 = −0.54, Y2 = (245 − µ2)/5.1 = 0.73, Y3 = (31.6 − µ3)/0.8 = 0.17, Y4 = (18.5 − µ4)/0.56 = 0.05, Y5 = (20.5 − µ5)/0.99 = −0.33
PC1 for the first bird: Z1 = 0.45×(−0.54) + 0.46×0.725 + 0.45×0.17 + 0.47×0.05 + 0.39×(−0.33) = 0.064
Similarly, Z2 = 0.602
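The Z1 arithmetic can be replayed directly with the rounded numbers from the slide (the small discrepancy with 0.064 comes from rounding the inputs to two decimals):

```python
y  = [-0.54, 0.725, 0.17, 0.05, -0.33]   # standardized values of the first bird
v1 = [0.45, 0.46, 0.45, 0.47, 0.39]      # first eigenvector, rounded to two decimals

z1 = sum(a * b for a, b in zip(v1, y))   # dot product V1 . Y
print(round(z1, 3))                      # 0.062, matching the slide's 0.064 up to rounding
```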
Reduced Classification Data
Instead of the original 49 × 5 data (X1, X2, X3, X4, X5), use the 49 × 2 data (Z1, Z2)
Other Multivariate Data Analysis Procedures
Factor Analysis
Discriminant Analysis
Cluster Analysis
To be done gradually