Multivariate Data Analysis Principal Component Analysis.

Presentation on theme: "Multivariate Data Analysis Principal Component Analysis."— Presentation transcript:

1 Multivariate Data Analysis Principal Component Analysis

2 Principal Component Analysis (PCA): Singular Value Decomposition; eigenvector / eigenvalue calculation

3 Data matrix X (I x K), I objects by K variables. Goals: reduce variables, improve projections, remove noise, find outliers, find classes

4 PCA example with 2 variables, 6 objects: find the best (most informative) direction in space, describe the direction, make the projection

5 [Figure: axes x1 and x2]

6 [Figure: axes x1 and x2]

7 1st PC

8 Score Residual

9 1st PC: loadings p1 and p2 are the components of a unit vector along the PC

10 1st PC: loading p1 = cos(θ), loading p2 = sin(θ), with θ the angle between the 1st PC and the x1 axis; (p1, p2) is a unit vector
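The projection on slides 7 to 10 can be sketched in a few lines of NumPy. This is only an illustration: the six data points are hypothetical (the slides do not list them), and the first PC is obtained here from an SVD of the mean-centered data.

```python
import numpy as np

# Hypothetical data: 6 objects, 2 variables (not taken from the slides).
X = np.array([[1.0, 0.8], [2.0, 2.1], [3.0, 2.9],
              [4.0, 4.2], [5.0, 4.8], [6.0, 6.1]])
Xc = X - X.mean(axis=0)            # mean-center each variable

# First PC direction = first right singular vector of the centered data.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
p = Vt[0]                          # loading vector (p1, p2), unit length
theta = np.arctan2(p[1], p[0])     # angle between the PC and the x1 axis

t = Xc @ p                         # scores: positions of the objects along the PC
E = Xc - np.outer(t, p)            # residuals: the part the PC does not describe

print("loadings:", p)
print("cos(theta), sin(theta):", np.cos(theta), np.sin(theta))
print("scores:", t)
```

The sign of a loading vector is arbitrary, so an implementation may return (-p1, -p2) with all scores negated; the model t p' is unchanged.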

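11 [Diagram: data matrix X (I x K), score vector t, loading vector p; index i marked]

12 [Diagram: data matrix X (I x K), score vector t, loading vector p; index k marked]

13 [Diagram: data matrix X (I x K), score vector t, loading vector p]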

14 X = t_1 p_1' + t_2 p_2' + ... + t_A p_A' + E, or in matrix form X = TP' + E, where X: properly preprocessed data (I x K); T: score matrix (I x A); P: loading matrix (K x A); E: residual matrix (I x K); t_a: score vector; p_a: loading vector
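A minimal sketch of the decomposition X = TP' + E, assuming the components are taken from a truncated SVD. The function name pca_model and the random test matrix are illustrative choices, not part of the slides.

```python
import numpy as np

def pca_model(X, A):
    """Return scores T (I x A), loadings P (K x A) and residuals E (I x K)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    T = U[:, :A] * s[:A]          # score vectors t_1 ... t_A as columns
    P = Vt[:A].T                  # loading vectors p_1 ... p_A as columns
    E = X - T @ P.T               # what the A components leave unexplained
    return T, P, E

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))       # hypothetical data, I = 8, K = 4
X = X - X.mean(axis=0)            # "properly preprocessed": mean-centered here

T, P, E = pca_model(X, A=2)
# The matrix product TP' is the sum of the rank-one terms t_a p_a'.
rank_one_sum = sum(np.outer(T[:, a], P[:, a]) for a in range(2))
print(T.shape, P.shape, E.shape)
```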

15 The Wine Example: People magazine; Wise & Gallagher

16 Wine data (10 countries x 5 variables)
          Wine    Beer    Spirit  LifeEx  HeartD
France    63.5    40.1    2.5     78.0    61.1
Italy     58.0    25.1    0.9     78.0    94.1
Switz     46.0    65.0    1.7     78.0    106.4
Austra    15.7    102.1   1.2     78.0    173.0
Brit      12.2    100.0   1.5     77.0    199.7
U.S.A.    8.9     87.8    2.0     76.0    176.0
Russia    2.7     17.1    3.8     69.0    373.6
Czech     1.7     140.0   1.0     73.0    283.7
Japan     1.0     55.0    2.1     79.0    34.7
Mexico    0.2     50.4    0.8     73.0    36.4

17 Mean and standard deviation per variable
                    Wine     Beer     Spirit  LifeEx   HeartD
Mean                20.9900  68.2600  1.7500  75.9000  153.8700
Standard Deviation  24.9270  38.6718  0.9132  3.2128   110.8182
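The means and standard deviations on slide 17 can be reproduced from the data on slide 16. The sketch below assumes the column order Wine, Beer, Spirit, LifeEx, HeartD from slide 16 and the sample standard deviation (divisor I - 1); the autoscaling step at the end is the usual "scale, mean-center" preprocessing.

```python
import numpy as np

# Rows: France, Italy, Switz, Austra, Brit, U.S.A., Russia, Czech, Japan, Mexico
# Columns: Wine, Beer, Spirit, LifeEx, HeartD (values as listed on slide 16)
wine = np.array([
    [63.5,  40.1, 2.5, 78.0,  61.1],
    [58.0,  25.1, 0.9, 78.0,  94.1],
    [46.0,  65.0, 1.7, 78.0, 106.4],
    [15.7, 102.1, 1.2, 78.0, 173.0],
    [12.2, 100.0, 1.5, 77.0, 199.7],
    [ 8.9,  87.8, 2.0, 76.0, 176.0],
    [ 2.7,  17.1, 3.8, 69.0, 373.6],
    [ 1.7, 140.0, 1.0, 73.0, 283.7],
    [ 1.0,  55.0, 2.1, 79.0,  34.7],
    [ 0.2,  50.4, 0.8, 73.0,  36.4],
])

mean = wine.mean(axis=0)
std = wine.std(axis=0, ddof=1)     # sample standard deviation (divide by I - 1)
Z = (wine - mean) / std            # autoscaled data: zero mean, unit variance

print("mean:", np.round(mean, 4))
print("std: ", np.round(std, 4))
```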

18 [Scree plot: singular value vs. component number] Variation explained by components 1 to 5: 46%, 32%, 12%, 8%, 2%

19 [Score plot: score 1 (46%) vs. score 2 (32%); points labeled France, Italy, Switz, Austral, Brit, USA, Russia, Czech, Japan, Mex]

20 [Loading plot: loading 1 vs. loading 2; points labeled Wine, Beer, Spirit, Life exp., Heart dis.]

21 Conclusions: scores = positions of objects in multivariate space; loadings = importance of original variables for the new directions; try to explain a large enough portion of X (here 46 + 32 = 78%)
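A sketch of how the scores, loadings and percentages for the wine example could be computed. It assumes the data are autoscaled before the SVD, which the slides suggest but do not state explicitly, so the printed percentages should be compared with the slides rather than taken as given.

```python
import numpy as np

# Wine data as listed on slide 16 (columns: Wine, Beer, Spirit, LifeEx, HeartD).
wine = np.array([
    [63.5,  40.1, 2.5, 78.0,  61.1],
    [58.0,  25.1, 0.9, 78.0,  94.1],
    [46.0,  65.0, 1.7, 78.0, 106.4],
    [15.7, 102.1, 1.2, 78.0, 173.0],
    [12.2, 100.0, 1.5, 77.0, 199.7],
    [ 8.9,  87.8, 2.0, 76.0, 176.0],
    [ 2.7,  17.1, 3.8, 69.0, 373.6],
    [ 1.7, 140.0, 1.0, 73.0, 283.7],
    [ 1.0,  55.0, 2.1, 79.0,  34.7],
    [ 0.2,  50.4, 0.8, 73.0,  36.4],
])

# Assumed preprocessing: autoscaling (mean-center, divide by sample std).
Z = (wine - wine.mean(axis=0)) / wine.std(axis=0, ddof=1)

U, s, Vt = np.linalg.svd(Z, full_matrices=False)
scores = U * s                     # one row per country, one column per PC
loadings = Vt.T                    # one row per variable, one column per PC
pct = 100 * s**2 / np.sum(s**2)    # share of the total sum of squares per PC

for a, v in enumerate(pct, start=1):
    print(f"PC{a}: {v:.1f}%")
print(f"first two PCs together: {pct[:2].sum():.1f}%")
```

Plotting scores[:, 0] against scores[:, 1] gives a score plot, and loadings[:, 0] against loadings[:, 1] a loading plot.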

22 The Apricot Example Manley & Geladi

23 [Plot: apricot ("Appelkoos") spectra; x-axis: wavelength, nm; y-axis: pseudoabsorbance]

24 [Scree plot: singular value vs. component number]

25 What is rank? Mathematical rank: at most min(I, K); a model with that many components gives zero residual. Effective rank = A: separates model from noise.

26 ANOVA
Comp#   SS        SS%     SS%cum
1       68.8269   98.10   98.10
2       1.2843    1.83    99.93
3       0.0463    0.07    100
4       0.0045    0.01
5       0.0007    0.00
6       0.0003
7       0.0002
8       0.0001
9       0.0000
10
Total   70.1634   100
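The percentage columns of the ANOVA table follow directly from the sums of squares. A short check, using only the SS values and the total that survive in the transcript (nine component values; the tenth is not preserved and is left out here):

```python
import numpy as np

# Sums of squares per component as listed on slide 26.
ss = np.array([68.8269, 1.2843, 0.0463, 0.0045, 0.0007,
               0.0003, 0.0002, 0.0001, 0.0000])
ss_total = 70.1634                 # "Total" row of the slide

pct = 100 * ss / ss_total          # SS%: share of the total per component
cum = np.cumsum(pct)               # SS%cum: running total

for a, (p, c) in enumerate(zip(pct, cum), start=1):
    print(f"{a:2d}  {ss[a - 1]:8.4f}  {p:6.2f}  {c:6.2f}")
```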

27 [Score plot: score 1 (98%) vs. score 2 (2%)]

28 ANOVA: SS_tot = SS_1 + SS_2 + SS_3 + ... + SS_(I or K); SS_tot = λ_1 + λ_2 + λ_3 + ... + λ_(I or K), ordered from largest to smallest

29 ANOVA: X = TP' + E (data = model + residual); SS_tot = SS_mod + SS_res; R^2 = SS_mod / SS_tot = 1 - SS_res / SS_tot, the coefficient of determination (often given in %)
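The two expressions for R^2 can be checked numerically. The sketch below uses a hypothetical random matrix and a truncated SVD as the model; the function name r_squared is illustrative.

```python
import numpy as np

def r_squared(X, A):
    """Return R^2 of an A-component model, computed both ways."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    model = (U[:, :A] * s[:A]) @ Vt[:A]   # TP'
    E = X - model                         # residual
    ss_tot = np.sum(X**2)
    ss_mod = np.sum(model**2)
    ss_res = np.sum(E**2)
    return ss_mod / ss_tot, 1.0 - ss_res / ss_tot

rng = np.random.default_rng(1)
X = rng.normal(size=(12, 5))              # hypothetical data
X = X - X.mean(axis=0)                    # mean-centered

for A in range(1, 6):
    r2_mod, r2_res = r_squared(X, A)
    print(f"A={A}: {100 * r2_mod:.2f}%  {100 * r2_res:.2f}%")
```

The two numbers agree because the model and the residual are orthogonal, which is what makes SS_tot = SS_mod + SS_res hold.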

30 Examples. Wines, 2 comp.: R^2 = SS_mod = 78%, SS_res = 22%. Apricots 1, 2 comp.: R^2 = SS_mod = 99.93%, SS_res = 0.07%. Apricots 2, 3 comp.: R^2 = SS_mod = 100%, SS_res = ±0.0%.

31 [Plot: spectra with outliers removed; x-axis: wavelength, nm; y-axis: absorbance]

32 [Plot: singular values vs. component, no outliers] Variation explained by components 1 to 3: 81%, 16%, 3%

33 [Score plot: score 2 (16%) vs. score 3 (3%); groups: whole fruit, no kernel, thin slice]

34 [Plot: loadings 2 and 3 vs. wavelength, nm]

35 [Plot: loading 2 vs. loading 3]

36 More nomenclature: score = latent variable; loading vector = eigenvector; effective rank = pseudorank = model dimensionality = number of components; SS_a = eigenvalue; singular value = SS_a^(1/2)
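These equivalences can be verified numerically. The sketch below compares an SVD of a hypothetical centered matrix X with an eigendecomposition of X'X; taking the eigenvalues from X'X (rather than from a covariance matrix with a 1/(I-1) factor) is an assumption that matches SS_a = eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(10, 4))       # hypothetical data
X -= X.mean(axis=0)                # mean-center

# Route 1: singular value decomposition of X.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
T = U * s                          # scores
ss_a = np.sum(T**2, axis=0)        # sum of squares of each score vector

# Route 2: eigenvectors / eigenvalues of X'X, sorted from largest to smallest.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print("SS_a:           ", ss_a)
print("eigenvalues:    ", eigvals)
print("singular values:", s, "= sqrt(eigenvalues):", np.sqrt(eigvals))
```

Loading vectors and eigenvectors agree only up to sign, so compare absolute values or align the signs first.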

37 An analysis sequence 1. Scale, mean-center data 2. Calculate a few components 3. Check scores, loadings 4. Find outliers, groupings, explain 5. Remove outliers

38 An analysis sequence (continued) 6. Scale, mean-center data 7. Calculate enough components 8. Try to determine pseudorank 9. Check score plots 10. Check loading plots 11. Check residuals

39 [Plot: residual standard deviation vs. number of components (0 to 4), Wines]

40 [Plot: residual standard deviation vs. number of components (0 to 4), Wines]
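A residual curve of this kind can be sketched as follows. The slides do not say how the residual standard deviation is normalized, so this example uses the plain root-mean-square of the residual matrix (no degrees-of-freedom correction) as a stand-in, on hypothetical autoscaled data with 5 variables.

```python
import numpy as np

def residual_rms(X, A):
    """Root-mean-square of E = X - TP' for a model with A components."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    E = X - (U[:, :A] * s[:A]) @ Vt[:A]   # A = 0 leaves E = X
    return np.sqrt(np.mean(E**2))

rng = np.random.default_rng(3)
X = rng.normal(size=(10, 5))                       # hypothetical data
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # scale, mean-center

curve = [residual_rms(X, A) for A in range(0, 5)]  # 0 to 4 components
for A, r in enumerate(curve):
    print(f"{A} components: residual RMS = {r:.4f}")
```

A clear levelling-off in such a curve is one common hint at the pseudorank; with as many components as variables the residual drops to zero.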

