PLS Regression
Hervé Abdi, The University of Texas at Dallas


1 PLS Regression. Hervé Abdi, The University of Texas at Dallas. herve@utdallas.edu

2 An Example: What is Mouthfeel?
From Folkenberg D.M., Bredie W.L.P., Martens M. (1999). What is mouthfeel: Sensory-rheological relationship in instant hot cocoa drinks. Journal of Sensory Studies, 14, 181-195. (Data set courtesy of Martens, H., Martens, M. (2001). Multivariate Analysis of Quality: An Introduction. London: Wiley. Downloaded from: www.wiley.co.uk/chemometrics. Data set: Cocoa-ii.mat.)
Goal: predict the sensory attributes (mouthfeel), i.e. the dependent variables (Y set), from the physical/chemical/rheological properties, i.e. the predictors / independent variables (X set).

3 An Example: What is Mouthfeel?
6 predictors / independent variables (X set), physical/chemical/rheological properties: %COCOA, %SUGAR, %MILK, SEDIMENT, COLOUR, VISCOSITY
10 dependent variables (Y set): colour, cocoa-odour, milk-odour, thick-txtr, mouthfeel, smooth-txtr, creamy-txtr, cocoa-taste, milk-taste, sweet
14 samples (n-: without stabilizer, n+: with stabilizer): 1- 2- 3- 4- 5- 6- 7- 1+ 2+ 3+ 4+ 5+ 6+ 7+

4 The X matrix: 14 samples × 6 predictors

Sample  %COCOA  %SUGAR  %MILK  SEDIMENT  COLOUR  VISCOSITY
1-       20.00   30.00  50.00      2.60   44.89       1.86
2-       20.00   43.30  36.70      2.65   42.77       1.80
3-       20.00   50.00  30.00      2.40   41.64       1.78
4-       26.70   30.00  43.30      3.10   42.37       2.06
5-       26.60   36.70  36.70      3.55   41.04       1.97
6-       33.30   36.70  30.00      4.30   39.14       2.13
7-       40.00   30.00  30.00      4.70   38.31       2.26
1+       20.00   30.00  50.00      0.12   44.25      48.60
2+       20.00   43.30  36.70      0.09   41.98      44.10
3+       20.00   50.00  30.00      0.10   41.18      43.60
4+       26.70   30.00  43.30      0.10   41.13      47.80
5+       26.60   36.70  36.70      0.10   40.39      50.30
6+       33.30   36.70  30.00      0.10   38.85      51.40
7+       40.00   30.00  30.00      0.09   37.91      54.80

5 The Y matrix: 14 samples × 10 sensory attributes

Sample  colour  cocoa-odour  milk-odour  thick-txtr  mouthfeel  smooth-txtr  creamy-txtr  cocoa-taste  milk-taste  sweet
1-        1.67         6.06        7.37        5.94       7.80         8.59         6.51         6.24        6.89   8.48
2-        3.22         6.30        5.10        6.34       8.40         9.09         7.14         7.04        5.17   9.76
3-        4.82         7.09        4.11        6.68       8.29         8.61         6.76         7.26        4.62  10.50
4-        4.90         7.57        3.86        6.79       8.58         5.96         5.46         8.77        3.26   6.69
5-        7.03         7.96        2.99        6.92       8.71         6.42         5.59         8.93        2.76   7.05
6-       10.60        10.24        1.57        6.51       9.70         4.55         4.62        11.44        1.51   5.48
7-       11.11        11.31        1.25        7.04       9.72         3.42         4.11        12.43        0.86   3.91
1+        3.06         6.97        5.40        9.84       9.99        10.67         9.11         7.66        5.71   8.24
2+        6.02         8.61        3.75       10.01       9.92        10.86         8.64         7.66        4.86   8.71
3+        7.94         8.40        2.95        9.61       9.92        10.84         8.26         8.32        4.09   9.67
4+        9.17         9.30        2.86       10.68      11.05        10.48         8.20        10.40        2.22   6.43
5+       10.46        10.14        1.90       10.71      10.64         9.60         7.84        11.05        2.01   7.02
6+       12.40        11.30        1.18       10.64      11.09         7.24         7.23        11.78        1.65   5.59
7+       13.46        11.49        1.56       11.31      11.36         7.22         6.86        12.60        1.06   4.34

6 Why use PLS? PCA, MLR, and PLS: a short tour

7 One I by J data set: PCA, CA, biplots, etc. The beauty of Euclid…

8 An I by J data set and an I by 1 vector (with J << I): multiple regression. The beauty of Euclid…

9 An I by J data set and an I by K data set: PLS, CANDIS, etc. The beauty of Euclid…

10 Why use PLS?
1. To explain the similarity between the observations (here, cocoa samples).
2. To detect the structure in the relationships between dependent and independent variables.
3. To get a graphical representation of the data.
4. To predict the value of new observations.

11 What is PLS Regression?
PLS combines features of Principal Component Analysis (PCA) and Multiple Linear Regression (MLR).
Like PCA: PLS extracts factors from X.
Like MLR: PLS predicts Y from X.
Combining PCA & MLR: PLS extracts factors from X in order to predict Y.

12 When to use PLS?
To analyze two data tables describing the same I observations:
an I by J table of independent variables (predictors), with entries x_ij, and
an I by K table of dependent variables, with entries y_ik.

13 General principle of PLS
From the I by J table X of predictors (entries x_ij), NIPALS extracts L latent variables t_1 … t_ℓ … t_L (an I by L table of scores t_iℓ), with t_ℓ = X w_ℓ.
The latent variables are then used to predict the I by K table Y of dependent variables (entries y_ik): Ŷ_ℓ = t_ℓ c_ℓᵀ.
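
The NIPALS extraction sketched above can be written in a few lines of numpy. This is a minimal, illustrative sketch (function and variable names are mine, and real implementations add scaling options and more convergence safeguards), not the exact routine used for these slides:

```python
import numpy as np

def pls_nipals(X, Y, n_components):
    """Minimal NIPALS-style PLS sketch: extract latent variables
    t_l = X w_l, then deflate X and Y before the next component."""
    X = X - X.mean(axis=0)            # center both tables
    Y = Y - Y.mean(axis=0)
    T, W, C = [], [], []
    for _ in range(n_components):
        u = Y[:, [0]]                 # start from the first Y column
        for _ in range(500):          # inner power iterations
            w = X.T @ u
            w /= np.linalg.norm(w)    # X weights, normalized
            t = X @ w                 # latent variable (scores)
            c = Y.T @ t / (t.T @ t)   # Y weights/loadings
            u_new = Y @ c / (c.T @ c)
            if np.linalg.norm(u_new - u) < 1e-12:
                break
            u = u_new
        p = X.T @ t / (t.T @ t)       # X loadings
        X = X - t @ p.T               # deflate X
        Y = Y - t @ c.T               # deflate Y
        T.append(t); W.append(w); C.append(c)
    return np.hstack(T), np.hstack(W), np.hstack(C)
```

Because X is deflated after each component, successive score vectors t_ℓ come out mutually orthogonal.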

14 PLS: Maps of the observations
Plotting the latent variables against each other (lv 1 vs lv 2) gives a map of the I observations: each observation i is placed at its scores t_iℓ.

15 PLS: Maps of the variables
The weights w_ℓ (X variables) and c_ℓ (Y variables) give a common map of the variables on lv 1 vs lv 2; correlating the original variables with the latent variables gives the circle of correlations.

16 PLS: Predicting Y from X
From t_ℓ = X w_ℓ and Ŷ_ℓ = t_ℓ c_ℓᵀ for each latent variable, the prediction can be collected into a single regression-like equation: Ŷ = X B_pls. (Some magic here!)
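
The "magic" is plain linear algebra: if the scores can be written as T = XW for some weight matrix (with several components the deflation means W must be a corrected weight matrix, often written W* = W(PᵀW)⁻¹), then Ŷ = TCᵀ = X(WCᵀ) = X B_pls. A tiny numpy sketch with made-up data, where all names and the random stand-in tables are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(14, 6))                 # stand-in predictor table
W = rng.normal(size=(6, 2))                  # assumed weight matrix
T = X @ W                                    # latent variables: t_l = X w_l
Y = rng.normal(size=(14, 10))                # stand-in dependent table
C = np.linalg.lstsq(T, Y, rcond=None)[0].T   # Y loadings: regress Y on T
Y_hat = T @ C.T                              # prediction via the latent space
B_pls = W @ C.T                              # fold both maps into one matrix
# Y_hat equals X @ B_pls exactly, because T = X @ W
```

The point of B_pls is that, after the fit, new observations can be predicted with one matrix product, without recomputing scores.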

17 PLS: How do we explain Y from X?
Compare the data (Y) with the prediction Ŷ = X B_pls:
RESS (REsidual Sum of Squares) = Σ (data − prediction)²
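
RESS is just the squared difference between the two tables, summed over all entries. A one-line numpy sketch (the function name is mine):

```python
import numpy as np

def ress(Y, Y_hat):
    """REsidual Sum of Squares: sum of squared differences between
    the observed table Y and the fitted table Y_hat."""
    return float(np.sum((Y - Y_hat) ** 2))
```

For example, if prediction and data differ by 1.0 in a single cell, RESS is 1.0.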

18 PLS: How do we predict Y from X?
How well will we do with NEW data? Cross-validation, here the jackknife (leave-one-out):
Predict y_1 from X_(-1), the model fitted without observation 1: Ŷ_(-1) = X_(-1) B_pls
Predict y_2 from X_(-2)
… etc. …
Predict y_I from X_(-I)

19 PLS: How do we predict Y from X?
How well will we do with NEW data? Cross-validation, here the jackknife.
Compare the data (Y) with the jackknifed prediction Ŷ_jack:
PRESS (Predicted REsidual Sum of Squares) = Σ (data − jackknifed prediction)²
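
PRESS has the same form as RESS, except that each row is predicted by a model fitted without that row. A generic jackknife sketch, where `fit_predict` stands for any fitting routine (its name and signature are assumptions for illustration):

```python
import numpy as np

def press(X, Y, fit_predict):
    """Predicted REsidual Sum of Squares by jackknife (leave-one-out):
    row i is predicted from a model fitted on all rows except i.
    fit_predict(X_train, Y_train, x_new) returns the prediction for x_new."""
    total = 0.0
    for i in range(X.shape[0]):
        keep = np.arange(X.shape[0]) != i          # drop observation i
        y_jack = fit_predict(X[keep], Y[keep], X[i:i+1])
        total += np.sum((Y[i] - y_jack) ** 2)      # jackknifed residual
    return float(total)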

20 PLS Big Question: How Many Latent Variables?
Compare RESS and PRESS, or use PRESS alone. Quick and dirty: the minimum of PRESS gives the optimum number of latent variables.
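
The "quick and dirty" rule is a one-liner once PRESS has been computed for each candidate number of latent variables; the values below are the ones reported on slide 32:

```python
# PRESS per number of latent variables (from slide 32 of this deck)
press_by_lv = {1: 8505.47, 2: 8318.84, 3: 8292.23,
               4: 8286.95, 5: 8299.23, 6: 8309.38}

# Keep the number of latent variables that minimizes PRESS
best_lv = min(press_by_lv, key=press_by_lv.get)
```

Here the minimum is at 4 latent variables, matching the slide's conclusion.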

21 Back to cocoa. Goals: explain and predict the sensory attributes (Y) from the physico-chemical properties (X).

22 (The X matrix, repeated from slide 4.)

23 (The Y matrix, repeated from slide 5.)

24 Correlation within the X set

25 Correlation within the Y set

26 Correlation between X and Y
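
The between-set correlations shown on this slide are just the correlations of each X column with each Y column: z-score both tables, then take cross-products. A minimal numpy sketch (the function name is mine):

```python
import numpy as np

def cross_correlation(X, Y):
    """Correlation of every column of X with every column of Y:
    z-score both tables, then R = Zx^T Zy / (n - 1)."""
    Zx = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    Zy = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
    return Zx.T @ Zy / (X.shape[0] - 1)
```

On the cocoa data this yields the 6 by 10 correlation table between the physico-chemical predictors and the sensory attributes.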

27 The t (latent) variables: 14 samples × 4 latent variables

Sample   lv1    lv2    lv3    lv4
1-      -0.42  -0.19  -0.34  -0.35
2-      -0.25  -0.17   0.22  -0.20
3-      -0.17  -0.14   0.50  -0.22
4-      -0.13  -0.25  -0.26  -0.11
5-      -0.03  -0.27   0.02   0.33
6-       0.23  -0.36   0.10   0.30
7-       0.41  -0.42  -0.11   0.06
1+      -0.32   0.27  -0.37   0.04
2+      -0.15   0.27   0.19   0.14
3+      -0.08   0.27   0.46   0.03
4+       0.01   0.25  -0.29   0.38
5+       0.07   0.27  -0.02   0.33
6+       0.32   0.25   0.05  -0.22
7+       0.51   0.23  -0.16  -0.50

28 The weights w: 6 predictors × 4 latent variables

            lv1    lv2    lv3    lv4
%COCOA      0.61  -0.15  -0.20  -0.46
%SUGAR     -0.22   0.09   0.77   0.08
%MILK      -0.39   0.06  -0.57   0.38
SEDIMENT    0.01  -0.70  -0.00   0.41
COLOUR     -0.62   0.00  -0.15  -0.62
VISCOSITY   0.20   0.69  -0.10   0.28

29 The weights c: 10 dependent variables × 4 latent variables

             lv1    lv2    lv3    lv4
colour       0.38   0.12   0.07   0.28
cocoa-odour  0.38   0.11  -0.07   0.25
milk-odour  -0.37  -0.05  -0.30  -0.57
thick-txtr   0.15   0.55  -0.18   0.18
mouthfeel    0.27   0.41  -0.25   0.36
smooth-txtr -0.23   0.46   0.22   0.10
creamy-txtr -0.16   0.53   0.09   0.04
cocoa-taste  0.38   0.03  -0.28   0.30
milk-taste  -0.37   0.03   0.07  -0.50
sweet       -0.33   0.09   0.81  -0.16

30 B_pls: from X to Y (in Z-scores). Rows: the 6 predictors; columns: the 10 dependent variables, in the order of slide 3 (colour … sweet).

%COCOA     -0.11 -0.05  0.63 -0.21 -0.36 -0.48 -0.31 -0.09  0.45 -0.18
%SUGAR     -0.03 -0.09 -0.13 -0.03 -0.07  0.24  0.15 -0.17  0.04  0.41
%MILK       0.14  0.15 -0.50  0.24  0.43  0.25  0.16  0.26 -0.50 -0.24
SEDIMENT    0.32  0.29 -0.80 -0.19  0.19 -0.25 -0.40  0.43 -0.78 -0.33
COLOUR     -1.04 -0.97  1.70 -0.56 -1.10 -0.02  0.06 -1.07  1.54  0.68
VISCOSITY   0.52  0.50 -0.77  0.71  0.83  0.40  0.42  0.49 -0.65 -0.26

31 B*_pls: from X to Y (original units). First row: intercepts; then the 6 predictors; columns: the 10 dependent variables, in the order of slide 3.

Intercept   79.86  43.18 -52.77  29.23  32.63   6.91   4.32  52.51 -50.26 -19.07
%COCOA      -0.06  -0.01   0.15  -0.06  -0.06  -0.16  -0.06  -0.03   0.12  -0.05
%SUGAR      -0.01  -0.02  -0.03  -0.01  -0.01   0.08   0.03  -0.05   0.01   0.11
%MILK        0.07   0.04  -0.12   0.06   0.07   0.08   0.03   0.08  -0.13  -0.07
SEDIMENT     0.67   0.31  -0.82  -0.22   0.12  -0.33  -0.34   0.52  -0.84  -0.37
COLOUR      -1.85  -0.88   1.47  -0.54  -0.60  -0.02   0.04  -1.10   1.40   0.66
VISCOSITY    0.08   0.04  -0.06   0.06   0.04   0.04   0.03   0.04  -0.05  -0.02

32 RESS & PRESS by number of latent variables

LVs   RESS     PRESS
 1   182.39  8505.47
 2    50.86  8318.84
 3    30.28  8292.23
 4    15.69  8286.95  <- min PRESS
 5    13.00  8299.23
 6    11.91  8309.38

PRESS is minimal for 4: keep 4 latent variables.

33 Plot w & t (1 vs 2)

34 Plot w & c (1 vs 2)

35 Show the circle of correlation

36 Conclusion
Useful references (they contain a bibliography): Abdi (2007, 2003); see www.utd.edu/~herve

