1
PLS Regression. Hervé Abdi, The University of Texas at Dallas, herve@utdallas.edu
2
An Example: What is Mouthfeel? From Folkenberg, D.M., Bredie, W.L.P., & Martens, M. (1999). What is mouthfeel: Sensory-rheological relationship in instant hot cocoa drinks. Journal of Sensory Studies, 14, 181-195. (Data set courtesy of Martens, H., & Martens, M. (2001). Multivariate Analysis of Quality: An Introduction. London: Wiley. Downloaded from www.wiley.co.uk/chemometrics; data set: Cocoa-ii.mat.) Goal: predict the sensory attributes (mouthfeel), the dependent variables (Y set), from the physical/chemical/rheological properties, the predictors / independent variables (X set).
3
An Example: What is Mouthfeel?
6 predictors / independent variables (X set), physical/chemical/rheological properties: %COCOA, %SUGAR, %MILK, SEDIMENT, COLOUR, VISCOSITY.
10 dependent variables (Y set): colour, cocoa-odour, milk-odour, thick-txtr, mouthfeel, smooth-txtr, creamy-txtr, cocoa-taste, milk-taste, sweet.
14 samples (n-: without stabilizer, n+: with stabilizer): 1- 2- 3- 4- 5- 6- 7- 1+ 2+ 3+ 4+ 5+ 6+ 7+.
4
X (14 samples × 6 predictors):

Sample  %COCOA  %SUGAR  %MILK  SEDIMENT  COLOUR  VISCOSITY
1-       20.00   30.00  50.00      2.60   44.89       1.86
2-       20.00   43.30  36.70      2.65   42.77       1.80
3-       20.00   50.00  30.00      2.40   41.64       1.78
4-       26.70   30.00  43.30      3.10   42.37       2.06
5-       26.60   36.70  36.70      3.55   41.04       1.97
6-       33.30   36.70  30.00      4.30   39.14       2.13
7-       40.00   30.00  30.00      4.70   38.31       2.26
1+       20.00   30.00  50.00      0.12   44.25      48.60
2+       20.00   43.30  36.70      0.09   41.98      44.10
3+       20.00   50.00  30.00      0.10   41.18      43.60
4+       26.70   30.00  43.30      0.10   41.13      47.80
5+       26.60   36.70  36.70      0.10   40.39      50.30
6+       33.30   36.70  30.00      0.10   38.85      51.40
7+       40.00   30.00  30.00      0.09   37.91      54.80
5
Y (14 samples × 10 sensory variables; column names abbreviate the attributes listed on slide 3):

Sample  colour  cocoa-od  milk-od  thick  mouthf  smooth  creamy  cocoa-ta  milk-ta  sweet
1-        1.67      6.06     7.37   5.94    7.80    8.59    6.51      6.24     6.89   8.48
2-        3.22      6.30     5.10   6.34    8.40    9.09    7.14      7.04     5.17   9.76
3-        4.82      7.09     4.11   6.68    8.29    8.61    6.76      7.26     4.62  10.50
4-        4.90      7.57     3.86   6.79    8.58    5.96    5.46      8.77     3.26   6.69
5-        7.03      7.96     2.99   6.92    8.71    6.42    5.59      8.93     2.76   7.05
6-       10.60     10.24     1.57   6.51    9.70    4.55    4.62     11.44     1.51   5.48
7-       11.11     11.31     1.25   7.04    9.72    3.42    4.11     12.43     0.86   3.91
1+        3.06      6.97     5.40   9.84    9.99   10.67    9.11      7.66     5.71   8.24
2+        6.02      8.61     3.75  10.01    9.92   10.86    8.64      7.66     4.86   8.71
3+        7.94      8.40     2.95   9.61    9.92   10.84    8.26      8.32     4.09   9.67
4+        9.17      9.30     2.86  10.68   11.05   10.48    8.20     10.40     2.22   6.43
5+       10.46     10.14     1.90  10.71   10.64    9.60    7.84     11.05     2.01   7.02
6+       12.40     11.30     1.18  10.64   11.09    7.24    7.23     11.78     1.65   5.59
7+       13.46     11.49     1.56  11.31   11.36    7.22    6.86     12.60     1.06   4.34
6
Why use PLS, and how does it relate to PCA and MLR? A short tour.
7
I by J data sets: PCA, CA, biplots, etc. The beauty of Euclid…
8
I by J and I by 1 data sets (with J << I): multiple regression. The beauty of Euclid…
9
I by J and I by K data sets: PLS, CANDIS, etc. The beauty of Euclid…
10
Why use PLS?
1. To explain the similarity between the observations (here, cocoa samples).
2. To detect the structure in the relationships between dependent and independent variables.
3. To get a graphical representation of the data.
4. To predict the value of new observations.
11
What is PLS Regression? PLS combines features of Principal Component Analysis (PCA) and Multiple Linear Regression (MLR). Like PCA, PLS extracts factors from X. Like MLR, PLS predicts Y from X. Combining the two, PLS extracts factors from X in order to predict Y.
12
When to use PLS? To analyze two data tables describing the same I observations: an I by J table of independent variables (predictors) X, with entries x_ij, and an I by K table of dependent variables Y, with entries y_ik.
13
General principle of PLS. From the I by J table of predictors X (entries x_ij), the NIPALS algorithm extracts latent variables t_1, …, t_ℓ, …, t_L (an I by L table with entries t_iℓ), each a linear combination of the predictors: t_ℓ = Xw_ℓ. The latent variables are then used to predict the I by K table of dependent variables (entries y_ik): Ŷ_ℓ = t_ℓ c_ℓᵀ.
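The extraction step t_ℓ = Xw_ℓ can be sketched in NumPy. This is a minimal one-round NIPALS-style iteration on synthetic data, not the full algorithm of the slides; the function name `nipals_round` and the random tables are illustrative stand-ins.

```python
import numpy as np

def nipals_round(X, Y, n_iter=100, tol=1e-10):
    """One NIPALS round: alternate between the X and Y blocks until the
    weights stabilize, yielding w, the latent variable t = X w, and c."""
    u = Y[:, [0]].copy()              # start from the first Y column
    for _ in range(n_iter):
        w = X.T @ u
        w /= np.linalg.norm(w)        # X weights, normalized
        t = X @ w                     # latent variable: t = X w
        c = Y.T @ t
        c /= np.linalg.norm(c)        # Y weights, normalized
        u_new = Y @ c
        if np.linalg.norm(u_new - u) < tol:
            u = u_new
            break
        u = u_new
    return w, t, c

# Synthetic stand-ins for the 14 x 6 and 14 x 10 cocoa tables.
rng = np.random.default_rng(0)
X = rng.standard_normal((14, 6))
Y = X @ rng.standard_normal((6, 10)) + 0.1 * rng.standard_normal((14, 10))
X = X - X.mean(0)                     # PLS works on centered
Y = Y - Y.mean(0)                     # (often standardized) tables
w, t, c = nipals_round(X, Y)
```

Further latent variables come from repeating the round after deflating X and Y, i.e. subtracting the part already explained by t.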
14
PLS: Maps of the observations. The latent variables (the columns t_ℓ of the latent-variable table, with t_ℓ = Xw_ℓ from NIPALS) serve as coordinates: plotting the I observations on lv 1 versus lv 2 gives a map of the observations (here the 14 cocoa samples).
15
PLS: Maps of the variables. The same latent variables give maps of the variables: a circle of correlations showing the x and y variables on lv 1 versus lv 2, and a common map of the weights w_ℓ and c_ℓ.
16
PLS: Predicting Y from X. Combining t_ℓ = Xw_ℓ and Ŷ_ℓ = t_ℓ c_ℓᵀ over the latent variables gives the prediction directly from X: Ŷ = XB_PLS. (Some magic here!)
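The "magic" can be made concrete: after L NIPALS rounds with deflation, the weights W, X loadings P, and Y weights C combine into one matrix B_PLS = W(PᵀW)⁻¹Cᵀ with Ŷ = XB_PLS. A minimal NumPy sketch on synthetic data (function and variable names are illustrative, not the slides' code):

```python
import numpy as np

def pls_fit(X, Y, L):
    """NIPALS PLS with deflation; returns B with Yhat = X @ B."""
    X, Y = X.copy(), Y.copy()
    Ws, Ps, Cs = [], [], []
    for _ in range(L):
        u = Y[:, [0]].copy()
        for _ in range(200):              # power-iteration inner loop
            w = X.T @ u
            w /= np.linalg.norm(w)
            t = X @ w                     # t_l = X_l w_l
            c = Y.T @ t / (t.T @ t)
            u = Y @ c / (c.T @ c)
        p = X.T @ t / (t.T @ t)           # X loadings
        X = X - t @ p.T                   # deflate X ...
        Y = Y - t @ c.T                   # ... and Y
        Ws.append(w); Ps.append(p); Cs.append(c)
    W, P, C = np.hstack(Ws), np.hstack(Ps), np.hstack(Cs)
    return W @ np.linalg.inv(P.T @ W) @ C.T   # B_pls

rng = np.random.default_rng(1)
X = rng.standard_normal((14, 6))
Y = X @ rng.standard_normal((6, 10))      # noiseless linear relation
X = X - X.mean(0)
Y = Y - Y.mean(0)
B = pls_fit(X, Y, L=6)                    # with all 6 LVs the fit is exact
```

With fewer latent variables than the rank of X, B_PLS gives a more stable (but biased) predictor; with all of them it reproduces the least-squares fit.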
17
PLS: How do we explain Y from X? Compare the data Y (the I by K table) with the prediction Ŷ = XB_PLS using RESS (the REsidual Sum of Squares): RESS = Σ (data − prediction)².
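RESS is one number for the whole table: the squared differences summed over all I × K cells. A tiny illustration, with a fabricated Ŷ standing in for XB_PLS:

```python
import numpy as np

rng = np.random.default_rng(2)
Y = rng.standard_normal((14, 10))   # "data": stand-in for the sensory table
Yhat = Y + 0.1                      # stand-in for the prediction X @ B_pls
ress = np.sum((Y - Yhat) ** 2)      # RESS over all 14 x 10 = 140 cells
# here every residual is 0.1, so ress ≈ 140 * 0.1**2 = 1.4
```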
18
PLS: How do we predict Y from X? How well will we do with NEW data? Cross-validation — here, the jackknife. Drop observation 1, fit Ŷ_(−1) = X_(−1)B_PLS on the remaining data, and use it to predict y_1; then predict y_2 from X_(−2); … ; predict y_I from X_(−I).
19
PLS: How do we predict Y from X? Compare the data Y with the jackknifed prediction Ŷ_jack using PRESS (the Predicted REsidual Sum of Squares): PRESS = Σ (data − jackknifed prediction)².
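The jackknife loop is easy to write generically. The sketch below computes PRESS for any fitting procedure; to stay self-contained it plugs in ordinary least squares as a stand-in for PLS, so the data and numbers are only illustrative:

```python
import numpy as np

def press(X, Y, fit, predict):
    """Jackknife PRESS: leave each observation out, refit, predict it,
    and accumulate the squared prediction errors."""
    total = 0.0
    for i in range(X.shape[0]):
        keep = np.arange(X.shape[0]) != i
        model = fit(X[keep], Y[keep])                 # fit without row i
        total += np.sum((Y[i] - predict(model, X[i])) ** 2)
    return total

def ols_fit(X, Y):                                    # stand-in for PLS
    mx, my = X.mean(0), Y.mean(0)
    B, *_ = np.linalg.lstsq(X - mx, Y - my, rcond=None)
    return mx, my, B

def ols_predict(model, x):
    mx, my, B = model
    return my + (x - mx) @ B

rng = np.random.default_rng(3)
X = rng.standard_normal((14, 6))
Y = X @ rng.standard_normal((6, 10)) + 0.05 * rng.standard_normal((14, 10))
p = press(X, Y, ols_fit, ols_predict)
```

For a least-squares fit, PRESS is always at least as large as RESS: predicting an observation the model has never seen is harder than refitting it.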
20
PLS Big Question: how many latent variables? Compare RESS and PRESS, or use PRESS alone. Quick and dirty: the number of latent variables that minimizes PRESS is the optimum.
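With the deck's own PRESS values (the RESS & PRESS table on slide 32), the quick-and-dirty rule is a one-liner:

```python
# PRESS by number of latent variables, from the RESS & PRESS table (slide 32)
press_by_L = {1: 8505.47, 2: 8318.84, 3: 8292.23,
              4: 8286.95, 5: 8299.23, 6: 8309.38}
best_L = min(press_by_L, key=press_by_L.get)   # → 4: keep 4 latent variables
```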
21
Back to cocoa Goals: Explain and Predict Sensory (Y) from Physico-Chemical (X)
22
X: the same data table as on slide 4.
23
Y: the same data table as on slide 5.
24
Correlation within the X set
25
Correlation within the Y set
26
Correlation between X and Y
27
Show the t (latent) variables (14 samples × 4 latent variables):

Sample   t 1    t 2    t 3    t 4
1-     -0.42  -0.19  -0.34  -0.35
2-     -0.25  -0.17   0.22  -0.20
3-     -0.17  -0.14   0.50  -0.22
4-     -0.13  -0.25  -0.26  -0.11
5-     -0.03  -0.27   0.02   0.33
6-      0.23  -0.36   0.10   0.30
7-      0.41  -0.42  -0.11   0.06
1+     -0.32   0.27  -0.37   0.04
2+     -0.15   0.27   0.19   0.14
3+     -0.08   0.27   0.46   0.03
4+      0.01   0.25  -0.29   0.38
5+      0.07   0.27  -0.02   0.33
6+      0.32   0.25   0.05  -0.22
7+      0.51   0.23  -0.16  -0.50
28
Show w — the X weights (6 predictors × 4 latent variables; rows in the order of slide 3):

            w 1    w 2    w 3    w 4
%COCOA     0.61  -0.15  -0.20  -0.46
%SUGAR    -0.22   0.09   0.77   0.08
%MILK     -0.39   0.06  -0.57   0.38
SEDIMENT   0.01  -0.70  -0.00   0.41
COLOUR    -0.62   0.00  -0.15  -0.62
VISCOSITY  0.20   0.69  -0.10   0.28
29
Show c — the Y weights (10 sensory variables × 4 latent variables; rows in the order of slide 3):

              c 1    c 2    c 3    c 4
colour       0.38   0.12   0.07   0.28
cocoa-odour  0.38   0.11  -0.07   0.25
milk-odour  -0.37  -0.05  -0.30  -0.57
thick-txtr   0.15   0.55  -0.18   0.18
mouthfeel    0.27   0.41  -0.25   0.36
smooth-txtr -0.23   0.46   0.22   0.10
creamy-txtr -0.16   0.53   0.09   0.04
cocoa-taste  0.38   0.03  -0.28   0.30
milk-taste  -0.37   0.03   0.07  -0.50
sweet       -0.33   0.09   0.81  -0.16
30
B_pls: X to Y (in Z-scores). Rows: the 6 predictors; columns: the 10 sensory variables:

           colour  cocoa-od  milk-od  thick  mouthf  smooth  creamy  cocoa-ta  milk-ta  sweet
%COCOA      -0.11     -0.05     0.63  -0.21   -0.36   -0.48   -0.31     -0.09     0.45  -0.18
%SUGAR      -0.03     -0.09    -0.13  -0.03   -0.07    0.24    0.15     -0.17     0.04   0.41
%MILK        0.14      0.15    -0.50   0.24    0.43    0.25    0.16      0.26    -0.50  -0.24
SEDIMENT     0.32      0.29    -0.80  -0.19    0.19   -0.25   -0.40      0.43    -0.78  -0.33
COLOUR      -1.04     -0.97     1.70  -0.56   -1.10   -0.02    0.06     -1.07     1.54   0.68
VISCOSITY    0.52      0.50    -0.77   0.71    0.83    0.40    0.42      0.49    -0.65  -0.26
31
B*_pls from X to Y (original units). First row: intercepts; then the 6 predictors:

            colour  cocoa-od  milk-od  thick  mouthf  smooth  creamy  cocoa-ta  milk-ta  sweet
Intercept    79.86     43.18   -52.77  29.23   32.63    6.91    4.32     52.51   -50.26  -19.07
%COCOA       -0.06     -0.01     0.15  -0.06   -0.06   -0.16   -0.06     -0.03     0.12  -0.05
%SUGAR       -0.01     -0.02    -0.03  -0.01   -0.01    0.08    0.03     -0.05     0.01   0.11
%MILK         0.07      0.04    -0.12   0.06    0.07    0.08    0.03      0.08    -0.13  -0.07
SEDIMENT      0.67      0.31    -0.82  -0.22    0.12   -0.33   -0.34      0.52    -0.84  -0.37
COLOUR       -1.85     -0.88     1.47  -0.54   -0.60   -0.02    0.04     -1.10     1.40   0.66
VISCOSITY     0.08      0.04    -0.06   0.06    0.04    0.04    0.03      0.04    -0.05  -0.02
32
Show RESS & PRESS by number of latent variables:

L    RESS    PRESS
1  182.39  8505.47
2   50.86  8318.84
3   30.28  8292.23
4   15.69  8286.95  <- min PRESS: keep 4 latent variables
5   13.00  8299.23
6   11.91  8309.38
33
Plot w & t (1 vs 2)
34
Plot w & c (1 vs 2)
35
Show the circle of correlations
36
Conclusion. Useful references (each contains a bibliography): Abdi (2007, 2003); see www.utd.edu/~herve