Presentation on theme: "Multivariate Regression" — Presentation transcript:

1

2 PLS: Partial Least Squares, a standard tool for multivariate regression

3 Regression: modeling dependent variable(s) Y
Y: e.g. a chemical property or a biological activity
By predictor variables X: e.g. chemical composition or (coded) chemical structure

4 MLR, the traditional method, works only if the X-variables are:
Few (# X-variables < # samples)
Uncorrelated (full-rank X)
Noise-free (especially when some correlation exists)

5 But! Data from instruments such as spectrometers, chromatographs and sensor arrays are:
Numerous
Correlated
Noisy
Incomplete

6 Correlated X: the "independent" (predictor) variables are often not independent at all

7 PLSR models:
1. The relation between the two matrices X and Y, by a linear multivariate regression
2. The structure of both X and Y
This gives richer results than MLR.

8 PLSR is a generalization of MLR
PLSR is able to analyze data with:
Noise
Collinearity (highly correlated data)
Numerous X-variables (> # samples)
Incompleteness in both X and Y
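
To make this concrete, here is a minimal sketch of PLS regression on exactly this kind of data (more collinear X-variables than samples). It uses scikit-learn's PLSRegression on synthetic data; the library choice and the data are this example's assumptions, not part of the slides.

# PLS regression on data MLR would struggle with: many collinear
# X-variables and more variables than samples. Synthetic data.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
n_samples, n_vars = 15, 40           # more X-variables than samples
t = rng.normal(size=(n_samples, 2))  # two underlying latent factors
X = t @ rng.normal(size=(2, n_vars)) + 0.05 * rng.normal(size=(n_samples, n_vars))
y = t @ np.array([1.0, -2.0]) + 0.05 * rng.normal(size=n_samples)

pls = PLSRegression(n_components=2)  # A = 2 PLS components
pls.fit(X, y)                        # autoscales X and y internally by default
print(pls.predict(X[:3]))            # predictions from the fitted model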

9 History: NIPALS
Herman Wold (1975): modeling of chains of matrices by Nonlinear Iterative PArtial Least Squares (NIPALS):
regression between a variable matrix and a parameter vector, with the other parameter vector kept fixed

10 Svante Wold & H. Martens (1980): completion and modification into the two-block (X, Y) PLS, the simplest form
Herman Wold (~2000): "Projection to Latent Structures", as a more descriptive interpretation of the acronym

11 A QSPR example:
One Y-variable: a chemical property, the free energy of unfolding of a protein
Seven X-variables: a quantitative description of the variation in chemical structure of the 19 different amino acids placed at position 49 of the protein
The X-variables are highly correlated

12 The data. Columns 1-19 are the amino acids; PIE, PIF, DGR, SAC, MR, Lam and Vol are the X-variables, DDGTS is the Y-variable.
PIE     0.23  -0.48  -0.61   0.45  -0.11  -0.51   0.00   0.15   1.20   1.28  -0.77   0.90   1.56   0.38   0.17   1.85   0.89   0.71
PIF     0.31  -0.60  -0.77   1.54  -0.22  -0.64   0.00   0.13   1.80   1.70  -0.99   1.23   1.79   0.49  -0.04   0.26   2.25   0.96   1.22
DGR    -0.55   0.51   1.20  -1.40   0.29   0.76   0.00  -0.25  -2.10  -2.00   0.78  -1.60  -2.60  -1.50   0.09  -0.58  -2.70  -1.70
SAC    254.2  303.6  287.9  282.9  335.0  311.6  224.9  337.2  322.6  324.0  336.6  336.3  366.1  288.5  266.7  283.9  401.8  377.8  295.1
MR     2.126  2.994  2.933  3.458  3.243  1.662  3.856  3.350  3.518  3.860  4.638  2.876  2.279  2.743  5.755  4.791  3.054
Lam    -0.02  -1.24  -1.08  -0.11  -1.19  -1.43   0.03  -1.06   0.04   0.12  -2.26  -0.33  -0.05  -0.31  -0.40  -0.53  -0.84  -0.13
Vol     82.2  112.3  103.7   99.1  127.5  120.5   65.0  140.6  131.7  131.5  144.3  132.3  155.8  106.7   88.5  105.3  185.9  162.7  115.6
DDGTS    8.5    8.2   11.0    6.3    8.8    7.1   10.1   16.8   15.0    7.9   13.3   11.2    7.4    9.9   12.0

13 Transformation: a log transform gives a more symmetrical distribution, e.g.
12.5, 4235, 0.2, 546, 100584  --log10-->  1.097, 3.627, -0.699, 2.737, 5.002

14 Scaling: give more weight to the more informative X-variables.
With no knowledge about the importance of the variables: auto scaling
Scale to unit variance (xi / SD)
Center (xi - xaver)
This gives the same weight to all X-variables.

15 Auto scaling also makes the computations numerically more stable.
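
A minimal sketch of the preprocessing on slides 13-15: a log10 transformation followed by auto scaling (centering plus scaling to unit variance). Plain NumPy; the function name is illustrative, not from the slides.

import numpy as np

x = np.array([12.5, 4235.0, 0.2, 546.0, 100584.0])
print(np.log10(x))  # -> [ 1.097  3.627 -0.699  2.737  5.002] (rounded)

def autoscale(X):
    """Center each column (xi - xaver) and scale it to unit variance (xi / SD)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)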

16 Base of the PLSR model: a few "new" (usually linear) variables, the X-scores ta (a = 1, 2, ..., A)
The X-scores are modelers of X and predictors of Y
They are orthogonal, and linear combinations of the X-variables with weights W*: T = X W*

17 The X-scores T, with the loadings P and Q, are:
Modelers of X:   X = T P' + E
Predictors of Y: Y = T Q' + F
Hence Y = X W* Q' + F, with the PLS-regression coefficients B = W* Q'
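
A sketch checking these relations numerically, continuing the fitted `pls` model from the earlier snippet. The attribute names are scikit-learn's (an assumption of this example, not notation from the slides): T = x_scores_, P = x_loadings_, Q = y_loadings_, and W* = x_rotations_.

import numpy as np

# PLSRegression autoscales internally, so compare against autoscaled X.
Xc = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

T, P, Q, Wstar = pls.x_scores_, pls.x_loadings_, pls.y_loadings_, pls.x_rotations_

print(np.allclose(T, Xc @ Wstar))              # T = X W*
E = Xc - T @ P.T                               # X = T P' + E  (E is the X-residual)
print(np.linalg.norm(E) / np.linalg.norm(Xc))  # small if A components suffice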

18 Estimation of T: by stepwise subtraction of each component ta pa' from X
X = T P' + E, so X - T P' = E
Xa-1 - ta pa' = Xa is the residual after subtraction of the ath component

19 X0 = t1 p1' + t2 p2' + t3 p3' + t4 p4' + ... + ta pa' + E

20 Stepwise "deflation" of the X-matrix:
t1 = X0 w1,    X1 = X0 - t1 p1'
t2 = X1 w2,    X2 = X1 - t2 p2'
t3 = X2 w3,    ...
ta = Xa-1 wa,  Xa = Xa-1 - ta pa' = E
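
A minimal sketch of this deflation loop in plain NumPy. It assumes the unit-norm weight vectors wa are already known (how they are found is the NIPALS recipe on slides 26-34); the function name is illustrative.

import numpy as np

def deflate(X0, weights):
    """Extract scores t_a and loadings p_a, deflating X step by step."""
    Xa = X0.astype(float).copy()
    T, P = [], []
    for w in weights:                # w: unit-norm weight vector w_a
        t = Xa @ w                   # t_a = X_{a-1} w_a
        p = Xa.T @ t / (t @ t)       # p_a = X_{a-1}' t_a / t_a' t_a
        Xa = Xa - np.outer(t, p)     # X_a = X_{a-1} - t_a p_a'
        T.append(t); P.append(p)
    return np.column_stack(T), np.column_stack(P), Xa   # T, P, E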

21 Geometrical Interpretation
The t's are modelers of X and predictors of Y

22 Multivariate Y: PLS-1 models (one y at a time) or one PLS-2 model (all Y-variables together)?
Run PCA on Y and look at its effective rank (# significant PCs), as in the sketch below:
If #PCs << # Y-variables: one PLS-2 model
If #PCs is close to # Y-variables: separate PLS-1 models
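
A sketch of this decision rule: estimate the effective rank of Y with PCA and compare it to the number of Y-variables. The 95% explained-variance cutoff and the "less than half" reading of "<<" are assumptions of this example, not the slides'.

import numpy as np
from sklearn.decomposition import PCA

def suggest_pls_mode(Y, var_cutoff=0.95):
    """Compare the effective rank of Y (# PCs) with the number of Y-variables."""
    pca = PCA().fit(Y)
    cum = np.cumsum(pca.explained_variance_ratio_)
    n_pcs = int(np.searchsorted(cum, var_cutoff)) + 1   # PCs reaching the cutoff
    if n_pcs < Y.shape[1] / 2:                          # assumed reading of "<<"
        return "PLS-2 (one joint model)"
    return "PLS-1 (one model per y-variable)"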

23 The number of PLS components is critical:
Too few: underfitting
Too many: overfitting
If proper: GOOD prediction ability

24 Choose the number of components by cross validation, using PRESS, the Predictive REsidual Sum of Squares: X and Y are split into calibration and prediction parts.

25 Different numbers of components in the model give different PRESS values.
The model with the proper number of components is the one with the minimum PRESS value (a sketch follows).
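
A sketch of this selection rule with scikit-learn: compute cross-validated PRESS for each candidate number of components and pick the minimum. The leave-one-out split and the function name are assumptions of this example.

import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

def best_n_components(X, y, max_a):
    """Return the component count A with minimum cross-validated PRESS."""
    press = []
    for a in range(1, max_a + 1):
        y_pred = cross_val_predict(PLSRegression(n_components=a), X, y,
                                   cv=LeaveOneOut())
        press.append(np.sum((y - np.asarray(y_pred).ravel()) ** 2))  # PRESS for A = a
    return int(np.argmin(press)) + 1, press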

26 The PLS algorithm: NIPALS (Nonlinear Iterative PArtial Least Squares), common and simple.
Initially: transformation, scaling and centering of X and Y.

27 Base:
X = T P' + E
Y = U Q' + F = T Q' + F
Utilizing the X-model, T and P are obtained from each other: T = X P, P = X' T

28 For a = 1 to A, having X (= X0, X1, ..., or Xa-1) and Y (= Y0, Y1, ..., or Ya-1), where X0 and Y0 are autoscaled and not yet deflated:
A. Get u (a temporary Y-score): take one of the columns of Y, to be used in place of the X-score as a starting estimate.

29 B. Calculate wa (the X-weights, temporary X-loadings):
Xa-1 = ua wa' + E
wa = X'a-1 ua / u'a ua
Normalize so that w'a wa = 1

30 C. Calculate ta (the X-scores, which serve as scores for both X and Y):
Xa-1 = ta wa' + E
ta = Xa-1 wa

31 D. Calculate pa (the X-loadings) and qa (the Y-loadings):
Xa-1 = ta pa' + E,  pa = X'a-1 ta / t'a ta
Ya-1 = ta qa' + F,  qa = Y'a-1 ta / t'a ta

32 E. Test the adequacy of ua by calculating ta again:
(ua)new = Ya-1 qa / q'a qa
wa = X'a-1 (ua)new / (u'a ua)new
(ta)new = Xa-1 wa
Then perform a convergence test: ||(ta)new - ta|| / ||(ta)new|| < 10^-7

33 F. If no convergence: go to step B, using (ua)new.
G. If convergence: calculate the new (deflated) X and Y for the next cycle:
Xa = Xa-1 - ta pa'
Ya = Ya-1 - ta qa'
Then set a = a + 1 and go to step B (next a).

34 H. Last step (when a = A): form the PLS-regression coefficients B:
B = W (P'W)^-1 Q'
Y = X B + B0
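
A compact NIPALS implementation following steps A-H above, in plain NumPy. It assumes X and Y are already transformed and autoscaled (slides 13-14) and that Y is a two-dimensional array; a sketch under those assumptions, not a reference implementation.

import numpy as np

def nipals_pls(X, Y, A, tol=1e-7, max_iter=500):
    """NIPALS PLS regression; returns B and the model matrices T, P, Q, W."""
    X, Y = X.astype(float).copy(), Y.astype(float).copy()   # X0, Y0 (autoscaled)
    n = X.shape[0]
    W, P, Q, T = [], [], [], []
    for a in range(A):
        u = Y[:, 0].copy()              # A: a column of Y as temporary Y-score
        t_old = np.zeros(n)
        for _ in range(max_iter):
            w = X.T @ u / (u @ u)       # B: X-weights
            w /= np.linalg.norm(w)      #    make w'w = 1
            t = X @ w                   # C: X-scores
            q = Y.T @ t / (t @ t)       # D: Y-loadings
            u = Y @ q / (q @ q)         # E: new Y-score from q
            if np.linalg.norm(t - t_old) / np.linalg.norm(t) < tol:
                break                   #    convergence test on t
            t_old = t                   # F: no convergence -> iterate again
        p = X.T @ t / (t @ t)           # D: X-loadings
        X -= np.outer(t, p)             # G: deflate X and Y for the next cycle
        Y -= np.outer(t, q)
        W.append(w); P.append(p); Q.append(q); T.append(t)
    W, P, Q, T = (np.column_stack(m) for m in (W, P, Q, T))
    B = W @ np.linalg.inv(P.T @ W) @ Q.T   # H: B = W (P'W)^-1 Q'
    return B, T, P, Q, W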

35 Summary, cycle 1: from X0 and Y0, compute the scores t1, u1 and the loadings/weights p1, q1, w1, then deflate:
X1 = X0 - t1 p1'
Y1 = Y0 - t1 q1'

36 Cycle 2: from X1 and Y1, compute the scores t2, u2 and the loadings/weights p2, q2, w2, then deflate:
X2 = X1 - t2 p2'
Y2 = Y1 - t2 q2'

37 Cycle a (the last): from Xa-1 and Ya-1, compute the scores ta, ua and the loadings/weights pa, qa, wa, then deflate:
Xa = Xa-1 - ta pa' = E
Ya = Ya-1 - ta qa' = F

38 The final model: X and Y are summarized by the scores T and U, the weights W, the loadings P and Q, the number of components A, and the residuals E and F.

