PLS (Partial Least Squares): A Standard Tool for Multivariate Regression
Regression: modeling dependent variable(s) Y (a chemical property, a biological activity) by predictor variables X (chemical composition, coded chemical structure).
MLR, the traditional method, works only if the X-variables are: few (# X-variables < # samples), uncorrelated (full-rank X), and noise-free (this matters especially when some correlation exists).
But! Real data from modern instruments (spectrometers, chromatographs, sensor arrays) are numerous, correlated, noisy, and incomplete.
Correlated X: since the X-variables are correlated, calling them "independent" variables is misleading; "predictor" variables is the better term.
PLSR models: (1) the relation between the two matrices X and Y, by a linear multivariate regression, and (2) the structure of both X and Y. This yields richer results than MLR.
PLSR is a generalization of MLR. PLSR is able to analyze data with: noise, collinearity (highly correlated data), numerous X-variables (> # samples), and incompleteness in both X and Y.
History: Herman Wold (1975) modeled chains of matrices by NIPALS (Nonlinear Iterative PArtial Least Squares): alternating regressions between a variable matrix and a parameter vector, with the other parameter vector held fixed.
Svante Wold & Harald Martens (1980) completed and modified the method into the two-block (X, Y) PLS, the simplest form. Around 2000, Svante Wold suggested "Projection to Latent Structures" as a more descriptive interpretation of the acronym.
A QSPR example. One Y-variable: a chemical property, the free energy of unfolding of a protein (DDGTS). Seven X-variables: a quantitative description of the variation in chemical structure of the 19 different amino acids that can occupy position 49 of the protein. The X-variables are highly correlated.
[Data table: the 19 amino acids (columns) described by the seven X-variables PIE, PIF, DGR, SAC, MR, Lam, and Vol (rows), with the Y-variable DDGTS in the last row.]
Transformation toward a symmetrical distribution, e.g. log10: 12.5 → 1.097, 4235 → 3.627, 0.2 → -0.699, 546 → 2.737, 100584 → 5.002.
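A one-line check of the example above (a minimal sketch using NumPy):

```python
import numpy as np

# The strongly skewed values from the slide above
x = np.array([12.5, 4235.0, 0.2, 546.0, 100584.0])

# log10 pulls them onto a comparable, roughly symmetrical scale
print(np.log10(x))  # matches the slide values up to rounding
```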
Scaling. With knowledge about the importance of the variables: give more weight to the more informative X-variables. With no such knowledge: use autoscaling, i.e. centering (xi - xaver) and scaling to unit variance (xi / SD), which gives the same weight to every X-variable. Autoscaling also makes the computation numerically more stable.
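A minimal autoscaling helper in NumPy (the column-wise sample-SD convention, ddof=1, is an assumption; the data values are only illustrative):

```python
import numpy as np

def autoscale(X):
    """Center each column (xi - x_aver) and scale to unit variance (xi / SD)."""
    mean = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)        # sample SD; ddof=1 is a common convention
    return (X - mean) / sd, mean, sd

X = np.array([[254.2, 2.126],
              [303.6, 2.994],
              [287.9, 2.933]])        # illustrative values
Xs, mean, sd = autoscale(X)
print(Xs.mean(axis=0))                # ~0 in every column
print(Xs.std(axis=0, ddof=1))         # 1 in every column
```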
Base of the PLSR model: a few "new" variables, the X-scores ta (a = 1, 2, …, A). They are orthogonal, (usually linear) combinations of the X-variables, formed with the weights W*: T = X W*. The X-scores serve as modelers of X and predictors of Y.
The X-scores T (ta, a = 1, 2, …, A) are: modelers of X: X = T P’ + E, where P are the X-loadings; predictors of Y: Y = T Q’ + F, where Q are the Y-loadings. Combining with T = X W* gives Y = X W* Q’ + F, so B = W* Q’ are the PLS-regression coefficients.
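A minimal sketch of these quantities with scikit-learn's PLSRegression (an off-the-shelf NIPALS-based implementation; the random data are only for illustration):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(19, 7))                 # 19 samples, 7 X-variables
Y = X @ rng.normal(size=(7, 1)) + 0.1 * rng.normal(size=(19, 1))

pls = PLSRegression(n_components=2).fit(X, Y)
T = pls.x_scores_          # X-scores T
P = pls.x_loadings_        # X-loadings P
Q = pls.y_loadings_        # Y-loadings Q
Wstar = pls.x_rotations_   # W* = W (P'W)^-1, so that T = X_scaled W*
# pls.coef_ holds B (its orientation differs between scikit-learn versions)
print(pls.predict(X[:2, :]))
```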
Estimation of T: by stepwise subtraction of each component ta pa’ from X. From X = T P’ + E we get X - T P’ = E; the residual after subtraction of the a-th component is Xa = Xa-1 - ta pa’.
X0 = t1 p1’ + t2 p2’ + t3 p3’ + t4 p4’ + … + ta pa’ + E
Stepwise "deflation" of the X-matrix (see the sketch after this list):
t1 = X0 w1, X1 = X0 - t1 p1’
t2 = X1 w2, X2 = X1 - t2 p2’
t3 = X2 w3, …
Xa-1 = Xa-2 - ta-1 p’a-1
ta = Xa-1 wa, Xa = Xa-1 - ta pa’ = E
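A minimal sketch of a single deflation step in NumPy (the weight vector w is assumed given, e.g. from the NIPALS iteration shown later):

```python
import numpy as np

def deflate_once(X, w):
    """One PLS deflation step: score, loading, then residual matrix."""
    t = X @ w                        # X-score:   t = X w
    p = X.T @ t / (t @ t)            # X-loading: p = X' t / (t' t)
    return X - np.outer(t, p), t, p  # residual:  X - t p'
```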
Geometrical interpretation: the t’s are modelers of X and predictors of Y.
Multivariate Y: PLS-1 models (one y at a time) or one PLS-2 model (all y’s together)? Decide with a PCA of Y, which estimates the rank of Y (the number of significant PCs). If # PCs << # Y-variables: one PLS-2 model. If # PCs is comparable to # Y-variables: separate PLS-1 models. (A decision sketch follows below.)
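A minimal sketch of this decision rule, assuming scikit-learn's PCA; the 95 % variance cut-off for counting significant PCs is an assumption, not from the slides:

```python
import numpy as np
from sklearn.decomposition import PCA

def effective_rank(Y, var_explained=0.95):
    """Number of PCs needed to reach the given share of Y's variance."""
    pca = PCA().fit(Y)
    cum = np.cumsum(pca.explained_variance_ratio_)
    return int(np.searchsorted(cum, var_explained) + 1)

# If effective_rank(Y) << Y.shape[1] -> one PLS-2 model,
# otherwise -> separate PLS-1 models, one per y-variable.
```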
The number of PLS components matters: too few means underfitting, too many means overfitting; if chosen properly, the model has GOOD prediction ability.
Cross-validation: PRESS (Predictive REsidual Sum of Squares). The (X, Y) samples are split into calibration and prediction sets.
Different numbers of components in the model give different PRESS values; the model with the proper number of components is the one with the minimum PRESS value.
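A minimal sketch of picking the number of components by minimum PRESS, assuming scikit-learn (PLSRegression and cross_val_predict are standard APIs; the 7-fold split is an arbitrary choice):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict

def press_curve(X, Y, max_components, cv=7):
    """PRESS for 1..max_components components, estimated by cross-validation.
    Keep max_components <= min(# samples, # X-variables)."""
    press = []
    for a in range(1, max_components + 1):
        Y_pred = cross_val_predict(PLSRegression(n_components=a), X, Y, cv=cv)
        press.append(((np.ravel(Y) - np.ravel(Y_pred)) ** 2).sum())
    return np.array(press)

# proper # components = np.argmin(press_curve(X, Y, 7)) + 1
```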
The PLS algorithm: NIPALS (Nonlinear Iterative PArtial Least Squares) is the common and simple one. Initially: transformation, scaling, and centering of X and Y.
Base: X = T P’ + E and Y = U Q’ + F = T Q’ + F, where U are the Y-scores and the inner relation replaces U by T. Utilizing the X-model, the scores and loadings are estimated alternately from X itself: T = X P and P = X’ T (suitably normalized).
For a = 1 to A (below, X stands for Xa-1, i.e. X0, X1, …, Xa-1, and Y for Ya-1; X0 and Y0 are autoscaled and not yet deflated). Step A: get ua, a temporary Y-score, by taking one of the columns of Ya-1; it serves as the starting point for computing the X-scores.
Step B: calculate wa (the X-weights) as temporary X-loadings by regressing Xa-1 on ua: Xa-1 = ua wa’ + E, so wa = X’a-1 ua / u’a ua; then normalize to make w’a wa = 1.
Step C: calculate ta (the X-scores, which serve as scores for both X and Y): Xa-1 = ta wa’ + E, so ta = Xa-1 wa.
Step D: calculate pa (the X-loadings) and qa (the Y-loadings): Xa-1 = ta pa’ + E, so pa = X’a-1 ta / t’a ta; Ya-1 = ta qa’ + F, so qa = Y’a-1 ta / t’a ta.
Step E: test the adequacy of ua by calculating ta again: (ua)new = Ya-1 qa / q’a qa, then wa = X’a-1 (ua)new / (ua)’new (ua)new and (ta)new = Xa-1 wa, and perform a convergence test: ||(ta)new - ta|| / ||(ta)new|| < 10⁻⁷.
Step F: if there is no convergence, go to B using (ua)new. Step G: if there is convergence, calculate the new X and Y for the next cycle: Xa = Xa-1 - ta pa’ and Ya = Ya-1 - ta qa’; then set a = a + 1 and go to B (next a).
Step H (the last step, when a = A): the PLS-regression coefficients B, giving Y = X B + B0 with B = W (P’W)⁻¹ Q’.
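A compact sketch of steps A–H in NumPy (assumptions: X and Y are already autoscaled, u is started from the first column of Y, and pa is computed once after convergence, as is common; none of this is prescribed by the slides beyond the formulas above):

```python
import numpy as np

def nipals_pls(X, Y, A, tol=1e-7, max_iter=500):
    """Two-block NIPALS PLS on centered/scaled X (n x k) and Y (n x m)."""
    X, Y = X.copy(), Y.copy()
    n, k = X.shape
    m = Y.shape[1]
    T, U = np.zeros((n, A)), np.zeros((n, A))
    W, P, Q = np.zeros((k, A)), np.zeros((k, A)), np.zeros((m, A))
    for a in range(A):
        u = Y[:, [0]]                      # A: temporary Y-score from a Y-column
        t_old = np.zeros((n, 1))
        for _ in range(max_iter):
            w = X.T @ u / (u.T @ u)        # B: X-weights ...
            w /= np.linalg.norm(w)         #    ... normalized, w'w = 1
            t = X @ w                      # C: X-scores
            q = Y.T @ t / (t.T @ t)        # D: Y-loadings
            u = Y @ q / (q.T @ q)          # E: new Y-score
            if np.linalg.norm(t - t_old) / np.linalg.norm(t) < tol:
                break                      # converged
            t_old = t                      # F: no convergence, iterate again
        p = X.T @ t / (t.T @ t)            # D: X-loadings
        X -= t @ p.T                       # G: deflate X ...
        Y -= t @ q.T                       #    ... and Y for the next cycle
        T[:, [a]], U[:, [a]] = t, u
        W[:, [a]], P[:, [a]], Q[:, [a]] = w, p, q
    B = W @ np.linalg.inv(P.T @ W) @ Q.T   # H: B = W (P'W)^-1 Q'
    return T, U, W, P, Q, B
```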
Summary, cycle 1: from X0 and Y0, compute the scores t1, u1 and the loadings (and weights) q1, p1, w1; then deflate: X1 = X0 - t1 p1’, Y1 = Y0 - t1 q1’.
Cycle 2: from X1 and Y1, compute the scores t2, u2 and the loadings q2, p2, w2; then deflate: X2 = X1 - t2 p2’, Y2 = Y1 - t2 q2’.
Cycle a: from Xa-1 and Ya-1, compute the scores ta, ua and the loadings qa, pa, wa; then Xa = Xa-1 - ta pa’ = E and Ya = Ya-1 - ta qa’ = F.
Altogether, the A cycles decompose X and Y into the scores T and U, the weights W, the loadings P and Q, and the residuals E and F.
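A short usage sketch for the nipals_pls function above (synthetic data; the orthogonality of the scores T and the smallness of the residual F are the properties claimed in the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(19, 7))                      # 19 samples, 7 X-variables
Y = X @ rng.normal(size=(7, 2)) + 0.05 * rng.normal(size=(19, 2))

# Autoscale both blocks (center, scale to unit variance)
Xs = (X - X.mean(0)) / X.std(0, ddof=1)
Ys = (Y - Y.mean(0)) / Y.std(0, ddof=1)

T, U, W, P, Q, B = nipals_pls(Xs, Ys, A=3)
print(np.round(T.T @ T, 4))       # off-diagonals ~0: X-scores are orthogonal
print(np.abs(Ys - Xs @ B).max())  # residual F stays small: Y ≈ X B
```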