Presentation on theme: "Multivariate Regression" — Presentation transcript:

1

2 PLS: Partial Least Squares, a standard tool for multivariate regression

3 Regression: modeling dependent variable(s) Y
Y: e.g. a chemical property or a biological activity
By predictor variables X: e.g. chemical composition or (coded) chemical structure

4 MLR, the traditional method, works only if the X-variables are:
Few (# X-variables < # samples)
Uncorrelated (full-rank X)
Noise-free (especially when some correlation exists)

5 But! Data from instruments such as spectrometers, chromatographs and sensor arrays are:
Numerous
Correlated
Noisy
Incomplete

6 Correlated X: the "independent" (predictor) variables are often not independent at all

7 PLSR models:
1. The relation between the two matrices X and Y, by a linear multivariate regression
2. The structure of both X and Y
This gives richer results than MLR.

8 PLSR is a generalization of MLR
PLSR is able to analyze data with:
Noise
Collinearity (highly correlated data)
Numerous X-variables (> # samples)
Incompleteness in both X and Y
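
To make this concrete, here is a minimal sketch of PLS regression on exactly this kind of data (more collinear X-variables than samples). It uses scikit-learn's PLSRegression on synthetic data; the library choice and the data are this example's assumptions, not part of the slides.

# PLS regression on data MLR would struggle with: many collinear
# X-variables and more variables than samples. Synthetic data.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
n_samples, n_vars = 15, 40           # more X-variables than samples
t = rng.normal(size=(n_samples, 2))  # two underlying latent factors
X = t @ rng.normal(size=(2, n_vars)) + 0.05 * rng.normal(size=(n_samples, n_vars))
y = t @ np.array([1.0, -2.0]) + 0.05 * rng.normal(size=n_samples)

pls = PLSRegression(n_components=2)  # A = 2 PLS components
pls.fit(X, y)                        # autoscales X and y internally by default
print(pls.predict(X[:3]))            # predictions from the fitted model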

9 History: NIPALS
Herman Wold (1975): modeling of chains of matrices by Nonlinear Iterative PArtial Least Squares (NIPALS):
regression between a variable matrix and a parameter vector, with the other parameter vector kept fixed

10 Svante Wold & H. Martens (1980): completion and modification into the two-block (X, Y) PLS, the simplest form
Herman Wold (~2000): "Projection to Latent Structures", as a more descriptive interpretation of the acronym

11 A QSPR example:
One Y-variable: a chemical property, the free energy of unfolding of a protein
Seven X-variables: a quantitative description of the variation in chemical structure of the 19 different amino acids placed at position 49 of the protein
The X-variables are highly correlated

12 The data. Columns 1-19 are the amino acids; PIE, PIF, DGR, SAC, MR, Lam and Vol are the X-variables, DDGTS is the Y-variable.
PIE     0.23  -0.48  -0.61   0.45  -0.11  -0.51   0.00   0.15   1.20   1.28  -0.77   0.90   1.56   0.38   0.17   1.85   0.89   0.71
PIF     0.31  -0.60  -0.77   1.54  -0.22  -0.64   0.00   0.13   1.80   1.70  -0.99   1.23   1.79   0.49  -0.04   0.26   2.25   0.96   1.22
DGR    -0.55   0.51   1.20  -1.40   0.29   0.76   0.00  -0.25  -2.10  -2.00   0.78  -1.60  -2.60  -1.50   0.09  -0.58  -2.70  -1.70
SAC    254.2  303.6  287.9  282.9  335.0  311.6  224.9  337.2  322.6  324.0  336.6  336.3  366.1  288.5  266.7  283.9  401.8  377.8  295.1
MR     2.126  2.994  2.933  3.458  3.243  1.662  3.856  3.350  3.518  3.860  4.638  2.876  2.279  2.743  5.755  4.791  3.054
Lam    -0.02  -1.24  -1.08  -0.11  -1.19  -1.43   0.03  -1.06   0.04   0.12  -2.26  -0.33  -0.05  -0.31  -0.40  -0.53  -0.84  -0.13
Vol     82.2  112.3  103.7   99.1  127.5  120.5   65.0  140.6  131.7  131.5  144.3  132.3  155.8  106.7   88.5  105.3  185.9  162.7  115.6
DDGTS    8.5    8.2   11.0    6.3    8.8    7.1   10.1   16.8   15.0    7.9   13.3   11.2    7.4    9.9   12.0

13 Transformation: a log transform gives a more symmetrical distribution, e.g.
12.5, 4235, 0.2, 546, 100584  --log10-->  1.097, 3.627, -0.699, 2.737, 5.002

14 Scaling: give more weight to the more informative X-variables.
With no knowledge about the importance of the variables: auto scaling
Scale to unit variance (xi / SD)
Center (xi - xaver)
This gives the same weight to all X-variables.

15 Auto scaling also makes the computations numerically more stable.
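
A minimal sketch of the preprocessing on slides 13-15: a log10 transformation followed by auto scaling (centering plus scaling to unit variance). Plain NumPy; the function name is illustrative, not from the slides.

import numpy as np

x = np.array([12.5, 4235.0, 0.2, 546.0, 100584.0])
print(np.log10(x))  # -> [ 1.097  3.627 -0.699  2.737  5.002] (rounded)

def autoscale(X):
    """Center each column (xi - xaver) and scale it to unit variance (xi / SD)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)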

16 Base of the PLSR model: a few "new" (usually linear) variables, the X-scores ta (a = 1, 2, ..., A)
The X-scores are modelers of X and predictors of Y
They are orthogonal, and linear combinations of the X-variables with weights W*: T = X W*

17 The X-scores T, with the loadings P and Q, are:
Modelers of X:   X = T P' + E
Predictors of Y: Y = T Q' + F
Hence Y = X W* Q' + F, with the PLS-regression coefficients B = W* Q'
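
A sketch checking these relations numerically, continuing the fitted `pls` model from the earlier snippet. The attribute names are scikit-learn's (an assumption of this example, not notation from the slides): T = x_scores_, P = x_loadings_, Q = y_loadings_, and W* = x_rotations_.

import numpy as np

# PLSRegression autoscales internally, so compare against autoscaled X.
Xc = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

T, P, Q, Wstar = pls.x_scores_, pls.x_loadings_, pls.y_loadings_, pls.x_rotations_

print(np.allclose(T, Xc @ Wstar))              # T = X W*
E = Xc - T @ P.T                               # X = T P' + E  (E is the X-residual)
print(np.linalg.norm(E) / np.linalg.norm(Xc))  # small if A components suffice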

18 Estimation of T: by stepwise subtraction of each component ta pa' from X
X = T P' + E, so X - T P' = E
Xa-1 - ta pa' = Xa is the residual after subtraction of the ath component

19 X0 = t1 p1' + t2 p2' + t3 p3' + t4 p4' + ... + ta pa' + E

20 Stepwise "deflation" of the X-matrix:
t1 = X0 w1,    X1 = X0 - t1 p1'
t2 = X1 w2,    X2 = X1 - t2 p2'
t3 = X2 w3,    ...
ta = Xa-1 wa,  Xa = Xa-1 - ta pa' = E
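
A minimal sketch of this deflation loop in plain NumPy. It assumes the unit-norm weight vectors wa are already known (how they are found is the NIPALS recipe on slides 26-34); the function name is illustrative.

import numpy as np

def deflate(X0, weights):
    """Extract scores t_a and loadings p_a, deflating X step by step."""
    Xa = X0.astype(float).copy()
    T, P = [], []
    for w in weights:                # w: unit-norm weight vector w_a
        t = Xa @ w                   # t_a = X_{a-1} w_a
        p = Xa.T @ t / (t @ t)       # p_a = X_{a-1}' t_a / t_a' t_a
        Xa = Xa - np.outer(t, p)     # X_a = X_{a-1} - t_a p_a'
        T.append(t); P.append(p)
    return np.column_stack(T), np.column_stack(P), Xa   # T, P, E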

21 Geometrical Interpretation
The t's are modelers of X and predictors of Y

22 Multivariate Y: PLS-1 models (one y at a time) or one PLS-2 model (all Y-variables together)?
Run PCA on Y and look at its effective rank (# significant PCs), as in the sketch below:
If #PCs << # Y-variables: one PLS-2 model
If #PCs is close to # Y-variables: separate PLS-1 models
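
A sketch of this decision rule: estimate the effective rank of Y with PCA and compare it to the number of Y-variables. The 95% explained-variance cutoff and the "less than half" reading of "<<" are assumptions of this example, not the slides'.

import numpy as np
from sklearn.decomposition import PCA

def suggest_pls_mode(Y, var_cutoff=0.95):
    """Compare the effective rank of Y (# PCs) with the number of Y-variables."""
    pca = PCA().fit(Y)
    cum = np.cumsum(pca.explained_variance_ratio_)
    n_pcs = int(np.searchsorted(cum, var_cutoff)) + 1   # PCs reaching the cutoff
    if n_pcs < Y.shape[1] / 2:                          # assumed reading of "<<"
        return "PLS-2 (one joint model)"
    return "PLS-1 (one model per y-variable)"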

23 The number of PLS components is critical:
Too few: underfitting
Too many: overfitting
If proper: GOOD prediction ability

24 Choose the number of components by cross validation, using PRESS, the Predictive REsidual Sum of Squares: X and Y are split into calibration and prediction parts.

25 Different numbers of components in the model give different PRESS values.
The model with the proper number of components is the one with the minimum PRESS value (a sketch follows).
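
A sketch of this selection rule with scikit-learn: compute cross-validated PRESS for each candidate number of components and pick the minimum. The leave-one-out split and the function name are assumptions of this example.

import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

def best_n_components(X, y, max_a):
    """Return the component count A with minimum cross-validated PRESS."""
    press = []
    for a in range(1, max_a + 1):
        y_pred = cross_val_predict(PLSRegression(n_components=a), X, y,
                                   cv=LeaveOneOut())
        press.append(np.sum((y - np.asarray(y_pred).ravel()) ** 2))  # PRESS for A = a
    return int(np.argmin(press)) + 1, press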

26 The PLS algorithm: NIPALS (Nonlinear Iterative PArtial Least Squares), common and simple.
Initially: transformation, scaling and centering of X and Y.

27 Base:
X = T P' + E
Y = U Q' + F = T Q' + F
Utilizing the X-model, T and P are obtained from each other: T = X P, P = X' T

28 For a = 1 to A, having X (= X0, X1, ..., or Xa-1) and Y (= Y0, Y1, ..., or Ya-1), where X0 and Y0 are autoscaled and not yet deflated:
A. Get u (a temporary Y-score): take one of the columns of Y, to be used in place of the X-score as a starting estimate.

29 B. Calculate wa (the X-weights, temporary X-loadings):
Xa-1 = ua wa' + E
wa = X'a-1 ua / u'a ua
Normalize so that w'a wa = 1

30 C. Calculate ta (the X-scores, which serve as scores for both X and Y):
Xa-1 = ta wa' + E
ta = Xa-1 wa

31 D. Calculate pa (the X-loadings) and qa (the Y-loadings):
Xa-1 = ta pa' + E,  pa = X'a-1 ta / t'a ta
Ya-1 = ta qa' + F,  qa = Y'a-1 ta / t'a ta

32 E. Test the adequacy of ua by calculating ta again:
(ua)new = Ya-1 qa / q'a qa
wa = X'a-1 (ua)new / (u'a ua)new
(ta)new = Xa-1 wa
Then perform a convergence test: ||(ta)new - ta|| / ||(ta)new|| < 10^-7

33 F. If no convergence: go to step B, using (ua)new.
G. If convergence: calculate the new (deflated) X and Y for the next cycle:
Xa = Xa-1 - ta pa'
Ya = Ya-1 - ta qa'
Then set a = a + 1 and go to step B (next a).

34 H. Last step (when a = A): form the PLS-regression coefficients B:
B = W (P'W)^-1 Q'
Y = X B + B0
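
A compact NIPALS implementation following steps A-H above, in plain NumPy. It assumes X and Y are already transformed and autoscaled (slides 13-14) and that Y is a two-dimensional array; a sketch under those assumptions, not a reference implementation.

import numpy as np

def nipals_pls(X, Y, A, tol=1e-7, max_iter=500):
    """NIPALS PLS regression; returns B and the model matrices T, P, Q, W."""
    X, Y = X.astype(float).copy(), Y.astype(float).copy()   # X0, Y0 (autoscaled)
    n = X.shape[0]
    W, P, Q, T = [], [], [], []
    for a in range(A):
        u = Y[:, 0].copy()              # A: a column of Y as temporary Y-score
        t_old = np.zeros(n)
        for _ in range(max_iter):
            w = X.T @ u / (u @ u)       # B: X-weights
            w /= np.linalg.norm(w)      #    make w'w = 1
            t = X @ w                   # C: X-scores
            q = Y.T @ t / (t @ t)       # D: Y-loadings
            u = Y @ q / (q @ q)         # E: new Y-score from q
            if np.linalg.norm(t - t_old) / np.linalg.norm(t) < tol:
                break                   #    convergence test on t
            t_old = t                   # F: no convergence -> iterate again
        p = X.T @ t / (t @ t)           # D: X-loadings
        X -= np.outer(t, p)             # G: deflate X and Y for the next cycle
        Y -= np.outer(t, q)
        W.append(w); P.append(p); Q.append(q); T.append(t)
    W, P, Q, T = (np.column_stack(m) for m in (W, P, Q, T))
    B = W @ np.linalg.inv(P.T @ W) @ Q.T   # H: B = W (P'W)^-1 Q'
    return B, T, P, Q, W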

35 Summary, cycle 1: from X0 and Y0, compute the scores t1, u1 and the loadings/weights p1, q1, w1, then deflate:
X1 = X0 - t1 p1'
Y1 = Y0 - t1 q1'

36 Cycle 2: from X1 and Y1, compute the scores t2, u2 and the loadings/weights p2, q2, w2, then deflate:
X2 = X1 - t2 p2'
Y2 = Y1 - t2 q2'

37 Cycle a (the last): from Xa-1 and Ya-1, compute the scores ta, ua and the loadings/weights pa, qa, wa, then deflate:
Xa = Xa-1 - ta pa' = E
Ya = Ya-1 - ta qa' = F

38 The final model: X and Y are summarized by the scores T and U, the weights W, the loadings P and Q, the number of components A, and the residuals E and F.

