1
Regression / Calibration MLR, RR, PCR, PLS
2
Paul Geladi, Head of Research, NIRCE, Unit of Biomass Technology and Chemistry, Swedish University of Agricultural Sciences, Umeå / Technobothnia, Vasa. paul.geladi@btk.slu.se, paul.geladi@syh.fi
3
Univariate regression
4
[Figure: straight line through y vs. x data, with the offset and slope indicated]
5
[Figure: y vs. x with offset a and slope b]
y = a + bx + f
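A minimal sketch of this univariate fit in Python (the x and y values here are invented for illustration):

```python
# Univariate least squares: fit y = a + b*x + f to made-up data.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b, a = np.polyfit(x, y, deg=1)   # np.polyfit returns [slope, offset]
f = y - (a + b * x)              # residuals
print(f"offset a = {a:.2f}, slope b = {b:.2f}")
```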
6
[Figure: scatter of y vs. x]
7
[Figure: linear fit that underfits the y vs. x data]
8
[Figure: overfit of the y vs. x data]
9
[Figure: quadratic fit of the y vs. x data]
10
Multivariate linear regression
11
y = f(x): works sometimes, and only for a few variables. Measurement noise! ∞ possible functions.
12
[Diagram: data matrix X (I × K) and response vector y (I × 1)]
13
y = f(x) is simplified by a linear approximation: y = b0 + b1 x1 + b2 x2 + ... + bK xK + f
14
Nomenclature: in y = b0 + b1 x1 + b2 x2 + ... + bK xK + f, y is the response, the xk are the predictors, the bk are the regression coefficients, b0 is the offset (constant), and f is the residual.
15
[Diagram: X (I × K) and y (I × 1)] With X and y mean-centered, b0 drops out.
16
y = b1 x1 + b2 x2 + ... + bK xK + f, one such equation for each of the I samples
17
y = b1 x1 + b2 x2 + ... + bK xK + f
18
[Diagram: y (I × 1) equals X (I × K) times b (K × 1) plus f (I × 1)] In matrix notation: y = Xb + f
19
X and y are known (measurable); b and f are unknown. As it stands there is no solution: f must be constrained.
20
The MLR solution Multiple Linear Regression Ordinary Least Squares (OLS)
21
Least squares: b = (X'X)^-1 X'y. Problems?
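A short sketch of this OLS formula with NumPy on simulated data (the sizes and coefficients are made up); np.linalg.solve is used instead of an explicit inverse, which is numerically safer:

```python
# MLR/OLS: b = (X'X)^-1 X'y, solved without forming the inverse.
import numpy as np

rng = np.random.default_rng(0)
I, K = 20, 3                                # I samples, K predictors, I >= K
X = rng.normal(size=(I, K))
b_true = np.array([1.0, -2.0, 0.5])
y = X @ b_true + 0.1 * rng.normal(size=I)   # response with measurement noise

b = np.linalg.solve(X.T @ X, X.T @ y)       # normal equations
print(b)                                    # close to b_true
```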
22
3b1 + 4b2 = 1
4b1 + 5b2 = 0
One solution
23
3b1 + 4b2 = 1
4b1 + 5b2 = 0
b1 + b2 = 4
No solution
24
3b1 + 4b2 + b3 = 1
4b1 + 5b2 + b3 = 0
∞ solutions
25
b = (X'X)^-1 X'y
- K > I: ∞ solutions
- I > K: no solution
- error in X
- error in y
- the inverse may not exist
- the inverse may be unstable
26
3b1 + 4b2 + e = 1
4b1 + 5b2 + e = 0
b1 + b2 + e = 4
Solution
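The three-equation system above has no exact solution, but a least-squares solution exists once the residuals e are allowed to be nonzero; a quick check in NumPy:

```python
# Least-squares solution of the inconsistent 3-equation system.
import numpy as np

X = np.array([[3.0, 4.0],
              [4.0, 5.0],
              [1.0, 1.0]])
y = np.array([1.0, 0.0, 4.0])

b, *_ = np.linalg.lstsq(X, y, rcond=None)   # minimizes e'e
e = y - X @ b                               # nonzero residuals absorb the conflict
print("b =", b)
print("e =", e)
```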
27
Wanted solution
- I ≥ K
- no inverse
- no noise in X
28
Diagnostics: y = Xb + f
SStot = SSmod + SSres
R^2 = SSmod / SStot = 1 - SSres / SStot
Coefficient of determination
29
Diagnostics: y = Xb + f
SSres = f'f
RMSEC = [SSres / (I - A)]^1/2
Root Mean Squared Error of Calibration
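Both calibration diagnostics in one small helper (a sketch assuming mean-centered y, so that SStot = y'y, and A fitted parameters/components):

```python
# Calibration diagnostics: R^2 and RMSEC.
import numpy as np

def calibration_diagnostics(X, y, b, A):
    f = y - X @ b                           # residuals
    ss_res = f @ f                          # SSres = f'f
    ss_tot = y @ y                          # SStot (y assumed mean-centered)
    r2 = 1.0 - ss_res / ss_tot              # coefficient of determination
    rmsec = np.sqrt(ss_res / (len(y) - A))  # RMSEC = [SSres / (I - A)]^1/2
    return r2, rmsec
```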
30
Alternatives to MLR/OLS
31
Ridge Regression (RR): in b = (X'X)^-1 X'y, the identity matrix I is the easiest matrix to invert, so use b = (X'X + kI)^-1 X'y with the ridge constant k as small as possible.
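A sketch of the ridge formula (the data and the value of k would come from the problem at hand):

```python
# Ridge regression: b = (X'X + kI)^-1 X'y; the kI term stabilizes the inversion.
import numpy as np

def ridge(X, y, k):
    K = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(K), X.T @ y)
```

In practice one scans a grid of small k values and keeps the smallest one that gives a stable solution.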
32
Problems
- choice of the ridge constant
- no diagnostics
33
Principal Component Regression (PCR)
- I ≥ K
- easy inversion
34
Principal Component Regression (PCR): [Diagram: PCA decomposes X (I × K) into scores T (I × A)]
- A ≤ I
- T orthogonal
- noise in X removed
35
Principal Component Regression (PCR)
y = Td + f
d = (T'T)^-1 T'y
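A compact PCR sketch via the SVD (assumes X and y are already mean-centered; A is the chosen number of components):

```python
# PCR: PCA scores T from X, then d = (T'T)^-1 T'y.
import numpy as np

def pcr(X, y, A):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    T = U[:, :A] * s[:A]           # scores (I x A), orthogonal columns
    d = (T.T @ y) / s[:A] ** 2     # (T'T)^-1 T'y, since T'T = diag(s^2)
    b = Vt[:A].T @ d               # regression vector for the original K variables
    return b, T
```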
36
Problem: how many components should be used?
37
Advantages
- PCA is done on the data
- outliers become visible
- classes become visible
- noise in X is removed
38
Partial Least Squares Regression
39
[Diagram: PLS relates X and Y through the score vectors t and u]
40
[Diagram: outer relationship: X yields scores t via weights w'; Y yields scores u via loadings q']
41
[Diagram: inner relationship between the score vectors t and u]
42
[Diagram: PLS decomposition with A components: weights w', X loadings p', Y loadings q', scores t and u]
43
Advantages
- X decomposed
- Y decomposed
- noise in X left out
- noise in Y left out
44
PCR and PLS are one-component-at-a-time methods: after each component a residual is calculated, and the next component is calculated on that residual, as sketched below.
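A sketch of that one-component-at-a-time deflation for PLS with a single y (NIPALS style; X and y assumed mean-centered):

```python
# PLS1 by deflation: extract w, t, p, q per component, then deflate.
import numpy as np

def pls1(X, y, A):
    E, f = X.copy(), y.copy()         # residuals of X and y
    W, P, Q = [], [], []
    for _ in range(A):
        w = E.T @ f
        w /= np.linalg.norm(w)        # weights w
        t = E @ w                     # scores t
        p = E.T @ t / (t @ t)         # X loadings p
        q = (f @ t) / (t @ t)         # inner-relationship coefficient
        E -= np.outer(t, p)           # next component works on this residual
        f -= q * t
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    return W @ np.linalg.solve(P.T @ W, Q)   # b in the original X variables
```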
45
Another view:
y = Xb + f
y = Xb_RR + f_RR
y = Xb_PCR + f_PCR
y = Xb_PLS + f_PLS
47
Prediction
48
[Diagram: calibration set Xcal (I × K) with ycal; test set Xtest (J × K) with ytest and predictions yhat]
49
Prediction diagnostics
yhat = Xtest b
ftest = ytest - yhat
PRESS = ftest'ftest
RMSEP = [PRESS / J]^1/2
Root Mean Squared Error of Prediction
50
Prediction diagnostics
yhat = Xtest b
ftest = ytest - yhat
R^2test = Q^2 = 1 - ftest'ftest / ytest'ytest
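The prediction diagnostics from the last two slides as one helper (a sketch; ytest is assumed centered the same way as the calibration y):

```python
# Test-set diagnostics: PRESS, RMSEP and Q^2 for J test objects.
import numpy as np

def prediction_diagnostics(X_test, y_test, b):
    y_hat = X_test @ b
    f_test = y_test - y_hat
    press = f_test @ f_test                # PRESS = ftest'ftest
    rmsep = np.sqrt(press / len(y_test))   # RMSEP = [PRESS / J]^1/2
    q2 = 1.0 - press / (y_test @ y_test)   # R^2test = Q^2
    return rmsep, q2
```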
51
Some rules of thumb
- R^2 > 0.65 (with up to 5 PLS components)
- R^2test > 0.5
- R^2 - R^2test < 0.2
52
Bias
f = y - Xb: always 0 bias
ftest = ytest - yhat
bias = (1/J) Σ ftest
53
Leverage - influence
b = (X'X)^-1 X'y
yhat = Xb = X(X'X)^-1 X'y = Hy, the hat matrix
The diagonal elements of H are the leverages.
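Computing the leverages without forming the full I × I hat matrix (a sketch):

```python
# Leverages: diagonal of H = X (X'X)^-1 X'.
import numpy as np

def leverage(X):
    G = np.linalg.solve(X.T @ X, X.T)    # (X'X)^-1 X'
    return np.einsum('ij,ji->i', X, G)   # diag(X @ G), one value per object
```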
55
Leverage - influence
58
Residual plot
59
Residuals
- check the histogram of f
- check E variable-wise
- check E object-wise
(see the sketch below)
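A sketch of those three checks with matplotlib, assuming a residual vector f and a residual matrix E (I objects × K variables):

```python
# Residual checks: histogram of f, variable-wise and object-wise SS of E.
import numpy as np
import matplotlib.pyplot as plt

def residual_checks(f, E):
    fig, axes = plt.subplots(1, 3, figsize=(12, 3))
    axes[0].hist(f, bins=20)                   # histogram of f
    axes[0].set_title("histogram of f")
    axes[1].plot(np.sum(E**2, axis=0))         # sum of squares per variable
    axes[1].set_title("variable-wise SS of E")
    axes[2].plot(np.sum(E**2, axis=1), "o")    # sum of squares per object
    axes[2].set_title("object-wise SS of E")
    fig.tight_layout()
    plt.show()
```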
62
[Diagram (repeated): PLS decomposition with A components: weights w', X loadings p', Y loadings q', scores t and u]
63
Plotting: line plots
- scree plot
- RMSEC, RMSECV, RMSEP
- loading plot against wavelength
- score plot against time
- residual against sample
- residual against yhat
- T^2 against sample
- H against sample
64
Plotting: scatter plots (2D, 3D)
- score plot
- loading plot
- biplot
- H against residual
- inner relation: t against u
- weights w against q
65
Nonlinearities
67
Remedies for nonlinearities: make nonlinear data fit a linear model, or make the model nonlinear.
- fundamental theory (e.g. going from transmittance to absorbance)
- use extra latent variables in PCR or PLSR
- use transformations of latent variables
- remove disturbing variables
- find subsets that behave linearly
68
Remedies for nonlinearities (continued):
- use intrinsically nonlinear methods
- locally transform variables X, y, or both nonlinearly (powers, logarithms, adding powers); see the sketch after this list
- transformation in a neighbourhood (window methods)
- use global transformations (Fourier, wavelet)
- GIFI-type discretization
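One of the simplest remedies above, sketched in code: adding powers of the variables so a linear model can capture curvature (the helper name is ours):

```python
# Adding powers of X as extra columns lets a linear model fit curvature.
import numpy as np

def add_powers(X, max_power=2):
    # stack X, X^2, ..., X^max_power column-wise
    return np.hstack([X ** p for p in range(1, max_power + 1)])
```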