Regression / Calibration: MLR, RR, PCR, PLS
Paul Geladi, Head of Research, NIRCE, Unit of Biomass Technology and Chemistry, Swedish University of Agricultural Sciences, Umeå / Technobothnia, Vasa
Univariate regression
[Figure: univariate regression, y plotted against x with a fitted straight line; offset a, slope b]
y = a + bx + f
[Figures: straight-line fit (underfit), overfit, and quadratic fit of y against x]
Multivariate linear regression
y = f(x) works sometimes, but only for a few variables.
Measurement noise!
∞ possible functions
[Diagram: data matrix X (I × K) and response vector y (I × 1)]
y = f(x), simplified by a linear approximation:
y = b0 + b1x1 + b2x2 + ... + bKxK + f
Nomenclature
y = b0 + b1x1 + b2x2 + ... + bKxK + f
y: response
xk: predictors
bk: regression coefficients
b0: offset, constant
f: residual
[Diagram: X (I × K), y (I × 1)]
With X and y mean-centered, b0 drops out.
y = b1x1 + b2x2 + ... + bKxK + f (one such equation for each of the I samples)
In matrix notation: y = Xb + f
[Diagram: y (I × 1) = X (I × K) b (K × 1) + f (I × 1)]
X, y: known, measurable
b, f: unknown
No unique solution: f must be constrained
The MLR solution: Multiple Linear Regression, Ordinary Least Squares (OLS)
Least squares: b = (X'X)^-1 X'y
Problems?
3b1 + 4b2 = 1
4b1 + 5b2 = 0
→ one solution

3b1 + 4b2 = 1
4b1 + 5b2 = 0
b1 + b2 = 4
→ no solution

3b1 + 4b2 + b3 = 1
4b1 + 5b2 + b3 = 0
→ ∞ solutions
b = (X'X)^-1 X'y
- K > I: ∞ solutions
- I > K: no solution
- error in X
- error in y
- inverse may not exist
- inverse may be unstable
3b1 + 4b2 + e1 = 1
4b1 + 5b2 + e2 = 0
b1 + b2 + e3 = 4
→ solution: the residuals e make the system solvable in the least-squares sense
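A minimal numpy sketch of this least-squares solution, applied to the three-equation system above (np.linalg.lstsq would be the numerically safer route; the explicit inverse mirrors the slide's formula):

```python
import numpy as np

# The over-determined system from the slide: three equations, two unknowns.
X = np.array([[3.0, 4.0],
              [4.0, 5.0],
              [1.0, 1.0]])
y = np.array([1.0, 0.0, 4.0])

# MLR / OLS solution b = (X'X)^-1 X'y
b = np.linalg.inv(X.T @ X) @ X.T @ y

# The residuals e = y - Xb are what make the system solvable
# in the least-squares sense.
e = y - X @ b
print(b, e)
```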
Wanted solution:
- I ≥ K
- no (unstable) matrix inverse
- no noise in X
Diagnostics
y = Xb + f
SStot = SSmod + SSres
R2 = SSmod / SStot = 1 - SSres / SStot
(coefficient of determination)
Diagnostics
y = Xb + f
SSres = f'f
RMSEC = [SSres / (I - A)]^1/2
(Root Mean Squared Error of Calibration; A = number of fitted components)
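A small helper computing both diagnostics, assuming mean-centered data as on the slides (function name and arguments are illustrative):

```python
import numpy as np

def calibration_diagnostics(y, y_fit, a):
    # a: number of fitted components A, giving I - A degrees of freedom
    f = y - y_fit                              # residual vector f
    ss_res = f @ f                             # SSres = f'f
    ss_tot = (y - y.mean()) @ (y - y.mean())   # SStot
    r2 = 1.0 - ss_res / ss_tot                 # R2 = 1 - SSres/SStot
    rmsec = np.sqrt(ss_res / (len(y) - a))     # RMSEC = [SSres/(I-A)]^1/2
    return r2, rmsec
```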
Alternatives to MLR/OLS
Ridge Regression (RR)
b = (X'X)^-1 X'y: the identity matrix I is the easiest matrix to invert, so
b = (X'X + kI)^-1 X'y
with the ridge constant k kept as small as possible
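A sketch of the ridge estimator (helper name is illustrative; np.linalg.solve is used instead of forming the inverse explicitly):

```python
import numpy as np

def ridge_coefficients(X, y, k):
    # b = (X'X + kI)^-1 X'y: adding k to the diagonal stabilizes
    # the inversion when X'X is singular or ill-conditioned
    n_vars = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(n_vars), X.T @ y)
```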
Problems:
- choice of the ridge constant
- no diagnostics
Principal Component Regression (PCR)
- I ≥ K
- easy inversion
Principal Component Regression (PCR)
[Diagram: PCA decomposes X (I × K) into scores T (I × A)]
- A ≤ I
- T orthogonal
- noise in X removed
Principal Component Regression (PCR)
y = Td + f
d = (T'T)^-1 T'y
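A minimal PCR sketch under the slides' assumptions (mean-centered X and y; function and variable names are illustrative):

```python
import numpy as np

def pcr_fit(X, y, a):
    # PCA of X via SVD, then regress y on the first a score vectors
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    T = U[:, :a] * s[:a]           # scores T (I x A)
    P = Vt[:a].T                   # loadings P (K x A)
    # d = (T'T)^-1 T'y; T'T is diagonal because T is orthogonal
    d = (T.T @ y) / (s[:a] ** 2)
    b = P @ d                      # regression vector in the original K variables
    return b, T, P
```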
Problem: how many components should be used?
Advantages:
- PCA is done on the data itself
- outliers become visible
- classes become visible
- noise in X removed
Partial Least Squares Regression
[Diagrams: PLS decomposes X and Y into score vectors t and u, with X-weights w' and Y-loadings q'.
The outer relationships model X and Y separately from their scores.
The inner relationship links the Y-scores u to the X-scores t.
With A components, X also gets loadings p'.]
Advantages:
- X decomposed
- Y decomposed
- noise in X left out
- noise in Y left out
PCR and PLS are one-component-at-a-time methods: after each component, a residual is calculated, and the next component is calculated on that residual.
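A NIPALS-style PLS1 sketch (single y) that makes this deflation loop explicit; names are illustrative, and X, y are assumed mean-centered:

```python
import numpy as np

def pls1_nipals(X, y, a):
    E, f = X.copy(), y.copy()       # residuals, deflated per component
    W, P, q = [], [], []
    for _ in range(a):
        w = E.T @ f                 # weight vector w
        w /= np.linalg.norm(w)
        t = E @ w                   # X-scores t
        tt = t @ t
        p = E.T @ t / tt            # X-loadings p
        q_a = (f @ t) / tt          # inner-relation coefficient
        E = E - np.outer(t, p)      # deflate X: next component works on E
        f = f - q_a * t             # deflate y: next component works on f
        W.append(w); P.append(p); q.append(q_a)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    # regression vector in the original variables
    return W @ np.linalg.inv(P.T @ W) @ q
```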
Another view: all methods fit the same model form but estimate b differently:
y = Xb + f (MLR)
y = XbRR + fRR
y = XbPCR + fPCR
y = XbPLS + fPLS
Prediction
[Diagram: calibration set Xcal (I × K) with ycal; test set Xtest (J × K) with ytest, giving predictions yhat]
Prediction diagnostics
yhat = Xtest b
ftest = ytest - yhat
PRESS = ftest'ftest
RMSEP = [PRESS / J]^1/2
(Root Mean Squared Error of Prediction)
Prediction diagnostics
yhat = Xtest b
ftest = ytest - yhat
R2test = Q2 = 1 - ftest'ftest / ytest'ytest
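A helper computing these test-set diagnostics plus the bias defined below (illustrative names; ytest assumed centered with the calibration mean):

```python
import numpy as np

def prediction_diagnostics(X_test, y_test, b):
    y_hat = X_test @ b
    f_test = y_test - y_hat               # test residuals ftest
    press = f_test @ f_test               # PRESS = ftest'ftest
    rmsep = np.sqrt(press / len(y_test))  # RMSEP = [PRESS/J]^1/2
    q2 = 1.0 - press / (y_test @ y_test)  # Q2 = 1 - ftest'ftest/ytest'ytest
    bias = f_test.mean()                  # bias = (1/J) * sum(ftest)
    return rmsep, q2, bias
```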
Some rules of thumb:
- check R2 against the number of PLS components
- R2test > 0.5
- R2 - R2test < 0.2
Bias
In calibration, f = y - Xb always has zero mean: no bias.
For the test set, ftest = ytest - yhat and
bias = (1/J) Σ ftest
Leverage, influence
b = (X'X)^-1 X'y
yhat = Xb = X(X'X)^-1 X'y = Hy
H: the Hat matrix
diagonal elements of H: leverage
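A short sketch for the leverage values (explicit H for clarity; for large I one would compute only the diagonal):

```python
import numpy as np

def leverages(X):
    # Diagonal of the hat matrix H = X (X'X)^-1 X';
    # h_ii is the influence of sample i on its own fitted value.
    H = X @ np.linalg.solve(X.T @ X, X.T)
    return np.diag(H)
```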
[Figure: leverage and influence illustration]
Residual plot
Residual checks:
- check the histogram of f
- check the X-residual matrix E variable-wise
- check E object-wise
Plotting: line plots
- scree plot
- RMSEC, RMSECV, RMSEP
- loading plot against wavelength
- score plot against time
- residual against sample
- residual against yhat
- T2 against sample
- leverage (H) against sample
Plotting: scatter plots (2D, 3D)
- score plot
- loading plot
- biplot
- H against residual
- inner relation: t against u
- weight plot: w, q
Nonlinearities
Remedies for nonlinearities: making nonlinear data fit a linear model, or making the model nonlinear.
- fundamental theory (e.g. going from transmittance to absorbance)
- use extra latent variables in PCR or PLSR
- use transformations of latent variables
- remove disturbing variables
- find subsets that behave linearly
Remedies for nonlinearities, continued:
- use intrinsically nonlinear methods
- locally transform variables X, y, or both nonlinearly (powers, logarithms, adding powers)
- transformation in a neighbourhood (window methods)
- use global transformations (Fourier, Wavelet)
- GIFI type discretization