1
Regression / Calibration MLR, RR, PCR, PLS
2
Paul Geladi, Head of Research, NIRCE, Unit of Biomass Technology and Chemistry, Swedish University of Agricultural Sciences, Umeå / Technobothnia, Vasa. paul.geladi@btk.slu.se, paul.geladi@syh.fi
3
Univariate regression
4
[Figure: straight line through y vs. x data, with the offset and slope indicated]
5
[Figure: y vs. x with offset a and slope b]
y = a + bx + f
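A minimal sketch of this univariate fit in Python (the x and y values here are invented for illustration):

```python
# Univariate least squares: fit y = a + b*x + f to made-up data.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b, a = np.polyfit(x, y, deg=1)   # np.polyfit returns [slope, offset]
f = y - (a + b * x)              # residuals
print(f"offset a = {a:.2f}, slope b = {b:.2f}")
```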
6
[Figure: scatter of y vs. x]
7
[Figure: linear fit that underfits the y vs. x data]
8
[Figure: overfit of the y vs. x data]
9
[Figure: quadratic fit of the y vs. x data]
10
Multivariate linear regression
11
y = f(x): works sometimes, and only for a few variables. Measurement noise! ∞ possible functions.
12
[Diagram: data matrix X (I × K) and response vector y (I × 1)]
13
y = f(x) is simplified by a linear approximation: y = b0 + b1 x1 + b2 x2 + ... + bK xK + f
14
Nomenclature: in y = b0 + b1 x1 + b2 x2 + ... + bK xK + f, y is the response, the xk are the predictors, the bk are the regression coefficients, b0 is the offset (constant), and f is the residual.
15
[Diagram: X (I × K) and y (I × 1)] With X and y mean-centered, b0 drops out.
16
y = b1 x1 + b2 x2 + ... + bK xK + f, one such equation for each of the I samples
17
y = b1 x1 + b2 x2 + ... + bK xK + f
18
[Diagram: y (I × 1) equals X (I × K) times b (K × 1) plus f (I × 1)] In matrix notation: y = Xb + f
19
X and y are known (measurable); b and f are unknown. As it stands there is no solution: f must be constrained.
20
The MLR solution Multiple Linear Regression Ordinary Least Squares (OLS)
21
Least squares: b = (X'X)^-1 X'y. Problems?
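A short sketch of this OLS formula with NumPy on simulated data (the sizes and coefficients are made up); np.linalg.solve is used instead of an explicit inverse, which is numerically safer:

```python
# MLR/OLS: b = (X'X)^-1 X'y, solved without forming the inverse.
import numpy as np

rng = np.random.default_rng(0)
I, K = 20, 3                                # I samples, K predictors, I >= K
X = rng.normal(size=(I, K))
b_true = np.array([1.0, -2.0, 0.5])
y = X @ b_true + 0.1 * rng.normal(size=I)   # response with measurement noise

b = np.linalg.solve(X.T @ X, X.T @ y)       # normal equations
print(b)                                    # close to b_true
```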
22
3b1 + 4b2 = 1
4b1 + 5b2 = 0
One solution
23
3b1 + 4b2 = 1
4b1 + 5b2 = 0
b1 + b2 = 4
No solution
24
3b1 + 4b2 + b3 = 1
4b1 + 5b2 + b3 = 0
∞ solutions
25
b = (X'X)^-1 X'y
- K > I: ∞ solutions
- I > K: no solution
- error in X
- error in y
- the inverse may not exist
- the inverse may be unstable
26
3b1 + 4b2 + e = 1
4b1 + 5b2 + e = 0
b1 + b2 + e = 4
Solution
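The three-equation system above has no exact solution, but a least-squares solution exists once the residuals e are allowed to be nonzero; a quick check in NumPy:

```python
# Least-squares solution of the inconsistent 3-equation system.
import numpy as np

X = np.array([[3.0, 4.0],
              [4.0, 5.0],
              [1.0, 1.0]])
y = np.array([1.0, 0.0, 4.0])

b, *_ = np.linalg.lstsq(X, y, rcond=None)   # minimizes e'e
e = y - X @ b                               # nonzero residuals absorb the conflict
print("b =", b)
print("e =", e)
```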
27
Wanted solution
- I ≥ K
- no inverse
- no noise in X
28
Diagnostics: y = Xb + f
SStot = SSmod + SSres
R^2 = SSmod / SStot = 1 - SSres / SStot
Coefficient of determination
29
Diagnostics: y = Xb + f
SSres = f'f
RMSEC = [SSres / (I - A)]^1/2
Root Mean Squared Error of Calibration
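Both calibration diagnostics in one small helper (a sketch assuming mean-centered y, so that SStot = y'y, and A fitted parameters/components):

```python
# Calibration diagnostics: R^2 and RMSEC.
import numpy as np

def calibration_diagnostics(X, y, b, A):
    f = y - X @ b                           # residuals
    ss_res = f @ f                          # SSres = f'f
    ss_tot = y @ y                          # SStot (y assumed mean-centered)
    r2 = 1.0 - ss_res / ss_tot              # coefficient of determination
    rmsec = np.sqrt(ss_res / (len(y) - A))  # RMSEC = [SSres / (I - A)]^1/2
    return r2, rmsec
```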
30
Alternatives to MLR/OLS
31
Ridge Regression (RR): in b = (X'X)^-1 X'y, the identity matrix I is the easiest matrix to invert, so use b = (X'X + kI)^-1 X'y with the ridge constant k as small as possible.
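A sketch of the ridge formula (the data and the value of k would come from the problem at hand):

```python
# Ridge regression: b = (X'X + kI)^-1 X'y; the kI term stabilizes the inversion.
import numpy as np

def ridge(X, y, k):
    K = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(K), X.T @ y)
```

In practice one scans a grid of small k values and keeps the smallest one that gives a stable solution.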
32
Problems
- choice of the ridge constant
- no diagnostics
33
Principal Component Regression (PCR)
- I ≥ K
- easy inversion
34
Principal Component Regression (PCR): [Diagram: PCA decomposes X (I × K) into scores T (I × A)]
- A ≤ I
- T orthogonal
- noise in X removed
35
Principal Component Regression (PCR)
y = Td + f
d = (T'T)^-1 T'y
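A compact PCR sketch via the SVD (assumes X and y are already mean-centered; A is the chosen number of components):

```python
# PCR: PCA scores T from X, then d = (T'T)^-1 T'y.
import numpy as np

def pcr(X, y, A):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    T = U[:, :A] * s[:A]           # scores (I x A), orthogonal columns
    d = (T.T @ y) / s[:A] ** 2     # (T'T)^-1 T'y, since T'T = diag(s^2)
    b = Vt[:A].T @ d               # regression vector for the original K variables
    return b, T
```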
36
Problem: how many components should be used?
37
Advantages
- PCA is done on the data
- outliers become visible
- classes become visible
- noise in X is removed
38
Partial Least Squares Regression
39
[Diagram: PLS relates X and Y through the score vectors t and u]
40
[Diagram: outer relationship: X yields scores t via weights w'; Y yields scores u via loadings q']
41
[Diagram: inner relationship between the score vectors t and u]
42
[Diagram: PLS decomposition with A components: weights w', X loadings p', Y loadings q', scores t and u]
43
Advantages
- X decomposed
- Y decomposed
- noise in X left out
- noise in Y left out
44
PCR and PLS are one-component-at-a-time methods: after each component a residual is calculated, and the next component is calculated on that residual, as sketched below.
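A sketch of that one-component-at-a-time deflation for PLS with a single y (NIPALS style; X and y assumed mean-centered):

```python
# PLS1 by deflation: extract w, t, p, q per component, then deflate.
import numpy as np

def pls1(X, y, A):
    E, f = X.copy(), y.copy()         # residuals of X and y
    W, P, Q = [], [], []
    for _ in range(A):
        w = E.T @ f
        w /= np.linalg.norm(w)        # weights w
        t = E @ w                     # scores t
        p = E.T @ t / (t @ t)         # X loadings p
        q = (f @ t) / (t @ t)         # inner-relationship coefficient
        E -= np.outer(t, p)           # next component works on this residual
        f -= q * t
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    return W @ np.linalg.solve(P.T @ W, Q)   # b in the original X variables
```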
45
Another view:
y = Xb + f
y = Xb_RR + f_RR
y = Xb_PCR + f_PCR
y = Xb_PLS + f_PLS
47
Prediction
48
[Diagram: calibration set Xcal (I × K) with ycal; test set Xtest (J × K) with ytest and predictions yhat]
49
Prediction diagnostics
yhat = Xtest b
ftest = ytest - yhat
PRESS = ftest'ftest
RMSEP = [PRESS / J]^1/2
Root Mean Squared Error of Prediction
50
Prediction diagnostics
yhat = Xtest b
ftest = ytest - yhat
R^2test = Q^2 = 1 - ftest'ftest / ytest'ytest
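The prediction diagnostics from the last two slides as one helper (a sketch; ytest is assumed centered the same way as the calibration y):

```python
# Test-set diagnostics: PRESS, RMSEP and Q^2 for J test objects.
import numpy as np

def prediction_diagnostics(X_test, y_test, b):
    y_hat = X_test @ b
    f_test = y_test - y_hat
    press = f_test @ f_test                # PRESS = ftest'ftest
    rmsep = np.sqrt(press / len(y_test))   # RMSEP = [PRESS / J]^1/2
    q2 = 1.0 - press / (y_test @ y_test)   # R^2test = Q^2
    return rmsep, q2
```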
51
Some rules of thumb
- R^2 > 0.65 (with up to 5 PLS components)
- R^2test > 0.5
- R^2 - R^2test < 0.2
52
Bias
f = y - Xb: always 0 bias
ftest = ytest - yhat
bias = (1/J) Σ ftest
53
Leverage - influence
b = (X'X)^-1 X'y
yhat = Xb = X(X'X)^-1 X'y = Hy, the hat matrix
The diagonal elements of H are the leverages.
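Computing the leverages without forming the full I × I hat matrix (a sketch):

```python
# Leverages: diagonal of H = X (X'X)^-1 X'.
import numpy as np

def leverage(X):
    G = np.linalg.solve(X.T @ X, X.T)    # (X'X)^-1 X'
    return np.einsum('ij,ji->i', X, G)   # diag(X @ G), one value per object
```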
55
Leverage - influence
58
Residual plot
59
Residuals
- check the histogram of f
- check E variable-wise
- check E object-wise
(see the sketch below)
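A sketch of those three checks with matplotlib, assuming a residual vector f and a residual matrix E (I objects × K variables):

```python
# Residual checks: histogram of f, variable-wise and object-wise SS of E.
import numpy as np
import matplotlib.pyplot as plt

def residual_checks(f, E):
    fig, axes = plt.subplots(1, 3, figsize=(12, 3))
    axes[0].hist(f, bins=20)                   # histogram of f
    axes[0].set_title("histogram of f")
    axes[1].plot(np.sum(E**2, axis=0))         # sum of squares per variable
    axes[1].set_title("variable-wise SS of E")
    axes[2].plot(np.sum(E**2, axis=1), "o")    # sum of squares per object
    axes[2].set_title("object-wise SS of E")
    fig.tight_layout()
    plt.show()
```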
62
[Diagram (repeated): PLS decomposition with A components: weights w', X loadings p', Y loadings q', scores t and u]
63
Plotting: line plots
- scree plot
- RMSEC, RMSECV, RMSEP
- loading plot against wavelength
- score plot against time
- residual against sample
- residual against yhat
- T^2 against sample
- H against sample
64
Plotting: scatter plots (2D, 3D)
- score plot
- loading plot
- biplot
- H against residual
- inner relation: t against u
- weights w against q
65
Nonlinearities
67
Remedies for nonlinearities: make nonlinear data fit a linear model, or make the model nonlinear.
- fundamental theory (e.g. going from transmittance to absorbance)
- use extra latent variables in PCR or PLSR
- use transformations of latent variables
- remove disturbing variables
- find subsets that behave linearly
68
Remedies for nonlinearities (continued):
- use intrinsically nonlinear methods
- locally transform variables X, y, or both nonlinearly (powers, logarithms, adding powers); see the sketch after this list
- transformation in a neighbourhood (window methods)
- use global transformations (Fourier, wavelet)
- GIFI-type discretization
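One of the simplest remedies above, sketched in code: adding powers of the variables so a linear model can capture curvature (the helper name is ours):

```python
# Adding powers of X as extra columns lets a linear model fit curvature.
import numpy as np

def add_powers(X, max_power=2):
    # stack X, X^2, ..., X^max_power column-wise
    return np.hstack([X ** p for p in range(1, max_power + 1)])
```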