10.02.05 1 WSC-4 Simple View on Simple Interval Calculation (SIC) Alexey Pomerantsev, Oxana Rodionova Institute of Chemical Physics, Moscow and Kurt Varmuza.


1 10.02.05 1 WSC-4 Simple View on Simple Interval Calculation (SIC). Alexey Pomerantsev, Oxana Rodionova (Institute of Chemical Physics, Moscow) and Kurt Varmuza (Vienna Technical University). © Kurt Varmuza

2 CAC, Lisbon, September 2004

3 Leisured Agenda: 1. Why are errors limited? 2. Simple calculations, indeed! Univariate case. 3. Complicated SIC. Bivariate case. 4. Conclusions.

4 Part 1. Why are errors limited?

5 Water in wheat. NIR spectra by Lumex Co.

6 Histogram for Y (water content), 141 samples

7 Normal Probability Plot for Y (plot annotations: 3%, 21%, 38%)

8 PLS Regression. Whole data set

9 PLS Regression. Marked “outliers”

10 PLS Regression. Revised data set

11 Histogram for Y. Revised data set, 124 samples

12 Normal Probability Plot. Revised data set (plot annotations: 31%, 81%, 96%)

13 Histogram for Y. Revised data set (axis marks: m−3σ, m−2σ, m−σ, m, m+σ, m+2σ, m+3σ)

14 Error Distribution. Panels: normal distribution; truncated normal distribution (cut at 3.5σ); both distributions. Each panel marks symmetric limits on the error axis.

15 Main SIC postulate. All errors are limited! There exists a Maximum Error Deviation, β, such that for any error ε: Prob{ |ε| > β } = 0
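
The postulate can be illustrated with a small simulation. The sketch below is not from the slides: it assumes a normal error truncated at β = 3.5σ (the cut marked on slide 14), uses simple rejection sampling, and the function name is made up for illustration.

```python
import random

def truncated_normal_error(sigma, beta, rng):
    """Draw one error from N(0, sigma^2) conditioned on |error| <= beta."""
    while True:
        e = rng.gauss(0.0, sigma)
        if abs(e) <= beta:  # reject draws beyond the Maximum Error Deviation
            return e

rng = random.Random(0)      # fixed seed so the run is repeatable
sigma = 1.0
beta = 3.5 * sigma          # assumption: truncation point taken from slide 14
errors = [truncated_normal_error(sigma, beta, rng) for _ in range(10_000)]

# Empirical counterpart of Prob{|error| > beta} = 0
exceed = sum(abs(e) > beta for e in errors)
print(exceed)
```

By construction no draw can exceed β, which is all the postulate asks of the error distribution.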

16 Part 2. Simple calculations

17 Case study. Simple Univariate Model. Panels: Data; Model (y = ax + ε); Error distribution.
   Set       Sample  x    y
   Training  C1      1.0  1.28
   Training  C2      2.0  1.68
   Training  C3      4.0  4.25
   Training  C4      5.0  5.32
   Test      T1      3.0  3.35
   Test      T2      4.5  6.19
   Test      T3      5.5  5.40

18 OLS calibration. OLS calibration minimizes the sum of squared residuals.

19 Uncertainties in OLS. t_3(P) is the quantile of Student's t-distribution for probability P with 3 degrees of freedom.
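
For the case-study data, the OLS step can be sketched as follows. This is an illustration rather than the authors' code: it assumes the no-intercept model y = ax from slide 17, so the least-squares slope is Σxy / Σx², and it estimates s with n − 1 = 3 degrees of freedom to match the t_3(P) quantile mentioned above. Variable names are illustrative.

```python
import math

# Training data from slide 17
x = [1.0, 2.0, 4.0, 5.0]
y = [1.28, 1.68, 4.25, 5.32]

# Least-squares slope for a line through the origin
a_ols = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

# Residuals and the residual standard deviation (one fitted parameter)
residuals = [yi - a_ols * xi for xi, yi in zip(x, y)]
dof = len(x) - 1
s = math.sqrt(sum(e * e for e in residuals) / dof)

print(round(a_ols, 3), round(s, 2))
```

With these numbers s comes out near 0.28, which agrees with the later statement β = 0.7 = 2.5s.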

20 SIC calibration. The Maximum Error Deviation is known: β = 0.7 (= 2.5s). Condition: |ε| < β. Each of the four training points is drawn with an error bar of length 2β.

21 SIC calibration
   Set       Sample  x    y     a_min  a_max
   Training  C1      1.0  1.28  0.58   1.98
   Training  C2      2.0  1.68  0.49   1.19
   Training  C3      4.0  4.25  0.89   1.24
   Training  C4      5.0  5.32  0.92   1.20

22 Region of Possible Values (RPV). Same table as slide 21, with the RPV marked on the plot.
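
The a_min and a_max columns and the RPV can be reproduced in a few lines. This is a minimal sketch, assuming the model y = ax with |ε| ≤ β and positive x, so that each sample restricts a to [(y − β)/x, (y + β)/x] and the RPV is the intersection of these intervals. The function name is illustrative.

```python
def region_of_possible_values(xs, ys, beta):
    """Return (a_min, a_max) for y = a*x with |error| <= beta, or None if empty.

    Assumes every x is positive, as in the case study.
    """
    lower = max((y - beta) / x for x, y in zip(xs, ys))  # tightest lower bound
    upper = min((y + beta) / x for x, y in zip(xs, ys))  # tightest upper bound
    return (lower, upper) if lower <= upper else None

train_x = [1.0, 2.0, 4.0, 5.0]
train_y = [1.28, 1.68, 4.25, 5.32]
rpv = region_of_possible_values(train_x, train_y, beta=0.7)
print(rpv)
```

The lower end comes from C4 (4.62 / 5) and the upper end from C2 (2.38 / 2), the two samples the slides later call boundary objects.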

23 SIC prediction
   Set   Sample  x    y     v−    v+
   Test  T1      3.0  3.35  2.77  3.57
   Test  T2      4.5  6.19  4.16  5.36
   Test  T3      5.5  5.40  5.08  6.55
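
The prediction limits follow directly from the RPV. In this hedged sketch the RPV ends are taken from the calibration table (C4 gives the lower end, C2 the upper end), and the prediction interval for a new x is simply the image of the RPV under a ↦ ax. The function name is illustrative.

```python
def sic_predict(x, rpv):
    """Prediction interval [v_minus, v_plus] for y = a*x with a in the RPV."""
    lo, hi = sorted((x * rpv[0], x * rpv[1]))  # sorted() also covers negative x
    return lo, hi

beta = 0.7
rpv = ((5.32 - beta) / 5.0, (1.68 + beta) / 2.0)  # ends set by C4 and C2

test_x = {"T1": 3.0, "T2": 4.5, "T3": 5.5}
intervals = {name: sic_predict(x, rpv) for name, x in test_x.items()}
for name, (v_minus, v_plus) in intervals.items():
    print(name, round(v_minus, 3), round(v_plus, 3))
```

Up to rounding, these are the v− and v+ values tabulated on slide 23.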

24 Object Status. Calibration Set (a_min and a_max as tabulated on slide 21). Samples C2 and C4 are the boundary objects; they form the RPV. Samples C1 and C3 are insiders; they could be removed from the calibration set without changing the RPV.

25 Object Status. Test Set. Let’s consider what happens when a new sample is added to the calibration set.

26 Object Status. Insider. If we add sample T1, the RPV doesn’t change. This object is an insider: the prediction interval lies inside the error interval.

27 Object Status. Outlier. If we add sample T2, the RPV disappears. This object is an outlier: the prediction interval lies outside the error interval.

28 Object Status. Outsider. If we add sample T3, the RPV becomes smaller. This object is an outsider: the prediction interval overlaps the error interval.
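
The three cases can be checked by literally adding each test sample to the calibration set and recomputing the RPV. The sketch below assumes the univariate model y = ax with positive x and β = 0.7. The helper names are illustrative and not taken from the slides.

```python
def rpv(xs, ys, beta):
    """RPV for y = a*x (x > 0): intersection of the per-sample slope intervals."""
    lower = max((y - beta) / x for x, y in zip(xs, ys))
    upper = min((y + beta) / x for x, y in zip(xs, ys))
    return (lower, upper) if lower <= upper else None

def status_by_adding(train_x, train_y, x_new, y_new, beta):
    """Classify a new sample by its effect on a non-empty calibration RPV."""
    before = rpv(train_x, train_y, beta)
    after = rpv(train_x + [x_new], train_y + [y_new], beta)
    if after is None:
        return "outlier"    # RPV disappears
    if after == before:
        return "insider"    # RPV unchanged
    return "outsider"       # RPV shrinks

train_x = [1.0, 2.0, 4.0, 5.0]
train_y = [1.28, 1.68, 4.25, 5.32]
test_set = {"T1": (3.0, 3.35), "T2": (4.5, 6.19), "T3": (5.5, 5.40)}

statuses = {name: status_by_adding(train_x, train_y, x, y, 0.7)
            for name, (x, y) in test_set.items()}
print(statuses)
```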

29 SIC-Residual and SIC-Leverage. Definition 1. The SIC-residual is defined as r(x, y) = (y − (v+ + v−)/2) / β. This is a characteristic of bias. Definition 2. The SIC-leverage is defined as h(x) = (v+ − v−) / (2β). This is a normalized precision. Together they characterize the interaction between the prediction interval [v−, v+] and the error interval [y − β, y + β]. (Figure labels: v+, v−, y, y+β, y−β, βr, βh.)

30 Object Status Plot. Using simple algebra one can prove the following statements. Statement 1. An object (x, y) is an insider iff |r(x, y)| ≤ 1 − h(x); presented by triangle BCD. Statement 2. An object (x, y) is an outlier iff |r(x, y)| > 1 + h(x); presented by lines AB and DE.
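
Statements 1 and 2 can be tried on the test set of the univariate case study. The sketch assumes that the SIC-residual is the offset of y from the midpoint of the prediction interval divided by β, and that the SIC-leverage is the half-width of that interval divided by β, which is the reading under which both statements reduce to interval containment and disjointness. The inputs are the rounded values from slide 23. "Absolute outsiders" are not distinguished here.

```python
def sic_residual(y, v_minus, v_plus, beta):
    """Offset of y from the middle of the prediction interval, in units of beta."""
    return (y - (v_plus + v_minus) / 2.0) / beta

def sic_leverage(v_minus, v_plus, beta):
    """Half-width of the prediction interval, in units of beta."""
    return (v_plus - v_minus) / (2.0 * beta)

def object_status(y, v_minus, v_plus, beta):
    r = sic_residual(y, v_minus, v_plus, beta)
    h = sic_leverage(v_minus, v_plus, beta)
    if abs(r) <= 1.0 - h:
        return "insider"    # Statement 1
    if abs(r) > 1.0 + h:
        return "outlier"    # Statement 2
    return "outsider"       # everything in between

beta = 0.7
# (y, v_minus, v_plus) as tabulated on slide 23
test_set = {"T1": (3.35, 2.77, 3.57),
            "T2": (6.19, 4.16, 5.36),
            "T3": (5.40, 5.08, 6.55)}
statuses = {name: object_status(y, lo, hi, beta)
            for name, (y, lo, hi) in test_set.items()}
print(statuses)
```

The result matches the classification obtained on slides 26 to 28 by adding each sample to the calibration set.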

31 Object Status Classification: insiders, outsiders, absolute outsiders, outliers.

32 OLS Confidence versus SIC Prediction. The true response value, y, is always located within the SIC prediction interval. This has been confirmed by simulations repeated 100,000 times; thus Prob{ v− < y < v+ } = 1.00. Confidence intervals tend to infinity as P increases. Confidence intervals are unreasonably wide!

33 Beta Estimation. Minimum β. Panels: RPV at β = 0.7, 0.6, 0.5 and 0.4; the last panel, β = 0.3, carries no RPV label. β > b_min = 0.3
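
For the univariate case study b_min can be computed in closed form. The following sketch assumes y = ax with positive x: the RPV is non-empty exactly when (y_i − β)/x_i ≤ (y_j + β)/x_j for every pair of samples, which rearranges to β ≥ (y_i/x_i − y_j/x_j) / (1/x_i + 1/x_j). The function name is illustrative.

```python
from itertools import product

def minimal_beta(xs, ys):
    """Smallest beta for which the RPV of y = a*x (x > 0) is non-empty."""
    points = list(zip(xs, ys))
    pairwise = (
        (yi / xi - yj / xj) / (1.0 / xi + 1.0 / xj)
        for (xi, yi), (xj, yj) in product(points, repeat=2)
    )
    return max(0.0, max(pairwise))

train_x = [1.0, 2.0, 4.0, 5.0]
train_y = [1.28, 1.68, 4.25, 5.32]
b_min = minimal_beta(train_x, train_y)
print(round(b_min, 2))
```

The binding pair is C4 and C2, the same boundary objects that define the RPV at β = 0.7, and the value rounds to the 0.3 quoted on the slide.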

34 Beta Estimation from Regression Residuals. e = y_measured − y_predicted; b_OLS = max{|e_1|, |e_2|, ..., |e_n|}; b_OLS = 0.4. b_SIC = b_OLS · C(n), chosen so that Prob{β < b_SIC} = 0.90; b_SIC = 0.8.
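
The residual-based estimate can be reproduced for the case-study data. The sketch assumes the no-intercept OLS fit of y = ax and takes RMSEC as the root mean squared calibration residual, which is an assumption since the slides do not spell out the divisor. The factor C(n) is not given in the transcript, so b_SIC is deliberately not computed.

```python
import math

x = [1.0, 2.0, 4.0, 5.0]
y = [1.28, 1.68, 4.25, 5.32]

# OLS slope for a line through the origin, then calibration residuals
a_ols = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
residuals = [yi - a_ols * xi for xi, yi in zip(x, y)]

b_ols = max(abs(e) for e in residuals)                          # largest |residual|
rmsec = math.sqrt(sum(e * e for e in residuals) / len(residuals))  # assumed divisor n

print(round(b_ols, 1), round(rmsec, 1))
```

The largest residual belongs to C2, and the two values round to the b_OLS = 0.4 and RMSEC = 0.2 quoted on the slides.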

35 1-2-3-4 Sigma Rule. 1s ≈ RMSEC, 2s ≈ b_min, 3s ≈ b_OLS, 4s ≈ b_SIC. Case study: RMSEC = 0.2 = 1s; b_min = 0.3 = 1.5s; b_OLS = 0.4 = 2s; b_SIC = 0.8 = 4s.

36 Part 3. Complicated SIC. Bivariate case

37 Octane Rating Example (by K. Esbensen). X-values are NIR measurements over 226 wavelengths. Y-values are reference measurements of octane number. Training set: 24 samples. Test set: 13 samples.

38 Calibration

39 PLS Decomposition. The full model X b = y (X is n × p, b is p × 1, y is n × 1) is reduced by PLS with 2 PCs to T a = y − y_0·1 (T is n × 2, a is 2 × 1, 1 is the n × 1 vector of ones).

40 1-2-3-4 Sigma Rule for Octane Example. RMSEC = 0.27 = 1s; b_min = 0.48 = 1.8s; b_OLS = 0.58 = 2.2s; b_SIC = 0.88 = 3.3s. Chosen value: β = b_SIC = 0.88

41 RPV in Two-Dimensional Case
   y_1 − y_0 − β ≤ t_11 a_1 + t_12 a_2 ≤ y_1 − y_0 + β
   y_2 − y_0 − β ≤ t_21 a_1 + t_22 a_2 ≤ y_2 − y_0 + β
   ...
   y_n − y_0 − β ≤ t_n1 a_1 + t_n2 a_2 ≤ y_n − y_0 + β
   We have a system of 2n = 48 inequalities in the two parameters a_1 and a_2.

42 Region of Possible Values

43 Close view on RPV. Calibration Set. Panels: RPV in parameter space; Object Status Plot. Samples: 24. Boundary samples: C7, C9, C13, C14, C18, C23.

44 SIC Prediction with Linear Programming. Linear Programming Problem: the RPV vertices and the predicted value y at each vertex.
   Vertex #  a_1    a_2    tᵀa    y
   1         13.91  16.36  −0.40  88.86
   2         14.22  18.36  −0.35  88.90
   3         16.79  26.66  −0.24  89.01
   4         19.91  26.61  −0.46  88.79
   5         20.41  13.16  −0.96  88.30
   6         17.44  13.52  −0.74  88.52
   The limits v− and v+ are the smallest and largest y over the vertices: v− = 88.30 (vertex 5), v+ = 89.01 (vertex 3).
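
In two dimensions the linear programs can be sketched without an LP solver, because a linear function on a bounded convex polygon attains its extremes at a vertex. The code below is an illustration on made-up toy data, not the octane data (whose scores are not in the transcript): it enumerates the vertices of the region defined by inequalities of the form y_i − y_0 − β ≤ t_i1 a_1 + t_i2 a_2 ≤ y_i − y_0 + β and evaluates the prediction at each one. All names and numbers are hypothetical, and a real application would more likely call an LP routine.

```python
from itertools import combinations

def rpv_vertices(T, y, y0, beta, tol=1e-9):
    """Vertices of {a : |y_i - y0 - t_i . a| <= beta for all i} for 2-D scores."""
    # Each sample contributes two half-planes written as n . a <= c
    half_planes = []
    for (t1, t2), yi in zip(T, y):
        half_planes.append(((t1, t2), yi - y0 + beta))
        half_planes.append(((-t1, -t2), -(yi - y0 - beta)))
    vertices = []
    for (n1, c1), (n2, c2) in combinations(half_planes, 2):
        det = n1[0] * n2[1] - n1[1] * n2[0]
        if abs(det) < tol:
            continue  # parallel boundary lines never intersect
        # Cramer's rule for the intersection of the two boundary lines
        a = ((c1 * n2[1] - c2 * n1[1]) / det, (n1[0] * c2 - n2[0] * c1) / det)
        if all(n[0] * a[0] + n[1] * a[1] <= c + tol for n, c in half_planes):
            vertices.append(a)  # keep only intersections inside the region
    return vertices

def sic_interval(t_new, vertices, y0):
    """[v_minus, v_plus]: extremes of y0 + t_new . a over the RPV vertices."""
    if not vertices:
        return None  # empty RPV
    values = [y0 + t_new[0] * a1 + t_new[1] * a2 for a1, a2 in vertices]
    return min(values), max(values)

# Hypothetical toy calibration set: three samples, two score columns
T = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
y = [1.0, 2.0, 3.0]
vertices = rpv_vertices(T, y, y0=0.0, beta=0.5)
interval = sic_interval((2.0, 1.0), vertices, y0=0.0)
print(len(vertices), interval)
```

For the toy data the region is a hexagon, mirroring the six-vertex RPV of the octane example.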

45 Octane Prediction. Test Set. Panels: Prediction intervals, SIC and PLS (legend: reference values, PLS ± 2RMSEP, SIC prediction); Object Status Plot.

46 Conclusions. Real errors are limited. The truncated normal distribution is a much more realistic model for practical applications than an unlimited error distribution. Postulating that all errors are limited, we can derive a new concept of data modeling: the SIC method. It is based on this single assumption and nothing else. The SIC approach gives a new view on old chemometric problems such as outliers and influential samples. I think that this is an interesting and helpful view.

47 OLS versus SIC. Panels: SIC-Residuals vs. OLS-Residuals; SIC-Leverages vs. OLS-Leverages; SIC Object Status Plot; OLS/PLS Influence Plot.

48 Statistical view on OLS & SIC. Let’s have a sample {x_1, ..., x_n} from a distribution with finite support [−1, +1]. The mean value a is unknown! Comparison table: columns OLS and SIC; rows Statistics and Deviation.

