10.02.05. WSC-4 Simple View on Simple Interval Calculation (SIC). Alexey Pomerantsev, Oxana Rodionova (Institute of Chemical Physics, Moscow) and Kurt Varmuza.

Presentation transcript:

WSC-4 Simple View on Simple Interval Calculation (SIC). Alexey Pomerantsev, Oxana Rodionova (Institute of Chemical Physics, Moscow) and Kurt Varmuza (Vienna Technical University). © Kurt Varmuza

WSC-4 CAC, Lisbon, September 2004

WSC-4 Leisured Agenda. 1. Why are errors limited? 2. Simple calculations, indeed! The univariate case. 3. Complicated SIC: the bivariate case. 4. Conclusions.

WSC-4 Part 1. Why are errors limited?

WSC-4 Water in wheat. NIR spectra by Lumex Co.

WSC-4 Histogram for Y (water content), 141 samples

WSC-4 Normal Probability Plot for Y (plot annotations: 3%, 21%, 38%)

WSC-4 PLS Regression. Whole data set

WSC-4 PLS Regression. Marked “outliers”

WSC-4 PLS Regression. Revised data set

WSC-4 Histogram for Y. Revised data set, 124 samples

WSC-4 Normal Probability Plot. Revised data set (plot annotations: 31%, 81%, 96%)

WSC-4 Histogram for Y. Revised data set (axis marks at m, m ± σ, m ± 2σ, m ± 3σ)

WSC-4 Error Distribution. Normal distribution vs. truncated normal distribution: the truncated version has bounded support [−β, +β] (here β = 3.5σ); both distributions are shown overlaid.

WSC-4 Main SIC postulate. All errors are limited! There exists a Maximum Error Deviation, β, such that for any error ε: Prob{ |ε| > β } = 0.
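
The postulate can be illustrated with a quick sketch: draw errors from a normal distribution truncated at ±β (here by rejection sampling) and observe that no sample ever exceeds the bound. The β and σ values are taken from the deck's univariate example; the sampling scheme itself is illustrative, not part of the SIC method.

```python
import random

def truncated_normal_error(sigma, beta, rng=random):
    """Draw one error from N(0, sigma) truncated to [-beta, +beta]
    by rejection sampling (simple, and fine for beta of a few sigma)."""
    while True:
        eps = rng.gauss(0.0, sigma)
        if abs(eps) <= beta:
            return eps

random.seed(0)
beta = 0.7    # maximum error deviation from the univariate example
sigma = 0.2   # roughly the RMSEC (1s) quoted on the sigma-rule slide
errors = [truncated_normal_error(sigma, beta) for _ in range(10_000)]

# The SIC postulate in action: Prob{|eps| > beta} = 0 for the whole sample.
assert max(abs(e) for e in errors) <= beta
print(max(abs(e) for e in errors))
```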

WSC-4 Part 2. Simple calculations

WSC-4 Case study. Simple Univariate Model. Data: training samples C1–C4 and test samples T1–T3 (x, y table shown on the slide). Model: y = ax + ε, with a truncated error distribution.

WSC-4 OLS calibration. OLS calibration minimizes the sum of squared residuals.
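
For the no-intercept model y = ax, minimizing the sum of squared residuals has the closed form a = Σxᵢyᵢ / Σxᵢ². A minimal sketch, using hypothetical x, y values standing in for samples C1–C4 (the slide's actual table values are not reproduced here):

```python
# Hypothetical training data standing in for samples C1..C4.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.8, 3.2, 3.9]

# OLS for the no-intercept model y = a*x: minimizing the sum of squared
# residuals gives the closed form a = sum(x*y) / sum(x*x).
a_ols = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
residuals = [yi - a_ols * xi for xi, yi in zip(x, y)]
print(round(a_ols, 4))
```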

WSC-4 Uncertainties in OLS. t3(P) is the quantile of Student's t-distribution for probability P with 3 degrees of freedom.

WSC-4 SIC calibration. The Maximum Error Deviation is known: β = 0.7 (= 3.5s). Calibration requires |ε| < β for every sample.

WSC-4 SIC calibration. Table: x, y, a_min, a_max for training samples C1–C4 (values shown on the slide).

WSC-4 Region of Possible Values. Table: x, y, a_min, a_max for training samples C1–C4; the RPV is the intersection of the per-sample intervals [a_min, a_max].
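
A minimal sketch of the univariate RPV: for x > 0, each sample's constraint |y − ax| ≤ β is equivalent to (y − β)/x ≤ a ≤ (y + β)/x, and the RPV is the intersection of these intervals. The data are hypothetical stand-ins for C1–C4; β = 0.7 is taken from the slides.

```python
beta = 0.7  # maximum error deviation from the slides

# Hypothetical stand-in data for samples C1..C4.
samples = [(1.0, 1.1), (2.0, 1.8), (3.0, 3.2), (4.0, 3.9)]

# For x > 0 each sample constrains the slope: (y - beta)/x <= a <= (y + beta)/x.
# The RPV is the intersection of all these intervals.
a_min = max((y - beta) / x for x, y in samples)
a_max = min((y + beta) / x for x, y in samples)
assert a_min <= a_max, "empty RPV: beta too small for this data"
print(a_min, a_max)  # the Region of Possible Values [a_min, a_max]
```

The samples whose interval endpoints realize the max and min are the boundary objects; dropping any other sample leaves the RPV unchanged.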

WSC-4 SIC prediction. Table: x, y, v−, v+ for test samples T1–T3 (values shown on the slide).
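
Given the RPV [a_min, a_max], the SIC prediction interval for a new x is the range of a·x as a sweeps the RPV. A sketch with illustrative RPV limits and hypothetical test points standing in for T1–T3:

```python
def sic_predict(a_min, a_max, x):
    """Prediction interval [v-, v+] for the response a*x when the slope
    is only known to lie in the RPV [a_min, a_max]."""
    lo, hi = a_min * x, a_max * x
    return (lo, hi) if lo <= hi else (hi, lo)  # handles x < 0 too

# RPV limits from a hypothetical univariate calibration (illustrative).
a_min, a_max = 0.83, 1.15
for x_new in (1.5, 2.5, 3.5):   # stand-ins for test samples T1..T3
    v_lo, v_hi = sic_predict(a_min, a_max, x_new)
    print(f"x = {x_new}: v- = {v_lo:.3f}, v+ = {v_hi:.3f}")
```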

WSC-4 Object Status. Calibration Set. Table: x, y, a_min, a_max for training samples C1–C4. Samples C2 & C4 are the boundary objects: they form the RPV. Samples C1 & C3 are insiders: they could be removed from the calibration set and the RPV would not change.

WSC-4 Object Status. Test Set Let’s consider what happens when a new sample is added to the calibration set.

WSC-4 Object Status. Insider. If we add sample T1, the RPV does not change. This object is an insider: its prediction interval lies inside the error interval.

WSC-4 Object Status. Outlier. If we add sample T2, the RPV disappears. This object is an outlier: its prediction interval lies outside the error interval.

WSC-4 Object Status. Outsider. If we add sample T3, the RPV becomes smaller. This object is an outsider: its prediction interval overlaps the error interval.

WSC-4 v +v + v –v – y y+y+ y–y– SIC-Residual and SIC-Leverage Definition 1. SIC-residual is defined as – This is a characteristic of bias Definition 2. SIC-leverage is defined as – This is a normalized precision  r r  h h They characterize interactions between prediction and error intervals

WSC-4 Object Status Plot. Using simple algebra one can prove the following statements. Statement 1. An object (x, y) is an insider iff |r(x, y)| ≤ 1 − h(x); these objects form triangle BCD. Statement 2. An object (x, y) is an outlier iff |r(x, y)| > 1 + h(x); these objects lie beyond lines AB and DE.
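
The two statements follow directly from interval geometry: the prediction interval lies inside the error interval iff |r| ≤ 1 − h, and the two intervals are disjoint iff |r| > 1 + h. A small sketch of the classification (the numeric examples are hypothetical):

```python
def sic_status(v_lo, v_hi, y, beta):
    """Classify an object from its SIC prediction interval [v_lo, v_hi],
    reference value y, and maximum error deviation beta."""
    r = ((v_lo + v_hi) / 2 - y) / beta   # SIC-residual (bias)
    h = (v_hi - v_lo) / (2 * beta)       # SIC-leverage (normalized precision)
    if abs(r) <= 1 - h:
        return "insider", r, h           # prediction interval inside error interval
    if abs(r) > 1 + h:
        return "outlier", r, h           # intervals do not overlap at all
    return "outsider", r, h              # intervals overlap partially

beta = 0.7
print(sic_status(1.0, 1.4, 1.3, beta))   # prediction inside [0.6, 2.0] -> insider
print(sic_status(1.0, 1.4, 3.0, beta))   # disjoint from [2.3, 3.7]     -> outlier
print(sic_status(1.0, 1.4, 1.9, beta))   # overlaps [1.2, 2.6]          -> outsider
```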

WSC-4 Object Status Classification: insiders, outsiders, absolute outsiders, and outliers (regions of the object status plot).

WSC-4 OLS Confidence versus SIC Prediction. The true response value, y, is always located within the SIC prediction interval; this has been confirmed by simulations repeated 100,000 times. Thus Prob{ v− < y < v+ } = 1.00. OLS confidence intervals, in contrast, tend to infinity as P increases: they are unreasonably wide.
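
A scaled-down sketch of that simulation, under stated assumptions: a known true slope, errors drawn uniformly from [−β, +β] (any bounded distribution works for the argument), and the univariate RPV construction from the earlier slides. Since the true slope always satisfies every calibration constraint, the true response can never leave the SIC interval. The sample sizes and x values are hypothetical.

```python
import random

random.seed(42)
beta, a_true = 0.7, 1.0
x_train = [1.0, 2.0, 3.0, 4.0]

def bounded_error():
    # any distribution supported on [-beta, +beta] suffices here
    return random.uniform(-beta, beta)

misses = 0
trials = 2000                     # the slide reports 100,000 repetitions
for _ in range(trials):
    y = [a_true * x + bounded_error() for x in x_train]
    # univariate RPV from the calibration constraints |y - a*x| <= beta
    a_min = max((yi - beta) / xi for xi, yi in zip(x_train, y))
    a_max = min((yi + beta) / xi for xi, yi in zip(x_train, y))
    x_new = 2.5
    v_lo, v_hi = a_min * x_new, a_max * x_new
    if not (v_lo <= a_true * x_new <= v_hi):
        misses += 1

print(misses)  # 0: the true response never leaves the SIC interval
```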

WSC-4 Beta Estimation. Minimum β. As β decreases (0.7, 0.6, 0.5, 0.4) the RPV shrinks; at β = 0.3 it vanishes. Hence β > b_min = 0.3.

WSC-4 Beta Estimation from Regression Residuals. e = y_measured − y_predicted; b_OLS = max{ |e1|, |e2|, ..., |en| } = 0.4. b_SIC = b_OLS · C(n), chosen so that Prob{ β < b_SIC } = 0.90; here b_SIC = 0.8.

WSC-4 Sigma Rule. Rule of thumb: 1s ~ RMSEC, 2s ~ b_min, 3s ~ b_OLS, 4s ~ b_SIC. In this example: RMSEC = 0.2 = 1s; b_min = 0.3 = 1.5s; b_OLS = 0.4 = 2s; b_SIC = 0.8 = 4s.

WSC-4 Part 3. Complicated SIC. Bivariate case

WSC-4 Octane Rating Example (by K. Esbensen). X-values are NIR measurements over 226 wavelengths; Y-values are reference measurements of octane number. Training set: 24 samples; test set: 13 samples.

WSC-4 Calibration

WSC-4 PLS Decomposition. X (n × p) and y − y0 (n × 1) are decomposed by PLS into a score matrix T (n × 2, two PCs), giving the regression model y − y0 = T a with parameter vector a (2 × 1).

WSC-4 Sigma Rule for the Octane Example. RMSEC = 0.27 = 1s; b_min = 0.48 = 1.8s; b_OLS = 0.58 = 2.2s; b_SIC = 0.88 = 3.3s. Take β = b_SIC = 0.88.

WSC-4 RPV in the Two-Dimensional Case. y1 − y0 − β ≤ t11 a1 + t12 a2 ≤ y1 − y0 + β; y2 − y0 − β ≤ t21 a1 + t22 a2 ≤ y2 − y0 + β; ...; yn − y0 − β ≤ tn1 a1 + tn2 a2 ≤ yn − y0 + β. We have a system of 2n = 48 inequalities in the two parameters a1 and a2.

WSC-4 Region of Possible Values

WSC-4 Close view on RPV. Calibration set: 24 samples; the boundary samples C7, C9, C13, C14, C18, C23 form the RPV. (Figures: the RPV in parameter space and the object status plot.)

WSC-4 v –v – SIC Prediction with Linear Programming Linear Programming Problem Vertex #a1a1 a2a2 t t ay yt t aa2a2 a1a1 Vertex # v +v +

WSC-4 Octane Prediction. Test Set: reference values, PLS ± 2·RMSEP, and SIC prediction. (Figures: prediction intervals for SIC & PLS and the object status plot.)

WSC-4 Conclusions. Real errors are limited: the truncated normal distribution is a much more realistic model for practical applications than an unbounded error distribution. Postulating that all errors are limited, we can develop a new concept of data modeling, the SIC method; it is based on this single assumption and nothing else. The SIC approach gives a new view of old chemometrics problems, such as outliers and influential samples. I think this is an interesting and helpful view.

WSC-4 OLS versus SIC. (Figures: SIC-residuals vs. OLS-residuals; SIC-leverages vs. OLS-leverages; the SIC object status plot and the OLS/PLS influence plot.)

WSC-4 Statistical view on OLS & SIC. Let us have a sample {x1, ..., xn} from a distribution with finite support [−1, +1]; the mean value a is unknown. (The slide tabulates the OLS and SIC statistics for estimating a, together with their deviations.)