Regression / Calibration MLR, RR, PCR, PLS

Paul Geladi, Head of Research, NIRCE; Unit of Biomass Technology and Chemistry, Swedish University of Agricultural Sciences, Umeå; Technobothnia, Vasa

Univariate regression

[Figure: scatter plot of y against x, annotated with offset and slope]

[Figure: fitted line with offset a and slope b]
y = a + bx + ε

[Figure: scatter plot of y against x]

[Figure: a linear fit that underfits the data]

[Figure: an overfitted curve through the data]

[Figure: a quadratic fit of y against x]

Multivariate linear regression

y = f(x) works sometimes; y = f(x1, x2, …, xK) works only for a few variables. Measurement noise! ∞ possible functions.

[Figure: data matrix X (I × K) and response vector y (I × 1)]

Linear approximation: y = f(x) is simplified to
y = b0 + b1x1 + b2x2 + … + bKxK + f

Nomenclature
y = b0 + b1x1 + b2x2 + … + bKxK + f
- y: response
- xk: predictors
- bk: regression coefficients
- b0: offset (constant)
- f: residual

[Figure: X (I × K) and y] With X and y mean-centered, the offset b0 drops out.

y = b1x1 + b2x2 + … + bKxK + f  (one equation per sample, I samples)

In matrix notation: y = Xb + f, with X an I × K matrix, b a K × 1 vector, and f an I × 1 residual vector.

X and y are known (measurable); b and f are unknown. Without constraints on f there is no solution, so f must be constrained.

The MLR solution: Multiple Linear Regression / Ordinary Least Squares (OLS)

Least squares: b = (X'X)^-1 X'y. Problems?
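A minimal numerical sketch of this least-squares solution (NumPy, with simulated data as an assumption; not part of the original slides):

```python
# Least-squares sketch: b = (X'X)^-1 X'y on simulated data.
import numpy as np

rng = np.random.default_rng(0)
I, K = 20, 3                            # I samples, K predictors, I >= K
X = rng.normal(size=(I, K))             # in practice X, y are mean-centered first
b_true = np.array([1.0, -2.0, 0.5])
y = X @ b_true + 0.1 * rng.normal(size=I)

# Solve the normal equations; np.linalg.solve avoids forming the inverse,
# which is cheaper and numerically safer than inv(X.T @ X).
b = np.linalg.solve(X.T @ X, X.T @ y)
f = y - X @ b                           # residual vector
print(b)                                # close to b_true
```

The "problems" appear exactly when X'X is singular or near-singular, as the next slides spell out.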

3b1 + 4b2 = 1
4b1 + 5b2 = 0
One solution

3b1 + 4b2 = 1
4b1 + 5b2 = 0
b1 + b2 = 4
No solution

3b1 + 4b2 + b3 = 1
4b1 + 5b2 + b3 = 0
∞ solutions

b = (X'X)^-1 X'y
- K > I: ∞ solutions
- I > K: no exact solution
- error in X
- error in y
- the inverse may not exist
- the inverse may be unstable

3b1 + 4b2 + e1 = 1
4b1 + 5b2 + e2 = 0
b1 + b2 + e3 = 4
Solution: adding a residual term to each equation makes the system solvable.

Wanted solution
- I ≥ K
- no unstable or non-existent inverse
- no noise in X

Diagnostics
y = Xb + f
SStot = SSmod + SSres
R² = SSmod / SStot = 1 − SSres / SStot
(coefficient of determination)

Diagnostics
y = Xb + f
SSres = f'f
RMSEC = [SSres / (I − A)]^(1/2)
(Root Mean Squared Error of Calibration; A = number of fitted parameters or components)
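Both diagnostics in one sketch (the helper and its names are mine; X and y are assumed mean-centered as on the earlier slides):

```python
# Calibration diagnostics: R^2 and RMSEC for mean-centered X (I x K), y (I,).
import numpy as np

def calibration_diagnostics(X, y, b, A):
    """A = number of fitted parameters/components (A = K for MLR)."""
    f = y - X @ b                       # calibration residuals
    SSres = f @ f                       # f'f
    SStot = y @ y                       # valid because y is mean-centered
    R2 = 1.0 - SSres / SStot            # coefficient of determination
    RMSEC = (SSres / (len(y) - A)) ** 0.5
    return R2, RMSEC
```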

Alternatives to MLR/OLS

Ridge Regression (RR)
The identity matrix I is the easiest matrix to invert, so b = (X'X)^-1 X'y becomes
b = (X'X + kI)^-1 X'y
with the ridge constant k kept as small as possible.
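This is a one-line change relative to OLS; a minimal sketch, assuming mean-centered data and a user-chosen k:

```python
# Ridge regression: b = (X'X + kI)^-1 X'y; k > 0 stabilizes the inversion.
import numpy as np

def ridge(X, y, k):
    K = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(K), X.T @ y)

# Usage sketch: keep k as small as possible while the solution stays stable,
# e.g. compare ridge(X, y, 1e-3) against ridge(X, y, 1e-2).
```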

Problems
- choice of the ridge constant
- no diagnostics

Principal Component Regression (PCR)
- I ≥ K
- easy inversion

Principal Component Regression (PCR)
[Figure: PCA decomposes X (I × K) into a score matrix T (I × A)]
- A ≤ I
- T orthogonal
- noise in X removed

Principal Component Regression (PCR)
y = Td + f
d = (T'T)^-1 T'y
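A compact PCR sketch via the SVD (the SVD route is my choice; NIPALS-style PCA would give the same scores):

```python
# PCR: PCA on X (I x K, mean-centered), then regress y on the first A scores.
import numpy as np

def pcr(X, y, A):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    T = U[:, :A] * s[:A]          # scores T (I x A), orthogonal columns
    P = Vt[:A].T                  # loadings (K x A)
    d = (T.T @ y) / s[:A] ** 2    # d = (T'T)^-1 T'y, since T'T = diag(s^2)
    b = P @ d                     # back to coefficients for the original X
    return b, T, P, d
```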

Problem: how many components should be used? (see the cross-validation sketch below)
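The slide leaves this open; a common answer (my assumption, not stated here) is cross-validation, choosing the A with the smallest RMSECV. A sketch reusing pcr() from the previous block:

```python
# RMSECV for a given number of components A, by k-fold cross-validation.
# Simplification: X, y are assumed pre-centered (strictly, centering should
# be redone inside each training fold).
import numpy as np

def rmsecv_pcr(X, y, A, n_splits=5):
    idx = np.arange(len(y))
    press = 0.0
    for fold in np.array_split(idx, n_splits):
        train = np.setdiff1d(idx, fold)
        b, *_ = pcr(X[train], y[train], A)     # pcr() defined above
        e = y[fold] - X[fold] @ b              # prediction errors on the fold
        press += e @ e
    return (press / len(y)) ** 0.5

# Choose A with the minimum RMSECV, e.g.:
# best_A = min(range(1, 11), key=lambda A: rmsecv_pcr(X, y, A))
```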

Advantages
- PCA is done on the data itself
- outliers become visible
- classes become visible
- noise in X removed

Partial Least Squares Regression

[Figure: X and Y data blocks with score vectors t (from X) and u (from Y)]

[Figure: outer relationship: X and Y blocks with scores t, u and weights w', q']

[Figure: inner relationship: regression between the score vectors t and u]

[Figure: full PLS decomposition with A components: X scores t and loadings p', Y scores u and loadings q', weights w']

Advantages
- X is decomposed
- Y is decomposed
- noise in X left out
- noise in Y left out

PCR and PLS are one-component-at-a-time methods: after each component a residual is calculated, and the next component is fitted to that residual.
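A sketch of this one-component-at-a-time logic for PLS1 (NIPALS-style deflation; the variable names are mine, matching the w, t, p, q of the diagrams):

```python
# PLS1 by NIPALS-style deflation: each component is fit on the residual
# left by the previous one. X (I x K) and y (I,) are mean-centered.
import numpy as np

def pls1(X, y, A):
    X, y = X.copy(), y.copy()
    W, P, Q = [], [], []
    for _ in range(A):
        w = X.T @ y
        w /= np.linalg.norm(w)         # weight vector w
        t = X @ w                      # X scores t
        tt = t @ t
        p = X.T @ t / tt               # X loadings p
        q = (y @ t) / tt               # inner-relation coefficient q
        X -= np.outer(t, p)            # deflate X: keep only the residual
        y = y - t * q                  # deflate y likewise
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    return W @ np.linalg.solve(P.T @ W, Q)   # b for the original X
```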

Another view
y = Xb + f (MLR)
y = XbRR + fRR
y = XbPCR + fPCR
y = XbPLS + fPLS
Each method yields its own coefficient vector and residual.

Prediction

[Figure: calibration set Xcal (I × K) with ycal, and test set Xtest (J × K) with ytest and predictions yhat]

Prediction diagnostics
yhat = Xtest b
ftest = ytest − yhat
PRESS = ftest' ftest
RMSEP = [PRESS / J]^(1/2)
(Root Mean Squared Error of Prediction)

Prediction diagnostics
yhat = Xtest b
ftest = ytest − yhat
R²test = Q² = 1 − ftest' ftest / ytest' ytest
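Both prediction slides in one sketch (the function and its names are mine; ytest is assumed centered with the calibration mean):

```python
# Prediction diagnostics: RMSEP, Q^2 and bias on a test set of J samples.
import numpy as np

def prediction_diagnostics(Xtest, ytest, b):
    yhat = Xtest @ b
    f = ytest - yhat                    # test residuals f_test
    PRESS = f @ f                       # f_test' f_test
    RMSEP = (PRESS / len(ytest)) ** 0.5
    Q2 = 1.0 - PRESS / (ytest @ ytest)  # = R^2_test
    bias = f.mean()                     # (1/J) * sum(f_test), see the bias slide
    return RMSEP, Q2, bias
```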

Some rules of thumb
- R² > … per PLS component
- R²test > 0.5
- R² − R²test < 0.2

Bias
In calibration, f = y − Xb always has zero bias.
For prediction: ftest = ytest − yhat
bias = (1/J) Σ ftest

Leverage and influence
b = (X'X)^-1 X'y
yhat = Xb = X(X'X)^-1 X'y = Hy, where H is the hat matrix
The diagonal elements of H are the leverages.

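A leverage sketch (forming H explicitly is fine for small I; only its diagonal is needed):

```python
# Leverage: diagonal of the hat matrix H = X (X'X)^-1 X'.
import numpy as np

def leverage(X):
    H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix, I x I
    return np.diag(H)          # large values flag influential samples
```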

Leverage - influence

Residual plot

Residuals
- check the histogram of f
- check the X-residual matrix E variable-wise
- check E object-wise
(a plotting sketch follows)
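A plotting sketch for these three checks (matplotlib assumed available; E denotes the X-residual matrix):

```python
# Three residual checks: histogram of f, variable-wise and object-wise RMS of E.
import numpy as np
import matplotlib.pyplot as plt

def residual_checks(f, E):
    fig, axes = plt.subplots(1, 3, figsize=(12, 3))
    axes[0].hist(f, bins=20)
    axes[0].set_title("Histogram of f")
    axes[1].plot(np.sqrt((E ** 2).mean(axis=0)))
    axes[1].set_title("Variable-wise RMS of E")
    axes[2].plot(np.sqrt((E ** 2).mean(axis=1)), "o")
    axes[2].set_title("Object-wise RMS of E")
    plt.show()
```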

[Figure: full PLS decomposition with A components: X scores t and loadings p', Y scores u and loadings q', weights w']

Plotting: line plots
- scree plot
- RMSEC, RMSECV, RMSEP
- loading plot against wavelength
- score plot against time
- residual against sample
- residual against yhat
- T² against sample
- H (leverage) against sample

Plotting: scatter plots (2D, 3D)
- score plot
- loading plot
- biplot
- H against residual
- inner relation: t against u
- weights w and q

Nonlinearities

Remedies for nonlinearities: make the nonlinear data fit a linear model, or make the model nonlinear.
- fundamental theory, e.g. going from transmittance to absorbance (see the sketch after this list)
- use extra latent variables in PCR or PLSR
- use transformations of latent variables
- remove disturbing variables
- find subsets that behave linearly
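For the first remedy, a minimal sketch of the transmittance-to-absorbance transform (A = −log10 T, from the Beer-Lambert law; the clipping guard is my addition):

```python
# Linearizing remedy from theory: absorbance A = -log10(T) is linear in
# concentration (Beer-Lambert law), while transmittance T is not.
import numpy as np

def transmittance_to_absorbance(T):
    return -np.log10(np.clip(T, 1e-12, None))   # clip guards against log(0)
```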

Remedies for nonlinearities (continued)
- use intrinsically nonlinear methods
- locally transform variables X, y, or both nonlinearly (powers, logarithms, adding powers)
- transform in a neighbourhood (window methods)
- use global transformations (Fourier, wavelet)
- GIFI-type discretization