
Regression analysis
Relating two data matrices/tables (X-data and Y-data) to each other. Purpose: prediction and interpretation.

Typical examples
- Spectroscopy: predict chemistry from spectral measurements
- Product development: relate sensory data to chemistry data
- Marketing: relate sensory data to consumer preferences

Topics covered
- Simple linear regression
- The selectivity problem: a reason why multivariate methods are needed
- The collinearity problem: a reason why data compression is needed
- The outlier problem: why and how to detect outliers

Simple linear regression
One y and one x. Use x to predict y, with a linear model/equation fitted by least squares.

Data structure
One X-variable column and one Y-variable column, with the same objects (rows) in each.

Simple linear regression: y = b0 + b1·x + e. Least squares (LS) is used for estimation of the regression coefficients b0 (intercept) and b1 (slope).
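A minimal sketch of the least-squares fit above, using only NumPy; the x and y arrays are made-up illustration data, not values from the slides.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form least-squares estimates for y = b0 + b1*x + e
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

e = y - (b0 + b1 * x)          # residuals
print(b0, b1, np.sum(e ** 2))  # intercept, slope, residual sum of squares
```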

Workflow: data (X, Y) are pre-processed and checked for outliers, then regression analysis gives a model; the model is used for interpretation and for prediction from future X.

The selectivity problem A reason why multivariate methods are needed

Regression can also be used with several Y-variables.

Multiple linear regression
Provides:
- predicted values
- regression coefficients
- diagnostics
If there are many highly collinear variables:
- unstable regression equations
- coefficients that are difficult to interpret: many and unstable

Collinearity: the problem of correlated X-variables. Model: y = b0 + b1·x1 + b2·x2 + e. Regression in this case is fitting a plane to the data. When the two x's have high correlation, the fitted equation/plane is unstable (in the direction with little variability).
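A small sketch (with made-up data) of why collinearity makes MLR coefficients unstable: two nearly identical x-variables, refit under slightly different noise, and the individual coefficients swing wildly even though their sum stays stable.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)          # x2 is almost a copy of x1
X = np.column_stack([np.ones(n), x1, x2])    # intercept plus two predictors

for trial in range(3):
    y = x1 + x2 + 0.1 * rng.normal(size=n)   # true relation: y = x1 + x2 + noise
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(trial, b)  # b[1] and b[2] vary a lot; b[1] + b[2] stays near 2
```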

Possible solutions
- Select the most important wavelengths/variables (stepwise methods)
- Compress the variables to the most dominating dimensions (PCR, PLS)
We will concentrate on the latter (the two approaches can also be combined).

Data compression
- We will first discuss the situation with one y-variable
- Focus on ideas and principles
- Provides a regression equation (as above) and plots for interpretation

Model for data compression methods (centred X and y):
X = TPᵀ + E
y = Tq + f
T: scores, the carrier of information from X to y
P, q: loadings
E, f: residuals (noise)

Regression by data compression: PCA compresses the data (x1, x2, x3) to t-scores along PC1; y is then regressed on the scores, with slope q.

Diagram: MLR relates y directly to x1–x4; PCR and PLS first compress x1–x4 into a few scores (t1, t2) and then relate y to the scores.

PCR and PLS: for each factor/component,
- PCR: maximize the variance of linear combinations of X
- PLS: maximize the covariance between linear combinations of X and y
Each factor is subtracted before the next is computed.

Principal component regression (PCR)
- Uses principal components
- Solves the collinearity problem; stable solutions
- Provides plots for interpretation (scores and loadings)
- Well understood
- Outlier diagnostics
- Easy to modify
- But uses only X to determine the components
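A minimal PCR sketch, assuming scikit-learn is available: compress X with PCA, then regress y on the scores. The data shapes and values are illustrative only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 10))   # e.g. 30 samples, 10 spectral variables
y = X[:, :3].sum(axis=1) + 0.1 * rng.normal(size=30)

# Centre/scale, compress to 3 components, then ordinary regression on the scores
pcr = make_pipeline(StandardScaler(), PCA(n_components=3), LinearRegression())
pcr.fit(X, y)
print(pcr.predict(X[:5]))
```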

PLS regression
- Easy to compute
- Stable solutions
- Provides scores and loadings
- Often needs fewer components than PCR
- Sometimes gives better predictions
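A matching PLS sketch, again assuming scikit-learn; PLSRegression maximizes the covariance between linear combinations of X and y, one component at a time, and exposes the scores and loadings used for plotting.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 10))   # illustrative data, as in the PCR sketch
y = X[:, :3].sum(axis=1) + 0.1 * rng.normal(size=30)

pls = PLSRegression(n_components=3).fit(X, y)
print(pls.x_scores_.shape)      # T (scores): 30 x 3
print(pls.x_loadings_.shape)    # P (loadings): 10 x 3
print(pls.predict(X[:5]).ravel())
```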

PCR and PLS for several Y-variables
- PCR: computed for each Y; each Y is regressed onto the principal components
- PLS: the algorithm is easily modified; it maximises the covariance between linear combinations of X and Y
- For both methods: regression equations and plots

Validation is important
- Measure the quality of the predictor
- Determine A, the number of components
- Compare methods

Prediction testing: split the data into a calibration set, used to estimate the coefficients, and a testing/validation set, where y is predicted using those coefficients.
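A minimal sketch of prediction testing with a train/test split, assuming scikit-learn; calibrate on one part of the (made-up) data, then validate the predictions on the held-out part.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.normal(size=(40, 10))
y = X[:, :3].sum(axis=1) + 0.1 * rng.normal(size=40)

X_cal, X_test, y_cal, y_test = train_test_split(X, y, test_size=0.4, random_state=0)
model = PLSRegression(n_components=3).fit(X_cal, y_cal)  # calibration
resid = model.predict(X_test).ravel() - y_test           # testing/validation
print(np.sqrt(np.mean(resid ** 2)))                      # RMSEP on the test set
```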

Cross-validation: leave out part of the data in turn; calibrate (find y = f(x), estimate the coefficients) on the rest, then predict the left-out y using those coefficients.

Validation
- Compute RMSEP = sqrt( Σ(ŷi − yi)² / N ) over the validation predictions
- Plot RMSEP versus the number of components
- Choose the number of components with the best RMSEP properties
- Compare for different methods
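A sketch of this choice, assuming scikit-learn: compute cross-validated RMSEP for each candidate number of PLS components and pick the value where it is lowest or levels off. Data and fold count are illustrative.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 10))
y = X[:, :3].sum(axis=1) + 0.1 * rng.normal(size=30)

for a in range(1, 7):
    y_hat = cross_val_predict(PLSRegression(n_components=a), X, y, cv=5).ravel()
    rmsep = np.sqrt(np.mean((y_hat - y) ** 2))
    print(a, round(rmsep, 3))   # choose the a with the best (lowest, stable) RMSEP
```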

Example: RMSEP for NIR calibration of protein in wheat (6 NIR wavelengths, 12 calibration samples, 26 test samples, MLR).

Conceptual illustration of important phenomena: estimation error and model error.

Prediction vs. cross-validation
- Prediction testing: measures the prediction ability of the predictor at hand; requires much data
- Cross-validation: measures a property of the method; better for smaller data sets

Validation
One should also plot measured versus predicted y-values. The correlation can be computed, but it can sometimes be misleading.

Example: plot of measured versus predicted protein (NIR calibration).

Outlier detection: typical causes
- Instrument error or noise
- Drift of signal (over time)
- Misprints
- Samples outside the normal range (a different population)

Outlier detection
Outliers can be detected because we have
- a model for the spectral data (X = TPᵀ + E)
- a model for the relationship between X and y (y = Tq + f)

Outlier detection tools
- Residuals: X-residuals and y-residuals. X-residuals as before; the y-residual is the difference between measured and predicted y.
- Leverage: h_i for each sample i.
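A sketch of these diagnostics using PCA scores on made-up data. The leverage formula h_i = 1/N + Σ_a t_ia² / (t_aᵀ t_a) is the standard one for centred data; printing the largest values is illustrative, not a formal cut-off.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 10))
Xc = X - X.mean(axis=0)         # centred data

# PCA via SVD: scores T and loadings P for A components
A = 3
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
T = U[:, :A] * s[:A]            # scores
P = Vt[:A].T                    # loadings

E = Xc - T @ P.T                # X-residuals: what the A components miss
x_resid = np.sqrt((E ** 2).sum(axis=1))   # per-sample residual norm
leverage = 1 / len(Xc) + ((T ** 2) / (T ** 2).sum(axis=0)).sum(axis=1)

print(np.argsort(x_resid)[-3:])   # samples with the largest X-residuals
print(np.argsort(leverage)[-3:])  # samples with the largest leverage
```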