Diagnostics and Remedial Measures

Diagnostics and Remedial Measures KNNL – Chapter 3

Diagnostics for Predictor Variables

Problems can occur when:
– Outliers exist among the X levels
– X levels are associated with run order when the experiment is run sequentially

Useful plots of the X levels:
– Dot plot (for discrete data)
– Histogram
– Box plot
– Sequence plot (X versus run #)

Residuals
The residual for observation i is ei = Yi − Ŷi, the difference between the observed and fitted values; the residuals serve as estimates of the unobservable model errors εi.

Model Departures Detected With Residuals and Plots

Departures:
– Relation between Y and X is not linear
– Errors have non-constant variance
– Errors are not uncorrelated
– Existence of outlying observations
– Non-normal errors
– Missing predictor variable(s)

Common plots:
– Residuals / absolute residuals versus predictor variable
– Residuals / absolute residuals versus predicted values
– Residuals versus omitted variables
– Residuals versus time
– Box plots, histograms, normal probability plots

Detecting Nonlinearity of Regression Function

Plot residuals versus X:
– Random cloud around 0 ⇒ linear relation
– U-shape or inverted U-shape ⇒ nonlinear relation

Maps Distribution Example (Table 3.1, Figure 3.5, p. 10)

Non-Constant Error Variance / Outliers / Non-Independence

Plot residuals versus X or predicted values:
– Random cloud around 0 ⇒ linear relation
– Funnel shape ⇒ non-constant variance
– Outliers fall far above (positive) or below (negative) the general cloud pattern

Plot absolute residuals, squared residuals, or the square root of the absolute residuals:
– Positive association ⇒ non-constant variance

Measurements made over time: plot residuals versus time order (expect a random cloud if independent):
– Linear trend ⇒ process "improving" or "worsening" over time
– Cyclical trend ⇒ measurements close in time are similar

Non-Normal Errors

– Box plot of residuals: can confirm symmetry and lack of outliers
– Check the proportion of residuals that lie within 1 standard deviation of 0, 2 SD, etc., where SD = sqrt(MSE), and compare with the normal proportions (about 68%, 95%, ...)
– Normal probability plot of residuals versus their expected values under normality: should fall approximately on a straight line (only works well with moderate to large samples); qqnorm(e); qqline(e) in R
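The within-k-SD check can be sketched as follows; the residual vector is hypothetical, and SD = sqrt(MSE) with MSE = SSE/(n − 2) for simple linear regression:

```python
import math

# Hypothetical residuals from a simple linear regression (n = 10, 2 parameters)
e = [-1.2, 0.4, 0.9, -0.3, 1.1, -0.8, 0.2, -0.5, 0.7, -0.5]
n, p = len(e), 2

mse = sum(ei ** 2 for ei in e) / (n - p)   # SSE / (n - 2) for SLR
sd = math.sqrt(mse)

within_1sd = sum(abs(ei) <= sd for ei in e) / n
within_2sd = sum(abs(ei) <= 2 * sd for ei in e) / n
print(within_1sd, within_2sd)  # compare with ~0.68 and ~0.95
```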

Omission of Important Predictors If data are available on other variables, plots of residuals versus X, drawn separately for each level or sub-range of the new variable(s), can suggest adding predictor variable(s). For instance, if a regression of salary on experience has been fit, we could view residuals versus experience for males and females separately; if one group tends to have positive residuals and the other negative, add gender to the model.

Test for Independence – Runs Test

Runs test (presumes the data are in time order):
– Write out the sequence of +/− signs of the residuals
– Count n1 = # of positive residuals, n2 = # of negative residuals
– Count u = # of "runs" of positive and negative residuals
– If n1 + n2 ≤ 20, refer to a table of critical values (not random if u is too small)
– If n1 + n2 > 20, use a large-sample (approximate) z-test with
  E{u} = 2n1n2/(n1 + n2) + 1,  σ²{u} = 2n1n2(2n1n2 − n1 − n2) / [(n1 + n2)²(n1 + n2 − 1)],  z* = (u − E{u}) / σ{u}
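The counting steps above can be sketched in Python; the alternating-sign residual sequence is a made-up extreme case that yields the maximum possible number of runs:

```python
import math

def runs_test(e):
    """Wald-Wolfowitz runs test on the signs of the residuals.

    Returns (n1, n2, u, z); a strongly negative z means too few runs
    (suggesting positive autocorrelation), a large positive z too many.
    """
    signs = [ei > 0 for ei in e if ei != 0]  # drop exact zeros
    n1 = sum(signs)
    n2 = len(signs) - n1
    # A new run starts wherever the sign changes
    u = 1 + sum(signs[i] != signs[i - 1] for i in range(1, len(signs)))
    mean_u = 2 * n1 * n2 / (n1 + n2) + 1
    var_u = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
             / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    z = (u - mean_u) / math.sqrt(var_u)
    return n1, n2, u, z

# Perfectly alternating signs -> 30 runs from 30 residuals
n1, n2, u, z = runs_test([(-1) ** i for i in range(30)])
print(u)
```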

Test for Independence – Durbin-Watson Test
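The statistic itself was an image that did not survive the transcript; the standard Durbin-Watson form is D = Σt=2..n (et − et−1)² / Σt=1..n et², with D near 2 suggesting uncorrelated errors and D well below 2 suggesting positive autocorrelation. A minimal sketch:

```python
def durbin_watson(e):
    """D = (sum of squared successive differences) / (sum of squares).

    D near 2 -> no autocorrelation; D well below 2 -> positive
    autocorrelation; D near 4 -> negative autocorrelation.
    """
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(et ** 2 for et in e)
    return num / den

# Steadily increasing residuals are strongly positively autocorrelated
print(durbin_watson([1.0, 2.0, 3.0, 4.0]))  # 3/30 = 0.1
```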

Test for Normality of Residuals

Correlation test:
– Obtain the correlation between the ordered observed residuals and their expected values under normality (see slide 7)
– Compare the correlation with the critical value based on the α-level from Table B.6, page 1329
– A good approximation for the α = 0.05 critical value is 1.02 − 1/sqrt(10n)
– Reject the null hypothesis of normal errors if the correlation falls below the table value

Shapiro-Wilk test: performed by most software packages. Related to the correlation test, but with more complex calculations; see the NFL Point Spreads and Actual Scores Case Study for a description of one version.

Equal (Homogeneous) Variance - I
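The formulas on this slide were images that did not survive the transcript. In KNNL, the first test for equal variance is the Brown-Forsythe test: split the cases into two groups by X level, compute each residual's absolute deviation from its group's median residual, and apply a two-sample t test to those deviations. A minimal sketch on made-up residuals:

```python
import math
import statistics

def brown_forsythe(e1, e2):
    """Two-sample t statistic on |e - group median| (Brown-Forsythe).

    e1, e2: residuals for the low-X and high-X halves of the data.
    A large |t| (vs. t with n1 + n2 - 2 df) suggests the error
    variance changes with X.
    """
    d1 = [abs(e - statistics.median(e1)) for e in e1]
    d2 = [abs(e - statistics.median(e2)) for e in e2]
    n1, n2 = len(d1), len(d2)
    m1, m2 = statistics.fmean(d1), statistics.fmean(d2)
    pooled = (sum((d - m1) ** 2 for d in d1) +
              sum((d - m2) ** 2 for d in d2)) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled * (1 / n1 + 1 / n2))

# Hypothetical residuals: spread is visibly larger in the high-X group
t = brown_forsythe([-0.2, 0.1, -0.1, 0.3, -0.1],
                   [-2.5, 1.8, -1.2, 3.1, -1.2])
print(round(t, 2))
```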

Equal (Homogeneous) Variance - II
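The second equal-variance test in KNNL is the Breusch-Pagan test: regress the squared residuals on X, take SSR* from that regression, and compute X²BP = (SSR*/2) / (SSE/n)², compared with a chi-square(1) critical value. A sketch (the four-point dataset is artificial and far too small for this large-sample test; it only exercises the arithmetic):

```python
from statistics import fmean

def ols(x, y):
    """Simple least-squares fit; returns (b0, b1)."""
    xbar, ybar = fmean(x), fmean(y)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
         sum((xi - xbar) ** 2 for xi in x)
    return ybar - b1 * xbar, b1

def breusch_pagan(x, y):
    """X^2_BP = (SSR*/2) / (SSE/n)^2, where SSR* is the regression
    sum of squares from regressing e^2 on X; compare with chi-square(1)."""
    b0, b1 = ols(x, y)
    e2 = [(yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)]
    sse = sum(e2)
    g0, g1 = ols(x, e2)  # auxiliary regression of e^2 on X
    ssr_star = sum((g0 + g1 * xi - fmean(e2)) ** 2 for xi in x)
    n = len(y)
    return (ssr_star / 2) / (sse / n) ** 2

# Tiny artificial example chosen so the arithmetic works out exactly
stat = breusch_pagan([1, 2, 3, 4], [3, 1, -1, 7])
print(stat)  # 0.4
```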

Linearity of Regression
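This slide's formulas were lost in transcription. KNNL's F test for lack of fit needs replicate Y observations at some X levels; it splits SSE into pure error (SSPE, variation within replicate groups) and lack of fit (SSLF = SSE − SSPE) and compares F* = (SSLF/(c − 2)) / (SSPE/(n − c)) with F(c − 2, n − c), where c is the number of distinct X levels. A sketch on hypothetical data:

```python
from collections import defaultdict
from statistics import fmean

def lack_of_fit_F(x, y, b0, b1):
    """F* = (SSLF / (c - 2)) / (SSPE / (n - c)) for a fitted SLR line."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    n, c = len(y), len(groups)
    # Pure error: variation of Y around its own mean at each X level
    sspe = sum(sum((yi - fmean(ys)) ** 2 for yi in ys)
               for ys in groups.values())
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return ((sse - sspe) / (c - 2)) / (sspe / (n - c))

# Hypothetical data with replicates at each X; Y is essentially X**2,
# so the straight-line fit should show strong lack of fit
x = [1, 1, 2, 2, 3, 3, 4, 4]
y = [1.1, 0.9, 4.2, 3.8, 9.1, 8.9, 16.2, 15.8]

# Least-squares slope and intercept for these data
xbar, ybar = fmean(x), fmean(y)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar

F = lack_of_fit_F(x, y, b0, b1)
print(F)  # large F* -> compare with F(c - 2, n - c) = F(2, 4)
```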

Remedial Measures

– Nonlinear relation: add polynomials, fit an exponential regression function, or transform Y and/or X
– Non-constant variance: weighted least squares, transform Y and/or X, or fit a generalized linear model
– Non-independence of errors: transform Y or use generalized least squares
– Non-normality of errors: Box-Cox transformation, or fit a generalized linear model
– Omitted predictors: include important predictors in a multiple regression model
– Outlying observations: robust estimation

Transformations for Non-Linearity – Constant Variance
X' = 1/X    X' = e^(−X)    X' = X²    X' = e^X    X' = √X    X' = ln(X)

Transformations for Non-Linearity – Non-Constant Variance Y’ = 1/Y Y’ = √Y Y’ = ln(Y)

Box-Cox Transformations

Automatically selects a transformation from the power family (Y' = Y^λ) with the goal of obtaining normality, linearity, and constant variance (not always successful, but widely used).

Goal: fit the model Y' = β0 + β1X + ε for various power transformations of Y, selecting the transformation that produces the minimum SSE (the maximum likelihood choice).

Procedure: over a range of λ from, say, −2 to +2, obtain the standardized values Wi = K1(Yi^λ − 1) for λ ≠ 0 and Wi = K2 ln(Yi) for λ = 0, where K2 = geometric mean of the Yi and K1 = 1/(λK2^(λ−1)), and regress Wi on X (assuming all Yi > 0, although adding a constant to Y won't affect the shape or spread of its distribution); choose the λ with the smallest SSE.
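The grid-search procedure can be sketched as follows, using the standardized W of KNNL (3.36) so that SSE values are comparable across λ. The data are hypothetical: Y = (1 + 2X)² exactly, so √Y is linear in X and λ = 0.5 should win:

```python
import math
from statistics import fmean

def boxcox_sse(x, y, lambdas):
    """For each candidate lambda, transform Y to the standardized W
    (KNNL 3.36) and return the SSE from regressing W on X."""
    logs = [math.log(yi) for yi in y]
    k2 = math.exp(fmean(logs))  # geometric mean of Y
    out = {}
    for lam in lambdas:
        if lam == 0:
            w = [k2 * math.log(yi) for yi in y]
        else:
            k1 = 1 / (lam * k2 ** (lam - 1))
            w = [k1 * (yi ** lam - 1) for yi in y]
        xbar, wbar = fmean(x), fmean(w)
        b1 = sum((xi - xbar) * (wi - wbar) for xi, wi in zip(x, w)) / \
             sum((xi - xbar) ** 2 for xi in x)
        b0 = wbar - b1 * xbar
        out[lam] = sum((wi - (b0 + b1 * xi)) ** 2 for xi, wi in zip(x, w))
    return out

x = [float(i) for i in range(1, 11)]
y = [(1 + 2 * xi) ** 2 for xi in x]
sse = boxcox_sse(x, y, [lam / 10 for lam in range(-20, 21)])
best = min(sse, key=sse.get)
print(best)  # 0.5: the square-root transformation
```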

Lowess (Smoothed) Plots

– Nonparametric method of obtaining a smooth plot of the regression relation between Y and X
– Fits a regression within a small neighborhood of each point along the X axis
– Weights observations closer to the specific point more heavily than more distant points
– Re-weights after fitting, putting lower weight on observations with larger residuals (in absolute value)
– Obtains a fitted value for each point after the "final" regression is fit
– The smooth is plotted along with the linear fit and its confidence bands; the linear fit is adequate if the lowess curve lies within the bands
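A stripped-down sketch of one lowess evaluation: a single pass of locally weighted linear regression with tricube weights, omitting the robustness re-weighting step described above. The neighborhood fraction and data are illustrative only:

```python
def lowess_fit(x, y, x0, frac=0.5):
    """Fitted value at x0 from one pass of locally weighted linear
    regression with tricube weights (no robustness re-weighting)."""
    # Neighborhood: the fraction `frac` of points nearest to x0
    k = max(2, round(frac * len(x)))
    nearest = sorted(range(len(x)), key=lambda i: abs(x[i] - x0))[:k]
    dmax = max(abs(x[i] - x0) for i in nearest)
    # Tricube weights: near points count heavily, the farthest gets 0
    w = {i: (1 - (abs(x[i] - x0) / dmax) ** 3) ** 3 for i in nearest}
    # Weighted least squares for the local line
    sw = sum(w.values())
    xbar = sum(w[i] * x[i] for i in nearest) / sw
    ybar = sum(w[i] * y[i] for i in nearest) / sw
    sxx = sum(w[i] * (x[i] - xbar) ** 2 for i in nearest)
    sxy = sum(w[i] * (x[i] - xbar) * (y[i] - ybar) for i in nearest)
    b = sxy / sxx
    return ybar + b * (x0 - xbar)

x = [float(i) for i in range(20)]
y = [3 + 2 * xi for xi in x]  # exactly linear data
print(lowess_fit(x, y, 7.0))  # the local fit recovers the line: 17.0
```

Repeating the evaluation over a grid of x0 values and connecting the fitted values produces the smoothed curve that is overlaid on the linear fit.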