Model Checking: Using residuals to check the validity of the linear regression model assumptions.


The simple linear regression model: The mean of the responses, E(Y_i), is a linear function of the x_i. The errors, ε_i, and hence the responses Y_i, are independent. The errors, ε_i, and hence the responses Y_i, are normally distributed. The errors, ε_i, and hence the responses Y_i, have equal variances (σ^2) for all x values.

The simple linear regression model: Assume (!!) the response is a linear function of trend and error, Y_i = β_0 + β_1 x_i + ε_i, with the independent error terms ε_i following a normal distribution with mean 0 and equal variance σ^2.

Why do we have to check our model? All estimates, intervals, and hypothesis tests have been developed assuming that the model is correct. If the model is incorrect, then the formulas and methods we use are at risk of being incorrect.

When should we worry most? All tests and intervals are very sensitive to departures from independence and to moderate departures from equal variance. Tests and intervals for β_0 and β_1 are fairly robust against departures from normality. Prediction intervals are quite sensitive to departures from normality.

What can go wrong with the model? Regression function is not linear. Error terms are not independent. Error terms are not normal. Error terms do not have equal variance. The model fits all but one or a few outlier observations. An important predictor variable has been left out of the model.

The basic idea of residual analysis: The observed residuals, e_i = y_i - ŷ_i (observed response minus fitted value), should reflect the properties assumed for the unknown true error terms, ε_i = Y_i - E(Y_i). So, investigate the observed residuals to see if they behave “properly.”

Distinction between true errors ε_i and residuals e_i

The sample mean of the residuals e_i is always 0. (Table columns: x, y, RESIDUAL; the residual column sums to 0 up to round-off error.)

The residuals are not independent.
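Both facts can be checked numerically. The sketch below is a minimal illustration in Python, assuming made-up data and a hypothetical helper `fit_line` (neither comes from the slides). It fits a least-squares line and confirms that the residuals sum to 0 and are orthogonal to x. Those two constraints tie the residuals together, which is why they are not independent even when the true errors are.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Made-up illustrative data, not taken from the slides.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(x, y)
fits = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fits)]

# Constraint 1: the residuals sum to 0 (up to round-off error).
sum_e = sum(residuals)
# Constraint 2: the residuals are orthogonal to the predictor values.
sum_xe = sum(xi * ei for xi, ei in zip(x, residuals))
# Consequence: the last residual is fully determined by the others.
last_from_others = -sum(residuals[:-1])

print(sum_e, sum_xe, last_from_others, residuals[-1])
```

With n observations and two fitted parameters, only n - 2 residuals are free to vary; the dependence becomes negligible as n grows.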

A residuals vs. fits plot A scatter plot with residuals on the y axis and fitted values on the x axis. Helps to identify non-linearity, outliers, and non-constant variance.

Example: Alcoholism and muscle strength?

A well-behaved residuals vs. fits plot

Characteristics of a well-behaved residual vs. fits plot The residuals “bounce randomly” around the 0 line. (Linear is reasonable). No one residual “stands out” from the basic random pattern of residuals. (No outliers). The residuals roughly form a “horizontal band” around 0 line. (Constant variance).

A residuals vs. predictor plot A scatter plot with residuals on the y axis and the values of a predictor on the x axis. If the predictor on the x axis is the same predictor used in model, offers nothing new. If the predictor on the x axis is a new and different predictor, can help to determine whether the predictor should be added to model.
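A numerical counterpart to this plot is the correlation between the residuals and each candidate predictor. The sketch below is only an illustration under stated assumptions: the data are made up (they are not the blood pressure data), and `fit_line` and `correlation` are hypothetical helpers. The response depends on two predictors, but only `x1` is used in the fit.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x (hypothetical helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def correlation(u, v):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    suv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    suu = sum((a - u_bar) ** 2 for a in u)
    svv = sum((b - v_bar) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5

# Made-up data: the response depends on x1 and on a second predictor x2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]
noise = [0.2, -0.1, 0.3, -0.2, 0.1, -0.3, 0.2, -0.2]
y = [2.0 + a + 3.0 * b + e for a, b, e in zip(x1, x2, noise)]

# Fit using x1 only, then look at what is left over.
b0, b1 = fit_line(x1, y)
residuals = [yi - (b0 + b1 * a) for yi, a in zip(y, x1)]

r_same = correlation(residuals, x1)  # predictor already in the model
r_new = correlation(residuals, x2)   # predictor left out of the model

print(r_same, r_new)
```

Here `r_same` is 0 up to round-off, matching the remark that plotting against a predictor already in the model offers nothing new about linear trend, while a large `r_new` suggests that `x2` is worth adding.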

A residuals vs. predictor plot offering nothing new. (Same predictor!)

Example: What are good predictors of blood pressure? n = 20 hypertensive individuals; age = age of individual; weight = weight of individual; duration = years with high blood pressure.

Regression of BP on Age

Residuals (age only) vs. weight plot (New predictor!)

Residuals (age, weight) vs. duration plot (New predictor!)

How a non-linear function shows up on a residual vs. fits plot: The residuals depart from 0 in some systematic manner, such as being positive for small x values, negative for medium x values, and positive again for large x values.
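The positive, negative, positive pattern is easy to reproduce. The following minimal sketch assumes made-up curved data (y = x squared, not the tire data) and fits a straight line to it anyway; the sign of each residual, listed in order of x, shows the systematic departure from 0.

```python
# Made-up curved data: a straight line is the wrong model here.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [xi ** 2 for xi in x]

# Least-squares line y = b0 + b1 * x.
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# One character per observation, ordered by x (and hence by fitted value).
signs = "".join("+" if e > 0 else "-" for e in residuals)
print(signs)  # prints ++----++
```

Long runs of same-signed residuals like this are what the curved band in a residuals vs. fits plot looks like in numbers.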

Example: A linear relationship between tread wear and mileage? (Table columns: mileage, groove.) X = mileage in 1000 miles; Y = groove depth in mils.

Is tire tread wear linearly related to mileage?

A residual vs. fits plot suggesting relationship is not linear

How non-constant error variance shows up on a residual vs. fits plot: The plot has a “fanning” effect, where residuals are close to 0 for small x values and are more spread out for large x values. Or the plot has a “funneling” effect, where residuals are spread out for small x values and close to 0 for large x values. Or the spread of the residuals can vary in some complex fashion.
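A crude numerical companion to the visual check is to compare the typical residual size at low fitted values with that at high fitted values. This is only a sketch, not a formal test: the data are made up (not the plutonium data) and the half-and-half split is an arbitrary choice made for illustration.

```python
# Made-up data whose scatter grows with x (a "fanning" pattern).
x = [float(i) for i in range(1, 11)]
y = [2.0 * xi + 0.5 * xi * (-1) ** i for i, xi in enumerate(x, start=1)]

# Least-squares line y = b0 + b1 * x.
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

fits = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fits)]

# Order the residuals by fitted value and split them into two halves.
pairs = sorted(zip(fits, residuals))
half = len(pairs) // 2
low_spread = sum(abs(e) for _, e in pairs[:half]) / half
high_spread = sum(abs(e) for _, e in pairs[half:]) / (len(pairs) - half)

# A ratio well above 1 suggests fanning; well below 1 suggests funneling.
spread_ratio = high_spread / low_spread
print(low_spread, high_spread, spread_ratio)
```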

Example: How is plutonium activity related to alpha particle counts?

A residual vs. fits plot suggesting non-constant error variance

How an outlier shows up on a residuals vs. fits plot The observation’s residual stands apart from the basic random pattern of the rest of the residuals. The random pattern of the residual plot can even disappear if one outlier really deviates from the pattern of the rest of the data.

Example: Relationship between tobacco use and alcohol use? (Table columns: Region, Alcohol, Tobacco. Regions: North, Yorkshire, Northeast, East Midlands, West Midlands, East Anglia, Southeast, Southwest, Wales, Scotland, Northern Ireland.) Source: Family Expenditure Survey of the British Dept. of Employment. X = average weekly expenditure on tobacco; Y = average weekly expenditure on alcohol.

Example: Relationship between tobacco use and alcohol use?

A residual vs. fits plot suggesting an outlier exists (one point is labeled “outlier”)

How large does a residual need to be before being flagged? The magnitude of the residuals depends on the units of the response variable. Make the residuals “unitless” by dividing by their standard deviation. That is, use “standardized residuals.” Then, an observation with a standardized residual greater than 2 or smaller than -2 should be flagged for further investigation.
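The flagging rule can be sketched as follows. The slide says only to divide each residual by its standard deviation; the code assumes the usual estimate of that standard deviation in simple linear regression, s * sqrt(1 - h_i), where s is the residual standard error and h_i is the leverage of observation i. The data are made up (the tobacco and alcohol values are not reproduced here), with one deliberately unusual point.

```python
import math

# Made-up data: roughly y = 2x, with one unusual response at x = 6.
x = [float(i) for i in range(1, 11)]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 18.0, 13.9, 16.2, 17.8, 20.1]

# Least-squares line y = b0 + b1 * x.
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Residual standard error, using n - 2 degrees of freedom.
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Leverage of each observation in simple linear regression.
leverage = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]

# Standardized residual: residual divided by its estimated standard deviation.
standardized = [e / (s * math.sqrt(1.0 - h)) for e, h in zip(residuals, leverage)]

# Flag observations (0-based positions) beyond the +/- 2 cutoff.
flagged = [i for i, r in enumerate(standardized) if abs(r) > 2]
print(flagged)  # prints [5]
```

Because the result is unitless, the same cutoff of 2 applies whatever the units of the response.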

Standardized residuals vs. fits plot

Minitab identifies observations with large standardized residuals. (Unusual Observations table columns: Obs, Tobacco, Alcohol, Fit, SE Fit, Resid, St Resid; the flagged row is marked R.) R denotes an observation with a large standardized residual.

Anscombe data set #3

A residual vs. fits plot suggesting an outlier exists

Residuals vs. order plot Helps assess serial correlation of error terms. If the data are obtained in a time (or space) sequence, a “residuals vs. order” plot helps to see if there is any correlation between error terms that are near each other in the sequence. A horizontal band bouncing randomly around 0 suggests errors are independent, while a systematic pattern suggests not.
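One way to put a number on what this plot shows is the lag-1 autocorrelation of the residuals taken in collection order. The sketch below is an illustration only: the two residual sequences are made up, `lag1_autocorrelation` is a hypothetical helper, and the formula assumes the residuals have mean 0, as least-squares residuals do when the model has an intercept.

```python
def lag1_autocorrelation(e):
    """Lag-1 sample autocorrelation of mean-zero residuals in collection order."""
    num = sum(e[t] * e[t - 1] for t in range(1, len(e)))
    den = sum(et ** 2 for et in e)
    return num / den

# Made-up residual sequences, listed in the order the data were collected.
drifting = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, -1.0, -2.0]
alternating = [1.0, -1.0] * 6

r1_drifting = lag1_autocorrelation(drifting)        # neighbours tend to agree
r1_alternating = lag1_autocorrelation(alternating)  # neighbours tend to disagree

print(r1_drifting, r1_alternating)
```

Values near 0 are consistent with independent errors; values far from 0 in either direction correspond to the systematic patterns described above.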

Residuals vs. order plots suggesting non-independence of error terms

Normal (probability) plot of residuals: Helps assess normality of error terms. If data are Normal(μ, σ^2), then percentiles of the normal distribution should plot linearly against sample percentiles (with sampling variation). The parameters μ and σ^2 are unknown. Theory shows it’s okay to assume μ = 0 and σ^2 = 1.

Normal (probability) plot of residuals. (Worksheet columns: x, y, i, RESI1, PCT, MTB_PCT, NSCORE; annotation: “Ordered!”)

Normal (probability) plot of residuals (cont’d) Plot normal scores (theoretical percentiles) on vertical axis against ordered residuals (sample percentiles) on horizontal axis. Plot that is nearly linear suggests normality of error terms.
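The construction can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the residuals are made up, the helpers `normal_scores` and `correlation` are hypothetical, and the plotting position (i - 3/8) / (n + 1/4) is one common convention that may differ from what a given package uses. The normal scores are percentiles of the standard normal distribution (μ = 0, σ^2 = 1).

```python
from statistics import NormalDist

def normal_scores(n):
    """Standard normal percentiles for ranks 1..n (assumed plotting positions)."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def correlation(u, v):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    suv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    suu = sum((a - u_bar) ** 2 for a in u)
    svv = sum((b - v_bar) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5

# Made-up residuals, not taken from the slides.
residuals = [0.3, -0.9, 1.2, -0.2, 0.6, -1.6, 0.1, 1.0, -0.5]

ordered = sorted(residuals)              # horizontal axis: ordered residuals
scores = normal_scores(len(ordered))     # vertical axis: normal scores

# Points to plot, plus a rough numerical summary of how straight they are.
points = list(zip(ordered, scores))
straightness = correlation(ordered, scores)
print(points)
print(straightness)
```

A correlation close to 1 corresponds to a nearly linear plot; the plot itself remains the better guide to where any departure from normality occurs.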

Normal (probability) plot

A normal (probability) plot with non-normal error terms

Residual plots in Minitab’s regression command: Select Stat >> Regression >> Regression. Specify predictor and response. Under Graphs…, select either Regular or Standardized, then select the desired types of residual plots (normal plot, versus fits, versus order, versus predictor variable).

Normal plots outside of Minitab’s regression command: Select Stat >> Regression >> Regression... Specify predictor and response. Under Storage …, select Regular or Standardized residuals, then select OK. Residuals will appear in the worksheet. Then either select Graph >> Probability plot…, specify RESI as the variable, select Normal distribution, and select OK; or select Stat >> Basic Stat >> Normality Test, specify RESI as the variable, and select OK.