Regression Diagnostics: Checking Assumptions and Data

Questions
- What is the linearity assumption? How can you tell whether it seems to be met?
- What is homoscedasticity (heteroscedasticity)? How can you tell whether it is a problem?
- What is an outlier? What is leverage?
- What is a residual? How can you use residuals to check that the regression model is a good representation of the data?
- What is a studentized residual?

Linear Model Assumptions
- Linear relation between X and Y
- Independent errors
- Normally distributed errors (and hence Y)
- Equal variance of errors: homoscedasticity (equal spread of errors in Y across levels of X)

Good-Looking Graph: no apparent departures from the fitted line.

Problem with Linearity

Problem with Heteroscedasticity: a common problem when Y is a dollar amount, since the spread of Y often grows with its level.

Outliers: an outlier is a pathological point, one that departs markedly from the pattern of the rest of the data.

Residual Plots
- Histogram of residuals
- Residuals vs. fitted values
- Residuals vs. predictor variable
- Normal Q-Q plot of residuals
- Studentized or standardized residuals
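A minimal numpy sketch of the quantities these plots are built from, using made-up data (the simulated values and variable names are illustrative, not from the slides):

```python
import numpy as np

# Hypothetical data: y roughly linear in x with noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 1.5 * x + rng.normal(0, 1.0, size=x.size)

# Fit simple linear regression by least squares.
b1, b0 = np.polyfit(x, y, deg=1)   # slope, intercept
fitted = b0 + b1 * x
residuals = y - fitted

# The plots listed above would use these arrays:
#   histogram of `residuals`  -> check normality
#   `residuals` vs `fitted`   -> check linearity / equal variance
#   `residuals` vs `x`        -> check linearity in the predictor
```

With an intercept in the model, least-squares residuals sum to zero, so a residual plot should be centered on the horizontal axis.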

Residuals
Standardized residual: the raw residual divided by the residual standard deviation, e_i / s. Look for large values (some say |value| > 2).
Studentized residual: also takes into account the distance of the point from the mean of X. The farther x_i is from the mean, the larger the leverage, the smaller the standard error of that residual, and the larger the studentized value, e_i / (s * sqrt(1 - h_ii)). Again, look for large values.
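The two kinds of residuals can be computed by hand from the hat matrix; this is a sketch on simulated data (the data and names are made up, but the formulas are the standard ones):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
x = np.linspace(0, 10, n)
y = 1.0 + 0.8 * x + rng.normal(0, 0.5, size=n)

X = np.column_stack([np.ones(n), x])       # design matrix with intercept
H = X @ np.linalg.inv(X.T @ X) @ X.T       # hat matrix; leverage h_ii = H[i, i]
h = np.diag(H)

beta = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta                           # raw residuals
s = np.sqrt(e @ e / (n - 2))               # residual standard deviation

standardized = e / s                       # same s for every point
studentized = e / (s * np.sqrt(1 - h))     # adjusts for leverage h_ii

flagged = np.abs(studentized) > 2          # candidate outliers
```

Since sqrt(1 - h_ii) <= 1, each studentized residual is at least as large in magnitude as the corresponding standardized one; the difference is biggest for points far from the mean of X.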

Residual Plots

Abnormal Patterns in Residual Plots Figures a), b): non-linearity. Figure c): autocorrelation. Figure d): heteroscedasticity.

Patterns of Outliers
a) Outlier is extreme in both X and Y but not in the overall pattern; removal is unlikely to alter the regression line.
b) Outlier is extreme in both X and Y as well as in the overall pattern; inclusion will strongly influence the regression line.
c) Outlier is extreme in X but nearly average in Y.
d) Outlier is extreme in Y but not in X.
e) Outlier is extreme in the pattern, but not in X or Y.
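Case b) above can be illustrated numerically: a single point extreme in both X and the pattern drags the fitted slope toward itself. The data here are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 10, 30)
y = 2.0 + 1.0 * x + rng.normal(0, 0.3, size=30)   # true slope = 1

x_out = np.append(x, 25.0)   # extreme in X
y_out = np.append(y, 0.0)    # far below the overall pattern

slope_clean, _ = np.polyfit(x, y, 1)
slope_with_outlier, _ = np.polyfit(x_out, y_out, 1)
# The one influential point pulls the slope well below the clean estimate.
```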

Influence Analysis
Leverage h_ii is an index of the importance of an observation to a regression analysis:
- it is a function of X only;
- large deviations of x_i from the mean of X give high leverage;
- the maximum is 1 and the minimum is 1/n;
- it is considered large if it exceeds 3p/n, where p is the number of predictors including the constant.
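The leverage properties above can be checked directly from the hat matrix; a sketch with arbitrary simulated predictors (the 3p/n cutoff is the rule of thumb from the slide):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 25
x = rng.normal(5, 2, size=n)
X = np.column_stack([np.ones(n), x])   # p = 2 columns (constant + x)
p = X.shape[1]

H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)                         # leverages h_ii

# Properties: 1/n <= h_ii <= 1, and the leverages sum to p.
cutoff = 3 * p / n
high_leverage = np.where(h > cutoff)[0]
```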

Cook’s distance measures the influence of a data point on the regression equation. i.e. measures the effect of deleting a given observation: data points with large residuals (outliers) and/or high leverage Cook’s D > 1 requires careful checking (such points are influential); > 4 suggests potentially serious outliers.

Sensitivity in Inference All tests and intervals are very sensitive to even minor departures from independence. All tests and intervals are sensitive to moderate departures from equal variance. The hypothesis tests and confidence intervals for β0 and β1 are fairly robust (that is, forgiving) against departures from normality. Prediction intervals are quite sensitive to departures from normality.

Remedies If important predictor variables are omitted, see whether adding the omitted predictors improves the model. If there are unequal error variances, try transforming the response and/or predictor variables or use "weighted least squares regression." If an outlier exists, try using a "robust estimation procedure." If error terms are not independent, try fitting a "time series model."
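The "weighted least squares regression" remedy mentioned above can be sketched in closed form, assuming the variance structure is known (here the noise standard deviation is taken to grow with x; the data are simulated for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 60
x = np.linspace(1, 10, n)
y = 2.0 + 1.0 * x + rng.normal(0, 0.2 * x)   # noise sd proportional to x

X = np.column_stack([np.ones(n), x])
w = 1.0 / x**2                               # weights = 1 / Var(error), up to a constant
W = np.diag(w)

# WLS: beta = (X' W X)^{-1} X' W y
beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
# Both are roughly unbiased here, but WLS gives the low-variance
# observations more say and so has smaller sampling variance.
```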

If the mean of the response is not a linear function of the predictors, try a different function. For example, polynomial regression involves transforming one or more predictor variables while remaining within the multiple linear regression framework. For another example, applying a logarithmic transformation to the response variable also allows for a nonlinear relationship between the response and the predictors.
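Both fixes described above can be sketched with simulated data: a polynomial term keeps the model linear in its coefficients, and a log-transformed response linearizes an exponential relation (the coefficients and data below are made up):

```python
import numpy as np

rng = np.random.default_rng(6)
x = np.linspace(1, 5, 50)

# Quadratic mean function: still linear in the coefficients b0, b1, b2.
y_quad = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(0, 0.1, size=50)
c2, c1, c0 = np.polyfit(x, y_quad, deg=2)   # fits b0 + b1*x + b2*x^2

# Exponential relation: linear in x after log-transforming the response.
y_exp = 3.0 * np.exp(0.7 * x) * np.exp(rng.normal(0, 0.05, size=50))
slope, intercept = np.polyfit(x, np.log(y_exp), deg=1)
# `slope` estimates 0.7 and exp(intercept) estimates 3.0.
```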

Data Transformation The usual approach for dealing with nonconstant variance, when it occurs, is to apply a variance-stabilizing transformation. For some distributions, the variance is a function of E(Y). The Box-Cox family provides a systematic choice: y(λ) = (y^λ − 1) / λ for λ ≠ 0, and y(λ) = ln(y) for λ = 0, with λ chosen so that the transformed response is as close as possible to normal with constant variance.
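The Box-Cox transform is a one-liner; this sketch implements the family directly (the sample values are arbitrary) and checks that it is continuous in λ, with λ = 0 recovering the log transform:

```python
import numpy as np

def box_cox(y, lam):
    """Box-Cox transform: (y**lam - 1) / lam for lam != 0, log(y) at lam = 0."""
    y = np.asarray(y, dtype=float)   # requires y > 0
    if lam == 0:
        return np.log(y)
    return (y**lam - 1.0) / lam

y = np.array([1.0, 2.0, 4.0, 8.0, 16.0])

# lam = 1 merely shifts y by 1; lam = 0 is the log; small lam approaches the log.
z_log = box_cox(y, 0)
z_near_log = box_cox(y, 1e-6)
```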