
Assumptions & Requirements

Three Important Assumptions
1. The errors are normally distributed.
2. The errors have constant variance (i.e., they are homoscedastic).
3. The errors are independent (i.e., they are non-autocorrelated).

The error ε_i is unobservable. The residuals e_i from the fitted regression give clues about violations of these assumptions.
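All of the diagnostics on the slides that follow work from those fitted residuals. Here is a minimal sketch, using made-up data and the statsmodels library, of fitting a simple regression and extracting them:

```python
# Minimal sketch: fit y = b0 + b1*x by OLS and pull out the residuals.
# The data are hypothetical, generated so the three assumptions actually hold.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, 50)
y = 3.0 + 2.0 * x + rng.normal(0, 1.5, 50)   # normal, constant-variance, independent errors

X = sm.add_constant(x)            # adds the intercept column for b0
model = sm.OLS(y, X).fit()
residuals = model.resid           # e_i = y_i - y_hat_i, the basis for every check below
print(model.params)               # estimated b0, b1
```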

Violations of Assumptions: Non-Normal Errors
- Non-normality of errors is a mild violation, since the regression parameter estimates b0 and b1 and their variances remain unbiased and consistent.
- Confidence intervals for the parameters may be untrustworthy, because the normality assumption is what justifies using Student's t distribution.

Probable Solutions
- A large sample size would compensate.
- Outliers could pose serious problems.

Violations of Assumptions: Histogram of Residuals
- Check for non-normality by creating histograms of the residuals or of the standardized residuals (each residual divided by its standard error).
- Standardized residuals range between -3 and +3 unless there are outliers.
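A sketch of this check; the data are hypothetical, and statsmodels' internally studentized residuals serve as the standardized residuals:

```python
# Sketch: histogram of standardized residuals from a hypothetical fit.
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 60)
y = 3 + 2 * x + rng.normal(0, 1.5, 60)
model = sm.OLS(y, sm.add_constant(x)).fit()

std_resid = model.get_influence().resid_studentized_internal  # residual / its std. error
plt.hist(std_resid, bins=10, edgecolor="black")
plt.xlabel("standardized residual")   # values beyond +/-3 suggest outliers
plt.show()
```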

Violations of Assumptions: Normal Probability Plot
- The normal probability plot tests the hypotheses
  H0: Errors are normally distributed
  H1: Errors are not normally distributed
- If H0 is true, the residual probability plot should be linear.
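A sketch of the plot, again with made-up data; statsmodels' qqplot compares the residual quantiles to normal quantiles, so an approximately straight line supports H0:

```python
# Sketch: normal probability (Q-Q) plot of the residuals.
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 60)
y = 3 + 2 * x + rng.normal(0, 1.5, 60)
model = sm.OLS(y, sm.add_constant(x)).fit()

sm.qqplot(model.resid, line="45", fit=True)   # points near the 45-degree line => normal errors
plt.show()
```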

Probable Solutions: What to Do About Non-Normality?
1. Trim outliers only if they are clearly mistakes.
2. Increase the sample size if possible.
3. Try a logarithmic transformation of both X and Y.

Violations of Assumptions: Heteroscedastic Errors (Nonconstant Variance)
- The ideal condition is that the error magnitude is constant (i.e., the errors are homoscedastic).
- Heteroscedastic errors increase or decrease with X.
- In the most common form of heteroscedasticity, the variances of the estimators are likely to be understated.
- This results in overstated t statistics and artificially narrow confidence intervals.

Violations of Assumptions: Tests for Heteroscedasticity
- Plot the residuals against X. Ideally, there is no pattern in the residuals moving from left to right.
- The "fan-out" pattern of increasing residual variance is the most common pattern indicating heteroscedasticity.
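A sketch of the residual-versus-X plot, with hypothetical data whose error spread is deliberately made to grow with X so the fan-out is visible:

```python
# Sketch: residuals plotted against X to look for the "fan-out" pattern.
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = rng.uniform(1, 10, 80)
y = 3 + 2 * x + rng.normal(0, 0.5 * x)          # error spread grows with x: heteroscedastic
model = sm.OLS(y, sm.add_constant(x)).fit()

plt.scatter(x, model.resid)
plt.axhline(0, color="gray")                    # ideally an even band around zero
plt.xlabel("x")
plt.ylabel("residual")
plt.show()
```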

Probable Solutions: What to Do About Heteroscedasticity?
- Transform both X and Y, for example by taking logs.
- Although it can widen the confidence intervals for the coefficients, heteroscedasticity does not bias the estimates.
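A sketch of the log remedy on the same kind of hypothetical heteroscedastic data (taking logs assumes all X and Y values are positive):

```python
# Sketch: refit on log(X) and log(Y) to damp variance that grows with X.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(1, 10, 80)
y = 3 + 2 * x + rng.normal(0, 0.5 * x)          # heteroscedastic errors; y stays positive

log_model = sm.OLS(np.log(y), sm.add_constant(np.log(x))).fit()
print(log_model.params)   # slope is now an elasticity: % change in y per % change in x
```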

Violations of Assumptions: Autocorrelated Errors
- Autocorrelation is a pattern of non-independent errors.
- In a time-series regression, each residual e_t should be independent of its predecessors e_{t-1}, e_{t-2}, ..., e_{t-n}.
- In first-order autocorrelation, e_t is correlated with e_{t-1}.
- The estimated variances of the OLS estimators are biased, resulting in confidence intervals that are too narrow, overstating the model's fit.

Violations of Assumptions: Runs Test for Autocorrelation
- In the runs test, count the residuals' sign reversals (i.e., how often the residuals cross the zero centerline).
- If the pattern is random, the number of sign changes should be about n/2.
- Fewer than n/2 suggests positive autocorrelation, indicated by long runs of residuals with the same sign.
- More than n/2 suggests negative autocorrelation, indicated by residuals with alternating signs.
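A rough sketch of the idea: count sign changes in time-ordered residuals, here from a hypothetical series built with positively autocorrelated errors:

```python
# Sketch: informal runs test -- count how often the residuals cross zero.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
t = np.arange(60)
y = 5 + 0.3 * t + np.cumsum(rng.normal(0, 1, 60))   # cumulated noise => autocorrelated errors
model = sm.OLS(y, sm.add_constant(t)).fit()

e = model.resid
sign_changes = np.sum(np.sign(e[1:]) != np.sign(e[:-1]))
print(sign_changes, "sign changes; roughly", len(e) / 2, "expected if independent")
# far fewer than n/2, as here, points to positive autocorrelation
```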

Violations of Assumptions: Durbin-Watson Test
- Tests for autocorrelation under the hypotheses
  H0: Errors are non-autocorrelated
  H1: Errors are autocorrelated
- The Durbin-Watson test statistic is

  DW = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}

- The DW statistic ranges from 0 to 4. DW < 2 suggests positive autocorrelation; DW > 2 suggests negative autocorrelation.
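A sketch using statsmodels' durbin_watson helper (the same statistic is also printed in model.summary()):

```python
# Sketch: Durbin-Watson statistic on residuals from a hypothetical time series.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(2)
t = np.arange(60)
y = 5 + 0.3 * t + np.cumsum(rng.normal(0, 1, 60))   # positively autocorrelated errors
model = sm.OLS(y, sm.add_constant(t)).fit()

print(durbin_watson(model.resid))   # near 2 => no first-order autocorrelation;
                                    # well below 2 here, signalling positive autocorrelation
```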

Probable Solutions: What to Do About Autocorrelation?
- Transform both variables using the method of first differences, in which both variables are redefined as changes:

  x'_t = x_t - x_{t-1}   and   y'_t = y_t - y_{t-1}

- Although it can widen the confidence intervals for the coefficients, autocorrelation does not bias the estimates.
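A sketch of the first-differences remedy on hypothetical time-series data:

```python
# Sketch: regress the period-to-period changes instead of the levels.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = np.cumsum(rng.uniform(0.5, 1.5, 60))            # a trending predictor
y = 5 + 2 * x + np.cumsum(rng.normal(0, 1, 60))     # autocorrelated errors

dx, dy = np.diff(x), np.diff(y)                     # x'_t = x_t - x_{t-1}, same for y
diff_model = sm.OLS(dy, sm.add_constant(dx)).fit()
print(diff_model.params)                            # slope still estimates the true value 2
```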

Other Regression Problems: Outliers
Outliers may be caused by
- an error in recording the data,
- impossible data, or
- an observation that has been influenced by an unspecified "lurking" variable that should have been controlled but wasn't.

To fix the problem,
- delete the data, or
- formulate a multiple regression model that includes the lurking variable.

A lurking variable is one that has an important effect and yet is not included among the predictor variables under consideration, perhaps because its existence is unknown or its effect unsuspected.

Other Regression Problems: Model Misspecification
- If a relevant predictor has been omitted, then the model is misspecified.
- Use multiple regression instead of bivariate regression.

Other Regression Problems: Ill-Conditioned Data
- Well-conditioned data values are of the same general order of magnitude.
- Ill-conditioned data have unusually large or small data values and can cause loss of regression accuracy or awkward estimates.
- Avoid mixing magnitudes by adjusting the magnitude of your data before running the regression.
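An illustrative sketch: rescaling a dollar-valued predictor (a hypothetical variable) to millions before fitting, so both variables share a similar magnitude:

```python
# Sketch: rescale an ill-conditioned predictor before regression.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
revenue_dollars = rng.uniform(1e6, 5e7, 40)            # values in the tens of millions
y = 10 + 3 * (revenue_dollars / 1e6) + rng.normal(0, 2, 40)

revenue_millions = revenue_dollars / 1e6               # well-conditioned scale
model = sm.OLS(y, sm.add_constant(revenue_millions)).fit()
print(model.params)                                    # coefficients on a readable scale
```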

Other Regression Problems: Spurious Correlation
- In a spurious correlation, two variables appear related because of the way they are defined.
- This problem is called the size effect or the problem of totals.

Other Regression Problems: Model Form and Variable Transforms
- Sometimes a nonlinear model is a better fit than a linear model.
- Excel offers many model forms.
- Variables may be transformed (e.g., with logarithmic or exponential functions) in order to provide a better fit.
- Log transformations reduce heteroscedasticity.
- Nonlinear models may be difficult to interpret.
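A sketch comparing a straight-line fit with a log-log fit on hypothetical data that curve upward:

```python
# Sketch: try a transformed model form when the relationship looks nonlinear.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.uniform(1, 10, 80)
y = 2 * x ** 1.5 * np.exp(rng.normal(0, 0.1, 80))   # curved, multiplicative-error data

linear = sm.OLS(y, sm.add_constant(x)).fit()
loglog = sm.OLS(np.log(y), sm.add_constant(np.log(x))).fit()
print(linear.rsquared, loglog.rsquared)   # note: R^2 on a transformed Y is not directly
                                          # comparable; residual plots usually settle it
```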

Other Regression Problems: Regression by Splines
- Splines divide the data into subperiods, with a separate regression fitted to each.
- By comparing the regression slopes for each subperiod, you will obtain clues about what is happening with the data.
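A sketch of the subperiod idea: fit separate lines to two halves of a hypothetical series with a structural break, then compare the slopes:

```python
# Sketch: separate fits on subperiods reveal a change in slope.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
t = np.arange(100)
y = np.where(t < 50, 2.0 * t, 100 + 0.5 * (t - 50)) + rng.normal(0, 3, 100)

for lo, hi in [(0, 50), (50, 100)]:                       # two subperiods
    seg = sm.OLS(y[lo:hi], sm.add_constant(t[lo:hi])).fit()
    print(f"slope on [{lo}, {hi}): {seg.params[1]:.2f}")  # roughly 2.0, then 0.5
```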

Multicollinearity
- When the independent variables are inter-correlated instead of independent, we have a condition known as multicollinearity.
- It does not bias the least squares estimates, but it does induce variance inflation. When predictors are strongly correlated, the variances of their estimated coefficients tend to be inflated, widening the confidence intervals for the true coefficients and making the t statistics less reliable.

Probable Solution
- We can inspect the correlation matrix for the predictors.
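A sketch of that check on made-up predictors, with variance inflation factors (VIFs) as a common follow-up; a VIF above about 10 is a frequent rule-of-thumb flag:

```python
# Sketch: correlation matrix of the predictors, then a VIF for each one.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(7)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(0, 0.1, 100)        # nearly collinear with x1
x3 = rng.normal(size=100)
X = np.column_stack([x1, x2, x3])

print(np.corrcoef(X, rowvar=False))      # strong off-diagonal entries warn of multicollinearity
Xc = sm.add_constant(X)
for j in range(1, Xc.shape[1]):          # skip the constant column
    print(f"VIF x{j}: {variance_inflation_factor(Xc, j):.1f}")
```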