Lab 9 – Regression Diagnostics

Lab 9 – Regression Diagnostics November 12, 2015

Are the residuals normally distributed?
    use ops2004.dta
    drop envhat res
    regress env_con educat inc com3 hlthprob epht3, beta

Are the residuals normally distributed?
    predict res, residual
(This command stores the residual for each observation, based on the most recent regression.)
    summarize res, detail

Are the residuals normally distributed?
    sktest res
(This command runs a normality test based on skewness and kurtosis.) Keep in mind that the larger the sample, the more likely the test is to give a statistically significant result, even for small departures from normality. But a large sample also means that the sampling distribution of the coefficient estimates will approach normality.
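Because sktest can flag trivial departures in a large sample, a visual check is a common complement. The following is a minimal sketch, assuming ops2004.dta is loaded; res_chk is a hypothetical variable name used here only to avoid clashing with res.

```stata
* Sketch only: assumes ops2004.dta is in memory.
regress env_con educat inc com3 hlthprob epht3
* res_chk is a hypothetical name for the residuals
predict res_chk, residuals
* Kernel density of the residuals with a normal curve overlaid
kdensity res_chk, normal
* Quantiles of the residuals against quantiles of the normal distribution
qnorm res_chk
```

If the density roughly follows the overlaid curve and the qnorm points stay close to the diagonal, a significant sktest result may matter less in practice.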

Are the residuals normally distributed?
    rvfplot
(This command plots the residuals against the fitted values.) Alternative way: Statistics –> Linear models and related –> Regression diagnostics –> Residual-versus-fitted plot
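A residual-versus-fitted plot is usually read for patterns such as a funnel shape (non-constant variance) or a curve (nonlinearity). The sketch below adds a reference line at zero and follows the plot with a formal test for heteroskedasticity; it assumes ops2004.dta is loaded, and the test is an addition that the lab itself does not use.

```stata
* Sketch only: assumes ops2004.dta is in memory.
regress env_con educat inc com3 hlthprob epht3
* Residuals against fitted values, with a horizontal line at zero
rvfplot, yline(0)
* Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
estat hettest
```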

Are the residuals normally distributed?
    regress env_con educat inc com3 hlthprob epht3, beta
    predict envhat
    preserve
    set seed 111
    sample 100, count
    twoway (scatter env_con envhat) (lfit env_con envhat)
    restore

Are the residuals normally distributed? Scattergram: predicted versus actual values of the dependent variable

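What to do when distribution of residuals is problematic? Option 1: robust standard errors
    regress env_con educat inc com3 hlthprob epht3, vce(robust)
This command estimates the variance-covariance matrix of the coefficient estimates with the Huber/White sandwich estimator, which does not rely on the errors being identically distributed (for example, having constant variance).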

What to do when distribution of residuals is problematic? The differences are in the t-values and standard errors; the coefficients are unchanged.

What to do when distribution of residuals is problematic? Option 2: bootstrap estimation of standard errors
    regress env_con educat inc com3 hlthprob epht3, vce(bootstrap, reps(1000))
Stata draws repeated random samples with replacement from the data (here 1,000) and runs the regression on each of them. It then uses the variability of the coefficient estimates across these samples to estimate the standard errors.

What to do when distribution of residuals is problematic? Again, the differences are in the t-values and standard errors.
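To see the differences side by side, the three sets of estimates can be stored and tabulated. This is a minimal sketch, assuming ops2004.dta is loaded; the names m_ols, m_rob and m_boot are hypothetical, and the seed is set only so that the bootstrap results can be reproduced.

```stata
* Sketch only: assumes ops2004.dta is in memory.
regress env_con educat inc com3 hlthprob epht3
estimates store m_ols

regress env_con educat inc com3 hlthprob epht3, vce(robust)
estimates store m_rob

* Fix the seed so the bootstrap draws are reproducible
set seed 111
regress env_con educat inc com3 hlthprob epht3, vce(bootstrap, reps(1000))
estimates store m_boot

* Coefficients and standard errors for the three models in one table
estimates table m_ols m_rob m_boot, b(%9.4f) se(%9.4f)
```

The coefficient rows should match across the three columns, while the standard errors beneath them differ.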

Outliers
Outliers are cases that the model cannot predict well.
    regress env_con educat inc com3 hlthprob epht3, beta
    predict yhat
    predict residual, residual
    predict rstandard, rstandard
    list respnum env_con yhat residual rstandard if abs(rstandard) > 2.58 & rstandard < .
The cutoff 2.58 is the z score for a two-tailed 0.01 level of significance.
Alternative way: Statistics -> Postestimation -> Predictions -> Predictions and …
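The 2.58 cutoff and the number of cases beyond it can both be checked directly. A minimal sketch, assuming ops2004.dta is loaded; rstd_chk is a hypothetical variable name chosen so that it does not clash with rstandard.

```stata
* Sketch only: assumes ops2004.dta is in memory.
* Critical z value for a two-tailed test at the 0.01 level (about 2.576)
display invnormal(0.995)

regress env_con educat inc com3 hlthprob epht3
* rstd_chk is a hypothetical name for the standardized residuals
predict rstd_chk, rstandard
* Number of cases beyond the cutoff, excluding missing values
count if abs(rstd_chk) > 2.58 & rstd_chk < .
```

If the residuals were roughly normal, about 1% of cases would be expected to exceed the cutoff by chance, so a count far above that share is worth a closer look.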

Outliers

Influential cases
    dfbeta
This command saves the DFBETAs as new variables. Alternative way: Statistics -> Linear models and related -> Regression diagnostics -> DFBETAs
Next, list the cases whose DFBETA exceeds the cutoff 2/√N in absolute value (here N = 3769):
    list respnum rstandard _dfbeta_1 if abs(_dfbeta_1) > 2/sqrt(3769) & _dfbeta_1 < .
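With N = 3769 the cutoff 2/√N is about 0.0326. Rather than typing N by hand, the cutoff can be computed from the estimation sample size that regress leaves in e(N). This is a sketch under two assumptions: ops2004.dta is loaded, and no _dfbeta_* variables exist yet, so that the first new variable is named _dfbeta_1. The scalar name dfb_cut is hypothetical.

```stata
* Sketch only: assumes ops2004.dta is in memory and no _dfbeta_* variables exist.
regress env_con educat inc com3 hlthprob epht3
* Cutoff based on the number of observations used in the regression
scalar dfb_cut = 2/sqrt(e(N))
display dfb_cut

* One DFBETA variable per predictor: _dfbeta_1, _dfbeta_2, ...
dfbeta
list respnum _dfbeta_1 if abs(_dfbeta_1) > dfb_cut & _dfbeta_1 < .
```

Note that e(N) counts only the cases used in the regression, so it can be smaller than the number of rows in the dataset when some variables have missing values.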

Collinearity and multicollinearity
If two or more independent variables are highly correlated, we cannot know which of the variables is having an effect on the dependent variable.
    regress env_con educat inc com3 hlthprob epht3, beta
    estat vif

Collinearity and multicollinearity
Variance inflation factor (VIF): if the value is more than 10 for any variable, or if the average value is substantially greater than 1, there might be a problem. These results do not show any problems. 1/VIF (the tolerance) equals 1 - R-squared (1 - R2) from the regression of that variable (taken as dependent) on the other independent variables. If 1/VIF is less than 0.10, there might be a problem.
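The two rules of thumb above are the same rule stated twice. Writing R_j^2 for the R-squared from regressing predictor j on the other predictors (the subscript j is notation introduced here, not taken from the slides):

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2},
\qquad
\frac{1}{\mathrm{VIF}_j} = 1 - R_j^2,
\qquad
\mathrm{VIF}_j > 10
\;\Longleftrightarrow\;
\frac{1}{\mathrm{VIF}_j} < 0.10
\;\Longleftrightarrow\;
R_j^2 > 0.90 .
```

In words, a VIF above 10 means that more than 90% of the variance of that predictor is explained by the other predictors.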