Anareg Week 11: Regression Diagnostics. Partial regression plots, studentized deleted residuals, hat matrix diagonals, DFFITS, Cook's D.

Presentation transcript:

Anareg Week 11: Regression Diagnostics

Regression Diagnostics

- Partial regression plots
- Studentized deleted residuals
- Hat matrix diagonals
- DFFITS, Cook's D, DFBETAS
- Variance inflation factor
- Tolerance

NKNW Example

- NKNW p 389, section 11.1
- Y is amount of life insurance
- X1 is average annual income
- X2 is a risk aversion score
- n = 18 managers

[Data table, two-column slide layout: Manager i, Income Xi1, Risk Xi2, Life Insurance Yi for the 18 managers; numeric values elided.]

Partial regression plots

- Also called added variable plots or adjusted variable plots
- One plot for each Xi

Partial regression plots (2)

Consider X1:
- Use the other X's to predict Y
- Use the other X's to predict X1
- Plot the residuals from the first regression vs the residuals from the second regression

Partial regression plots (3)

These plots can detect:
- Nonlinear relationships
- Heterogeneous variances
- Outliers

Output

[SAS ANOVA table: Source, DF, F Value, Pr > F; Error DF = 15, C Total DF = 17, Pr > F < .0001 for the Model; F value, Root MSE, and R-Square values elided.]

Output (2)

[SAS parameter estimates: Variable, Parameter Estimate, Standard Error, t, Pr > |t| for Intercept, income, and risk; Intercept and income have Pr > |t| < .0001; numeric estimates elided.]

Plot the residuals vs each independent variable

From the regression of Y on X1 and X2 we plot the residuals against each independent variable. The plot of residuals against X1 indicates a curvilinear effect, so we need to check further by looking at the partial regression plot.

Plot the residuals vs Risk

Plot the residuals vs income

The partial regression plots

To generate the partial regression plots:
- Regress Y and X1 each on X2
- Get the residuals from each regression, namely e(Y|X2) and e(X1|X2)
- Plot e(Y|X2) against e(X1|X2)
- Do the same for Y and X2 each on X1
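The steps above can be sketched in Python with numpy (not part of the original slides; the data here are simulated stand-ins, since the NKNW income/risk values are not reproduced in the transcript). A useful check: the least-squares slope of e(Y|X2) against e(X1|X2) equals the coefficient of X1 in the full multiple regression.

```python
import numpy as np

def lstsq_resid(x, y):
    """Residuals from an OLS regression of y on x (intercept included)."""
    X = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

rng = np.random.default_rng(0)
n = 18
x1 = rng.normal(size=n)   # stand-in for income
x2 = rng.normal(size=n)   # stand-in for risk aversion score
y = 5 + 2 * x1 + 3 * x2 + rng.normal(scale=0.5, size=n)

# e(Y|X2): residuals of Y regressed on X2
ey = lstsq_resid(x2, y)
# e(X1|X2): residuals of X1 regressed on X2
ex1 = lstsq_resid(x2, x1)

# The partial regression plot for X1 is the scatter of ey against ex1;
# its slope (through the origin) is the multiple-regression coefficient of X1.
slope = (ex1 @ ey) / (ex1 @ ex1)

Xfull = np.column_stack([np.ones(n), x1, x2])
beta_full, *_ = np.linalg.lstsq(Xfull, y, rcond=None)
```

Plotting `ey` against `ex1` (e.g. with matplotlib) then reveals nonlinearity, heterogeneous variance, or outliers in X1's adjusted relationship with Y.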

The partial regression plots (2)

The partial regression plots(3)

Residuals

There are several versions:
- Residuals: ei = Yi - Ŷi
- Semistudentized residuals: ei / sqrt(MSE)
- Deleted residuals: di = ei / (1 - hii), where hii is the leverage
- Studentized deleted residuals: di* = di / s(di), where s²(di) = MSE(i) / (1 - hii), or equivalently di* = ei sqrt[(n - p - 1) / (SSE(1 - hii) - ei²)]
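As a sketch (simulated data, not the slides' SAS run), the residual versions above can be computed directly from the hat matrix; the last formula gives the studentized deleted residuals without refitting the model n times.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 18, 3   # n cases, p parameters (intercept + 2 slopes)
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([5.0, 2.0, 3.0]) + rng.normal(scale=0.5, size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix
h = np.diag(H)                          # leverages h_ii
e = y - H @ y                           # ordinary residuals e_i
SSE = e @ e
MSE = SSE / (n - p)

semistud = e / np.sqrt(MSE)             # e_i / sqrt(MSE)
d = e / (1 - h)                         # deleted residuals d_i
# Studentized deleted residuals via the no-refit identity
t = e * np.sqrt((n - p - 1) / (SSE * (1 - h) - e**2))
```

Each `t[i]` equals what you would get by deleting case i, refitting, and standardizing e_i by sqrt(MSE(i)(1 - h_ii)).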

Residuals (2)

- We use the notation (i) to indicate that case i has been deleted from the computations
- X(i) is the X matrix with case i deleted
- MSE(i) is the MSE with case i deleted

Residuals (3)

When we examine the residuals we are looking for:
- Outliers
- Non-normal error distributions
- Influential observations

Hat matrix diagonals

- hii is a measure of how much Yi is contributing to the prediction Ŷi
- Ŷ1 = h11 Y1 + h12 Y2 + h13 Y3 + …
- hii is sometimes called the leverage of the ith observation

Hat matrix diagonals (2)

- 0 < hii < 1
- Σ hii = p
- We would like hii to be small
- The average value is p/n
- Values far from this average point to cases that should be examined carefully
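A minimal sketch of these properties on simulated data: the leverages lie between 0 and 1, sum to p, and a common rule of thumb (an assumption here, not stated on the slide) flags cases with leverage above twice the average p/n.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 18, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])

# Leverages are the diagonal of the hat matrix H = X (X'X)^{-1} X'
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)

avg = p / n                            # average leverage
flagged = np.where(h > 2 * avg)[0]     # 2p/n rule of thumb (assumed, hedged)
```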

Hat diagonals

[SAS output: hat diagonal H listed for each observation; numeric values elided.]

DFFITS

- A measure of the influence of case i on Ŷi
- A standardized version of the difference between Ŷi computed with and without case i
- Closely related to hii

Cook's Distance

- A measure of the influence of case i on all of the Ŷi's
- A standardized version of the sum of squares of the differences between the predicted values computed with and without case i

DFBETAS

- A measure of the influence of case i on each of the regression coefficients
- A standardized version of the difference between the regression coefficient computed with and without case i
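The three influence measures above can all be computed without refitting, using standard leave-one-out identities; this is a sketch on simulated data (the formulas, not the slides' SAS output, are what is being illustrated).

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 18, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([5.0, 2.0, 3.0]) + rng.normal(scale=0.5, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
H = X @ XtX_inv @ X.T
h = np.diag(H)
e = y - H @ y
SSE = e @ e
MSE = SSE / (n - p)
mse_i = (SSE - e**2 / (1 - h)) / (n - p - 1)   # MSE with case i deleted
t = e / np.sqrt(mse_i * (1 - h))                # studentized deleted residuals

# DFFITS: standardized change in the fitted value for case i
dffits = t * np.sqrt(h / (1 - h))
# Cook's D: standardized aggregate change in all fitted values
cooks_d = e**2 * h / (p * MSE * (1 - h)**2)
# b - b_(i) for every i at once, via the leave-one-out identity
delta_b = (XtX_inv @ X.T) * (e / (1 - h))       # p x n matrix
c = np.diag(XtX_inv)
# DFBETAS: each coefficient change scaled by its deleted standard error
dfbetas = delta_b / np.sqrt(np.outer(c, mse_i))  # p x n
```

Each column of `dfbetas` gives, for one case, the standardized change in every regression coefficient when that case is removed.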

Variance Inflation Factor

- The VIF is related to the variance of the estimated regression coefficients
- We calculate it for each explanatory variable
- One suggested rule is that a value of 10 or more indicates excessive multicollinearity

Tolerance

- TOL = 1 - R²k, where R²k is the squared multiple correlation obtained in a regression where all other explanatory variables are used to predict Xk
- TOL = 1/VIF
- Described in comment on p 411
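A sketch of VIF and tolerance from their definitions, on simulated data where one predictor is deliberately near-collinear with another; the VIF ≥ 10 rule of thumb from the previous slide then flags the collinear pair.

```python
import numpy as np

def vif_and_tol(X):
    """VIF_k = 1/(1 - R^2_k), regressing column k on the other columns
    (with intercept); TOL_k = 1/VIF_k."""
    n, q = X.shape
    vifs = []
    for k in range(q):
        others = np.column_stack([np.ones(n), np.delete(X, k, axis=1)])
        xk = X[:, k]
        beta, *_ = np.linalg.lstsq(others, xk, rcond=None)
        resid = xk - others @ beta
        r2 = 1 - (resid @ resid) / np.sum((xk - xk.mean()) ** 2)
        vifs.append(1 / (1 - r2))
    vif = np.array(vifs)
    return vif, 1 / vif

rng = np.random.default_rng(4)
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(scale=0.1, size=50)   # nearly collinear with x1
x3 = rng.normal(size=50)                    # unrelated predictor
vif, tol = vif_and_tol(np.column_stack([x1, x2, x3]))
```

Here `vif` for x1 and x2 is large (well over the suggested cutoff of 10) while x3's stays near 1, and `tol` is simply the reciprocal of `vif`.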

Output (Tolerance)

[SAS output: Variable and Tolerance columns for income and risk; the Intercept has no tolerance value; numeric values elided.]

Last slide

Read NKNW Chapter 11