Outliers and Influential Data Points in Regression Analysis, James P. Stevens. Sujin Jang, November 10, 2008.


1 Outliers and Influential Data Points in Regression Analysis, James P. Stevens. Sujin Jang, November 10, 2008

2 Beware of Outliers Regression is sensitive to outliers – Important to detect outliers and influential points Summary stats can be misleading… – Important to explore the data, rather than relying on just 1-2 summary stats

3 Look at your Data! – For all three plots, r, means, and SD are equal
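The plots on this slide are not reproduced in the transcript. As a stand-in, the sketch below uses the first three data sets of Anscombe's quartet, a well-known example with the same property; the choice of data set and the helper name `pearson_r` are assumptions, not something the slides specify.

```python
from math import sqrt
from statistics import mean, stdev

# Anscombe's (1973) data sets I-III, which share one x vector.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ys = {
    "I": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "II": [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "III": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
}


def pearson_r(a, b):
    """Pearson correlation computed from deviations about the means."""
    ma, mb = mean(a), mean(b)
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / sqrt(saa * sbb)


# One (mean, SD, r) triple per data set.
summary = {name: (mean(y), stdev(y), pearson_r(x, y)) for name, y in ys.items()}
for name, (m, sd, r) in summary.items():
    print(f"set {name}: mean(y)={m:.2f}  sd(y)={sd:.2f}  r={r:.2f}")
```

The three summaries agree to about two decimal places, yet a scatterplot of each set looks quite different: one is roughly linear, one is curved, and one is a straight line with a single stray point.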

4 But it’s not enough to look…

5 So what should we do? Ways of Detecting Outliers: – Studentized residuals for outliers on y – Mahalanobis distance and the hat matrix for outliers in the space of predictors

6 Types of Outliers Classifying Outliers: - Outliers in the space of outcomes (outliers on y) - Outliers in the space of predictors (outliers on x)

7 So what should we do? Ways of Detecting Outliers: – Studentized residuals for outliers on y – Mahalanobis distance and the hat matrix for outliers in the space of predictors
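A minimal sketch of these three diagnostics for a one-predictor regression, using only the standard library. The data are hypothetical (made up for illustration), and the residuals are the internally studentized kind; the slides do not say which variant is meant, so that choice is an assumption.

```python
from math import sqrt

# Hypothetical data: y is roughly 2x; case 4 is off on y, case 9 is far out on x.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [2.1, 3.9, 6.2, 8.1, 15.0, 12.1, 13.8, 16.2, 17.9, 40.1]

n, p = len(x), 2  # p = number of fitted coefficients (intercept and slope)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
intercept = y_bar - slope * x_bar

residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
s = sqrt(sum(e ** 2 for e in residuals) / (n - p))  # residual standard error

# Diagonal of the hat matrix (leverage); closed form for one predictor.
leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

# Squared Mahalanobis distance; with one predictor it is a squared z-score.
var_x = sxx / (n - 1)
mahalanobis_sq = [(xi - x_bar) ** 2 / var_x for xi in x]

# Internally studentized residuals: e_i / (s * sqrt(1 - h_i)).
studentized = [e / (s * sqrt(1 - h)) for e, h in zip(residuals, leverage)]

worst_on_y = max(range(n), key=lambda i: abs(studentized[i]))
worst_on_x = max(range(n), key=lambda i: leverage[i])
print("largest |studentized residual| at case", worst_on_y)
print("largest leverage at case", worst_on_x)
```

In this setting leverage and Mahalanobis distance carry the same information, since h_i = 1/n + D_i^2 / (n - 1). With several predictors the same quantities come from the full hat matrix and the predictor covariance matrix, which would normally be computed with a linear algebra library.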

8 So what should we do? Ways of Detecting Outliers: – Studentized residuals for outliers on y – Mahalanobis distance and the hat matrix for outliers in the space of predictors BUT… The points they identify will not necessarily be influential in affecting the regression coefficients…

9 Outliers and Influential Points [Figure labels: outliers; influential points]

10 Example: Influential Points [Figure panels: Non-influential; Influential]

11 Cook’s Distance: Identifying Influential Points A measure of the change in the regression coefficients that would occur if the case were omitted. – Affected by the case being an outlier both on y and in the set of predictors – Measures the joint (combined) influence of the case being an outlier on y and on x
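A minimal sketch of Cook's distance for a one-predictor regression, using the standard identity D_i = e_i^2 h_i / (p s^2 (1 - h_i)^2), where e_i is the residual, h_i the leverage, s^2 the residual variance, and p the number of coefficients. The function name `cooks_distance` and the data are hypothetical. The D > 1 cutoff in the final comment is a commonly cited rule of thumb, not something stated on the slide.

```python
def cooks_distance(x, y):
    """Cook's distance for each case in a simple (one-predictor) regression."""
    n, p = len(x), 2  # p = intercept + slope
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in residuals) / (n - p)  # residual variance
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

    # Combines the size of the residual (outlier on y) with leverage (outlier on x).
    return [
        e ** 2 * h / (p * s2 * (1 - h) ** 2)
        for e, h in zip(residuals, leverage)
    ]


# Hypothetical data: y is roughly 2x, except
#   case 4: unusual y at a typical x (outlier on y only)
#   case 9: extreme x AND a y well below the trend (outlier on both)
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [2.1, 3.9, 6.2, 8.1, 15.0, 12.1, 13.8, 16.2, 17.9, 25.0]

cooks = cooks_distance(x, y)
most_influential = max(range(len(x)), key=lambda i: cooks[i])
for i, d in enumerate(cooks):
    print(f"case {i}: D = {d:.3f}")
print("most influential case:", most_influential)  # rule of thumb: inspect D > 1
```

Case 4 has a large residual but little leverage, so its distance stays small; case 9 combines both and dominates.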

12 Now what? Step 1. Detect Step 2. Isolate Step 3. Examine -Are they qualitatively different? -Are they influential? Another thing to consider: influential “clusters”?

13 Example: Groups of Cases

14 Now what? Step 1. Detect Step 2. Isolate Step 3. Examine -Are they qualitatively different? -Are they influential? Step 4. Delete or retain as you see fit … Or try both
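The "try both" option in Step 4 can be sketched as fitting the line with and without a flagged case and comparing the coefficients. The data, the helper `fit_line`, and the assumption that case 9 was flagged in Step 1 are all hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data: y is roughly 2x, but case 9 sits far out on x and below the trend.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [2.1, 3.9, 6.2, 8.1, 15.0, 12.1, 13.8, 16.2, 17.9, 25.0]

flagged = 9  # assumed to have been identified in Step 1

with_case = fit_line(x, y)
without_case = fit_line(
    x[:flagged] + x[flagged + 1:],
    y[:flagged] + y[flagged + 1:],
)

print("with case:    intercept=%.2f slope=%.2f" % with_case)
print("without case: intercept=%.2f slope=%.2f" % without_case)
```

If the two fits tell the same story, the decision to delete or retain matters little; if they differ sharply, as they do here, reporting both makes the dependence on that one case visible.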

15 The End

