Outliers and Influential Data Points in Regression Analysis. James P. Stevens. Sujin Jang, November 10, 2008.

Presentation transcript:


Beware of Outliers
– Regression is sensitive to outliers, so it is important to detect outliers and influential points.
– Summary statistics can be misleading, so it is important to explore the data rather than rely on just one or two summary statistics.

Look at Your Data!
– All three plots have the same correlation r, the same means, and the same standard deviations.
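The claim on this slide can be checked numerically. The sketch below uses the first three data sets of Anscombe's quartet as a stand-in; this is an assumption, since the transcript does not say which data the three plots show. The helper names (`mean`, `sd`, `pearson_r`) are illustrative only.

```python
from math import sqrt

# First three data sets of Anscombe's quartet; they share the same x values.
# Assumption: used here for illustration, not necessarily the slide's data.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ys = {
    "I": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "II": [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "III": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
}

def mean(v):
    return sum(v) / len(v)

def sd(v):
    """Sample standard deviation (n - 1 in the denominator)."""
    m = mean(v)
    return sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))

def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    num = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    den = sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))
    return num / den

# One (mean, sd, r) triple per data set.
summary = {name: (mean(y), sd(y), pearson_r(x, y)) for name, y in ys.items()}
for name, (m, s, r) in summary.items():
    print(f"{name}: mean={m:.2f} sd={s:.2f} r={r:.3f}")
```

The three printed rows agree to about two decimal places, although a scatterplot of each data set would look quite different.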

But it’s not enough to look…

So what should we do?
Ways of detecting outliers:
– Studentized residuals, for outliers on y
– Mahalanobis distance and the hat matrix, for outliers in the space of predictors

Types of Outliers
Classifying outliers:
– Outliers in the space of outcomes (outliers on y)
– Outliers in the space of predictors (outliers on x)


So what should we do?
Ways of detecting outliers:
– Studentized residuals, for outliers on y
– Mahalanobis distance and the hat matrix, for outliers in the space of predictors
BUT… the points they identify will not necessarily be influential in affecting the regression coefficients.
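The detection measures named above can be sketched for the simplest case of one predictor. This is a minimal illustration on made-up data, not the presenter's example: the data values and variable names are hypothetical, and the residuals are internally studentized (each residual divided by its own estimated standard error). With a single predictor and an intercept, the hat-matrix diagonal and the squared Mahalanobis distance carry the same information, since h_i = 1/n + D_i^2/(n - 1).

```python
from math import sqrt

# Hypothetical data: case 3 is off the line on y, case 6 is far out on x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 15.0]
y = [1.1, 1.9, 3.2, 7.0, 5.1, 5.9, 15.2]

n = len(x)
p = 2  # estimated parameters: intercept and slope

# Ordinary least-squares fit of y on x.
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
intercept = y_bar - slope * x_bar

# Residuals and residual variance.
resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
s2 = sum(e ** 2 for e in resid) / (n - p)

# Hat-matrix diagonal (leverage): flags outliers in the predictor space.
leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

# Internally studentized residuals: flag outliers on y.
studentized = [e / sqrt(s2 * (1 - h)) for e, h in zip(resid, leverage)]

# Squared Mahalanobis distance of each x from the mean of x.
var_x = sxx / (n - 1)
mahalanobis_sq = [(xi - x_bar) ** 2 / var_x for xi in x]

for i in range(n):
    print(f"case {i}: h={leverage[i]:.3f} "
          f"r={studentized[i]:+.2f} D2={mahalanobis_sq[i]:.2f}")
```

In this data set the case with the largest leverage is not the case with the largest studentized residual, which matches the distinction between outliers on x and outliers on y.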

Outliers and Influential Points
[Figure; labels: outliers, influential points]

Example: Influential Points
[Figure; panel labels: Non-influential, Influential]

Cook’s Distance: Identifying Influential Points
A measure of the change in the regression coefficients that would occur if the case were omitted.
– Affected by the case being an outlier both on y and in the set of predictors
– Measures the joint (combined) influence of the case being an outlier on y and on x
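A minimal sketch of this definition for one predictor follows. It computes Cook's distance in two ways: literally, by refitting with each case left out, and from the residual e_i and leverage h_i of the full fit as D_i = e_i^2 h_i / (p s^2 (1 - h_i)^2). The data and the function names (`fit_line`, `cooks_by_deletion`, `cooks_by_formula`) are hypothetical. The extra case at x = 14 is placed either exactly on the line fitted to the other cases or 6 units below it, so the same leverage yields a non-influential and an influential point.

```python
from math import isclose

def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def cooks_by_deletion(x, y):
    """Cook's distance computed literally: refit with each case left out."""
    n, p = len(x), 2
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    s2 = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (n - p)
    distances = []
    for i in range(n):
        a0, a1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        # Squared shift in all fitted values caused by dropping case i.
        shift = sum((fi - (a0 + a1 * xi)) ** 2 for xi, fi in zip(x, fitted))
        distances.append(shift / (p * s2))
    return distances

def cooks_by_formula(x, y):
    """Same quantity from the residuals and leverages of the full fit."""
    n, p = len(x), 2
    b0, b1 = fit_line(x, y)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in resid) / (n - p)
    lev = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [e ** 2 * h / (p * s2 * (1 - h) ** 2) for e, h in zip(resid, lev)]

# Hypothetical base data lying close to a straight line.
x_base = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y_base = [1.2, 1.8, 3.1, 4.3, 4.8, 6.1, 7.2, 7.7, 9.1, 10.2]
a0, a1 = fit_line(x_base, y_base)

# Add one high-leverage case, on the base line or well below it.
x = x_base + [14.0]
y_on = y_base + [a0 + a1 * 14.0]
y_off = y_base + [a0 + a1 * 14.0 - 6.0]

d_on = cooks_by_deletion(x, y_on)
d_off = cooks_by_deletion(x, y_off)
agree = all(isclose(a, b, rel_tol=1e-6, abs_tol=1e-9)
            for a, b in zip(d_off, cooks_by_formula(x, y_off)))

print(f"on-line case:  D = {d_on[-1]:.4f}")
print(f"off-line case: D = {d_off[-1]:.4f}")
print("deletion and formula agree:", agree)
```

The on-line case leaves the coefficients unchanged when it is dropped, so its distance is essentially zero despite its leverage; the off-line case combines the same leverage with a large residual and dominates the fit.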

Now what?
Step 1. Detect
Step 2. Isolate
Step 3. Examine
– Are they qualitatively different?
– Are they influential?
Another thing to consider: influential “clusters”?

Example: Groups of Cases

Now what?
Step 1. Detect
Step 2. Isolate
Step 3. Examine
– Are they qualitatively different?
– Are they influential?
Step 4. Delete or retain as you see fit… or try both
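The "try both" option in Step 4 can be sketched as fitting the model with and without the flagged cases and comparing the coefficients. The data, the flagged indices, and the function names below are hypothetical. Two cases are removed together here because cases in a cluster can hide one another when only one is deleted at a time.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def compare_fits(x, y, flagged):
    """Return (intercept, slope) with all cases and with flagged cases removed."""
    keep = [i for i in range(len(x)) if i not in flagged]
    full = fit_line(x, y)
    reduced = fit_line([x[i] for i in keep], [y[i] for i in keep])
    return full, reduced

# Hypothetical data: the last two cases form a small cluster off the trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 12.0, 12.5]
y = [1.0, 2.1, 2.9, 4.2, 5.0, 5.8, 7.1, 8.0, 4.0, 4.2]
flagged = {8, 9}  # indices isolated in Steps 1 to 3

full, reduced = compare_fits(x, y, flagged)
print(f"all cases:       intercept={full[0]:.2f} slope={full[1]:.2f}")
print(f"flagged removed: intercept={reduced[0]:.2f} slope={reduced[1]:.2f}")
```

Reporting both fits shows how much the conclusions depend on the flagged cases, whichever fit is ultimately retained.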

The End