Model Diagnostics and OLS Assumptions

Model Diagnostics and OLS Assumptions Political Analysis II

Why should we care about unusual observations?
- They can drive our results and lead to misleading findings (especially in small samples)
- Studying them can improve our theory and statistical model
Three types of unusual observations:
- Regression outliers
- High leverage observations
- Influential observations

A useful tool: Residuals

Regression outliers
- Regression outliers = observations with extreme values of Y given their values of X
- Example: oil-rich non-democracies
- Possible causes: coding error, a genuine peculiarity of the case
- Limited effect on the coefficients, but they can inflate our standard errors
- Detect: large studentized residuals (|t| > 2)
- Fix: check the coding, revise the theory
Fox (2008)
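The slides describe the detection rule but show no computation, so here is a minimal pure-Python sketch of externally studentized residuals for a simple one-predictor regression. The data are hypothetical toy values invented for illustration (a clean linear trend with one vertical outlier added by hand):

```python
import math

def studentized_residuals(x, y):
    """Externally studentized residuals for a simple (one-X) OLS fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    hat = [1 / n + (xi - mx) ** 2 / sxx for xi in x]   # leverage of each point
    sse = sum(e ** 2 for e in resid)
    t = []
    for e, h in zip(resid, hat):
        # error variance re-estimated with observation i deleted (p = 2 parameters)
        s2_del = (sse - e ** 2 / (1 - h)) / (n - 2 - 1)
        t.append(e / math.sqrt(s2_del * (1 - h)))
    return t

# Hypothetical data: y = 2x plus small noise, with observation 5 shifted far off the line
x = list(range(10))
noise = [0.2, -0.3, 0.1, -0.2, 0.3, 0.0, -0.1, 0.2, -0.2, 0.1]
y = [2 * xi + ni for xi, ni in zip(x, noise)]
y[5] += 8                                   # make observation 5 a regression outlier

t = studentized_residuals(x, y)
flagged = [i for i, ti in enumerate(t) if abs(ti) > 2]   # the |t| > 2 rule from the slide
```

Only the manipulated observation exceeds the |t| > 2 threshold; the others stay well below it because their residuals are small relative to the re-estimated error variance.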

Example of regression outliers Lijphart excluded India and Israel from his analysis because they had extreme values on the dependent variable, political stability and absence of violence (i.e. they are univariate outliers). But only Israel is a regression outlier.

High leverage observations
- High leverage = extreme values on one or more independent variables
- They can change the estimated regression coefficients (if they don't follow the pattern of the rest of the data)
- Detect: hat values (a measure of how strongly each observation determines its own fitted/Y-hat value)
Fox (2008)
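For a simple regression the hat values have a closed form, h_i = 1/n + (x_i − x̄)² / Σ(x_j − x̄)², which makes the point on the slide concrete: leverage depends only on X, not on Y. A sketch with hypothetical data (the 2p/n cutoff is a common rule of thumb, not the only one):

```python
def hat_values(x):
    """Leverage (hat) values for simple regression; they depend only on X."""
    n = len(x)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    return [1 / n + (xi - mx) ** 2 / sxx for xi in x]

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]   # last observation has an extreme X value
h = hat_values(x)
p, n = 2, len(x)                       # p = 2 parameters: intercept and slope
high = [i for i, hi in enumerate(h) if hi > 2 * p / n]   # common cutoff: 2p/n
```

A useful sanity check: the hat values always sum to the number of parameters (here 2), so a single point with h near 1 is absorbing a large share of the model's total leverage.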

Example of high leverage observations Lijphart described India as an “extreme outlier”, but it is actually a high leverage observation.

Example of high leverage observations We can see this clearly when we look at India’s very high hat-values.

Influential observations
- Influential observations = extreme values on both X and Y
- Influence = outlierness (discrepancy) × leverage
- Excluding them significantly changes the direction, strength, or significance of the results
- Detect: plot studentized residuals against leverage; Cook's Distance
- Fix: check the coding, "dummy out" the observation(s), or re-run the model without them and compare results
Fox (2008)
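Cook's Distance combines the two ingredients named above, residual size and leverage, in one number per observation: D_i = e_i² h_i / (p s² (1 − h_i)²). A minimal sketch with hypothetical data, reusing a point that is extreme on X and also far off the line (the 4/n cutoff is one common convention):

```python
def cooks_distance(x, y):
    """Cook's D for each observation in a simple OLS regression (p = 2)."""
    n, p = len(x), 2
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    hat = [1 / n + (xi - mx) ** 2 / sxx for xi in x]
    s2 = sum(e ** 2 for e in resid) / (n - p)    # estimated error variance
    return [e ** 2 * h / (p * s2 * (1 - h) ** 2) for e, h in zip(resid, hat)]

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
y = [2, 4, 6, 8, 10, 12, 14, 16, 18, 10]   # last point: extreme X *and* far off the line
d = cooks_distance(x, y)
influential = [i for i, di in enumerate(d) if di > 4 / len(x)]   # common cutoff: 4/n
```

Note how this matches the Lijphart discussion: a point needs both high leverage (like India) and a large residual (like Israel) before Cook's D becomes large.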

Example of influential observations No influential observations in Lijphart’s sample… India: high hat-values, but small residuals Israel: large residuals, but low hat-values We find influential observations in the lower-right corner and upper-right corner (not shown here).

The infamous butterfly ballot Wand et al. (2001) show that more than 2,000 Democrats in Palm Beach County, a typically Democratic county, voted for Buchanan due to the confusing butterfly ballot. This ballot design was used only in this county, and only for election-day presidential voting. As a result, George W. Bush, and not Al Gore, won Florida and the presidency. Kellstedt and Whitten (2013)

Why ordinary least squares (OLS) assumptions?
- Describing linear relationships between variables
- Interpreting regressions causally
- Hypothesis testing and prediction

The OLS assumptions
- Linearity
- Homoscedasticity
- Mean independence
- No autocorrelation
- (Normally distributed errors — needed for valid standard errors in small samples)

The linearity assumption The relationship between the independent and dependent variables should be linear: a one-unit change in X leads to the same amount of change in Y, regardless of the value of X.

Based on the argument of Przeworski and Limongi (1997). “Modernization: Theories and Facts.” World Politics 49 (02): 155–83.

Violations of the linearity assumption
Can you think of other nonlinear relationships?
- District magnitude and the number of legislative parties
- Age and the likelihood of voting
- …
Solutions:
- Interaction effects
- Transform the data (e.g. log, quadratic, exponential)
More on nonlinear relationships next week
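The transformation remedy listed above can be sketched with a small pure-Python example. The data are hypothetical, built so that Y rises with log(X) rather than X itself (a diminishing-returns pattern like the income–democracy relationship in Przeworski and Limongi); fitting Y on log(X) then gives a much better fit than Y on X:

```python
import math

def r_squared(x, y):
    """R^2 from a simple OLS fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1 - sse / sst

# Hypothetical data with diminishing returns: Y depends on log(X), not X itself
x = list(range(1, 21))
y = [3 * math.log(xi) for xi in x]

r2_linear = r_squared(x, y)                        # fit Y on X
r2_log = r_squared([math.log(xi) for xi in x], y)  # fit Y on log(X)
```

The linear fit still looks respectable by R² alone, which is exactly why residual plots matter: the systematic curvature only becomes obvious when you inspect the residuals, not the summary statistic.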

More articles
On influential observations:
- Fails and Krieckhaus (2010). Colonialism, Property Rights and the Modern World Income Distribution. British Journal of Political Science, 40(3), 487-503. Data: https://sites.google.com/a/oakland.edu/mfails/research/colonialism-property-rights-and-the-modern-world-income-distribution
- Wand et al. (2001). The Butterfly Did It: The Aberrant Vote for Buchanan in Palm Beach County, Florida. American Political Science Review, 95(4), 793-810. Data: https://dataverse.harvard.edu/dataset.xhtml?persistentId=hdl:1902.1/10389
On nonlinear relationships:
- Przeworski and Limongi (1997). Modernization: Theories and Facts. World Politics, 49(2), 155-83.