Model Diagnostics and OLS Assumptions
Political Analysis II
Why should we care about unusual observations?
They can drive our results and lead to misleading findings, especially in small samples, and examining them helps us improve both our theory and our statistical model. There are three types of unusual observations: regression outliers, high-leverage observations, and influential observations.
A useful tool: Residuals
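A residual is the vertical distance between an observed value and the fitted regression line, e_i = y_i − ŷ_i. Here is a minimal NumPy sketch for a bivariate model with an intercept. The function name `ols_residuals` and the toy data are hypothetical, chosen only for illustration.

```python
import numpy as np

def ols_residuals(x, y):
    """Fit OLS with an intercept; return (coefficients, fitted values, residuals)."""
    X = np.column_stack([np.ones(len(y)), x])     # design matrix: intercept + x
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares coefficients
    fitted = X @ beta                             # Y-hat
    resid = y - fitted                            # e_i = y_i - yhat_i
    return beta, fitted, resid

# Hypothetical toy data: y = 1 + 2x for the first four points, last point is off the line
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 15.0])
beta, fitted, resid = ols_residuals(x, y)
print(np.round(beta, 2), np.round(resid, 2))
```

The deviant last point pulls the line toward itself. As a result, its raw residual is no larger in absolute value than its neighbour's, which is one reason to look beyond raw residuals.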
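Regression outliers
Regression outliers are observations with extreme values of Y given their values of X. An example is oil-rich non-democracies, which are less democratic than their wealth would predict. Cause: a coding error or a genuine peculiarity of the case. Effect: usually limited on the coefficients, but they can inflate our standard errors. Detect: large studentized residuals (absolute value greater than 2). Fix: check the coding; revise the theory. Fox (2008)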
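A studentized residual rescales each residual by an estimate of its standard deviation computed with that observation left out. This prevents an outlier from masking itself by inflating the error variance. Below is a minimal NumPy sketch of externally studentized residuals for a bivariate model. It uses the standard deletion formula, so the model is not refit n times. The function name and toy data are assumptions for illustration; the |t| > 2 cutoff is the rule of thumb from the slide.

```python
import numpy as np

def studentized_residuals(x, y):
    """Externally studentized residuals for a bivariate OLS fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), x])      # design matrix with intercept
    n, p = X.shape                                  # p = number of coefficients
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta                                # raw residuals
    h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)   # hat values (leverage)
    s2 = e @ e / (n - p)                            # residual variance, full sample
    # Residual variance with observation i deleted, obtained without refitting
    s2_del = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)
    return e / np.sqrt(s2_del * (1 - h))

# Hypothetical data: y = 1 + 2x plus a small alternating wiggle; one point shifted up by 10
x = np.arange(1, 11, dtype=float)
y = 1 + 2 * x + 0.1 * (-1.0) ** np.arange(10)
y[4] += 10

t = studentized_residuals(x, y)
flagged = np.where(np.abs(t) > 2)[0]   # slide's rule of thumb: |t| > 2
print(flagged)
```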
Example of regression outliers
Lijphart excluded India and Israel from his analysis because they had extreme values on the dependent variable, political stability and absence of violence (i.e., they were univariate outliers). But only Israel is a regression outlier.
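High-leverage observations
High-leverage observations have extreme values on one or more independent variables. They can change the estimated regression coefficients if they do not follow the pattern of the rest of the data. Detect: hat values h_i, the diagonal elements of the hat matrix H = X(X′X)^{-1}X′. This matrix turns the observed Y into the fitted values (Ŷ = HY), so hat values depend only on the X values. Fox (2008)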
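Hat values are the diagonal of the hat matrix H = X(X′X)^{-1}X′, so they can be computed from the predictors alone. Here is a minimal NumPy sketch for a bivariate model using hypothetical data. The cutoff of twice the average hat value (2p/n, with p coefficients) is a common rule of thumb assumed here, not one stated on the slide.

```python
import numpy as np

def hat_values(x):
    """Diagonal of H = X (X'X)^{-1} X' for a model with an intercept."""
    X = np.column_stack([np.ones(len(x)), x])
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    return np.diag(H)

# Hypothetical predictor: nine ordinary values and one far from the rest.
# No Y is needed because leverage depends only on the X values.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30], dtype=float)
h = hat_values(x)

n, p = len(x), 2                 # p = intercept + slope
cutoff = 2 * p / n               # rule of thumb: twice the average hat value
print(np.round(h, 3))
print(np.where(h > cutoff)[0])
```

Hat values always sum to p, so their average is p/n. That is why cutoffs are stated as multiples of p/n.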
Example of high-leverage observations
Lijphart described India as an “extreme outlier”, but it is actually a high-leverage observation.
Example of high-leverage observations
We can see this clearly when we look at India’s very high hat values.
Influential observations
Influential observations have extreme values on both X and Y: influence = outlierness × leverage. Excluding them substantially changes the direction, strength, or significance of the results. Detect: plot studentized residuals against hat values; compute Cook’s distance. Fix: check the coding, “dummy out” the observation(s) (add a dummy variable for each), or re-run the model without the observation(s) and compare results. Fox (2008)
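Cook’s distance combines the two ingredients of influence: a scaled squared residual (outlierness) and the leverage factor h_i/(1 − h_i)². The sketch below assumes NumPy and hypothetical data. It computes D_i = e_i² / (p·s²) · h_i / (1 − h_i)² and flags observations above 4/(n − k − 1), one common rule of thumb assumed here. It then re-runs the model without the flagged case to compare slopes.

```python
import numpy as np

def cooks_distance(x, y):
    """Cook's distance for each observation in a bivariate OLS fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), x])
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta                                  # raw residuals
    h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)     # hat values (leverage)
    s2 = e @ e / (n - p)                              # residual variance
    # Outlierness (scaled squared residual) times a leverage factor
    return (e**2 / (p * s2)) * (h / (1 - h) ** 2)

# Hypothetical data: the last observation has high leverage (x = 30)
# and sits far below the line traced by the others
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30], dtype=float)
y = 1 + 2 * x
y[9] -= 40

D = cooks_distance(x, y)
cutoff = 4 / (len(x) - 1 - 1)          # 4 / (n - k - 1) with k = 1 predictor
print(np.round(D, 2))
print(np.where(D > cutoff)[0])

# Re-run without the flagged observation and compare the slopes
keep = np.arange(len(x)) != 9
slope_all = np.polyfit(x, y, 1)[0]     # polyfit returns [slope, intercept]
slope_drop = np.polyfit(x[keep], y[keep], 1)[0]
print(round(slope_all, 2), round(slope_drop, 2))
```

Dropping the high-leverage case moves the slope from about 0.55 back to 2. A change in strength of this kind marks an influential observation.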
Example of influential observations
There are no influential observations in Lijphart’s sample. India has high hat values but small residuals; Israel has large residuals but low hat values. Influential observations would appear in the upper-right and lower-right corners of a plot of studentized residuals against hat values (not shown here).
The infamous butterfly ballot
Wand et al. (2001) show that the butterfly ballot caused more than 2,000 Democrats in Palm Beach County, a typically Democratic county, to vote for Buchanan by mistake. This type of ballot was used only in this county, only on Election Day, and only for the presidential race. Because that number is larger than George W. Bush’s statewide margin of victory, the ballot design plausibly cost Al Gore Florida and, with it, the presidency. Kellstedt and Whitten (2013)
Why ordinary least squares (OLS) assumptions?
The assumptions determine whether we can use OLS for describing linear relationships between variables, interpreting regressions causally, and hypothesis testing and prediction.
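The OLS assumptions
Linearity, homoscedasticity, mean independence, no autocorrelation, and (for small-sample inference) normally distributed errors. Homoscedasticity, no autocorrelation, and normality matter mainly for the standard errors.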
The linearity assumption
The relationship between the independent and dependent variables should be linear. A one-unit change in X is associated with the same change in Y (β units), regardless of the value of X.
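In equation form, assume the bivariate model and mean independence (E[ε | X] = 0). Linearity then means the slope of E[Y | X] is the constant β₁. A quadratic specification, by contrast, has a slope that depends on X:

```latex
Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i
\;\Rightarrow\;
\frac{\partial\, \mathbb{E}[Y_i \mid X_i]}{\partial X_i} = \beta_1
\qquad\text{vs.}\qquad
Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \varepsilon_i
\;\Rightarrow\;
\frac{\partial\, \mathbb{E}[Y_i \mid X_i]}{\partial X_i} = \beta_1 + 2\beta_2 X_i
```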
Based on the argument of Przeworski and Limongi (1997), “Modernization: Theories and Facts,” World Politics 49(2): 155–183.
Violations of the linearity assumption
Can you think of other nonlinear relationships? For example: district magnitude and the number of legislative parties; age and the likelihood of voting; … Solutions: interaction effects; transforming the data (e.g., log, quadratic, exponential). More on nonlinear relationships next week.
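To illustrate the quadratic transformation, here is a sketch using simulated, hypothetical age and turnout values (not real survey data). It fits a linear and a quadratic model with NumPy’s `polyfit` and compares their fit. The quadratic term lets the effect of age change sign, and the peak age is where the marginal effect β₁ + 2β₂·age equals zero.

```python
import numpy as np

def r_squared(y, fitted):
    """Share of the variance in y explained by the fitted values."""
    return 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)

# Hypothetical turnout curve (simulated): rises with age, peaks, then declines
age = np.arange(18, 91, dtype=float)
turnout = -0.4 + 0.04 * age - 0.0003 * age**2 + 0.02 * np.sin(age)

lin = np.polyfit(age, turnout, 1)    # linear: one slope for every age
quad = np.polyfit(age, turnout, 2)   # quadratic: returns [b2, b1, b0]

r2_lin = r_squared(turnout, np.polyval(lin, age))
r2_quad = r_squared(turnout, np.polyval(quad, age))
peak_age = -quad[1] / (2 * quad[0])  # age where b1 + 2 * b2 * age = 0

print(f"R^2 linear: {r2_lin:.3f}, quadratic: {r2_quad:.3f}, peak at age {peak_age:.1f}")
```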
More articles
On influential observations:
Fails and Krieckhaus (2010). Colonialism, Property Rights and the Modern World Income Distribution. British Journal of Political Science, 40(3), 487–503. Data: https://sites.google.com/a/oakland.edu/mfails/research/colonialism-property-rights-and-the-modern-world-income-distribution
Wand et al. (2001). The Butterfly Did It: The Aberrant Vote for Buchanan in Palm Beach County, Florida. American Political Science Review, 95(4), 793–810. Data: https://dataverse.harvard.edu/dataset.xhtml?persistentId=hdl:1902.1/10389
On nonlinear relationships:
Przeworski and Limongi (1997). Modernization: Theories and Facts. World Politics, 49(2), 155–183.