Introduction to Statistics: Political Science (Class 5)


Introduction to Statistics: Political Science (Class 5) Non-Linear Relationships

Thus far
- Focus on examining and controlling for linear relationships
- Each one-unit increase in an IV is associated with the same expected change in the DV
- Ordinary least squares regression can only estimate linear relationships
- But we can "trick" regression into estimating non-linear relationships by transforming our independent (and/or dependent) variables

When to transform an IV
- Theoretical expectation
- Look at the data (sometimes tricky in multivariate analysis or when you have thousands of cases)

Today: three types of transformations
- Logarithm
- Squared terms
- Converting to indicator variables

Logarithm
- The power to which a base must be raised to produce a given value
- We'll focus on natural logarithms, where ln(x) is the power to which e (≈ 2.71828) must be raised to get x
- ln(4) ≈ 1.386 because e^1.386 ≈ 4
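The definition can be verified quickly with Python's standard math module:

```python
import math

# ln(x) is the power to which e must be raised to get x
ln4 = math.log(4)               # natural log of 4
print(round(ln4, 3))            # 1.386
print(round(math.e ** ln4, 3))  # raising e to that power recovers 4
```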

Y = β0 + β1·ln(x) + u
- A change from 1 to 5 in the original measure is a 1.609 change in the logged value
- So the effect of a 1-unit change in x depends on where it occurs: the change from 1 to 2 shifts ln(x) more than the change from 2 to 3
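The non-constant effect is easy to see numerically; a short sketch:

```python
import math

# With Y = b0 + b1*ln(x), the change in Y produced by a 1-unit change in x
# depends on where x starts, because ln is concave.
print(round(math.log(5) - math.log(1), 3))  # 1 -> 5: 1.609 change in ln(x)
print(round(math.log(2) - math.log(1), 3))  # 1 -> 2: 0.693
print(round(math.log(3) - math.log(2), 3))  # 2 -> 3: 0.405, a smaller shift
```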

When to log an IV
- "Diminishing returns" as X gets large
- Data are skewed (e.g., income)

Income and home value
- $60,000/year → $200,000 home
- Bill Gates makes about $175 million/year
- $175,000,000 ≈ 2,917 × $60,000
- Should we expect him to have a 2,917 × $200,000 ($583,400,000) home?
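The slide's arithmetic, checked directly:

```python
# Gates's income relative to a $60,000 earner, and the home value a
# strictly proportional (linear) relationship would predict
ratio = 175_000_000 / 60_000
print(round(ratio))            # 2917
print(round(ratio) * 200_000)  # 583400000
```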

TVs and Infant Mortality
- TVs as proxy for resources or wealth
- Biggest differences at the low end? E.g., between "there are a couple of TVs in town" and "some people have TVs in their private homes"

Linear model: 0.6 TVs per capita → predicted infant mortality rate of -19.054 (an impossible negative rate)

Linear model (R-squared = 0.566):
                          Coef.     SE       t        p
TVs per capita           -156.436   12.934   -12.100  0.000
Constant                   74.810    3.419    21.880

Logged model (R-squared = 0.748):
                          Coef.     SE       t        p
TVs per capita (logged)   -24.656    1.397   -17.640  0.000
Constant                  -11.151    3.346    -3.330  0.001

Getting Predicted Values

                          Coef.     SE       t        p
TVs per capita (logged)   -24.656    1.397   -17.640  0.000
Constant                  -11.151    3.346    -3.330  0.001

TVs per capita   Logged    Predicted value
0.1              -2.303    45.621
0.2              -1.609    28.531
0.3              -1.204    18.534
0.4              -0.916    11.441
0.5              -0.693     5.939
0.6              -0.511     1.444
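The predicted values come from plugging ln(TVs per capita) into the fitted equation. A sketch using the rounded coefficients above (small rounding differences from the table are expected):

```python
import math

b0, b1 = -11.151, -24.656  # constant and coefficient on ln(TVs per capita)

for tvs in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]:
    pred = b0 + b1 * math.log(tvs)
    print(tvs, round(pred, 3))
# e.g., 0.1 -> about 45.62 and 0.6 -> about 1.44, matching the table;
# unlike the linear model, predictions stay positive over this range.
```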

Quadratic (squared) models
- Curved like a logarithm
- Key difference: quadratics allow for "U-shaped" relationships
- Enter the original variable and a squared term
- Allows a direct test of whether letting the line curve significantly improves the predictive power of the model

Age and Political Ideology

Linear model:
              Coef.    SE      t       p
Age          -0.007   0.004   -1.740  0.082
Constant      0.122   0.209    0.580  0.561

What would we conclude from this analysis?

Quadratic model:
              Coef.    SE      t       p
Age          -0.065   0.025   -2.630  0.009
Age-squared   0.001   0.000    2.390  0.017
Constant      1.554   0.635    2.450  0.015

Age and Political Ideology

              Coef.    SE      t       p
Age          -0.065   0.025   -2.630  0.009
Age-squared   0.001   0.000    2.390  0.017
Constant      1.554   0.635    2.450  0.015

(Columns below are computed with unrounded coefficients, so -0.065×Age shows -1.178 for age 18.)

Age   Age²    -0.065×Age   0.0005574×Age²   Constant   Predicted value
18     324    -1.178        0.181            1.554       0.557
28     784    -1.832        0.437            1.554       0.159
38    1444    -2.487        0.805            1.554      -0.128
48    2304    -3.141        1.284            1.554      -0.303
58    3364    -3.795        1.875            1.554      -0.366
68    4624    -4.450        2.577            1.554      -0.319
78    6084    -5.104        3.391            1.554      -0.159
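The table's predictions can be reproduced (approximately, since the displayed Age coefficient is rounded) by plugging ages into the fitted equation; the parabola's turning point sits at -β1/(2β2). A sketch:

```python
# Sketch of the quadratic prediction using the displayed coefficients
# (-0.065 is rounded, so results differ slightly from the slide's table)
b0, b1, b2 = 1.554, -0.065, 0.0005574

def predict(age):
    return b0 + b1 * age + b2 * age ** 2

for age in [18, 38, 58, 78]:
    print(age, round(predict(age), 3))

# The turning point -b1/(2*b2) is where predicted ideology stops falling
# and starts rising again (the "U-shape")
print(round(-b1 / (2 * b2)))  # 58
```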

Age and Political Ideology

              Coef.    SE      t       p
Age          -0.065   0.025   -2.630  0.009
Age-squared   0.001   0.000    2.390  0.017
Constant      1.554   0.635    2.450  0.015

Note: We are using two variables to measure the relationship between age and ideology.

Interpretation:
- Statistically significant relationship between age and ideology (can confirm with an F-test)
- The squared term significantly contributes to the predictive power of the model

If you add a linear and a squared term (e.g., age and age²) to a model and neither is independently statistically significant, this does not necessarily mean that age is not significantly related to the outcome.
Why? What we want to know is whether age and age² jointly improve the predictive power of the model. How can we test this?

Formula:

F = [(SSR_r - SSR_ur)/q] / [SSR_ur/(n - (k + 1))]

- q = number of variables being tested
- n = number of cases
- k = number of IVs in the unrestricted model

Check whether the value is above the critical value in the F-distribution [depends on degrees of freedom: numerator = number of IVs being tested; denominator = n - (number of IVs) - 1]

Don't worry about the F-test formula
- The point is: F-tests are a way to test whether adding a set of variables reduces the sum of squared residuals enough to justify throwing those new variables into the model
- Depends on: how much the sum of squared residuals is reduced, how many variables we're adding, and how many cases we have to work with
- More "acceptable" to add variables if you have a lot of cases
- Intuition: explaining 10 cases with 10 variables vs. explaining 1,000 cases with 10 variables?
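The formula is mechanical once you have the two sums of squared residuals. A sketch with hypothetical SSR values (the numbers here are made up, just to show the mechanics):

```python
def f_stat(ssr_r, ssr_ur, q, n, k):
    """F = [(SSR_restricted - SSR_unrestricted)/q] / [SSR_unrestricted/(n-(k+1))]"""
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - (k + 1)))

# Hypothetical example: dropping 2 variables raises SSR from 400 to 440,
# with n = 100 cases and k = 5 IVs in the unrestricted model
F = f_stat(ssr_r=440, ssr_ur=400, q=2, n=100, k=5)
print(round(F, 2))  # compare against the critical value with (2, 94) df
```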

TVs and Infant Mortality: squared term or logarithm?

                            Coef.     SE       t        p
TVs per capita             -380.088   29.949   -12.690  0.000
TVs per capita (squared)    410.957   51.629     7.960
Constant                     90.197    3.353    26.900

Which is "better"? Two basic ways to decide:
- Theory
- Which yields a better fit?

Run two models and compare R-squared… or possibly include all three terms in one model:

                            Coef.    SE       t       p
TVs per capita             -30.288   74.056   -0.410  0.683
TVs per capita (squared)    63.413   81.652    0.780  0.439
TVs per capita (logged)    -24.635    5.155   -4.780  0.000
Constant                    -9.465   20.417   -0.460  0.644

What might we conclude from these model estimates?
We probably should also do an F-test of the joint significance of TVs per capita and TVs per capita-squared. Why?
That F-test returned a significance level of 0.335, so we can conclude that the linear and squared terms do not jointly add significant predictive power once the logged term is included.
Ultimately you're best off relying on theory about the shape of the relationship.
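The "compare R-squared" approach can be sketched on simulated data where the true relationship is known to be logarithmic (all numbers below are made up; the OLS fit is done by hand via the normal equations, using only the standard library):

```python
import math
import random

# Simulated data: the true relationship is logarithmic, plus noise
random.seed(0)
x = [0.05 + 0.95 * i / 99 for i in range(100)]          # like TVs per capita
y = [10 - 5 * math.log(xi) + random.gauss(0, 0.5) for xi in x]

def ols_r2(X, y):
    """Fit y on the columns of X (plus an intercept) by solving the
    normal equations with Gaussian elimination; return R-squared."""
    X = [[1.0] + row for row in X]
    k = len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(len(X))) for b in range(k)]
           for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(len(X))) for a in range(k)]
    for col in range(k):                      # Gauss-Jordan with pivoting
        piv = max(range(col, k), key=lambda r: abs(xtx[r][col]))
        xtx[col], xtx[piv] = xtx[piv], xtx[col]
        xty[col], xty[piv] = xty[piv], xty[col]
        for r in range(k):
            if r != col:
                f = xtx[r][col] / xtx[col][col]
                xtx[r] = [a - f * b for a, b in zip(xtx[r], xtx[col])]
                xty[r] -= f * xty[col]
    beta = [xty[i] / xtx[i][i] for i in range(k)]
    pred = [sum(b * v for b, v in zip(beta, row)) for row in X]
    ybar = sum(y) / len(y)
    ssr = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1 - ssr / sst

r2_quad = ols_r2([[xi, xi ** 2] for xi in x], y)
r2_log = ols_r2([[math.log(xi)] for xi in x], y)
print(round(r2_quad, 3), round(r2_log, 3))  # the log model fits better here
```

Because the data were generated from a log relationship, the logged specification should win the R-squared comparison; on real data, of course, the true shape is unknown, which is why the slide falls back on theory.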

Ordered IVs → Indicators
- Sometimes we have reason to expect the relationship between an IV and the outcome to be more complex
- Could address this using more polynomials (e.g., variable³, variable⁴, etc.)
- We won't go there… instead…
- Example: Party identification and evaluations of candidates and issues

Standard "branching" PID items
- Generally speaking, do you usually think of yourself as a Republican, a Democrat, an Independent, or something else?
- If Republican or Democrat, ask: Would you call yourself a strong (Republican/Democrat) or a not very strong (Republican/Democrat)?
- If Independent or something else, ask: Do you think of yourself as closer to the Republican or Democratic party?

Party Identification Measure
(People who say Democrat or Republican in response to the first question get the strong/weak categories.)

Strong Republican   -3
Weak Republican     -2
Lean Republican     -1
Pure Independent     0
Lean Democrat        1
Weak Democrat        2
Strong Democrat      3

Question: Is the change from -2 to -1 (or 1 to 2) the same as the change from 0 to 1 or 2 to 3?

Party Identification (-3 to 3) → Create Indicators

Seven variables:
- Strong Republican (1=yes)
- Weak Republican (1=yes)
- Lean Republican (1=yes)
- Pure Independent (1=yes)
- Lean Democrat (1=yes)
- Weak Democrat (1=yes)
- Strong Democrat (1=yes)
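Converting the 7-point score into seven dummies is a simple mapping; a minimal sketch (the variable names are illustrative):

```python
# Expand the -3..3 PID score into seven 0/1 indicator variables
LABELS = {
    -3: "Strong Republican", -2: "Weak Republican", -1: "Lean Republican",
     0: "Pure Independent",   1: "Lean Democrat",    2: "Weak Democrat",
     3: "Strong Democrat",
}

def to_indicators(pid_score):
    """Return a dict of 0/1 indicators; exactly one equals 1."""
    return {label: int(code == pid_score) for code, label in LABELS.items()}

row = to_indicators(-2)               # a Weak Republican respondent
print(row["Weak Republican"], row["Strong Democrat"])  # 1 0
```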

Predict Obama Favorability (1-4)

                    Coef.    SE      t        p
Strong Republican  -1.632   0.161   -10.160  0.000
Weak Republican    -0.707   0.198    -3.580
Lean Republican    -1.235   0.181    -6.810
Lean Democrat       0.674   0.197     3.430  0.001
Weak Democrat       0.494   0.187     2.640  0.009
Strong Democrat     0.595   0.159     3.750
Constant            2.940   0.134    21.870

Excluded category: Pure Independents
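With indicator variables, each group's predicted favorability is just the constant plus that group's coefficient; the excluded category (Pure Independents) gets the constant alone. A sketch using the table above:

```python
# Predicted favorability by PID group: constant + group coefficient
constant = 2.940
coefs = {
    "Strong Republican": -1.632, "Weak Republican": -0.707,
    "Lean Republican": -1.235, "Pure Independent": 0.0,
    "Lean Democrat": 0.674, "Weak Democrat": 0.494,
    "Strong Democrat": 0.595,
}
for group, b in coefs.items():
    print(group, round(constant + b, 3))
# e.g., Strong Republicans -> 1.308; Pure Independents -> 2.940
```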

Obama Favorability

Predict Obama Favorability (1-4)

                    Coef.    SE      t        p
Strong Republican  -0.397   0.150    -2.650  0.008
Weak Republican     0.528   0.189     2.790  0.006
Pure Independent    1.235   0.181     6.810  0.000
Lean Democrat       1.909   0.188    10.150
Weak Democrat       1.729   0.179     9.680
Strong Democrat     1.831   0.148    12.360
Constant            1.705   0.122    14.010

New excluded category: Leaning Republicans
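A useful check: changing the excluded category shifts the coefficients but leaves each group's predicted value unchanged (up to rounding in the published coefficients). A sketch comparing the two parameterizations above:

```python
# Predicted favorability from each parameterization; they should agree
# up to rounding in the published coefficients
base_ind = 2.940    # constant with Pure Independents excluded
base_leanr = 1.705  # constant with Leaning Republicans excluded

model_ind = {"Strong Republican": -1.632, "Weak Republican": -0.707,
             "Lean Republican": -1.235, "Pure Independent": 0.0,
             "Lean Democrat": 0.674, "Weak Democrat": 0.494,
             "Strong Democrat": 0.595}
model_leanr = {"Strong Republican": -0.397, "Weak Republican": 0.528,
               "Lean Republican": 0.0, "Pure Independent": 1.235,
               "Lean Democrat": 1.909, "Weak Democrat": 1.729,
               "Strong Democrat": 1.831}

for g in model_ind:
    p1 = base_ind + model_ind[g]
    p2 = base_leanr + model_leanr[g]
    assert abs(p1 - p2) < 0.01, g   # same prediction either way
    print(g, round(p1, 3))
```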

DV: Obama Favorability

                    Coef.    SE      t        p
Strong Republican  -1.652   0.161   -10.290  0.000
Weak Republican    -0.704   0.197    -3.580
Lean Republican    -1.229   0.181    -6.790
Lean Democrat       0.654   0.195     3.340  0.001
Weak Democrat       0.457   0.187     2.440  0.015
Strong Democrat     0.579   0.158     3.650
Gender (female=1)   0.072   0.087     0.830  0.405
Age                -0.041   0.019    -2.140  0.033
Age²                0.044   0.018     2.430
Constant            3.784   0.509     7.430

Predicted value for a Pure Independent male, age 20?
Remember: Always interpret these coefficients as the estimated relationships holding the other variables in the model constant (or controlling for the other variables).

Notes and Next Time
- Homework due next Thursday (11/18)
- Next homework handed out next Tuesday; not due until the Tuesday after Fall Break
- Next time: dealing with situations where you expect the relationship between an IV and a DV to depend on the value of another IV