Introduction to Statistics: Political Science (Class 9) Review.


1 Introduction to Statistics: Political Science (Class 9) Review

2 Probability of having cardiovascular disease
Purpose of statistics:
–Inferences about populations using samples
We draw a random sample of 1,000 adults, and 405 have some form of cardiovascular disease (CVD).
Based on our sample, if we randomly select one adult from the population, what is the probability that they have CVD?

3 Conditional Probability
Exercise frequency                       No CVD   CVD
Exercise less than 3 days/week (N=602)   30.3%    28.9%
Exercise 3 or more days/week (N=398)     30.2%    10.6%
Probability of exercising <3 days/week?
Probability of CVD among those who exercise <3 days/week?
Probability of CVD among those who exercise 3 or more days/week?

4 Association between exercise and CVD?
Exercise frequency                       No CVD   CVD
Exercise less than 3 days/week (N=602)   30.3%    28.9%
Exercise 3 or more days/week (N=398)     30.2%    10.6%
p1 = 28.9/(30.3+28.9) = 0.488
p2 = 10.6/(30.2+10.6) = 0.260
Difference = 0.488 - 0.260 = 0.228
Those who exercise less than 3 days/week are 0.228 (22.8 percentage points) more likely to have CVD
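The conditional probabilities and their difference can be reproduced with a few lines of Python. This is a minimal sketch: the dictionary layout and function name are illustrative assumptions, and the numbers are the cell percentages from the table above.

```python
# Cell percentages from the slide's 2x2 table (percent of the full sample).
# The dictionary layout and names are illustrative, not from the slides.
table = {
    "exercise_lt3": {"no_cvd": 30.3, "cvd": 28.9, "n": 602},
    "exercise_ge3": {"no_cvd": 30.2, "cvd": 10.6, "n": 398},
}

def p_cvd_given(row):
    """P(CVD | exercise group) = CVD cell / (no-CVD cell + CVD cell)."""
    return row["cvd"] / (row["no_cvd"] + row["cvd"])

p1 = p_cvd_given(table["exercise_lt3"])   # CVD rate, exercise < 3 days/week
p2 = p_cvd_given(table["exercise_ge3"])   # CVD rate, exercise >= 3 days/week
p_lt3 = table["exercise_lt3"]["n"] / 1000  # marginal probability from group size
diff = p1 - p2

print(f"P(exercise <3 days/week) = {p_lt3:.3f}")
print(f"P(CVD | <3 days/week)    = {p1:.3f}")
print(f"P(CVD | >=3 days/week)   = {p2:.3f}")
print(f"Difference               = {diff:.3f}")
```

Dividing by the row total is what makes each probability conditional on the exercise group rather than a share of the whole sample.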

5 Specifying and testing hypotheses
Difference of proportions = .228
What’s our null hypothesis?
Why a “null hypothesis”? Why not test whether the difference is .228?
Central limit theorem
–In repeated sampling, our estimates of the mean (or difference of means, or slope) will be approximately normally distributed and centered on the true population value

6 Central limit theorem
(Figure: sampling distribution centered on the proposed true value, 0, with 1 standard error marked)

7 Comparing proportions
Difference of proportions = .228
p1 = 28.9/(30.3+28.9) = 0.488 (N=602)
p2 = 10.6/(30.2+10.6) = 0.260 (N=398)
Standard error of this difference:
$$SE = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$

8 Comparing proportions
So, the standard error of the difference is the square root of:
(.488*(1-.488)/602) + (.260*(1-.260)/398)
–Which is .0299
Difference of proportions = .228
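The standard error calculation can be checked directly. The sketch below assumes the rounded proportions and group sizes from the slides; the function name is illustrative.

```python
import math

def se_diff_proportions(p1, n1, p2, n2):
    """Standard error of the difference between two independent proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Rounded proportions and group sizes as given on the slides.
se = se_diff_proportions(0.488, 602, 0.260, 398)
print(f"SE of difference = {se:.4f}")
```

The result is roughly .030, in line with the .0299 reported on the slide.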

9 Hypotheses
Null hypothesis:
–There is no difference in the rate of CVD between those who exercise less than 3 days/week and those who exercise 3 or more days/week
Alternative hypothesis:
–There is a difference in the rate of CVD between those who exercise less than 3 days/week and those who exercise 3 or more days/week (i.e., the difference is not 0)

10 If 0 were the true difference, it would be very unlikely that we would find a difference 7.63 (.228/.0299) standard errors from that value by chance
(Figure: sampling distribution centered on the proposed true value, 0, with 1 standard error marked)
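How unlikely such a result would be can be sketched as a z-test, using the difference of .228 from the comparison of proportions and the standard error of .0299. The two-sided p-value below relies on the normal approximation from the central limit theorem; the variable names are illustrative.

```python
import math

diff = 0.228          # observed difference of proportions
se = 0.0299           # standard error of the difference
null_value = 0.0      # difference proposed by the null hypothesis

# Number of standard errors between the estimate and the null value.
z = (diff - null_value) / se

# Two-sided p-value under a standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.2f}")
print(f"two-sided p-value = {p_two_sided:.2e}")
```

A p-value this small means a difference this far from 0 would almost never arise by chance if the null hypothesis were true.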

11 Does exercise cause lower CVD?
Reverse causation? Might CVD cause exercise?
Failure to account for confounds
–Typically leads to over-estimating the strength of a relationship (not always… but usually)

12

13 Specification and Interpretation Multivariate Regression

14 Does exercise make CVD less likely?
Regression (predict CVD)
Variable              Coef.   SE     T     P-value
Days Exercise (0-7)   -0.06   .001   ?     0.000
Constant               0.56   .002   ?     0.000
Estimated likelihood of CVD if exercise 4 days/week?
What might confound our estimate of the relationship between exercise and CVD?
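A short sketch of how the two open questions on this slide could be worked out: a predicted value is the constant plus the slope times the number of exercise days, and a T statistic is the coefficient divided by its standard error. The variable and function names are illustrative; the numbers are those printed in the table.

```python
# Coefficients and standard errors as printed on the slide.
coef_exercise, se_exercise = -0.06, 0.001
constant, se_constant = 0.56, 0.002

def predicted_cvd(days_exercise):
    """Predicted probability of CVD from the bivariate regression."""
    return constant + coef_exercise * days_exercise

# T statistic = coefficient / standard error.
t_exercise = coef_exercise / se_exercise
t_constant = constant / se_constant

print(f"Predicted P(CVD) at 4 days/week: {predicted_cvd(4):.2f}")
print(f"T (Days Exercise) = {t_exercise:.1f}")
print(f"T (Constant)      = {t_constant:.1f}")
```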

15 Controlling for confounds
Variable               Coef.   SE     T      P-value
Days Exercise (0-7)    -0.03   .001   -3.0   0.002
Days Fast Food (0-7)    0.04   .002    2.0   0.048
Constant                0.42   .002   21.0   0.000

16 (Figure: % chance of CVD by days per week of exercise, with separate lines for high and low fast food consumption)

17 Controlling for dichotomous confounds
Variable               Coef.   SE     T      P-value
Days Exercise (0-7)    -0.03   .001   -3.0   0.002
Days Fast Food (0-7)    0.04   .002    2.0   0.048
Smoker (1=yes)          0.11   .001   11.0   0.000
Constant                0.38   .002   19.0   0.000
Predicted probability of CVD for
–2 days exercise, 2 days fast food, smoker
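The predicted probability asked for here is the constant plus each coefficient times the corresponding value. A minimal sketch, with illustrative names and the coefficients from the table:

```python
# Coefficients from the regression with the smoker indicator.
COEFS = {"days_exercise": -0.03, "days_fast_food": 0.04, "smoker": 0.11}
CONSTANT = 0.38

def predicted_cvd(days_exercise, days_fast_food, smoker):
    """Predicted probability of CVD; smoker is 1 for yes, 0 for no."""
    return (CONSTANT
            + COEFS["days_exercise"] * days_exercise
            + COEFS["days_fast_food"] * days_fast_food
            + COEFS["smoker"] * smoker)

p_smoker = predicted_cvd(days_exercise=2, days_fast_food=2, smoker=1)
p_nonsmoker = predicted_cvd(days_exercise=2, days_fast_food=2, smoker=0)
print(f"Smoker:     {p_smoker:.2f}")
print(f"Non-smoker: {p_nonsmoker:.2f}")
```

Holding exercise and fast food fixed, the two predictions differ by exactly the smoker coefficient, which is how a dichotomous variable's coefficient is read.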

18 Nominal Variables
Variable that does not have an “order” to it
–Nothing is “higher” or “lower”
Create a set of dichotomous variables
Always interpret coefficients with respect to the reference category
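Creating the set of dichotomous variables can be sketched as below. The function name and the example region values are illustrative assumptions; the key point is that the reference category gets no indicator of its own, so it is the row with all zeros.

```python
def make_indicators(values, reference):
    """Turn a nominal variable into 0/1 indicators, omitting the reference category."""
    categories = sorted(set(values) - {reference})
    return [{f"{c} (1=yes)": int(v == c) for c in categories} for v in values]

# Hypothetical respondents' regions, for illustration only.
regions = ["South", "Midwest", "West", "Northeast", "South"]
rows = make_indicators(regions, reference="Midwest")

for region, row in zip(regions, rows):
    print(region, row)
```

Each coefficient on an indicator is then the difference between that category and the omitted one.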

19

20 Controlling for nominal confounds
Variable               Coef.   SE     T      P-value
Days Exercise (0-7)    -0.03   .001   -3.0   0.002
Days Fast Food (0-7)    0.03   .002    1.5   0.135
Smoker (1=yes)          0.09   .001    9.0   0.000
South (1=yes)           0.03   .002    1.5   0.137
West (1=yes)           -0.01   .002   -0.5   0.642
Northeast (1=yes)       0.02   .002    1.0   0.410
Constant                0.34   .002   17.0   0.000
(Midwest is excluded category)
What if we wanted to test whether including region indicators improves fit of the model?
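One common way to test whether a group of indicators jointly improves fit is an F-test comparing the model with and without them. The sketch below is illustrative only: the slides do not report R-squared values or the regression sample size, so the numbers passed in are hypothetical placeholders.

```python
def f_statistic(r2_restricted, r2_unrestricted, n, k_unrestricted, q):
    """F statistic for adding q variables to a regression.

    r2_restricted   -- R-squared without the added variables
    r2_unrestricted -- R-squared with them
    n               -- number of observations
    k_unrestricted  -- number of predictors in the larger model
    q               -- number of variables being tested jointly
    """
    numerator = (r2_unrestricted - r2_restricted) / q
    denominator = (1 - r2_unrestricted) / (n - k_unrestricted - 1)
    return numerator / denominator

# Hypothetical values: R-squared of 0.20 without and 0.21 with the three
# region indicators, 1,000 observations, six predictors in the larger model.
f_stat = f_statistic(0.20, 0.21, n=1000, k_unrestricted=6, q=3)
print(f"F = {f_stat:.2f}")
```

The statistic would then be compared with an F distribution with q and n - k - 1 degrees of freedom.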

21 Non-linear relationships

22 Logarithms Why use a logarithmic transformation? You think the relationship looks like this…

23 Logarithms

24 Squared term: U (or ∩)-shaped relationship
Age and political ideology (-2=very conservative, 2=very liberal)
Without squared term:
Variable       Coef.    SE      T        P
Age            -0.007   0.004   -1.740   0.082
Constant        0.122   0.209    0.580   0.561
With squared term:
Variable       Coef.    SE      T        P
Age            -0.065   0.025   -2.630   0.009
Age-squared     0.001   0.000    2.390   0.017
Constant        1.554   0.635    2.450   0.015

25 Age and Political Ideology
Variable       Coef.    SE      T        P
Age            -0.065   0.025   -2.630   0.009
Age-squared     0.001   0.000    2.390   0.017
Constant        1.554   0.635    2.450   0.015
Age   Age^2   -0.065*Age   .0005574*Age^2   Constant   Predicted Value
18    324     -1.178       0.181            1.554       0.557
28    784     -1.832       0.437            1.554       0.159
38    1444    -2.487       0.805            1.554      -0.128
48    2304    -3.141       1.284            1.554      -0.303
58    3364    -3.795       1.875            1.554      -0.366
68    4624    -4.450       2.577            1.554      -0.319
78    6084    -5.104       3.391            1.554      -0.159
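The predicted values can be approximated in code, along with the age at which the curve turns. This is a sketch using the coefficients as printed (-0.065 and .0005574); the slide's table appears to use a less-rounded Age coefficient, so the results here differ from it slightly in the second or third decimal place. The names are illustrative.

```python
# Coefficients as printed on the slide (rounded).
B_AGE, B_AGE_SQ, CONSTANT = -0.065, 0.0005574, 1.554

def predicted_ideology(age):
    """Predicted ideology (-2=very conservative, 2=very liberal) at a given age."""
    return CONSTANT + B_AGE * age + B_AGE_SQ * age ** 2

# A quadratic a + b*x + c*x^2 reaches its minimum or maximum at x = -b / (2c).
turning_point = -B_AGE / (2 * B_AGE_SQ)

for age in range(18, 79, 10):
    print(age, round(predicted_ideology(age), 3))
print(f"Turning point at about age {turning_point:.1f}")
```

Because the coefficient on the squared term is positive, the turning point is a minimum, which matches the U shape in the table: predicted ideology falls until the late fifties and then rises.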

26

27 Create indicators from an ordered variable
Party Identification (-3 to 3)
Seven Variables:
–Strong Republican (1=yes)
–Weak Republican (1=yes)
–Lean Republican (1=yes)
–Pure Independent (1=yes)
–Lean Democrat (1=yes)
–Weak Democrat (1=yes)
–Strong Democrat (1=yes)

28 Predict Obama Favorability (1-4)
Variable            Coef.    SE      T         P
Strong Republican   -1.632   0.161   -10.160   0.000
Weak Republican     -0.707   0.198    -3.580   0.000
Lean Republican     -1.235   0.181    -6.810   0.000
Lean Democrat        0.674   0.197     3.430   0.001
Weak Democrat        0.494   0.187     2.640   0.009
Strong Democrat      0.595   0.159     3.750   0.000
Constant             2.940   0.134    21.870   0.000
Excluded category: Pure Independents

29 Obama Favorability

30 Predict Obama Favorability (1-4)
Variable            Coef.    SE      T        P
Strong Republican   -0.397   0.150   -2.650   0.008
Weak Republican      0.528   0.189    2.790   0.006
Pure Independent     1.235   0.181    6.810   0.000
Lean Democrat        1.909   0.188   10.150   0.000
Weak Democrat        1.729   0.179    9.680   0.000
Strong Democrat      1.831   0.148   12.360   0.000
Constant             1.705   0.122   14.010   0.000
New excluded category: Leaning Republicans
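Changing the excluded category only shifts the baseline: each new coefficient is the old coefficient minus the old coefficient of the new reference group, and the constant absorbs that shift. A minimal sketch using the coefficients from the model that excludes Pure Independents (the function name is illustrative):

```python
# Coefficients with Pure Independents excluded (so their implied coefficient is 0).
old_coefs = {
    "Strong Republican": -1.632,
    "Weak Republican": -0.707,
    "Lean Republican": -1.235,
    "Pure Independent": 0.0,
    "Lean Democrat": 0.674,
    "Weak Democrat": 0.494,
    "Strong Democrat": 0.595,
}
OLD_CONSTANT = 2.940

def rebaseline(coefs, constant, new_reference):
    """Re-express indicator coefficients relative to a different excluded category."""
    shift = coefs[new_reference]
    new_coefs = {k: v - shift for k, v in coefs.items() if k != new_reference}
    return new_coefs, constant + shift

new_coefs, new_constant = rebaseline(old_coefs, OLD_CONSTANT, "Lean Republican")
for name, value in new_coefs.items():
    print(f"{name:18s} {value:7.3f}")
print(f"{'Constant':18s} {new_constant:7.3f}")
```

The output reproduces the table with Leaning Republicans excluded, up to rounding in the third decimal (for example 1.830 here versus 1.831 on the slide for Strong Democrats). The predicted favorability for every group is unchanged; only the comparison point moves.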

31 Interactions One variable moderates the effect of another – i.e., the relationship between one variable and an outcome depends on the value of another variable

32 Regression estimates an equation…
Variable                                      Coef.     SE      T        P
Party Affiliation (-3=strong R; 3=strong D)    1.286    0.878    1.460   0.143
Voted in 2008                                 -1.138    1.484   -0.770   0.443
Party Affiliation x Voted in 2008              3.575    0.918    3.900   0.000
Constant                                      61.100    1.358   44.980   0.000
61.100 + 1.286*Party - 1.138*Voted + 3.575*Party*Voted + u
61.100 + Party*1.286 + Party*Voted*3.575 - 1.138*Voted + u
61.100 + Party(1.286 + Voted*3.575) - 1.138*Voted + u
OR
61.100 + Party*1.286 + Voted*Party*3.575 - Voted*1.138 + u
61.100 + Party*1.286 + Voted(Party*3.575 - 1.138) + u

33 Predicted values
Did not vote (Voted = 0):
Party Aff.   Voted   Party Aff.   Voted      Party x Voted   Constant   Predicted Value
Coefficients          1.286       -1.138      3.575          61.100
-3           0       -3.858       0           0              61.100     57.242
-2           0       -2.572       0           0              61.100     58.528
-1           0       -1.286       0           0              61.100     59.814
0            0        0.000       0           0              61.100     61.100
1            0        1.286       0           0              61.100     62.386
2            0        2.572       0           0              61.100     63.672
3            0        3.858       0           0              61.100     64.959
Voted (Voted = 1):
Party Aff.   Voted   Party Aff.   Voted      Party x Voted   Constant   Predicted Value
Coefficients          1.286       -1.138      3.575          61.100
-3           1       -3.858       -1.13775   -10.7258        61.100     45.378
-2           1       -2.572       -1.13775    -7.1505        61.100     50.240
-1           1       -1.286       -1.13775    -3.57525       61.100     55.101
0            1        0.000       -1.13775     0             61.100     59.962
1            1        1.286       -1.13775     3.575252      61.100     64.824
2            1        2.572       -1.13775     7.150504      61.100     69.685
3            1        3.858       -1.13775    10.72576       61.100     74.547
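The predicted values for the interaction model can be reproduced approximately with the rounded coefficients. This sketch uses illustrative names; because the slide's table was built from less-rounded coefficients, results can differ from it in the third decimal place.

```python
# Rounded coefficients from the interaction model.
CONSTANT, B_PARTY, B_VOTED, B_INTERACTION = 61.100, 1.286, -1.138, 3.575

def predicted(party, voted):
    """Predicted outcome for a party affiliation (-3..3) and voted (0 or 1)."""
    return (CONSTANT
            + B_PARTY * party
            + B_VOTED * voted
            + B_INTERACTION * party * voted)

def party_slope(voted):
    """Effect of a one-unit change in party affiliation, given voted status."""
    return B_PARTY + B_INTERACTION * voted

for voted in (0, 1):
    print(f"Voted = {voted}: slope on party = {party_slope(voted):.3f}")
    for party in range(-3, 4):
        print(f"  party = {party:2d}  predicted = {predicted(party, voted):.3f}")
```

The slope on party affiliation is 1.286 among non-voters and 1.286 + 3.575 among voters, which is what it means for voting to moderate the relationship.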

34

35 Establishing causality

36 Dealing with confounds
Theory + multivariate regression
Experiments

37 Dealing with reverse causation
Theory
Experiments

38 What is the key characteristic of an experiment?
How does this address reverse causality?
How does it address confounds?
Weaknesses/limitations of experiments?

39 Exam Expectations
Describe probabilities / conditional probabilities
Write hypotheses
–Demonstrate understanding of how null hypotheses relate to the central limit theorem
Test difference of proportions (formula for SE will be provided)
Interpret multivariate regression
–Relationships (slopes)
–Predicted values
–Sketch graphs of relationships
Discuss strengths and limitations of analyses
–Why an estimated slope might be biased
–Benefits and limitations of experiments

40 Notes
Homework 3 graded
Homework 4 due Thursday 12/9
Office hours next week (email to come)
Exam December 14 at 2pm

