Published by Chastity Harrington. Modified over 9 years ago.
1
Introduction to Statistics: Political Science (Class 9) Review
2
Probability of having cardiovascular disease
Purpose of statistics:
–Inferences about populations using samples
We draw a random sample of 1,000 adults and 405 have some form of CVD
Based on our sample, if we randomly select one adult from the population, what is the probability that they have cardiovascular disease?
3
Conditional Probability
                                          No CVD    CVD
Exercise less than 3 days/week (N=602)    30.3%     28.9%
Exercise 3 or more days/week (N=398)      30.2%     10.6%
Probability of exercising <3 days/week?
Probability of CVD among those who exercise <3 days/week?
Probability of CVD among those who exercise 3 or more days/week?
4
Association between exercise and CVD?
                                          No CVD    CVD
Exercise less than 3 days/week (N=602)    30.3%     28.9%
Exercise 3 or more days/week (N=398)      30.2%     10.6%
p_1 = 28.9/(30.3+28.9) = 0.488
p_2 = 10.6/(30.2+10.6) = 0.260
Difference = 0.488 - 0.260 = .228
Those who exercise less than 3 days/week are .228 (22.8 percentage points) more likely to have CVD
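The conditional probabilities on this slide can be reproduced with a short sketch. It assumes the table entries are cell percentages of the full sample, as the slide's own arithmetic implies; the dictionary layout and function name are illustrative only.

```python
# Cell percentages (share of the full sample) taken from the slide's table.
cells = {
    "exercise_lt3": {"no_cvd": 30.3, "cvd": 28.9},
    "exercise_ge3": {"no_cvd": 30.2, "cvd": 10.6},
}

def p_cvd_given(group):
    """P(CVD | exercise group) = CVD cell divided by the row total."""
    row = cells[group]
    return row["cvd"] / (row["no_cvd"] + row["cvd"])

p1 = p_cvd_given("exercise_lt3")  # exercise less than 3 days/week
p2 = p_cvd_given("exercise_ge3")  # exercise 3 or more days/week
diff = p1 - p2

print(f"p1 = {p1:.3f}, p2 = {p2:.3f}, difference = {diff:.3f}")
```

Dividing by the row total, rather than by 100, is what makes these conditional rather than joint probabilities.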
5
Specifying and testing hypotheses
Difference of proportions = .228
What’s our null hypothesis?
Why a “null hypothesis”? Why not test whether the difference is .228?
Central limit theorem
–In repeated sampling, the distribution of our estimates of the mean (or difference of means or slope) will be normally distributed and centered over the true population value
6
Central limit theorem
[Figure: normal sampling distribution centered on the proposed true value of 0, with 1 standard error marked]
7
Comparing proportions
Difference of proportions = .228
p_1 = 28.9/(30.3+28.9) = 0.488 (N=602)
p_2 = 10.6/(30.2+10.6) = 0.260 (N=398)
Standard error of this difference:
$SE = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$
8
Comparing proportions
So, standard error of difference is the square root of: (.488*(1-.488)/602) + (.260*(1-.260)/398)
–Which is .0299
Difference of proportions = .228
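As a quick check of the arithmetic, the standard error can be computed directly from the two proportions and group sizes. This is a minimal sketch; the function name is an assumption made for illustration.

```python
import math

def se_diff_proportions(p1, n1, p2, n2):
    """Standard error of the difference between two independent sample proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Proportions and group sizes from the slides.
se = se_diff_proportions(0.488, 602, 0.260, 398)
print(f"SE of the difference = {se:.5f}")
```

The result is roughly 0.02997, which the slide reports as .0299.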
9
Hypotheses
Null hypothesis:
–There is no difference in the rate of CVD between those who exercise less than 3 days/week and those who exercise 3 or more days/week
Alternative hypothesis:
–There is a difference in the rate of CVD between those who exercise less than 3 days/week and those who exercise 3 or more days/week (i.e., the difference is not 0)
10
If 0 were the true difference, it would be very unlikely that we would find a difference 7.63 (.228/.0299) standard errors from that value by chance
[Figure: normal sampling distribution centered on the proposed true value of 0, with 1 standard error marked]
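How unlikely is "very unlikely"? The sketch below computes the test statistic from the proportions on the earlier slides (0.488 and 0.260, a difference of 0.228) and the reported standard error of .0299, then converts it to a two-sided p-value under the normal approximation. The helper name is an assumption for illustration.

```python
import math

def two_sided_p(z):
    """Two-sided p-value under a standard normal: P(|Z| >= |z|)."""
    return math.erfc(abs(z) / math.sqrt(2))

diff = 0.488 - 0.260   # observed difference of proportions
se = 0.0299            # standard error as reported on the slide
null_value = 0.0       # difference proposed by the null hypothesis

z = (diff - null_value) / se
p_value = two_sided_p(z)

print(f"z = {z:.2f}, two-sided p = {p_value:.2e}")
```

A statistic this many standard errors from 0 gives a p-value far below any conventional threshold, so the null hypothesis of no difference would be rejected.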
11
Does exercise cause lower CVD?
Reverse causation? Might CVD cause exercise?
Failure to account for confounds
–Typically leads to over-estimating the strength of a relationship (not always… but usually)
13
Specification and Interpretation Multivariate Regression
14
Does exercise make CVD less likely?
Regression (predict CVD)
                        Coef.   SE      T     P-value
Days Exercise (0-7)     -0.06   .001    ?     0.000
Constant                0.56    .002    ?     0.000
Estimated likelihood of CVD if exercise 4 days/week?
What might confound our estimate of the relationship between exercise and CVD?
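One way to work through the questions on this slide is sketched below: the predicted value is the constant plus the slope times the number of exercise days, and each T statistic is the coefficient divided by its standard error. The dictionary keys and function name are assumptions for illustration.

```python
# Coefficients and standard errors from the slide's table.
coef = {"exercise": -0.06, "constant": 0.56}
se = {"exercise": 0.001, "constant": 0.002}

def predict_cvd(days_exercise):
    """Predicted probability of CVD from the bivariate regression."""
    return coef["constant"] + coef["exercise"] * days_exercise

# T statistic = coefficient / standard error
t_stats = {name: coef[name] / se[name] for name in coef}

print(f"Predicted CVD at 4 days/week: {predict_cvd(4):.2f}")
for name, t in t_stats.items():
    print(f"T for {name}: {t:.1f}")
```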
15
Controlling for confounds
                        Coef.   SE      T       P-value
Days Exercise (0-7)     -0.03   .001    -3.0    0.002
Days Fast Food (0-7)    0.04    .002    2.0     0.048
Constant                0.42    .002    21.0    0.000
16
[Figure: % chance of CVD (y-axis) by days per week of exercise (x-axis), with separate lines for high fast food and low fast food]
17
Controlling for dichotomous confounds
Predicted probability of CVD for
–2 days exercise, 2 days fast food, smoker
                        Coef.   SE      T       P-value
Days Exercise (0-7)     -0.03   .001    -3.0    0.002
Days Fast Food (0-7)    0.04    .002    2.0     0.048
Smoker (1=yes)          0.11    .001    11.0    0.000
Constant                0.38    .002    19.0    0.000
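The predicted probability asked for on this slide can be worked out by plugging values into the estimated equation. A minimal sketch, with hypothetical key and function names:

```python
# Coefficients from the slide's table.
coef = {"exercise": -0.03, "fast_food": 0.04, "smoker": 0.11, "constant": 0.38}

def predict_cvd(days_exercise, days_fast_food, smoker):
    """Predicted probability of CVD; smoker is 1 for yes, 0 for no."""
    return (coef["constant"]
            + coef["exercise"] * days_exercise
            + coef["fast_food"] * days_fast_food
            + coef["smoker"] * smoker)

smoker_pred = predict_cvd(2, 2, 1)
nonsmoker_pred = predict_cvd(2, 2, 0)
print(f"Smoker: {smoker_pred:.2f}, non-smoker: {nonsmoker_pred:.2f}")
```

Holding exercise and fast food fixed, the gap between the two predictions equals the smoker coefficient, which is how a dichotomous variable's coefficient is read.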
18
Nominal Variables
Variable that does not have an “order” to it
–Nothing is “higher” or “lower”
Create set of dichotomous variables
Always interpret coefficients with respect to the reference category
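Creating a set of dichotomous variables from a nominal variable might look like the sketch below. It uses the four regions that appear in the regression on the following slide, with Midwest as the reference category; the function and constant names are assumptions for illustration.

```python
REGIONS = ["Midwest", "South", "West", "Northeast"]
REFERENCE = "Midwest"  # excluded category: gets no indicator of its own

def make_indicators(region, categories=REGIONS, reference=REFERENCE):
    """Return one 0/1 indicator per non-reference category."""
    if region not in categories:
        raise ValueError(f"unknown region: {region}")
    return {c: int(region == c) for c in categories if c != reference}

print(make_indicators("South"))    # South indicator is 1, the others 0
print(make_indicators("Midwest"))  # all indicators are 0
```

Because the reference category is coded as all zeros, each indicator's coefficient is the difference between that category and the reference category.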
20
Controlling for nominal confounds
                        Coef.   SE      T       P-value
Days Exercise (0-7)     -0.03   .001    -3.0    0.002
Days Fast Food (0-7)    0.03    .002    1.5     0.135
Smoker (1=yes)          0.09    .001    9.0     0.000
South (1=yes)           0.03    .002    1.5     0.137
West (1=yes)            -0.01   .002    -0.5    0.642
Northeast (1=yes)       0.02    .002    1.0     0.410
Constant                0.34    .002    17.0    0.000
(Midwest is excluded category)
What if we wanted to test whether including region indicators improves fit of the model?
21
Non-linear relationships
22
Logarithms Why use a logarithmic transformation? You think the relationship looks like this…
23
Logarithms
24
Squared term – U (or ∩)-shaped relationship
Age and political ideology (-2=very conservative, 2=very liberal)
              Coef.    SE      T        P
Age           -0.007   0.004   -1.740   0.082
Constant      0.122    0.209   0.580    0.561

              Coef.    SE      T        P
Age           -0.065   0.025   -2.630   0.009
Age-squared   0.001    0.000   2.390    0.017
Constant      1.554    0.635   2.450    0.015
25
Age and Political Ideology
              Coef.    SE      T        P
Age           -0.065   0.025   -2.630   0.009
Age-squared   0.001    0.000   2.390    0.017
Constant      1.554    0.635   2.450    0.015

Age   Age^2   -0.065*Age   .0005574*Age^2   Constant   Predicted Value
18    324     -1.178       0.181            1.554      0.557
28    784     -1.832       0.437            1.554      0.159
38    1444    -2.487       0.805            1.554      -0.128
48    2304    -3.141       1.284            1.554      -0.303
58    3364    -3.795       1.875            1.554      -0.366
68    4624    -4.450       2.577            1.554      -0.319
78    6084    -5.104       3.391            1.554      -0.159
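The predicted values from the squared-term model can be recomputed, and the age at which the curve turns located, with the sketch below. It uses the coefficients as printed (-0.065 and .0005574), so results differ slightly from the table, which appears to use an unrounded Age coefficient. Names are assumptions for illustration.

```python
B_AGE = -0.065        # coefficient on Age (rounded, as printed on the slide)
B_AGE_SQ = 0.0005574  # coefficient on Age-squared (from the table header)
CONSTANT = 1.554

def predicted_ideology(age):
    """Predicted ideology (-2 = very conservative, 2 = very liberal)."""
    return CONSTANT + B_AGE * age + B_AGE_SQ * age ** 2

# For y = a + b*x + c*x^2, the curve turns where the slope b + 2*c*x is zero.
turning_point = -B_AGE / (2 * B_AGE_SQ)

for age in range(18, 79, 10):
    print(f"age {age}: {predicted_ideology(age):+.3f}")
print(f"Predicted ideology bottoms out near age {turning_point:.1f}")
```

The positive squared term means the relationship is U-shaped: predicted ideology falls with age until about the late fifties and then rises again, matching the pattern in the table.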
27
Create indicators from an ordered variable
Party Identification (-3 to 3)
Seven Variables:
–Strong Republican (1=yes)
–Weak Republican (1=yes)
–Lean Republican (1=yes)
–Pure Independent (1=yes)
–Lean Democrat (1=yes)
–Weak Democrat (1=yes)
–Strong Democrat (1=yes)
28
Predict Obama Favorability (1-4)
                    Coef.    SE      T         P
Strong Republican   -1.632   0.161   -10.160   0.000
Weak Republican     -0.707   0.198   -3.580    0.000
Lean Republican     -1.235   0.181   -6.810    0.000
Lean Democrat       0.674    0.197   3.430     0.001
Weak Democrat       0.494    0.187   2.640     0.009
Strong Democrat     0.595    0.159   3.750     0.000
Constant            2.940    0.134   21.870    0.000
Excluded category: Pure Independents
29
Obama Favorability
30
Predict Obama Favorability (1-4)
                    Coef.    SE      T        P
Strong Republican   -0.397   0.150   -2.650   0.008
Weak Republican     0.528    0.189   2.790    0.006
Pure Independent    1.235    0.181   6.810    0.000
Lean Democrat       1.909    0.188   10.150   0.000
Weak Democrat       1.729    0.179   9.680    0.000
Strong Democrat     1.831    0.148   12.360   0.000
Constant            1.705    0.122   14.010   0.000
New excluded category: Leaning Republicans
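Changing the excluded category only re-expresses the same model. The sketch below starts from the coefficients estimated with Pure Independents excluded and re-bases them on Leaning Republicans; the results should match this slide's table up to rounding. The `rebase` helper is hypothetical, written for this illustration.

```python
def rebase(coefs, constant, new_reference):
    """Re-express indicator coefficients relative to a new reference category."""
    shift = coefs[new_reference]
    new_coefs = {name: round(value - shift, 3)
                 for name, value in coefs.items() if name != new_reference}
    return new_coefs, round(constant + shift, 3)

# Coefficients with Pure Independents excluded (their implicit coefficient is 0).
independent_ref = {
    "Strong Republican": -1.632, "Weak Republican": -0.707,
    "Lean Republican": -1.235, "Pure Independent": 0.0,
    "Lean Democrat": 0.674, "Weak Democrat": 0.494, "Strong Democrat": 0.595,
}

new_coefs, new_constant = rebase(independent_ref, 2.940, "Lean Republican")
print(new_constant)
print(new_coefs)
```

Every coefficient shifts by the same amount (1.235), and the constant absorbs the shift, so the predicted favorability for each group is unchanged. Only the comparison group, and therefore the T statistics and P-values, differ.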
31
Interactions One variable moderates the effect of another – i.e., the relationship between one variable and an outcome depends on the value of another variable
32
Regression estimates an equation…
                                              Coef.    SE      T        P
Party Affiliation (-3=strong R; 3=strong D)   1.286    0.878   1.460    0.143
Voted in 2008                                 -1.138   1.484   -0.770   0.443
Party Affiliation x Voted in 2008             3.575    0.918   3.900    0.000
Constant                                      61.100   1.358   44.980   0.000
61.100 + 1.286*Party – 1.138*Voted + 3.575*Party*Voted + u
61.100 + Party*1.286 + Party*Voted*3.575 – 1.138*Voted + u
61.100 + Party(1.286 + Voted*3.575) – 1.138*Voted + u
OR
61.100 + Party*1.286 + Voted*Party*3.575 – Voted*1.138 + u
61.100 + Party*1.286 + Voted(Party*3.575 – 1.138) + u
33
Party Aff.   Voted   Party Aff.   Voted      Party x Voted   Constant   Predicted Value
Coefficients         1.286        -1.138     3.575           61.100
-3           0       -3.858       0          0               61.100     57.242
-2           0       -2.572       0          0               61.100     58.528
-1           0       -1.286       0          0               61.100     59.814
0            0       0.000        0          0               61.100     61.100
1            0       1.286        0          0               61.100     62.386
2            0       2.572        0          0               61.100     63.672
3            0       3.858        0          0               61.100     64.959

Party Aff.   Voted   Party Aff.   Voted      Party x Voted   Constant   Predicted Value
Coefficients         1.286        -1.138     3.575           61.100
-3           1       -3.858       -1.13775   -10.7258        61.100     45.378
-2           1       -2.572       -1.13775   -7.1505         61.100     50.240
-1           1       -1.286       -1.13775   -3.57525        61.100     55.101
0            1       0.000        -1.13775   0               61.100     59.962
1            1       1.286        -1.13775   3.575252        61.100     64.824
2            1       2.572        -1.13775   7.150504        61.100     69.685
3            1       3.858        -1.13775   10.72576        61.100     74.547
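The predicted values in these tables follow from the interaction equation. A minimal sketch using the rounded coefficients, so results can differ from the tables in the third decimal place; function and constant names are assumptions for illustration.

```python
B_PARTY = 1.286   # Party Affiliation (-3 = strong R; 3 = strong D)
B_VOTED = -1.138  # Voted in 2008 (1 = yes)
B_INTER = 3.575   # Party Affiliation x Voted in 2008
CONSTANT = 61.100

def predict(party, voted):
    """Predicted outcome from the interaction model."""
    return CONSTANT + B_PARTY * party + B_VOTED * voted + B_INTER * party * voted

def party_slope(voted):
    """Effect of a one-unit change in party affiliation, given voting status."""
    return B_PARTY + B_INTER * voted

for voted in (0, 1):
    print(f"Voted = {voted}: slope on Party = {party_slope(voted):.3f}")
    for party in range(-3, 4):
        print(f"  Party = {party:+d}: predicted = {predict(party, voted):.3f}")
```

The slope on party affiliation is 1.286 among non-voters and 1.286 + 3.575 = 4.861 among voters, which is what it means for voting to moderate the relationship.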
35
Establishing causality
36
Dealing with confounds
–Theory + multivariate regression
–Experiments
37
Dealing with reverse causation
–Theory
–Experiments
38
What is the key characteristic of an experiment?
How does this address reverse causality?
How does it address confounds?
Weaknesses/limitations of experiments?
39
Exam Expectations
Describe probabilities / conditional probabilities
Write hypotheses
–Demonstrate understanding of how null hypotheses relate to the central limit theorem
Test difference of proportions (formula for SE will be provided)
Interpreting multivariate regression
–Relationships (slopes)
–Predicted values
–Sketch graphs of relationships
Discuss strengths and limitations of analyses
–Why an estimated slope might be biased
–Benefits and limitations of experiments
40
Notes
Homework 3 graded
Homework 4 due Thursday 12/9
Office hours next week – email to come
Exam December 14 at 2pm