Last time we discussed t-tests: how to use sample means of quantitative variables to make inferences about parameters. Today we’ll use the very same principles, but our tests will use sample proportions of categorical variables to make inferences about parameters.
E.g., a random sample of households in a poor neighborhood in Lima, Peru, finds that 38% of the households are headed by single women. Is this neighborhood proportion representative of the population proportion for Lima’s poor neighborhoods, or is it different enough to be statistically significant?
E.g., since a non-government organization began providing enterprise training & subsidies to women in a poor neighborhood in Buenos Aires, a random sample finds that the neighborhood’s proportion of income-earning adult women has increased from 34% to 39%. Is this after vs. before difference due to sampling variability? Or is it large enough to be statistically significant?
The same random sample finds that 41% of the sampled women who participated in the program & 36% of the sampled women who didn’t participate are now engaged in income-earning activities. Is this difference in proportions due to sampling variability? Or is it large enough to be statistically significant?
Sample Proportion
Sample proportion: a binomial count within a sample divided by the sample size n. It is computed from a binary categorical variable (e.g., yes vs. no; lived vs. died).
Premises
Random sample of independent observations.
Binomial (i.e. ‘success’/‘failure’) count.
The population must be at least 10 times larger than the sample.
There must be at least 10 observed successes (the count behind p) & at least 10 observed failures (the count behind 1 – p).
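The count-based premises above can be checked mechanically. The following is a minimal Python sketch, not part of the original lecture; the function name `check_proportion_premises` and the numbers passed to it are hypothetical and chosen only for illustration. Random sampling and independence depend on the study design and cannot be checked from counts.

```python
def check_proportion_premises(successes, n, population_size=None, min_count=10):
    """Check the count-based premises for a large-sample proportion test.

    Random sampling and independence are design questions and are NOT checked here.
    """
    failures = n - successes
    checks = {
        "at_least_10_successes": successes >= min_count,
        "at_least_10_failures": failures >= min_count,
    }
    # The population-size premise can only be checked if the size is known.
    if population_size is not None:
        checks["population_10x_sample"] = population_size >= 10 * n
    return checks


# Hypothetical numbers: 38 'successes' in a sample of 100 from a population of 5000.
result = check_proportion_premises(successes=38, n=100, population_size=5000)
sample_proportion = 38 / 100  # binomial count divided by the sample size
print(result)
print("sample proportion:", sample_proportion)
```

If any entry in the returned dictionary is False, the normal approximation used in the rest of the lecture is doubtful for that sample.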
If these sample premises are met, then the standardized difference between the two proportions being compared (e.g., observed proportion vs. benchmark; two-sample proportions; after vs. before proportion) is approximately standard normal in distribution.
Moore/McCabe use an estimate of the confidence interval for proportions that is more precise than the ‘traditional’ one, called the ‘Wilson estimate.’ The Stata command is:
. ci binaryvar, binomial wilson
As we’ll later discuss, there are other options besides ‘wilson’.
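Wilson estimate of the population proportion based on sample data, where X is the count of successes & n is the sample size:
$\tilde{p} = \dfrac{X + 2}{n + 4}$
Standard error of the proportion (i.e. based on sample data):
$SE_{\tilde{p}} = \sqrt{\dfrac{\tilde{p}\,(1 - \tilde{p})}{n + 4}}$
Approximate level C confidence interval for the proportion, where $z^{*}$ is the standard normal critical value for level C:
$\tilde{p} \pm z^{*}\, SE_{\tilde{p}}$
Large-sample (n >5) significance test for a population proportion, with Ho: $p = p_0$ & sample proportion $\hat{p} = X/n$:
$z = \dfrac{\hat{p} - p_0}{\sqrt{p_0\,(1 - p_0)/n}}$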
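As a worked illustration, the Wilson (plus-four) estimate and its approximate confidence interval can be sketched in Python. This assumes the Moore/McCabe form of the estimate, (X + 2)/(n + 4), with standard error sqrt(p(1 - p)/(n + 4)); the function name `plus_four_interval` and the counts used are hypothetical, not from the lecture data.

```python
import math
from statistics import NormalDist


def plus_four_interval(successes, n, level=0.95):
    """Plus-four ('Wilson') estimate, its standard error, and a level-C interval."""
    p_tilde = (successes + 2) / (n + 4)                 # add 2 successes and 2 failures
    se = math.sqrt(p_tilde * (1 - p_tilde) / (n + 4))   # standard error of the estimate
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)  # standard normal critical value
    return p_tilde, se, (p_tilde - z_star * se, p_tilde + z_star * se)


# Hypothetical data: 18 successes in a sample of 46.
p_tilde, se, (lower, upper) = plus_four_interval(successes=18, n=46)
print(f"estimate = {p_tilde:.3f}, SE = {se:.4f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Passing `level=0.90` or `level=0.99` changes only the critical value, which parallels the `level()` option of the Stata command.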
The Stata command to find a ‘Wilson estimate’ confidence interval for a population proportion from sample data:
. ci hmath, binomial wilson
. ci hmath, b w level(90)
. ci hmath, b w l(99)
Other binomial options: exact, agresti, jeffreys.
To Repeat: The Steps
Step 1: Ask if the binomial assumptions are fulfilled (including that both the expected #failures & the expected #successes are at least 10).
Step 2: Do a frequency table or bar graph of the binary variable & display the variable’s sample proportion.
Step 3: If all checks out okay, state the null hypothesis & the alternative hypothesis.
Step 4: Conduct the hypothesis test.
Example
. use hsb2, clear
. gen hmath=math>=60 & math<.
. la var hmath "Honors math (>=60)"
. tab hmath
Honors math (>=60) |  Freq.  Percent  Cum.
             Total |
A Two-Sided CI Hypothesis Test
. ci hmath, binomial wilson
Variable |  Obs  Mean  Std. Err.  [95% Conf. Interval]
   hmath |
Ho: hmath=.265; Ha: hmath~=.265.
Is hmath significantly different from .265 (two-sided test, i.e. does .265 fall outside the confidence interval for hmath)?
Does using other command options (or no option except ‘binomial’) make a difference?
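The decision rule behind a two-sided CI test (reject Ho when the hypothesized value lies outside the interval) can be sketched in Python. The sketch below uses the Wilson score interval, which to my understanding is what Stata's ‘wilson’ option reports and which the plus-four estimate approximates at the 95% level; treat that correspondence as an assumption. The function names and the count of 70 successes are hypothetical; only n = 200 and the benchmark .265 come from the example.

```python
import math
from statistics import NormalDist


def wilson_score_interval(successes, n, level=0.95):
    """Wilson score confidence interval for a proportion (no continuity correction)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half, center + half


def rejects_by_ci(successes, n, p0, level=0.95):
    """Two-sided CI test: reject Ho: p = p0 when p0 falls outside the interval."""
    lower, upper = wilson_score_interval(successes, n, level)
    return not (lower <= p0 <= upper)


# Hypothetical count of successes; n and the benchmark follow the example.
lower, upper = wilson_score_interval(successes=70, n=200)
print(f"95% CI = ({lower:.3f}, {upper:.3f})")
print("reject Ho: p = .265?", rejects_by_ci(70, 200, 0.265))
```

Swapping in a different interval method changes the endpoints slightly, which is why the choice of option can matter when the benchmark sits close to an endpoint.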
How to Conduct a Large-Sample Hypothesis Test for a Population Proportion: prtest
‘prtest’ allows testing one- or two-sided hypotheses.
Check the premises & data.
Test the hypothesis.
. prtest hmath = .265
One-sample test of proportion          hmath: Number of obs = 200
Variable |  Mean  Std. Err.  z  P>|z|  [95% Conf. Interval]
   hmath |
Ho: proportion(hmath) = .265
Ha: hmath < .265      Ha: hmath ~= .265      Ha: hmath > .265
z =                   z =                    z =
P < z =               P > |z| =              P > z =
Conclusion: Fail to reject Ho.
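The calculation behind a one-sample test of a proportion can be sketched in Python as follows. This is an illustrative sketch of the standard large-sample z test, not a reproduction of Stata's output; the function name `one_sample_prop_test` and the count of 60 successes are hypothetical, while n = 200 and the benchmark .265 follow the example.

```python
import math
from statistics import NormalDist


def one_sample_prop_test(successes, n, p0):
    """Large-sample z test of Ho: p = p0, with p-values for all three alternatives."""
    p_hat = successes / n
    se0 = math.sqrt(p0 * (1 - p0) / n)  # standard error computed under Ho
    z = (p_hat - p0) / se0
    cdf = NormalDist().cdf(z)
    return {
        "p_hat": p_hat,
        "z": z,
        "p_less": cdf,                          # Ha: p < p0
        "p_two_sided": 2 * min(cdf, 1 - cdf),   # Ha: p != p0
        "p_greater": 1 - cdf,                   # Ha: p > p0
    }


# Hypothetical count of successes; n and p0 follow the example.
res = one_sample_prop_test(successes=60, n=200, p0=0.265)
for name, value in res.items():
    print(f"{name}: {value:.4f}")
```

The three p-values correspond to the three alternative hypotheses that the test output lists side by side; choose the one that matches the Ha stated before looking at the data.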
Recall that conclusions are always uncertain.
How to Conduct a Test Comparing Two Proportions
Ho: female hmath = male hmath
Ha: female hmath ~= male hmath
Check the premises & the data:
. tab female hmath
 female=1 |     math>=60
   male=0 |      0       1 |   Total
     male |                |      91
   female |                |
    Total |                |     200
State the hypotheses. Ho: female hmath = male hmath Ha: female hmath ~= male hmath
Step 3: Test the hypothesis.
. prtest hmath, by(female)
Two-sample test of proportion          male: Number of obs = 91
                                     female: Number of obs = 109
Variable |  Mean  Std. Err.  z  P>|z|  [95% Conf. Interval]
    male |
  female |
    diff |
         | under Ho:
Ho: proportion(male) - proportion(female) = diff = 0
Ha: diff < 0          Ha: diff ~= 0          Ha: diff > 0
z =                   z =                    z =
P < z =               P > |z| =              P > z =
Conclusion: Fail to reject Ho.
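The two-sample comparison can be sketched in Python using the standard pooled z test for a difference in proportions. This is an illustrative sketch under stated assumptions: the function name `two_sample_prop_test` is hypothetical, the group sizes (91 and 109) follow the example, and the success counts (27 and 26) are invented for illustration and are not the lecture's data.

```python
import math
from statistics import NormalDist


def two_sample_prop_test(x1, n1, x2, n2):
    """Pooled large-sample z test of Ho: p1 - p2 = 0."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion assumed under Ho
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    cdf = NormalDist().cdf(z)
    return {
        "diff": p1 - p2,
        "z": z,
        "p_less": cdf,                          # Ha: diff < 0
        "p_two_sided": 2 * min(cdf, 1 - cdf),   # Ha: diff != 0
        "p_greater": 1 - cdf,                   # Ha: diff > 0
    }


# Group sizes follow the example; the success counts are hypothetical.
res = two_sample_prop_test(x1=27, n1=91, x2=26, n2=109)
for name, value in res.items():
    print(f"{name}: {value:.4f}")
```

For a one-sided question, such as whether one proportion exceeds the other, read `p_greater` or `p_less` instead of the two-sided value.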
Substantive conclusions? Next research steps?
Here’s an after vs. before example.
Does a summer math course significantly increase the proportion of students who qualify for honors math?
Check the sample premises.
Display the data proportion.
Test the hypothesis:
Ho: post-test honors proportion = pre-test honors proportion (i.e. difference = 0)
Ha: post-test honors proportion > pre-test honors proportion (i.e. difference > 0)
. prtesti
Two-sample test of proportion          x: Number of obs = 191
                                       y: Number of obs =
Variable |  Mean  Std. Err.  z  P>|z|  [95% Conf. Interval]
       x |
       y |
    diff |
         | under Ho:
Ho: proportion(x) - proportion(y) = diff = 0
Ha: diff < 0          Ha: diff ~= 0          Ha: diff > 0
z =                   z =                    z =
P < z =               P > |z| =              P > z =
Test conclusion? Results are always uncertain.