• Last time we discussed t-tests: how to use sample means of quantitative variables to make inferences about parameters.
• Today we’ll use the very same principles, but with sample proportions of categorical variables, to make inferences about population proportions.

• E.g., a random sample of households in a poor neighborhood in Lima, Peru, finds that 38% of the households are headed by single women. Is this neighborhood proportion representative of the population proportion (for Lima’s poor neighborhoods), or is it different enough to be statistically significant?

• E.g., since a non-government organization began providing enterprise training & subsidies to women in a poor neighborhood in Buenos Aires, a random sample finds that the neighborhood’s proportion of income-earning adult women has increased from 34% to 39%. Is this after vs. before difference in proportions due to sampling variability? Or is it large enough to be statistically significant?

• The same random sample finds that 41% of the sampled women who participated in the program & 36% of the sampled women who didn’t participate are now engaged in income-earning activities. Is this difference in proportions due to sampling variability? Or is it large enough to be statistically significant?

Sample Proportion
• Sample proportion: a binomial count within a sample divided by the sample size n.
• The underlying variable is categorical (e.g., yes vs. no; lived vs. died).

Premises
• Random sample of independent observations.
• Binomial (i.e., ‘success’/‘failure’) count.
• The population must be at least 10 times larger than the sample.
• There must be at least 10 observations for p & at least 10 observations for 1 – p.
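The countable premises can be checked mechanically. Below is a minimal Python sketch; the function name `check_proportion_premises` and the example numbers (38 of 100 sampled households, drawn from 5,000 households) are hypothetical and not taken from the slides. Random sampling & independence still have to be judged from the study design.

```python
def check_proportion_premises(successes, n, population_size=None, minimum=10):
    """Return True/False flags for the premises that can be counted."""
    failures = n - successes
    checks = {
        # At least 10 observations for p and for 1 - p.
        "enough_successes": successes >= minimum,
        "enough_failures": failures >= minimum,
    }
    if population_size is not None:
        # Population at least 10 times larger than the sample.
        checks["population_10x_sample"] = population_size >= 10 * n
    return checks


# Hypothetical numbers, for illustration only.
premises = check_proportion_premises(successes=38, n=100, population_size=5000)
print(premises)
```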

• If these sample assumptions are met, then the standardized difference between the two proportions being compared (e.g., observed proportion vs. benchmark; two-sample proportions; after vs. before proportion) is approximately standard normal in distribution.

• Moore/McCabe use the ‘Wilson estimate,’ which gives a more accurate confidence interval for proportions than the ‘traditional’ estimate.
• The Stata command is:
. ci binaryvar, binomial wilson
• As we’ll later discuss, there are other options besides ‘wilson’.

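• Wilson estimate of the population proportion based on sample data, where X is the count of ‘successes’ & n is the sample size:
$$\tilde{p} = \frac{X + 2}{n + 4}$$
• Standard error of the proportion (i.e., based on sample data):
$$SE_{\tilde{p}} = \sqrt{\frac{\tilde{p}\,(1 - \tilde{p})}{n + 4}}$$
• Approximate level C confidence interval for the proportion, where z* is the standard normal critical value for confidence level C:
$$\tilde{p} \pm z^{*} \, SE_{\tilde{p}}$$
• Large-sample (n > 5) significance test for a population proportion, with Ho: p = p0 & sample proportion p-hat = X/n:
$$z = \frac{\hat{p} - p_0}{\sqrt{p_0\,(1 - p_0)/n}}$$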
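As a rough illustration of the Wilson (‘plus four’) estimate & its approximate confidence interval, here is a short Python sketch. The function name `plus_four_interval` and the example counts (38 ‘successes’ in a sample of 100) are hypothetical; the calculation adds 2 to the count & 4 to the sample size, as in the Wilson estimate above.

```python
from math import sqrt
from statistics import NormalDist


def plus_four_interval(successes, n, level=0.95):
    """Wilson ('plus four') estimate, its standard error, and a level-C interval."""
    p_tilde = (successes + 2) / (n + 4)
    se = sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    # z* is the standard normal critical value for the chosen confidence level.
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return p_tilde, se, p_tilde - z_star * se, p_tilde + z_star * se


# Hypothetical numbers, for illustration only.
p_tilde, se, lower, upper = plus_four_interval(successes=38, n=100)
print(f"estimate={p_tilde:.4f}  se={se:.4f}  95% CI=({lower:.4f}, {upper:.4f})")
```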

• The Stata commands to find a ‘Wilson’ confidence interval for a population proportion from sample data:
. ci hmath, binomial wilson
. ci hmath, b w level(90)
. ci hmath, b w l(99)
• Other binomial options: exact, agresti, jeffreys.

To Repeat: The Steps
• Step 1: Ask if the binomial assumptions are fulfilled (including that both the expected #failures & the expected #successes are at least 10).
• Step 2: Do a frequency table or bar graph of the binary variable & display the variable’s sample proportion.
• Step 3: If all checks out okay, state the null hypothesis & the alternative hypothesis.
• Step 4: Conduct the hypothesis test.

Example
. use hsb2, clear
. gen hmath=math>=60 & math<.
. la var hmath "Honors math (>=60)"

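. tab hmath

Honors math (>=60) |   Freq.    Percent      Cum.
-------------------+------------------------------
                 0 |     151      75.50     75.50
                 1 |      49      24.50    100.00
-------------------+------------------------------
             Total |     200     100.00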

A Two-Sided CI Hypothesis Test
. ci hmath, binomial wilson

    Variable |   Obs       Mean   Std. Err.    [95% Conf. Interval]
-------------+------------------------------------------------------
       hmath |   200       .245   .0304118     .1905687    .3090424

• Ho: hmath = .265; Ha: hmath ~= .265.
• Is hmath significantly different from .265 (two-sided test, i.e., does the hypothesized value .265 fall outside the confidence interval)?
• Does using other command options (or no option except ‘binomial’) make a difference?
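To see where the interval limits come from, the following Python sketch applies the standard Wilson score interval formula to the counts in the frequency table (49 honors students out of 200). That this is the formula behind the ‘wilson’ option is an assumption here, although it does reproduce the limits printed above. The function name `wilson_score_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_score_interval(successes, n, level=0.95):
    """Wilson score interval for a binomial proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


lower, upper = wilson_score_interval(successes=49, n=200)
null_value = 0.265
# Two-sided CI test: reject Ho only if the hypothesized value lies outside the interval.
reject_ho = not (lower <= null_value <= upper)
print(f"95% CI=({lower:.7f}, {upper:.7f})  reject Ho: {reject_ho}")
```

Since .265 lies inside the interval, this check agrees with a ‘fail to reject Ho’ reading at the 5% level.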

How to Conduct a Large-Sample Hypothesis Test for a Population Proportion: prtest
• ‘prtest’ allows testing one- or two-sided hypotheses.
• Check the premises & data.
• Test the hypothesis.

. prtest hmath = .265

One-sample test of proportion                  hmath: Number of obs = 200

    Variable |      Mean   Std. Err.         z     P>|z|    [95% Conf. Interval]
-------------+-------------------------------------------------------------------
       hmath |      .245   .0304118    8.05609    0.0000    .1853941    .3046059

Ho: proportion(hmath) = .265

  Ha: hmath < .265        Ha: hmath ~= .265       Ha: hmath > .265
    z = -0.641              z = -0.641              z = -0.641
  P < z = 0.2608          P > |z| = 0.5216        P > z = 0.7392

• Conclusion: Fail to reject Ho.
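The z statistic & p-values in this output can be reproduced by hand. The Python sketch below uses the large-sample test for one proportion, with the standard error computed under Ho; the function name `one_sample_prop_ztest` is hypothetical, & the counts (49 of 200) come from the frequency table.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_prop_ztest(successes, n, p0):
    """Large-sample z test of Ho: p = p0, with lower, two-sided and upper p-values."""
    p_hat = successes / n
    se_null = sqrt(p0 * (1 - p0) / n)  # standard error under Ho
    z = (p_hat - p0) / se_null
    p_lower = NormalDist().cdf(z)       # Ha: p < p0
    p_upper = 1 - p_lower               # Ha: p > p0
    p_two_sided = 2 * min(p_lower, p_upper)  # Ha: p != p0
    return z, p_lower, p_two_sided, p_upper


z, p_lower, p_two_sided, p_upper = one_sample_prop_ztest(successes=49, n=200, p0=0.265)
print(f"z={z:.3f}  P<z={p_lower:.4f}  P>|z|={p_two_sided:.4f}  P>z={p_upper:.4f}")
```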

• Recall that conclusions are always uncertain.

How to Conduct a Test Comparing Two Proportions
Ho: female hmath = male hmath
Ha: female hmath ~= male hmath
• Check the premises & the data:
. tab female hmath

  female=1 |      math>=60
    male=0 |       0        1 |   Total
-----------+------------------+--------
      male |      68       23 |      91
    female |      83       26 |     109
-----------+------------------+--------
     Total |     151       49 |     200

• State the hypotheses.
Ho: female hmath = male hmath
Ha: female hmath ~= male hmath

• Step 3: Test the hypothesis.

. prtest hmath, by(female)

Two-sample test of proportion                   male: Number of obs =  91
                                              female: Number of obs = 109

    Variable |      Mean   Std. Err.         z     P>|z|    [95% Conf. Interval]
-------------+-------------------------------------------------------------------
        male |  .2527473   .0455571    5.54792    0.0000    .1634569    .3420376
      female |  .2385321   .0408212    5.84334    0.0000     .158524    .3185402
-------------+-------------------------------------------------------------------
        diff |  .0142151   .0611704                        -.1056767     .134107
             |  under Ho:  .0610714    .232763    0.8159

Ho: proportion(male) - proportion(female) = diff = 0

  Ha: diff < 0            Ha: diff ~= 0           Ha: diff > 0
    z = 0.233               z = 0.233               z = 0.233
  P < z = 0.5920          P > |z| = 0.8159        P > z = 0.4080

• Conclusion: Fail to reject Ho.
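The ‘under Ho’ row can be checked with a pooled two-sample z test. Here is a minimal Python sketch using the counts from the cross-tabulation (23 of 91 males, 26 of 109 females); the function name `two_sample_prop_ztest` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_prop_ztest(x1, n1, x2, n2):
    """Pooled z test of Ho: p1 - p2 = 0; returns diff, pooled SE, z, two-sided p."""
    p1, p2 = x1 / n1, x2 / n2
    # Under Ho both groups share one proportion, so pool the counts.
    p_pooled = (x1 + x2) / (n1 + n2)
    se_pooled = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    diff = p1 - p2
    z = diff / se_pooled
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, se_pooled, z, p_two_sided


diff, se_pooled, z, p_two_sided = two_sample_prop_ztest(x1=23, n1=91, x2=26, n2=109)
print(f"diff={diff:.7f}  se under Ho={se_pooled:.7f}  z={z:.3f}  P>|z|={p_two_sided:.4f}")
```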

• Substantive conclusions? Next research steps?

• Here’s an after vs. before example: Does a summer math course significantly increase the proportion of students who qualify for honors math?
• Check the sample premises.
• Display the data proportion.
• Test the hypothesis:
Ho: post-test honors proportion = pre-test honors proportion (i.e., difference = 0)
Ha: post-test honors proportion > pre-test honors proportion (i.e., difference > 0)

. prtesti 191 .271 200 .245

Two-sample test of proportion                      x: Number of obs = 191
                                                   y: Number of obs = 200
------------------------------------------------------------------------------
    Variable |      Mean   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |      .271   .0321612                      .2079653    .3340347
           y |      .245   .0304118                      .1853941    .3046059
-------------+----------------------------------------------------------------
        diff |      .026    .044263                     -.0607539    .1127539
             |  under Ho:  .0442491    0.59    0.557
------------------------------------------------------------------------------
Ho: proportion(x) - proportion(y) = diff = 0

  Ha: diff < 0            Ha: diff ~= 0           Ha: diff > 0
    z = 0.588               z = 0.588               z = 0.588
  P < z = 0.7216          P > |z| = 0.5568        P > z = 0.2784

• Test conclusion?
• Results are always uncertain.
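One way to work toward the test conclusion is to recompute the one-sided p-value from the summary figures entered in ‘prtesti’ (.271 of 191 post-test, .245 of 200 pre-test). The Python sketch below is only an illustration: the function name `prop_diff_upper_tail` & the 5% significance level are assumptions, & like the ‘prtesti’ call it treats the two groups as independent samples.

```python
from math import sqrt
from statistics import NormalDist


def prop_diff_upper_tail(p1, n1, p2, n2):
    """Pooled z test of Ho: p1 - p2 = 0 against Ha: p1 - p2 > 0, from summary data."""
    # Pooled proportion under Ho, weighting each sample proportion by its size.
    p_pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se_pooled = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_pooled
    p_upper = 1 - NormalDist().cdf(z)
    return z, p_upper


z, p_upper = prop_diff_upper_tail(p1=0.271, n1=191, p2=0.245, n2=200)
alpha = 0.05  # assumed significance level
print(f"z={z:.3f}  P>z={p_upper:.4f}  reject Ho: {p_upper < alpha}")
```

With P > z of about .28, the one-sided test fails to reject Ho at the 5% level: the sample does not show a statistically significant increase in the honors proportion.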
