Published by Nathaniel Bailey. Modified over 9 years ago.
1
Fall 2002, Biostat 511
Statistical Inference - Proportions
One sample: confidence intervals, hypothesis tests
Two sample: confidence intervals, hypothesis tests
2
Confidence Intervals - Binomial Proportion
Recall that if n is “large” (np > 10 and n(1-p) > 10), we can approximate the binomial distribution by a normal:
$X \sim N(np,\ np(1-p))$ or, equivalently, $\hat p = X/n \sim N\left(p,\ \frac{p(1-p)}{n}\right)$.
This suggests the following approximate $100(1-\alpha)\%$ confidence interval for p:
$\hat p \pm z_{1-\alpha/2}\sqrt{\frac{\hat p(1-\hat p)}{n}}$
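A minimal Python sketch of this interval, using only the standard library (our own illustration, not from the original slides). The function name `wald_ci` is hypothetical, and plugging $\hat p$ into the np > 10 check is an assumption, since p itself is unknown.

```python
from math import sqrt
from statistics import NormalDist


def wald_ci(x: int, n: int, conf: float = 0.95) -> tuple[float, float]:
    """Normal-approximation confidence interval for a binomial proportion p.

    Uses p_hat = x / n and the standard error sqrt(p_hat * (1 - p_hat) / n).
    """
    p_hat = x / n
    # Rule of thumb from the slide: np > 10 and n(1-p) > 10.
    # p is unknown, so p_hat is plugged in here (an assumption).
    if n * p_hat <= 10 or n * (1 - p_hat) <= 10:
        raise ValueError("normal approximation may be poor for these counts")
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # 1.96 when conf = 0.95
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se
```

Raising an error, rather than silently returning an interval, is one design choice; a warning would also be reasonable when the counts are borderline.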
3
Hypothesis Testing 1-sample Tests: Binomial Proportion
When the normal approximation to the binomial is appropriate, a test of the success probability p, based on a binomial variable X, looks just like the Z-test. (See the homework for the case when np is small, so that the normal approximation is not appropriate.)
Example: Suppose that a child is equally likely to be male or female. In a sample of 114 workers at a pesticide plant, each with only one child, 66 of the children are female. Is this evidence that these workers are more likely to have girls?
Define p = probability that a worker has a daughter.
$H_0: p = 0.5$ versus $H_A: p > 0.5$
4
Hypothesis Testing 1-sample Tests: Binomial Proportion
Data: X = 66, n = 114, with $X \sim N(np,\ np(1-p))$.
Under $H_0$: $p_0 = 0.5$, so
$Z = \frac{X - np_0}{\sqrt{np_0(1-p_0)}} = \frac{66 - 57}{\sqrt{114(0.5)(0.5)}} = \frac{9}{5.34} = 1.69$
For this one-sided alternative we reject $H_0$ if $Z > z_{1-\alpha}$. The critical value for a one-sided $\alpha = 0.05$ test is $z_{0.95} = 1.65$. Since the test statistic, Z = 1.69, exceeds the critical value, we reject $H_0$.
Repeat this using $\hat p = X/n$ in place of X to convince yourself it is equivalent!
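The calculation above can be checked with a short standard-library Python sketch; the function name `one_sample_prop_ztest` is hypothetical. It computes Z both from the count X and from $\hat p$, confirming that the two forms agree, and adds the one-sided p-value, which the slide does not report.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_prop_ztest(x: int, n: int, p0: float) -> tuple[float, float, float]:
    """One-sided (H_A: p > p0) z-test for a binomial proportion.

    Returns (Z from the count X, Z from p_hat, one-sided p-value).
    The two Z values are algebraically identical.
    """
    z_count = (x - n * p0) / sqrt(n * p0 * (1 - p0))
    p_hat = x / n
    z_prop = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 1 - NormalDist().cdf(z_count)  # upper-tail area for H_A: p > p0
    return z_count, z_prop, p_value


z_x, z_phat, p = one_sample_prop_ztest(66, 114, 0.5)
crit = NormalDist().inv_cdf(0.95)  # one-sided alpha = 0.05 critical value
print(f"Z = {z_x:.2f} (from X), {z_phat:.2f} (from p_hat); critical value {crit:.3f}; p = {p:.3f}")
```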
5
Confidence interval? Use $\hat p$ instead of $p_0$ in the standard error:
$\hat p \pm 1.96\sqrt{\frac{\hat p(1-\hat p)}{n}}$
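Worked through for the pesticide-plant data (our arithmetic, rounded):

```latex
\begin{aligned}
\hat p &= 66/114 = 0.579 \\
\widehat{SE} &= \sqrt{0.579 \times 0.421 / 114} = 0.0462 \\
\hat p \pm 1.96\,\widehat{SE} &= 0.579 \pm 0.091 = (0.488,\ 0.670)
\end{aligned}
```

This interval contains 0.5 even though the one-sided test rejected $H_0$: a two-sided 95% interval roughly corresponds to a two-sided test at the 0.05 level, not to the one-sided test used above.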
6
Hypothesis Testing 2 Sample Proportions - Motivation
In a study of morbidity and mortality among pediatric victims of motor vehicle accidents, information was gathered on whether children were wearing seat belts at the time of the accident (Osberg and DiScala, AJPH 1992). Of the 123 children who were wearing a seat belt, 3 died, while of the 290 children who were not wearing a seat belt, 13 died.
Q: Does this study show that wearing a seat belt is beneficial? Let’s construct a hypothesis test comparing the two fatality probabilities.
7
Hypothesis Testing 2 Sample Proportions
What are the parameters and the statistics for this question?
$p_1$ = mortality for children when a seat belt is worn
$p_2$ = mortality for children when a seat belt is not worn
$X_1$ = number of recorded deaths among children who wear seat belts
$n_1$ = number of children surveyed who wear seat belts
$X_2$ = number of recorded deaths among children who do not wear seat belts
$n_2$ = number of children surveyed who do not wear seat belts
8
Hypothesis Testing 2 Sample Proportions
The hypothesis that the two populations are the same is addressed by the hypotheses:
$H_0: p_1 = p_2$ versus $H_A: p_1 \neq p_2$
A statistic useful for this comparison is the difference in the observed, or sample, proportions (we’ll see some others later):
$\hat p_1 - \hat p_2 = X_1/n_1 - X_2/n_2$
Q: What is the distribution of this statistic? A: Approximately normal.
9
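Hypothesis Testing 2 Sample Proportions
We obtain an (approximately) standard normal statistic if we use:
$Z = \frac{(\hat p_1 - \hat p_2) - (p_1 - p_2)}{\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}}$
Then under the null, $H_0: p_1 = p_2 = p_0$, we obtain
$Z = \frac{\hat p_1 - \hat p_2}{\sqrt{p_0(1-p_0)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
However, we still don’t know the common value $p_0$ in the denominator. We need to replace $p_0$ with an estimate: just as with pooled variances, $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$, we use a weighted average as the estimate in the variance:
$\hat p_0 = \frac{n_1\hat p_1 + n_2\hat p_2}{n_1+n_2} = \frac{X_1 + X_2}{n_1+n_2}$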
10
The test statistic used for testing $H_0: p_1 = p_2$ is:
$Z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p_0(1-\hat p_0)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
Note: The test would still be valid if we had simply used the separate estimates, $\hat p_1$ and $\hat p_2$, instead of the common estimate based on $H_0$.
Note: A common estimate is not used when computing confidence intervals for the difference in the population proportions, $p_1 - p_2$. In this case we make no assumption about the relationship between $p_1$ and $p_2$, so we use the following as a 95% CI for $p_1 - p_2$:
$(\hat p_1 - \hat p_2) \pm 1.96\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}}$
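A standard-library Python sketch of both calculations: the pooled $\hat p_0$ for the test and the separate estimates for the CI. The function name `two_prop_ztest` is hypothetical; here it is applied to the seat-belt counts from the motivating example.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_ztest(x1: int, n1: int, x2: int, n2: int, conf: float = 0.95):
    """Pooled z-test of H0: p1 = p2, plus an unpooled CI for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    # Under H0 the common p0 is estimated by the weighted average (x1 + x2) / (n1 + n2).
    p0 = (x1 + x2) / (n1 + n2)
    se_pooled = sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided
    # No H0 assumption for the CI, so use the separate estimates.
    se_unpooled = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    diff = p1 - p2
    return z, p_value, (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)


z, p, ci = two_prop_ztest(3, 123, 13, 290)
print(f"Z = {z:.2f}, two-sided p = {p:.3f}, 95% CI for p1 - p2 = ({ci[0]:.3f}, {ci[1]:.3f})")
```

With these counts, Z is about -0.98 and the 95% CI for $p_1 - p_2$ is roughly (-0.057, 0.016), which contains 0.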
11
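Hypothesis Testing 2 Sample Proportions
Returning to the example, we estimate the separate risks as
$\hat p_1 = 3/123 = 0.0244$ (seat belt + group) and $\hat p_2 = 13/290 = 0.0448$ (seat belt - group)
Thus the risk appears to be nearly twice as high in the seat belt - group. We can test $H_0: p_1 = p_2$, but we first need a common estimate under the null:
$\hat p_0 = \frac{3 + 13}{123 + 290} = \frac{16}{413} = 0.0387$
We use the statistic
$Z = \frac{0.0244 - 0.0448}{\sqrt{0.0387(0.9613)\left(\frac{1}{123} + \frac{1}{290}\right)}} = \frac{-0.0204}{0.0208} = -0.98$
Since |Z| < 1.96, we fail to reject $H_0$ and conclude that the observed difference is not statistically significant at the 0.05 level.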
12
Hypothesis Testing 2 Sample Proportions
Note that under the null we would have expected:
$n_1 \hat p_0 = (16/413) \times 123 = 4.77$ deaths in the seat belt + group
$n_2 \hat p_0 = (16/413) \times 290 = 11.23$ deaths in the seat belt - group
The fact that one of these is “small” (less than 5) raises some concern about the normal approximation to the binomial. One alternative in this case is Fisher’s Exact Test, which does not make the normality assumption.
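Fisher’s exact test conditions on the table margins, so the number of deaths in the seat belt + group follows a hypergeometric distribution. Below is a minimal standard-library sketch with a hypothetical function name (in practice SciPy’s `scipy.stats.fisher_exact` does this). The two-sided p-value sums the probabilities of all tables no more likely than the observed one, which is one common convention.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    denom = comb(total, col1)

    # Hypergeometric probability that the top-left cell equals k, with margins fixed
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = {k: comb(row1, k) * comb(row2, col1 - k) / denom for k in range(lo, hi + 1)}

    # Sum over tables no more likely than the observed one (small tolerance for ties)
    p_obs = probs[a]
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7)))


# Seat-belt data: rows = seat belt + / seat belt -, columns = died / survived
print(fisher_exact_two_sided(3, 120, 13, 277))
```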
13
Power and Sample Size
Power = Pr(reject $H_0$ | $H_A$ true)
Power depends on $\mu_0$, $\mu_A$, $\sigma^2$, and n.
Sample size calculations ensure that the study is capable of detecting departures from the null hypothesis.
Power and sample size calculations require a model for the data under both the null and the alternative.
14
Power and Sample Size
We have mainly focused on the distribution of the test statistic under the null hypothesis. Shouldn’t we also consider its distribution under the alternative hypothesis? Yes! The distribution of the test statistic under the alternative hypothesis tells us the power of the test. Power indicates the ability of the test procedure to reliably detect departures from the null hypothesis.
Power $(1 - \beta)$ and significance $(\alpha)$ are important considerations in the planning of a study, and they drive sample size calculations.
15
Power
Power is the probability of rejecting the null hypothesis when it truly is false:
$1 - \beta = P[\text{reject } H_0 \mid H_A \text{ true}]$
So when we consider power, we compute probabilities assuming that the alternative is the “truth”.
Consider the 1-sample testing situation with the hypotheses:
$H_0: \mu = \mu_0$ versus $H_A: \mu \neq \mu_0$
This scenario is enough to illustrate all of the important concepts. Details change when we consider variants:
1. One-sided alternatives
2. 2-sample problems
3. Sample proportions
We will look at the 1-sample case in detail and give results for the other situations.
16
Power
Power refers to the likelihood of detecting a difference from $H_0$. Clearly, the greater the difference between the null mean, $\mu_0$, and the alternative mean, $\mu_A$, the more likely the sample mean will be “significantly far from $H_0$”. The idea is that when the distance between the null and alternative means, $|\mu_0 - \mu_A|$, is “large”, the power is also large. However, as usual, we measure “large” in standard error of the mean (SEM) units, $\sigma/\sqrt{n}$.
Define $\Delta = |\mu_0 - \mu_A|$. Then the quantity we want to be “large” for good power is
$\frac{\Delta}{\sigma/\sqrt{n}} = \frac{\Delta\sqrt{n}}{\sigma}$
Based on this, we expect power to increase as:
1. Sample size increases.
2. The distance between $\mu_0$ and $\mu_A$ ($\Delta$) increases.
3. Variance gets smaller.
17
Power
Q: How can we compute the power?
We assume that $\sigma$ is known; even if we don’t know it, we’ll need an estimate of it. We also know the sample size, n.
1. Choose $\alpha$
2. Identify $\mu_0$
3. Determine 1-sided / 2-sided
4. Identify $\mu_A$
Steps (1)-(3) determine the rejection region. For example, a two-sided test of $H_0: \mu = \mu_0$ rejects $H_0$ if
$\bar X < \mu_0 - z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}$ or $\bar X > \mu_0 + z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}$
18
Power example: $\mu_0 = 13.0$, $\mu_A = 12.8$, $\sigma = 0.7$, $\alpha = 0.05$
[Figure panels: n = 25; n = 100]
19
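Power
Let’s assume that $\mu_A < \mu_0$. Then only the lower rejection region has non-negligible probability. Step (4) determines the distribution under the alternative, so that power can be computed:
$1 - \beta \approx P\left(\bar X < \mu_0 - z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}} \,\middle|\, \mu = \mu_A\right) = \Phi\left(\frac{\mu_0 - \mu_A}{\sigma/\sqrt{n}} - z_{1-\alpha/2}\right)$
So the key quantity that determines the power is indeed $\frac{\mu_0 - \mu_A}{\sigma/\sqrt{n}}$.
(Note: if $\mu_A > \mu_0$, then power depends on $\mu_A - \mu_0$.)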
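To check the claim that the upper rejection region contributes almost nothing, here is a standard-library sketch comparing the one-region approximation with the exact two-region power, applied to the power example ($\mu_0 = 13.0$, $\mu_A = 12.8$, $\sigma = 0.7$, $\alpha = 0.05$). The function names are hypothetical, and $\sigma$ is treated as known.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def power_approx(mu0, mu_a, sigma, n, alpha=0.05):
    """Lower rejection region only: Phi((mu0 - mu_a) / (sigma / sqrt(n)) - z_{1-alpha/2})."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return PHI((mu0 - mu_a) / (sigma / sqrt(n)) - z)


def power_exact(mu0, mu_a, sigma, n, alpha=0.05):
    """Both rejection regions of the two-sided z-test, evaluated under mu = mu_a."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    shift = (mu0 - mu_a) / (sigma / sqrt(n))  # distance between the means in SEM units
    return PHI(shift - z) + PHI(-shift - z)


for n in (25, 100):
    a = power_approx(13.0, 12.8, 0.7, n)
    e = power_exact(13.0, 12.8, 0.7, n)
    print(f"n = {n:3d}: approx = {a:.4f}, exact = {e:.4f}")
```

Power is roughly 0.30 at n = 25 and roughly 0.82 at n = 100, and the two versions agree to within 0.001.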
20
Power
Let’s come up for air. We have shown that the POWER can be written as
$1 - \beta \approx \Phi\left(\frac{\Delta\sqrt{n}}{\sigma} - z_{1-\alpha/2}\right)$
This tells us the probability of rejecting $H_0$ when the alternative is true. This is important! Why spend $$$ on a study that hopes to show a treatment effect if the probability of rejecting $H_0$ is small? In fact, to obtain $$$ from NIH one must show that the study has sufficient power to detect a meaningful difference.
One useful display is the power curve, which shows the power for different values of $\mu_A$.
21
Power Curve
Suppose we are testing a blood pressure medication and know that average systolic blood pressure among hypertensives is 150 with a standard deviation of 10. For various treatment effects, measured as decreases d, how likely are we to reject the null hypothesis $H_0: d = 0$ (2-sided, $\alpha = 0.05$)? It depends on the sample size and the alternative: this is the power curve!
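A sketch of the power curve calculation, treating the study as a one-sample z-test on the mean decrease d with known $\sigma = 10$ (an assumption about the design). The sample sizes printed are illustrative only, since the values in the original figure were not recovered, and `power_curve` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist

SIGMA = 10.0  # SD of systolic blood pressure from the slide
ALPHA = 0.05  # two-sided significance level


def power_curve(d: float, n: int, sigma: float = SIGMA, alpha: float = ALPHA) -> float:
    """Power of the two-sided z-test of H0: d = 0 when the true mean decrease is d."""
    std = NormalDist()
    z = std.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n) / sigma  # effect size in SEM units
    return std.cdf(shift - z) + std.cdf(-shift - z)


SAMPLE_SIZES = (10, 25, 50, 100)  # illustrative, not from the original figure
print("d (mm Hg)  " + "  ".join(f"n={n:<5d}" for n in SAMPLE_SIZES))
for d in range(0, 11, 2):
    row = "  ".join(f"{power_curve(d, n):<7.3f}" for n in SAMPLE_SIZES)
    print(f"{d:<9d}  {row}")
```

At d = 0 every column equals $\alpha = 0.05$, and each row rises toward 1 as n grows.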
22
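Sample Size
Q: How many patients should we treat?
1. Specify the significance level ($\alpha$)
2. Specify the null mean ($\mu_0$)
3. Specify the power ($1 - \beta$)
4. Specify the alternative mean ($\mu_A$)
5. Specify the variance ($\sigma^2$)
The real work is coming up with (1)-(5). Together they give the following two requirements (for $\mu_A < \mu_0$, rejecting $H_0$ when $\bar X < c$):
Significance: $c = \mu_0 - z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}$
Power: $c = \mu_A + z_{1-\beta}\frac{\sigma}{\sqrt{n}}$
Choose the sample size n that satisfies both conditions:
$n = \frac{\sigma^2 (z_{1-\alpha/2} + z_{1-\beta})^2}{(\mu_0 - \mu_A)^2}$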
23
Sample Size
Suppose that we consider a decrease in blood pressure of 5 mm Hg to be scientifically important. Given the standard deviation of 10, how many patients are required to obtain 80% power using a 2-sided $\alpha = 0.05$ test? We need
$n = \frac{10^2(1.96 + 0.84)^2}{5^2} = \frac{100 \times 7.84}{25} = 31.4$
So recruit n = 32 patients into our study.
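The same calculation as a small Python function (hypothetical name `n_one_sample_mean`), using exact normal quantiles rather than the rounded 1.96 and 0.84, and rounding up to the next whole patient. The `two_sided=False` option gives the 1-sided version of the formula.

```python
from math import ceil
from statistics import NormalDist


def n_one_sample_mean(delta: float, sigma: float, alpha: float = 0.05,
                      power: float = 0.80, two_sided: bool = True) -> int:
    """n = sigma^2 (z_alpha + z_{1-beta})^2 / delta^2, rounded up to a whole subject."""
    std = NormalDist()
    z_alpha = std.inv_cdf(1 - alpha / 2) if two_sided else std.inv_cdf(1 - alpha)
    z_beta = std.inv_cdf(power)
    return ceil((sigma * (z_alpha + z_beta) / delta) ** 2)


print(n_one_sample_mean(delta=5, sigma=10))  # blood-pressure example
```

For the blood-pressure example this returns 32, matching the slide.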
24
Sample Size
25
Factors that Influence Sample Size
The required sample size increases as:
1. $\sigma^2$ increases
2. the significance level is made smaller ($\alpha$ decreases)
3. power increases ($1 - \beta$ increases)
4. the distance $|\mu_0 - \mu_A|$ decreases
26
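Sample Size
1-sample mean, 1-sided test:
$n = \frac{\sigma^2 (z_{1-\alpha} + z_{1-\beta})^2}{(\mu_0 - \mu_A)^2}$
1-sample proportion, 1-sided test:
$n = \frac{\left[z_{1-\alpha}\sqrt{p_0(1-p_0)} + z_{1-\beta}\sqrt{p_A(1-p_A)}\right]^2}{(p_0 - p_A)^2}$
1-sample proportion, 2-sided test:
$n = \frac{\left[z_{1-\alpha/2}\sqrt{p_0(1-p_0)} + z_{1-\beta}\sqrt{p_A(1-p_A)}\right]^2}{(p_0 - p_A)^2}$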
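A sketch of the proportion formulas in Python, assuming the standard forms in which the null standard deviation pairs with $z_{1-\alpha}$ and the alternative standard deviation pairs with $z_{1-\beta}$. The function name is hypothetical, and the values $p_0 = 0.5$, $p_A = 0.6$ in the usage line are illustrative, not from the slides.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_one_sample_prop(p0: float, pa: float, alpha: float = 0.05,
                      power: float = 0.80, two_sided: bool = False) -> int:
    """Sample size for a 1-sample test of H0: p = p0 against the alternative p = pa."""
    std = NormalDist()
    z_alpha = std.inv_cdf(1 - alpha / 2) if two_sided else std.inv_cdf(1 - alpha)
    z_beta = std.inv_cdf(power)
    # Null SD sqrt(p0(1-p0)) goes with z_alpha; alternative SD sqrt(pa(1-pa)) with z_beta
    numerator = z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(pa * (1 - pa))
    return ceil((numerator / (p0 - pa)) ** 2)


print(n_one_sample_prop(0.5, 0.6), n_one_sample_prop(0.5, 0.6, two_sided=True))
```

As with means, moving from a 1-sided to a 2-sided test raises the required n.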
27
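Sample Size
2-sample mean, 2-sided test, equal sizes $n_1 = n_2 = n$:
$n = \frac{2\sigma^2 (z_{1-\alpha/2} + z_{1-\beta})^2}{(\mu_1 - \mu_2)^2}$
2-sample proportion, 2-sided test, equal sizes $n_1 = n_2 = n$ (rough approximation, for $p_0 \approx p_A$):
$n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2 \left[p_0(1-p_0) + p_A(1-p_A)\right]}{(p_0 - p_A)^2}$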
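A corresponding sketch for the two-sample formulas, assuming the unpooled rough approximation for proportions (hypothetical function names; n is per group). Plugging in the observed seat-belt rates is purely illustrative: it asks how large each group would need to be to detect a difference of that size with 80% power.

```python
from math import ceil
from statistics import NormalDist


def _z_sum(alpha: float, power: float) -> float:
    """z_{1-alpha/2} + z_{1-beta} for a two-sided test."""
    std = NormalDist()
    return std.inv_cdf(1 - alpha / 2) + std.inv_cdf(power)


def n_two_sample_mean(delta: float, sigma: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group n for comparing two means with equal group sizes."""
    return ceil(2 * sigma**2 * _z_sum(alpha, power) ** 2 / delta**2)


def n_two_sample_prop(p0: float, pa: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group n for comparing two proportions (rough approximation)."""
    var_sum = p0 * (1 - p0) + pa * (1 - pa)
    return ceil(_z_sum(alpha, power) ** 2 * var_sum / (p0 - pa) ** 2)


print(n_two_sample_mean(delta=5, sigma=10))
print(n_two_sample_prop(3 / 123, 13 / 290))
```

The first call gives 63 per group for a 5 mm Hg difference with $\sigma = 10$; the second suggests that on the order of 1,250 children per group would be needed to detect risks of 0.024 versus 0.045 with 80% power.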
28
Summary
Power is an important component of study design. Sample size calculations ensure that the study is capable of detecting departures from the null hypothesis. Power and sample size calculations require more than a test: they require a model for the data under both the null and the alternative.
STRUTs ($\alpha = 0.05$; power = 0.80; 2-tailed):
*One sample: $n = 8/D^2$
*Two sample: $n = 16/D^2$ (per group)
where $D = \Delta/\sigma$ (the difference in standard deviation units)
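The rule-of-thumb constants come from $(z_{0.975} + z_{0.80})^2 \approx 7.85$, rounded to 8 (and doubled for two samples). A short standard-library check:

```python
from statistics import NormalDist

std = NormalDist()
k = (std.inv_cdf(0.975) + std.inv_cdf(0.80)) ** 2  # (z_{1-alpha/2} + z_{1-beta})^2

print(f"one sample: n = {k:.2f} / D^2  (rule of thumb: 8 / D^2)")
print(f"two sample: n = {2 * k:.2f} / D^2 per group  (rule of thumb: 16 / D^2)")

# Blood-pressure example: D = 5 / 10 = 0.5
print(f"8 / D^2 = {8 / 0.5**2:.0f}")
```

For the blood-pressure example, D = 0.5 gives 8/0.25 = 32, the same answer as the exact calculation.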