Point and Confidence Interval Estimation of a Population Proportion, π
We are frequently interested in estimating the proportion of a population with a characteristic of interest, such as: the proportion of smokers; the proportion of cancer patients who survive at least 5 years; the proportion of new HIV patients who are female.
If we take a random sample from a population and observe the number of subjects with the characteristic of interest (the number of “successes”), we are observing a binomial random variable. Now, however, we will focus on estimating the true proportion, π, in the population rather than on the count.
Again, one way to deal with this type of data is to define a random variable X that can take two values: X = 1 if the characteristic is present (a “success”), and X = 0 if the characteristic is absent (a “failure”). If we then sum all values in a population, we are summing zeros and ones, which gives a count of the number of individuals in the population with the characteristic:
$$\sum_{i=1}^{N} X_i = \text{number of individuals in the population with the characteristic}$$
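The population mean is the proportion of individuals in the population with the characteristic:
$$\mu = \frac{1}{N}\sum_{i=1}^{N} X_i = \pi$$
The sample proportion, P (with observed value p), is then:
$$p = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Therefore, p is the estimator of π, the proportion with a characteristic of interest.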
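A small illustration of the 0,1 coding, using a hypothetical sample of 10 subjects (the data and the function name `count_and_proportion` are invented for this sketch): summing the indicators gives the count of successes, and their mean gives the sample proportion p.

```python
def count_and_proportion(indicators):
    """Return (count of successes, proportion of successes) for a list of 0/1 values."""
    if not indicators:
        raise ValueError("need at least one observation")
    if any(x not in (0, 1) for x in indicators):
        raise ValueError("values must be 0 or 1")
    count = sum(indicators)               # summing zeros and ones counts the successes
    proportion = count / len(indicators)  # the mean of a 0/1 variable is a proportion
    return count, proportion

# Hypothetical sample of 10 subjects: 1 = smoker, 0 = non-smoker
sample = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
print(count_and_proportion(sample))  # (3, 0.3)
```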
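By the Central Limit Theorem, we know that for n large
$$\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right) \text{ approximately,}$$
even when X is not normally distributed. When X is a 0,1 variable, the sample mean is the sample proportion P, so for n large the central limit theorem gives
$$P \sim N\!\left(\pi, \frac{\sigma^2}{n}\right) \text{ approximately.}$$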
What is the variance, σ², for a 0,1 variable? We know that
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i - \mu)^2.$$
Using algebra, and the fact that $x_i^2 = x_i$ for a 0,1 variable, we can show that
$$\sigma^2 = \pi(1-\pi).$$
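For those who want the algebra (with μ = π):
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i - \pi)^2 = \frac{1}{N}\sum_{i=1}^{N}\left(X_i^2 - 2\pi X_i + \pi^2\right) \quad \text{(expand)}$$
$$= \frac{1}{N}\sum_{i=1}^{N}\left(X_i - 2\pi X_i + \pi^2\right) \quad (X_i^2 = X_i \text{ for a 0,1 variable})$$
$$= \pi - 2\pi^2 + \pi^2 = \pi(1-\pi) \quad \text{(summing the constant } \pi^2 \text{ over } N \text{ terms gives } N\pi^2\text{)}$$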
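The identity σ² = π(1 − π) can also be checked numerically. This sketch uses a hypothetical population of N = 20 subjects, 7 of whom have the characteristic, and computes the population variance from its definition with exact fractions so that no floating-point rounding hides a discrepancy.

```python
from fractions import Fraction

def population_variance(values):
    """Population variance (divide by N), computed directly from its definition."""
    n = len(values)
    mu = Fraction(sum(values), n)
    return sum((Fraction(x) - mu) ** 2 for x in values) / n

# Hypothetical population: N = 20 subjects, 7 with the characteristic
population = [1] * 7 + [0] * 13
pi = Fraction(sum(population), len(population))  # pi = 7/20

print(population_variance(population), pi * (1 - pi))  # 91/400 91/400
```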
Hence, $\sigma^2 = \pi(1-\pi)$, and the standard error of the sample proportion P is
$$\sigma_P = \sqrt{\frac{\sigma^2}{n}} = \sqrt{\frac{\pi(1-\pi)}{n}}.$$
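We also know, by the central limit theorem, that for large n, P is approximately normally distributed:
$$P \sim N\!\left(\pi, \frac{\pi(1-\pi)}{n}\right) \text{ approximately.}$$
For estimation of the population proportion, π:
Point estimate: $p$
Confidence interval estimate: $p \pm z_{1-\alpha/2}\sqrt{\pi(1-\pi)/n}$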
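A minimal sketch of this large-sample (normal-approximation, often called Wald) interval, substituting p for the unknown π in the standard error, as the slides do when π is unknown. The function name `wald_ci` is hypothetical, and z is taken from Python's `statistics.NormalDist` rather than a table, so it is 1.95996... rather than the rounded 1.96.

```python
import math
from statistics import NormalDist

def wald_ci(x, n, confidence=0.95):
    """Large-sample CI for a population proportion from x successes in n trials.

    Uses the sample proportion p = x/n in place of the unknown pi
    when estimating the standard error sqrt(pi(1 - pi)/n).
    """
    p = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{1-alpha/2}
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

# Voter example: 585 of 1000 prefer candidate A
lo, hi = wald_ci(585, 1000)
print(round(lo, 3), round(hi, 3))  # 0.554 0.616
```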
Example: Suppose that a sample of 1000 voters is taken to determine presidential preference. In this sample, 585 persons indicated that they would vote for candidate A. Construct a 95% confidence interval estimate for the true proportion, π, in the population planning to vote for candidate A. The confidence interval for π takes the form:
$$p \pm z_{1-\alpha/2}\,\sigma_P$$
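The point estimate of the proportion is p = 585/1000 = .585.
The 95% confidence interval estimator of π is
$$p \pm 1.96\sqrt{\frac{\pi(1-\pi)}{n}}.$$
However, we don't know π, so we use p in its place to estimate the standard error:
$$s_P = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{(.585)(.415)}{1000}} \approx .0156, \qquad .585 \pm 1.96(.0156) = .585 \pm .031$$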
The 95% CI on the proportion preferring candidate A is (.554, .616).
This interval does not include the value .50. Either π really is .5 and we obtained an unusually large sample proportion (so that the interval estimate did not cover π = .5), or the population proportion is not .5, which suggests that candidate A will win the election.
When is the sample large enough to use the normal approximation to the binomial?
When $n\pi \ge 5$ and $n(1-\pi) \ge 5$; that is, when both the expected number of successes and the expected number of failures are at least 5.
Aside: Improving the Normal Approximation to the Binomial
The binomial distribution is discrete, while the normal distribution is continuous. When the true proportion, π, is known, we can match the binomial distribution more closely to a normal distribution by including a correction, called the ‘continuity correction’. For example, when π = .5 and n = 10, to approximate $P(a \le X \le b)$ for whole-number counts a and b, we instead use the normal approximation for $P(a - 0.5 \le X \le b + 0.5)$.
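Example of the ‘Continuity Correction’ to the Normal Approximation to the Binomial
Suppose π = .5 and n = 16, so that $\sigma_P = \sqrt{(.5)(.5)/16} = .125$. Compare the exact binomial value, the normal approximation, and the continuity-corrected normal approximation of P(.4375 ≤ P ≤ .5), which is the event 7 ≤ X ≤ 8.
From the binomial table: $P(X = 7) + P(X = 8) \approx .1746 + .1964 \approx .371$
Using the normal approximation, no correction: $P\left(\frac{.4375 - .5}{.125} \le Z \le \frac{.5 - .5}{.125}\right) = P(-0.5 \le Z \le 0) \approx .191$
Using the correction (each end widened by 0.5/16 = .03125): $P\left(\frac{.40625 - .5}{.125} \le Z \le \frac{.53125 - .5}{.125}\right) = P(-0.75 \le Z \le 0.25) \approx .372$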
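The three values in this comparison can be reproduced with a few lines of standard-library Python. This is a sketch: `math.comb` gives the exact binomial probabilities and `statistics.NormalDist` gives the normal ones, working on the proportion scale as the slide does.

```python
import math
from statistics import NormalDist

n, pi = 16, 0.5
se = math.sqrt(pi * (1 - pi) / n)  # standard error of P: 0.125
Z = NormalDist()                   # standard normal

# P(.4375 <= P <= .5) is the same event as 7 <= X <= 8 successes out of 16
exact = sum(math.comb(n, k) * pi**k * (1 - pi) ** (n - k) for k in range(7, 9))

# Normal approximation with no correction
no_corr = Z.cdf((0.5 - pi) / se) - Z.cdf((0.4375 - pi) / se)

# Continuity correction: widen each end by half a count, i.e. 0.5/n on the proportion scale
half = 0.5 / n
corrected = Z.cdf((0.5 + half - pi) / se) - Z.cdf((0.4375 - half - pi) / se)

print(f"exact={exact:.4f}  normal={no_corr:.4f}  corrected={corrected:.4f}")
```

The corrected value is much closer to the exact binomial probability than the uncorrected one.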
Using p in place of π to estimate the standard error σ_P:
1. If $n\pi \ge 5$ and $n(1-\pi) \ge 5$, use p: $s_P = \sqrt{p(1-p)/n}$.
2. Otherwise, a) assume π = .5, or b) use an ‘exact’ method for the CI.
We do this to avoid underestimating the variance, π(1 − π), which is at a maximum when π = .5.
Don't use Student's t with proportions, since the assumption of normality of the underlying population elements is not satisfied by a 0,1 variable.
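The rule above can be sketched as a small function covering options 1 and 2a (the exact method of option 2b is a separate calculation). Two assumptions of this sketch: the function name is hypothetical, and because π is unknown in practice, the expected counts are checked with the sample proportion p in place of π.

```python
import math

def standard_error_for_ci(x, n, min_expected=5):
    """Choose the standard error of P following the slide's rule (sketch).

    If the estimated expected numbers of successes and failures, n*p and
    n*(1 - p), both reach min_expected, plug in p = x/n. Otherwise fall back
    to the conservative value pi = .5, which maximises pi*(1 - pi).
    """
    p = x / n  # assumption: p stands in for pi when checking the condition
    if n * p >= min_expected and n * (1 - p) >= min_expected:
        return math.sqrt(p * (1 - p) / n)
    return math.sqrt(0.5 * 0.5 / n)

print(standard_error_for_ci(585, 1000))  # large sample: uses p = .585
print(standard_error_for_ci(2, 40))      # n*p = 2 < 5: falls back to pi = .5
```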
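What do we use when the normal approximation is not appropriate?
Exact binomial confidence intervals for π can be computed. With x successes observed in n trials, solve the following for the limits:
Lower limit $\pi_L$: $\sum_{k=x}^{n}\binom{n}{k}\pi_L^{k}(1-\pi_L)^{n-k} = \alpha/2$
Upper limit $\pi_U$: $\sum_{k=0}^{x}\binom{n}{k}\pi_U^{k}(1-\pi_U)^{n-k} = \alpha/2$
Clearly, the exact binomial CI is not simple to compute by hand.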
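The two equations above (the Clopper-Pearson construction, the usual meaning of an "exact" binomial CI) can be solved numerically by bisection, because each tail probability is monotone in π. This is a sketch under stated assumptions: the helper names are hypothetical, it is not claimed to match any particular software package digit for digit, and because `math.comb` values are converted to floats it is suited to modest n (roughly n below 1000).

```python
import math

def binom_cdf(k, n, pi):
    """P(X <= k) for X ~ Bin(n, pi); returns 0 for k < 0."""
    return sum(math.comb(n, j) * pi**j * (1 - pi) ** (n - j) for j in range(k + 1))

def bisect_increasing(f, lo=0.0, hi=1.0, iters=100):
    """Root of an increasing function f on [lo, hi], found by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies at or to the left of mid
    return (lo + hi) / 2

def exact_binomial_ci(x, n, confidence=0.95):
    """Clopper-Pearson ('exact') CI for pi from x successes in n trials."""
    half_alpha = (1 - confidence) / 2
    if x == 0:
        lower = 0.0
    else:
        # P(X >= x | pi_L) = alpha/2; this upper-tail probability increases with pi
        lower = bisect_increasing(lambda pi: (1 - binom_cdf(x - 1, n, pi)) - half_alpha)
    if x == n:
        upper = 1.0
    else:
        # P(X <= x | pi_U) = alpha/2; this lower-tail probability decreases with pi
        upper = bisect_increasing(lambda pi: half_alpha - binom_cdf(x, n, pi))
    return lower, upper

print(exact_binomial_ci(3, 10))
```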
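In Minitab (or other software): Stat > Basic Statistics > 1 Proportion.
Enter the number of trials, n, and the number of successes, x. Leave the normal-approximation option unchecked for the exact binomial CI; check it for the normal approximation.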
Minitab output:
Test and CI for One Proportion
Test of p = 0.5 vs p not = 0.5
EXACT binomial
Sample | X | N | Sample p | % CI | Exact P-Value
Normal approximation
Sample | X | N | Sample p | % CI | Z-Value | P-Value
Sample Size Estimation when the Goal Is Estimating a Population Proportion, π
The pattern is the same as when the goal is estimation of a mean: if we know the desired precision (width of the interval), the confidence level, and a “guess” of the proportion to get the standard error, we can estimate the sample size, n.
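The width of a confidence interval for π is $w = 2\,z_{1-\alpha/2}\,\sigma_P$, where σ_P is the standard error of P; the interval runs from $P - z_{1-\alpha/2}\sigma_P$ to $P + z_{1-\alpha/2}\sigma_P$. Using $\sigma_P = \sqrt{\pi(1-\pi)/n}$, we have
$$w = 2\,z_{1-\alpha/2}\sqrt{\frac{\pi(1-\pi)}{n}}.$$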
Solving for n gives us
$$n = \frac{4\pi(1-\pi)\,z_{1-\alpha/2}^2}{w^2}.$$
Note: this requires information about π, which is what we are trying to estimate! However, π(1 − π) is at a maximum when π = .5, so to be conservative (over- rather than under-estimate the sample size) use .5 in place of π.
Substituting .5 for π gives a conservative sample size estimator of:
$$n = \frac{z_{1-\alpha/2}^2}{w^2}$$
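The sample-size formula translates directly into code. In this sketch (the function name is hypothetical), the result is rounded up with `math.ceil`, since rounding down would give slightly less precision than requested; with the default π = .5 the formula reduces to the conservative n = z²/w².

```python
import math

def sample_size_for_proportion(width, z=1.96, pi=0.5):
    """n = 4*pi*(1 - pi)*z**2 / w**2, rounded up to a whole subject.

    width is the full width w of the interval (twice the margin of error).
    The default pi = .5 gives the conservative n = z**2 / w**2.
    """
    n = 4 * pi * (1 - pi) * z**2 / width**2
    return math.ceil(n)

print(sample_size_for_proportion(0.10))           # conservative: 385
print(sample_size_for_proportion(0.10, pi=0.75))  # prior guess pi = .75: 289
```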
Example: For an election poll, how many voters should be surveyed to estimate the proportion in favor of re-electing the current mayor to within 5%, with 95% confidence?
We have a confidence level of 1 − α = .95, so $z_{.975} = 1.96$. We want a precision of ±5% = ±.05, so w = .10.
Conservative: $n = z_{1-\alpha/2}^2/w^2 = (1.96)^2/(.10)^2 = 384.16$, which we round up to 385.
We should poll 385 voters to achieve a 95% CI of ±5%.
What if we have some information on π?
A previous poll tells us that the current office-holder had about 75% of voter support. Assuming π = .75:
$n = 4\pi(1-\pi)z_{1-\alpha/2}^2/w^2 = 4(.75)(.25)(1.96)^2/(.10)^2 = 288.12$, which we round up to 289.
Using available information, we get a sample size estimate of 289 voters, which can save us considerable time and expense compared to the more conservative estimate of 385.
Confidence Interval Calculation for the Difference between Two Proportions, $\pi_1 - \pi_2$, Two Independent Groups
We are often interested in comparing proportions from 2 populations: Is the incidence of disease A the same in two populations? Patients are treated with either drug D or with placebo: is the proportion “improved” the same in both groups?
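Suppose we take independent random samples from two groups and estimate a proportion in each.
For large enough sample sizes, we know:
$$P_1 \sim N\!\left(\pi_1, \frac{\pi_1(1-\pi_1)}{n_1}\right), \qquad P_2 \sim N\!\left(\pi_2, \frac{\pi_2(1-\pi_2)}{n_2}\right) \text{ approximately.}$$
Then the standard error of the difference between the sample proportions is the square root of the sum of the variances:
$$\sigma_{P_1 - P_2} = \sqrt{\frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}}$$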
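Or, since we don't know the true proportions, we use the sample estimate of the standard error:
$$s_{P_1 - P_2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$
Thus, for large $n_1$ and $n_2$, the (1 − α) confidence interval estimator is:
$$(p_1 - p_2) \pm z_{1-\alpha/2}\, s_{P_1 - P_2}$$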
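A minimal sketch of this interval, using the unpooled standard error and z = 1.96 by default (the function name is hypothetical). Applied to the hypertension trial data in the example that follows (34/50 improved on drug, 15/50 on placebo), it reproduces the interval (.199, .561).

```python
import math

def two_proportion_ci(x1, n1, x2, n2, z=1.96):
    """Large-sample CI for pi1 - pi2 from two independent samples (unpooled SE)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

lo, hi = two_proportion_ci(34, 50, 15, 50)
print(round(lo, 3), round(hi, 3))  # 0.199 0.561
```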
Example: In a clinical trial for a new drug to treat hypertension, 50 patients were randomly assigned to receive the new drug, and 50 patients to receive a placebo. 34 of the patients receiving the drug showed improvement, while 15 of those receiving placebo showed improvement. Compute a 95% confidence interval estimate for the difference between proportions improved.
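Point estimate of $\pi_1 - \pi_2$:
$p_1 = 34/50 = .68$, $p_2 = 15/50 = .30$, $(p_1 - p_2) = .68 - .30 = .38$.
Since we have $n_1 = n_2 = 50$, and the observed numbers of successes and failures (34 and 16 on drug, 15 and 35 on placebo) are all at least 5, our sample size is large enough to use the sample estimate of the standard error:
$$s_{P_1 - P_2} = \sqrt{\frac{(.68)(.32)}{50} + \frac{(.30)(.70)}{50}} = \sqrt{.008552} \approx .0925$$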
Confidence coefficient: for 1 − α = .95, $z_{1-\alpha/2} = z_{.975} = 1.96$.
Confidence interval estimate: $.38 \pm 1.96(.0925) = .38 \pm .181$
The 95% CI estimate is (.199, .561), or (19.9%, 56.1%). The interval for the difference between the proportions improved is bounded away from zero: it seems that the proportion improved by the drug is clearly greater than the proportion improved by placebo.
Using Minitab: Stat > Basic Statistics > 2 Proportions
Enter the sample sizes $n_1$ and $n_2$ and the numbers of successes $x_1$ and $x_2$.
Test and Confidence Interval for Two Proportions
Sample | X | N | Sample p
Estimate for p(1) - p(2): 0.38
95% CI for p(1) - p(2): ( , )
The same cautions apply here as for estimates of a single proportion:
the sample size should be large enough in each group that the normal approximation will hold, that is, $n\pi \ge 5$ and $n(1-\pi) \ge 5$ for each sample. Otherwise: a) use .5 in place of π when estimating the variance for the confidence interval, or b) use some other method. Minitab offers the option to compute a pooled estimate of the standard error.
And in summary: confidence interval estimates provide a range of likely values and an associated probability, or confidence level. The width of the confidence interval depends upon: the underlying variability in the population; the sample size; and the confidence level.
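One way to see what the confidence level means is to simulate: draw many samples from a population whose π is known, build the normal-approximation interval from each, and count how often the interval contains π. This sketch assumes π = .5, n = 100 and 2000 repetitions (all arbitrary choices) and uses Python's `random` module with a fixed seed; the fraction of intervals that cover π should be close to, but not exactly, .95.

```python
import math
import random

def wald_interval(x, n, z=1.96):
    """Normal-approximation CI for pi, with p = x/n in the standard error."""
    p = x / n
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

def simulated_coverage(pi, n, reps=2000, seed=1):
    """Fraction of simulated 95% intervals that contain the true pi."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x = sum(rng.random() < pi for _ in range(n))  # one binomial draw: count of successes
        lo, hi = wald_interval(x, n)
        hits += lo <= pi <= hi                        # True counts as 1
    return hits / reps

print(simulated_coverage(pi=0.5, n=100))
```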
It is important to keep track of the assumptions that we must make about the data:
Samples should be selected randomly: the selection of any element is independent of the selection of any other. In many cases, we must assume that the underlying population follows a normal distribution; without this assumption, probabilities computed using the t-distribution, the χ²-distribution, or the F-distribution may not be correct.
When we speak of “knowing” the population variance, σ², we really mean that we have an outside source of information (previous research, census data, etc.). The key is that we are not using the sample estimate, s², based upon the current sample.
The key to confidence interval estimation is to know: what parameter you are estimating; the point estimate of the parameter; the confidence level; what distributional assumptions are required; and the associated distribution for computing probabilities. I have started a summary table for you below; completing this table will be a good review exercise.
(1 − α) Confidence Interval Estimate
Distribution of data | Parameter to estimate | Point estimate | (1 − α) confidence interval estimate
N(μ, σ²) | μ | | σ² known: ; σ² unknown:
Any, n large | μ | | For n large:
Bin(n, π) | π | P |
 | σ² | S² |
N(μ₁, σ₁²), N(μ₂, σ₂²) | μ₁ − μ₂ | |
Bin(n₁, π₁), Bin(n₂, π₂) | π₁ − π₂ | |
 | σ₁²/σ₂² | |