Determination of Sample Size In almost all research situations the researcher is interested in the question: How large should the sample be?
Answer: It depends on how accurate you want the answer to be. Accuracy is specified by: (1) the magnitude of the error bound, and (2) the level of confidence.
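Confidence Intervals for the mean of a Normal Population, $\mu$. Let $t_1 = \bar{x} - z_{\alpha/2}\,\sigma/\sqrt{n}$ and $t_2 = \bar{x} + z_{\alpha/2}\,\sigma/\sqrt{n}$. Then $t_1$ to $t_2$ is a $(1-\alpha)100\% = P100\%$ confidence interval for $\mu$. $B = z_{\alpha/2}\,\sigma/\sqrt{n}$ is the $(1-\alpha)100\%$ Error Bound for $\mu$. The accuracy of a confidence interval is specified by setting: (1) the magnitude of $B$ (the Error Bound), and (2) the level of confidence $(1-\alpha)100\%$.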
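Error Bound: $B = z_{\alpha/2}\,\sigma/\sqrt{n}$. If we have specified the level of confidence, then the value of $z_{\alpha/2}$ will be known. If we have specified the magnitude of $B$, it will also be known. Solving for $n$ we get: $n = z_{\alpha/2}^2\,(\sigma^*)^2 / B^2$, where $\sigma^*$ is some estimate of $\sigma$.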
Summarizing: The sample size that will estimate $\mu$ with an Error Bound $B$ and level of confidence $P = 1 - \alpha$ is: $n = z_{\alpha/2}^2\,(\sigma^*)^2 / B^2$, where: $B$ is the desired Error Bound; $z_{\alpha/2}$ is the $\alpha/2$ critical value for the standard normal distribution; $\sigma^*$ is some preliminary estimate of $\sigma$.
Notes: (1) $n$ increases as $B$, the desired Error Bound, decreases: a larger sample size is required for a higher level of accuracy. (2) $n$ increases as the level of confidence, $(1-\alpha)$, increases, because $z_{\alpha/2}$ increases as $\alpha/2$ becomes closer to zero: a larger sample size is required for a higher level of confidence. (3) $n$ increases as the standard deviation, $\sigma$, of the population increases: if the population is more variable, then a larger sample size is required.
Summary: The sample size $n$ depends on: (1) the desired level of accuracy, (2) the desired level of confidence, and (3) the variability of the population.
Example: Suppose that one is interested in estimating the average number of grams of fat ($\mu$) in one kilogram of lean beef hamburger. This will be estimated by: (1) randomly selecting one-kilogram samples, then (2) measuring the fat content of each sample. Preliminary estimates indicate that $\mu$ and $\sigma$ are approximately 220 and 40 respectively. I want the study to estimate $\mu$ with an error bound $B = 5$ and a level of confidence of 95% (i.e. $\alpha = 0.05$ and $z_{\alpha/2} = z_{0.025} = 1.960$).
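Solution: $n = z_{\alpha/2}^2\,(\sigma^*)^2 / B^2 = (1.960)^2 (40)^2 / 5^2 = 245.86$, which is rounded up to the next whole number. Hence $n = 246$ one-kilogram samples are required to estimate $\mu$ within $B = 5$ g with a 95% level of confidence.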
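The calculation in this example can be checked with a short script. This is only a sketch using the Python standard library; the function name `sample_size_for_mean` and its parameter names are illustrative choices, not part of the lecture.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(error_bound, sigma_guess, confidence=0.95):
    """Smallest n satisfying z_{alpha/2} * sigma_guess / sqrt(n) <= error_bound."""
    alpha = 1.0 - confidence
    # Upper alpha/2 critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Round up, because n must be a whole number of samples.
    return math.ceil((z * sigma_guess / error_bound) ** 2)


# Hamburger example: B = 5, sigma* = 40, 95% confidence.
n_beef = sample_size_for_mean(error_bound=5, sigma_guess=40, confidence=0.95)
print(n_beef)  # 246

# Halving the error bound roughly quadruples the required sample size.
n_half_bound = sample_size_for_mean(error_bound=2.5, sigma_guess=40)
print(n_half_bound)  # 984
```

The second call illustrates the first of the notes on sample size: because $B$ enters the formula squared, tightening the error bound is expensive.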
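Confidence Intervals for a Bernoulli probability, $p$. Let $t_1 = \hat{p} - z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$ and $t_2 = \hat{p} + z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$. Then $t_1$ to $t_2$ is a $(1-\alpha)100\% = P100\%$ confidence interval for $p$. $B = z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$ is the $(1-\alpha)100\%$ Error Bound for $p$. The accuracy of a confidence interval is specified by setting: (1) the magnitude of $B$ (the Error Bound), and (2) the level of confidence $(1-\alpha)100\%$.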
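Error Bound: $B = z_{\alpha/2}\sqrt{p(1-p)/n}$. If we have specified the level of confidence, then the value of $z_{\alpha/2}$ will be known. If we have specified the magnitude of $B$, it will also be known. Solving for $n$ we get: $n = z_{\alpha/2}^2\,p^*(1-p^*) / B^2$, where $p^*$ is some estimate of $p$.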
Summarizing: The sample size that will estimate $p$ with an Error Bound $B$ and level of confidence $P = 1 - \alpha$ is: $n = z_{\alpha/2}^2\,p^*(1-p^*) / B^2$, where: $B$ is the desired Error Bound; $z_{\alpha/2}$ is the $\alpha/2$ critical value for the standard normal distribution; $p^*$ is some preliminary estimate of $p$. If no estimate for $p$ is available, use $p^* = 0.50$. One can easily check that the maximum sample size required occurs when $p^* = 0.50$.
The maximum sample size $n$ occurs when $p^* = 0.50$, because the product $p^*(1-p^*)$ attains its largest value, 1/4, there.
Example: Suppose that I want to conduct a survey to estimate $p$ = proportion of voters who favour a downtown location for a casino. I know that the approximate value of $p$ is $p^* = 0.50$. This is also a good choice for $p^*$ if one has no preliminary estimate of its value. I want the survey to estimate $p$ with an error bound $B = 0.01$ (1 percentage point). I want the level of confidence to be 95% (i.e. $\alpha = 0.05$ and $z_{\alpha/2} = z_{0.025} = 1.960$). Then $n = z_{\alpha/2}^2\,p^*(1-p^*) / B^2 = (1.960)^2 (0.50)(0.50) / (0.01)^2 = 9604$.
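The survey calculation, and the claim that $p^* = 0.50$ is the worst case, can be checked numerically. The sketch below uses only the Python standard library; `sample_size_for_proportion` is an illustrative name, and the grid of trial values for $p^*$ is an arbitrary choice.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(error_bound, p_guess=0.5, confidence=0.95):
    """Smallest n satisfying z_{alpha/2} * sqrt(p(1-p)/n) <= error_bound."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return math.ceil(z ** 2 * p_guess * (1.0 - p_guess) / error_bound ** 2)


# Casino survey example: B = 0.01, p* = 0.50, 95% confidence.
n_survey = sample_size_for_proportion(error_bound=0.01, p_guess=0.5)
print(n_survey)  # 9604

# Required n over a grid of preliminary estimates p* = 0.1, 0.2, ..., 0.9.
sizes = {k / 10: sample_size_for_proportion(0.01, k / 10) for k in range(1, 10)}
worst_p = max(sizes, key=sizes.get)
print(worst_p)  # 0.5
```

Using $p^* = 0.50$ when nothing is known about $p$ is therefore conservative: the resulting $n$ is large enough whatever the true value of $p$ turns out to be.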
A general method for constructing confidence limits
Definition: A statistic $t$ is called a pivotal statistic (for determining confidence limits for the parameter $\phi$) if: (1) the distribution of $t$ is completely known (not dependent on any unknown parameters); (2) the only unknown parameter that the statistic $t$ depends on is the parameter $\phi$ (the parameter being estimated); (3) the statistic $t$ depends on the data $x_1, \ldots, x_n$ through the sufficient statistics $S_1, \ldots, S_q$.
Examples of pivotal statistics. Estimating $\mu$, the mean of a Normal population: $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ is a pivotal statistic if $\sigma$ is known. It has a known distribution, $N(0,1)$. It only depends on the unknown parameter $\mu$. It depends on the data through the sufficient statistic $\bar{x}$.
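Estimating $p$, the Bernoulli probability: $z = \dfrac{\hat{p} - p}{\sqrt{\hat{p}(1-\hat{p})/n}}$ is a pivotal statistic. It has a known distribution, approximately $N(0,1)$ for large $n$. It only depends on the unknown parameter $p$. It depends on the data through the sufficient statistic $\hat{p}$.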
To construct confidence limits using a pivotal statistic: (1) Construct a probability statement regarding the pivotal statistic $t$. This is possible because the distribution of $t$ is completely known. (2) Translate this statement into a confidence statement about the parameter $\phi$ (the parameter being estimated).
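Estimating $\mu$, the mean of a Normal population ($\sigma^2$ known). Pivotal statistic: $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$. Starting with $P\left[-z_{\alpha/2} \le \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right] = 1 - \alpha$, after some manipulation we get $P\left[\bar{x} - z_{\alpha/2}\,\sigma/\sqrt{n} \le \mu \le \bar{x} + z_{\alpha/2}\,\sigma/\sqrt{n}\right] = 1 - \alpha$.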
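Estimating $p$, a Bernoulli probability. Pivotal statistic: $z = \dfrac{\hat{p} - p}{\sqrt{\hat{p}(1-\hat{p})/n}}$. Starting with $P\left[-z_{\alpha/2} \le \dfrac{\hat{p} - p}{\sqrt{\hat{p}(1-\hat{p})/n}} \le z_{\alpha/2}\right] = 1 - \alpha$ (approximately, for large $n$), after some manipulation we get $P\left[\hat{p} - z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}\right] = 1 - \alpha$.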
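Both z-based confidence statements translate directly into code. The following is a minimal sketch using only the Python standard library. The function names are illustrative, and the inputs (a sample mean of 220 from 246 samples with $\sigma = 40$, and 4802 "yes" answers out of 9604) are hypothetical values chosen to match the sample sizes found in the earlier examples, not observed data.

```python
import math
from statistics import NormalDist


def z_critical(confidence):
    """Upper alpha/2 critical value of N(0,1) for the given confidence level."""
    return NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)


def mean_interval_known_sigma(xbar, sigma, n, confidence=0.95):
    """Limits xbar -/+ z_{alpha/2} * sigma / sqrt(n) for a Normal mean."""
    half_width = z_critical(confidence) * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width


def proportion_interval(successes, n, confidence=0.95):
    """Limits p_hat -/+ z_{alpha/2} * sqrt(p_hat(1 - p_hat)/n), large n assumed."""
    p_hat = successes / n
    half_width = z_critical(confidence) * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Hypothetical inputs (not data from the lecture).
lo_mu, hi_mu = mean_interval_known_sigma(xbar=220.0, sigma=40.0, n=246)
lo_p, hi_p = proportion_interval(successes=4802, n=9604)

print(round(lo_mu, 2), round(hi_mu, 2))
print(round(lo_p, 4), round(hi_p, 4))
```

With these inputs the half-widths come out just under 5 and just under 0.01, which is what the sample-size formulas were designed to guarantee.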
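Estimating $\mu$, the mean of a Normal population ($\sigma^2$ unknown): the t distribution. Let $x_1, \ldots, x_n$ denote a sample from the normal distribution with mean $\mu$ and variance $\sigma^2$. Both $\mu$ and $\sigma^2$ are unknown. Recall that $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ has a standard normal distribution. Also, $U = \dfrac{(n-1)s^2}{\sigma^2}$ has a $\chi^2$-distribution with $n - 1$ degrees of freedom, and $z$ and $U$ are independent.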
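Recall also that if $z \sim N(0,1)$ and $U \sim \chi^2$ with $\nu$ degrees of freedom are independent, then $t = \dfrac{z}{\sqrt{U/\nu}}$ has a t-distribution with $\nu$ degrees of freedom. Thus, since $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ and $U = \dfrac{(n-1)s^2}{\sigma^2}$ are independent, $t = \dfrac{z}{\sqrt{U/(n-1)}} = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$ has a t-distribution with $n - 1$ degrees of freedom.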
Thus we use $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$ as the pivotal statistic. It satisfies the conditions of a pivotal statistic: (1) it has a known distribution, the t-distribution with $n - 1$ df; (2) it only depends on the unknown parameter $\mu$; (3) it depends on the data through the sufficient statistics $\bar{x}$ and $s$.
Critical Values for the t-distribution with $\nu$ df. Definition: The $\alpha$-upper critical value for the t-distribution with $\nu$ df is the quantity $t_\alpha$ such that $P[t \ge t_\alpha] = \alpha$, where $t$ has the t-distribution with $\nu$ df.
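Thus we use $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$ as the pivotal statistic to set up confidence limits for $\mu$. Starting with $P\left[-t_{\alpha/2} \le \dfrac{\bar{x} - \mu}{s/\sqrt{n}} \le t_{\alpha/2}\right] = 1 - \alpha$, solving the inequalities for $\mu$ gives $P\left[\bar{x} - t_{\alpha/2}\,s/\sqrt{n} \le \mu \le \bar{x} + t_{\alpha/2}\,s/\sqrt{n}\right] = 1 - \alpha$.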
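Hence $\bar{x} \pm t_{\alpha/2}\,s/\sqrt{n}$ are $(1-\alpha)100\%$ confidence limits for $\mu$, where $t_{\alpha/2}$ is the critical value of the t-distribution with $n - 1$ df.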
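The t-based limits can be sketched in a few lines. The Python standard library has no t-distribution quantile function, so this sketch takes the critical value as an argument; 2.571 is the tabulated value of $t_{0.025}$ for 5 df. The six observations are made-up numbers for illustration only, not the diet data of the example that follows, and `t_interval` is an illustrative name.

```python
import math
from statistics import mean, stdev


def t_interval(data, t_crit):
    """Limits xbar -/+ t_crit * s / sqrt(n); t_crit must be t_{alpha/2} with n-1 df."""
    n = len(data)
    xbar = mean(data)
    s = stdev(data)  # sample standard deviation (divisor n - 1)
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical observations (NOT the lecture's data): n = 6, so n - 1 = 5 df.
hypothetical_losses = [2.0, 4.5, 1.0, 3.5, 5.0, 2.0]
T_025_5DF = 2.571  # tabulated upper 0.025 critical value, t-distribution, 5 df

lower, upper = t_interval(hypothetical_losses, T_025_5DF)
print(round(lower, 3), round(upper, 3))
```

Passing the critical value in explicitly keeps the sketch dependency-free; a statistics library with a t quantile function could supply it instead.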
Example Let x1, x2, x3 , x4, x5, x6 denote weight loss from a new diet for n = 6 cases. The Data: The summary statistics:
95% Confidence Intervals (use $\alpha = 0.05$): with $n - 1 = 5$ df, $t_{\alpha/2} = t_{0.025} = 2.571$. 95% Confidence Limits: $\bar{x} \pm 2.571\,s/\sqrt{6}$.
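Confidence Limits for $\sigma^2$, the variance of a Normal population. Let $x_1, \ldots, x_n$ denote a sample from the normal distribution with mean $\mu$ and variance $\sigma^2$. Both $\mu$ and $\sigma^2$ are unknown. Recall that $U = \dfrac{(n-1)s^2}{\sigma^2}$ has a $\chi^2$-distribution with $n - 1$ df. The statistic $U$ satisfies the conditions for a pivotal statistic for estimating $\sigma^2$.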
$U$ (1) has a known distribution, the $\chi^2$-distribution with $n - 1$ df; (2) only depends on the unknown parameter $\sigma^2$; (3) depends on the data through the sufficient statistic $s^2$.
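Critical Values for the $\chi^2$-distribution with $\nu$ df. Definition: The $\alpha$-upper critical value for the $\chi^2$-distribution with $\nu$ df is the quantity $\chi^2_\alpha$ such that $P[U \ge \chi^2_\alpha] = \alpha$, where $U$ has the $\chi^2$-distribution with $\nu$ df.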
Note: $P[U \le \chi^2_{1-\alpha/2}] = \alpha/2$ and $P[U \ge \chi^2_{\alpha/2}] = \alpha/2$.
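Confidence limits for $\sigma^2$ and $\sigma$. $P\left[\chi^2_{1-\alpha/2} \le U \le \chi^2_{\alpha/2}\right] = 1 - \alpha$, thus $P\left[\chi^2_{1-\alpha/2} \le \dfrac{(n-1)s^2}{\sigma^2} \le \chi^2_{\alpha/2}\right] = 1 - \alpha$,
hence $P\left[\dfrac{1}{\chi^2_{\alpha/2}} \le \dfrac{\sigma^2}{(n-1)s^2} \le \dfrac{1}{\chi^2_{1-\alpha/2}}\right] = 1 - \alpha$ and $P\left[\dfrac{(n-1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \dfrac{(n-1)s^2}{\chi^2_{1-\alpha/2}}\right] = 1 - \alpha$,
hence $\dfrac{(n-1)s^2}{\chi^2_{\alpha/2}}$ to $\dfrac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$ is a $(1-\alpha)100\%$ confidence interval for $\sigma^2$, and $\sqrt{\dfrac{(n-1)s^2}{\chi^2_{\alpha/2}}}$ to $\sqrt{\dfrac{(n-1)s^2}{\chi^2_{1-\alpha/2}}}$ is a $(1-\alpha)100\%$ confidence interval for $\sigma$.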
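The interval for $\sigma^2$ divides $(n-1)s^2$ by the upper critical value to get the lower limit and by the lower critical value to get the upper limit; the limits for $\sigma$ are the square roots. The sketch below takes the two $\chi^2$ critical values as arguments because the Python standard library has no $\chi^2$ quantile function; 12.833 and 0.8312 are the tabulated values of $\chi^2_{0.025}$ and $\chi^2_{0.975}$ for 5 df. The observations are made-up numbers for illustration, not the diet data, and `variance_interval` is an illustrative name.

```python
import math
from statistics import variance


def variance_interval(data, chi2_upper, chi2_lower):
    """Limits (n-1)s^2/chi2_upper to (n-1)s^2/chi2_lower for sigma^2.

    chi2_upper is chi^2_{alpha/2} and chi2_lower is chi^2_{1-alpha/2},
    both for n - 1 degrees of freedom.
    """
    n = len(data)
    sum_of_squares = (n - 1) * variance(data)  # (n - 1) * s^2
    return sum_of_squares / chi2_upper, sum_of_squares / chi2_lower


# Hypothetical observations (NOT the lecture's data): n = 6, so 5 df.
hypothetical_losses = [2.0, 4.5, 1.0, 3.5, 5.0, 2.0]
CHI2_025_5DF = 12.833  # tabulated upper 0.025 critical value, 5 df
CHI2_975_5DF = 0.8312  # tabulated upper 0.975 critical value, 5 df

var_lo, var_hi = variance_interval(hypothetical_losses, CHI2_025_5DF, CHI2_975_5DF)
sd_lo, sd_hi = math.sqrt(var_lo), math.sqrt(var_hi)

print(round(var_lo, 3), round(var_hi, 3))
print(round(sd_lo, 3), round(sd_hi, 3))
```

Unlike the intervals for $\mu$, this interval is not symmetric about the point estimate $s^2$, because the $\chi^2$-distribution is skewed.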
Example Let x1, x2, x3 , x4, x5, x6 denote weight loss from a new diet for n = 6 cases. The Data: The summary statistics:
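$(1-\alpha)100\%$ confidence interval for $\sigma^2$. Using $\alpha = 0.05$ and $n - 1 = 5$ df, $\chi^2_{0.025} = 12.833$ and $\chi^2_{0.975} = 0.8312$. Thus the 95% confidence limits for $\sigma^2$ are: $\dfrac{5s^2}{12.833}$ to $\dfrac{5s^2}{0.8312}$.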
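$(1-\alpha)100\%$ confidence interval for $\sigma$. Using $\alpha = 0.05$ and $n - 1 = 5$ df, $\chi^2_{0.025} = 12.833$ and $\chi^2_{0.975} = 0.8312$. Thus the 95% confidence limits for $\sigma$ are: $\sqrt{\dfrac{5s^2}{12.833}}$ to $\sqrt{\dfrac{5s^2}{0.8312}}$.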