Published by Lambert Little. Modified over 8 years ago.
1
William Christensen, Ph.D.
2
Using Confidence Intervals to Estimate Population Parameters
Do you understand what a “population parameter” is? We use the word “parameter” as a general term for any characteristic of a population, such as the mean (average), proportion, standard deviation, or variance. In the “real world” we usually do not know the true population parameters (such as the mean) because it is too expensive and time-consuming to collect data on every member of a population. Therefore, we most often use sample data to estimate population parameters such as the mean and standard deviation.
3
Using Confidence Intervals to Estimate Population Parameters
The most common method for using sample data to estimate a population parameter is to create a confidence interval. Basically, a confidence interval allows us to say something like this: “We are 90% confident that the true population mean is between 24.5 and 27.8.” Here you can see the two main parts of a confidence interval:
1. A level of confidence, such as 90%, 95%, or 99%. By the way, there is no 100% confidence level in statistics.
2. A range of values. Using the confidence interval methods you are about to learn, we will establish a range of values that we think the true population parameter falls between.
4
Using Confidence Intervals to Estimate Population Parameters
Remember: the whole reason for calculating confidence intervals is that we usually have only sample data, which is a small subset of the population we are interested in. Since there are probably some differences between our sample data (which we have) and the true population data (which we don’t have), we need a way to estimate the true population parameters. Confidence intervals allow us to use our sample data to estimate the true population parameters, such as the mean and standard deviation.
5
Using Confidence Intervals to Estimate Population Parameters
In this section you will learn to create confidence intervals to estimate the following population parameters:
1. Confidence intervals for a population mean when our sample size is large (more than 30). You will also learn how to calculate the sample size that would be necessary to estimate a population mean with a given level of accuracy.
2. Confidence intervals for a population mean when our sample size is small (less than or equal to 30).
3. Confidence intervals for a population proportion. A proportion is kind of like a mean, but expressed as a probability (between 0 and 1) or a percentage.
4. Confidence intervals for a population variance and/or standard deviation.
6
Point Estimate of Population Parameter
Without a confidence interval, the best estimate (point estimate) of a population parameter is simply whatever we calculate from the sample data. For example, if we have a sample of women’s weights with a mean of 143 lbs., then this is the best “point estimate” we have of the true population mean. Confidence intervals allow us to create a better estimate.
7
Two Parts of a Confidence Interval
Let’s revisit the two main parts of a confidence interval:
1. A level of confidence. The three most common confidence levels are 90%, 95%, and 99%. Associated with any confidence level is a value called alpha (α). α is simply the difference between 1 and the confidence level:
For a confidence level of 90%, α = 0.10
For a confidence level of 95%, α = 0.05
For a confidence level of 99%, α = 0.01
You can view a confidence level as the chance we are right about our confidence interval, and α (alpha) as the chance that we are wrong.
8
Two Parts of a Confidence Interval
2. A range of values. A confidence interval is defined as a range (or an interval) of values used to estimate the true value of the population parameter. The correct form for expressing the range or interval of values is:
Lower # < population parameter < Upper #
Note: the population parameter must always be expressed by using the appropriate symbol:
µ (mu) for population mean
σ (sigma) for population standard deviation
σ² (sigma squared) for population variance
p for population proportion
Example: 24.3 < µ < 27.8
9
Confidence Intervals for Population Means in large samples (n > 30)
10
Estimating Population Means: Calculating the Lower & Upper Limits
Lower # < µ < Upper #
x̄ − E < µ < x̄ + E
We calculate the lower and upper limits of a confidence interval for a population mean by taking the sample mean (x-bar) minus and plus the margin of error (E), where:
E = zα/2 * (σ / sqrt(n))
Before proceeding to use this formula, let’s learn a little more about zα/2, which is called the CRITICAL VALUE.
11
The Critical Value zα/2
[Figure: standard normal curve centered at z = 0, with −zα/2 and zα/2 marked and an area of α/2 in each tail]
The critical value zα/2 is a z-score that separates an area of α/2 in each (left and right) tail of the standard normal distribution.
12
The Critical Value zα/2
[Figure: standard normal curve with the central 95% (0.95) between −zα/2 and zα/2, and α/2 = 2.5% = 0.025 in each tail; α = 5%]
The critical values ±zα/2 set apart the area or probability for our confidence interval. In this case, we are looking for a 95% confidence interval, so α = 0.05 and α/2 = 0.025.
13
The Critical Value zα/2
[Figure: standard normal curve with the central 95% (0.95) between −zα/2 and zα/2, and 0.025 in each tail]
Note that we have two critical values, ±zα/2. These are the same number, only one is positive and the other negative. Therefore, if we find one, we know the other by just changing the sign.
14
The Critical Value zα/2
There are two ways to determine zα/2:
1. The first and easiest way is to use the Excel function we already learned, NORMSINV(probability). Example: to find the zα/2 critical value for a 90% confidence interval:
If the confidence level is 90%, then we know α = 0.10 (the difference between 1 and the confidence level).
Therefore, α/2 = 0.10 / 2 = 0.05.
Using Excel, =NORMSINV(0.05) gives −1.64485 (this is the left-side critical value). The right-side critical value is simply +1.64485 (just change the sign), or you could calculate it directly with =NORMSINV(0.95) = 1.64485.
2. The second method is the traditional one, which involves looking up zα/2 in a “normal distribution” table. Since tables are not always available, I suggest you stick with the Excel method.
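For readers working outside Excel, the critical values above can probably be reproduced with Python's standard library. This is only a sketch: `z_critical` is a hypothetical helper name that does not come from the slides, and `statistics.NormalDist().inv_cdf` is assumed to play the role of NORMSINV.

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """Return the positive critical value z_(alpha/2) for a level such as 0.90."""
    alpha = 1 - confidence_level
    # inv_cdf(alpha / 2) is the left-tail (negative) value, like =NORMSINV(alpha/2);
    # the right-tail critical value is the same number with the sign flipped.
    return -NormalDist().inv_cdf(alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.5f}")
```

The function returns only the positive value, which is the one needed in the margin-of-error formula.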
15
Estimating Population Means: Calculating the Lower & Upper Limits
Lower # < µ < Upper #
x̄ − E < µ < x̄ + E
E = zα/2 * (σ / sqrt(n))
Now that we understand zα/2 and how to use Excel to find its value, we should be able to construct a confidence interval for a population mean. Important note: in this formula, we only need the positive value of zα/2, NOT the negative value. We take the negative value into account later, when we subtract E from the sample mean to calculate the Lower # for the confidence interval.
16
Estimating Population Means: Calculating the Lower & Upper Limits
EXAMPLE: Given a sample of 50 women in which we find an average or mean weight of 143 lbs., with a standard deviation of 29 lbs., construct a 95% confidence interval for the population mean.
Confidence interval for a population mean: (sample mean − E) < µ < (sample mean + E), where E = zα/2 * (σ / sqrt(n))
To solve this problem we must first calculate E (margin of error). Our sample data already tell us that s = 29 lbs. (used in place of σ) and n = 50 women, so the only thing missing is zα/2. With a confidence level of 95%, we know α = 0.05, so α/2 = 0.025. Using the Excel function NORMSINV, we calculate zα/2 as follows:
=NORMSINV(0.025) = −1.96 (we ONLY use the positive value, 1.96, in the formula for calculating E)
17
Estimating Population Means: Calculating the Lower & Upper Limits
EXAMPLE (continued): sample of 50 women, mean weight 143 lbs., standard deviation 29 lbs., 95% confidence.
Now that we have all the pieces, we can solve for E and then construct the confidence interval.
E = zα/2 * (σ / sqrt(n)) = 1.96 * (29 / sqrt(50)) = 1.96 * 4.1012 = 8.04 lbs.
Finally, knowing E, we can construct our confidence interval as follows:
(sample mean − E) < µ < (sample mean + E)
(143 − 8.04) < µ < (143 + 8.04)
134.96 < µ < 151.04
We did it. This is the correct form for a confidence interval. We can read this as follows: we are 95% confident that the true population mean of women’s weights is between 134.96 lbs. and 151.04 lbs.
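The arithmetic in this example can be checked with a short Python sketch. The function name `mean_ci_large_sample` is hypothetical, and the code assumes (as the example does) that the sample standard deviation can stand in for σ when n > 30.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci_large_sample(sample_mean, std_dev, n, confidence_level):
    """Return (lower, upper, margin) for a large-sample confidence interval for the mean."""
    alpha = 1 - confidence_level
    # Positive critical value z_(alpha/2), the counterpart of |NORMSINV(alpha/2)|.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * std_dev / sqrt(n)  # E = z * (sigma / sqrt(n))
    return sample_mean - margin, sample_mean + margin, margin

# Women's weights example: n = 50, mean = 143 lbs., s = 29 lbs., 95% confidence.
lower, upper, margin = mean_ci_large_sample(143, 29, 50, 0.95)
print(f"E = {margin:.2f} lbs.; {lower:.2f} < mu < {upper:.2f}")
```

Rounding happens only when printing, in line with the advice not to round intermediate calculations.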
18
Estimating Population Means: Calculating the Lower & Upper Limits
PRACTICE, PRACTICE, PRACTICE: You must know how to do and interpret all kinds of confidence interval problems. Here are some sample problems for confidence intervals that estimate population means from large samples (sample size greater than 30). Practice constructing 90%, 95%, and 99% confidence intervals for the population means.
1. A sample of 54 bears in Yellowstone National Park has a mean weight of 182.9 lbs., with a standard deviation of 121.8 lbs.
2. A study of hospital costs among 40 automobile accident victims who were wearing seat belts showed an average hospital cost of $9000 with a standard deviation of $5600.
3. Use other data from the data sets provided for this course to create and solve your own problems.
19
Calculating E When σ Is Unknown
You may have noticed in the sample problems that the formula for E includes σ, which is actually the population standard deviation, NOT the sample standard deviation. However, it is OK to use the sample standard deviation (s) in place of σ when the population standard deviation is not known (and it usually is not known).
20
SUMMARY: Procedure for Constructing a Confidence Interval for µ (Based on a Large Sample: n > 30)
1. Find the critical value zα/2 using Excel NORMSINV.
2. Calculate E, the margin of error: E = zα/2 * (σ / sqrt(n)). If σ (population standard deviation) is unknown, use s (sample standard deviation).
3. Find the lower and upper limits of the confidence interval (x̄ − E and x̄ + E). Use the correct form: Lower # < µ < Upper #
4. Round the final answer (do not round intermediate calculations) to one more decimal place than is used in the original sample data.
21
Confidence Intervals for Population Means in small samples (n ≤ 30)
22
Estimating Population Means for small samples
Constructing confidence intervals for population means when we have small samples (n ≤ 30) is very similar to what we just learned about large samples (n > 30). We still calculate the confidence interval as:
(sample mean − E) < µ < (sample mean + E)
The main difference is that we now have a slightly different formula for E (margin of error). For small samples (defined as less than or equal to 30):
E = tα/2 * (s / sqrt(n))
where tα/2 has n − 1 degrees of freedom (degrees of freedom is discussed a little later). Use σ (population std. dev.) if available. Otherwise, use s (sample std. dev.).
23
The Critical Value tα/2
tα/2 is similar to zα/2, but rather than coming from the standard normal distribution, it comes from a distribution called the “Student t distribution”. It should make sense to you that when we have a smaller sample from which to estimate the population mean, our estimate cannot be as precise as when we have a large sample. The tα/2 value adjusts our E (margin of error) to account for our smaller sample size.
24
The Critical Value tα/2
There are two ways to determine tα/2:
1. The first and easiest way is to use a new Excel function, =TINV(probability,deg_freedom), which we will discuss in more detail in just a minute.
2. The second method is the traditional one, which involves looking up tα/2 in a “Student t distribution” table. Since tables are not always available, I suggest you stick with the Excel method.
25
Using TINV to find the Critical Value tα/2
To use Excel’s TINV function to find the critical value tα/2 we must input two things:
1. The area or probability, which is represented by α (alpha), NOT α/2. This means we simply put in the value of alpha.
2. Degrees of freedom. This is how we adjust for our small sample. Degrees of freedom is the sample size (n) minus 1: df = n − 1.
Example: for a small sample of 25, df = 25 − 1 = 24
Example: for a sample of 11, df = 11 − 1 = 10
26
Using TINV to find the Critical Value tα/2
Here is an example where we have a 95% confidence interval (α = 0.05) and a sample size of 20:
Probability: α = 0.05 (NOT α/2), because TINV assumes a two-tailed test.
Degrees of freedom: df = n − 1 = 20 − 1 = 19
27
Estimating Population Means for small samples
EXAMPLE: Given a sample of 15 women (a small sample) in which we find an average or mean weight of 143 lbs., with a standard deviation of 29 lbs., construct a 95% confidence interval for the population mean.
Confidence interval for a population mean: (sample mean − E) < µ < (sample mean + E), where E = tα/2 * (s / sqrt(n))
To solve this problem we must first calculate E (margin of error). Our sample data already tell us that s = 29 lbs. and n = 15 women, so the only thing missing is tα/2. With a confidence level of 95%, we know α = 0.05. UNLIKE when calculating zα/2, we DO NOT need to divide α by 2 when using TINV, because the TINV function automatically assumes there are two tails, with alpha split evenly between the left and right sides. Thus, using the Excel function TINV, we calculate tα/2 as follows:
=TINV(0.05,14) = 2.1448
(Note that df = n − 1 = 15 − 1 = 14. Also note that TINV always returns a positive value, which we can input directly into our margin of error formula.)
28
Estimating Population Means for small samples
EXAMPLE (continued): sample of 15 women, mean weight 143 lbs., standard deviation 29 lbs., 95% confidence.
Now that we have all the pieces, we can solve for E and then construct the confidence interval.
E = tα/2 * (s / sqrt(n)) = 2.1448 * (29 / sqrt(15)) = 2.1448 * 7.4877 = 16.06 lbs.
Finally, knowing E, we can construct our confidence interval as follows:
(sample mean − E) < µ < (sample mean + E)
(143 − 16.06) < µ < (143 + 16.06)
126.94 < µ < 159.06
We did it. This is the correct form for a confidence interval. We can read this as follows: given our sample of 15 women, we are 95% confident that the true population mean of women’s weights is between 126.94 lbs. and 159.06 lbs. You might notice that this interval is wider than the one we found when our sample size was 50 (E = 16.06 lbs. versus 8.04 lbs.). That is indeed how it works: smaller samples yield less precise estimates of population means.
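The same calculation can be sketched in Python. Because the standard library has no Student t inverse, this sketch simply takes the critical value 2.1448 reported by =TINV(0.05,14) above as an input; `mean_ci_small_sample` and the constant name are hypothetical.

```python
from math import sqrt

def mean_ci_small_sample(sample_mean, s, n, t_critical):
    """Return (lower, upper, margin) using a t critical value supplied by the caller."""
    margin = t_critical * s / sqrt(n)  # E = t * (s / sqrt(n))
    return sample_mean - margin, sample_mean + margin, margin

# Critical value taken from the example: =TINV(0.05, 14) = 2.1448 (df = 15 - 1 = 14).
T_CRIT_95_DF14 = 2.1448

lower, upper, margin = mean_ci_small_sample(143, 29, 15, T_CRIT_95_DF14)
print(f"E = {margin:.2f} lbs.; {lower:.2f} < mu < {upper:.2f}")
```

Passing the critical value in keeps the sketch free of third-party dependencies; a statistics package could supply it instead of Excel.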
29
Estimating Population Means for small samples
PRACTICE, PRACTICE, PRACTICE: You must know how to do and interpret all kinds of confidence interval problems. Here are some sample problems for confidence intervals that estimate population means from small samples (n ≤ 30). Practice constructing 90%, 95%, and 99% confidence intervals for the population means.
1. A sample of 24 bears in Yellowstone National Park has a mean weight of 182.9 lbs., with a standard deviation of 121.8 lbs.
2. A study of hospital costs among 20 automobile accident victims who were wearing seat belts showed an average hospital cost of $9000 with a standard deviation of $5600.
3. Use other data from the data sets provided for this course to create and solve your own problems.
30
Determining the Sample Size required for a given margin of error (E)
31
Determining Sample Size given E
Sometimes we want to determine in advance how much error (i.e., E, the margin of error) we are willing to have in our estimate of the population mean. In fact, we can obtain whatever margin of error we would like, IF we are willing and able to adjust our sample size. The relationship between sample size and margin of error is illustrated on the next slide.
32
Sample Size for Estimating Mean µ
Using our basic formula for calculating E (margin of error), we can also find n, given E:
E = zα/2 * (σ / sqrt(n))
(solve for n by algebra)
n = (zα/2 * σ / E)²
Where:
zα/2 is based on the desired level of confidence
E = desired margin of error
Use σ if available, otherwise use s (sample std. dev.)
33
Round-Off Rule for Sample Size n
When finding the sample size n, if the calculated n is not a whole number, always increase the value of n to the next larger whole number. For example, n = 116.009 becomes 117 (round up).
34
Determining Sample Size given E
Example: If we want to estimate the true population mean IQ for statistics students, how many statistics students would we need to test (i.e., what sample size is needed) so that our estimate is within 2 IQ points of the true population mean with a confidence level of 95%? From previous studies, we believe a conservative estimate of the population standard deviation (σ) is 15.
α = 0.05
zα/2 = absolute value of NORMSINV(0.025) = 1.96
E (desired margin of error) = 2
σ = 15
n = (zα/2 * σ / E)² = ((1.96)(15) / 2)² = 216.09, which rounds up to 217 students
We would need to randomly select 217 statistics students and obtain their IQ scores. We would then be 95% confident that our sample mean would be within 2 IQ points of the true mean IQ score for the entire population of statistics students.
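As a quick check on this example, here is a minimal Python sketch of the sample-size formula together with the round-up rule. The helper name `sample_size_for_mean` is hypothetical; `math.ceil` is used to implement "always round up".

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence_level):
    """Smallest whole n with margin of error <= `margin`: n = (z * sigma / E)^2, rounded up."""
    alpha = 1 - confidence_level
    z = NormalDist().inv_cdf(1 - alpha / 2)  # positive z_(alpha/2)
    return ceil((z * sigma / margin) ** 2)

# IQ example: sigma = 15, E = 2 IQ points, 95% confidence.
n = sample_size_for_mean(15, 2, 0.95)
print(n)
```

Using the unrounded critical value instead of 1.96 changes the intermediate result slightly but still leads to the same answer of 217 after rounding up.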
35
Determining Sample Size given E
In determining sample size (given a desired margin of error) we have assumed that some value or estimate of the population standard deviation (σ) is available. However, many times we have no estimate of σ. In such cases, we have three alternatives:
1. Use the range rule of thumb to estimate a standard deviation as follows: est. standard deviation ≈ range / 4
2. Conduct a pilot study by starting the sampling process. Based on the first collection of at least 31 randomly selected sample values, calculate the sample standard deviation s and use it in place of σ. That value can be refined as more sample data are obtained.
3. Estimate the value of σ by using the results of some other study that was done earlier.
36
Confidence Intervals for Population Proportion
37
Estimating Population Proportion
Often we are interested in being able to estimate a population “proportion”. A proportion is kind of like an average, but is expressed as a probability (p) or percentage. For example, we might want to estimate what proportion (%) of households in the U.S. are watching the Olympics on television.
38
Estimating Population Proportion
We use the following notation to express the confidence interval or estimate for a population proportion:
Lower # < p < Upper #
“p” represents the true population proportion. We use the symbol p̂ (p-hat) to represent the sample proportion. Another symbol, q̂ (q-hat), is defined as 1 − p̂.
39
Confidence Interval for Population Proportion
p̂ − E < p < p̂ + E, where E = zα/2 * sqrt(p̂ q̂ / n)
zα/2 is again found using the Excel function NORMSINV(probability), where “probability” is α/2, and the absolute value of the result is used in the above formula. Round the confidence interval limits to three significant digits.
40
Example: The CBS television show 60 Minutes has a share of 20, meaning that among the TV sets in use, 20% are typically tuned to 60 Minutes (based on Nielsen Media Research data). Assume a sample size of 4,000 (typical for Nielsen surveys). Construct a 97% confidence interval estimate of the population proportion (the proportion of all TV sets in use in the U.S. tuned to 60 Minutes).
p̂ − E < p < p̂ + E, where E = zα/2 * sqrt(p̂ q̂ / n)
1. α = 0.03 (97% confidence level) and α/2 = 0.03/2 = 0.015. Thus, zα/2 is found using Excel: =NORMSINV(0.015) = −2.17 (we take the absolute value, which is 2.17).
2. p-hat is given as 20% or 0.20. q-hat is simply 1 − p-hat = 1 − 0.20 = 0.80 (remember: p-hat + q-hat = 1 always). Thus, E = 2.17 * sqrt((0.20 * 0.80) / 4000) = 2.17 * 0.0063245 = 0.0137241
(Note: often we are dealing with very small numbers. Do not round any intermediate calculations; wait until we have the confidence interval limits, then round to 3 significant digits.)
41
Example (continued): 60 Minutes share of 20 (p-hat = 0.20), sample size 4,000, 97% confidence.
3. Finally, our confidence interval is:
p-hat − E < p < p-hat + E
0.20 − 0.0137241 < p < 0.20 + 0.0137241
0.186 < p < 0.214 (rounded to 3 significant digits)
With this large sample size of 4,000, the margin of error is quite small, and we can be 97% confident that the population proportion of all TV sets in use in the U.S. tuned to 60 Minutes is between 0.186 (18.6%) and 0.214 (21.4%).
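The 60 Minutes example can be re-run with a few lines of Python. This is a sketch under the same normal-approximation formula used above; `proportion_ci` is a hypothetical name, and `NormalDist().inv_cdf` stands in for NORMSINV.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence_level):
    """Return (lower, upper, margin) for a confidence interval for a proportion."""
    alpha = 1 - confidence_level
    z = NormalDist().inv_cdf(1 - alpha / 2)  # positive z_(alpha/2)
    q_hat = 1 - p_hat
    margin = z * sqrt(p_hat * q_hat / n)     # E = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin, margin

# 60 Minutes example: p_hat = 0.20, n = 4000, 97% confidence.
lower, upper, margin = proportion_ci(0.20, 4000, 0.97)
print(f"{lower:.3f} < p < {upper:.3f}")
```

Because the code keeps the unrounded critical value, its margin differs from the slide's 0.0137241 in the sixth decimal place, but the rounded limits agree.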
42
Determining Sample Size when estimating p, given desired E
E = zα/2 * sqrt(p̂ q̂ / n)
(solving for n by algebra)
n = (zα/2)² * p̂ q̂ / E²
Note: If p-hat (the sample proportion) is unknown, use p-hat = 0.50.
43
Example: We want to determine, with a margin of error of four percentage points (4%), the current percentage of U.S. households using e-mail. Assuming that we want 90% confidence in our results, how many households must we survey? A recent study indicates 28.9% of U.S. households used e-mail.
Use the absolute value of the Excel function NORMSINV(probability), with probability = α/2 = 0.10/2 = 0.05.
n = [zα/2]² p̂ q̂ / E² = [1.645]² (0.289)(0.711) / 0.04² = 347.5195, which rounds up to 348 households
To be 90% confident that our sample percentage is within four percentage points of the true percentage for all households, we should randomly select and survey 348 households.
44
Example: We want to determine, with a margin of error of four percentage points (4%), the current percentage of U.S. households using e-mail. Assuming that we want 95% confidence in our results, how many households must we survey? We have no idea what percent of U.S. households may be using e-mail (i.e., p-hat is unknown, therefore we should use p-hat = 0.50).
Use the absolute value of the Excel function NORMSINV(probability), with probability = α/2 = 0.05/2 = 0.025.
n = [zα/2]² p̂ q̂ / E² = [1.96]² (0.5)(0.5) / 0.04² = 600.25, which rounds up to 601 households
To be 95% confident that our sample percentage is within four percentage points of the true percentage for all households, we should randomly select and survey 601 households.
Note: There is an important relationship between margin of error and sample size. To reduce the margin of error by half, the sample size must be increased four times. In other words, a little less error requires a much bigger sample. You should remember this.
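Both e-mail examples, and the closing note about halving the margin of error, can be explored with one small function. This is a sketch only: `sample_size_for_proportion` is a hypothetical name, and the default `p_hat=0.5` mirrors the rule for an unknown sample proportion.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(margin, confidence_level, p_hat=0.5):
    """n = z^2 * p_hat * q_hat / E^2, rounded up to the next whole number."""
    alpha = 1 - confidence_level
    z = NormalDist().inv_cdf(1 - alpha / 2)  # positive z_(alpha/2)
    return ceil(z ** 2 * p_hat * (1 - p_hat) / margin ** 2)

n_known = sample_size_for_proportion(0.04, 0.90, p_hat=0.289)  # prior estimate available
n_unknown = sample_size_for_proportion(0.04, 0.95)             # p_hat unknown, use 0.50
n_half_margin = sample_size_for_proportion(0.02, 0.95)         # same setup, half the margin
print(n_known, n_unknown, n_half_margin)
```

Comparing the last two results shows the required sample growing by roughly a factor of four when the margin of error is cut in half.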
45
Confidence Intervals for Population Variance and Standard Deviation
46
Estimating Population Variance and Standard Deviation
The last population parameters that we will learn how to estimate are variance and standard deviation. Variance is simply standard deviation squared; thus, standard deviation is simply the square root of variance. The bad news is that we need a new type of distribution and critical value in order to estimate population variance and standard deviation. This new distribution is called the chi-squared (χ²) distribution (see next slide for a density curve graph of the chi-squared distribution).
47
Properties of the Distribution of the Chi-Square Statistic
The chi-square distribution is not symmetric, unlike the normal and Student t distributions. As the number of degrees of freedom increases, the distribution becomes more symmetric. The values of chi-squared can be zero or positive, but they cannot ever be negative.
[Figure: general chi-square distribution (all values nonnegative, not symmetric), and chi-square distributions for df = 10 and df = 20 plotted against χ² values from 0 to 45]
48
Confidence Interval for the Population Std. Deviation and Variance
Variance: (n − 1)s² / χ²R < σ² < (n − 1)s² / χ²L
Standard deviation: sqrt((n − 1)s² / χ²R) < σ < sqrt((n − 1)s² / χ²L)
where χ²R is the right-tail critical value (CV) and χ²L is the left-tail critical value.
Note: our chi-square distribution has one critical value for the left tail and a completely separate critical value for the right tail.
49
Chi-Square Critical Values
Important: The area or probability associated with a chi-square value is always the area (all the area) to the right of that chi-square value.
[Figure: chi-square distribution for df = 9, with 0.025 in the left tail below χ²L = 2.700 (area to the right = 0.975) and 0.025 in the right tail above χ²R = 19.023 (area to the right = 0.025)]
For α = 0.05, there is α/2 = 0.025 in each tail. With 0.025 in the left tail, there is 0.975 to the right of χ²L. With 0.025 in the right tail, there is exactly that much (0.025) to the right of χ²R. These are the two probabilities (0.975 and 0.025) for which we must find chi-square values.
50
Chi-Square Critical Values
[Figure: chi-square distribution for df = 9, with χ²L = 2.700 (area to the right = 0.975) and χ²R = 19.023 (area to the right = 0.025)]
These are the two probabilities for which we must find chi-square values. We can use the Excel function CHIINV to find these chi-square values.
51
Using Excel CHIINV function to find chi-square values
In this example with alpha = 0.05 and df = 9 (n = 10), the two values that we plug in as probabilities in CHIINV are 0.975 and 0.025. We solve these one at a time, since we cannot input both of them at once. Degrees of freedom for this example was given as 9, which means n = 10, since df = n − 1. We use the same df each time.
52
Using Excel CHIINV function to find chi-square values
Try actually using CHIINV to find the chi-square left and chi-square right values. The chi-square right value is always the larger value, and the chi-square left value is always the smaller value.
53
Roundoff Rule for Confidence Interval Estimates of σ or σ²
1. When using the original set of data to construct a confidence interval, round the confidence interval limits to one more decimal place than is used for the original set of data.
2. When the original set of data is unknown and only the summary statistics (n, s) are used, round the confidence interval limits to the same number of decimal places used for the sample standard deviation or variance.
54
Confidence Interval for the Population Standard Deviation
Example: SAT Math scores were collected from 15 randomly selected women. The mean score is 496, with a standard deviation of 108. Construct a 99% confidence interval for the population standard deviation of SAT Math scores for all women.
sqrt((n − 1)s² / χ²R) < σ < sqrt((n − 1)s² / χ²L)
n = 15
df = n − 1 = 15 − 1 = 14
α = 0.01, so α/2 = 0.005, with area-to-right of the left tail = 0.995 and area-to-right of the right tail = 0.005. These are the two probabilities for which we need to find chi-square values.
s = 108 (given in the problem description above)
55
Confidence Interval for the Population Standard Deviation
Example (continued): SAT Math scores of 15 women, s = 108, 99% confidence.
sqrt((n − 1)s² / χ²R) < σ < sqrt((n − 1)s² / χ²L)
1. Use Excel’s CHIINV to find the chi-square right and chi-square left values.
56
Confidence Interval for the Population Standard Deviation
Example (continued): SAT Math scores of 15 women, s = 108, 99% confidence.
sqrt((n − 1)s² / χ²R) < σ < sqrt((n − 1)s² / χ²L)
2. Plug in the values and construct the confidence interval:
sqrt((15 − 1) * 108² / 31.3194) < σ < sqrt((15 − 1) * 108² / 4.07466)
Note: DO NOT square the “chi-square” values.
57
Example (continued): SAT Math scores of 15 women, s = 108, 99% confidence.
2. Plug in the values and construct the confidence interval:
sqrt((15 − 1) * 108² / 31.3194) < σ < sqrt((15 − 1) * 108² / 4.07466)
sqrt(163296 / 31.3194) < σ < sqrt(163296 / 4.07466)
sqrt(5213.8929) < σ < sqrt(40075.981)
72.2 < σ < 200.2
We are 99% confident that the population standard deviation of women’s SAT Math scores is between 72.2 and 200.2.
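The final arithmetic can be verified with a short Python sketch. The standard library has no chi-square inverse, so the two critical values from the example (4.07466 and 31.3194, as found with CHIINV) are passed in as inputs; `std_dev_ci` is a hypothetical name.

```python
from math import sqrt

def std_dev_ci(s, n, chi2_left, chi2_right):
    """Return (lower, upper) for sigma, given the left and right chi-square critical values."""
    numerator = (n - 1) * s ** 2  # (n - 1) * s^2
    # The larger (right-tail) critical value gives the lower limit, and vice versa.
    # The chi-square values themselves are not squared.
    return sqrt(numerator / chi2_right), sqrt(numerator / chi2_left)

# SAT Math example: s = 108, n = 15, 99% confidence (df = 14).
lower, upper = std_dev_ci(108, 15, chi2_left=4.07466, chi2_right=31.3194)
print(f"{lower:.1f} < sigma < {upper:.1f}")
```

Dropping the two `sqrt` calls would give the corresponding interval for the population variance σ².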