Estimates and Sample Sizes

Estimates and Sample Sizes
Chapter 7

X신문에서 한미 FTA에 대해 조사하였다. 성인 829명의 여론조사를 통해 51%가 한미 FTA가 불필요한 것이라고 말했다면 이 결과가 얼마나 정확한 것일까? (즉 정확하게 우리나라 국민의 의견을 대변한 것일까?) Sample size 829명이 의미 있는 결과를 말해주기에 충분한 숫자인가?

Global Warming → the retreat of glaciers, the reduction in the Arctic region, a rise in sea levels.
Is there a solid evidence that the average temrperature on earth has been increasing over the past few decades? 70% of 1501 randomly selected adults said “yes.” How can the poll results be used to estimate the percentage of all adults who believe that the earth is getting warmer? How accurate is the result of 70% likely to be? Number of Trials(1501번이 의미있는 size인가)? Does the method of selecting the people to be polled have much of an effect on the results?

Overview 2. Estimating a population proportion a population mean: σ Known a population mean: σ Not Known 3. Estimating a population variance

1. Overview

Overview This chapter presents the beginning of inferential statistics. The two major applications of inferential statistics involve the use of sample data to (1) estimate the value of a population parameter, (2) test hypothesis about a population.

Overview This chapter introduces methods for estimating values of the important population parameters: proportions, means, and variances. Also presents methods for determining sample sizes necessary to estimate those parameters.

Keywords Point Estimate (점추정) Confidence Interval (신뢰구간)
Critical Value (임계치) Margin of Error (표본오차) Sample Size (표본크기)

2. Estimating a Population Proportion (여론조사 시 이용)

목적 앞의 신문의 여론조사 결과를 분석하기 위해 829명의 자료를 추정(point estimation)한 후, Confidence Interval(신뢰구간)을 통해서 추정한 결과를 얼마나 신뢰할 수 있는지를 알아본다.

Assumptions 1. The sample is a simple random sample.
2. The conditions for the binomial distribution are satisfied (See Section 5-3.) 3. The normal distribution can be used to approximate the distribution of sample proportions because np≥5 and nq≥5 are both satisfied.

Notation p = population proportion (모집단 비율)
=sample proportion of x successes in a sample of size n (표본집단 비율) =sample proportion of failures in a sample of size n

Point Estimate A point estimate is a single value (or point) used to approximate a population parameter.  The best estimate of p is Because consists of a single value, it is called a point estimate. (e.g.) The point estimate of p 여론조사: 0.51 Global Warming: 0.7

The proportion 0.51 was our best point estimate of the population p, but we have no indication of just how good our best estimate was. If we had a sample size of 20, and 12 are opposed to FTA, a point estimate of p is 12/20 = 0.6, but we would not expect this point estimate to be very good because it is based on such a small sample. Thus, instead of having just a single value, statisticians developed the range of values, called a confidence interval.

Confidence Interval (CI)
A confidence interval (or interval estimate) is a range (or an interval) of values used to estimate the true value of a population parameter.

Confidence Interval A confidence interval is associated with a confidence level, such as 0.95 (or 95%). The confidence level = _________________ : for a 0.95 (or 95%) confidence level,  = 0.05. The most common choices of confidence level: 90%, 95%, and 99%. The choice of 95% is most common because it provides a good balance b/w precision and reliability.

Confidence Interval In the example, 51% of the sample data of 829 are opposed to FTA The 0.95 (or 95%) confidence interval estimate of the population p is < p <

Interpreting Confidence Interval
“We are 95% confident that the interval from to does contain the true value of p.” This means if we were to select many different samples of size 829 and construct the corresponding confidence intervals, 95% of them would actually contain the value of the population proportion p. “There is a 95% chance that the true value of p will fall between and

Interpreting Confidence Interval

Confidence Interval

Critical Values A critical value is the number on the borderline separating sample statistics that are likely to occur from those that are unlikely to occur. A critical value: a number z/2 = Z score separates an area of /2 in the right tail of the standard normal distribution.

Critical Values Critical value distinguish between sample statistics that are likely occur and those that are unlikely. -Z/2 The z scores are referred to as ________________

Example(p.332) : Finding a Critical Value
Find the critical value zα/2 corresponding to a 95% confidence level. A 95% confidence level corresponds to =0.05. Each of the red-shaded tails is  /2=0.025. The area to its left must be =0.975, results in zα/2 = 1.96. Therefore, for a 95% confidence level, the critical value is zα/2 = 1.96.

 = 5%  2 = 2.5% = .025

Use Table A-2 to find a z score of 1.96
 = 0.05 Use Table A-2 to find a z score of 1.96 z2 =  

Margin of Error (표본오차) For the survey data (51% of 829 respondents opposed to FTA), we can calculate the sample proportion and that sample proportion is typically different from the population proportion p. The difference between the sample proportion and the population proportion can be thought of as an error.

Margin of Error for Proportion (여론조사)

Margin of Error Confidence Interval
There is a probability of 1 -  that a sample proportion will be in error by no more than E. There is a probability of  that the sample proportion will be in error by more than E. Confidence Interval

Procedure for Constructing a Confidence Interval for p
1. Verify that the required assumptions are satisfied. (The sample is a simple random sample, the conditions for the binomial distribution are satisfied, and the normal distribution can be used to approximate the distribution of sample proportions because np≥5, and nq≥5 are both satisfied). 2. Refer to Table A-2 and find the critical value zα/2that corresponds to the desired confidence level. 3. Evaluate the margin of error

Procedure for Constructing a Confidence Interval for p
4. Using the calculated margin of error, E, and the value of the sample proportion, , find the values of – E and + E. Substitute those values in the general format for the confidence interval:

Example 3 p.334 In the Chapter Problem we noted that a Pew Research Center poll of 1501 randomly selected adults showed that 70% of the respondents believe in global warming. The sample results are n = 1501, and a Find the margin of error E that corresponds to a 95% confidence level. b. Find the 95% confidence interval estimate of the population proportion p. c. Based on the results, can we safely conclude that the majority of adults believe in global warming? d. Assuming that you are a newspaper reporter, write a brief statement that accurately describes the results and includes all of the relevant information. See my note.

check for assumptions: __________________ and __________________
a) Find the margin of error E that corresponds to a 95% confidence level. check for assumptions: __________________ and __________________ Find the Margin of Error: n =1501, p = 0.7 q = 0.3 See my note.

b) Find the 95% confidence interval estimate of the
population proportion p. See my note. “The success rate is 70% with a margin of error of 2.3% point.”

c) Based on the results, can we safely conclude that the majority of adults believe in global warming? It appears that the proportion of adults who believe in global warming is greater than 0.5 (or 50%), so we can safely conclude that the majority of adults believe in global warming. Because the limits of and are likely to contain the true population proportion, it appears that the population proportion is a value greater than 0.5.

d) Assuming that you are a newspaper reporter, write a brief statement that accurately describes the results and includes all of the relevant information. Here is one statement that summarizes the results: 70% of adults believe that the earth is getting warmer. That percentage is based on a Pew Research Center poll of 1501 randomly selected adults. In theory, in 95% of such polls, the percentage should differ by no more than 2.3 percentage points in either direction from the percentage that would be found by interviewing all adults.

Determining Sample Size
How do we know how many sample items must be obtained?

When an estimate of p is known: ˆ ˆ When no estimate of p is known:

Example 4 p.336 The Internet is affecting us all in many different ways, so there are many reasons for estimating the proportion of adults who use it. Assume that a manager for E-Bay wants to determine the current percentage of adults who now use the Internet. How many adults must be surveyed in order to be 95% confident that the sample percentage is in error by no more than three percentage points? a. In 2006, 73% of adults used the Internet. b. No known possible value of the proportion.

a) Use To be 95% confident that our sample percentage is within three percentage points of the true percentage for all adults, we should obtain a simple random sample of 842 adults.

b) Use To be 95% confident that our sample percentage is within three percentage points of the true percentage for all adults, we should obtain a simple random sample of 1068 adults.

Class Talk Suppose a sociologist wants to determine the current percentage of households using . How many households must be surveyed in order to be 95% confident that the sample percentage is in error by no more than four percentage points? Use this result from an earlier study: In 1997, 16.9% of U.S. households used . Assume that we have no prior information suggesting a possible value of p.

2. Estimating (ii) a Population Mean :  Known

Assumptions The sample is a simple random sample.
The value of the population standard deviation σ is known. Either or both of these conditions is satisfied: The population is normally distributed or n> 30.

The sample proportion is the best point estimate of the population p.
For similar reasons, the sample mean is the best point estimate of the population mean .

Confidence Interval Estimate of the Population Mean  with known 

Margin of Error The difference between the sample mean and the population mean  is called the margin of error. E = the margin of error for mean based on known .

Example 1 (p.347): Weight of Men
People have died in boat and aircraft accidents because an obsolete estimate of the mean weight of men was used. In recent decades, the mean weight of men has increased considerably, so we need to update our estimate of that mean so that boats, aircraft, elevators, and other such devices do not become dangerously overloaded. Using the weights of men from Data Set 1 in Appendix B, we obtain these sample statistics for the simple random sample: n = 40 and = lb. Research from several other sources suggests that the population of weights of men has a standard deviation given by  = 26 lb.

Find the best point estimate of the mean weight of the population of all men.
The sample mean of lb is the best point estimate of the mean weight of the population of all men. b. Construct a 95% confidence interval estimate of the mean weight of all men. c. What do the results suggest about the mean weight of lb that was used to determine the safe passenger capacity of water vessels in 1960 (as given in the National Transportation and Safety Board safety recommendation M-04-04)?

Construct the confidence interval.
b Construct a 95% confidence interval estimate of the mean weight of all men. A 95% confidence interval or 0.95 implies  = 0.05, so z/2 = 1.96. Calculate the margin of error. Construct the confidence interval.

c. What do the results suggest about the mean weight of 166
c What do the results suggest about the mean weight of lb that was used to determine the safe passenger capacity of water vessels in 1960 (as given in the National Transportation and Safety Board safety recommendation M-04-04)? Based on the confidence interval, it is possible that the mean weight of lb used in 1960 could be the mean weight of men today. However, the best point estimate of lb suggests that the mean weight of men is now considerably greater than lb. Considering that an underestimate of the mean weight of men could result in lives lost through overloaded boats and aircraft, these results strongly suggest that additional data should be collected. (Additional data have been collected, and the assumed mean weight of men has been increased.)

2. Estimating (iii) a Population Mean :  “Not” Known

Either the sample is from a normally distributed population, or n> 30. If  is not known, but the assumptions are satisfied, then use Student t distribution

Because we do not know the value of , we estimate it with the value of the sample standard deviation s, but this introduces another source of unreliability, especially with small samples. In order to keep a confidence interval at some desired level, such as 95%, we compensate for this additional unreliability by making the confidence interval wider. Have to use critical values larger than the critical value z/2  ________

Student t distribution
If the distribution of a population is essentially normal, then the distribution of is essentially a Student t distribution for all samples of size n, and is used to find critical values denoted by tα/2.

To find the critical value tα/2 in Table A-3, we have to find the appropriate number of degree of freedom in the left column. Degree of freedom corresponds to the number of sample values that can vary after certain restrictions have been imposed on all data values.

Degrees of Freedom (df)
If 10 students have quiz scores with a mean of 80, we can freely assign values to the first 9 scores, but the 10the score is then determined. The sum of the 10 scores must be 800, so the 10th score = 800 – sum (the first 9 scores). Because those first 9 scores can be freely selected to be any values, we say that there are 9 degrees of freedom available. The 10th score must be fixed. Hence, df = 10 – 1, where 1 is the 10th score. See my note for the example p table 찾는 방법.

Margin of Error E for the Estimate  with  Not Known

Confidence Interval for the Estimate  with  Not Known

Example 2 p.358 A common claim is that garlic lowers cholesterol levels. In a test of the effectiveness of garlic, 49 subjects were treated with doses of raw garlic, and their cholesterol levels were measured before and after the treatment. The changes in their levels of LDL cholesterol (in mg/dL) have a mean of 0.4 and a standard deviation of Use the sample statistics of n = 49, = 0.4 and s = 21.0 to construct a 95% confidence interval estimate of the mean net change in LDL cholesterol after the garlic treatment. What does the confidence interval suggest about the effectiveness of garlic in reducing LDL cholesterol?

Requirements are satisfied: simple random sample and n = 49 (i. e
Requirements are satisfied: simple random sample and n = 49 (i.e., n > 30). 95% implies a = With n = 49, the df = 49 – 1 = 48 Closest df is 50, two tails, so t/2 = 2.009 Using t/2 = 2.009, s = 21.0 and n = 49 the margin of error is:

Construct the confidence interval:
We are 95% confident that the limits of –5.6 and 6.4 actually do contain the value of , the mean of the changes in LDL cholesterol for the population. Because the confidence interval limits contain the value of 0, it is very possible that the mean of the changes in LDL cholesterol is equal to 0, suggesting that the garlic treatment did not affect the LDL cholesterol levels. It does not appear that the garlic treatment is effective in lowering LDL cholesterol.

Class Talk A study found the body temperatures of 106 healthy adults. The sample mean was 98.2 degrees and the sample standard deviation was 0.62 degrees. (a) Find the margin of error E (b) Find the 95% confidence interval for µ.

Important Properties of Student t distribution
1. The t distribution is different for different sample sizes

2. The Student t distribution has the same general symmetric bell shape as the normal distribution but it reflects the greater variability (with wider distributions) that is expected with small samples. 3. The Student t distribution has a mean of t = 0 (just as the standard normal distribution has a mean of z= 0).

The standard deviation of the Student t distribution varies with the sample size and is greater than 1 (unlike the standard normal distribution, which has a σ = 1). 5. As the sample size n gets larger, the Student t distribution gets closer to the normal distribution.

Using the Normal and t Distribution

3. Estimating a Population Variance

2. The population must have normally distributed values (even if the sample is large).

Chi-Square Distribution

Properties of the Distribution of the Chi-square Statistic
The chi-square distribution is not symmetric, unlike the normal and Student t distributions. All values are nonnegative.

2. As the number of degrees of freedom increases, the distribution becomes more symmetric and approaches a normal distribution. Chi-Square Distribution for df = 10 and df = 20

Properties of the Distribution of the Chi-square Statistic
3 In Table A-4, each critical value of 2 corresponds to an area given in the top row of the table, and that area represents the cumulative area located to the right of the critical value.

Example p.372 Find the critical values of that determine critical regions containing an area of in each tail. Assume that the relevant sample size is 10 so that the number of degrees of freedom is 10 –1, or 9. α= 0.05 α/2= 0.025 1 − α/2= 0.975

For a sample of 10 values taken from a normally distributed population, the chi-square statistic 2 = (n – 1)s2/2 has a 0.95 probability of falling between the chi-square critical values of and

Estimators of 2 The sample variance s2 is the best point estimate of the population variance 2.

Confidence Interval for 2
Although s2 is the best point estimate, there is no indication how good it is.

Example p.372 The proper operation of typical home appliances requires voltage levels that do not vary much. Listed below are ten voltage levels (in volts) recorded in the author’s home on ten different days. These ten values have a standard deviation of s = 0.15 volt. Use the sample data to construct a 95% confidence interval estimate of the standard deviation of all voltage levels.

n = 10 so df = 10 – 1 = 9 Use table A-4 to find: Construct the confidence interval: n = 10, s = 0.15

Evaluation the preceding expression yields:
Finding the square root of each part (before rounding), then rounding to two decimal places, yields this 95% confidence interval estimate of the population standard deviation: Based on this result, we have 95% confidence that the limits of 0.10 volt and 0.27 volt contain the true value of .

Class Talk A study found the body temperatures of 106 healthy adults. The sample mean was 98.2 degrees and the sample standard deviation was 0.62 degrees. Find the 95% confidence interval for σ. n = 106 = 98.2 s = 0.62  = 0.05 /2 = 0.025 1 – /2 = 0.975

Instead of presenting the much more complex procedures for determining sample size necessary to estimate the population variance, this table will be used.

Example 3 p.377 We want to estimate , the standard deviation of all voltage levels in a home. We want to be 95% confident that our estimate is within 20% of the true value of . How large should the sample be? Assume that the population is normally distributed.

Find the margin of error for the 95% confidence interval used to estimate the population proportion with n = 163 and x = 96. 0.0680 0.0755 0.132

459 randomly selected light bulbs were tested in a laboratory, 291 lasted more than 500 hours. Find a point estimate of the true proportion of all light bulbs that last more than 500 hours. 0.632 0.366 0.388 0.634

Use the confidence level and sample data to find the margin of error E.
College students’ annual earnings: 99% confidence, n = 74 x = $3967, σ = $874 $262 $9 $237 $1187

You want to be 95% confident that the sample variance is within 40% of the population variance. Find the appropriate sample size. 224 11 14 56

To find the standard deviation of the diameter of wooden dowels, the manufacturer measures 19 randomly selected dowels and finds the standard deviation of the sample to be s = Find the 95% confidence interval for the population standard deviation σ. 0.13 < σ < 0.22 0.12 < σ < 0.24 0.11 < σ < 0.25 0.15 < σ < 0.21

Estimates and Sample Sizes

Similar presentations

Presentation on theme: "Estimates and Sample Sizes"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Estimates and Sample Sizes

Similar presentations

Presentation on theme: "Estimates and Sample Sizes"— Presentation transcript:

Similar presentations

About project

Feedback