Estimates and Sample Sizes

Slides:



Advertisements
Similar presentations
Confidence Intervals This chapter presents the beginning of inferential statistics. We introduce methods for estimating values of these important population.
Advertisements

Slide Slide 1 Copyright © 2007 Pearson Education, Inc Publishing as Pearson Addison-Wesley. Section 7-3 Estimating a Population Mean:  Known Created by.
Estimates and sample sizes Chapter 6 Prof. Felix Apfaltrer Office:N763 Phone: Office hours: Tue, Thu 10am-11:30.
Lecture Slides Elementary Statistics Twelfth Edition
7-2 Estimating a Population Proportion
7-3 Estimating a Population Mean
Copyright © 2010, 2007, 2004 Pearson Education, Inc. Lecture Slides Elementary Statistics Eleventh Edition and the Triola Statistics Series by.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. Lecture Slides Elementary Statistics Eleventh Edition and the Triola Statistics Series by.
Confidence Interval A confidence interval (or interval estimate) is a range (or an interval) of values used to estimate the true value of a population.
Slide Slide 1 Copyright © 2007 Pearson Education, Inc Publishing as Pearson Addison-Wesley. Lecture Slides Elementary Statistics Tenth Edition and the.
Slide 1 Copyright © 2004 Pearson Education, Inc..
Copyright © 2010, 2007, 2004 Pearson Education, Inc Lecture Slides Elementary Statistics Eleventh Edition and the Triola Statistics Series by.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved Section 8-5 Testing a Claim About a Mean:  Not Known.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Chapter 7 Estimates and Sample Sizes 7-1 Review and Preview 7-2 Estimating.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
Sections 6-1 and 6-2 Overview Estimating a Population Proportion.
1 Chapter 6. Section 6-1 and 6-2. Triola, Elementary Statistics, Eighth Edition. Copyright Addison Wesley Longman M ARIO F. T RIOLA E IGHTH E DITION.
Lecture Slides Elementary Statistics Eleventh Edition
Population All members of a set which have a given characteristic. Population Data Data associated with a certain population. Population Parameter A measure.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Section 7-3 Estimating a Population Mean:  Known.
Estimates and Sample Sizes Lecture – 7.4
Chapter 7 Estimates and Sample Sizes
Estimating a Population Proportion
Chapter 7 Estimates and Sample Sizes
1 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Estimating a Population Mean: σ Known 7-3, pg 355.
Statistics for Managers Using Microsoft Excel, 5e © 2008 Pearson Prentice-Hall, Inc.Chap 8-1 Statistics for Managers Using Microsoft® Excel 5th Edition.
Estimating a Population Proportion
Copyright © 2010, 2007, 2004 Pearson Education, Inc. 7-1 Review and Preview 7-2 Estimating a Population Proportion 7-3 Estimating a Population.
1 Chapter 6 Estimates and Sample Sizes 6-1 Estimating a Population Mean: Large Samples / σ Known 6-2 Estimating a Population Mean: Small Samples / σ Unknown.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
Sections 7-1 and 7-2 Review and Preview and Estimating a Population Proportion.
Slide Slide 1 Copyright © 2007 Pearson Education, Inc Publishing as Pearson Addison-Wesley. Lecture Slides 11 th Edition Chapter 7.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Section 7-1 Review and Preview.
Copyright © 2010, 2007, 2004 Pearson Education, Inc Section 8-2 Basics of Hypothesis Testing.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Section 7-5 Estimating a Population Variance.
Estimating a Population Mean
Estimating a Population Mean:  Known
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Section 7-4 Estimating a Population Mean:  Not Known.
1 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Example: In a recent poll, 70% of 1501 randomly selected adults said they believed.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Lecture Slides Elementary Statistics Eleventh Edition and the Triola.
Chapters 6 & 7 Overview Created by Erin Hodgess, Houston, Texas.
Use of Chebyshev’s Theorem to Determine Confidence Intervals
Chapter 7 Estimates and Sample Sizes 7-1 Overview 7-2 Estimating a Population Proportion 7-3 Estimating a Population Mean: σ Known 7-4 Estimating a Population.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Lecture Slides Elementary Statistics Eleventh Edition and the Triola.
1 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Section 7.4: Estimation of a population mean   is not known  This section.
1 Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Example: In a recent poll, 70% of 1501 randomly selected adults said they believed.
SECTION 7.2 Estimating a Population Proportion. Where Have We Been?  In Chapters 2 and 3 we used “descriptive statistics”.  We summarized data using.
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved. Lecture Slides Elementary Statistics Eleventh Edition and the Triola.
Confidence Intervals. Point Estimate u A specific numerical value estimate of a parameter. u The best point estimate for the population mean is the sample.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
Lecture Slides Elementary Statistics Twelfth Edition
Estimating a Population Mean:  Not Known
Chapter Eight Estimation.
Copyright © 2004 Pearson Education, Inc.
Chapter 9: Inferences Involving One Population
Elementary Statistics
Elementary Statistics
Elementary Statistics
Lecture Slides Elementary Statistics Eleventh Edition
Elementary Statistics
Lecture Slides Elementary Statistics Tenth Edition
Chapter 7 Estimates and Sample Sizes
Lecture Slides Elementary Statistics Eleventh Edition
Estimating a Population Mean:  Known
Estimating a Population Mean:  Not Known
Estimates and Sample Sizes Lecture – 7.4
Lecture Slides Elementary Statistics Twelfth Edition
Estimating a Population Variance
Estimates and Sample Sizes Civil and Environmental Engineering Dept.
Presentation transcript:

Estimates and Sample Sizes Chapter 7

X신문에서 한미 FTA에 대해 조사하였다. 성인 829명의 여론조사를 통해 51%가 한미 FTA가 불필요한 것이라고 말했다면 이 결과가 얼마나 정확한 것일까? (즉 정확하게 우리나라 국민의 의견을 대변한 것일까?) Sample size 829명이 의미 있는 결과를 말해주기에 충분한 숫자인가?

Global Warming → the retreat of glaciers, the reduction in the Arctic region, a rise in sea levels. Is there a solid evidence that the average temrperature on earth has been increasing over the past few decades? 70% of 1501 randomly selected adults said “yes.” How can the poll results be used to estimate the percentage of all adults who believe that the earth is getting warmer? How accurate is the result of 70% likely to be? Number of Trials(1501번이 의미있는 size인가)? Does the method of selecting the people to be polled have much of an effect on the results?

Overview 2. Estimating a population proportion a population mean: σ Known a population mean: σ Not Known 3. Estimating a population variance

1. Overview

Overview This chapter presents the beginning of inferential statistics. The two major applications of inferential statistics involve the use of sample data to (1) estimate the value of a population parameter, (2) test hypothesis about a population.

Overview This chapter introduces methods for estimating values of the important population parameters: proportions, means, and variances. Also presents methods for determining sample sizes necessary to estimate those parameters.

Keywords Point Estimate (점추정) Confidence Interval (신뢰구간) Critical Value (임계치) Margin of Error (표본오차) Sample Size (표본크기)

2. Estimating a Population Proportion (여론조사 시 이용)

목적 앞의 신문의 여론조사 결과를 분석하기 위해 829명의 자료를 추정(point estimation)한 후, Confidence Interval(신뢰구간)을 통해서 추정한 결과를 얼마나 신뢰할 수 있는지를 알아본다.

Assumptions 1. The sample is a simple random sample. 2. The conditions for the binomial distribution are satisfied (See Section 5-3.) 3. The normal distribution can be used to approximate the distribution of sample proportions because np≥5 and nq≥5 are both satisfied.

Notation p = population proportion (모집단 비율) =sample proportion of x successes in a sample of size n (표본집단 비율) =sample proportion of failures in a sample of size n

Point Estimate A point estimate is a single value (or point) used to approximate a population parameter.  The best estimate of p is . Because consists of a single value, it is called a point estimate. (e.g.) The point estimate of p 여론조사: 0.51 Global Warming: 0.7

The proportion 0.51 was our best point estimate of the population p, but we have no indication of just how good our best estimate was. If we had a sample size of 20, and 12 are opposed to FTA, a point estimate of p is 12/20 = 0.6, but we would not expect this point estimate to be very good because it is based on such a small sample. Thus, instead of having just a single value, statisticians developed the range of values, called a confidence interval.

Confidence Interval (CI) A confidence interval (or interval estimate) is a range (or an interval) of values used to estimate the true value of a population parameter.

Confidence Interval A confidence interval is associated with a confidence level, such as 0.95 (or 95%). The confidence level = _________________ : for a 0.95 (or 95%) confidence level,  = 0.05. The most common choices of confidence level: 90%, 95%, and 99%. The choice of 95% is most common because it provides a good balance b/w precision and reliability.

Confidence Interval In the example, 51% of the sample data of 829 are opposed to FTA The 0.95 (or 95%) confidence interval estimate of the population p is 0.476 < p < 0.544.

Interpreting Confidence Interval “We are 95% confident that the interval from 0.476 to 0.544 does contain the true value of p.” This means if we were to select many different samples of size 829 and construct the corresponding confidence intervals, 95% of them would actually contain the value of the population proportion p. “There is a 95% chance that the true value of p will fall between 0.476 and 0.544.

Interpreting Confidence Interval

Confidence Interval

Critical Values A critical value is the number on the borderline separating sample statistics that are likely to occur from those that are unlikely to occur. A critical value: a number z/2 = Z score separates an area of /2 in the right tail of the standard normal distribution.

Critical Values Critical value distinguish between sample statistics that are likely occur and those that are unlikely. -Z/2 The z scores are referred to as ________________

Example(p.332) : Finding a Critical Value Find the critical value zα/2 corresponding to a 95% confidence level. A 95% confidence level corresponds to =0.05. Each of the red-shaded tails is  /2=0.025. The area to its left must be 1-0.025=0.975, results in zα/2 = 1.96. Therefore, for a 95% confidence level, the critical value is zα/2 = 1.96.

 = 5%  2 = 2.5% = .025

Use Table A-2 to find a z score of 1.96  = 0.05 Use Table A-2 to find a z score of 1.96 z2 = 1.96  

Margin of Error (표본오차) For the survey data (51% of 829 respondents opposed to FTA), we can calculate the sample proportion and that sample proportion is typically different from the population proportion p. The difference between the sample proportion and the population proportion can be thought of as an error.

Margin of Error for Proportion (여론조사)

Margin of Error Confidence Interval There is a probability of 1 -  that a sample proportion will be in error by no more than E. There is a probability of  that the sample proportion will be in error by more than E. Confidence Interval

Procedure for Constructing a Confidence Interval for p 1. Verify that the required assumptions are satisfied. (The sample is a simple random sample, the conditions for the binomial distribution are satisfied, and the normal distribution can be used to approximate the distribution of sample proportions because np≥5, and nq≥5 are both satisfied). 2. Refer to Table A-2 and find the critical value zα/2that corresponds to the desired confidence level. 3. Evaluate the margin of error

Procedure for Constructing a Confidence Interval for p 4. Using the calculated margin of error, E, and the value of the sample proportion, , find the values of – E and + E. Substitute those values in the general format for the confidence interval:

Example 3 p.334 In the Chapter Problem we noted that a Pew Research Center poll of 1501 randomly selected adults showed that 70% of the respondents believe in global warming. The sample results are n = 1501, and a Find the margin of error E that corresponds to a 95% confidence level. b. Find the 95% confidence interval estimate of the population proportion p. c. Based on the results, can we safely conclude that the majority of adults believe in global warming? d. Assuming that you are a newspaper reporter, write a brief statement that accurately describes the results and includes all of the relevant information. See my note.

check for assumptions: __________________ and __________________ a) Find the margin of error E that corresponds to a 95% confidence level. check for assumptions: __________________ and __________________ Find the Margin of Error: n =1501, p = 0.7 q = 0.3 See my note.

b) Find the 95% confidence interval estimate of the population proportion p. See my note. “The success rate is 70% with a margin of error of 2.3% point.”

c) Based on the results, can we safely conclude that the majority of adults believe in global warming? It appears that the proportion of adults who believe in global warming is greater than 0.5 (or 50%), so we can safely conclude that the majority of adults believe in global warming. Because the limits of 0.677 and 0.723 are likely to contain the true population proportion, it appears that the population proportion is a value greater than 0.5.

d) Assuming that you are a newspaper reporter, write a brief statement that accurately describes the results and includes all of the relevant information. Here is one statement that summarizes the results: 70% of adults believe that the earth is getting warmer. That percentage is based on a Pew Research Center poll of 1501 randomly selected adults. In theory, in 95% of such polls, the percentage should differ by no more than 2.3 percentage points in either direction from the percentage that would be found by interviewing all adults.

Determining Sample Size How do we know how many sample items must be obtained?

Determining Sample Size When an estimate of p is known: ˆ ˆ When no estimate of p is known:

Example 4 p.336 The Internet is affecting us all in many different ways, so there are many reasons for estimating the proportion of adults who use it. Assume that a manager for E-Bay wants to determine the current percentage of adults who now use the Internet. How many adults must be surveyed in order to be 95% confident that the sample percentage is in error by no more than three percentage points? a. In 2006, 73% of adults used the Internet. b. No known possible value of the proportion.

a) Use To be 95% confident that our sample percentage is within three percentage points of the true percentage for all adults, we should obtain a simple random sample of 842 adults.

b) Use To be 95% confident that our sample percentage is within three percentage points of the true percentage for all adults, we should obtain a simple random sample of 1068 adults.

Class Talk Suppose a sociologist wants to determine the current percentage of households using e-mail. How many households must be surveyed in order to be 95% confident that the sample percentage is in error by no more than four percentage points? Use this result from an earlier study: In 1997, 16.9% of U.S. households used e-mail. Assume that we have no prior information suggesting a possible value of p.

2. Estimating (ii) a Population Mean :  Known

Assumptions The sample is a simple random sample. The value of the population standard deviation σ is known. Either or both of these conditions is satisfied: The population is normally distributed or n> 30.

The sample proportion is the best point estimate of the population p. For similar reasons, the sample mean is the best point estimate of the population mean .

Confidence Interval Estimate of the Population Mean  with known 

Margin of Error The difference between the sample mean and the population mean  is called the margin of error. E = the margin of error for mean based on known .

Example 1 (p.347): Weight of Men People have died in boat and aircraft accidents because an obsolete estimate of the mean weight of men was used. In recent decades, the mean weight of men has increased considerably, so we need to update our estimate of that mean so that boats, aircraft, elevators, and other such devices do not become dangerously overloaded. Using the weights of men from Data Set 1 in Appendix B, we obtain these sample statistics for the simple random sample: n = 40 and = 172.55 lb. Research from several other sources suggests that the population of weights of men has a standard deviation given by  = 26 lb.

Find the best point estimate of the mean weight of the population of all men. The sample mean of 172.55 lb is the best point estimate of the mean weight of the population of all men. b. Construct a 95% confidence interval estimate of the mean weight of all men. c. What do the results suggest about the mean weight of 166.3 lb that was used to determine the safe passenger capacity of water vessels in 1960 (as given in the National Transportation and Safety Board safety recommendation M-04-04)?

Construct the confidence interval. b Construct a 95% confidence interval estimate of the mean weight of all men. A 95% confidence interval or 0.95 implies  = 0.05, so z/2 = 1.96. Calculate the margin of error. Construct the confidence interval.

c. What do the results suggest about the mean weight of 166 c What do the results suggest about the mean weight of 166.3 lb that was used to determine the safe passenger capacity of water vessels in 1960 (as given in the National Transportation and Safety Board safety recommendation M-04-04)? Based on the confidence interval, it is possible that the mean weight of 166.3 lb used in 1960 could be the mean weight of men today. However, the best point estimate of 172.55 lb suggests that the mean weight of men is now considerably greater than 166.3 lb. Considering that an underestimate of the mean weight of men could result in lives lost through overloaded boats and aircraft, these results strongly suggest that additional data should be collected. (Additional data have been collected, and the assumed mean weight of men has been increased.)

2. Estimating (iii) a Population Mean :  “Not” Known

Assumptions The sample is a simple random sample. Either the sample is from a normally distributed population, or n> 30. If  is not known, but the assumptions are satisfied, then use Student t distribution

Because we do not know the value of , we estimate it with the value of the sample standard deviation s, but this introduces another source of unreliability, especially with small samples. In order to keep a confidence interval at some desired level, such as 95%, we compensate for this additional unreliability by making the confidence interval wider. Have to use critical values larger than the critical value z/2  ________

Student t distribution If the distribution of a population is essentially normal, then the distribution of is essentially a Student t distribution for all samples of size n, and is used to find critical values denoted by tα/2.

To find the critical value tα/2 in Table A-3, we have to find the appropriate number of degree of freedom in the left column. Degree of freedom corresponds to the number of sample values that can vary after certain restrictions have been imposed on all data values.

Degrees of Freedom (df) If 10 students have quiz scores with a mean of 80, we can freely assign values to the first 9 scores, but the 10the score is then determined. The sum of the 10 scores must be 800, so the 10th score = 800 – sum (the first 9 scores). Because those first 9 scores can be freely selected to be any values, we say that there are 9 degrees of freedom available. The 10th score must be fixed. Hence, df = 10 – 1, where 1 is the 10th score. See my note for the example p.333. table 찾는 방법.

Margin of Error E for the Estimate  with  Not Known

Confidence Interval for the Estimate  with  Not Known

Example 2 p.358 A common claim is that garlic lowers cholesterol levels. In a test of the effectiveness of garlic, 49 subjects were treated with doses of raw garlic, and their cholesterol levels were measured before and after the treatment. The changes in their levels of LDL cholesterol (in mg/dL) have a mean of 0.4 and a standard deviation of 21.0. Use the sample statistics of n = 49, = 0.4 and s = 21.0 to construct a 95% confidence interval estimate of the mean net change in LDL cholesterol after the garlic treatment. What does the confidence interval suggest about the effectiveness of garlic in reducing LDL cholesterol?

Requirements are satisfied: simple random sample and n = 49 (i. e Requirements are satisfied: simple random sample and n = 49 (i.e., n > 30). 95% implies a = 0.05. With n = 49, the df = 49 – 1 = 48 Closest df is 50, two tails, so t/2 = 2.009 Using t/2 = 2.009, s = 21.0 and n = 49 the margin of error is:

Construct the confidence interval: We are 95% confident that the limits of –5.6 and 6.4 actually do contain the value of , the mean of the changes in LDL cholesterol for the population. Because the confidence interval limits contain the value of 0, it is very possible that the mean of the changes in LDL cholesterol is equal to 0, suggesting that the garlic treatment did not affect the LDL cholesterol levels. It does not appear that the garlic treatment is effective in lowering LDL cholesterol.

Class Talk A study found the body temperatures of 106 healthy adults. The sample mean was 98.2 degrees and the sample standard deviation was 0.62 degrees. (a) Find the margin of error E (b) Find the 95% confidence interval for µ.

Important Properties of Student t distribution 1. The t distribution is different for different sample sizes

Important Properties of Student t distribution 2. The Student t distribution has the same general symmetric bell shape as the normal distribution but it reflects the greater variability (with wider distributions) that is expected with small samples. 3. The Student t distribution has a mean of t = 0 (just as the standard normal distribution has a mean of z= 0).

Important Properties of Student t distribution The standard deviation of the Student t distribution varies with the sample size and is greater than 1 (unlike the standard normal distribution, which has a σ = 1). 5. As the sample size n gets larger, the Student t distribution gets closer to the normal distribution.

Using the Normal and t Distribution

3. Estimating a Population Variance

Assumptions The sample is a simple random sample. 2. The population must have normally distributed values (even if the sample is large).

Chi-Square Distribution

Properties of the Distribution of the Chi-square Statistic The chi-square distribution is not symmetric, unlike the normal and Student t distributions. All values are nonnegative.

2. As the number of degrees of freedom increases, the distribution becomes more symmetric and approaches a normal distribution. Chi-Square Distribution for df = 10 and df = 20

Properties of the Distribution of the Chi-square Statistic 3 In Table A-4, each critical value of 2 corresponds to an area given in the top row of the table, and that area represents the cumulative area located to the right of the critical value.

Example p.372 Find the critical values of that determine critical regions containing an area of 0.025 in each tail. Assume that the relevant sample size is 10 so that the number of degrees of freedom is 10 –1, or 9. α= 0.05 α/2= 0.025 1 − α/2= 0.975

For a sample of 10 values taken from a normally distributed population, the chi-square statistic 2 = (n – 1)s2/2 has a 0.95 probability of falling between the chi-square critical values of 2.700 and 19.023.

Estimators of 2 The sample variance s2 is the best point estimate of the population variance 2.

Confidence Interval for 2 Although s2 is the best point estimate, there is no indication how good it is.

Example p.372 The proper operation of typical home appliances requires voltage levels that do not vary much. Listed below are ten voltage levels (in volts) recorded in the author’s home on ten different days. These ten values have a standard deviation of s = 0.15 volt. Use the sample data to construct a 95% confidence interval estimate of the standard deviation of all voltage levels. 123.3 123.5 123.7 123.4 123.6 123.5 123.5 123.4 123.6 123.8

n = 10 so df = 10 – 1 = 9 Use table A-4 to find: Construct the confidence interval: n = 10, s = 0.15

Evaluation the preceding expression yields: Finding the square root of each part (before rounding), then rounding to two decimal places, yields this 95% confidence interval estimate of the population standard deviation: Based on this result, we have 95% confidence that the limits of 0.10 volt and 0.27 volt contain the true value of .

Class Talk A study found the body temperatures of 106 healthy adults. The sample mean was 98.2 degrees and the sample standard deviation was 0.62 degrees. Find the 95% confidence interval for σ. n = 106 = 98.2 s = 0.62  = 0.05 /2 = 0.025 1 – /2 = 0.975

Determining Sample Size Instead of presenting the much more complex procedures for determining sample size necessary to estimate the population variance, this table will be used.

Example 3 p.377 We want to estimate , the standard deviation of all voltage levels in a home. We want to be 95% confident that our estimate is within 20% of the true value of . How large should the sample be? Assume that the population is normally distributed.

Find the margin of error for the 95% confidence interval used to estimate the population proportion with n = 163 and x = 96. 0.0680 0.0755 0.132 0.00291

459 randomly selected light bulbs were tested in a laboratory, 291 lasted more than 500 hours. Find a point estimate of the true proportion of all light bulbs that last more than 500 hours. 0.632 0.366 0.388 0.634

Use the confidence level and sample data to find the margin of error E. College students’ annual earnings: 99% confidence, n = 74 x = $3967, σ = $874 $262 $9 $237 $1187

You want to be 95% confident that the sample variance is within 40% of the population variance. Find the appropriate sample size. 224 11 14 56

To find the standard deviation of the diameter of wooden dowels, the manufacturer measures 19 randomly selected dowels and finds the standard deviation of the sample to be s = 0.16. Find the 95% confidence interval for the population standard deviation σ. 0.13 < σ < 0.22 0.12 < σ < 0.24 0.11 < σ < 0.25 0.15 < σ < 0.21