Chapter 9 Confidence Intervals.

Section 9.1 Point Estimation

Suppose we wanted to estimate the proportion of blue candies in a VERY large bowl. How might we go about estimating this proportion? We could take a sample of candies and compute the proportion of blue candies in our sample. This sample proportion is a statistic: a single value that serves as the estimate. Create a jar with different types of coins . . .

Point Estimate A single number (a statistic) based on sample data that is used to estimate a population characteristic. “Point” refers to the single value on a number line. The estimate is not always equal to the population characteristic, because of sampling variation: different samples may produce different statistics.

Contacts Example A sample of 200 students at a large university is selected to estimate the proportion of students that wear contact lenses. In this sample, 47 wore contact lenses. Let p = the true proportion of all students at this university who wear contact lenses. Consider a “success” to be a student who wears contact lenses. The statistic $\hat{p} = \dfrac{\text{number of successes in the sample}}{n}$ is a reasonable choice for a formula to obtain a point estimate for p. Here $\hat{p} = 47/200 = 0.235$.

Weight Example A sample of weights of 34 male freshman students was obtained. If one wanted to estimate the true mean weight of all male freshman students, one might use the sample mean as a point estimate for the true mean.

Weight Example--Continued
After looking at a histogram and boxplot of the data, you might notice that the data seem reasonably symmetric with an outlier, so you might use either the sample median or a sample trimmed mean as a point estimate. [Figure: histogram and boxplot of the 34 weights, with an axis running from 140 to 260. Calculated using Minitab.]

The paper “U.S. College Students’ Internet Use: Race, Gender and Digital Divides” (Journal of Computer-Mediated Communication, 2009) reports the results of a survey of 7421 students at 40 colleges and universities. (The sample was selected in such a way that it is representative of the population of college students.) The authors want to estimate the proportion (p) of college students who spend more than 3 hours a day on the Internet. 2998 out of 7421 students reported using the Internet more than 3 hours a day, so $\hat{p} = 2998/7421 \approx 0.404$. This is a point estimate for the population proportion of college students who spend more than 3 hours a day on the Internet.

The paper “The Impact of Internet and Television Use on the Reading Habits and Practices of College Students” (Journal of Adolescence and Adult Literacy, 2009) investigates the reading habits of college students. The following observations represent the number of hours spent on academic reading in 1 week by 20 college students. The dotplot suggests this data is approximately symmetric. 1.7 3.8 4.7 9.6 11.7 12.3 12.4 12.6 13.4 14.1 14.2 15.8 15.9 18.7 19.4 21.2 21.9 23.3 28.2

College Reading Continued . . .
For the reading times listed above, candidate point estimates include the sample mean, the sample median, and a trimmed mean (the mean of the middle 16 observations). So which of these point estimates should we use?

Bias A statistic with a mean value equal to the value of the population characteristic being estimated is said to be an unbiased statistic. A statistic that is not unbiased is said to be biased. [Figure: the original distribution, the sampling distribution of an unbiased statistic, and the sampling distribution of a biased statistic.]

Choosing a Statistic for Computing an Estimate
Choose a statistic that is unbiased (accurate). [Figure: three sampling distributions. Two are centered at the true value, so those statistics are unbiased. The third is NOT centered at the true value, so that statistic is biased.]

Choosing a Statistic for Computing an Estimate
Choose a statistic that is unbiased (accurate). Among unbiased statistics, choose the one with the smallest standard deviation. [Figure: two sampling distributions, both centered at the true value. Both statistics are unbiased; the one with the smaller standard deviation is more precise, and the one with the larger standard deviation is not as precise.] If the population distribution is normal, then $\bar{x}$ has a smaller standard deviation than any other unbiased statistic for estimating $\mu$.
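The precision claim can be checked with a small simulation. The sketch below is only an illustration under assumed values (a normal population with mean 50 and standard deviation 10, samples of size 25, all chosen arbitrarily and not taken from the slides). It draws many samples, computes the sample mean and the sample median of each, and compares how much the two statistics vary.

```python
import random
import statistics

def sampling_summary(estimator, n=25, reps=2000, mu=50.0, sigma=10.0, seed=0):
    """Return (center, spread) of an estimator over many simulated normal samples."""
    rng = random.Random(seed)  # fixed seed so both estimators see the same samples
    estimates = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        estimates.append(estimator(sample))
    return statistics.mean(estimates), statistics.stdev(estimates)

mean_center, mean_sd = sampling_summary(statistics.mean)
median_center, median_sd = sampling_summary(statistics.median)

# Both centers should sit near the assumed true mean (unbiased);
# the sample mean should show the smaller spread (more precise).
print(f"sample mean:   center {mean_center:.2f}, sd {mean_sd:.2f}")
print(f"sample median: center {median_center:.2f}, sd {median_sd:.2f}")
```

Both statistics center near the assumed true mean, but the sample mean varies less from sample to sample, which is what "more precise" means here.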

Section 9.2 Large-Sample Confidence Interval for a Population Proportion

Suppose we wanted to estimate the proportion of blue candies in a VERY large bowl.
We could take a sample of candies and compute the proportion of blue candies in our sample. How much confidence do you have in the point estimate? Would you have more confidence if your answer were an interval? Create a jar with different types of coins . . .

Confidence intervals A confidence interval (CI) for a population characteristic is an interval of plausible values for the characteristic. It is constructed so that, with a chosen degree of confidence, the actual value of the characteristic will be between the lower and upper endpoints of the interval. The primary goal of a confidence interval is to estimate an unknown population characteristic.

Rate your confidence 0 – 100%
How confident (%) are you that you can ... Guess my age within 10 years? . . . within 5 years? . . . within 1 year? What does it mean to be within 10 years? What happened to your level of confidence as the interval became smaller? Adapted from an activity from Michael Legacy.

Confidence level The confidence level associated with a confidence interval estimate is the success rate of the method used to construct the interval. For a 95% confidence level, if this method were used to generate an interval estimate over and over again from different samples, in the long run 95% of the resulting intervals would include the actual value of the characteristic being estimated. Our confidence is in the method, NOT in any one particular interval! The most common confidence levels are 90%, 95%, and 99%.

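Recall the General Properties for Sampling Distributions of $\hat{p}$
1. $\mu_{\hat{p}} = p$
2. $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}$, as long as the sample size is less than 10% of the population
3. When n is large ($np \ge 10$ and $n(1-p) \ge 10$), the sampling distribution of $\hat{p}$ is approximately normal
These are the conditions that must be true in order to calculate a large-sample confidence interval for p.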

Let’s develop the equation for the large-sample confidence interval.
To begin, we will use a 95% confidence level. Use the table of standard normal curve areas to determine the value of z* such that a central area of .95 falls between –z* and z*. [Figure: standard normal curve with central area = .95, lower tail area = .025, and upper tail area = .025, bounded by -1.96 and 1.96.] 95% of these values are within 1.96 standard deviations of the mean. We can generalize this to normal distributions other than the standard normal distribution: about 95% of the values are within 1.96 standard deviations of the mean.

Developing a Confidence Interval Continued . . .
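For large random samples, the sampling distribution of $\hat{p}$ is approximately normal, so $\hat{p}$ falls within $1.96\sqrt{p(1-p)/n}$ of p whenever it is within 1.96 standard deviations of its mean. In that case the interval $\hat{p} \pm 1.96\sqrt{p(1-p)/n}$ captures p. And this will happen for 95% of all possible samples!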

Developing a Confidence Interval Continued . . .
[Figure: the approximate sampling distribution of $\hat{p}$, centered at its mean p, with one line marking 1.96 standard deviations below the mean and another marking 1.96 standard deviations above the mean. Intervals are drawn around several possible values of $\hat{p}$.] Suppose we get a particular $\hat{p}$ and create an interval around it. If this $\hat{p}$ fell within 1.96 standard deviations of the mean, its confidence interval “captures” p. If this $\hat{p}$ doesn’t fall within 1.96 standard deviations of the mean, its confidence interval does NOT “capture” p. Using this method of calculation, the confidence interval will not capture p 5% of the time. When n is large, a 95% confidence interval for p is $\hat{p} \pm 1.96\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$. Notice that the length of each half of the interval equals $1.96\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$.

Field House Example For a project, a student randomly sampled 182 other students at a large university to determine if the majority of students were in favor of a proposal to build a field house. He found that 75 were in favor of the proposal. Let p = the true proportion of students that favor the proposal.

Field House Example -- continued
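The point estimate is $\hat{p} = 75/182 = 0.4121$. Since $n\hat{p} = 182(0.4121) = 75 > 10$ and $n(1-\hat{p}) = 182(0.5879) = 107 > 10$, we can use the formula given on the previous slide to find a 95% confidence interval for p: $0.4121 \pm 1.96\sqrt{\dfrac{0.4121(0.5879)}{182}} = 0.4121 \pm 0.0715$. The 95% confidence interval for p is (0.341, 0.484).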
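The Field House calculation can be reproduced in a few lines. This is a minimal sketch: the function name `proportion_ci` is made up for illustration, and it hard-codes the large-sample check used in this example (both counts greater than 10).

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Large-sample interval: p_hat ± z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    # Large-sample check used in the example: both counts should exceed 10.
    if n * p_hat <= 10 or n * (1 - p_hat) <= 10:
        raise ValueError("sample too small for the large-sample interval")
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Field House example: 75 of 182 sampled students favor the proposal.
lower, upper = proportion_ci(75, 182)
print(f"({lower:.3f}, {upper:.3f})")  # prints (0.341, 0.484)
```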

The diagram to the right shows 100 confidence intervals for p computed from 100 different random samples. Note that the ones with asterisks do not capture p. If we were to compute 100 more confidence intervals for p from 100 different random samples, would we get the same results?
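The idea behind the diagram of repeated intervals can be imitated by simulation. The sketch below assumes a true proportion p = 0.4 and samples of size 200 (both values are arbitrary choices, not from the slides). It builds a 95% interval from each simulated sample and counts how often the interval captures p.

```python
import math
import random

def coverage_rate(p=0.4, n=200, num_samples=2000, z=1.96, seed=1):
    """Fraction of simulated large-sample intervals that capture the true p."""
    rng = random.Random(seed)
    captured = 0
    for _ in range(num_samples):
        # Draw one random sample and count the successes.
        successes = sum(rng.random() < p for _ in range(n))
        p_hat = successes / n
        margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p <= p_hat + margin:
            captured += 1
    return captured / num_samples

rate = coverage_rate()
print(f"fraction of intervals capturing p: {rate:.3f}")
```

Changing `seed` gives a different set of samples and a slightly different fraction, but in the long run the fraction should stay close to 0.95. Which particular intervals miss p changes from run to run.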

The Large-Sample Confidence Interval for p
Now let’s look at a more general formula.

The general formula for a confidence interval for a population proportion p is $\hat{p} \pm (z \text{ critical value})\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$. Here $\hat{p}$ is the point estimate and $\sqrt{\hat{p}(1-\hat{p})/n}$ is the standard error of $\hat{p}$. The standard error of a statistic is the estimated standard deviation of the statistic.

The quantity $(z \text{ critical value})\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$ is called the bound on error of estimation. This is also called the margin of error.

Terminology The standard error of a statistic is the estimated standard deviation of the statistic. For $\hat{p}$, it is calculated by $\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$. The bound on error of estimation, B, associated with a 95% confidence interval is (1.96)(standard error of the statistic).

Terminology The bound on error of estimation, B, associated with any confidence interval is (z critical value)(standard error of the statistic).

Finding a z Critical Value
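Finding a z critical value for a 98% confidence interval. The central area is .98, so each tail area is .01 and the cumulative area to the left of z* is .99. Looking up the cumulative area closest to .9900 in the body of the table, we find z = 2.33. [Figure: standard normal curve with the critical value 2.33 marked.]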
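The table lookup can also be done in software. As a sketch, Python's standard library class `statistics.NormalDist` has an `inv_cdf` method that returns the z value for a given cumulative area; the helper name `z_critical` below is made up for illustration.

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """z* such that the central area between -z* and z* equals the confidence level."""
    upper_tail = (1 - confidence_level) / 2
    # Cumulative area to the left of z* is 1 minus the upper tail area.
    return NormalDist().inv_cdf(1 - upper_tail)

for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%} confidence: z* = {z_critical(level):.3f}")
```

Software values carry more decimal places than a printed table, so they can differ slightly from table values such as 2.33 or 2.58.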

The article “How Well Are U.S. Colleges Run?” (USA Today, February 17) describes a survey of 1031 adult Americans. The survey was carried out by the National Center for Public Policy and the sample was selected in a way that makes it reasonable to regard the sample as representative of adult Americans. Of those surveyed, 567 indicated that they believe a college education is essential for success. What is a 95% confidence interval for the population proportion of adult Americans who believe that a college education is essential for success? The point estimate is $\hat{p} = 567/1031 \approx 0.55$. Before computing the confidence interval, we need to verify the conditions.

College Education Continued . . .
What is a 95% confidence interval for the population proportion of adult Americans who believe that a college education is essential for success? Conditions: 1) $n\hat{p} = 1031(0.55) \approx 567 \ge 10$ and $n(1-\hat{p}) = 1031(0.45) \approx 464 \ge 10$, so the sample size is large enough. 2) The sample size of n = 1031 is much smaller than 10% of the population size (adult Americans). 3) The sample was selected in a way designed to produce a representative sample. So we can regard the sample as a random sample from the population. All of our conditions are verified so it is safe to proceed with the calculation of the confidence interval.

College Education Continued . . . What is a 95% confidence interval for the population proportion of adult Americans who believe that a college education is essential for success? Calculation: $0.55 \pm 1.96\sqrt{\dfrac{0.55(0.45)}{1031}} = 0.55 \pm 0.030 = (0.520, 0.580)$. Conclusion: We are 95% confident that the population proportion of adult Americans who believe that a college education is essential for success is between 52.0% and 58.0%. What does this interval mean in the context of this problem?

College Education Revisited . . . Recall the 95% confidence interval for the population proportion of adult Americans who believe that a college education is essential for success. Compute a 90% confidence interval for this proportion. Compute a 99% confidence interval for this proportion. Recall the “Rate your Confidence” Activity. What do you notice about the relationship between the confidence level of an interval and the width of the interval?
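One way to explore the question is to compute all three intervals from the same sample. The sketch below uses the survey counts from this example (567 of 1031) and assumes z critical values of 1.645, 1.96, and 2.58 as read from a standard normal table.

```python
import math

successes, n = 567, 1031
p_hat = successes / n
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)

# z critical values as read from a standard normal table (assumed here).
z_table = {"90%": 1.645, "95%": 1.96, "99%": 2.58}

widths = {}
for level, z in z_table.items():
    margin = z * standard_error
    widths[level] = 2 * margin
    print(f"{level}: ({p_hat - margin:.3f}, {p_hat + margin:.3f}), width {widths[level]:.3f}")
```

Only the z critical value changes between the three intervals, so a higher confidence level always produces a wider interval from the same data.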

Choosing a Sample Size Before collecting any data, an investigator may wish to determine a sample size needed to achieve a certain bound on error of estimation. The bound on error of estimation for a 95% confidence interval is $B = 1.96\sqrt{\dfrac{p(1-p)}{n}}$. If we solve this for n, we get $n = p(1-p)\left(\dfrac{1.96}{B}\right)^2$. What value should be used for the unknown value p? Sometimes, it is feasible to perform a preliminary study to estimate the value for p. In other cases, prior knowledge may suggest a reasonable estimate for p. If there is no prior knowledge and a preliminary study is not feasible, then the conservative estimate for p is 0.5.

Why is the conservative estimate for p = 0.5?
.1(.9) = .09 .2(.8) = .16 .3(.7) = .21 .4(.6) = .24 .5(.5) = .25 By using .5 for p, we are using the largest value for p(1 – p) in our calculations.

In spite of the potential safety hazards, some people would like to have an internet connection in their car. Determine the sample size required to estimate the proportion of adult Americans who would like an internet connection in their car to within 0.03 with 95% confidence. Here 0.03 is the value for the bound on error of estimation B. What value should be used for p? With no prior estimate, use the conservative value p = 0.5: $n = 0.5(0.5)\left(\dfrac{1.96}{0.03}\right)^2 = 1067.11$. Always round the sample size up to the next whole number, so a sample of 1068 is needed.

NYPD Blue Example If a TV executive would like to find a 95% confidence interval estimate within 0.03 for the proportion of all households that watch NYPD Blue regularly, how large a sample is needed if a prior estimate for p was 0.15? We have B = 0.03 and the prior estimate of p = 0.15, so $n = 0.15(0.85)\left(\dfrac{1.96}{0.03}\right)^2 = 544.2$. A sample of 545 or more would be needed.

NYPD Blue Example--Continued
Suppose a TV executive would like to find a 95% confidence interval estimate within 0.03 for the proportion of all households that watch NYPD Blue regularly. How large a sample is needed if we have no reasonable prior estimate for p? We have B = 0.03 and should use p = 0.5 in the formula: $n = 0.5(0.5)\left(\dfrac{1.96}{0.03}\right)^2 = 1067.1$. The required sample size is now 1068. Notice that a reasonable ballpark estimate for p can lower the needed sample size.

Field House Example A college professor wants to estimate the proportion of students at a large university who favor building a field house with a 99% confidence interval accurate to within 0.02. If one of his students performed a preliminary study and estimated p to be 0.412, how large a sample should he take? We have B = 0.02, a prior estimate p = 0.412, and we should use the z critical value 2.58 (for a 99% confidence interval): $n = 0.412(0.588)\left(\dfrac{2.58}{0.02}\right)^2 = 4031.4$. The required sample size is 4032.
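The sample size examples all apply the same formula, so they can be checked together. This is a minimal sketch: the function name `sample_size_for_proportion` is made up for illustration, and the z critical value is passed in as read from the table (1.96 for 95%, 2.58 for 99%).

```python
import math

def sample_size_for_proportion(bound, p=0.5, z=1.96):
    """n = p(1 - p)(z / B)^2, rounded up to the next whole number."""
    return math.ceil(p * (1 - p) * (z / bound) ** 2)

n_prior = sample_size_for_proportion(0.03, p=0.15)                 # NYPD Blue, prior estimate
n_conservative = sample_size_for_proportion(0.03)                  # NYPD Blue, p = 0.5
n_field_house = sample_size_for_proportion(0.02, p=0.412, z=2.58)  # Field House, 99%

print(n_prior, n_conservative, n_field_house)  # prints 545 1068 4032
```

The default p = 0.5 is the conservative choice: it maximizes p(1 – p), so it never understates the sample size needed.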

Section 9.3 Confidence Interval for a Population Mean

Now let’s look at confidence intervals to estimate the mean $\mu$ of a population.

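Confidence intervals for $\mu$ when $\sigma$ is known
Recall the properties of the sampling distribution of $\bar{x}$: $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$, and the distribution is approximately normal when n is large or the population distribution is normal. The general formula for the confidence interval is $\bar{x} \pm (z \text{ critical value})\dfrac{\sigma}{\sqrt{n}}$, where $\bar{x}$ is the point estimate, $\sigma/\sqrt{n}$ is the standard deviation of the statistic, and $(z \text{ critical value})\,\sigma/\sqrt{n}$ is the bound on error of estimation. The formula requires $\sigma$ to be known. Is this typically known? This confidence interval is appropriate even when n is small, as long as it is reasonable to think that the population distribution is normal in shape.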

Cosmic radiation levels rise with increasing altitude, prompting researchers to consider how pilots and flight crews might be affected by increased exposure to cosmic radiation. A study reported a mean annual cosmic radiation dose of 219 mrems for a sample of flight personnel of Xinjiang Airlines. Suppose this mean is based on a random sample of 100 flight crew members. Let $\sigma$ = 35 mrems. Calculate and interpret a 95% confidence interval for the actual mean annual cosmic radiation exposure for Xinjiang flight crew members. First, verify that the conditions are met: the data are from a random sample of crew members, the sample size n is large (n > 30), and $\sigma$ is known.

Cosmic Radiation Continued . . . Let $\bar{x}$ = 219 mrems, n = 100 flight crew members, and $\sigma$ = 35 mrems. Calculate and interpret a 95% confidence interval for the actual mean annual cosmic radiation exposure for Xinjiang flight crew members: $219 \pm 1.96\left(\dfrac{35}{\sqrt{100}}\right) = 219 \pm 6.86 = (212.14, 225.86)$. What does this mean in context? We are 95% confident that the actual mean annual cosmic radiation exposure for Xinjiang flight crew members is between 212.14 mrems and 225.86 mrems. What would happen to the width of this interval if the confidence level was 90% instead of 95%?
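The cosmic radiation interval can be reproduced directly from the three given values. This is a minimal sketch; the function name `mean_ci_known_sigma` is made up for illustration.

```python
import math

def mean_ci_known_sigma(x_bar, sigma, n, z=1.96):
    """Interval x_bar ± z * sigma / sqrt(n), for a known population sigma."""
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Cosmic radiation example: x_bar = 219 mrems, sigma = 35 mrems, n = 100.
lower, upper = mean_ci_known_sigma(x_bar=219, sigma=35, n=100)
print(f"({lower:.2f}, {upper:.2f}) mrems")  # prints (212.14, 225.86) mrems

# The same data at 90% confidence (z = 1.645) give a narrower interval.
lower_90, upper_90 = mean_ci_known_sigma(x_bar=219, sigma=35, n=100, z=1.645)
print(f"({lower_90:.2f}, {upper_90:.2f}) mrems")
```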

Ketchup Bottling Example
A certain filling machine has a known true population standard deviation $\sigma$ (in ounces) when used to fill ketchup bottles. A random sample of 36 “6 ounce” bottles of ketchup was selected from the output from this machine and the sample mean was 6.018 ounces. Find a 90% confidence interval estimate for the true mean fills of ketchup from this machine.

Ketchup Bottling Example--Continued
The z critical value is 1.645, so the interval has the form $6.018 \pm 1.645\,\dfrac{\sigma}{\sqrt{36}}$. 90% Confidence Interval: (5.955, 6.081)

Confidence intervals for $\mu$ when $\sigma$ is unknown
When $\sigma$ is unknown, we use the sample standard deviation s to estimate $\sigma$. In place of z-scores, we must use the following to standardize the values: $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$. The use of the value of s introduces extra variability. Therefore the distribution of t values has more variability than a standard normal curve.

Important Properties of t Distributions
t distributions are described by degrees of freedom (df). 1) The t distribution corresponding to any particular number of degrees of freedom is bell shaped and centered at zero (just like the standard normal (z) distribution). 2) Each t distribution is more spread out than the standard normal distribution. [Figure: the z curve and the t curve for 2 df.] Why is the z curve taller than the t curve for 2 df?

Important Properties of t Distributions Continued . . .
3) As the number of degrees of freedom increases, the spread of the corresponding t distribution decreases. [Figure: the t curve for 8 df and the t curve for 2 df.]

Important Properties of t Distributions Continued . . .
4) As the number of degrees of freedom increases, the corresponding sequence of t distributions approaches the standard normal distribution. [Figure: the z curve, the t curve for 5 df, and the t curve for 2 df.] For what df would the t distribution be approximately the same as a standard normal distribution?

t Distributions Notice: As df increase, t distributions approach the standard normal distribution. Since each t distribution would require a table similar to the standard normal table, we usually only create a table of critical values for the t distributions.
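The properties of t distributions can be checked numerically from the t density formula. The sketch below uses the standard formula for the t density (it is not given on the slides) and compares curve heights at the center (x = 0) and in the tail (x = 3, an arbitrary choice) with the z curve.

```python
import math

def t_density(x, df):
    """Density of the t distribution with df degrees of freedom."""
    coefficient = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coefficient * (1 + x * x / df) ** (-(df + 1) / 2)

def z_density(x):
    """Density of the standard normal (z) distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

for df in (2, 5, 8, 30, 100):
    print(f"df = {df:>3}: height at 0 = {t_density(0, df):.4f}, height at 3 = {t_density(3, df):.4f}")
print(f"z curve : height at 0 = {z_density(0):.4f}, height at 3 = {z_density(3):.4f}")
```

A t curve with few degrees of freedom is lower at the center and higher in the tails than the z curve, because more of its area is spread out. As df increases, both heights move toward the z curve values.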

Confidence intervals for $\mu$ when $\sigma$ is unknown
The general formula for a confidence interval for a population mean $\mu$ based on a sample of size n when: 1) $\bar{x}$ is the sample mean from a random sample, 2) the population distribution is normal, or the sample size n is large (n > 30), and 3) $\sigma$, the population standard deviation, is unknown, is $\bar{x} \pm (t \text{ critical value})\dfrac{s}{\sqrt{n}}$, where the t critical value is based on df = n - 1. This confidence interval is appropriate for small n ONLY when the population distribution is (at least approximately) normal. t critical values are found in Table 3.
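The t interval formula can be sketched as a short function. The data below are hypothetical (five made-up values, not from the slides), and the t critical value 2.78 is the table value assumed for 95% confidence with df = 4. The function name `t_interval` is also made up for illustration.

```python
import math
import statistics

def t_interval(sample, t_critical):
    """Interval x_bar ± t* · s / sqrt(n), with t* looked up for df = n - 1."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    margin = t_critical * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical data, not from the slides.
sample = [10, 12, 14, 16, 18]
# Assumed table value: t critical value for 95% confidence, df = 5 - 1 = 4.
lower, upper = t_interval(sample, t_critical=2.78)
print(f"({lower:.3f}, {upper:.3f})")
```

Passing the critical value in as an argument mirrors the table lookup described above. With a sample this small, the interval is only appropriate if the population distribution is at least approximately normal.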

The article “Chimps Aren’t Charitable” (Newsday, November 2, 2005) summarized the results of a research study published in the journal Nature. In this study, chimpanzees learned to use an apparatus that dispensed food when either of two ropes was pulled. When one of the ropes was pulled, only the chimp controlling the apparatus received food. When the other rope was pulled, food was dispensed both to the chimp controlling the apparatus and also to a chimp in the adjoining cage. The accompanying data represent the number of times out of 36 trials that each of seven chimps chose the option that would provide food to both chimps (charitable response). Compute a 99% confidence interval for the mean number of charitable responses for the population of all chimps. First verify that the conditions for a t-interval are met.

Chimps Continued . . . Let’s suppose it is reasonable to regard this sample of seven chimps as representative of the chimp population. Since n is small, we need to verify if it is plausible that this sample is from a population that is approximately normal. Let’s use a normal probability plot. [Figure: normal probability plot of Normal Scores against Number of Charitable Responses.] The plot is reasonably straight, so it seems plausible that the population distribution of number of charitable responses is approximately normal.

Chimps Continued . . . x = and s = df = 7 – 1 = 6 We are 99% confident that the mean number of charitable responses for the population of all chimps is between and

Television Watching Example
Ten randomly selected shut-ins were each asked to list how many hours of television they watched per week. The results are Find a 90% confidence interval estimate for the true mean number of hours of television watched per week by shut-ins.

Calculating the sample mean and standard deviation, we have n = 10 and $\bar{x}$ = 86, together with the sample standard deviation s. We find the critical t value of 1.833 by looking on the t table in the row corresponding to df = 9, in the column with bottom label 90%. The confidence interval for $\mu$ is then computed as $\bar{x} \pm 1.833\,\dfrac{s}{\sqrt{10}}$.

To calculate the confidence interval, we had to make the assumption that the distribution of weekly viewing times was normally distributed. Consider the normal plot of the 10 data points produced with Minitab that is given on the next slide.

Notice that the normal plot looks reasonably linear, so it is reasonable to assume that the number of hours of television watched per week by shut-ins is normally distributed. Typically, if the p-value of the normality test is more than 0.05, we treat the assumption of normality as plausible. [Minitab output: Anderson-Darling Normality Test, A-Squared: 0.226.]

Choosing a Sample Size The bound on error of estimation associated with a 95% confidence interval is $B = 1.96\dfrac{\sigma}{\sqrt{n}}$. We can use this to find the necessary sample size for a particular bound on error of estimation. Solve this for n: $n = \left(\dfrac{1.96\,\sigma}{B}\right)^2$. This requires $\sigma$ to be known, which is rarely the case! When $\sigma$ is unknown, a preliminary study can be performed to estimate $\sigma$ OR make an educated guess of the value of $\sigma$. A rough estimate for $\sigma$ (used with distributions that are not too skewed) is the range divided by 4.

The financial aid office wishes to estimate the mean cost of textbooks per quarter for students at a particular university. For the estimate to be useful, it should be within \$20 of the true population mean. How large a sample should be used to be 95% confident of achieving this level of accuracy? The financial aid office believes that the amount spent on books varies, with most values between \$150 and \$550. To estimate $\sigma$: $\sigma \approx \dfrac{\text{range}}{4} = \dfrac{550 - 150}{4} = 100$.

With B = 20 and $\sigma \approx 100$: $n = \left(\dfrac{1.96(100)}{20}\right)^2 = 96.04$. Always round sample size up to the next whole number! A sample of 97 students should be used.
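The textbook cost calculation can be sketched in code as follows. The function name `sample_size_for_mean` is made up for illustration, and the standard deviation is the rough range/4 estimate rather than a known value.

```python
import math

def sample_size_for_mean(bound, sigma, z=1.96):
    """n = (z * sigma / B)^2, rounded up to the next whole number."""
    return math.ceil((z * sigma / bound) ** 2)

# Rough estimate of sigma: range divided by 4, using the $150 to $550 spread.
sigma_guess = (550 - 150) / 4
n_needed = sample_size_for_mean(bound=20, sigma=sigma_guess)
print(sigma_guess, n_needed)  # prints 100.0 97
```

Because the range/4 value is only a guess at the standard deviation, the resulting sample size is a planning figure rather than an exact requirement.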
