CHAPTER 10 CONFIDENCE INTERVALS FOR ONE SAMPLE POPULATION

CONFIDENCE INTERVAL FOR THE DIFFERENCE IN TWO SAMPLE POPULATION

Point Estimate and Interval Estimate
A point estimate is a single number that is our “best guess” for the parameter. Point estimation produces a number (an estimate) which is believed to be close to the value of the unknown parameter. An interval estimate is an interval of numbers within which the parameter value is believed to fall. Interval estimation produces an interval that contains the estimated parameter with a prescribed confidence.

Point Estimate and Interval Estimate (Figure 10.1)

Figure 10.1 A point estimate predicts a parameter by a single number. An interval estimate is an interval of numbers that are believable values for the parameter. Question: Why is a point estimate alone not sufficiently informative?

A point estimate doesn’t tell us how close the estimate is likely to be to the parameter. An interval estimate is more useful because it incorporates a margin of error, which helps us gauge the accuracy of the point estimate.

Properties of Point Estimators
Property 1: A good estimator has a sampling distribution that is centered at the parameter. An estimator with this property is unbiased. The sample mean is an unbiased estimator of the population mean. The sample proportion is an unbiased estimator of the population proportion.

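SOME POINT ESTIMATORS
PARAMETER                 UNBIASED ESTIMATOR
PROPORTION p              $\hat{p}$
MEAN μ                    $\bar{x}$
STANDARD DEVIATION σ      s
NOTE: $s^2$ IS AN UNBIASED ESTIMATOR OF $\sigma^2$; s ITSELF IS ONLY APPROXIMATELY UNBIASED FOR σ.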

Properties of Point Estimators
Property 2: A good estimator has a small standard deviation compared to other estimators. This means it tends to fall closer than other estimates to the parameter. The sample mean has a smaller standard error than the sample median when estimating the population mean of a normal distribution.

The Logic behind Constructing a Confidence Interval
To construct a confidence interval for a population proportion, start with the sampling distribution of a sample proportion, which gives the possible values for the sample proportion and their probabilities. The sampling distribution:
Is approximately a normal distribution for large random samples, by the CLT.
Has mean equal to the population proportion.
Has standard deviation called the standard error.

Constructing a Confidence Interval to Estimate a Population Proportion
We symbolize a population proportion by p. The point estimate of the population proportion is the sample proportion. We symbolize the sample proportion by $\hat{p}$, called “p-hat”.

A CONFIDENCE INTERVAL OFTEN HAS THE FORM: POINT ESTIMATE ± MARGIN OF ERROR. IT IS CONSTRUCTED WITH A PRESCRIBED CONFIDENCE KNOWN AS THE CONFIDENCE LEVEL.

Confidence Interval or Interval Estimate
Sample estimate ± Multiplier × Standard Error
Sample estimate ± Margin of error
The multiplier is a number based on the desired confidence level and determined from the standard normal distribution (for proportions) or Student’s t-distribution (for means).

The Multiplier
The multiplier, denoted z*, is the standardized score such that the area between -z* and z* under the standard normal curve equals the desired confidence level. Note: A higher confidence level requires a larger multiplier.

For a 90% confidence level, z* = 1.645.

SOME CRITICAL VALUES FOR STANDARD NORMAL DISTRIBUTION
C% CONFIDENCE LEVEL    CRITICAL VALUE z*
80%                    1.282
90%                    1.645
95%                    1.960
98%                    2.326
99%                    2.576
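The critical values in this table can be reproduced numerically. The following is a minimal sketch using only the Python standard library; the helper name z_critical is an illustrative assumption and does not come from the slides.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return z* such that the area between -z* and z* equals `confidence`."""
    # The two tails share the leftover area, so z* is the (1 + C)/2 quantile
    # of the standard normal distribution.
    return NormalDist().inv_cdf((1 + confidence) / 2)

for level in (0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}  z* = {z_critical(level):.3f}")
```

Because the quantile grows as the central area grows, the loop also illustrates that a higher confidence level gives a larger multiplier.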

Interpretation of the Confidence Level
So what does it mean to say that we have “95% confidence”? The meaning refers to a long-run interpretation—how the method performs when used over and over with many different random samples. If we used the 95% confidence interval method over time to estimate many population proportions, then in the long run about 95% of those intervals would give correct results, containing the population proportion.

WHAT DOES C% CONFIDENCE REALLY MEAN?
FORMALLY, WHAT WE MEAN IS THAT C% OF SAMPLES OF THIS SIZE WILL PRODUCE CONFIDENCE INTERVALS THAT CAPTURE THE TRUE PROPORTION. C% CONFIDENCE MEANS THAT ON AVERAGE, IN C OUT OF 100 ESTIMATIONS, THE INTERVAL WILL CONTAIN THE TRUE ESTIMATED PARAMETER. E.G. A 95% CONFIDENCE MEANS THAT ON THE AVERAGE, IN 95 OUT OF 100 ESTIMATIONS, THE INTERVAL WILL CONTAIN THE TRUE ESTIMATED PARAMETER.
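The long-run interpretation can be illustrated with a small simulation. The sketch below is not from the slides: the true proportion 0.30, the sample size 200, the number of trials, and the function name coverage are all assumed values chosen only for illustration.

```python
import math
import random
from statistics import NormalDist

def coverage(p_true, n, confidence, trials, seed=0):
    """Fraction of simulated z-intervals that contain the true proportion."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    hits = 0
    for _ in range(trials):
        # Draw one random sample of size n and count the successes.
        successes = sum(rng.random() < p_true for _ in range(n))
        p_hat = successes / n
        margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p_true <= p_hat + margin:
            hits += 1
    return hits / trials

rate = coverage(p_true=0.30, n=200, confidence=0.95, trials=2000)
print(f"Share of intervals that captured p: {rate:.3f}")
```

Any single interval either contains p or does not; the confidence level describes how often the method succeeds across many samples, which is what the printed share estimates.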

CONFIDENCE INTERVAL FOR PROPORTION P [ONE-PROPORTION Z-INTERVAL]
ASSUMPTIONS AND CONDITIONS
RANDOMIZATION CONDITION
10% CONDITION
SAMPLE SIZE ASSUMPTION OR SUCCESS/FAILURE CONDITION
INDEPENDENCE ASSUMPTION
NOTE: PROPER RANDOMIZATION CAN HELP ENSURE INDEPENDENCE.

CONSTRUCTING CONFIDENCE INTERVALS
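ESTIMATOR (SAMPLE PROPORTION): $\hat{p}$
STANDARD ERROR: $SE(\hat{p}) = \sqrt{\hat{p}(1-\hat{p})/n}$
C% MARGIN OF ERROR: $ME = z^* \, SE(\hat{p})$
C% CONFIDENCE INTERVAL: $\hat{p} \pm ME$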

Compact Formula For a Confidence Interval For a Population Proportion p
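$\hat{p} \pm z^* \sqrt{\hat{p}(1-\hat{p})/n}$, where $\hat{p}$ is the sample proportion, z* denotes the multiplier, and $\sqrt{\hat{p}(1-\hat{p})/n}$ is the standard error of $\hat{p}$.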

The exact standard deviation of a sample proportion equals $\sqrt{p(1-p)/n}$. This formula depends on the unknown population proportion, p. In practice, we don’t know p, and we need to estimate the standard error as $se = \sqrt{\hat{p}(1-\hat{p})/n}$.

Margin of Error
The margin of error measures how accurate the point estimate is likely to be in estimating a parameter. It is a multiple of the standard error of the sampling distribution when the sampling distribution is a normal distribution. The distance of 1.96 standard errors is the margin of error for a 95% confidence interval for a parameter from a normal distribution.
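The estimate, standard error, multiplier, and margin of error combine into a one-proportion z-interval as in the sketch below. The function name and the data (120 successes in 400 trials) are hypothetical and used only to show the arithmetic.

```python
import math
from statistics import NormalDist

def one_proportion_z_interval(successes, n, confidence=0.95):
    """Return (low, high) for p_hat +/- z* * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n                              # point estimate
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)  # multiplier
    se = math.sqrt(p_hat * (1 - p_hat) / n)            # estimated standard error
    margin = z_star * se                               # margin of error
    return p_hat - margin, p_hat + margin

# Hypothetical data: 120 successes out of 400 trials.
low, high = one_proportion_z_interval(120, 400, confidence=0.95)
print(f"95% CI: ({low:.4f}, {high:.4f})")
```

The sketch assumes the conditions for the z-interval are met (random sample, enough successes and failures); it does not check them.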

Intuitive Explanation of Margin of Error
Margin of Error Characteristics:
The difference between the sample proportion and the population proportion is less than the margin of error about 95% of the time, or for about 19 of every 20 sample estimates.
The difference between the sample proportion and the population proportion is more than the margin of error about 5% of the time, or for about 1 of every 20 sample estimates.

SAMPLE SIZE NEEDED TO PRODUCE A CONFIDENCE INTERVAL WITH A GIVEN MARGIN OF ERROR, ME
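STARTING FROM $ME = z^* \sqrt{\hat{p}(1-\hat{p})/n}$, SOLVING FOR n GIVES $n = (z^*)^2 \, \hat{p}(1-\hat{p}) / ME^2$, WHERE $\hat{p}$ IS A REASONABLE GUESS. IF WE CANNOT MAKE A GUESS, WE TAKE $\hat{p} = 0.5$.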
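A sample-size calculation based on the standard relation $n = (z^*)^2 \, \hat{p}(1-\hat{p}) / ME^2$ might be sketched as follows. The function name, the 3% target margin, and the alternative guess of 0.10 are illustrative assumptions, not values from the slides.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(margin, confidence=0.95, p_guess=0.5):
    """Smallest n whose margin of error is at most `margin` for the guessed p."""
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    n = z_star ** 2 * p_guess * (1 - p_guess) / margin ** 2
    return math.ceil(n)  # round up: a fractional respondent is not possible

# With no prior guess, p_guess = 0.5 gives the largest (safest) sample size.
n_worst_case = sample_size_for_proportion(0.03)
# A hypothetical prior guess of 0.10 requires far fewer observations.
n_with_guess = sample_size_for_proportion(0.03, p_guess=0.10)
print(n_worst_case, n_with_guess)
```

Taking 0.5 as the guess is conservative because p(1 - p) is largest at p = 0.5.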

EXAMPLE 1
A MAY 2002 GALLUP POLL FOUND THAT ONLY 8% OF A RANDOM SAMPLE OF 1012 ADULTS APPROVED OF ATTEMPTS TO CLONE A HUMAN.
(a) FIND THE MARGIN OF ERROR FOR THIS POLL IF WE WANT 95% CONFIDENCE IN OUR ESTIMATE OF THE PERCENT OF AMERICAN ADULTS WHO APPROVE OF CLONING HUMANS.
(b) EXPLAIN WHAT THAT MARGIN OF ERROR MEANS.
(c) IF WE ONLY NEED TO BE 90% CONFIDENT, WILL THE MARGIN OF ERROR BE LARGER OR SMALLER? EXPLAIN.
(d) FIND THAT MARGIN OF ERROR.
(e) IN GENERAL, IF ALL OTHER ASPECTS OF THE SITUATION REMAIN THE SAME, WOULD SMALLER SAMPLES PRODUCE SMALLER OR LARGER MARGINS OF ERROR?

SOLUTION
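The numerical parts of Example 1 can be checked with a short calculation. This is a sketch under the usual z-interval assumptions, not the worked solution from the slides; the function name is hypothetical, and the smaller sample size of 253 is an assumed value used only to show the direction of the effect.

```python
import math
from statistics import NormalDist

def margin_of_error(p_hat, n, confidence):
    """Margin of error z* * sqrt(p_hat * (1 - p_hat) / n) for a proportion."""
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    return z_star * math.sqrt(p_hat * (1 - p_hat) / n)

me_95 = margin_of_error(0.08, 1012, 0.95)        # poll data, 95% confidence
me_90 = margin_of_error(0.08, 1012, 0.90)        # same data, 90% confidence
me_95_small = margin_of_error(0.08, 253, 0.95)   # assumed smaller sample

print(f"95%: {me_95:.3f}   90%: {me_90:.3f}   95% with n = 253: {me_95_small:.3f}")
```

Lowering the confidence level shrinks the multiplier and therefore the margin of error, while a smaller sample inflates the standard error and widens it.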

EXAMPLE 2
DIRECT MAIL ADVERTISERS SEND SOLICITATIONS (a.k.a. “junk mail”) TO THOUSANDS OF POTENTIAL CUSTOMERS IN THE HOPE THAT SOME WILL BUY THE COMPANY’S PRODUCT. THE RESPONSE RATE IS USUALLY QUITE LOW. SUPPOSE A COMPANY WANTS TO TEST THE RESPONSE TO A NEW FLYER, AND SENDS IT TO 1000 PEOPLE RANDOMLY SELECTED FROM THEIR MAILING LIST OF OVER 200,000 PEOPLE. THEY GET ORDERS FROM 123 OF THE RECIPIENTS.
(a) CREATE A 90% CONFIDENCE INTERVAL FOR THE PERCENTAGE OF PEOPLE THE COMPANY CONTACTS WHO MAY BUY SOMETHING.
(b) EXPLAIN WHAT THIS INTERVAL MEANS.
(c) EXPLAIN WHAT “90% CONFIDENCE” MEANS.
(d) THE COMPANY MUST DECIDE WHETHER TO NOW DO A MASS MAILING. THE MAILING WON’T BE COST-EFFECTIVE UNLESS IT PRODUCES AT LEAST A 5% RETURN. WHAT DOES YOUR CONFIDENCE INTERVAL SUGGEST? EXPLAIN.

SOLUTION

EXAMPLE 3
IN 1998 A SAN DIEGO REPRODUCTIVE CLINIC REPORTED 49 BIRTHS TO 207 WOMEN UNDER THE AGE OF 40 WHO HAD PREVIOUSLY BEEN UNABLE TO CONCEIVE.
(a) FIND A 90% CONFIDENCE INTERVAL FOR THE SUCCESS RATE AT THIS CLINIC.
(b) INTERPRET YOUR INTERVAL IN THIS CONTEXT.
(c) EXPLAIN WHAT “90% CONFIDENCE” MEANS.
(d) WOULD IT BE MISLEADING FOR THE CLINIC TO ADVERTISE A 25% SUCCESS RATE? EXPLAIN.
(e) THE CLINIC WANTS TO CUT THE STATED MARGIN OF ERROR IN HALF. HOW MANY PATIENTS’ RESULTS MUST BE USED?
(f) DO YOU HAVE ANY CONCERNS ABOUT THIS SAMPLE? EXPLAIN.

SOLUTION

How Can We Use Confidence Levels Other than 95%?
In practice, the confidence level 0.95 is the most common choice. But, some applications require greater (or less) confidence. To increase the chance of a correct inference, we can use a larger confidence level, such as 0.99.

A 99% Confidence Interval Is Wider Than a 95% Confidence Interval.

Question: If you want greater confidence, why would you expect a wider interval?
In using confidence intervals, we must compromise between the desired margin of error and the desired confidence of a correct inference. As the desired confidence level increases, the margin of error gets larger.

Effects of Confidence Level and Sample Size on Margin of Error
The margin of error for a confidence interval:
Increases as the confidence level increases.
Decreases as the sample size increases.
For instance, a 99% confidence interval is wider than a 95% confidence interval, and a confidence interval with 200 observations is narrower than one with 100 observations at the same confidence level. These properties apply to all confidence intervals, not just the one for the population proportion.

What is the Error Probability for the Confidence Interval Method?
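The general formula for the confidence interval for a population proportion is: Sample estimate ± Multiplier × Standard Error, which in symbols is $\hat{p} \pm z^* \sqrt{\hat{p}(1-\hat{p})/n}$. The error probability is the probability that the method produces an interval that does not contain the parameter. It equals 1 minus the confidence level, for example 0.05 for a 95% confidence interval.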

Confidence Intervals for the Difference Between Two Proportions
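$\hat{p}_1 - \hat{p}_2 \pm z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
where z* is the value of the standard normal variable with area between -z* and z* equal to the desired confidence level.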

Necessary Conditions
Condition 1: Sample proportions are available based on independent, randomly selected samples from the two populations.
Condition 2: All of the quantities $n_1\hat{p}_1$, $n_1(1-\hat{p}_1)$, $n_2\hat{p}_2$, $n_2(1-\hat{p}_2)$ are at least 10.

Example: Age and Using the Internet
Young: 92 of 262 use the Internet as main news source, $\hat{p}_1$ = .351
Old: 59 of 632 use the Internet as main news source, $\hat{p}_2$ = .093
Approximate 95% Confidence Interval: (.351 - .093) ± 1.96(.0317), which gives .196 to .320
We are 95% confident that the proportion who use the Internet as their main news source is between 19.6 and 32.0 percentage points higher among young adults than among older adults.
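The interval in this example can be reproduced from the counts given above. The following sketch uses the unpooled standard error for a difference of two proportions; the function name is an illustrative assumption.

```python
import math
from statistics import NormalDist

def two_proportion_z_interval(x1, n1, x2, n2, confidence=0.95):
    """CI for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

# Counts from the example: 92 of 262 young adults, 59 of 632 older adults.
low, high = two_proportion_z_interval(92, 262, 59, 632)
print(f"95% CI for the difference: ({low:.3f}, {high:.3f})")
```

Since the whole interval lies above 0, the data support a higher proportion among young adults.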

Using Confidence Intervals to Guide Decisions
Principle 1. A value not in a confidence interval can be rejected as a possible value of the population proportion. A value in a confidence interval is an “acceptable” possibility for the value of a population proportion.
Principle 2. When a confidence interval for the difference in two population proportions does not cover 0, it is reasonable to conclude the two population proportions are different.
Principle 3. When the confidence intervals for proportions in two different populations do not overlap, it is reasonable to conclude the two population proportions are different.

Example: Which Drink Tastes Better?
Taste Test: A sample of 60 people taste both drinks, and 55% like the taste of Drink A better than Drink B. Makers of Drink A want to advertise these results. Makers of Drink B make a 95% confidence interval for the population proportion who prefer Drink A.
95% Confidence Interval: $.55 \pm 1.96\sqrt{.55(.45)/60}$, or approximately .42 to .68
Note: Since .50 is in the interval, there is not enough evidence to claim that Drink A is preferred by a majority of the population represented by the sample.
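Principle 1 applied to the taste test can be sketched in a few lines: compute the interval, then ask whether the value of interest falls inside it. The helper names below are hypothetical.

```python
import math
from statistics import NormalDist

def proportion_interval(p_hat, n, confidence=0.95):
    """z-interval for a proportion from the sample proportion and sample size."""
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def is_plausible(value, interval):
    """Principle 1: a value inside the interval cannot be rejected."""
    low, high = interval
    return low <= value <= high

taste_ci = proportion_interval(0.55, 60)
print(f"95% CI: ({taste_ci[0]:.2f}, {taste_ci[1]:.2f})")
print("Is .50 a plausible value?", is_plausible(0.50, taste_ci))
```

Because .50 is a plausible value, the sample does not rule out an even split in the population.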

CHAPTER 11 ESTIMATING MEANS WITH CONFIDENCE

CONFIDENCE INTERVALS FOR ONE POPULATION MEAN
The confidence interval again has the form: Point estimate ± margin of error. The sample mean $\bar{x}$ is the point estimate of the population mean μ. The exact standard error of the sample mean is $\sigma/\sqrt{n}$. In practice, we estimate σ by the sample standard deviation, s, so $se = s/\sqrt{n}$.

Confidence Intervals for One Population Mean
For large n from any population, and also for small n from an underlying population that is normal, the confidence interval for the population mean is: $\bar{x} \pm z^* \, \sigma/\sqrt{n}$

Confidence Intervals for One Population Mean
In practice, we don’t know the population standard deviation σ. Substituting the sample standard deviation s for σ to get $se = s/\sqrt{n}$ introduces extra error. To account for this increased error, we must replace the z-score by a slightly larger score, called a t-score. The confidence interval is then a bit wider. The t-score comes from a distribution called the t distribution.

Summary: Properties of the t-Distribution
The t-distribution is bell shaped and symmetric about 0.
The probabilities depend on the degrees of freedom, df = n - 1.
The t-distribution has thicker tails than the standard normal distribution, i.e., it is more spread out.
A t-score multiplied by the standard error gives the margin of error for a confidence interval for the mean.

t - Distribution
The t Distribution Relative to the Standard Normal Distribution: The t distribution gets closer to the standard normal as the degrees of freedom (df) increase. The two are practically identical when df ≥ 30. Question: Can you find z-scores (such as 1.96) for a normal distribution on the t table?

t - Distribution
Part of a t table displaying t-scores. The scores have right-tail probabilities including 0.100, 0.050, 0.025, 0.010, and 0.005. When n = 7, df = 6, and $t_{.025} = 2.447$ is the t-score with right-tail probability 0.025 and two-tail probability 0.05. It is used in a 95% confidence interval, $\bar{x} \pm 2.447(se)$.

t - Distribution
The t Distribution with df = 6. 95% of the distribution falls between -2.447 and 2.447. These t-scores are used with a 95% confidence interval when n = 7. Question: Which t-scores with df = 6 contain the middle 99% of a t distribution (for a 99% confidence interval)?

Using the t Distribution to Construct a Confidence Interval for a Mean
Summary: 95% Confidence Interval for a Population Mean
When the standard deviation of the population is unknown, a 95% confidence interval for the population mean μ is: $\bar{x} \pm t_{.025}(se)$, where $se = s/\sqrt{n}$ and df = n - 1.
To use this method, you need:
Data obtained by randomization
An approximately normal population distribution
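The t-interval for a mean can be sketched as below. The seven data values are hypothetical, chosen so that df = 6, and the multiplier 2.447 is the standard table value for 95% confidence with df = 6; the function name is also an assumption.

```python
import math
from statistics import mean, stdev

def t_interval(data, t_star):
    """CI for a mean: x_bar +/- t* * s / sqrt(n), with t* supplied by the caller."""
    n = len(data)
    x_bar = mean(data)
    se = stdev(data) / math.sqrt(n)  # stdev uses the n - 1 divisor
    margin = t_star * se
    return x_bar - margin, x_bar + margin

# Hypothetical sample of n = 7 measurements, so df = n - 1 = 6.
sample = [12.1, 9.8, 11.4, 10.6, 13.0, 10.2, 11.9]
T_STAR_DF6_95 = 2.447  # table value for 95% confidence, df = 6

low, high = t_interval(sample, T_STAR_DF6_95)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

The multiplier is passed in rather than computed because the Python standard library has no t-distribution; a table or statistical software would supply it in practice.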

SUMMARY

ASSUMPTIONS AND CONDITIONS
INDEPENDENCE ASSUMPTION: THE DATA VALUES SHOULD BE INDEPENDENT. THERE’S REALLY NO WAY TO CHECK INDEPENDENCE OF THE DATA BY LOOKING AT THE SAMPLE, BUT WE SHOULD THINK ABOUT WHETHER THE ASSUMPTION IS REASONABLE.
RANDOMIZATION CONDITION: THE DATA SHOULD ARISE FROM A RANDOM SAMPLE OR A SUITABLY RANDOMIZED EXPERIMENT.

ASSUMPTIONS AND CONDITIONS
10% CONDITION: THE SAMPLE IS NO MORE THAN 10% OF THE POPULATION. NORMAL POPULATION ASSUMPTION OR NEARLY NORMAL CONDITION: THE DATA COME FROM A DISTRIBUTION THAT IS UNIMODAL AND SYMMETRIC. REMARK: CHECK THIS CONDITION BY MAKING A HISTOGRAM OR NORMAL PROBABILITY PLOT.

CONSTRUCTING CONFIDENCE INTERVALS FOR MEANS
POINT ESTIMATOR: $\bar{x}$
STANDARD ERROR: $SE(\bar{x}) = s/\sqrt{n}$
C% MARGIN OF ERROR: $ME = t^*_{n-1} \, SE(\bar{x})$

WHERE $t^*_{n-1}$ IS A CRITICAL VALUE FOR STUDENT’S t-MODEL WITH n - 1 DEGREES OF FREEDOM THAT CORRESPONDS TO C% CONFIDENCE LEVEL.

FINDING CRITICAL t - VALUES
Using t tables (Table T) and/or a calculator, find or estimate the
1. critical value t7* for 90% confidence level if the number of degrees of freedom is 7
2. one-tail probability if t = 2.56 and the number of degrees of freedom is 7
3. two-tail probability if t = 2.56 and the number of degrees of freedom is 7
NOTE: If t has a Student's t-distribution with df degrees of freedom, then the TI-83 function tcdf(a,b,df) computes the area under the t-curve between a and b.
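For readers without a TI-83, the three quantities can be approximated numerically. The sketch below is not the calculator's algorithm: it integrates the t density with Simpson's rule and finds the critical value by bisection. The Python function tcdf merely mimics the calling convention of the calculator function, and t_pdf and t_critical are hypothetical helper names.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def tcdf(a, b, df, steps=2000):
    """Area under the t-curve between a and b (composite Simpson's rule)."""
    h = (b - a) / steps  # steps must be even for Simpson's rule
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    return total * h / 3

def t_critical(confidence, df):
    """Find t* with central area `confidence` by bisection on [0, 100]."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if tcdf(-mid, mid, df) < confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

t_star = t_critical(0.90, 7)          # item 1
one_tail = 0.5 - tcdf(0, 2.56, 7)     # item 2, using symmetry about 0
two_tail = 2 * one_tail               # item 3
print(f"t* = {t_star:.3f}, one-tail = {one_tail:.4f}, two-tail = {two_tail:.4f}")
```

The one-tail probability uses the symmetry of the t-distribution: half the area lies above 0, so the area beyond 2.56 is 0.5 minus the area between 0 and 2.56.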

EXAMPLES FROM MIDTERM EXAM III PRACTICE EXERCISES

Choosing the Sample Size for Estimating a Population Mean
In practice, you don’t know the value of the standard deviation, σ. You must substitute an educated guess for σ. Sometimes you can use the sample standard deviation from a similar study. When no prior information is known, a crude estimate that can be used is to divide the estimated range of the data by 6, since for a bell-shaped distribution we expect almost all of the data to fall within 3 standard deviations of the mean.
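A sample-size calculation for a mean might be sketched as follows, assuming the standard relation $n = (z^* \sigma / m)^2$ with the multiplier taken from the normal distribution. The data range of 0 to 18, the target margin of 1, and the function name are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(margin, sigma_guess, confidence=0.95):
    """Smallest n with z* * sigma / sqrt(n) <= margin, for a guessed sigma."""
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    return math.ceil((z_star * sigma_guess / margin) ** 2)

# Crude guess for sigma: estimated range of the data divided by 6.
estimated_min, estimated_max = 0, 18          # hypothetical range
sigma_guess = (estimated_max - estimated_min) / 6

n_needed = sample_size_for_mean(margin=1.0, sigma_guess=sigma_guess)
print(f"sigma guess = {sigma_guess}, n needed = {n_needed}")
```

Because the guess for σ is rough, the resulting n is best treated as a planning figure rather than an exact requirement.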

Other Factors That Affect the Choice of the Sample Size
The first is the desired precision, as measured by the margin of error, m. The second is the confidence level. The third factor is the variability in the data. The fourth factor is cost.

What if You Have to Use a Small n?
The t methods for a mean are valid for any n. However, you need to be extra cautious to look for extreme outliers or great departures from the normal population assumption. In the case of the confidence interval for a population proportion, the method works poorly for small samples because the normal approximation provided by the CLT is not reliable.

Confidence Intervals for Difference in Two Population Means (Independent Samples)

Approximate CI for μ1 – μ2:
$\bar{x}_1 - \bar{x}_2 \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
where t* is the value in a t-distribution with area between -t* and t* equal to the desired confidence level. The approximate df is difficult to specify. Use computer software, or conservatively use the smaller of the two sample sizes and subtract 1.

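Degrees of Freedom
The t-distribution is only approximately correct and the df formula is complicated (Welch’s approximation):
$df \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1-1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2-1}\left(\frac{s_2^2}{n_2}\right)^2}$
Statistical software can use the above approximation, but if done by hand then use a conservative df = smaller of n1 – 1 and n2 – 1.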
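Welch's approximation and the conservative alternative can be compared with a short calculation. The sketch below uses the standard Welch formula; the summary statistics (standard deviations 1.4 and 0.8 with sample sizes 14 and 13) are hypothetical, as are the function names.

```python
import math

def welch_df(s1, n1, s2, n2):
    """Welch's approximate degrees of freedom for two independent samples."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

def conservative_df(n1, n2):
    """By-hand fallback: the smaller sample size minus 1."""
    return min(n1, n2) - 1

def unpooled_se(s1, n1, s2, n2):
    """Standard error of x1_bar - x2_bar without assuming equal variances."""
    return math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)

# Hypothetical summary statistics for two independent samples.
s1, n1, s2, n2 = 1.4, 14, 0.8, 13
print(f"Welch df        = {welch_df(s1, n1, s2, n2):.2f}")
print(f"Conservative df = {conservative_df(n1, n2)}")
print(f"Unpooled s.e.   = {unpooled_se(s1, n1, s2, n2):.3f}")
```

The conservative df is never larger than Welch's value, so it gives a t* that is at least as large and an interval that is at least as wide.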

Necessary Conditions
Two samples must be independent and either:
Situation 1: Populations of measurements both bell-shaped, and random samples of any size are measured.
Situation 2: Large (n ≥ 30) random samples are measured. But if there are extreme outliers, or extreme skewness, it is better to have an even larger sample than n = 30.

Example: Effect of a Stare on Driving
Randomized experiment: Researchers either stared or did not stare at drivers stopped at a campus stop sign, and timed how long (sec) it took each driver to proceed from the sign to a mark on the other side of the intersection.
No Stare Group (n = 14): 8.3, 5.5, 6.0, 8.1, 8.8, 7.5, 7.8, , 5.7, 6.5, 4.7, 6.9, 5.2, 4.7
Stare Group (n = 13): 5.6, 5.0, 5.7, 6.3, 6.5, 5.8, 4.5, , 4.8, 4.9, 4.5, 7.2, 5.8
Task: Make a 95% CI for the difference between the mean crossing times for the two populations represented by these two independent samples.

Checking Conditions
Boxplots show … No outliers and no strong skewness. Crossing times in the stare group are generally faster and less variable.

Example: Effect of a Stare on Driving
Note: The df = 21 was reported by the computer package based on the Welch’s approximation formula.

Equal Variance Assumption and the Pooled Standard Error
It may be reasonable to assume the two populations have equal population standard deviations, or equivalently, equal population variances: $\sigma_1^2 = \sigma_2^2 = \sigma^2$. The estimate of this variance based on the combined or “pooled” data is called the pooled variance. The square root of the pooled variance is called the pooled standard deviation: $s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}$

Pooled Standard Error
Pooled s.e.$(\bar{x}_1 - \bar{x}_2) = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$

Pooled Degrees of Freedom (df)
Note: Pooled df = (n1 – 1) + (n2 – 1) = (n1 + n2 – 2).

Pooled Confidence Interval
Pooled CI for the Difference Between Two Means (Independent Samples): $\bar{x}_1 - \bar{x}_2 \pm t^* \, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, where t* is found using a t-distribution with df = (n1 + n2 – 2) and sp is the pooled standard deviation.

Example: Male and Female Sleep Times
Q: How much difference is there between how long female and male students slept the previous night?
Data: The 83 female and 65 male responses from students in an intro stat class.
Task: Make a 95% CI for the difference between the population mean sleep hours for females versus males.
Note: We will assume equal population variances.

Two-sample T for sleep [with “Assume Equal Variance” option]
Sex      N    Mean  StDev  SE Mean
Female   83
Male     65
Difference = mu (Female) – mu (Male)
Estimate for difference:
95% CI for difference: (-0.103, 1.025)
T-Test of difference = 0 (vs not =): T-Value =   P =   DF = 146
Both use Pooled StDev = 1.72

Notes: Two sample standard deviations are very similar. Sample mean for females higher than for males. 95% confidence interval contains 0 so cannot rule out that the population means may be equal.

Pooled Standard Deviation and Pooled Standard Error “by – hand”:
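A by-hand check of the pooled calculation can be sketched from the quantities that survive in the output above: the sample sizes 83 and 65, the pooled standard deviation 1.72, and the reported interval (-0.103, 1.025). The multiplier 1.976 is an assumed table or software value for 95% confidence with df = 146, and the function names are hypothetical.

```python
import math

def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation from two sample standard deviations."""
    return math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))

def pooled_se(sp, n1, n2):
    """Pooled standard error of the difference between two sample means."""
    return sp * math.sqrt(1 / n1 + 1 / n2)

n_female, n_male = 83, 65
sp = 1.72                       # pooled StDev reported in the output
df = n_female + n_male - 2      # pooled degrees of freedom
se = pooled_se(sp, n_female, n_male)

T_STAR = 1.976                  # assumed t* for 95% confidence, df = 146
margin = T_STAR * se

# Half-width implied by the reported interval (-0.103, 1.025).
reported_half_width = (1.025 - (-0.103)) / 2
print(f"df = {df}, pooled s.e. = {se:.3f}, margin = {margin:.3f}")
print(f"reported half-width = {reported_half_width:.3f}")
```

The computed margin agrees with the reported half-width up to the rounding of the pooled standard deviation. The pooled_sd helper is included for cases where the two sample standard deviations are available.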

Pooled or Unpooled?
If the larger sample size produced the larger standard deviation, the pooled procedure is acceptable because it will be conservative.
If the smaller standard deviation accompanies the larger sample size, the pooled test can be quite misleading and is not recommended.
If sample sizes are equal, the pooled and unpooled standard errors are equal.
Unless the sample standard deviations are quite similar, it is best to use the unpooled procedure.

Confidence Interval for the Difference in Two Population Means
1. Make sure appropriate conditions apply by checking sample size and/or a shape picture of the data.
2. Choose a confidence level.
3. Compute the mean and std dev for each sample.
4. Determine whether the std devs are similar enough that the pooled procedure can be used.
5. Calculate the appropriate standard error (pooled or unpooled).
6. Calculate the appropriate df.
7. Use Table A.2 (or software) to find the multiplier t*.

Examples From Midterm Exam III Practice Sheet
