Estimating with Confidence

Chapter 6.1 Estimating with Confidence

Point estimation
The sample mean x̄ is the natural estimator of the unknown population mean µ. Is point estimation a good method?
1. It may never hit the true value (the population mean).
2. It gives no idea of the variability of the estimate.
Therefore, we have no measure of confidence in how close our estimate is to the true value.
Analogy: a net gun (or flyswatter) versus a Taser gun.
Idea: It is better to use an INTERVAL than a POINT estimator.

Review: Chapter 1.3: All Normal curves N(µ, σ) share the 68-95-99.7 Rule
About 68% of all observations are within 1 SD (σ) of the mean (µ). Called: C = 68%, z* ≈ 1
About 95% of all observations are within 2σ of the mean µ. Called: C = 95%, z* ≈ 2
Almost all (99.7%) observations are within 3σ of the mean. Called: C = 99.7%, z* ≈ 3

Going to an example from the book on women’s heights: the mean is 64.5 inches and the standard deviation is 2.5 inches. When we talk about the mean and standard deviation of the curve instead of the actual sample, we use different notation: mu (µ) for the mean, sigma (σ) for the SD. If you consider the area under the curve to represent all of the individuals, you can divide it into chunks that represent parts of the whole. For example, if you divide it down the middle, half of the people are in each half. Here it is divided not through the middle but by lines that are 1, 2 or 3 standard deviations away from the mean. The center, pink part is the area within 1 SD on either side of the mean. For Normal curves, this area is about 68% of the total. So if you know the mean and SD, you also know that about 68% of women are between 62 and 67 inches tall. The same goes for the areas defined by lines drawn 2 or 3 SD from the mean.

We might want to know what percent of women are over 72 inches tall. That is 3 SD above the mean. We can see that 99.7% of women are between 57 and 72 inches tall, so 0.3% of women are really tall or really short. Since the distribution is symmetric, we can divide by two to find the percent of women who are really tall: 0.15%. You need to be able to work problems like this one; there are a bunch in the book.

But what if you want to know something not defined by a whole number of SDs? For example, what percentage of women are taller than 68 inches? We know that half are shorter than 64.5 inches, and that half of the middle area, 34%, are between 64.5 and 67 inches, so 50% + 34% = 84% are shorter than 67 inches, and 16% are taller than 67 inches. But you want the proportion taller than 68 inches.

You can look this up in a table, but first you have to do something called standardizing. The reason is that although all normal curves share the properties shown above, they differ in their mean and standard deviation, so you would need a different table for every curve. When you standardize a normal distribution, you change it so the mean is 0 and the SD is 1. Any normal distribution can be standardized.
Standard Normal Distribution N(0, 1)
Reminder: µ (mu) is the mean of the idealized curve, while x̄ is the mean of a sample. σ (sigma) is the standard deviation of the idealized curve, while s is the SD of a sample.
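
As a rough sketch of the standardizing step, the snippet below converts a height of 68 inches to a z-score and looks up the area to its right. It uses Python's standard-library `statistics.NormalDist` in place of a printed table; the helper name `proportion_taller` is only illustrative and does not come from the slides.

```python
from statistics import NormalDist

MEAN_HEIGHT = 64.5  # mu, inches (from the women's heights example)
SD_HEIGHT = 2.5     # sigma, inches


def proportion_taller(height, mu=MEAN_HEIGHT, sigma=SD_HEIGHT):
    """Proportion of the N(mu, sigma) population above `height`."""
    z = (height - mu) / sigma        # standardize: z follows N(0, 1)
    return 1 - NormalDist().cdf(z)   # area to the right of z


print(round(proportion_taller(68), 4))  # z = 1.4
print(round(proportion_taller(67), 4))  # z = 1.0, close to the 16% from the rule
```

The second call is a sanity check against the 68-95-99.7 rule: one SD above the mean should leave roughly 16% in the upper tail.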

Confidence levels
Confidence intervals contain the population mean µ in C% of samples. Different areas under the curve give different confidence levels C.
z* is related to the chosen confidence level C: C is the area under the standard normal curve between −z* and z*.
The confidence interval is thus: x̄ ± z*·σ/√n
Example: For an 80% confidence level C, 80% of the normal curve’s area lies between −z* and z*.

Point estimation versus interval
When the population mean (µ) is unknown, it is better to use an interval than a point to estimate it. The theory behind interval estimation looks at the sampling distribution of the statistic.
The level C confidence interval (CI) for the population mean µ is: x̄ ± z*·σ/√n
For a particular confidence level C, the appropriate z* value is given in the last row of Table D.
Example: For a 98% confidence level, z* = 2.326

Specific Confidence Intervals for population mean
99% CI for the population mean µ is: x̄ ± 2.576·σ/√n, i.e.: C = 99%, z* = 2.576
95% CI for the population mean µ is: x̄ ± 1.960·σ/√n, i.e.: C = 95%, z* = 1.960
90% CI for the population mean µ is: x̄ ± 1.645·σ/√n, i.e.: C = 90%, z* = 1.645
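
The z* values quoted from Table D can be reproduced numerically. The sketch below assumes Python's standard-library `statistics.NormalDist`; the function name `z_star` is an illustrative choice, not something defined in the slides.

```python
from statistics import NormalDist


def z_star(confidence):
    """Critical value z* with area `confidence` between -z* and z*."""
    # The two tails share 1 - C, so z* sits at the (1 + C) / 2 quantile.
    return NormalDist().inv_cdf((1 + confidence) / 2)


for c in (0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"C = {c:.0%}: z* = {z_star(c):.3f}")
```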

Link between confidence level and margin of error
The margin of error depends on z*. Higher confidence C implies a larger margin of error m (thus less precision in our estimates). A lower confidence level C produces a smaller margin of error m (thus better precision in our estimates).

Example 1
The average lifetime of 36 randomly selected TVs of a certain brand is 20 years. Suppose the SD of the lifetimes of all TVs of this brand is 2 years. Construct a 95% CI for the average lifetime of all TVs from this brand.
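
One way to check Example 1 by computer, assuming the usual z interval x̄ ± z*·σ/√n with known σ. This is only a sketch: `z_interval` is a hypothetical helper name, and z* is computed with `statistics.NormalDist` rather than read from Table D.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence):
    """Return (margin_of_error, lower, upper) for a z confidence interval."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # z* for the level C
    moe = z * sigma / sqrt(n)
    return moe, xbar - moe, xbar + moe


# Example 1: n = 36 TVs, sample mean 20 years, sigma = 2 years.
moe, low, high = z_interval(xbar=20, sigma=2, n=36, confidence=0.95)
print(f"MOE = {moe:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```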

Example 2
1. The average height of 100 randomly selected UNCW students is 5.9 feet. Suppose the SD of the heights of all students is 1.2 feet. Construct 99%, 95% and 90% CIs for the average height of all students.
Note: As the confidence level C gets smaller, the CI gets narrower.

Example 2 (Continued)
1. The average height of 100 randomly selected UNCW students is 5.9 feet. Suppose the SD of the heights of all students is 1.2 feet. Find the MOE and construct a 95% CI for the average height of all students.
Note: As the confidence level C gets smaller, the CI gets narrower.
2. (Continued…) Select another set of 100 UNCW students randomly. The average height of the second set of 100 students is 5.5 feet. Suppose the SD of the heights of all students is 1.2 feet. Find the MOE and construct a 95% CI for the average height of all students.
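
Both parts of Example 2 can be worked in a few lines, again assuming the z interval x̄ ± z*·σ/√n. The helper `z_interval` and the constant names are illustrative assumptions, not part of the slides.

```python
from math import sqrt
from statistics import NormalDist

SIGMA = 1.2  # population SD of heights, feet
N = 100      # sample size


def z_interval(xbar, confidence, sigma=SIGMA, n=N):
    """Return (lower, upper) of the level-C z interval for the mean."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    moe = z * sigma / sqrt(n)
    return xbar - moe, xbar + moe


# Part 1: same sample (mean 5.9 ft), three confidence levels.
for c in (0.99, 0.95, 0.90):
    low, high = z_interval(5.9, c)
    print(f"{c:.0%} CI: ({low:.3f}, {high:.3f})")

# Part 2: a second sample (mean 5.5 ft) at 95% confidence.
low2, high2 = z_interval(5.5, 0.95)
print(f"Second sample, 95% CI: ({low2:.3f}, {high2:.3f})")
```

The two 95% intervals have the same width but different centers: the margin of error depends only on z*, σ and n, while the interval itself moves with each sample mean.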

Outlines for z*
z* depends on the level of confidence C.
What does “confidence” mean? This idea is only true for simple random samples and completely randomized experiments.
Margin of error: z*·σ/√n

Understanding of Confidence Intervals
With 95% confidence, we can say that µ should be within roughly 2 standard deviations (that is, 2σ/√n) of our sample mean x̄. In about 95% of all possible samples of this size n, µ will indeed fall in our confidence interval. Only about 5% of samples would be farther from µ.
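
The "95% of all possible samples" reading can be illustrated with a small simulation. The sketch below is an assumption-laden stand-in for the applet: it draws normal samples with Python's `random` module, reuses the women's heights parameters (µ = 64.5, σ = 2.5) as an arbitrary population, and picks n = 25 and 2000 trials purely for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage(mu, sigma, n, confidence=0.95, trials=2000, seed=1):
    """Fraction of simulated samples whose z interval contains mu."""
    rng = random.Random(seed)                      # fixed seed for repeatability
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    moe = z * sigma / sqrt(n)                      # same MOE for every sample
    hits = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - moe <= mu <= xbar + moe:
            hits += 1
    return hits / trials


print(coverage(mu=64.5, sigma=2.5, n=25))  # should land near 0.95
```

Each simulated interval either contains µ or it does not; the confidence level describes how often the method succeeds over many samples, not the chance that one particular interval is right.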

Example 3

Summary to Confidence Interval
If the confidence level C gets larger and n stays the same, what will happen to z*, MOE, CI, and the precision of the estimate? If z* and σ stay the same, what will happen to MOE and CI when n gets larger?
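
For the second question, a quick numerical experiment helps. This sketch holds z* = 1.960 and σ = 1.2 fixed (values borrowed from Example 2) and varies only n; the function name `margin_of_error` and the chosen sample sizes are illustrative.

```python
from math import sqrt


def margin_of_error(z_star, sigma, n):
    """Margin of error z* * sigma / sqrt(n) for a z interval."""
    return z_star * sigma / sqrt(n)


# z* = 1.960 (95%) and sigma = 1.2 are held fixed; only n changes.
for n in (25, 100, 400):
    print(n, round(margin_of_error(1.960, 1.2, n), 4))
```

Because n enters through √n, quadrupling the sample size halves the margin of error.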

Chapter 6.2 Hypothesis Testing

6.2 Tests of hypothesis 5 Steps to Hypothesis Testing
1. State the hypothesis
2. State the level of significance
3. Calculate the test statistic
4. Find the p-value
5. Conclusion (both statistical and non-statistical)

Hypothesis Testing
The idea of hypothesis testing is to use the data to make a decision. In hypothesis testing, there are only two “decisions”, also called hypotheses, that the data could support. The two hypotheses are called the null hypothesis and the alternative hypothesis.
Forms of Null and Alternative Hypotheses

Null Hypothesis
Expectation: what somebody believes or claims before the sample is available.
Null hypothesis: the hypothesis you assume to be true, the one you compare your data against, denoted by H0. Many times the null hypothesis is a statement of “no effect” or of “no difference” (“being fair”).
E.g.: Last year, your company’s service technicians took an average of 2.6 hours to respond to trouble calls from business customers who had purchased service contracts. Do this year’s data show a lower average response time? Here H0 : µ = 2.6 hours.

Alternative Hypothesis
The expectation is not correct: the difference between the expectation and the sample statistic is real.
Alternative hypothesis: expresses the hopes or suspicions we bring to the data, denoted by Ha. The test is designed to assess the strength of evidence against the null hypothesis. Ha is the statement that the data may “support”.
E.g.: Last year, your company’s service technicians took an average of 2.6 hours to respond to trouble calls from business customers who had purchased service contracts. Do this year’s data show a lower average response time? Here Ha : µ < 2.6 hours.

Example 1
Exercise 6.55 (p. 391): Translate each of the following research questions into appropriate H0 and Ha.
a) Census Bureau data show that the mean household income in the area served by a shopping mall is $62,500 per year. A market research firm questions shoppers at the mall to find out whether the mean household income of mall shoppers is higher than that of the general population.
a) H0 : µ = $62,500 versus Ha : µ > $62,500.

Example 1 cont. Exercise 6.55 (p. 391):
Translate each of the following research questions into appropriate H0 and Ha.
b) Last year, your company’s service technicians took an average of 2.6 hours to respond to trouble calls from business customers who had purchased service contracts. Do this year’s data show a different average response time?
b) H0 : µ = 2.6 hours versus Ha : µ ≠ 2.6 hours.

6.2 Tests of hypothesis--Review
E.g.: Last year, your company’s service technicians took an average of 2.6 hours to respond to trouble calls from business customers who had purchased service contracts. Do this year’s data show a lower average response time?
E.g.: Census Bureau data show that the mean household income in the area served by a shopping mall is $62,500 per year. A market research firm questions shoppers at the mall to find out whether the mean household income of mall shoppers is higher than that of the general population.
E.g.: Last year, your company’s service technicians took an average of 2.6 hours to respond to trouble calls from business customers who had purchased service contracts. Do this year’s data show a different average response time?

6.2 Tests of hypothesis 5 Steps to Hypothesis Testing
1. State the hypothesis
2. State the level of significance (α = 0.05 unless otherwise stated)
3. Calculate the test statistic
4. Find the p-value
5. Conclusion (both statistical and non-statistical)

One-sided and two-sided tests for P-value
A two-tail or two-sided test of the population mean has these null and alternative hypotheses:
H0 : µ = [a specific number]   Ha : µ ≠ [a specific number]
A one-tail or one-sided test of a population mean has these null and alternative hypotheses:
H0 : µ = [a specific number]   Ha : µ < [a specific number]
OR
H0 : µ = [a specific number]   Ha : µ > [a specific number]

Find P-value
The P-value is the area under the sampling distribution for values at least as extreme, in the direction of Ha, as that of our random sample. Use Table A, or NORMALCDF on a calculator.
E.g.: H0 : µ = 2.6 hours versus Ha : µ < 2.6 hours gives test statistic Z = -1.6. Q: Find the P-value.
(Figure: sampling distribution of x̄, centered at the µ defined by H0, with standard deviation σ/√n.)

P-value in one-sided and two-sided tests
One-sided (one-tailed) test versus two-sided (two-tailed) test.
To calculate the P-value for a two-sided test, use the symmetry of the normal curve: find the P-value for a one-sided test, and double it.

Example 3: One-sample Z-test
A test of the null hypothesis H0 : µ = µ0 gives test statistic Z = -1.6.
a) What is the P-value if the alternative is Ha : µ > µ0 ?
b) What is the P-value if the alternative is Ha : µ < µ0 ?
c) What is the P-value if the alternative is Ha : µ ≠ µ0 ?
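
The three P-values for Z = -1.6 can be checked numerically. This is a sketch that uses `statistics.NormalDist().cdf` where the slides use Table A or NORMALCDF; the function name `p_value` and the string labels for the alternatives are assumptions made for illustration.

```python
from statistics import NormalDist


def p_value(z, alternative):
    """P-value of a z statistic for 'greater', 'less', or 'two-sided'."""
    lower_tail = NormalDist().cdf(z)                 # P(Z <= z)
    if alternative == "less":
        return lower_tail
    if alternative == "greater":
        return 1 - lower_tail
    if alternative == "two-sided":
        return 2 * min(lower_tail, 1 - lower_tail)   # double the smaller tail
    raise ValueError(f"unknown alternative: {alternative!r}")


for alt in ("greater", "less", "two-sided"):
    print(alt, round(p_value(-1.6, alt), 4))
```

Note that the two one-sided P-values add up to 1, and the two-sided value doubles the smaller of them.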

How to do 5 steps
1. State H0 and Ha.
2. State the level of significance (usually α is 5%).
3. Calculate the test statistic (ASSUMING THE NULL HYPOTHESIS IS TRUE).
4. Find the P-value, that is, the probability in the direction of Ha.
5. Draw the conclusion: If P-value ≤ α, then we reject H0 (enough evidence). If P-value > α, then we do not reject H0 (not enough evidence).
Note: The two possible conclusions are rejecting or not rejecting H0.

Example 2: The P-value for a significance test is 0.032
a) Do you reject the null hypothesis at level α = 0.05?
b) Do you reject the null hypothesis at level α = 0.01?
c) Explain your answers.
Note that: If P-value ≤ α, then we reject H0 (enough evidence). If P-value > α, then we do not reject H0 (not enough evidence).

Chap 5: Sampling distribution of a sample mean = distribution of x̄
If the population is N(µ, σ), the sample mean x̄ of n observations is N(µ, σ/√n).

One-sample Z-test for population mean:
The test statistic is the z-score of the sample mean x̄ in its sampling distribution (see Chapter 5.1): z = (x̄ − µ0)/(σ/√n)

Example 4: One-sample Z-Test (one sided)
The National Center for Health Statistics reports that the mean systolic blood pressure for males 35 to 44 years of age is 128 with a population SD = 15. The medical director of a company looks at the medical records of 72 company executives in this age group and finds that the mean systolic blood pressure in this sample is [...]. Is this evidence that executives’ blood pressures are lower than the national average?

Example 5: One-sample Z-Test(two sided)
A new medicine for treating cancer was introduced to the market decades ago, and the company claimed that on average it prolongs a patient’s life by 5.2 years. Suppose the SD of the prolonged lifetime for all cancer patients is 2.52 years. In a 10-year study with 64 patients, the average prolonged lifetime is 4.6 years. Assuming normality, do the 10-year study’s data show a different average prolonged lifetime?
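
A sketch of Example 5 as a two-sided one-sample z test of H0 : µ = 5.2 versus Ha : µ ≠ 5.2, assuming the usual statistic z = (x̄ − µ0)/(σ/√n) and α = 0.05. The helper name `one_sample_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(xbar, mu0, sigma, n):
    """Return (z, two_sided_p) for H0: mu = mu0 against Ha: mu != mu0."""
    z = (xbar - mu0) / (sigma / sqrt(n))   # z-score of the sample mean
    p = 2 * NormalDist().cdf(-abs(z))      # both tails, by symmetry
    return z, p


z, p = one_sample_z_test(xbar=4.6, mu0=5.2, sigma=2.52, n=64)
alpha = 0.05
print(f"z = {z:.3f}, P-value = {p:.4f}")
print("reject H0" if p <= alpha else "do not reject H0")
```

With these numbers the P-value sits just above 0.05, so the data do not give enough evidence at the 5% level that the average prolonged lifetime differs from 5.2 years.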

Example 5.1 (Based on Example 5)
Find a 95% confidence interval for the average prolonged lifetime for all patients.
A 95% CI for the average prolonged lifetime for all patients is given by: 4.6 ± 1.960·2.52/√64 = [3.9826, 5.2174]
Note: Since H0 : µ = 5.2, we have µ0 = 5.2, which falls inside the 95% CI. So 5.2 is a plausible value of µ at 95% confidence, and we do not reject H0 at the 5% level.

Confidence intervals to test hypotheses
A level α two-sided significance test rejects H0 : µ = µ0 exactly when the hypothesized value µ0 falls outside a level (1 − α)100% confidence interval for µ. Each tail outside the interval has area α/2.
In a two-sided test, C = 1 − α, where C is the confidence level and α is the significance level.
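
The agreement between the two-sided test and the confidence interval can be checked directly. The sketch below reuses the numbers from Example 5 (x̄ = 4.6, σ = 2.52, n = 64, α = 0.05) and tries a few hypothesized values µ0 chosen only for illustration; both function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def rejects(mu0, xbar, sigma, n, alpha):
    """Two-sided z test decision: True if H0: mu = mu0 is rejected."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    return 2 * NormalDist().cdf(-abs(z)) <= alpha


def outside_ci(mu0, xbar, sigma, n, alpha):
    """True if mu0 lies outside the level (1 - alpha) z interval."""
    moe = NormalDist().inv_cdf(1 - alpha / 2) * sigma / sqrt(n)
    return not (xbar - moe <= mu0 <= xbar + moe)


# Numbers from Example 5: xbar = 4.6, sigma = 2.52, n = 64, alpha = 0.05.
for mu0 in (3.5, 4.0, 5.2, 5.5):
    print(mu0, rejects(mu0, 4.6, 2.52, 64, 0.05), outside_ci(mu0, 4.6, 2.52, 64, 0.05))
```

For every µ0 tried, the two columns match: the test rejects exactly the values that the 95% interval excludes.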

Logic of confidence interval test
Ex: Your sample gives a 99% confidence interval of [...]. With 99% confidence, could samples be from populations with µ = 0.86? µ = 0.85?
Cannot reject H0 : µ = 0.85.
Reject H0 : µ = 0.86.

Example 6: The P-value for a two-sided test of the null hypothesis
H0 : µ = 30 is 0.04.
a) Does the 95% confidence interval include the value 30? Why?
b) Does the 90% confidence interval include the value 30? Why?
c) Does the 99% confidence interval include the value 30? Why?
Note that in a two-sided test, C = 1 − α, where C is the confidence level and α is the significance level.
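
The rule C = 1 − α turns a two-sided P-value into a statement about confidence intervals without any further data. A minimal sketch, with `ci_contains_null` as an illustrative name:

```python
def ci_contains_null(p_value, confidence):
    """For a two-sided test: does the level-C interval contain mu0?

    The interval contains mu0 exactly when H0 is NOT rejected at
    alpha = 1 - C, that is, when p_value > alpha.
    """
    alpha = 1 - confidence
    return p_value > alpha


# Example 6: two-sided P-value of 0.04 for H0: mu = 30.
for c in (0.95, 0.90, 0.99):
    print(f"{c:.0%} CI includes 30: {ci_contains_null(0.04, c)}")
```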

Example 7: A 90% confidence interval for a population mean is (12, 15). a) Can you reject the null hypothesis that H0 : µ = 13 at the 10% significance level? Why? b) Can you reject the null hypothesis that H0 : µ = 10 at the 10% significance level? Why?

Multiple Choice Questions
The P-value for a two-sided test of the null hypothesis H0 : µ = 30 is 0.09. Then:
a) the 99% confidence interval includes the value 30.
b) the 95% confidence interval includes the value 30.
c) the 90% confidence interval does not include the value 30.
d) All of the above are correct.
