Confidence Interval (CI) for a Proportion

Applets: http://www.rossmanchance.com/iscam2/applets/BinomDist/BinomDist.html http://www.shodor.org/interactivate/activities/AdjustableSpinner/?version=1.5.0_06&browser=MSIE&vendor=Sun_Microsystems_Inc

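Critical Values (95% confidence)
Look up 0.975 (area to the left) in the standard Normal table to get the positive z value, z* = 1.96; look up 0.025 for the negative value, −1.96.
[Figure: standard Normal curve with central area 0.95 between −z* and z*, and area 0.025 in each tail.]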
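The same table lookup can be done in software. This is a minimal Python sketch, assuming Python 3.8+ for `statistics.NormalDist`; the helper name `critical_value` is hypothetical, chosen for illustration. It splits the leftover area 1 − C evenly between the two tails and asks for the z value with 1 − tail area to its left.

```python
from statistics import NormalDist

def critical_value(confidence: float) -> float:
    """Positive z* such that the area between -z* and z* is `confidence`."""
    tail = (1 - confidence) / 2             # area in each tail, e.g. 0.025 for 95%
    return NormalDist().inv_cdf(1 - tail)   # area to the LEFT of +z*, e.g. 0.975

for c in (0.90, 0.95, 0.99):
    print(f"{c:.0%}: z* = {critical_value(c):.3f}")
```

This prints 1.645, 1.960 and 2.576 for 90%, 95% and 99% confidence. By symmetry, the negative critical value is just `-critical_value(c)`.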

Mean / SD of a Sample Proportion
The sample proportion $\hat{p} = X/n$ (a statistic) is the count X divided by the sample size n. The sample proportion has mean $\mu_{\hat{p}} = p$ and standard deviation $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$. If both the expected counts of Successes and Failures are at least 10 ($np \ge 10$ and $n(1-p) \ge 10$), $\hat{p}$ has an approximately Normal distribution.
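A quick simulation can check both facts. This is a minimal Python sketch (Python 3.9+, standard library only). The population proportion p = 0.2, the sample size n = 400, the 5000 repetitions, and the helper name `simulate_p_hats` are illustrative assumptions, not values from the slides. The simulated $\hat{p}$ values should average close to p, with an SD close to $\sqrt{0.2(0.8)/400} = 0.02$.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

def simulate_p_hats(p: float, n: int, reps: int, seed: int = 0) -> list[float]:
    """Draw `reps` random samples of size n (success rate p); return each X/n."""
    rng = random.Random(seed)
    # rng.random() < p is True (a Success) with probability p
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

p, n = 0.2, 400                      # hypothetical values; np = 80 and n(1 - p) = 320
p_hats = simulate_p_hats(p, n, reps=5000)
print("mean of p-hats:", round(fmean(p_hats), 4), " theory:", p)
print("SD of p-hats:  ", round(pstdev(p_hats), 4),
      " theory:", round(sqrt(p * (1 - p) / n), 4))
```

A histogram of `p_hats` would also look roughly Normal, since both expected counts are well above 10.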

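C% Confidence Interval
Before sampling: with approximately C% = (1 − α)100% probability, the sample proportion falls within $z^*$ standard deviations of p:
$p - z^*\sqrt{p(1-p)/n} < \hat{p} < p + z^*\sqrt{p(1-p)/n}$
Given results from an appropriately obtained sample: with approximately C% confidence, p lies within the error margin of $\hat{p}$:
$\hat{p} - z^*\sqrt{\hat{p}(1-\hat{p})/n} < p < \hat{p} + z^*\sqrt{\hat{p}(1-\hat{p})/n}$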

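Confidence Interval for a (Population) Proportion p
$\hat{p} \pm z^*\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
Here $\hat{p} = X/n$ is the sample proportion, and the z value, $z^*$, comes from the confidence C% (for example, $z^* = 1.96$ for 95%).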
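The formula translates directly into code. This is a minimal Python sketch, assuming Python 3.9+; the function name `proportion_ci` is hypothetical, and z* is computed from the standard Normal with `statistics.NormalDist` instead of being read from a table. (This Normal-based interval is often called the Wald interval.)

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(x: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Normal-based CI for p: p_hat +/- z* * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n                                             # sample proportion
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # e.g. 1.96 for 95%
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)           # error margin
    return p_hat - margin, p_hat + margin

low, high = proportion_ci(79, 368)          # smoking data: 79 smokers out of 368
print(f"({low:.3f}, {high:.3f})")
```

Passing `confidence=0.90` (or any other level) changes only z*; the sample proportion and its standard error stay the same.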

C% Confidence Interval for p
Conditions for use of this method:
- Random sample from a categorical population with 2 categories (Success / Failure).
- If sampling without replacement: population at least 20 times the sample size.
- At least 10 Successes and 10 Failures in the sample (this ensures that the Normal approximation is appropriate).
When we collect data from 1 random sample and compute the sample proportion $\hat{p}$, the interval of values $\hat{p} \pm z^*\sqrt{\hat{p}(1-\hat{p})/n}$ is a C% confidence interval (CI) for p.*
* p, in this type of application, is unknown.
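The count-based conditions can be checked mechanically; randomness cannot, since it depends on how the data were collected. This is a minimal Python sketch, assuming Python 3.9+; the function name `check_conditions` and its parameters are hypothetical. `min_count` defaults to 10 as on this slide (later slides accept 5), and `population_size=None` stands for "sampling with replacement, or a population known to be huge".

```python
from typing import Optional

def check_conditions(x: int, n: int, population_size: Optional[int] = None,
                     min_count: int = 10) -> list[str]:
    """Return the failed count-based conditions (an empty list means they hold).

    Whether the sample is random must be judged from the study design.
    """
    problems = []
    if population_size is not None and population_size < 20 * n:
        problems.append(f"population {population_size} is less than 20n = {20 * n}")
    if x < min_count:
        problems.append(f"only {x} Successes (need at least {min_count})")
    if n - x < min_count:
        problems.append(f"only {n - x} Failures (need at least {min_count})")
    return problems

print(check_conditions(79, 368))                        # smoking example: no problems
print(check_conditions(3, 50, population_size=500))     # hypothetical: two problems
```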

Example
What proportion of students smoke?
p = ?? (unknown), where p = probability a student smokes = proportion of all students who smoke.
N = ?? (the population size is also unknown).
A simple random sample of 368 students is surveyed: n = 368.
X = number of sampled students who smoke; X varies depending on the sample.

The survey is conducted: 79 of the 368 are smokers. X = 79 is observed for this one sample (other samples would yield somewhat different values), and n − X = 368 − 79 = 289 are nonsmokers.
$\hat{p} = 79/368 = 0.215$. This is not p (it is nearly impossible for it to equal p exactly): $\hat{p} = 0.215$ is the statistic estimating the parameter p, that is, the sample (observed) proportion, or the point estimate of p.

A simple random sample of 368 students finds that 79 smoke. Obtain a 95% confidence interval for the proportion of all students who smoke (the book would say: the true proportion of students who smoke).
Check conditions:
- Each person in the population is a Success (smoker) or Failure (nonsmoker), and the sample is random.
- The population is huge (much bigger than 20(368) = 7,360).
- There are 79 smokers and 289 nonsmokers; both are well above 5.
The confidence interval based on the Normal distribution can be used.

Error margin: $1.96\sqrt{0.215(0.785)/368} = 0.042$, so the interval is 0.215 ± 0.042 (within 0.042 of 0.215).
0.215 − 0.042 = 0.173 and 0.215 + 0.042 = 0.257, so 0.173 < p < 0.257: between 0.173 and 0.257.
Interpretation: We are approximately 95% confident that the proportion of all* students who smoke is between 0.173 and 0.257.
*It's important to say "all" or "population proportion".
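The arithmetic for this example, step by step, as a minimal Python sketch (standard library only). It uses the table value z* = 1.96 and keeps full precision in the intermediate results, rounding only when printing.

```python
from math import sqrt

x, n = 79, 368
z_star = 1.96                          # 95% critical value from the Normal table
p_hat = x / n                          # sample proportion, about 0.215
se = sqrt(p_hat * (1 - p_hat) / n)     # standard error of p_hat
margin = z_star * se                   # error margin, about 0.042
lower, upper = p_hat - margin, p_hat + margin
print(f"p_hat = {p_hat:.3f}, margin = {margin:.3f}, CI = ({lower:.3f}, {upper:.3f})")
```

This prints `p_hat = 0.215, margin = 0.042, CI = (0.173, 0.257)`, matching the hand computation.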

Confidence Interval Example
Proper formatting of CIs:
- 0.215 ± 0.042
- (0.173, 0.257)
- 0.173 to 0.257
- 0.173 < p < 0.257
For the last three, the low value is written first. All CIs should be accompanied by a statement interpreting them, including the confidence level (here 95%) and an indication that you are making a statement about an unknown parameter, p = population proportion.

Confidence Interval for p
To use this formula we need:
- A random sample from a categorical population with 2 categories (Success / Failure).
- If sampling without replacement: population at least 20 times the sample size.
- At least 5 Successes and 5 Failures in the sample (this ensures that the Normal approximation is appropriate; 10 is better).
If one of these is violated, the confidence is not really the value of C used in the formula. If the sample is not random, the confidence associated with this method could be anything, but it is likely to be much lower than C.

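Example 2
A marketer works for an electronics store. He wishes to estimate the percentage of coupons that will be redeemed at the stores. 927 customers are randomly sampled and sent coupons; 27 of them redeem their coupon. Obtain a 90% CI for the proportion of all customers who redeem this coupon.
Check conditions: the customers are randomly sampled, and each either redeems (Success) or does not (Failure); the population of customers must be at least 20(927) = 18,540; there are 27 Successes and 900 Failures, both at least 5.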

Example 2
Of the 927 coupons, 27 are redeemed: $\hat{p} = 27/927 = 0.0291$.
For 90% confidence, $z^* = 1.645$; 1.64 or 1.65, either is fine.
Don't overround. Keep at least 3 significant figures in intermediate results.

Example 2 Error margin
$z^*\sqrt{\hat{p}(1-\hat{p})/n} = 1.645\sqrt{0.0291(0.9709)/927} = 0.00908$, where 0.9709 = 1 − 0.0291 is the proportion not redeemed.
Be careful: these numbers can be small. Keep at least 3 significant figures.

Example 2
Of the 927 coupons, 27 are redeemed: 0.0291 ± 0.00908, that is, (0.0200, 0.0382). We are (approximately) 90% confident that between 2.00% and 3.82% of all coupons will be redeemed.
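The coupon example as a minimal Python sketch, assuming Python 3.8+ for `statistics.NormalDist`. Here z* is computed (about 1.6449) rather than rounded to 1.64 or 1.65, and nothing is rounded until the final print, which is the "don't overround" advice carried to its limit.

```python
from math import sqrt
from statistics import NormalDist

x, n, confidence = 27, 927, 0.90
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.645
p_hat = x / n                                              # about 0.0291, kept at full precision
margin = z_star * sqrt(p_hat * (1 - p_hat) / n)            # about 0.00908
lower, upper = p_hat - margin, p_hat + margin
print(f"{lower:.2%} to {upper:.2%}")
```

This prints `2.00% to 3.82%`, agreeing with the slide.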

Example 3
What proportion of voters currently approve of the President's handling of the economic situation?
p = probability a voter approves = proportion of all voters who approve.
A random sample of 1000 likely voters is taken, using random digit dialing: n = 1000, and X = number of sampled voters who approve.

Example 3
p = ?? n = 1000 X = number of sampled voters who approve (varies).
The survey is conducted: 557 of the sampled voters approve, so $\hat{p} = X/n = 557/1000 = 0.557$.
0.557 is the statistic estimating the parameter p.

Example 3
Obtain a 95% confidence interval for the proportion of all voters who approve.
1st: Check conditions:
- Random sample from a categorical population with 2 categories (Success / Failure).
- Population at least 20 times the sample size.
- At least 5 Successes and 5 Failures in the sample (this ensures that the Normal approximation is appropriate; 10 is better).
The numbers of Successes and Failures are 557 and 443 respectively, both well above 5. The confidence interval based on the Normal distribution can be used.

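Summary of Information
n = 1000, $\hat{p} = 0.557$, $z^* = 1.96$ (95% confidence).
Error margin: $1.96\sqrt{0.557(0.443)/1000} = 0.031$.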

Final Numbers
Within 0.031 of 0.557: 0.557 − 0.031 = 0.526 and 0.557 + 0.031 = 0.588, so between 0.526 and 0.588.
Any of these suffices:
- 0.557 ± 0.031
- (0.526, 0.588)
- 0.526 to 0.588
- 0.526 < p < 0.588
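Example 3 in code, printing the interval in each of the accepted formats. A minimal Python sketch using the table value z* = 1.96; the printed strings simply mirror the formats listed above.

```python
from math import sqrt

x, n, z_star = 557, 1000, 1.96
p_hat = x / n                                    # 0.557
margin = z_star * sqrt(p_hat * (1 - p_hat) / n)  # about 0.031
lower, upper = p_hat - margin, p_hat + margin

# Equivalent ways to write the interval (low value first in the last three)
print(f"{p_hat:.3f} ± {margin:.3f}")
print(f"({lower:.3f}, {upper:.3f})")
print(f"{lower:.3f} to {upper:.3f}")
print(f"{lower:.3f} < p < {upper:.3f}")
```

The lower bound is 0.5262 before rounding, so 0.526 is the correct three-decimal value.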

Assessing the Error Margin The error margin covers random sampling errors. It does not account for errors due to improper sampling, or inaccurate data collection. Is the sample drawn from a collection of units that may not be representative of the entire population? If so, perhaps the interval is appropriate for the population defined by that collection. That is: Define a new (reduced) population. Is any judgment required in categorizing the units as Success and Failure?

What Confidence Means Imagine a population for which 45% of the population approves of the state's governor. Consider all samples of size n = 1000 from this population. For each sample a 90% CI is obtained. Before any sampling is done, before any data is collected: the probability of a randomly chosen sample giving a CI that "covers" the parameter p = 0.45 is 0.90.

What Confidence Means
[Figure: 90% CIs from many random samples (black intervals) plotted against a blue line at p = 0.45.]
90% of the black intervals cover the blue line at p. The probability of a random sample giving a CI that "covers" the parameter p = 0.45 is 0.90: 90% of all 90% CIs "cover" the parameter being estimated.

What Confidence Means A histogram of the black dots (the sample proportions at the centers of the intervals) would be approximately Normal, with mean 0.45. Approximately 10% of the time, the black dot would be far enough from 0.45 that the interval (roughly ± 0.026) would not cover 0.45.
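The "90% of intervals cover p" picture can be reproduced by simulation. This is a minimal Python sketch (Python 3.8+, standard library only); the helper name `coverage`, the seed, and the 2000 repetitions are illustrative choices. It draws many samples of n = 1000 from a population with p = 0.45, builds a 90% CI from each, and reports the fraction that cover 0.45.

```python
import random
from math import sqrt
from statistics import NormalDist

def coverage(p: float, n: int, confidence: float, reps: int, seed: int = 1) -> float:
    """Fraction of simulated samples whose Normal-based CI covers the true p."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(reps):
        x = sum(rng.random() < p for _ in range(n))      # one random sample of size n
        p_hat = x / n                                     # the "black dot"
        margin = z_star * sqrt(p_hat * (1 - p_hat) / n)   # roughly 0.026 here
        hits += (p_hat - margin <= p <= p_hat + margin)   # does it cover the blue line?
    return hits / reps

cov = coverage(p=0.45, n=1000, confidence=0.90, reps=2000)
print(cov)
```

The result should come out close to 0.90, not exactly, because the simulation uses a finite number of samples.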

What Confidence Means
Example 4: A random sample of 2136 adults was asked, "Do you favor or oppose abolishing the penny?" 59% answered "oppose."
0.59 × 2136 = 1260.24, so 1260 answered "oppose."
1260/2136 = 0.5899 = 0.590 to 3 significant digits.
59.0% ± 2.1% is a 95% confidence interval.
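Example 4 in code. A minimal Python sketch; like the slide, it reconstructs the count from the reported (rounded) 59% before computing the interval with z* = 1.96.

```python
from math import sqrt

n = 2136
x = round(0.59 * n)          # 1260.24 -> 1260 answered "oppose"
p_hat = x / n                # 0.5899, i.e. 0.590 to 3 significant digits
margin = 1.96 * sqrt(p_hat * (1 - p_hat) / n)
print(f"{p_hat:.1%} ± {margin:.1%}: ({p_hat - margin:.1%}, {p_hat + margin:.1%})")
```

This prints `59.0% ± 2.1%: (56.9%, 61.1%)`.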

What Confidence Means We are 95% confident that between 56.9% and 61.1% of all American adults oppose abolishing the penny. (p represents this unknown proportion.) In a real study: Exactly one random sample is chosen. Once the data are recorded, there is nothing random (certainly p is not random). The location of the "blue line" is unknown: it exists, we just don't know where. We don't know whether or not p is covered (the probability is either 0 or 1). We use the word confidence after the random sample is drawn. We don't use the word probability (unless we are explaining what confidence is).

Quiz A sample of jokes from The Daily Show found that 83 of 252 were of a political nature. Assume this was a random sample from all jokes. Then a 95% confidence interval is (0.271, 0.387). Answer true or false.

Quiz statement 1: 95% of jokes are political in nature. FALSE. 95% is the confidence we have in the result; it has nothing to do with the prevalence of political jokes on The Daily Show (in the sample or in the entire population).

Quiz statement 2: We are 95% confident that between 27.1% and 38.7% of the sampled jokes were political in nature. FALSE. 83/252 = 0.329 is the sample proportion. It is the center of the interval, so it always lies within the bounds: the probability is 100%, not 95%.

Quiz statement 3: The confidence is 0.95 that another random sample of jokes would have between 0.271 and 0.387 of the jokes political in nature. FALSE. Confidence intervals are not intended to predict what will happen with other random samples. They estimate a parameter (in this case, p).

Quiz statement 4: The probability is 0.95 that between 0.271 and 0.387 of all jokes on The Daily Show are political in nature. FALSE. The probability is either 0 or 1; we just don't know what p is. Probability refers to an outcome that has uncertainty due to randomness. The uncertainty here is due to ignorance.

Quiz statement 5: 95% of all samples of The Daily Show jokes give an interval that covers p = the proportion of all jokes that are political in nature. TRUE! Our 1 sample, randomly drawn, gives (0.271, 0.387). We don't know if p is in there or not, but we are 95% confident it is. That's it!
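The interval quoted in the quiz can be checked directly. This is a minimal Python sketch with z* = 1.96; the final `assert` illustrates why statement 2 is false for a structural reason: $\hat{p}$ is the midpoint of the interval, so it is inside it with certainty.

```python
from math import sqrt

x, n = 83, 252
p_hat = x / n                                   # 0.329: the sample proportion
margin = 1.96 * sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin
print(f"({lower:.3f}, {upper:.3f})")            # the interval quoted in the quiz

# p_hat is the center of the interval, so it always lies inside it
assert lower < p_hat < upper and abs((lower + upper) / 2 - p_hat) < 1e-12
```

This prints `(0.271, 0.387)`.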

Polls apart: Why polls vary on presidential race
The groups pollsters randomly choose to interview are bound to differ from each other, and sometimes do significantly. Every poll has a margin of sampling error, usually around 3 percentage points for 1,000 people.* That means the results of a poll of 1,000 people should fall within 3 points of the results you would expect had the pollster instead interviewed the entire population of the U.S. But (and this is important) the results are expected to be that accurate only 95 percent of the time. That means that one time in 20, pollsters expect to interview a group whose views are not that close** to the overall population's views.
* Using p = 0.5 at 95% confidence gives n ≈ 1068 for a 3-point margin.
** Not within the error margin.
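Both footnoted numbers follow from the error-margin formula. This is a minimal Python sketch; it uses the conservative guess p = 0.5 (where p(1 − p) is largest) and z* = 1.96, and solves $z^*\sqrt{p(1-p)/n} \le m$ for n, rounding up.

```python
from math import ceil, sqrt

z_star = 1.96          # 95% confidence
p_guess = 0.5          # conservative: p(1 - p) is largest at p = 0.5

# Error margin for a poll of 1,000 people
margin_1000 = z_star * sqrt(p_guess * (1 - p_guess) / 1000)
print(f"margin for n = 1000: {margin_1000:.3f}")

# Smallest n with margin at most m: n >= (z*/m)^2 * p(1 - p)
m = 0.03
n_needed = ceil((z_star / m) ** 2 * p_guess * (1 - p_guess))
print("n needed for a 3-point margin:", n_needed)
```

This gives a margin of about 0.031 (3 points) for n = 1000, and n = 1068 for a margin of at most 3 points.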

Example Suppose we randomly sample people for a telephone poll on the issue of Presidential approval. We’ll sample 1000 people, using 95% confidence. People of different political leanings have systematically different behaviors. Refusing telephone surveys is one such behavior.

Example Suppose (to oversimplify) that in the population 88 million people approve of the President and 72 million disapprove. So the President’s approval rating is p = 88/160 = 0.55. But… The people that approve of the President are crankier than those that do not. They are less likely to put up with an intruding phone call. In fact, 40% of the approvers will not respond (that’s 35.2 million people). The disapprovers are more willing to take the call: only 10% of them will refuse (that’s 7.2 million people).

Example
(millions)   Approve   Disapprove   Total
Respond        52.8       64.8      117.6
Refuse         35.2        7.2       42.4
Total          88.0       72.0      160.0
Among everyone, the approval rate is 88/160 = 55%. Among responders, the approval rate is 52.8/117.6 = 45%. The CI formed from the data estimates 45% (not 55%).

How Confident Are We? The probability of a random sample giving a CI that "covers" the parameter p = 0.55 is essentially 0 (and certainly not even close to 0.95). The sample proportion is a biased (to the low side) estimate of the population proportion p = 0.55. Statistical bias is procedural, not "individual": you may (though it is unlikely) use the wrong method and get the right answer, and it is still a biased method. If you use the right method and get the wrong answer (which happens only 5% of the time), your method is not biased.
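A simulation makes the nonresponse bias concrete. This is a minimal Python sketch built on the example's oversimplified population (in millions); it assumes each poll reaches 1000 responders drawn at random from the responder pool, which is treated as large enough to sample with replacement. The seed and the 1000 repetitions are illustrative choices.

```python
import random
from math import sqrt

# Population from the example (millions); only responders ever reach the data
approve_respond, disapprove_respond = 52.8, 64.8
p_true = 88.0 / 160.0                                                     # 0.55 among everyone
p_responders = approve_respond / (approve_respond + disapprove_respond)  # about 0.449

rng = random.Random(2)
n, reps, covered = 1000, 1000, 0
for _ in range(reps):
    # each poll samples responders only, so p_hat centers on about 0.449, not 0.55
    x = sum(rng.random() < p_responders for _ in range(n))
    p_hat = x / n
    margin = 1.96 * sqrt(p_hat * (1 - p_hat) / n)
    covered += (p_hat - margin <= p_true <= p_hat + margin)

print(f"approval among responders: {p_responders:.3f}")
print(f"fraction of 95% CIs covering p = 0.55: {covered / reps:.3f}")
```

The "95%" intervals essentially never cover 0.55: the error margin accounts for random sampling variability only, not for who declines to answer.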

Our Confidence is Shot We’d have 0% confidence in such a procedure. CIs handle only “errors” due to randomization. It should not be (but IS!) called “margin of error.” It should be called “margin of variability (at 95% confidence).” If other errors exist and aren’t accounted for, the confidence you have should be (probably much) lower than the stated confidence. Many other types of errors are very difficult to account for in a scientific way.

Polls apart: Why polls vary on presidential race Q: Don't pollsters simply ask questions, tally the answers and report them? No. …they adjust the answers* to make sure they reflect Census Bureau data… But some pollsters make these adjustments differently than others. * Not really. They adjust the percentages. The individual respondents' answers are sacred.

Polls apart: Why polls vary on presidential race …in a country where barely more than half of eligible voters usually show up for presidential elections, pollsters want their polls to reflect the views of those likeliest to vote. Q: Is that hard to do? A: Quite hard… …nobody is 100 percent sure how to do this properly. And the challenge is being compounded this year because many think Obama's candidacy could spark higher turnout than usual from certain voters, including young voters and minorities. The question pollsters face is whether, and how, to adjust their tests for likely voters to reflect this.

Polls apart: Why polls vary on presidential race Q: Are people always willing to tell pollsters who they're supporting for president? A: No, and that's another possible source of discrepancies. Some polling organizations gently prod people who initially say they're undecided for a presidential preference, others do it more vigorously. The AP's poll, for example, found 9 percent of likely voters were undecided, while the ABC-Post survey had 2 percent.

Love, Sex and the Changing Landscape of Infidelity …surveys appearing in sources like women’s magazines may overstate the adultery rate, because they suffer from what pollsters call selection bias: the respondents select themselves and may be more likely to report infidelity.