Hypothesis Testing for Proportions

Some Terminology
Because we are doing proportions today, there is going to be lots of notation involving the letter p, so let’s be clear about what we mean:
$p$: the “true” proportion in the population (which we don’t know)
$\hat{p}$: the proportion in the sample
$p_0$: the null hypothesis value for the true proportion
P-value: the probability of getting a result at least as extreme as the one we got, if the null hypothesis is true

Refresher from last class
Remember that we are asking: “if the null hypothesis were true, how likely would we be to get a result as extreme as the one we got?” We answer this with a p-value. If the p-value is less than α, we reject the null hypothesis and accept the alternative.

Z-scores
Remember z-scores? (value – mean) / standard deviation:
$Z = \frac{x - \mu}{\sigma}$
The formula that we are going to use today for a test statistic looks different than this, but it really isn’t. It is just using the mean and standard deviation of the sampling distribution.
Sampling distribution: the distribution of means if we took MANY samples from the population. So the mean of the sampling distribution is like the mean of the means, and the standard deviation is the standard deviation of the means.

Which Sampling Distribution?
Remember from 2 chapters ago (and last chapter):
The mean of our sampling distribution is $p$. Since we typically don’t know $p$, we use $\hat{p}$ as an estimate of $p$.
The standard deviation of our sampling distribution is $\sqrt{\frac{p(1-p)}{n}}$. Since we typically don’t know $p$, we have used $\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$, known as the standard error.

Which Sampling Distribution?
When doing hypothesis tests, we are (temporarily) assuming that the null hypothesis is true. So we will actually use $p_0$ instead of $p$ or $\hat{p}$ when calculating the standard deviation. So the standard deviation of our sampling distribution would be $\sqrt{\frac{p_0(1-p_0)}{n}}$.
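To make the difference between the two formulas concrete, here is a minimal Python sketch. The function names `null_sd` and `standard_error` and the numbers plugged in are illustrative assumptions, not something from the slides.

```python
import math

def null_sd(p0: float, n: int) -> float:
    """Standard deviation of the sampling distribution, assuming H0 (p = p0) is true."""
    return math.sqrt(p0 * (1 - p0) / n)

def standard_error(p_hat: float, n: int) -> float:
    """Standard error: the same formula with the sample proportion plugged in."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical numbers for illustration only: p0 = 0.5, p_hat = 0.6, n = 100
print(round(null_sd(0.5, 100), 4))         # 0.05
print(round(standard_error(0.6, 100), 4))  # 0.049
```

The two values are close but not identical, which is why it matters which one goes into a hypothesis test.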

Basic Idea
Here’s what we are about to do:
Once we have a (normal) sampling distribution, we can calculate the probability of being in a certain range of that distribution. Specifically, the probability of getting a result at least as far from the mean as the one we actually got. THIS IS A P-VALUE!
We are going to do this in 2 ways:
By hand using z-scores (“test statistics”)
Using a calculator function

One Sample Z Test for a Proportion (by hand)
Step 1: Decide on your hypotheses and alpha level.
Step 2: Take a random sample from your population of interest.
Step 3: Make sure your conditions/requirements are met. Since we are doing proportions: $n(1-p_0) \ge 10$, $n p_0 \ge 10$, and $n \le \frac{\text{pop.}}{10}$
Step 4: Calculate your z-score (“test statistic”).
General form of a z-score: $Z = \frac{x - \mu}{\sigma}$
Our ‘value’ ($x$) is the proportion that we get in our sample: $\hat{p}$
The mean ($\mu$) is the mean of the sampling distribution: $p_0$
The standard deviation ($\sigma$) is the standard deviation of the sampling distribution: $\sqrt{\frac{p_0(1-p_0)}{n}}$

This leaves us with a z-score (“test statistic”) formula of:
$Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$
My recommendations: Don’t think of this as one whole formula; break it up into chunks. We will do this in a minute.
Doing hypothesis tests by hand is the more tedious way to do it, but it is also MUCH better at helping us to understand what is really going on.
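One way to follow the "break it up into chunks" advice is to compute the numerator and denominator separately. The sketch below assumes a hypothetical function name `z_statistic` and made-up inputs; it is an illustration, not the slides' own procedure.

```python
import math

def z_statistic(p_hat: float, p0: float, n: int) -> float:
    """One-proportion z test statistic, computed in chunks."""
    numerator = p_hat - p0                      # chunk 1: how far the sample is from H0
    denominator = math.sqrt(p0 * (1 - p0) / n)  # chunk 2: SD of the sampling distribution under H0
    return numerator / denominator              # chunk 3: distance measured in standard deviations

# Hypothetical inputs for illustration: sample proportion 0.55, null value 0.5, n = 100
print(round(z_statistic(0.55, 0.5, 100), 3))  # 1.0
```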

Step 5: Once you have a test statistic, you can use Table A or Normalcdf to figure out the probability of getting a result at least that extreme. This is your p-value.
Step 6: Reject or fail to reject your null hypothesis based on the p-value.

Let’s try an example
Remember Karl from last class? He’s back, and he’s still claiming that he makes 80% of his free throws. You still don’t believe him; you think he makes less than 80%. When you go to the gym with him, he only makes 32/50 free throws. What can we conclude about his claim?
Step 1: State our hypotheses and alpha level
$H_0: p = 0.80$
$H_a: p < 0.80$
α = 0.05 (assumed, since the question didn’t tell us)

Step 2: Take a random sample
DONE: we are assuming that these 50 free throws are a random sample from all of his potential free throws.

Step 3: Check conditions: $n(1-p_0) \ge 10$, $n p_0 \ge 10$, and $n \le \frac{\text{pop.}}{10}$
$50(1-0.8) = 10 \ge 10$
$50(0.8) = 40 \ge 10$
$50 \le \frac{\text{inf.}}{10}$ (the population of all his potential free throws is effectively infinite)
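The three condition checks can be sketched in Python as follows. The function name `check_conditions` and the small tolerance are assumptions made for this illustration; the tolerance is there because floating-point arithmetic gives 50 × (1 − 0.8) = 9.999999999999998 rather than exactly 10.

```python
import math

def check_conditions(n: int, p0: float, population_size: float = math.inf) -> dict:
    """Check the large-counts and 10% conditions for a one-proportion z test."""
    tol = 1e-9  # guards against floating-point rounding at the boundary
    return {
        "n*p0 >= 10": n * p0 >= 10 - tol,
        "n*(1-p0) >= 10": n * (1 - p0) >= 10 - tol,
        "n <= population/10": n <= population_size / 10,
    }

# Karl's free throws: n = 50, p0 = 0.8, effectively infinite population
for name, ok in check_conditions(50, 0.8).items():
    print(name, ok)
```

All three checks come out True for Karl's example, matching the by-hand work.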

Step 4: Calculate the test statistic (z-score)
Value: $\hat{p} = 32/50 = 0.64$
Mean: $p_0 = 0.8$
St dev: $\sqrt{\frac{0.8(1-0.8)}{50}} = 0.0566$
So $Z = \frac{0.64 - 0.8}{0.0566} = -2.827 \approx -2.83$
So our result is 2.83 standard deviations BELOW the mean of the sampling distribution, if the null hypothesis were true.
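As a check on the arithmetic, here is Karl's test statistic computed in Python. The variable names are illustrative. The sketch also shows that rounding the standard deviation to 0.0566 before dividing (as in the by-hand work) shifts the third decimal place slightly, although both versions round to −2.83.

```python
import math

made, attempts, p0 = 32, 50, 0.80

p_hat = made / attempts                   # sample proportion
sd = math.sqrt(p0 * (1 - p0) / attempts)  # SD of the sampling distribution under H0

z_exact = (p_hat - p0) / sd                # no intermediate rounding
z_rounded = (p_hat - p0) / round(sd, 4)    # SD rounded to 0.0566 first

print(round(sd, 4))         # 0.0566
print(round(z_exact, 3))    # -2.828
print(round(z_rounded, 3))  # -2.827
```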

Definition
Test Statistic: How many standard deviations a sample deviates from what we would expect if the null hypothesis were true. It is just a Z-score.

Step 5: Find the probability of getting a result at least as extreme as the one we did (Z = −2.83)
Table A: find Z = −2.83 (row −2.8 along the left side, column .03 along the top). P-value = 0.0023
Normalcdf: normalcdf(−BIG, −2.83, 0, 1), where −BIG is a very large negative number standing in for negative infinity. P-value ≈ 0.0023
So this means that if the null hypothesis were true, there would only be a 0.23% chance of getting a sample as extreme as the one we got.
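For readers without a graphing calculator, the same lower-tail probability can be sketched with Python's standard library. `statistics.NormalDist` is a real standard-library class; using it as a stand-in for Table A or normalcdf is this example's own choice, not something from the slides.

```python
from statistics import NormalDist

z = -2.83

# Area under the standard normal curve to the left of z,
# the counterpart of normalcdf(very large negative number, -2.83, 0, 1)
p_value = NormalDist(mu=0, sigma=1).cdf(z)

print(round(p_value, 4))  # 0.0023
```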

Step 6: Reject or fail to reject the null hypothesis
Earlier we said that alpha was 0.05. Because 0.0023 < 0.05, we can reject the null hypothesis, and accept the alternative instead.
We conclude that Karl makes less than 80% of his free throws.

Another Example

Solving it:
Step 1: State hypotheses. $H_0: p = 0.08$, $H_a: p > 0.08$
Step 2: Take a random sample. Already complete.
Step 3: Check conditions. $500(0.08) = 40 \ge 10$ and $500(1-0.08) = 460 \ge 10$. In order to satisfy the 10% condition, there must be at least 5,000 potatoes in the truck, which is probably reasonable.
Step 4: Compute the test statistic. $Z = 1.15$
Step 5: Compute the p-value. Normalcdf(1.15, BIG, 0, 1) = 0.125, so p-value = 0.125
Step 6: Draw conclusions (reject or fail to reject). Because 0.125 > 0.10, we FAIL TO REJECT THE NULL HYPOTHESIS that only 8% of the potatoes have blemishes. The producer therefore decides NOT to send these potatoes back; they will be made into potato chips.

Another Example (Two-sided test)
Step 1: Hypotheses. $H_0: p = 0.50$, $H_a: p \ne 0.50$
Step 2: Complete.
Step 3: $150(0.5) = 75 \ge 10$ (the same count for successes and failures, since $p_0 = 1 - p_0 = 0.5$); large high school, so at least 1,500 students.
Do step 4 on your own: find the test statistic.

Z=2.449

Two-sided tests
So, in general, when we are doing a 2-sided test (also known as a two-tailed test), we find the p-value just like we would for a one-sided test. Then we double it to get the two-sided p-value.
So for the one-sided p-value we get 0.007. Once we double it, our actual p-value is 0.014.
So at the α=0.05 level, we would reject the null hypothesis that 50% of students at the high school have never smoked. We would therefore conclude that the true proportion of students who have never smoked IS NOT EQUAL TO 50%.
Don’t say that it is greater than 50%, because then it would be a one-sided test.
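The doubling step can be sketched in Python using the test statistic Z = 2.449 from this example. The variable names are illustrative, and `statistics.NormalDist` is used here as a stand-in for the calculator's normalcdf.

```python
from statistics import NormalDist

z = 2.449

# One-sided p-value: area in the tail beyond the observed z
one_sided = 1 - NormalDist().cdf(z)

# Two-sided p-value: count the matching tail on the other side as well
two_sided = 2 * one_sided

print(round(one_sided, 3), round(two_sided, 3))  # 0.007 0.014
```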

Using a Confidence Interval to Provide Extra Information
When we do a hypothesis test, we either reject the null or we don’t. When we do reject the null, we might want a bit more information about what the true value actually might be. Providing a confidence interval can accomplish this.
So for the last example, if we construct a confidence interval (either by hand or using 1-PropZInt on our calculator):
1-PropZInt(90, 150, .95)
We get (0.5216, 0.6784)
So we are 95% confident that the proportion of students at the school who have never smoked is between 52.16% and 67.84%.
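The interval above can be reproduced by hand with the usual formula, sample proportion plus or minus a critical value times the standard error. The sketch below is a Python version of that calculation; the function name `one_prop_z_interval` is a hypothetical name chosen to echo the calculator command.

```python
import math
from statistics import NormalDist

def one_prop_z_interval(x: int, n: int, confidence: float) -> tuple:
    """Confidence interval for a proportion: p_hat +/- z* times the standard error."""
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)               # uses p_hat, not p0
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)   # about 1.96 for 95%
    return p_hat - z_star * se, p_hat + z_star * se

low, high = one_prop_z_interval(90, 150, 0.95)
print(round(low, 4), round(high, 4))  # 0.5216 0.6784
```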

Another way to Use a Confidence Interval
Thinking back to last chapter, we would construct a confidence interval, and if a value was not in that interval, it was not considered a “plausible” value, so we concluded that it wasn’t likely to be true. This is essentially the same idea as a hypothesis test.
So if we use our sample to construct a confidence interval, and then the value for the null hypothesis is not in the range, we conclude that it is not a plausible value. Saying that this null hypothesis is not plausible is roughly the same thing as rejecting the null hypothesis.
So if a null hypothesis value is not in a 95% confidence interval, it is roughly the same thing as rejecting at an α=0.05 level in a two-sided test. If it is not in a 99% confidence interval, it is roughly the same thing as rejecting at an α=0.01 level. Etc.
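As a rough illustration of this idea, the sketch below checks whether the null value 0.50 from the smoking example falls inside the 95% and 99% intervals. The function name `null_is_plausible` is hypothetical. One reason the match with a hypothesis test is only "roughly" exact is that the interval uses the sample proportion in its standard error, while the test uses the null value.

```python
import math
from statistics import NormalDist

def null_is_plausible(x: int, n: int, p0: float, confidence: float) -> bool:
    """Return True if p0 lies inside the confidence interval built from the sample."""
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    z_star = NormalDist().inv_cdf((1 + confidence) / 2)
    return p_hat - z_star * se <= p0 <= p_hat + z_star * se

# Smoking example: 90 of 150 students, null value 0.50
print(null_is_plausible(90, 150, 0.50, 0.95))  # False: like rejecting at alpha = 0.05
print(null_is_plausible(90, 150, 0.50, 0.99))  # True: like failing to reject at alpha = 0.01
```

This agrees with the two-sided p-value of 0.014, which is below 0.05 but above 0.01.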

Using your Calculator
STAT---TESTS---1-PropZTest
$p_0$: 0.5 (your null hypothesis)
$x$: 90 (your number of “successes”)
$n$: 150 (your sample size)
Prop: $\ne p_0$ (what type of alternative hypothesis do you have?)
Then hit calculate. It gives us Z = 2.449 and p-value = 0.014, the same as when we did it by hand.
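For comparison with the calculator, here is a minimal Python sketch of what a one-proportion z test computes. The function name `one_prop_z_test` and the `alternative` labels are assumptions made for this illustration; this is not the calculator's actual implementation.

```python
import math
from statistics import NormalDist

def one_prop_z_test(x: int, n: int, p0: float, alternative: str = "two-sided") -> tuple:
    """Return (z, p_value) for a one-proportion z test."""
    p_hat = x / n
    sd = math.sqrt(p0 * (1 - p0) / n)   # SD under the null hypothesis
    z = (p_hat - p0) / sd
    lower_tail = NormalDist().cdf(z)
    if alternative == "less":           # Ha: p < p0
        p_value = lower_tail
    elif alternative == "greater":      # Ha: p > p0
        p_value = 1 - lower_tail
    elif alternative == "two-sided":    # Ha: p != p0
        p_value = 2 * min(lower_tail, 1 - lower_tail)
    else:
        raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")
    return z, p_value

z, p_value = one_prop_z_test(90, 150, 0.5, "two-sided")
print(round(z, 3), round(p_value, 3))  # 2.449 0.014
```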

By hand vs. on our calculator
By hand
Pros: Easier to show work (just write down what you’re doing). Actually understand the calculation that is being done.
Cons: Takes more time. More calculations. Have to remember how to get the test statistic.
1-PropZTest
Pros: Quicker. Don’t have to remember how to calculate the test statistic.
Cons: Less obvious how to show work. Sometimes we don’t understand what the result actually means.

Getting Full Credit if you use your Calculator
Still need to state your hypotheses.
Still need to show that conditions have been met (unless the question tells you to “assume that all conditions for inference have been met”).
State the type of test that you’re doing (“one proportion z test”).
Once you do the test on your calculator, report the test statistic (Z) and the p-value.
Draw conclusions in context (“We reject the null hypothesis that 50% of students at the school have never smoked. We conclude instead that the proportion of students at the school who have never smoked is different from 50%”).

Samples vs Populations
Remember, all of this work is just because we don’t know the true population value. If we did, we wouldn’t need confidence intervals or hypothesis tests: we would already know the value, so we wouldn’t need to estimate or assume anything.
So, we only use these methods for SAMPLE values, not population values. If we know the population value, it is much easier to test a null hypothesis. See the next slide for an example.

An Example Where we know the population value
A student fails the chapter 8 AP Stats test, and goes to Mr. McClendon to complain about Mr. Wetherbee. The student claims that the class is unfair, and that at least half of students got a D or an F on the test. Mr. Wetherbee provides data showing that out of the 34 students that have taken the test, there were 6 D’s and 3 F’s. Can we reject the student’s claim?
$H_0: p = 0.5$, $H_a: p < 0.5$
$p$: proportion of students that got a D or an F
YES! 9/34 (26.5%) of students got a D or an F. Since this is the entire population of Mr. Wetherbee’s stats students that have taken the test, I do not need a fancy hypothesis test.
The null hypothesis said that $p$ was 0.5. The true value is 0.265. Therefore, the null hypothesis is clearly FALSE.

A Final Example In 2016, 33 manatees were found dead in Volusia County, Florida. Of these, 12 were killed by boats/propellers. There were 520 total manatee deaths in Florida that year. Assume that the 33 manatees constitute a random sample from the 520 statewide. One Manatee conservation organization claims that 50% of all manatees die from boat-related incidents. Use a hypothesis test to test this claim at the α=0.05 level.

Answers (showing work for full credit)
$H_0: p = 0.50$, $H_a: p \ne 0.50$
$p$: proportion of manatee deaths from boat-related incidents
Check conditions: random, $33(0.5) = 16.5 \ge 10$, $33(1-0.5) = 16.5 \ge 10$, $33 < \frac{520}{10} = 52$
One proportion Z Test: x: 12, n: 33, Z = −1.57, P-value: 0.117
Because 0.117 is greater than 0.05, we fail to reject the organization’s claim that 50% of manatees die from boating incidents. We do not have sufficient evidence against this claim, and therefore cannot accept the alternative hypothesis.
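The manatee numbers can be checked from start to finish with a short Python sketch. The variable names are illustrative; the inputs (12 of 33 deaths, 520 statewide, null value 0.50, α = 0.05) come from the problem statement.

```python
import math
from statistics import NormalDist

x, n, population, p0, alpha = 12, 33, 520, 0.50, 0.05

# Conditions: large counts and the 10% condition
large_counts_ok = n * p0 >= 10 and n * (1 - p0) >= 10
ten_percent_ok = n <= population / 10

# Test statistic and two-sided p-value
p_hat = x / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = 2 * NormalDist().cdf(-abs(z))

print(large_counts_ok, ten_percent_ok)  # True True
print(round(z, 2), round(p_value, 3))   # -1.57 0.117
print("reject H0" if p_value < alpha else "fail to reject H0")  # fail to reject H0
```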
