Inference with Proportions II Hypothesis Testing Using a Single Sample.


1 Inference with Proportions II Hypothesis Testing Using a Single Sample

2 Write down the first number that you think of for the following... Pick a two-digit number between 10 and 50, where both digits are ODD and the digits do not repeat.

3 What possible values fit this description? Record your answer on the dotplot on the board. What do you notice about this distribution? Did you expect this to happen?

4 What proportion of the time would I expect to get the value 37 if the values were equally likely to occur? Let’s simulate what the distribution would look like if the true proportion of people who would answer 37 is 0.125.
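
The value 0.125 comes from counting the numbers that fit the description on slide 2. A minimal sketch of that count, assuming "between 10 and 50" is checked over the whole range from 10 to 50 (neither endpoint qualifies anyway):

```python
# Two-digit numbers between 10 and 50 whose digits are both odd and do not repeat.
candidates = [
    n for n in range(10, 51)
    if (n // 10) % 2 == 1        # tens digit is odd
    and (n % 10) % 2 == 1        # units digit is odd
    and n // 10 != n % 10        # digits differ
]

print(candidates)            # [13, 15, 17, 19, 31, 35, 37, 39]
print(1 / len(candidates))   # 0.125
```

With 8 equally likely values, each one, including 37, would be chosen with probability 1/8 = 0.125.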

5 Let: n = class size, p = 0.125, x = number of students who select 37. Generate random values of x by using randBin(n, p). Generate 4 random values of x and find the sample proportion p̂ = x/n for each. Plot each on the dotplot on the board.
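
For readers without the calculator, the same simulation can be sketched in Python. The helper `rand_bin` is a hypothetical stand-in for the calculator's randBin(n, p), and the class size of 30 is an assumption, since the slides do not give one.

```python
import random

def rand_bin(n, p, rng):
    """Count successes in n independent trials, each with success probability p."""
    return sum(rng.random() < p for _ in range(n))

n = 30                   # assumed class size (not stated in the slides)
p = 0.125                # proportion choosing 37 if all 8 values were equally likely
rng = random.Random(1)   # fixed seed so the run is repeatable

counts = [rand_bin(n, p, rng) for _ in range(4)]   # 4 simulated values of x
p_hats = [x / n for x in counts]                   # sample proportion for each

for x, p_hat in zip(counts, p_hats):
    print(x, round(p_hat, 3))
```

Each printed pair is one simulated class: the number of students who picked 37 and the matching sample proportion to plot.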

6 Is the difference in these proportions significant? This is what a hypothesis test does.

7 Sharing prescription drugs with others can be dangerous. Is this a common occurrence among teens? OR The National Association of Colleges and Employers stated that the average starting salary for students graduating with a bachelor’s degree in 2010 is $48,351. Is this true for your college? How do we answer questions like these using sample data? In this unit, we will use sample data to test some claim or hypothesis about the population characteristic to see if it is plausible. To do this, we use a test of hypotheses or test procedure.

8 What is a test of hypotheses? A test of hypotheses is a method that uses sample data to decide between two competing claims (hypotheses) about the population characteristic. Is the value of the sample statistic... – a random occurrence due to sampling variability (one of the values of the sample statistic that are likely to occur)? OR – a value that would be considered surprising (one that isn’t likely to occur)?

9 Hypothesis statements: The null hypothesis, denoted by H 0, is a claim about a population characteristic that is initially assumed to be true. The alternative hypothesis, denoted by H a, is the competing claim. You are usually trying to determine if this claim is believable. To determine what the alternative hypothesis should be, you need to keep the research objectives in mind. The hypothesis statements are ALWAYS about the population – NEVER about a sample!

10 Let’s consider a murder trial... What is the null hypothesis? What is the alternative hypothesis? H0: the defendant is innocent. This is what you assume is true before you begin. Ha: the defendant is guilty. You are trying to determine if the evidence supports this claim. To determine which hypothesis is correct, the jury will listen to the evidence. Only if there is “evidence beyond a reasonable doubt” would the null hypothesis be rejected in favor of the alternative hypothesis. If there is not convincing evidence, then we would “fail to reject” the null hypothesis. Remember that the actual verdict that is returned is “GUILTY” or “NOT GUILTY”. We never end up determining the null hypothesis is true, only that there is not enough evidence to say it’s not true. So we will make one of two decisions: reject the null hypothesis, or fail to reject the null hypothesis.

11 The Form of Hypotheses: Null hypothesis H0: parameter = hypothesized value. The hypothesized value is a specific number determined by the context of the problem, and the null hypothesis always includes the equal case. Alternative hypothesis, one of: Ha: parameter > hypothesized value; Ha: parameter < hypothesized value; Ha: parameter ≠ hypothesized value. The sign is determined by the context of the problem. The first two are considered one-tailed tests because you are only interested in one direction. The third is considered a two-tailed test because you are interested in both directions. Notice that the alternative hypothesis uses the same population characteristic and the same hypothesized value as the null hypothesis. Let’s practice writing hypothesis statements.

12 Sharing prescription drugs with others can be dangerous. A survey of a representative sample of 592 U.S. teens age 12 to 17 reported that 118 of those surveyed admitted to having shared a prescription drug with a friend. Is this sufficient evidence that more than 10% of teens have shared prescription medication with friends? What is the population characteristic of interest? What is the hypothesized value? What words indicate the direction of the alternative hypothesis? State the hypotheses: p = the true proportion of teens who have shared prescription medication with friends. H0: p = 0.1, Ha: p > 0.1

13 Compact fluorescent (cfl) lightbulbs are much more energy efficient than regular incandescent lightbulbs. Ecobulb brand 60-watt cfl lightbulbs state on the package “Average life 8000 hours”. People who purchase this brand would be unhappy if the bulbs lasted less than 8000 hours. A sample of these bulbs will be selected and tested. What is the population characteristic of interest? What is the hypothesized value? What words indicate the direction of the alternative hypothesis? State the hypotheses: μ = the true mean life of the cfl lightbulbs. H0: μ = 8000, Ha: μ < 8000

14 Because of variation in the manufacturing process, tennis balls produced by a particular machine do not have the same diameters. Suppose the machine was initially calibrated to achieve the specification of μ = 3 inches. However, the manager is now concerned that the diameters no longer conform to this specification. If the mean diameter is not 3 inches, production will have to be halted. What is the population characteristic of interest? What words indicate the direction of the alternative hypothesis? State the hypotheses: μ = the true mean diameter of tennis balls. H0: μ = 3, Ha: μ ≠ 3

15 For each pair of hypotheses, indicate which are not legitimate and explain why. Must use a population characteristic: x̄ is a statistic (it describes a sample). Must use the same number as in H0! H0 MUST be “=”! Must be only greater than!

16 Facts to remember about hypotheses: Hypotheses ALWAYS refer to populations (use parameters – never statistics) The alternative hypothesis should be what you are trying to prove! ALWAYS define your parameter in context!

17 Large-Sample Hypothesis Test for a Population Proportion The fundamental idea behind hypothesis testing is: We reject H 0 if the observed sample is very unlikely to occur if H 0 is true.

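18 Recall the General Properties for Sampling Distributions of p̂: 1. The mean of the sampling distribution of p̂ is p. 2. The standard deviation of p̂ is sqrt(p(1 - p)/n), as long as the sample size is less than 10% of the population. 3. When n is large, the sampling distribution of p̂ is approximately normal.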

19 Steps: 1) Hypothesis statements & define parameters 2) Check assumptions 3) Calculations 4) Conclusion, in context

20 Summary of the Large-Sample z Test for p Continued... Assumptions: 1. p̂ is a sample proportion from a random sample. 2. The sample size n is large (np > 10 and n(1 - p) > 10). 3. If sampling is without replacement, the sample size is no more than 10% of the population size.

21 In June 2006, an Associated Press survey was conducted to investigate how people use the nutritional information provided on food packages. Interviews were conducted with 1003 randomly selected adult Americans, and each participant was asked a series of questions, including the following two: Question 1: When purchasing packaged food, how often do you check the nutritional labeling on the package? Question 2: How often do you purchase food that is bad for you, even after you’ve checked the nutrition labels? It was reported that 582 responded “frequently” to the question about checking labels and 441 responded “very often” or “somewhat often” to the question about purchasing bad foods even after checking the labels. Based on these data, is it reasonable to conclude that a majority of adult Americans frequently check nutritional labels when purchasing packaged foods?

22 Nutritional Labels Continued... p = true proportion of adult Americans who frequently check nutritional labels H 0 : p =.5 H a : p >.5 We use p >.5 to test for a majority of adult Americans who frequently check nutritional labels. What is the parameter of interest? What are the hypotheses? What is the sample proportion?

23 Nutritional Labels Continued... H0: p = 0.5, Ha: p > 0.5. Assumptions: Random? We are given a random sample of adults. Normal? np = (1003)(0.5) = 501.5 ≥ 10 and n(1 - p) = (1003)(0.5) = 501.5 ≥ 10, so the sampling distribution of p̂ is approximately normal. 10% rule? The population of adults is at least 10,030. Notice that we used the hypothesized value p = 0.5, since it is stated in H0.
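
The two numeric checks on this slide can be sketched as small helpers. The function names are hypothetical, and the threshold of 10 follows the ≥ 10 convention used on this slide.

```python
def large_sample_ok(n, p0, threshold=10):
    """Large-sample check using the hypothesized proportion p0 from H0."""
    return n * p0 >= threshold and n * (1 - p0) >= threshold

def min_population(n):
    """Smallest population size for which a sample of n is at most 10% of it."""
    return 10 * n

print(large_sample_ok(1003, 0.5))   # True: both expected counts are 501.5
print(min_population(1003))         # 10030
```

Whether the sample is random cannot be checked numerically; that comes from how the data were collected.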

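24 Formula from the card: z = (p̂ - p) / sqrt(p(1 - p)/n), where p is the hypothesized value from H0. A test statistic indicates how many standard deviations the sample statistic (p̂) is from the hypothesized value of the population characteristic (p).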

25 Nutritional Labels Continued... H 0 : p =.5 H a : p >.5 p = true proportion of adult Americans who frequently check nutritional labels Calculate the test statistic. What does this value tell us?
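
A minimal sketch of this calculation, assuming the usual large-sample z statistic for one proportion with the standard deviation computed from the hypothesized value. The function name `z_statistic` is hypothetical.

```python
from math import sqrt

def z_statistic(x, n, p0):
    """Large-sample z statistic for one proportion, using p0 from H0."""
    p_hat = x / n                          # sample proportion
    std_dev = sqrt(p0 * (1 - p0) / n)      # standard deviation of p-hat if H0 is true
    return (p_hat - p0) / std_dev

z = z_statistic(582, 1003, 0.5)
print(round(582 / 1003, 2))   # 0.58
print(round(z, 2))            # 5.08
```

The sample proportion is about 5 standard deviations above the hypothesized value of 0.5.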

26 Level of Significance Activity

27 Level of significance: Is the amount of evidence necessary before we begin to doubt that the null hypothesis is true. Is the probability that we will reject the null hypothesis, assuming that it is true. Denoted by α – Can be any value – Usual values: 0.1, 0.05, 0.01 – Most common is 0.05

28 Computing P-values. The P-value is the probability of obtaining a test statistic at least as extreme as the one observed, assuming H0 is true. The calculation of the P-value depends on the form of the inequality in the alternative hypothesis. Ha: p > hypothesized value: P-value = area in the upper tail of the z curve, to the right of the calculated z. What function on the calculator would we use to find this area?

29 Computing P-values. The calculation of the P-value depends on the form of the inequality in the alternative hypothesis. Ha: p < hypothesized value: P-value = area in the lower tail of the z curve, to the left of the calculated z.

30 Computing P-values. The calculation of the P-value depends on the form of the inequality in the alternative hypothesis. Ha: p ≠ hypothesized value: P-value = sum of the areas in the two tails of the z curve, beyond the calculated z and -z.
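
The three cases can be sketched with Python's standard-library normal distribution. The function name `p_value` and the labels "greater", "less", and "two-sided" are hypothetical choices for this illustration.

```python
from statistics import NormalDist

def p_value(z, alternative):
    """Tail area under the standard normal (z) curve for a calculated z."""
    std_normal = NormalDist()                  # mean 0, standard deviation 1
    if alternative == "greater":               # Ha: p > hypothesized value
        return 1 - std_normal.cdf(z)           # upper tail
    if alternative == "less":                  # Ha: p < hypothesized value
        return std_normal.cdf(z)               # lower tail
    if alternative == "two-sided":             # Ha: p != hypothesized value
        return 2 * std_normal.cdf(-abs(z))     # both tails
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

print(round(p_value(1.5, "greater"), 4))
print(round(p_value(-1.5, "less"), 4))
print(round(p_value(1.5, "two-sided"), 4))
```

Because the z curve is symmetric, the first two calls give the same area and the third gives twice that area.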

31 Using P-values to make a decision: To decide whether or not to reject H0, we compare the P-value to the significance level α. If the P-value > α, we “fail to reject” the null hypothesis. If the P-value < α, we “reject” the null hypothesis. When we reject H0, we say that the result is statistically significant!

32 Never “accept” the null hypothesis!

33 Writing Conclusions: 1) A statement of the decision being made (reject or fail to reject H0) & why (linkage) AND 2) A statement of the results in context (state in terms of Ha).

34 “Since the P-value < (>) α, I reject (fail to reject) H0. There is (is not) sufficient evidence to suggest that Ha.” Be sure to write Ha in context (words)!

35 Nutritional Labels Continued... H0: p = 0.5, Ha: p > 0.5, where p = true proportion of adult Americans who frequently check nutritional labels. For this sample, z ≈ 5.08. Next we find the P-value for this test statistic. In the standard normal curve, seeing a value of 5.08 or larger is unlikely. Its probability is approximately 0, so P-value ≈ 0. Since the P-value is so small, we reject H0. There is convincing evidence to suggest that the majority of adult Americans frequently check the nutritional labels on packaged foods.

36 A report states that nationwide, 61% of high school graduates go on to attend a two-year or four-year college the year after graduation. Suppose a random sample of 1500 high school graduates in 2009 from a particular state estimated the proportion of high school graduates that attend college the year after graduation to be 58%. Can we reasonably conclude that the proportion of this state’s high school graduates in 2009 who attended college the year after graduation is different from the national figure? Use α = 0.01. State the hypotheses. H0: p = 0.61, Ha: p ≠ 0.61, where p is the proportion of all 2009 high school graduates in this state who attended college the year after graduation.

37 College Attendance Continued... H 0 : p =.61 H a : p ≠.61 Where p is the proportion of all 2009 high school graduates in this state who attended college the year after graduation Assumptions: Given a random sample of 1500 high school graduates Since 1500(.61) > 10 and 1500(.39) > 10, sample size is large enough. Population size is much larger than the sample size.

38 College Attendance Continued... H0: p = 0.61, Ha: p ≠ 0.61, where p is the proportion of all 2009 high school graduates in this state who attended college the year after graduation. Use α = 0.01. Test statistic: z = (0.58 - 0.61) / sqrt(0.61(0.39)/1500) ≈ -2.38. The area to the left of -2.38 is approximately 0.0087, so P-value = 2(0.0087) = 0.0174. Since P-value > α, we fail to reject H0. The evidence does not suggest that the proportion of 2009 high school graduates in this state who attended college the year after graduation differs from the national value.
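
The numbers on this slide can be checked with a short sketch, assuming the standard large-sample z test. It uses the unrounded z, so the P-value comes out slightly below the slide's 0.0174, which is built from the table value at z = -2.38.

```python
from math import sqrt
from statistics import NormalDist

n, p_hat, p0, alpha = 1500, 0.58, 0.61, 0.01

z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # test statistic under H0
p_val = 2 * NormalDist().cdf(-abs(z))        # two-tailed: double one tail area

print(round(z, 2))        # -2.38
print(round(p_val, 3))    # 0.017
print(p_val > alpha)      # True, so we fail to reject H0
```

The decision is the same either way, because both versions of the P-value are larger than 0.01.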

39 In December 2009, a county-wide water conservation campaign was conducted in a particular county. In 2010, a random sample of 500 homes was selected and water usage was recorded for each home in the sample. Suppose the sample results were that 220 households had reduced water consumption. The county supervisors wanted to know if their data supported the claim that fewer than half the households in the county reduced water consumption. State the hypotheses. H0: p = 0.5, Ha: p < 0.5, where p is the proportion of all households in the county with reduced water usage. Calculate p̂.

40 Water Usage Continued... H0: p = 0.5, Ha: p < 0.5, where p is the proportion of all households in the county with reduced water usage. Verify assumptions: 1. p̂ is from a random sample of households. 2. Sample size n is large because np = 250 > 10 and n(1 - p) = 250 > 10. 3. It is reasonable that there are more than 5000 (10n) households in the county.

41 Water Usage Continued... H0: p = 0.5, Ha: p < 0.5, where p is the proportion of all households in the county with reduced water usage. Use α = 0.01. Calculate the test statistic and P-value: p̂ = 220/500 = 0.44, z = (0.44 - 0.5) / sqrt(0.5(0.5)/500) ≈ -2.68, P-value = 0.0037. Since P-value < α, we reject H0. There is convincing evidence that the proportion of households with reduced water usage is less than half.
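
As a check on this slide, here is a sketch of the lower-tail test, assuming the standard large-sample z test for one proportion. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x, n, p0, alpha = 220, 500, 0.5, 0.01

p_hat = x / n                                # sample proportion
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # test statistic under H0
p_val = NormalDist().cdf(z)                  # lower tail, since Ha: p < 0.5

print(p_hat)             # 0.44
print(round(z, 2))       # -2.68
print(p_val < alpha)     # True, so we reject H0
```

The P-value from the unrounded z is close to the 0.0037 shown on the slide and well below 0.01.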

42 Water Usage Continued... H0: p = 0.5, Ha: p < 0.5, where p is the proportion of all households in the county with reduced water usage. Since P-value < α, we reject H0 (used α = 0.01). Let’s create a confidence interval with this data. What is the appropriate confidence level to use? Since we are testing Ha: p < 0.5, α = 0.01 is in the lower tail. Confidence intervals are two-tailed, so we need to put 0.01 in the upper tail as well (since the curve is symmetrical). With 0.01 in each tail, that puts 0.98 in the middle; this is the appropriate confidence level. Compute a 98% confidence interval. Notice that the hypothesized value of 0.5 is NOT in the 98% confidence interval and that we “rejected” H0!
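
The interval itself did not survive in the transcript. A minimal sketch, assuming the usual large-sample interval p̂ ± z*·sqrt(p̂(1 - p̂)/n), gives the following; the function name `proportion_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence):
    """Large-sample confidence interval for a population proportion."""
    tail = (1 - confidence) / 2                     # area left in each tail
    z_star = NormalDist().inv_cdf(1 - tail)         # critical value
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = proportion_ci(220 / 500, 500, 0.98)
print(round(low, 3), round(high, 3))   # 0.388 0.492
print(low <= 0.5 <= high)              # False: 0.5 is outside the interval
```

Note that the interval uses p̂ in the standard error, while the test statistic uses the hypothesized value.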

43 College Attendance Revisited... H0: p = 0.61, Ha: p ≠ 0.61, where p is the proportion of all 2009 high school graduates in this state who attended college the year after graduation. Since P-value > α, we fail to reject H0 (used α = 0.01). Let’s compute a confidence interval for this problem. This is a two-tailed test, so α = 0.01 gets split evenly into both tails (0.005 each), leaving 0.99 in the middle. Notice that the hypothesized value of 0.61 IS in the 99% confidence interval and that we “failed to reject” H0!
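
The 99% interval is also missing from the transcript. Under the assumption that it is the usual large-sample interval built from p̂ = 0.58 and n = 1500, it can be sketched as:

```python
from math import sqrt
from statistics import NormalDist

p_hat, n, p0, alpha = 0.58, 1500, 0.61, 0.01

# Two-tailed test: alpha is split evenly, alpha / 2 in each tail.
z_star = NormalDist().inv_cdf(1 - alpha / 2)
margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - margin, p_hat + margin

print(round(low, 3), round(high, 3))   # 0.547 0.613
print(low <= p0 <= high)               # True: 0.61 is inside the interval
```

The hypothesized value falls just inside the upper end of the interval, which agrees with failing to reject H0 at α = 0.01.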

