9.1 Significance Tests: The Basics
Objectives: SWBAT
STATE the null and alternative hypotheses for a significance test about a population parameter.
INTERPRET a P-value in context.
DETERMINE whether the results of a study are statistically significant and MAKE an appropriate conclusion using a significance level.
INTERPRET a Type I and a Type II error in context and GIVE a consequence of each.
9.2 Tests about a Population Proportion
Objectives: SWBAT
STATE and CHECK the Random, 10%, and Large Counts conditions for performing a significance test about a population proportion.
PERFORM a significance test about a population proportion.
INTERPRET the power of a test and DESCRIBE what factors affect the power of a test.
DESCRIBE the relationship among the probability of a Type I error (significance level), the probability of a Type II error, and the power of a test.
The Reasoning of a Statistical Test Applet (textbook website)
Ed Monix claims to make 80% of the free throws that he attempts. We think he might be exaggerating. To test this claim, we’ll ask him to shoot some virtual free throws using the applet.
1) Set the applet to take 25 shots. Click “Shoot.” How many of the 25 shots did he make? Do we have enough data to decide whether his claim is valid?
2) Click “Shoot” again for 25 more shots. Keep doing this until you are convinced either that Monix makes less than 80% of his shots or that the claim is true. How large a sample of shots did you need to make your decision?
3) Click “Show true probability” to reveal the truth. Was your conclusion correct?
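If the applet isn’t handy, the same reasoning can be sketched in code. This is a minimal simulation, not part of the original activity: the cutoff of 15 made shots as “suspiciously low,” the function name, and the trial count are all illustrative choices.

```python
import random

random.seed(42)  # for reproducibility

# Simulate many 25-shot sessions for a TRUE 80% shooter and estimate how
# often he makes 15 or fewer shots purely by chance. A small answer means
# an observed 15/25 would be strong evidence against the 80% claim.
def low_session_rate(p_true=0.80, shots=25, cutoff=15, trials=10000):
    low = 0
    for _ in range(trials):
        made = sum(random.random() < p_true for _ in range(shots))
        if made <= cutoff:
            low += 1
    return low / trials

rate = low_session_rate()
print(rate)  # a small probability: 15/25 is rare for a true 80% shooter
```
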
Several years ago, researchers conducted a study to determine whether the “accepted” value for normal body temperature, 98.6 degrees Fahrenheit, is accurate. The results (a sample mean below 98.6, with more than 50% of the sampled temperatures below 98.6) have two possible explanations each:
The true average is less than 98.6, or the true average is 98.6 and we got a sample mean this low by chance.
The true proportion of people with temperatures less than 98.6 is more than 50%, or the true proportion is 50% and we got a sample proportion this high by chance.
A recent study, “The relative age effect and career success: Evidence from corporate CEOs” (Economics Letters 117 (2012)), suggests that people born in June and July are under-represented in the population of corporate CEOs. This “is consistent with the ‘relative-age effect’ due to school admissions grouping together children with age differences up to one year, with children born in June and July disadvantaged throughout life by being younger than their classmates born in other months.” In their sample of 375 corporate CEOs, only 45 (12%) were born in June and July. Is this convincing evidence that the true proportion p of all corporate CEOs born in June and July is smaller than 2/12? Give two explanations for why the sample proportion was below 2/12 (16.67%):
1) June and July kids are disadvantaged.
2) There is no disadvantage; the lower percentage was due to chance alone.
Regarding the CEO problem on the previous page, how can we decide which of the two explanations is more plausible? We have to determine how likely it is to get a sample proportion as small as 12% by chance alone, assuming that there is no birthday disadvantage (in other words, assuming p = 2/12). There are several ways to do this:
Random number generator: On the TI-84, use randInt(1, 12, 375). Let two numbers represent June and July (you can use 6 and 7, but it is probably easier to use 1 and 2). After the numbers are generated, store them in a list, sort them, count the 1’s and 2’s, and divide that count by 375. Repeat this simulation many times.
Notecards: Get 12 notecards and write a month on each card. Shuffle them and randomly select one card. Replace the card, shuffle, and draw again. Repeat this 375 times. Tally up the June and July cards and divide the result by 375. Repeat this many times.
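The random-number-generator approach above can also be sketched in Python. This is a minimal simulation under the no-disadvantage assumption (each birth month equally likely); the seed, function name, and number of trials are arbitrary choices for illustration.

```python
import random

random.seed(2012)  # for reproducibility

# Under the "no disadvantage" model, each of 375 CEOs gets a birth month
# chosen uniformly from 1-12; months 1 and 2 stand in for June and July.
# Estimate how often the simulated sample proportion is 0.12 or smaller.
def simulated_pvalue(n=375, observed=0.12, trials=5000):
    count = 0
    for _ in range(trials):
        june_july = sum(random.randint(1, 12) <= 2 for _ in range(n))
        if june_july / n <= observed:
            count += 1
    return count / trials

est = simulated_pvalue()
print(est)  # a small probability, on the order of 0.01
```

A result this small suggests that chance alone is an unlikely explanation for the 12% sample proportion.
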
Let’s say we ran the simulation and saw it was very unlikely to get a sample proportion as low as 12% by random chance alone. Our interpretation would be: assuming that there is no birthday disadvantage, it is very unlikely to get a sample proportion as low as 0.12 by chance alone. Thus, we have convincing evidence that people born in June/July are underrepresented in the population of corporate CEOs.
How else could we have answered this question? We could have created a confidence interval. If the null value 2/12 ≈ 0.167 were not a plausible value in our interval, we would have convincing evidence that June/July people are underrepresented. For example, an interval of (0.087, 0.153), centered at the sample proportion 0.12, would exclude 2/12.
What is the difference between a null and alternative hypothesis? What notation is used for each? What are some common mistakes when stating hypotheses?
A significance test is a formal procedure for comparing observed data with a claim (also called a hypothesis) whose truth we want to assess. The claim is a statement about a parameter, like the population proportion p or the population mean µ.
The claim we weigh evidence against in a statistical test is called the null hypothesis (H0). Often the null hypothesis is a statement of “no difference.” The claim about the population that we are trying to find evidence for is the alternative hypothesis (Ha).
In any significance test, the null hypothesis has the form H0: parameter = value. The alternative hypothesis has one of the forms Ha: parameter < value, Ha: parameter > value, or Ha: parameter ≠ value. To determine the correct form of Ha, read the problem carefully.
Going back to our free-throw shooter example, our hypotheses are H0: p = 0.80 versus Ha: p < 0.80, where p is the long-run proportion of made free throws. As far as mistakes, two things to remember:
You must write hypotheses in terms of parameters, not statistics.
You must give both the null and the alternative hypothesis.
For each scenario, define the parameter of interest and state appropriate hypotheses.
a) The CEO study from the previous slides. Parameter of interest: p = the true proportion of all corporate CEOs born in June and July. Hypotheses: H0: p = 2/12 versus Ha: p < 2/12.
Additional Carucci Examples: Write a null and alternative hypothesis for each situation.
a) Scores on a psychological test measuring students’ attitudes toward school range from 0 to 200. The mean score for US college students is about 115. A teacher suspects that older students have better attitudes toward school.
b) Buddy read a newspaper report claiming that 12% of all adults in the US are left-handed. He believes that the proportion of lefties at Seton Hall is not equal to 12%.
What is the difference between a one-sided and a two-sided alternative hypothesis? How can you decide which to use?
The alternative hypothesis is one-sided if it states that the parameter is larger than the null hypothesis value or if it states that the parameter is smaller than the null value. It is two-sided if it states that the parameter is different from the null hypothesis value (it could be either larger or smaller).
Hypotheses are formed BEFORE collecting the data! Use the wording of the question to guide you, not the data. “Changed” and “different than” are two key phrases to look for that might indicate a two-sided test.
What is a P-value?
A P-value is the probability of getting a statistic at least as extreme as the observed statistic in the study by chance alone, assuming the null hypothesis is true.
Small P-values are evidence against H0 because they say that the observed result is unlikely to occur when H0 is true. Large P-values fail to give convincing evidence against H0 because they say that the observed result is likely to occur by chance when H0 is true.
In the CEO example, the P-value = 0.008. Interpret this value. There is a 0.008 probability of getting a sample proportion as small as or smaller than 0.12 by chance alone, assuming that the birthdays of corporate CEOs are uniformly distributed throughout the year.
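The quoted P-value of 0.008 can be reproduced with the normal approximation, the same calculation a one-proportion z test performs. This is a sketch using only the standard library; the helper normal_cdf is our own, not a textbook function.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability via the error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

p0 = 2 / 12              # null value: birthdays uniform across the 12 months
p_hat = 45 / 375         # observed sample proportion, 0.12
n = 375
se = math.sqrt(p0 * (1 - p0) / n)   # standard deviation assuming H0 is true
z = (p_hat - p0) / se               # about -2.42
p_value = normal_cdf(z)             # left-tail area, about 0.008
print(z, p_value)
```
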
b) A significance test using the sample data produced a P-value of 0.28. Interpret the P-value in this context. There is a 0.28 probability of getting a sample standard deviation of 13.9 yards or smaller by chance alone, assuming the true standard deviation is 15 yards. The two possible explanations: the true standard deviation of the distances is less than 15 yards, or the true standard deviation is 15 yards and we got a smaller sample standard deviation by chance.
What are the two possible conclusions for a significance test? Reject H0 or fail to reject H0.
Statistics works the same way as the American justice system: someone is “innocent until proven guilty.” We have to operate under the assumption that the null is true (innocent). If we do not find convincing evidence to support the alternative, then we stick with the “innocent” assumption and fail to reject the null. However, if we do find convincing evidence to support the alternative, then we can claim “guilt” and reject the null.
What are some common errors that students make in their conclusions?
Omitting the context of the problem (your conclusion must include it).
Accepting the null hypothesis. Again, think about court: the decision is always guilty or not guilty, never innocent. The same applies here. We either reject or fail to reject the null; we never accept it.
Not linking the conclusion to the P-value.
What is a significance level? When are the results of a study statistically significant?
There is no rule for how small a P-value we should require in order to reject H0. But we can compare the P-value with a fixed value that we regard as decisive, called the significance level. We write it as α, the Greek letter alpha.
When we use a fixed level of significance to draw a conclusion in a significance test:
P-value < α → reject H0 → convincing evidence for Ha
P-value ≥ α → fail to reject H0 → not convincing evidence for Ha
If the P-value is smaller than alpha, we say that the data are statistically significant at level α. In that case, we reject the null hypothesis H0 and conclude that there is convincing evidence in favor of the alternative hypothesis Ha. The most common significance level is 0.05, but other common ones include 0.01 and 0.10.
In a jury trial, what two errors could a jury make?
Convicting the defendant, when in reality the defendant is not guilty.
Not convicting the defendant, when in reality he is guilty.
In a significance test, what two errors can we make? Which error is worse?
When we draw a conclusion from a significance test, we hope our conclusion will be correct. But sometimes it will be wrong. There are two types of mistakes we can make. If we reject H0 when H0 is true, we have committed a Type I error. If we fail to reject H0 when Ha is true, we have committed a Type II error.
Thinking about the errors in terms of the alternative hypothesis: a Type I error is finding convincing evidence that the alternative is true when it really isn’t. A Type II error is not finding convincing evidence that the alternative is true when it really is. Hint: Type II is when we fail “II” reject the null.
In the courtroom example above, the first error would be a Type I error and the second error would be a Type II error.
It’s important to understand that the errors aren’t anyone’s fault. You performed the correct calculations, but unfortunately your conclusion was an error. As far as which is worse, it all depends on the context. Think about this scenario: you go to the doctor for a checkup. The null is that you’re healthy, and the alternative is that you’re sick. A Type I error would be the doctor telling you that you are sick (perhaps that you have cancer), when in reality you are not. A Type II error would be the doctor telling you that you are not sick, when in reality you are sick. Is one worse than the other? That depends on your opinion.

                                Truth about the population
Conclusion based on sample      H0 true               H0 false (Ha true)
Reject H0                       Type I error          Correct conclusion
Fail to reject H0               Correct conclusion    Type II error
Describe a Type I and Type II error in the context of the CEO example. Which error could the researchers have made? Explain. Reminder: P-value = 0.008.
Type I: finding convincing evidence that the true proportion of CEOs born in June and July is less than 2/12, when really it isn’t.
Type II: not finding convincing evidence that the true proportion of CEOs born in June and July is less than 2/12, when really it is.
Since the P-value is low, we would have rejected the null hypothesis. Therefore, if we made an error, it would have been a Type I error.
What is the probability of a Type I error? What can we do to reduce the probability of a Type I error? Are there any drawbacks to this?
The probability of a Type I error is the probability of rejecting H0 when it is really true; this is exactly the significance level of the test.
Significance and Type I Error: The significance level α of any fixed-level test is the probability of a Type I error. That is, α is the probability that the test will reject the null hypothesis H0 when H0 is actually true.
To reduce the probability of a Type I error, demand that evidence be very convincing before rejecting your null (meaning use a smaller alpha). For example, demand that a murder conviction requires 1000 eyewitnesses. It would be very, very unlikely for an innocent man to be convicted. What is the downside of this? In the conviction context, many guilty people would go free because there aren’t 1000 eyewitnesses. Making it harder to reject the null means that we are more likely to make a Type II error.
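The fact that α is the probability of a Type I error can be checked by simulation. This sketch assumes a hypothetical test of H0: p = 0.5 against Ha: p < 0.5 with n = 100; none of those numbers come from the slides. Since every simulated sample is drawn with H0 actually true, every rejection is a Type I error.

```python
import math
import random

random.seed(1)  # for reproducibility

def normal_cdf(z):
    """Standard normal cumulative probability via the error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

# Draw samples with H0 true and count how often the test rejects anyway.
# The long-run rejection rate should land near alpha.
def type_i_error_rate(p0=0.5, n=100, alpha=0.05, trials=5000):
    rejections = 0
    for _ in range(trials):
        p_hat = sum(random.random() < p0 for _ in range(n)) / n
        z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
        if normal_cdf(z) < alpha:  # P-value for Ha: p < p0
            rejections += 1
    return rejections / trials

rate = type_i_error_rate()
print(rate)  # close to alpha = 0.05
```
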
What are the three conditions for conducting a significance test for a population proportion? How are these different from the conditions for constructing a confidence interval for a population proportion?
Conditions for performing a significance test about a proportion:
Random: The data come from a well-designed random sample or randomized experiment.
10%: When sampling without replacement, check that n ≤ (1/10)N.
Large Counts: Both np0 and n(1 − p0) are at least 10.
The conditions are the same as the conditions for constructing a confidence interval. However, we always do the calculations ASSUMING the null is true, so the Large Counts condition uses the null value p0 instead of the sample proportion p̂.
What is a test statistic? What does it measure? Is the formula on the formula sheet?
A test statistic measures how far a sample statistic diverges from what we would expect if the null hypothesis H0 were true, in standardized units. That is,
test statistic = (statistic − parameter) / (standard deviation of statistic)
Values of the statistic far from the null parameter value in the direction specified by the alternative hypothesis give evidence against H0. This general formula is on the formula sheet, but only this general form; it does not get specific for each test statistic.
Let’s go back to our free-throw shooter example. Reminder: H0: p = 0.80 versus Ha: p < 0.80. Let’s say we take an SRS of 50 free throws and the shooter makes 32. Let’s calculate our test statistic! Reminder: for proportions, we use z.
p̂ = 32/50 = 0.64, so z = (0.64 − 0.80) / √(0.80(0.20)/50) ≈ −2.83.
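As a check on the arithmetic, here is the free-throw test statistic and its P-value computed in Python. This is a sketch; normal_cdf is our own helper, not a library call.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability via the error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

p0, n = 0.80, 50
p_hat = 32 / 50                     # 0.64
se = math.sqrt(p0 * (1 - p0) / n)   # standard error uses the null value 0.80
z = (p_hat - p0) / se               # about -2.83
p_value = normal_cdf(z)             # left tail, since Ha: p < 0.80
print(z, p_value)
```
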
What are the four steps for conducting a significance test? What is required in each step? To perform a significance test, we state hypotheses, check conditions, calculate a test statistic and P-value, and draw a conclusion in the context of the problem. The four-step process is ideal for organizing our work. Significance Tests: A Four-Step Process State: What hypotheses do you want to test, and at what significance level? Define any parameters you use. Plan: Choose the appropriate inference method. Check conditions. Do: If the conditions are met, perform calculations. Compute the test statistic. Find the P-value. Conclude: Make a decision about the hypotheses in the context of the problem.
What is the test statistic for a population proportion? Is this on the formula sheet?
One-Sample z Test for a Proportion: Choose an SRS of size n from a large population that contains an unknown proportion p of successes. To test the hypothesis H0: p = p0, compute the z statistic
z = (p̂ − p0) / √(p0(1 − p0)/n)
Find the P-value by calculating the probability of getting a z statistic this large or larger in the direction specified by the alternative hypothesis Ha. The formula sheet only contains the general test statistic formula.
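The boxed procedure can be written as a small function. This is a sketch, not an official implementation: the function name and the "less"/"greater"/"two-sided" labels are our own choices, and normal_cdf is our own helper.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability via the error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def one_prop_z_test(x, n, p0, alternative):
    """One-sample z test for a proportion.
    x: number of successes, n: sample size, p0: null value,
    alternative: 'less', 'greater', or 'two-sided'."""
    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    if alternative == "less":
        p_value = normal_cdf(z)
    elif alternative == "greater":
        p_value = 1 - normal_cdf(z)
    else:  # two-sided: double the one-tail area
        p_value = 2 * (1 - normal_cdf(abs(z)))
    return z, p_value

# e.g. the free-throw example: 32 makes in 50 shots, H0: p = 0.80, Ha: p < 0.80
print(one_prop_z_test(32, 50, 0.80, "less"))
```
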
What happens when the data (evidence) don’t support the alternative at all? If the data don’t support the alternative, there is no need to continue with the significance test. For example, suppose our alternative hypothesis states that p is greater than the null value, and we take an SRS that yields a sample proportion of 33/500 = 0.066, which falls below the null value. This doesn’t give us any evidence at all to support our alternative. Therefore, we should automatically fail to reject our null.
Alternate Example: According to an article in the San Gabriel Valley Tribune, “Most people are kissing the ‘right way.’” That is, according to the study, the majority of couples prefer to tilt their heads to the right when kissing. In the study, a researcher observed a random sample of 124 kissing couples and found that 83/124 of the couples tilted to the right. Is this convincing evidence that couples really do prefer to kiss the right way?
State: H0: p = 0.5 versus Ha: p > 0.5, where p = the true proportion of couples that kiss the ‘right way.’
Kissing example, continued. Plan: If the conditions are met, we will perform a one-sample z test for p. (Random: the researcher observed a random sample of couples. 10%: 124 is less than 10% of all kissing couples. Large Counts: np0 = n(1 − p0) = 124(0.5) = 62 ≥ 10.)
Kissing example, continued. Do: p̂ = 83/124 ≈ 0.669, so z = (0.669 − 0.5) / √(0.5(0.5)/124) ≈ 3.77. P-value = P(Z ≥ 3.77) ≈ 0.00008.
Kissing example, continued. Conclude: Because the P-value of 0.00008 is less than α = 0.05, we reject H0. We have convincing evidence that the majority of couples prefer to tilt their heads to the right when kissing.
On the calculator: STAT > Tests > 1-PropZTest (option 5). p0: 0.5, x: 83, n: 124, prop: >p0. Output: test statistic z = 3.772, P-value ≈ 0.00008.
Note: If you give the calculator answer with no work and one or more of your values is incorrect, you will probably get no credit for the “Do” step. If you opt for the calculator-only method, be sure to name the procedure (one-proportion z test) and to report the test statistic (z = …) and P-value. This answers the question “Can you use your calculator for the Do step?”
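A code equivalent of the calculator’s 1-PropZTest for this example, as a sketch using only the standard library (normal_cdf is our own helper):

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability via the error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

x, n, p0 = 83, 124, 0.5
p_hat = x / n                        # about 0.669
se = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se                # about 3.77, matching the calculator
p_value = 1 - normal_cdf(z)          # right tail, since Ha: p > 0.5
print(z, p_value)
```
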
Alternate Example: Benford’s law and fraud. When the accounting firm AJL and Associates audits a company’s financial records for fraud, they often use a test based on Benford’s law. Benford’s law states that the distribution of first digits in many real-life sources of data is not uniform. In fact, when there is no fraud, about 30.1% of the numbers in financial records begin with the digit 1. However, if the proportion of first digits that are 1 is significantly different from 0.301 in a random sample of records, AJL and Associates does a much more thorough investigation of the company. Suppose that a random sample of 300 expenses from a company’s financial records results in only 68 expenses that begin with the digit 1. Should AJL and Associates do a more thorough investigation of this company?
State: H0: p = 0.301 versus Ha: p ≠ 0.301, where p = the true proportion of this company’s expenses that begin with the digit 1.
Benford example, continued. Plan: If the conditions are met, we will perform a one-sample z test for p. (Random: random sample of 300 expenses. 10%: 300 is less than 10% of all of the company’s expenses. Large Counts: np0 = 300(0.301) = 90.3 ≥ 10 and n(1 − p0) = 300(0.699) = 209.7 ≥ 10.)
Benford example, continued. Do: p̂ = 68/300 ≈ 0.227, so z = (0.227 − 0.301) / √(0.301(0.699)/300) ≈ −2.81. We have to double the area because the area on the table is for one tail, but this is a two-tailed test: P-value ≈ 2(0.0025) = 0.005.
Benford example, continued. Conclude: Because the P-value of about 0.005 is less than α = 0.05, we reject H0. We have convincing evidence that the true proportion of this company’s expenses beginning with the digit 1 differs from 0.301, so AJL and Associates should do a more thorough investigation of this company.
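The Benford test’s numbers can be verified in code. This sketch mirrors a two-sided one-proportion z test; normal_cdf is our own helper, not a library call.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability via the error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

x, n, p0 = 68, 300, 0.301
p_hat = x / n                            # about 0.227
se = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se                    # about -2.81
p_value = 2 * (1 - normal_cdf(abs(z)))   # two-sided: double the one-tail area
print(z, p_value)
```
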
Describe a Type I and Type II error in this context.
Type I: finding convincing evidence that the true proportion of expenses that begin with 1 is different from 0.301, when it really isn’t.
Type II: not finding convincing evidence that the true proportion of expenses that begin with 1 is different from 0.301, when it really is.
Can you use confidence intervals to decide between two hypotheses? What is an advantage to using confidence intervals for this purpose? Why don’t we always use confidence intervals?
The result of a significance test is basically a decision to reject H0 or fail to reject H0. When we reject H0, we’re left wondering what the actual proportion p might be. A confidence interval might shed some light on this issue.
Taeyeon found that 90 of an SRS of 150 students said that they had never smoked a cigarette. The number of successes and the number of failures in the sample are 90 and 60, respectively, so the Large Counts condition is met and we can proceed with calculations. Our 95% confidence interval is 0.60 ± 1.96√(0.60(0.40)/150) = 0.60 ± 0.078 = (0.522, 0.678). We are 95% confident that the interval from 0.522 to 0.678 captures the true proportion of students at Taeyeon’s high school who would say that they have never smoked a cigarette.
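The interval above can be reproduced directly. Note a design point the next slide raises: the standard error here uses the sample proportion, not a null value.

```python
import math

x, n = 90, 150
p_hat = x / n                              # 0.60
se = math.sqrt(p_hat * (1 - p_hat) / n)    # CI standard error uses p-hat
margin = 1.96 * se                         # z* = 1.96 for 95% confidence
low, high = p_hat - margin, p_hat + margin
print(low, high)                           # about (0.522, 0.678)
```
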
Since confidence intervals give us more information, why don’t we always use them for hypothesis testing?
1) They can only be used for two-sided tests.
2) The standard errors are slightly different: the standard error used for the confidence interval is based on the sample proportion p̂, while the denominator of the test statistic is based on the value p0 from the null hypothesis.
Alternate Example: Benford’s law and fraud A 95% confidence interval for the true proportion of expenses that begin with the digit 1 for the company in the previous Alternate Example is (0.180,0.274). Does this interval provide convincing evidence that the company should be investigated for fraud? Yes. Because 0.301 is not in the interval, 0.301 is not a plausible value for the true proportion of expenses that begin with the digit 1. Thus, this company should be investigated for fraud.
What is the power of a test? How is power related to the probability of a Type II error? Will you be expected to calculate the power of a test on the AP exam?
A significance test makes a Type II error when it fails to reject a null hypothesis H0 that really is false. The probability of making a Type II error depends on several factors, including the actual value of the parameter. A high probability of a Type II error for a specific alternative parameter value means that the test is not sensitive enough to usually detect that alternative.
The power of a test against a specific alternative is the probability that the test will reject H0 at a chosen significance level α when the specified alternative value of the parameter is true.
Power and Type II Error: The power of a test against any alternative is 1 minus the probability of a Type II error for that alternative; that is, power = 1 − β.
Calculating a Type II error probability or power by hand is not easy, so you won’t be asked to calculate it on the AP exam, only to answer questions about it.
A potato chip producer has hypotheses H0: p = 0.08 versus Ha: p > 0.08, with p being the true proportion of blemished potatoes. Suppose that the true proportion of blemished potatoes is p = 0.10. Since 0.10 > 0.08, the null hypothesis is actually false, so the correct decision would be to reject it.
c) Suppose that a second shipment of potatoes arrives and the proportion of blemished potatoes is p = 0.50. Will the inspector be more likely to find convincing evidence that p > 0.08 for the first shipment (p = 0.10) or the second shipment (p = 0.50)? How does “effect size” affect power? “Effect size” is the difference between the hypothesized value and the truth. Bigger effect size = more power. It is easier to see that a shipment is bad when the actual proportion of bad potatoes is far from the hypothesized value (0.50 is farther from 0.08 than 0.10 is), so the inspector is more likely to find convincing evidence for the second shipment.
d) Is there anything else that affects power? Reducing other sources of variability (good use of control, blocking, or stratifying) makes power go up. Larger sample sizes and higher significance levels α also increase power.
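The effect-size discussion above can be illustrated by simulation. This is a sketch, not part of the original problem: the sample size n = 1000, the seed, and the trial count are hypothetical choices (the slides do not specify a sample size). Power is estimated as the long-run rejection rate when samples are drawn with the stated true proportion.

```python
import math
import random

random.seed(7)  # for reproducibility

def normal_cdf(z):
    """Standard normal cumulative probability via the error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

# Draw samples with the given TRUE blemish rate, run the test of
# H0: p = 0.08 vs Ha: p > 0.08 at alpha = 0.05, and count rejections.
def estimated_power(p_true, p0=0.08, n=1000, alpha=0.05, trials=2000):
    rejections = 0
    for _ in range(trials):
        p_hat = sum(random.random() < p_true for _ in range(n)) / n
        z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
        if 1 - normal_cdf(z) < alpha:  # P-value for Ha: p > p0
            rejections += 1
    return rejections / trials

power_small_effect = estimated_power(0.10)
power_big_effect = estimated_power(0.50)
print(power_small_effect)  # moderate power
print(power_big_effect)    # essentially 1: bigger effect size, more power
```
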