Chapter 8 Hypothesis Tests
Hypothesis Testing We now begin the phase of this course that discusses the highest achievement of statistics. Statistics, as the analytic branch of science, has provided scientists with a tool that, since the early part of the last century, has made possible many of the huge achievements of that century. You are aware, from science classes, of terms like conjecture, hypothesis, theory, and law. It is at the stage of “hypothesis” that most scientific research is done. Until now, you may not have been aware of the important role of statistics in this process. A whole new world is about to open up to you.
What is a Hypothesis? In scientific parlance, a hypothesis is an educated guess about something research may reveal, or a potential answer to a question that is being investigated. In science, this is often called the “research hypothesis” and is a statement of the anticipated conclusion of the experiment. In the statistical analysis of an experiment, we state two hypotheses: The null hypothesis and the alternative hypothesis.
The Alternative Hypothesis The alternative hypothesis usually corresponds to the scientist’s research hypothesis. It is often a statement of what one hopes to prove in the experiment. In statistics, the alternative hypothesis may be written symbolically. It is named H a, or H 1 (when there are several alternative hypotheses, they can be numbered).
The Null Hypothesis The null hypothesis is a statement that expresses the conclusion if the experiment doesn’t prove anything. It is often the “status quo,” what is currently accepted, or a standard we hope to beat. The null hypothesis is given the symbolic name H 0 (h-naught).
Clear Thinking There are strong philosophical reasons for stating hypotheses in this way. Science progresses by proposing new ideas, which are tested, and accepted only if there is sufficient evidence to support the new idea over the old. The effect is to prevent science from going off on wild tangents. The benefit of the doubt goes to currently accepted beliefs, which ensures some stability and enforces a standard of proof for new ideas.
Courtroom Analogy An important analogy is found in the American system of justice: Innocent until proven guilty. Here, the H 0 is innocence. That is what will be accepted in the event that evidence is insufficient or inconclusive. H a is guilt. If evidence is sufficient (beyond a reasonable doubt), the alternative will be accepted. Note that a conviction is a conclusion that H a is true, and thus a rejection of the innocence hypothesis, but an acquittal is not a declaration of innocence, only a conclusion that there is insufficient evidence to convict. We avoid saying “accept H 0,” since H 0 was assumed to begin with. We prefer to say either that we “reject H 0 ” or “do not reject H 0.” It is OK to say “accept H a.”
Example Testing Problems In the previous chapter, we discussed estimating parameters. For example, we used a sample mean to estimate μ, giving both a point estimate and a confidence interval (CI). Now we take a different approach. Suppose we have an existing belief about the value of μ. This could come from previous research, or it could be a standard that needs to be met. Examples:
–Previous corn hybrids have achieved 100 bu/acre. We want to show that our new hybrid does better.
–Advertising claims have been made that there are 20 chips in every chocolate chip cookie. Support or refute this claim.
Stating the Null Hypothesis We start with a null hypothesis. The null hypothesis is denoted by H 0 : μ=μ 0 where μ 0 corresponds to the current belief or status quo. Examples:
–In the corn problem, if our hybrid is not better, it doesn’t beat the previous yield achievement of 100 bu/acre. Then we have H 0 : μ=100 or possibly H 0 : μ≤100.
–In the cookie problem, if the advertising claims are correct, we have H 0 : μ=20 or possibly H 0 : μ≥20.
Notice the choice of null hypothesis is not based on what we hope to prove, but on what is currently accepted.
Stating the Alternative The alternative hypothesis is the result that you will get if your research proves something is different from the status quo or from what is expected. In its two-sided form it is denoted by H a : μ≠μ 0. Sometimes there is more than one alternative, so we can write H 1 : μ≠μ 0, H 2 : μ>μ 0, and H 3 : μ<μ 0. In the corn problem, if our mean yield is more than 100, we have shown that our hybrid is better, so the alternative H a : μ>100 is appropriate.
Stating the Alternative For the cookie example, if there are fewer than 20 chips per cookie, the advertisers are wrong and possibly guilty of false advertising, so we want to prove H a : μ<20. A jar of peanut butter is supposed to have 16 oz in it. If there is too much, the cost goes up, while if there is too little, consumers will complain. Therefore we have H 0 : μ=16 and H a : μ≠16. From these examples, we can see that some tests focus on one direction and some do not.
Comparison with Confidence Intervals In a confidence interval, our focus is to provide an estimate of a parameter. A hypothesis test makes use of an estimate, such as the sample mean, but is not directly concerned with estimation. The point is to determine if a proposed value of the parameter is likely to be untrue.
Test of the Mean, σ Known The null hypothesis is initially assumed true. It states that the mean has a particular value, μ 0. Therefore, it follows that the distribution of x-bar has the same mean, μ 0. The logic goes something like this. If we take a sample, we get a particular sample mean. If the null hypothesis is true, that mean is not likely to be “far away” from the hypothesized mean. It could happen, but it’s not likely. Therefore, if the sample mean is “too far away,” we will suspect something is wrong, and reject the null hypothesis. The next slide shows this graphically.
Comments on the Graph What we see in the previous graph is the idea that lots of sample means will fall close to the true mean. About 68% fall within one standard deviation. There is still a 32% chance of getting a sample mean farther away than that. So, if a mean occurs more than one standard deviation away, we may still consider it quite possible that this is a random fluctuation, rather than a sign that something is wrong with the null hypothesis.
More Comments If we go to two standard deviations, about 95% of observed means would be included. There is only a 5% chance of getting a sample mean farther away than that. So, if a far-away mean occurs (more than two standard deviations out), we think it is more likely that it comes from a different distribution, rather than the one specified in the null hypothesis.
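The 68% and 95% figures quoted above can be checked directly. The following is a minimal sketch that assumes the standardized sample mean is exactly standard normal; the helper name prob_within is made up for this illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

def prob_within(k):
    """Chance that a standardized sample mean lands within k standard deviations of the true mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

within_one = prob_within(1)
within_two = prob_within(2)

print(f"within 1 sd: {within_one:.4f}, farther away: {1 - within_one:.4f}")
print(f"within 2 sd: {within_two:.4f}, farther away: {1 - within_two:.4f}")
```

The two-standard-deviation figure comes out slightly above 95%, which is why a cutoff a little smaller than 2 is used when exactly 5% is wanted in the tails.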
Choosing a Significance Level The next graph shows what it means to choose a 5% significance level. If the null hypothesis is true, there is only a 5% chance that the standardized sample mean will be above 1.96 or below -1.96. These values will serve as cutoffs for the test. We are dealing only with cases where the sample mean can be assumed normal.
Decision Time We have already shown that we can use a standardized value instead of x-bar to decide when to reject. We will call this value Z*, the standard normal test statistic. The criterion by which we decide when to reject the null hypothesis is called a “decision rule.” We establish a cutoff value, beyond which is the rejection region. If Z* falls into that region, we will reject H 0. The next slide shows this for α=.05.
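The 5% cutoffs of 1.96 and -1.96 come from the standard normal distribution, and the same calculation gives cutoffs for any other significance level. This sketch uses Python's standard library; the function name two_tailed_cutoff is an assumption for illustration.

```python
from statistics import NormalDist

def two_tailed_cutoff(alpha):
    """Positive cutoff c such that P(Z < -c) + P(Z > c) = alpha for a standard normal Z."""
    # Half of alpha goes into each tail, so c sits at the 1 - alpha/2 quantile.
    return NormalDist().inv_cdf(1 - alpha / 2)

cutoff = two_tailed_cutoff(0.05)
print(f"reject when the standardized mean is above {cutoff:.2f} or below {-cutoff:.2f}")
```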
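As a sketch of the decision rule, the test statistic for a mean with σ known is the standardized sample mean, Z* = (x-bar - μ 0) / (σ / √n). The numbers below for the peanut butter example (σ, n, and the sample mean) are made up for illustration and do not come from the text.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(xbar, mu0, sigma, n):
    """Standardize the sample mean under H0: (xbar - mu0) / (sigma / sqrt(n))."""
    return (xbar - mu0) / (sigma / sqrt(n))

# Hypothetical peanut butter data: sigma, n and xbar are invented for this example.
mu0, sigma, n, xbar = 16.0, 0.2, 25, 16.1
alpha = 0.05

z_star = z_statistic(xbar, mu0, sigma, n)
cutoff = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
reject = abs(z_star) > cutoff                 # two-tailed decision rule

print(f"Z* = {z_star:.2f}, cutoff = {cutoff:.2f}, reject H0: {reject}")
```

With these invented numbers Z* is about 2.5, which lies in the rejection region, so H 0 : μ=16 would be rejected.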
One-tailed Tests Our graphs so far have shown tests with two tails. We have also seen that the alternative hypothesis could be of the form H 2 : μ>μ 0, or H 3 : μ<μ 0. These are one-tailed tests. The rejection region only goes to one side, and all of α goes into one tail (it doesn’t split).
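Because all of α goes into one tail, a one-tailed cutoff is closer to zero than the two-tailed one. The sketch below compares the two for the corn example; σ, n, and the sample mean are hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

def one_tailed_cutoff(alpha, tail):
    """Cutoff for a one-tailed test; tail is "right" (Ha: mu > mu0) or "left" (Ha: mu < mu0)."""
    if tail == "right":
        return NormalDist().inv_cdf(1 - alpha)
    if tail == "left":
        return NormalDist().inv_cdf(alpha)
    raise ValueError("tail must be 'right' or 'left'")

# Hypothetical corn data: H0: mu = 100 against Ha: mu > 100.
mu0, sigma, n, xbar = 100.0, 10.0, 36, 103.0
z_star = (xbar - mu0) / (sigma / sqrt(n))

right_cutoff = one_tailed_cutoff(0.05, "right")
two_sided_cutoff = NormalDist().inv_cdf(1 - 0.05 / 2)

reject_one_tailed = z_star > right_cutoff
reject_two_tailed = abs(z_star) > two_sided_cutoff

print(f"Z* = {z_star:.2f}")
print(f"one-tailed cutoff {right_cutoff:.3f}: reject = {reject_one_tailed}")
print(f"two-tailed cutoff {two_sided_cutoff:.3f}: reject = {reject_two_tailed}")
```

Here the same Z* is rejected by the right-tailed test but not by the two-tailed test, which is why the direction of the alternative has to be settled before looking at the data.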
Making Mistakes Hypothesis testing is a statistical process, involving random events. As a result, we could make the wrong decision. A Type I Error occurs if we reject H 0 when it is true. The probability of this is known as α, the level of significance. A Type II Error occurs when we fail to reject a false null hypothesis. The probability of this is known as β. The Power of a test is 1-β. This is the probability of rejecting the null hypothesis when it is false.
Classification of Errors
Decision | Actual: H 0 True | Actual: H 0 False
Reject H 0 | Type I Error, P(Error)=α | Correct (Type B)
Do Not Reject H 0 | Correct (Type A) | Type II Error, P(Error)=β
Two important numbers The significance level of a test is α, the probability of rejecting H 0 if it is true. The power of a test is 1-β, the probability of rejecting H 0 if it is false. There is a kind of trade-off between significance and power. We want significance small and power large, but for a fixed sample size they tend to increase or decrease together: making α smaller also makes the power smaller.
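The trade-off can be seen numerically. The following sketch computes the power of a right-tailed test of the mean with σ known, assuming a specific true mean; the corn numbers (true mean 103, σ = 10, n = 36) and the function name right_tailed_power are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def right_tailed_power(mu0, mu_true, sigma, n, alpha):
    """P(reject H0) for a right-tailed z test when the mean is really mu_true."""
    z = NormalDist()
    cutoff = z.inv_cdf(1 - alpha)
    # Distance of the true mean from mu0, measured in standard errors.
    shift = (mu_true - mu0) / (sigma / sqrt(n))
    # Under mu_true, Z* is normal with mean `shift`, so P(Z* > cutoff) is:
    return 1 - z.cdf(cutoff - shift)

for alpha in (0.10, 0.05, 0.01):
    power = right_tailed_power(100.0, 103.0, 10.0, 36, alpha)
    print(f"alpha = {alpha:.2f}  power = {power:.3f}  beta = {1 - power:.3f}")
```

As α shrinks, the printed power shrinks with it. When the true mean equals μ 0, the same function returns α itself, which is the Type I error probability.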
Steps in Hypothesis Testing
1. State the null and alternative hypotheses
2. Determine the appropriate type of test (check assumptions)
3. Define the rejection region
4. Calculate the test statistic
5. State the conclusion in terms of the original problem
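The five steps can be sketched end to end for the cookie example. The data values (σ = 3, n = 36, sample mean 19.2) and the function name z_test are assumptions made for this illustration, not part of the original problem.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alpha, alternative):
    """Test of the mean with sigma known; alternative is "less", "greater", or "two-sided"."""
    z = NormalDist()
    z_star = (xbar - mu0) / (sigma / sqrt(n))        # Step 4: test statistic
    if alternative == "less":                         # Step 3: rejection region
        reject = z_star < z.inv_cdf(alpha)
    elif alternative == "greater":
        reject = z_star > z.inv_cdf(1 - alpha)
    elif alternative == "two-sided":
        reject = abs(z_star) > z.inv_cdf(1 - alpha / 2)
    else:
        raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")
    return z_star, reject

# Step 1: H0: mu = 20 against Ha: mu < 20 (fewer chips than advertised).
# Step 2: sigma is treated as known and the sample mean as normal, so a z test applies.
z_star, reject = z_test(xbar=19.2, mu0=20.0, sigma=3.0, n=36, alpha=0.05, alternative="less")

# Step 5: conclusion in terms of the original problem.
if reject:
    conclusion = "Reject H0: the cookies average fewer than 20 chips."
else:
    conclusion = "Do not reject H0: not enough evidence that the average is below 20 chips."
print(f"Z* = {z_star:.2f}. {conclusion}")
```

With these invented numbers Z* is about -1.6, just short of the left-tail cutoff near -1.645, so the conclusion is "do not reject," which is not the same as declaring the advertising claim true.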
p-Value Testing Say you are reporting some research in biology and in your paper you state that you have rejected the null hypothesis at the .10 level. Someone reviewing the paper may say, “What if you used a .05 level? Would you still have rejected?” To avoid this kind of question, researchers began reporting the p-value, which is actually the smallest α that would result in a rejection. It’s kind of like coming at the problem from behind. Instead of looking at α to determine a critical region, we let the estimate show us the critical region that would “work.”
How p-Values Work To simplify the explanation, let’s look at a right-tailed means test. We assume a distribution with mean μ 0 and we calculate a sample mean. What if our sample mean fell right on the boundary of the critical region? This is just at the point where we would reject H 0. So if we calculate the probability of a value greater than the observed sample mean, this corresponds to the smallest α that results in a rejection. If the test is two-tailed, we have to double the probability, because the standardized sample mean Z* marks one part of the rejection region, but its negative marks the other part, on the other side (other tail).
Using a p-Value Using a p-Value couldn’t be easier. If p<α, we reject H 0. That’s it. p-Values tell us something about the “strength” of a rejection. If p is really small, we can be very confident in the decision. In real world problems, many p-Values turn out to be like .001 or even less. We can feel very good about a rejection in this case. However, if p is around .05 or .1, we might be a little nervous. When Fisher originally proposed these ideas early in the last century, he suggested three categories of decision:
–p < .05 Reject H 0
–.05 ≤ p ≤ .20 more research needed
–p > .20 Accept H 0
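The tail-probability idea translates into a short calculation on the standardized sample mean. In this sketch the value Z* = 1.8 is a hypothetical test statistic, and the function name p_value is made up for illustration.

```python
from statistics import NormalDist

def p_value(z_star, alternative):
    """p-value for a z test; alternative is "greater", "less", or "two-sided"."""
    z = NormalDist()
    if alternative == "greater":
        return 1 - z.cdf(z_star)             # area to the right of Z*
    if alternative == "less":
        return z.cdf(z_star)                 # area to the left of Z*
    if alternative == "two-sided":
        return 2 * (1 - z.cdf(abs(z_star)))  # both tails: beyond |Z*| and beyond -|Z*|
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

p_right = p_value(1.8, "greater")
p_two = p_value(1.8, "two-sided")
print(f"right-tailed p = {p_right:.4f}, two-tailed p = {p_two:.4f}")
```

The two-tailed value is exactly twice the right-tailed one, so a result that is significant at the .05 level in a one-tailed test may not be in a two-tailed test.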
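A single reported p-value answers the reviewer's question for every significance level at once. The sketch below applies the rule p<α at several levels and also encodes the three categories listed above; the p-value of 0.072 and both function names are hypothetical.

```python
def decide(p, alpha):
    """Decision rule: reject H0 exactly when p < alpha."""
    return "reject H0" if p < alpha else "do not reject H0"

def three_way_category(p):
    """The three categories of decision listed in the text."""
    if p < 0.05:
        return "reject H0"
    if p <= 0.20:
        return "more research needed"
    return "accept H0"

p = 0.072  # hypothetical p-value from some study
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: {decide(p, alpha)}")
print(f"three-way category: {three_way_category(p)}")
```

This p-value leads to a rejection at the .10 level but not at the .05 level, which is the situation where "we might be a little nervous" applies.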