Testing Hypotheses About Proportions


1 Testing Hypotheses About Proportions
Unit 4 Testing Hypotheses About Proportions Copyright © 2009 Pearson Education, Inc.

2 Hypotheses Our starting hypothesis is called the null hypothesis.
The null hypothesis, which we denote by H0, specifies a population model parameter of interest and proposes a value for that parameter. We write the null hypothesis in the form H0: parameter = hypothesized value. The alternative hypothesis, which we denote by HA, contains the values of the parameter that we consider plausible when we reject the null hypothesis.

3 Testing Hypotheses The null hypothesis specifies a population model parameter of interest and proposes a value for that parameter. We might have, for example, H0: p = 0.20 We want to compare our data to what we would expect given that H0 is true. We can do this by finding out how many standard deviations away from the proposed value we are. We then ask how likely it is to get results like ours if the null hypothesis were true.
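For instance, with made-up numbers: if H0: p = 0.20 and a sample of n = 400 yields p̂ = 0.23, then SD(p̂) = √(0.20 × 0.80 / 400) = 0.02, so the observed proportion sits (0.23 − 0.20) / 0.02 = 1.5 standard deviations above the proposed value.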

4 A Trial as a Hypothesis Test
Think about the logic of jury trials: To prove someone is guilty, we start by assuming they are innocent. We retain that hypothesis until the facts make it unlikely beyond a reasonable doubt. Then, and only then, we reject the hypothesis of innocence and declare the person guilty.

5 A Trial as a Hypothesis Test (cont.)
The same logic used in jury trials is used in statistical tests of hypotheses: We begin by assuming that a hypothesis is true. Next we consider whether the data are consistent with the hypothesis. If they are, all we can do is retain the hypothesis we started with. If they are not, then like a jury, we ask whether they are unlikely beyond a reasonable doubt.

6 P-Values The statistical twist is that we can quantify our level of doubt. We can use the model proposed by our hypothesis to calculate the probability of seeing results like ours (or even more extreme) if that hypothesis were true. That probability quantifies exactly how surprised we are by our results. This probability is called a P-value.

7 P-Values (cont.) When the data are consistent with the model from the null hypothesis, the P-value is high and we are unable to reject the null hypothesis. In that case, we have to “retain” the null hypothesis we started with. We can’t claim to have proved it; instead we “fail to reject the null hypothesis” when the data are consistent with the null hypothesis model and in line with what we would expect from natural sampling variability. If the P-value is low enough, we’ll “reject the null hypothesis,” since what we observed would be very unlikely were the null model true.

8 What to Do with an “Innocent” Defendant
If the evidence is not strong enough to reject the presumption of innocence, the jury returns a verdict of “not guilty.” The jury does not say that the defendant is innocent. All it says is that there is not enough evidence to convict, to reject innocence. The defendant may, in fact, be innocent, but the jury has no way to be sure.

9 What to Do with an “Innocent” Defendant (cont.)
Said statistically, we will fail to reject the null hypothesis. We never declare the null hypothesis to be true, because we simply do not know whether it’s true or not. Sometimes in this case we say that the null hypothesis has been retained.

10 What to Do with an “Innocent” Defendant (cont.)
In a trial, the burden of proof is on the prosecution. In a hypothesis test, the burden of proof is on the unusual claim. The null hypothesis is the ordinary state of affairs, so it’s the alternative to the null hypothesis that we consider unusual (and for which we must marshal evidence).

11 The Reasoning of Hypothesis Testing
There are seven parts to most hypothesis tests:
1. State the population and parameter of interest.
2. State the type of test.
3. State the hypotheses.
4. Check the conditions.
5. Show the formulas and calculations.
6. State and interpret the P-value.
7. Write a conclusion in the context of the problem.

12 The Reasoning of Hypothesis Testing (cont.)
Hypotheses The null hypothesis: To perform a hypothesis test, we must first translate our question of interest into a statement about model parameters. In general, we have H0: parameter = hypothesized value. The alternative hypothesis: The alternative hypothesis, HA, contains the values of the parameter we consider plausible if we reject the null.

13 The Reasoning of Hypothesis Testing (cont.)
Model To plan a statistical hypothesis test, specify the model you will use to test the null hypothesis and the parameter of interest. All models require assumptions, so state the assumptions and check any corresponding conditions. Your plan should end with a statement like “Because the conditions are satisfied, I can model the sampling distribution of the proportion with a Normal model.” Watch out, though. It might be the case that your model step ends with “Because the conditions are not satisfied, I can’t proceed with the test.” If that’s the case, stop and reconsider.
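As a minimal sketch of this condition-checking step (the helper name and thresholds are our own, following the usual rules of thumb of at least 10 expected successes and failures, and a sample under 10% of the population):

def conditions_ok(n, p0, population_size=None):
    """Rough check of the conditions for a one-proportion z-test."""
    # Success/Failure Condition: expect at least 10 successes and 10 failures
    success_failure = n * p0 >= 10 and n * (1 - p0) >= 10
    # 10% Condition: the sample is under 10% of the population (if known)
    ten_percent = population_size is None or n <= 0.10 * population_size
    return success_failure and ten_percent

print(conditions_ok(400, 0.20))   # True: 80 expected successes, 320 failures
print(conditions_ok(30, 0.05))    # False: only 1.5 expected successes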

14 The Reasoning of Hypothesis Testing (cont.)
Model Each test we discuss in the book has a name that you should include in your report. The test about proportions is called a one-proportion z-test.

15 One-Proportion z-Test
The conditions for the one-proportion z-test are the same as for the one-proportion z-interval. We test the hypothesis H0: p = p0 using the statistic

z = (p̂ − p0) / SD(p̂), where SD(p̂) = √(p0 q0 / n) and q0 = 1 − p0.

When the conditions are met and the null hypothesis is true, this statistic follows the standard Normal model, so we can use that model to obtain a P-value.
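Here is a minimal sketch of this calculation in Python (the survey counts are made up for illustration):

from math import sqrt
from scipy.stats import norm

def one_proportion_z_test(successes, n, p0):
    """z statistic and two-sided P-value for H0: p = p0."""
    p_hat = successes / n
    sd = sqrt(p0 * (1 - p0) / n)     # SD(p-hat) under the null model
    z = (p_hat - p0) / sd
    p_value = 2 * norm.sf(abs(z))    # deviations in either direction
    return z, p_value

z, p = one_proportion_z_test(87, 400, 0.20)   # hypothetical data
print(f"z = {z:.2f}, P-value = {p:.4f}")      # z = 0.88, P-value ≈ 0.38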

16 The Reasoning of Hypothesis Testing (cont.)
Mechanics Under “mechanics” we place the actual calculation of our test statistic from the data. Different tests will have different formulas and different test statistics. Usually, the mechanics are handled by a statistics program or calculator, but it’s good to know the formulas.

17 The Reasoning of Hypothesis Testing (cont.)
Mechanics The ultimate goal of the calculation is to obtain a P-value. The P-value is the probability that the observed statistic value (or an even more extreme value) could occur if the null model were correct. If the P-value is small enough, we’ll reject the null hypothesis. Note: The P-value is a conditional probability—it’s the probability that the observed results could have happened if the null hypothesis is true.

18 The Reasoning of Hypothesis Testing (cont.)
Conclusion The conclusion in a hypothesis test is always a statement about the null hypothesis. The conclusion must state either that we reject or that we fail to reject the null hypothesis. And, as always, the conclusion should be stated in context.

19 The Reasoning of Hypothesis Testing (cont.)
Conclusion Your conclusion about the null hypothesis should never be the end of a testing procedure. Often there are actions to take or policies to change.

20 Alternative Alternatives
There are three possible alternative hypotheses:
HA: parameter < hypothesized value
HA: parameter ≠ hypothesized value
HA: parameter > hypothesized value

21 Alternative Alternatives (cont.)
HA: parameter ≠ value is known as a two-sided alternative because we are equally interested in deviations on either side of the null hypothesis value. For two-sided alternatives, the P-value is the probability of deviating in either direction from the null hypothesis value.

22 Alternative Alternatives (cont.)
The other two alternative hypotheses are called one-sided alternatives. A one-sided alternative focuses on deviations from the null hypothesis value in only one direction. Thus, the P-value for one-sided alternatives is the probability of deviating only in the direction of the alternative away from the null hypothesis value.
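A short sketch of how the P-value depends on the chosen alternative, for an assumed test statistic z = 1.8 (a made-up value):

from scipy.stats import norm

z = 1.8
print(norm.cdf(z))           # HA: parameter < value  (lower tail)
print(norm.sf(z))            # HA: parameter > value  (upper tail)
print(2 * norm.sf(abs(z)))   # HA: parameter ≠ value  (both tails)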

23 P-Values and Decisions: What to Tell About a Hypothesis Test
How small should the P-value be in order for you to reject the null hypothesis? It turns out that our decision criterion is context-dependent. When we’re screening for a disease and want to be sure we treat all those who are sick, we may be willing to reject the null hypothesis of no disease with a fairly large P-value. A longstanding hypothesis, believed by many to be true, needs stronger evidence (and a correspondingly small P-value) to reject it. Another factor in choosing a P-value is the importance of the issue being tested.

24 P-Values and Decisions (cont.)
Your conclusion about any null hypothesis should be accompanied by the P-value of the test. If possible, it should also include a confidence interval for the parameter of interest. Don’t just declare the null hypothesis rejected or not rejected. Report the P-value to show the strength of the evidence against the hypothesis. This will let each reader decide whether or not to reject the null hypothesis.

25 What Can Go Wrong? (cont.)
Don’t base your null hypothesis on what you see in the data. Think about the situation you are investigating and develop your null hypothesis appropriately. Don’t base your alternative hypothesis on the data, either. Again, you need to Think about the situation.

26 What Can Go Wrong? (cont.)
Don’t make your null hypothesis what you want to show to be true. You can reject the null hypothesis, but you can never “accept” or “prove” the null. Don’t forget to check the conditions. We need randomization, independence, and a sample that is large enough to justify the use of the Normal model. If you fail to reject the null hypothesis, don’t think a bigger sample would be more likely to lead to rejection. Each sample is different, and a larger sample won’t necessarily duplicate your current observations.

27 What have we learned? We can use what we see in a random sample to test a particular hypothesis about the world. Hypothesis testing complements our use of confidence intervals. Testing a hypothesis involves proposing a model, and seeing whether the data we observe are consistent with that model or so unusual that we must reject it. We do this by finding a P-value—the probability that data like ours could have occurred if the model is correct.

28 What have we learned? (cont.)
We’ve learned the process of hypothesis testing, from developing the hypotheses to stating our conclusion in the context of the original question. We know that confidence intervals and hypothesis tests go hand in hand in helping us think about models. A hypothesis test makes a yes/no decision about the plausibility of a parameter value. A confidence interval shows us the range of plausible values for the parameter.

29 Chapter 21 More About Tests Copyright © 2009 Pearson Education, Inc.

30 Zero In on the Null Null hypotheses have special requirements.
To perform a hypothesis test, the null must be a statement about the value of a parameter for a model. We then use this value to compute the probability that the observed sample statistic—or something even farther from the null value—will occur.

31 Zero In on the Null (cont.)
How do we choose the null hypothesis? The appropriate null arises directly from the context of the problem—it is not dictated by the data, but instead by the situation. A good way to identify both the null and alternative hypotheses is to think about the Why of the situation. To write a null hypothesis, you can’t just choose any parameter value you like. The null must relate to the question at hand—it is context dependent.

32 Zero In on the Null (cont.)
There is a temptation to state your claim as the null hypothesis. However, you cannot prove a null hypothesis true. So, it makes more sense to use what you want to show as the alternative. This way, when you reject the null, you are left with what you want to show.

33 How to Think About P-Values
A P-value is a conditional probability—the probability of the observed statistic (or one more extreme) given that the null hypothesis is true. The P-value is NOT the probability that the null hypothesis is true. It’s not even the conditional probability that the null hypothesis is true given the data. Be careful to interpret the P-value correctly.

34 What to Do with a High P-Value
When we see a small P-value, we could continue to believe the null hypothesis and conclude that we just witnessed a rare event. But instead, we trust the data and use them as evidence to reject the null hypothesis. However, big P-values just mean that what we observed isn’t surprising. That is, the results are in line with our assumption that the null hypothesis models the world, so we have no reason to reject it. A big P-value doesn’t prove that the null hypothesis is true, but it certainly offers no evidence that it is not true. Thus, when we see a large P-value, all we can say is that we “don’t reject the null hypothesis.”

35 Alpha Levels Sometimes we need to make a firm decision about whether or not to reject the null hypothesis. When the P-value is small, it tells us that our data are rare given the null hypothesis. How rare is “rare”?

36 Alpha Levels (cont.) We can define “rare event” arbitrarily by setting a threshold for our P-value. If our P-value falls below that point, we’ll reject H0. We call such results statistically significant. The threshold is called an alpha level, denoted by α.

37 Alpha Levels (cont.) Common alpha levels are 0.10, 0.05, and 0.01.
You have the option—almost the obligation—to consider your alpha level carefully and choose an appropriate one for the situation. The alpha level is also called the significance level. When we reject the null hypothesis, we say that the test is “significant at that level.”

38 Alpha Levels (cont.) What can you say if the P-value does not fall below α? You should say that “the data have failed to provide sufficient evidence to reject the null hypothesis.” Don’t say that you “accept the null hypothesis.”

39 Alpha Levels (cont.) Recall that, in a jury trial, if we do not find the defendant guilty, we say the defendant is “not guilty”—we don’t say that the defendant is “innocent.”

40 Alpha Levels (cont.) The P-value gives the reader far more information than just stating that you reject or fail to reject the null. In fact, by providing a P-value to the reader, you allow that person to make his or her own decisions about the test. What you consider to be statistically significant might not be the same as what someone else considers statistically significant. There is more than one alpha level that can be used, but each test will give only one P-value.

41 What Not to Say About Significance
What do we mean when we say that a test is statistically significant? All we mean is that the test statistic had a P-value lower than our alpha level. Don’t be lulled into thinking that statistical significance carries with it any sense of practical importance or impact.

42 What Not to Say About Significance (cont.)
For large samples, even small, unimportant (“insignificant”) deviations from the null hypothesis can be statistically significant. On the other hand, if the sample is not large enough, even large, financially or scientifically “significant” differences may not be statistically significant. It’s good practice to report the magnitude of the difference between the observed statistic value and the null hypothesis value (in the data units) along with the P-value on which we base statistical significance.

43 Critical Values Again When making a confidence interval, we’ve found a critical value, z*, to correspond to our selected confidence level. Prior to the use of technology, P-values were difficult to find, and it was easier to select a few common alpha values and learn the corresponding critical values for the Normal model.

44 Critical Values Again (cont.)
Rather than looking up your z-score in the table, you could just check it directly against these critical values. Any z-score larger in magnitude than a particular critical value leads us to reject H0. Any z-score smaller in magnitude than a particular critical value leads us to fail to reject H0.

45 Critical Values Again (cont.)
Here are the traditional critical values from the Normal model:

α        1-sided   2-sided
0.05     1.645     1.96
0.01     2.33      2.576
0.001    3.09      3.29
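These values can be reproduced with scipy (a sketch; norm.ppf is the Normal quantile function):

from scipy.stats import norm

for alpha in (0.05, 0.01, 0.001):
    one_sided = norm.ppf(1 - alpha)       # all of alpha in one tail
    two_sided = norm.ppf(1 - alpha / 2)   # alpha split between two tails
    print(f"alpha = {alpha}: 1-sided z* = {one_sided:.3f}, 2-sided z* = {two_sided:.3f}")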

46 Critical Values Again (cont.)
When the alternative is one-sided, the critical value puts all of α on one side. When the alternative is two-sided, the critical value splits α equally into two tails.

47 Confidence Intervals and Hypothesis Tests
Confidence intervals and hypothesis tests are built from the same calculations. They have the same assumptions and conditions. You can approximate a hypothesis test by examining a confidence interval. Just ask whether the null hypothesis value is consistent with a confidence interval for the parameter at the corresponding confidence level.
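Here is a sketch of this duality, reusing the made-up counts from earlier (87 successes in 400 trials, H0: p = 0.20). The correspondence is only approximate because the interval uses SE(p̂) based on the sample, while the test uses SD(p̂) computed from p0:

from math import sqrt
from scipy.stats import norm

successes, n, p0, alpha = 87, 400, 0.20, 0.05
p_hat = successes / n
se = sqrt(p_hat * (1 - p_hat) / n)        # interval uses SE based on p-hat
z_star = norm.ppf(1 - alpha / 2)
lo, hi = p_hat - z_star * se, p_hat + z_star * se

print(f"95% CI: ({lo:.3f}, {hi:.3f})")
print("reject H0" if not (lo <= p0 <= hi) else "fail to reject H0")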

48 Confidence Intervals and Hypothesis Tests
Because confidence intervals are two-sided, they correspond to two-sided tests. In general, a confidence interval with a confidence level of C% corresponds to a two-sided hypothesis test with an α-level of (100 − C)%.

49 Confidence Intervals and Hypothesis Tests
The relationship between confidence intervals and one-sided hypothesis tests is a little more complicated. A confidence interval with a confidence level of C% corresponds to a one-sided hypothesis test with an α-level of ½(100 − C)%.

50 Making Errors Here’s some shocking news for you: nobody’s perfect. Even with lots of evidence we can still make the wrong decision. When we perform a hypothesis test, we can make mistakes in two ways: The null hypothesis is true, but we mistakenly reject it. (Type I error) The null hypothesis is false, but we fail to reject it. (Type II error)

51 Making Errors (cont.) Which type of error is more serious depends on the situation at hand. In other words, the gravity of the error is context dependent. Here are the four situations in a hypothesis test:

                     H0 is true           H0 is false
Reject H0            Type I error         Correct decision
Fail to reject H0    Correct decision     Type II error
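To see what the Type I error rate means in practice, here is a quick simulation sketch (a hypothetical setup in which H0: p = 0.20 is actually true, n = 400, α = 0.05; a level-0.05 test should falsely reject about 5% of the time):

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
p0, n, alpha, trials = 0.20, 400, 0.05, 10_000

successes = rng.binomial(n, p0, size=trials)   # data generated under H0
z = (successes / n - p0) / np.sqrt(p0 * (1 - p0) / n)
p_values = 2 * norm.sf(np.abs(z))
print("Type I error rate:", np.mean(p_values < alpha))   # close to 0.05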

52 Comparing Two Proportions
Chapter 22 Comparing Two Proportions Copyright © 2009 Pearson Education, Inc.

53 Comparing Two Proportions
Comparisons between two percentages are much more common than questions about isolated percentages. And they are more interesting. We often want to know how two groups differ, whether a treatment is better than a placebo control, or whether this year’s results are better than last year’s.

54 Another Ruler In order to examine the difference between two proportions, we need another ruler—the standard deviation of the sampling distribution model for the difference between two proportions. Recall that standard deviations don’t add, but variances do. In fact, the variance of the sum or difference of two independent random variables is the sum of their individual variances.

55 The Standard Deviation of the Difference Between Two Proportions
Proportions observed in independent random samples are independent. Thus, we can add their variances. So the standard deviation of the difference between two sample proportions is

SD(p̂1 − p̂2) = √(p1q1/n1 + p2q2/n2).

Thus, the standard error, which estimates each population proportion with the corresponding sample proportion, is

SE(p̂1 − p̂2) = √(p̂1q̂1/n1 + p̂2q̂2/n2).
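A minimal sketch of this standard error in Python (the group counts are hypothetical):

from math import sqrt

def se_diff(x1, n1, x2, n2):
    """Standard error of the difference between two sample proportions."""
    p1, p2 = x1 / n1, x2 / n2
    # variances of independent proportions add; take the square root of the sum
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

print(round(se_diff(120, 300, 90, 300), 4))   # 0.0387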

56 Assumptions and Conditions
Independence Assumptions: Randomization Condition: The data in each group should be drawn independently and at random from a homogeneous population or generated by a randomized comparative experiment. The 10% Condition: If the data are sampled without replacement, the sample should not exceed 10% of the population. Independent Groups Assumption: The two groups we’re comparing must be independent of each other.

57 Assumptions and Conditions (cont.)
Sample Size Condition: Each of the groups must be big enough… Success/Failure Condition: Both groups are big enough that at least 10 successes and at least 10 failures have been observed in each.

58 The Sampling Distribution
We already know that for large enough samples, each of our proportions has an approximately Normal sampling distribution. The same is true of their difference.

59 The Sampling Distribution (cont.)
Provided that the sampled values are independent, the samples are independent, and the sample sizes are large enough, the sampling distribution of p̂1 − p̂2 is modeled by a Normal model with

Mean: p1 − p2
Standard deviation: SD(p̂1 − p̂2) = √(p1q1/n1 + p2q2/n2)

60 Two-Proportion z-Interval
When the conditions are met, we are ready to find the confidence interval for the difference of two proportions. The confidence interval is

(p̂1 − p̂2) ± z* × SE(p̂1 − p̂2), where SE(p̂1 − p̂2) = √(p̂1q̂1/n1 + p̂2q̂2/n2).

The critical value z* depends on the particular confidence level, C, that you specify.
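A sketch of the two-proportion z-interval with made-up counts (120 of 300 in group 1, 90 of 300 in group 2):

from math import sqrt
from scipy.stats import norm

n1, n2 = 300, 300
p1, p2 = 120 / n1, 90 / n2
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z_star = norm.ppf(0.975)                 # z* for 95% confidence
diff = p1 - p2
print(f"95% CI: ({diff - z_star * se:.3f}, {diff + z_star * se:.3f})")
# roughly (0.024, 0.176)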

61 Everyone into the Pool The typical hypothesis test for the difference in two proportions is the one of no difference. In symbols, H0: p1 − p2 = 0. Since we are hypothesizing that there is no difference between the two proportions, both sample proportions estimate the same underlying value. Since this is the case, we combine (pool) the counts to get one overall proportion: p̂pooled = (x1 + x2) / (n1 + n2), where x1 and x2 are the numbers of successes in the two groups.
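A sketch of the pooled two-proportion z-test (same hypothetical counts as above):

from math import sqrt
from scipy.stats import norm

x1, n1, x2, n2 = 120, 300, 90, 300
p_pooled = (x1 + x2) / (n1 + n2)          # pool the counts under H0
se_pooled = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z = (x1 / n1 - x2 / n2) / se_pooled
p_value = 2 * norm.sf(abs(z))             # two-sided P-value
print(f"z = {z:.2f}, P-value = {p_value:.4f}")   # z = 2.57, P ≈ 0.010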

62 What Can Go Wrong? Don’t use two-sample proportion methods when the samples aren’t independent. These methods give wrong answers when the independence assumption is violated. Don’t apply inference methods when there was no randomization. Our data must come from representative random samples or from a properly randomized experiment. Don’t interpret a significant difference in proportions causally. Be careful not to jump to conclusions about causality.

63 What have we learned? We’ve now looked at inference for the difference in two proportions. Perhaps the most important thing to remember is that the concepts and interpretations are essentially the same—only the mechanics have changed slightly.

64 What have we learned? Hypothesis tests and confidence intervals for the difference in two proportions are based on Normal models. Both require us to find the standard error of the difference in two proportions. We do that by adding the variances of the two sample proportions, assuming our two groups are independent. When we test a hypothesis that the two proportions are equal, we pool the sample data; for confidence intervals we don’t pool.

