Download presentation
Presentation is loading. Please wait.
1
Distribution-Free Procedures
15 Distribution-Free Procedures Copyright © Cengage Learning. All rights reserved.
2
15.2 The Wilcoxon Rank-Sum Test
Copyright © Cengage Learning. All rights reserved.
3
The Wilcoxon Rank-Sum Test
The two-sample t test is based on the assumption that both population distributions are normal. There are situations, though, in which an investigator would want to use a test that is valid even if the underlying distributions are quite nonnormal. We now describe such a test, called the Wilcoxon rank-sum test. An alternative name for the procedure is the Mann-Whitney test, though the Mann-Whitney test statistic is sometimes expressed in a slightly different form from that of the Wilcoxon test.
4
The Wilcoxon Rank-Sum Test
The Wilcoxon test procedure is distribution-free because it will have the desired level of significance for a very large class of underlying distributions.
5
The Wilcoxon Rank-Sum Test
Assumptions The null hypothesis H0: 1 – 2 = 0 asserts that, the X distribution is shifted by the amount 0 to the right of the Y distribution.
6
Development of the Test When m = 3, n = 4
7
Development of the Test When m = 3, n = 4
Suppose the relevant hypotheses are H0: 1 – 2 = 0 versus 𝐻 𝑎 : 1 – 2 > 0. If 1 is actually much larger than 2, then most of the observed x’s will fall to the right of the observed y’s. However, if H0 is true, then the observed values from the two samples should be intermingled. The test statistic assesses how much intermingling there is in the two samples.
8
Development of the Test When m = 3, n = 4
Consider the case m = 3, n = 4. Then if all three observed x’s were to the right of all four observed y’s, this would provide strong evidence for rejecting H0 in favor of Ha. The test procedure involves pooling the X’s and Y’s into a combined sample of size m + n = 7 and ranking these observations from smallest to largest, with the smallest receiving rank 1 and the largest rank 7. If either most of the largest ranks or most of the smallest ranks were associated with X observations, we would begin to doubt H0.
9
Development of the Test When m = 3, n = 4
The test statistic that quantifies this reasoning is W = the sum of the ranks in the combined sample associated with X observations For the values of m and n under consideration, the smallest possible value of W is w = = 6 (if all three x’s are smaller than all four y’s), and the largest possible value is w = = 18 (if all three x’s are larger than all four y’s). (15.3)
10
Development of the Test When m = 3, n = 4
As an example, suppose x1 = –3.10, x2 = 1.67, x3 = 2.01, y1 = 5.27, y2 = 1.89, y3 = 3.86, and y4 = .19. Then the pooled ordered sample and corresponding ranks are as follows: Thus w = = 9.
11
Development of the Test When m = 3, n = 4
For the alternative under consideration, a larger value of W provides more evidence against 𝐻 0 than does a smaller value. This implies that the test is upper-tailed: 𝑃−𝑣𝑎𝑙𝑢𝑒= 𝑃 0 (𝑊≥𝑤), where again 𝑃 0 denotes the probability computed assuming that 𝐻 0 is true. So we need the "null distribution" of the test statistic. To this end, recall that when H0 is true, all seven observations come from the same population. This means that under H0, any possible triple of ranks associated with the three x’s—such as (1, 4, 5), (3, 5, 6), or (5, 6, 7)—has the same probability as any other possible rank triple.
12
Development of the Test When m = 3, n = 4
Since there are = 35 possible rank triples, under H0 each rank triple has probability . From a list of all 35 rank triples and the w value associated with each, the probability distribution of W can immediately be determined. For example, there are four rank triples that have w value 11—(1, 3, 7), (1, 4, 6), (2, 3, 6), and (2, 4, 5)—so P(W = 11) = .
13
Development of the Test When m = 3, n = 4
The computations are summarized in Table 15.4. The distribution of Table 15.4 is symmetric about the value w = (6 + 18)/2 = 12, which is the middle value in the ordered list of possible W values. Probability Distribution of W (m = 3, n = 4) When H0 Is True Table 15.4
14
Development of the Test When m = 3, n = 4
This is because the two rank triples (r, s, t) (with r < s < t) and (8 – t, 8 – s, 8 – r) have values of w symmetric about 12, so for each triple with w value below 12, there is a triple with w value above 12 by the same amount. For the alternative under consideration, the test statistic value w = 16 results in P-value = 𝑃 0 𝑊≥16 = = We would then not be able to reject the null hypothesis at significance level .05. The test is lower-tailed if the alternative hypothesis is
15
Development of the Test When m = 3, n = 4
The test statistic value w = 7 gives P-value = 𝑃 0 𝑊≤7 = =.057. In the case of the alternative hypothesis 𝐻 𝑎 : 𝜇 1 − 𝜇 2 ≠0, the test is two-tailed. Suppose, for example, that w = 17. Then 17 and 18 at least as contradictory to 𝐻 0 as the obtained value of W, and so also are 7 and 6; these latter two values are as far out in the lower tail of the null distribution as the former two are in the upper tail. The P-value is 𝑃 0 (𝑊≥17 𝑜𝑟≤7)= 1/35 + 1/35 + 1/35 + 1/35 = .114.
16
General Description of the Test
17
General Description of the Test
The null hypothesis H0: 1 – 2 = 0 is handled by subtracting 0 from each Xi and using the (Xi – 0)s as the Xi’s were previously used. Note that for any positive integer K, the sum of the first K integers is K(K + 1)/2. This imples the smallest possible value of the statistic W is m(m + 1)/2, which occurs when the (Xi – 0)s are all to the left of the Y sample. The largest possible value of W occurs when the (Xi – 0)s lie entirely to the right of the Y’s; in this case, W = (n + 1) + · · · + (m + n) = sum of first m + n integers) – (sum of first n integers), which gives m(m + 2n + 1)/2.
18
General Description of the Test
As with the special case m = 3, n = 4, the distribution of W is symmetric about the value that is halfway between the smallest and largest values; this middle value is m(m + n + 1)/2. Because of this symmetry, P- values for lower-tail and two-tailed tests are easily obtained from a tabulation of upper-tailed null distribution probabilities.
19
General Description of the Test
20
General Description of the Test
The table gives information only for m = 3, 4,…, 8 and n = m, m 1 1,…, 8 (i.e., 3 ≤ m ≤ n ≤ 8). For values of m and n that exceed 8, a normal approximation can be used; of course statistical software will provide an exact P-value for any sample size. To use the table for small m and n, though, the X and Y samples should be labeled so that m ≤ n.
21
Example 15.4 Noroviruses are a leading cause of acute gastroenteritis, and as of December 2011, no vaccine was available to combat this virus. An experiment involved 15 patients, 8 of whom were randomly assigned to receive a new vaccine; the other 7 individuals received a placebo. The following data on duration (h) of Norwalk virus illness resulted:
22
Example 15.4 cont’d Does it appear that true average duration is different for the two treatments? Since use of Appendix Table A.14 requires m ≤ n, let 𝜇 1 denote true average duration when a placebo is given and 𝜇 2 represent true average duration when the vaccine is administered. We wish to test
23
Example 15.4 cont’d Using the Wilcoxon rank-sum test at significance level .05. The x (placebo) ranks for the pooled sample consisting of all 15 observations are 2, 6, 7, 10, 12, 14, and 15, from which w = 2 + … + 15 = 66. Since max{w, m(m + n + 1) - w} = max{66, 46} = 66, ( 𝑃 0 (W ≥ 71) comes from Table A.14).
24
Example 15.4 cont’d The P-value clearly exceeds .05, so at this significance level the null hypothesis cannot be rejected. There is not enough evidence to conclude that true average duration is different for the vaccine from what it is for the placebo. Software yields a P-value of .27, which is in excellent agreement with the value .28 based on larger sample sizes reported in the article “Norovirus Vaccine against Experimental Human Norwalk Virus Illness” (New England J. of Medicine, 2011: 2178–2187). A large number of articles in this journal include summaries of data analyses using the Wilcoxon signed-rank test, rank-sum test, or both.
25
General Description of the Test
Theoretically, the assumption of continuity of the two distributions ensures that all m + n observed x’s and y’s will have different values. In practice, though, there will often be ties in the observed values. As with the Wilcoxon signed-rank test, the common practice in dealing with ties is to assign each of the tied observations in a particular set of ties the average of the ranks they would receive if they differed very slightly from one another.
26
A Normal Approximation for W
27
A Normal Approximation for W
When both m and n exceed 8, the distribution of W can be approximated by an appropriate normal curve, and this approximation can be used in place of Appendix Table A.14. To obtain the approximation, we need W and when H0 is true. In this case, the rank Ri of Xi – 0 is equally likely to be any one of the possible values 1, 2, 3, , m + n (Ri has a discrete uniform distribution on the first m + n positive integers), so Ri = (m + n + 1)/2.
28
A Normal Approximation for W
Since W = this gives The variance of Ri is also easily computed to be (m + n + 1)(m + n – 1)/12. However, because the Ri’s are not independent variables, V(W) mV(Ri). (15.4)
29
A Normal Approximation for W
Using the fact that, for any two distinct integers a and b between 1 and m + n inclusive, P(Ri = a, Rj = b) = 1/[(m + n)(m + n – 1)] (two integers are being sampled without replacement), Cov(Ri, Rj) = –(m + n + 1)/12, which yields (15.5)
30
A Normal Approximation for W
A Central Limit Theorem can then be used to conclude that when H0 is true, the test statistic has approximately a standard normal distribution. A P-value is computed using Appendix Table A.3 exactly as in the case of previous z tests.
31
Example 15.5 The article “A Lining for the Thermal Comfort of Trekking Boots–Experimental and Numerical Studies” (Research J. of Textile and Apparel, 2011: 50–61) discussed the design and development of linings in footwear. The investigators used the rank-sum test to decide whether true average moisture accumulation (g) was different for two types of boots. The sample sizes and value of W were not provided, so suppose that m = n = 9 and w = 37. Then
32
Example 15.5 cont’d From which z = ( )/ = (this is the value of z given in the cited article). The P-value is 2Φ(24.28), which is less than 2Φ(23.49) = At significance level .01 or even .001, we reject the null hypothesis and conclude that true average moisture accumulation is different for the two types of boots.
33
A Normal Approximation for W
If there are ties in the data, the numerator of Z is still appropriate, but the denominator should be replaced by the square root of the adjusted variance where ti is the number of tied observations in the ith set of ties and the sum is over all sets of ties. Unless there are a great many ties, there is little difference between Equations (15.6) and (15.5). (15.6)
34
Efficiency of the Wilcoxon Rank-Sum Test
35
Efficiency of the Wilcoxon Rank-Sum Test
When the distributions being sampled are both normal with 1 = 2, and therefore have the same shapes and spreads, either the pooled t test or the Wilcoxon test can be used (the two-sample t test assumes normality but not equal variances, so assumptions underlying its use are more restrictive in one sense and less in another than those for Wilcoxon’s test). In this situation, the pooled t test is best among all possible tests in the sense of minimizing for any fixed . However, an investigator can never be absolutely certain that underlying assumptions are satisfied.
36
Efficiency of the Wilcoxon Rank-Sum Test
It is therefore relevant to ask (1) how much is lost by using Wilcoxon’s test rather than the pooled t test when the distributions are normal with equal variances and (2) how W compares to T in nonnormal situations. The notion of test efficiency was discussed in the previous section in connection with the one-sample t test and Wilcoxon signed-rank test. The results for the two-sample tests are the same as those for the one-sample tests.
37
Efficiency of the Wilcoxon Rank-Sum Test
When normality and equal variances both hold, the rank-sum test is approximately 95% as efficient as the pooled t test in large samples. That is, the t test will give the same error probabilities as the Wilcoxon test using slightly smaller sample sizes. On the other hand, the Wilcoxon test will always be at least 86% as efficient as the pooled t test and may be much more efficient if the underlying distributions are very nonnormal, especially with heavy tails.
38
Efficiency of the Wilcoxon Rank-Sum Test
The comparison of the Wilcoxon test with the two-sample (unpooled) t test is less clear-cut. The t test is not known to be the best test in any sense, so it seems safe to conclude that as long as the population distributions have similar shapes and spreads, the behavior of the Wilcoxon test should compare quite favorably to the two-sample t test. Lastly, we note that calculations for the Wilcoxon test are quite difficult.
39
Efficiency of the Wilcoxon Rank-Sum Test
This is because the distribution of W when H0 is false depends not only on 1 – 2 but also on the shapes of the two distributions. For most underlying distributions, the nonnull distribution of W is virtually intractable. This is why statisticians have developed large-sample (asymptotic relative) efficiency as a means of comparing tests. With the capabilities of modern-day computer software, another approach to calculation of is to carry out a simulation experiment.
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.