Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 15 Nonparametric Statistics Section 15.1 Compare Two Groups by Ranking
Nonparametric Statistical Methods: Nonparametric methods are especially useful when the data are ranks for the subjects rather than quantitative measurements, and when it is inappropriate to assume normality.
Example: How to Get A Better Tan. Experiment: A student wanted to compare ways of getting a tan without exposure to the sun. She decided to investigate which of two treatments would give a better tan: an “instant bronze sunless tanner” lotion or a tanning studio.
Subjects: Five female students participated in the experiment. Three of the students were randomly selected to use the tanning lotion. The other two students used the tanning studio.
Results: The girls’ tans were ranked from 1 to 5, with 1 representing the best tan. Possible Outcomes: Consider all possible rankings of the girls’ tans, displayed in Table 15.1.
Table 15.1 Possible Rankings of Tanning Quality. Each case shows the three ranks for those using the tanning lotion and the two ranks for those using the tanning studio. It also shows the sample mean ranks and their difference.
For each possible outcome, a mean rank is calculated for the ‘lotion’ group and for the ‘studio’ group. The difference in the mean ranks is then calculated for each outcome. For this experiment, the samples were independent random samples: the responses for the girls using the tanning lotion were independent of the responses for the girls using the tanning studio.
Example: How to Get A Better Tan. Suppose that the two treatments have identical effects: a girl’s tan would be the same regardless of which treatment she uses. Then each of the ten possible outcomes is equally likely, so each outcome has probability 1/10.
Using the ten possible outcomes, we can construct a sampling distribution for the difference between the sample mean ranks, shown in Table 15.2.
Table 15.2 Sampling Distribution of Difference Between Sample Mean Ranks. These probabilities apply when the treatments have identical effects. For example, only one of the ten possible samples in Table 15.1 has a difference between the mean ranks equal to -2.5, so this value has probability 1/10.
Figure 15.1 Sampling Distribution of Difference Between Sample Mean Ranks. This sampling distribution, which is symmetric around 0, applies when the treatments have identical effects. It is used for the significance test of the null hypothesis that the treatments are identical in their tanning quality.
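The ten outcomes and the resulting sampling distribution can be enumerated directly. The sketch below is one way to do it in Python, assuming the difference is taken as lotion mean rank minus studio mean rank; the variable names are illustrative and not from the slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

ranks = range(1, 6)  # ranks 1..5, with 1 the best tan

# Each outcome assigns three ranks to the lotion group; the studio group gets the other two.
outcomes = []
for lotion in combinations(ranks, 3):
    studio = tuple(r for r in ranks if r not in lotion)
    # Difference = lotion mean rank minus studio mean rank (assumed sign convention).
    diff = Fraction(sum(lotion), 3) - Fraction(sum(studio), 2)
    outcomes.append((lotion, studio, diff))

# Under identical treatment effects, every outcome has probability 1/10.
counts = Counter(diff for _, _, diff in outcomes)
distribution = {d: Fraction(c, len(outcomes)) for d, c in sorted(counts.items())}

for diff, prob in distribution.items():
    print(f"{float(diff):+.2f}  {prob}")
```

Using exact fractions avoids rounding differences such as 1.67 versus 1.7 when outcomes are grouped by their difference.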
Example: How to Get A Better Tan. The student who planned the experiment hypothesized that the tanning studio would give a better tan than the tanning lotion. She wanted to test the null hypothesis $H_0$: The treatments are identical in tanning quality, against the alternative hypothesis $H_a$: Better tanning quality results with the tanning studio.
This alternative hypothesis is one-sided. If $H_a$ were true, we would expect the ranks to be smaller (better) for the tanning studio. Thus, if $H_a$ were true, we would expect the difference between the sample mean rank for the tanning lotion and the sample mean rank for the tanning studio to be positive.
Wilcoxon Test: The test comparing two groups based on the sampling distribution of the difference between the sample mean ranks is called the Wilcoxon test. It is named after the chemist-turned-statistician, Frank Wilcoxon, who devised it in 1945.
SUMMARY: Wilcoxon Nonparametric Test for Comparing Two Groups
1. Assumptions: Independent random samples from two groups.
2. Hypotheses: $H_0$: Identical population distributions for the two groups (this implies equal expected values for the sample mean ranks). $H_a$: Different expected values for the sample mean ranks (two-sided), or $H_a$: Higher expected value for the sample mean rank for a specified group (one-sided).
3. Test Statistic: Difference between sample mean ranks for the two groups (equivalently, can use sum of ranks for one sample).
4. P-value: One-tail or two-tail probability, depending on the alternative hypothesis, that the difference between the sample mean ranks is as extreme or more extreme than observed.
5. Conclusion: Report the P-value and interpret it. If a decision is needed, reject $H_0$ if the P-value is less than or equal to the chosen significance level.
Example: Tanning Studio Versus Tanning Lotion. For the actual experiment, the ranks were (2, 4, 5) for the girls using the tanning lotion and (1, 3) for the girls using the tanning studio.
The mean rank for the tanning lotion is (2 + 4 + 5)/3 = 3.7. The mean rank for the tanning studio is (1 + 3)/2 = 2. The test statistic is the difference between the sample mean ranks: 3.7 – 2 = 1.7.
The one-sided alternative hypothesis states that the tanning studio gives a better tan. This means that, if $H_a$ is true, the expected mean rank would be larger for the tanning lotion than for the tanning studio, and the difference between the mean ranks would be positive.
The test statistic obtained from the data was: difference between the sample mean ranks = 1.7. P-value = P(difference between sample mean ranks at least as large as 1.7).
The P-value can be obtained from the graph of the sampling distribution (Figure 15.1).
P-value = 0.20, because only two of the ten equally likely outcomes (differences of 1.7 and 2.5) are at least as large as the observed difference. This is not a very small P-value. The evidence does not strongly support the claim that the tanning studio gives a better tan.
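The P-value of 0.20 can be checked by counting outcomes. This is a minimal sketch, assuming the same sign convention as the slides (lotion mean rank minus studio mean rank); the helper name `mean_rank_difference` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations

ALL_RANKS = (1, 2, 3, 4, 5)

def mean_rank_difference(lotion):
    """Lotion mean rank minus studio mean rank; the studio gets the ranks not in `lotion`."""
    studio = [r for r in ALL_RANKS if r not in lotion]
    return Fraction(sum(lotion), len(lotion)) - Fraction(sum(studio), len(studio))

# Observed ranks: lotion (2, 4, 5), studio (1, 3). The difference is 11/3 - 2 = 5/3.
observed = mean_rank_difference((2, 4, 5))

# All ten equally likely assignments of three ranks to the lotion group.
diffs = [mean_rank_difference(lotion) for lotion in combinations(ALL_RANKS, 3)]

# One-sided P-value: share of outcomes with a difference at least as large as observed.
p_value = Fraction(sum(d >= observed for d in diffs), len(diffs))
print(float(observed), float(p_value))
```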
The Wilcoxon Rank Sum: The Wilcoxon test can, equivalently, use as the test statistic the sum of the ranks in just one of the samples. This statistic will have the same probabilities as the differences between the sample mean ranks. Some software reports the sum of ranks as the Wilcoxon rank sum statistic.
Example: Is there a treatment difference between the UV Tanning Studio and the Tanning Lotion? Suppose the experiment was designed with a two-sided alternative hypothesis. $H_0$: The treatments are identical in tanning quality. $H_a$: The treatments are different in tanning quality.
Table 15.3 Sampling Distribution of Sum of Ranks. The observed tanning lotion ranks of (2, 4, 5) have a rank sum of 11. These ranks imply that the tanning studio ranks were (1, 3) and that the difference between the sample mean ranks was 1.7.
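The rank-sum version of the sampling distribution can be built the same way. The sketch below tabulates the lotion rank sum and computes a two-sided P-value, assuming "as extreme or more extreme" means at least as far from the null expected rank sum as the observed value. That convention is an assumption here, since the contents of Table 15.3 are not reproduced in the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Rank sum of the lotion group for each of the ten possible assignments.
rank_sums = [sum(lotion) for lotion in combinations(range(1, 6), 3)]
counts = Counter(rank_sums)
null_dist = {s: Fraction(c, len(rank_sums)) for s, c in sorted(counts.items())}

observed_sum = 2 + 4 + 5                           # 11
center = Fraction(sum(rank_sums), len(rank_sums))  # expected rank sum under H0

# Two-sided P-value: rank sums at least as far from the center as the observed sum.
p_two_sided = sum(
    p for s, p in null_dist.items()
    if abs(s - center) >= abs(observed_sum - center)
)
print(dict(null_dist), p_two_sided)
```

Because the rank sum is a one-to-one function of the difference between mean ranks for fixed group sizes, both statistics give the same P-values.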
The Wilcoxon Rank Sum: Often, ties occur when we rank the observations. In this case, we average the ranks in assigning them to those subjects. Example: suppose a girl using the tanning studio got the best tan, two girls using the tanning lotion got the two worst tans, but the other two girls had equally good tans. The tied girls share ranks 2 and 3, so each gets rank 2.5. Tanning studio ranks: 1, 2.5. Tanning lotion ranks: 2.5, 4, 5.
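Averaging ranks over ties can be sketched as follows. The function name `average_ranks` and the numeric scores are hypothetical: the slides give only the ranks, so the scores below are made up so that smaller means a better tan and two girls tie.

```python
def average_ranks(scores):
    """Rank scores from 1 (smallest) upward; tied scores share the mean of their ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of scores tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        shared = ((i + 1) + (j + 1)) / 2  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

# Hypothetical scores (lower = better tan): first two are studio, last three are lotion.
scores = [1.0, 2.0, 2.0, 3.0, 4.0]
ranks = average_ranks(scores)
studio_ranks, lotion_ranks = ranks[:2], ranks[2:]
print(studio_ranks, lotion_ranks)
```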
Using the Wilcoxon Test with a Quantitative Response: When the response variable is quantitative, the Wilcoxon test is applied by converting the observations to ranks. For the combined sample, the observations are ordered from smallest to largest: the smallest observation gets rank 1, the second smallest gets rank 2, and so forth. The test compares the mean ranks for the two samples.
Example: Driving Reaction Times. Experiment: A sample of 64 college students was randomly assigned to a cell phone group or a control group, 32 to each. On a machine that simulated driving situations, participants were instructed to press a “brake button” when they detected a red light.
The control group listened to the radio while they performed the simulated driving. The cell phone group carried out a conversation on a cell phone. Each subject’s response time to the red lights was recorded and averaged over all of his/her trials.
Figure 15.2 Box Plots of Response Times for Cell Phone Study. Question: Does either box plot show any irregularities that suggest it’s safer to use a nonparametric test than a two-sample t test?
The box plots do not show any substantial skew, but there is an extreme outlier for the cell phone group. The t inferences that we have used previously assume normal population distributions. The Wilcoxon test does not assume normality, so it can be used in place of the t test if the normality assumption is in question.
To use the Wilcoxon test, we need to rank the data (response times) from 1 (smallest reaction time) to 64 (largest reaction time). The test statistic is then calculated from the ranks.
Example: Driving Reaction Times. Table 15.5 shows the output for the hypothesis test. $H_0$: The distribution of reaction times is identical for the two groups. $H_a$: The distribution of reaction times differs for the two groups.
Table 15.5 SPSS Output for Wilcoxon Test with Data from Cell Phone Study
The small P-value (0.019) shows strong evidence against the null hypothesis. The sample mean ranks suggest that reaction times tend to be slower for those using cell phones.
Insight: The Wilcoxon test is not affected by outliers. No matter how far the largest observation falls from the next largest, it still gets the same rank (here 64, since rank 1 goes to the smallest reaction time).
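The insensitivity of ranks to outliers is easy to see on a small made-up sample. The response times below are hypothetical, not the study data, and `ranks_of` is an illustrative helper that assumes all values are distinct.

```python
from statistics import mean

def ranks_of(values):
    """Rank distinct values from 1 (smallest) to n (largest)."""
    position = {v: i + 1 for i, v in enumerate(sorted(values))}
    return [position[v] for v in values]

times = [520, 610, 585, 640, 960]      # hypothetical response times
extreme = [520, 610, 585, 640, 9600]   # same data with the largest value pushed far out

# The ranks are identical, so any rank-based statistic is unchanged...
same_ranks = ranks_of(times) == ranks_of(extreme)

# ...while the sample mean, which a t test relies on, shifts substantially.
print(ranks_of(times), same_ranks, mean(times), mean(extreme))
```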
Nonparametric Estimation Comparing Two Groups: When the response variable is quantitative, we can compare a measure of center for the two groups. One way to do this is by comparing means. This method requires the assumption of normal population distributions.
When the response distribution is highly skewed, nonparametric methods are preferred. For highly skewed distributions, a better measure of the center is the median. We can then estimate the difference between the population medians for the two groups.
Most software for the Wilcoxon test reports point and interval estimates comparing medians. Some software refers to the equivalent Mann-Whitney test.
The Wilcoxon test (and the Mann-Whitney test) does not require a normal population assumption. It does require an extra assumption: the population distributions for the two groups are symmetric and have the same shape.
Example: Table 15.6 MINITAB Output for Comparing Medians for Cell Phone Group and Control Group. The point estimate for the difference in medians is 44.5 (note that this is not the same as the difference between the two sample medians). A 95.1% CI for the difference is (8.99, 79.01). Since 0 is not included in the interval, we conclude that the median reaction times are not the same for the cell phone and control groups.
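One estimator that can differ from the difference of the two sample medians is the median of all pairwise differences between the groups (often called the Hodges-Lehmann estimate). Whether the output in Table 15.6 was computed exactly this way is an assumption here, and the data below are hypothetical, chosen only to show that the two quantities need not agree.

```python
from statistics import median

def pairwise_median_difference(sample_a, sample_b):
    """Median of all differences a - b, taken over every pair (a in A, b in B)."""
    return median(a - b for a in sample_a for b in sample_b)

group_a = [1, 2, 10]  # hypothetical values
group_b = [0, 5, 6]   # hypothetical values

estimate = pairwise_median_difference(group_a, group_b)
difference_of_medians = median(group_a) - median(group_b)
print(estimate, difference_of_medians)
```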
Chapter 15 Nonparametric Statistics Section 15.2 Nonparametric Methods For Several Groups and for Matched Pairs
Comparing Mean Ranks of Several Groups: The Wilcoxon test for comparing mean ranks of two groups extends to a comparison of mean ranks for several groups. This test is called the Kruskal-Wallis test.
ANOVA test vs. Kruskal-Wallis Test: Both tests are used to compare many groups. The ANOVA F test assumes normal population distributions. The Kruskal-Wallis test does not make this assumption. The Kruskal-Wallis test is a “safer” method to use with small samples when not much information is available about the shape of the distributions. The Kruskal-Wallis test is also useful when the data are merely ranks and we don’t have a quantitative measurement of the response variable.
Summary: Kruskal-Wallis Test
1. Assumptions: Independent random samples from several (g) groups.
2. Hypotheses: $H_0$: Identical population distributions for the g groups. $H_a$: Population distributions not all identical.
3. Test statistic: Uses between-groups variability of sample mean ranks. Software easily calculates this.
4. P-value: Right-tail probability above observed test statistic value from chi-squared distribution with df = g - 1.
5. Conclusion: Report the P-value and interpret in context.
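The slides leave the calculation to software. As a rough sketch, a commonly used form of the statistic for data without ties is H = 12/(n(n+1)) times the sum over groups of n_i (mean rank_i - (n+1)/2)^2. The code below assumes that form, assumes no tied values, and uses hypothetical data. The P-value line relies on the fact that for df = 2 the chi-squared right-tail probability is exp(-H/2), so it applies only to three groups.

```python
from math import exp

def kruskal_wallis_h(groups):
    """H statistic for samples with no tied values (no tie correction is applied)."""
    pooled = sorted(x for g in groups for x in g)
    rank = {x: i + 1 for i, x in enumerate(pooled)}  # valid only if all values are distinct
    n = len(pooled)
    overall_mean_rank = (n + 1) / 2
    # Between-groups variability of the sample mean ranks.
    between = sum(
        len(g) * (sum(rank[x] for x in g) / len(g) - overall_mean_rank) ** 2
        for g in groups
    )
    return 12 / (n * (n + 1)) * between

# Hypothetical data for three groups, fully separated so the mean ranks are 2, 5 and 8.
groups = [[2.1, 2.5, 2.9], [3.0, 3.2, 3.4], [3.5, 3.7, 3.9]]
h = kruskal_wallis_h(groups)
df = len(groups) - 1
p_value = exp(-h / 2)  # chi-squared right-tail probability, valid only when df == 2
print(h, df, p_value)
```

With small samples the chi-squared approximation is rough, so statistical software is preferable for real analyses.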
Example: Frequent Dating and College GPA. Experiment: A student in a statistics class (Tim) decided to study whether dating was associated with college GPA. He wondered whether students who date a lot tend to have poorer GPAs.
He asked 17 students in the class to anonymously fill out a short questionnaire in which they were asked to give their college GPA and to indicate whether, during their college careers, they had dated regularly, occasionally, or rarely.
Figure 15.4 Dot Plots of GPA by Dating Group. Question: Why might we be nervous about using the ordinary ANOVA F test to compare mean GPA for the three dating groups?
Since the dot plots showed evidence of severe skew to the left and since the sample size was small in each group, Tim felt safer analyzing the data with the Kruskal-Wallis test than with the ordinary ANOVA F test.
The hypotheses for the Kruskal-Wallis test: $H_0$: Identical population distributions for the three dating groups. $H_a$: Population distributions for the three dating groups are not all identical.
Table 15.7 College GPA by Dating Group. This table shows the data with the GPA values ordered from smallest to largest for each dating group.
Table 15.8 Results of Kruskal-Wallis Test for Data in Table 15.7 (MINITAB output).
The output reports the test statistic H and the corresponding P-value. This large P-value does not give evidence against $H_0$. It is plausible that GPA is independent of dating group.
Comparing Matched Pairs: The Sign Test. When the samples are dependent, different methods must be used. Tanning example: suppose that a crossover design was used, in which the same subjects get a tan using one treatment and, when it wears off, get a tan using the other treatment. The order of using the two treatments is random. For each subject, we observe which treatment gives the better tan.
For such a matched pairs experiment, let p denote the population proportion of cases for which a particular treatment does better than the other treatment. Under the null hypothesis of identical treatment effects, p = 0.50; that is, each treatment should have the better response outcome about half the time (we ignore cases in which each treatment gives the same response).
SUMMARY: Sign Test for Matched Pairs
1. Assumptions: Random sample of matched pairs for which we can evaluate which observation in a pair has the better response.
2. Hypotheses: $H_0$: Population proportion p = 0.50 who make better response for a particular group. $H_a$: $p \neq 0.50$ (two-sided), or $H_a$: $p > 0.50$ or $H_a$: $p < 0.50$ (one-sided).
3. Test Statistic: $z = (\hat{p} - 0.50)/se$, with $se = \sqrt{(0.50)(0.50)/n}$.
4. P-value: For large samples, use tail probabilities from standard normal. For smaller n, use binomial distribution.
5. Conclusion: Report the P-value and interpret in context.
Example: Comparing Matched Pairs: The Sign Test. Which do most students spend more time doing, browsing the Internet or watching TV? Survey results of first 3 students from University of GA (table columns: Student, Internet, TV).
Let p denote the population proportion who spent more time watching TV. For the entire sample, 35 students spent more time watching TV and 19 students spent more time browsing the Internet.
Test statistic: n = 35 + 19 = 54, $\hat{p} = 35/54 = 0.648$, $se = \sqrt{(0.50)(0.50)/54} = 0.068$, so $z = (0.648 - 0.50)/0.068 = 2.18$. From the normal table, the two-sided P-value is about 0.03. This provides considerable evidence that most students spend more time watching TV than browsing the Internet.
Note: The sign test uses merely the information about which response is higher and how many responses are higher, not the quantitative information about how much higher.
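The large-sample sign test for the TV versus Internet counts can be reproduced with the standard library. This sketch assumes the usual null standard error sqrt(0.5 × 0.5 / n), which matches the 0.068 in the slide, and uses the complementary error function for the normal tail area; the variable names are illustrative.

```python
from math import erfc, sqrt

tv_more, internet_more = 35, 19  # counts from the survey
n = tv_more + internet_more      # ties are ignored, so n = 54
p_hat = tv_more / n              # sample proportion who watched more TV

se = sqrt(0.5 * 0.5 / n)         # standard error under H0: p = 0.50
z = (p_hat - 0.5) / se

# Two-sided P-value: 2 * P(Z >= |z|) for a standard normal Z.
p_two_sided = erfc(abs(z) / sqrt(2))
print(round(se, 3), round(z, 2), round(p_two_sided, 3))
```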
Example: Crossover Experiment Comparing Tanning Methods. The Sign Test for Small n: For small n, we can conduct the sign test using the binomial distribution. Example: Another tanning experiment was run in which the same 5 girls received each treatment (lotion and studio). The tanning studio gave a better tan than the lotion for four of the five girls.
If p = 0.5, the binomial probability that x = 4 of the n = 5 girls would get better tans with the tanning studio is $P(4) = \frac{5!}{4!\,1!}(0.5)^4(0.5)^1 = 0.156$. The more extreme result that all five girls would get better tans with the tanning studio has probability $P(5) = (0.5)^5 = 0.031$.
P-value = P(4) + P(5) = 0.19. The evidence is not strong that more girls get a better tan from the tanning studio than the tanning lotion.
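The small-sample P-value follows from the binomial formula with p = 0.5. A minimal sketch, with `binom_pmf` as a hypothetical helper name:

```python
from math import comb

def binom_pmf(x, n, p=0.5):
    """Binomial probability of exactly x successes in n trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

p4 = binom_pmf(4, 5)  # studio better for four of five girls
p5 = binom_pmf(5, 5)  # studio better for all five girls

# One-sided P-value: probability of a result at least as extreme as x = 4.
p_value = p4 + p5
print(p4, p5, p_value)
```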
Ranking Matched Pairs: The Wilcoxon Signed-Ranks Test. With matched pairs data, for each pair the sign test merely observes which treatment does better, but not how much better. The Wilcoxon signed-ranks test is a nonparametric test designed for cases in which comparisons of the paired observations can themselves be ranked.
For each matched pair of responses, the Wilcoxon signed-ranks test measures the difference between the responses. It tests the hypothesis $H_0$: population median of difference scores is 0. The test statistic is the sum of the ranks for the positive differences.
Example: GRE Test Scores. Three students volunteered for a study to determine if taking a two-day workshop on GRE preparation improved their GRE analytical writing score from a previous score. Subject 123 Before After33.53
The Wilcoxon signed-ranks test begins by calculating the difference for each subject and then its absolute value. The absolute differences (excluding any that equal 0) are then ranked from lowest to highest, with tied ranks included where appropriate. We sum the ranks for the differences that are positive (namely 0.5 and 1.5): 1.5 + 3 = 4.5.
Table 15.9 Possible Samples with Absolute Difference Values of Sample. The table gives all possible samples with absolute difference values of 0.5 and 1.5. Our sample is sample 1.
If the workshop has no effect, then the eight possible samples in the table are equally likely. For example, the rank sum was 4.5 for two of the eight samples, so its probability is 2/8. The P-value is the probability that this sum of ranks is at least as large as observed. Since three of the eight possible samples had a rank sum for the positive differences of at least 4.5 (the observed value), the P-value is 3/8 = 0.375.
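The eight equally likely samples can be enumerated by assigning a sign to each absolute difference. The sketch assumes the three absolute differences are 0.5, 0.5 and 1.5, which is an inference from the observed rank sum of 4.5 and the eight possible samples rather than something read from the data table, so the ranks are 1.5, 1.5 and 3.

```python
from fractions import Fraction
from itertools import product

# Ranks of the absolute differences 0.5, 0.5, 1.5 (assumed); the tie shares ranks 1 and 2.
abs_diff_ranks = [Fraction(3, 2), Fraction(3, 2), Fraction(3)]
observed = Fraction(9, 2)  # observed sum of ranks for positive differences (4.5)

# Under H0 each difference is equally likely to be positive or negative: 2**3 = 8 samples.
rank_sums = [
    sum(r for r, positive in zip(abs_diff_ranks, signs) if positive)
    for signs in product([True, False], repeat=3)
]

# One-sided P-value: share of samples whose positive-rank sum is at least the observed one.
p_value = Fraction(sum(s >= observed for s in rank_sums), len(rank_sums))
print(sorted(float(s) for s in rank_sums), p_value)
```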
The Wilcoxon Signed-Ranks Test: An advantage of the Wilcoxon signed-ranks test is that it can take into account the sizes of the differences and not merely their sign. A disadvantage is that it requires an additional assumption: the population distribution of the difference scores must be symmetric.