Slide 1 Copyright © 2004 Pearson Education, Inc.
Slide 2 Copyright © 2004 Pearson Education, Inc. Chapter 12 Nonparametric Statistics 12-1 Overview 12-2 Sign Test 12-3 Wilcoxon Signed-Ranks Test for Matched Pairs 12-4 Wilcoxon Rank-Sum Test for Two Independent Samples 12-5 Kruskal-Wallis Test 12-6 Rank Correlation 12-7 Runs Test for Randomness
Slide 3 Copyright © 2004 Pearson Education, Inc. Created by Erin Hodgess, Houston, Texas Section 12-1 & 12-2 Overview and Sign Test
Slide 4 Copyright © 2004 Pearson Education, Inc. Definitions Parametric tests Parametric tests require assumptions about the nature or shape of the populations involved. Nonparametric tests Nonparametric tests do not require such assumptions. Consequently, these tests are called distribution-free tests. Overview
Slide 5 Copyright © 2004 Pearson Education, Inc. Advantages of Nonparametric Methods 1. Nonparametric methods can be applied to a wide variety of situations because they do not have the more rigid requirements of the corresponding parametric methods. In particular, nonparametric methods do not require normally distributed populations. 2. Unlike parametric methods, nonparametric methods can often be applied to nonnumerical data, such as the genders of survey respondents. 3. Nonparametric methods usually involve simpler computations than the corresponding parametric methods and are therefore easier to understand and apply.
Slide 6 Copyright © 2004 Pearson Education, Inc. Disadvantages of Nonparametric Methods 1. Nonparametric methods tend to waste information because exact numerical data are often reduced to a qualitative form. 2. Nonparametric tests are not as efficient as parametric tests, so with a nonparametric test we generally need stronger evidence (such as a larger sample or greater differences) before we reject a null hypothesis.
Slide 7 Copyright © 2004 Pearson Education, Inc. Efficiency of Nonparametric Methods
Slide 8 Copyright © 2004 Pearson Education, Inc. Definition Data are sorted when they are arranged according to some criterion, such as smallest to largest or best to worst. A rank is a number assigned to an individual sample item according to its order in the sorted list. The first item is assigned the rank of 1, the second is assigned the rank of 2, and so on.
Slide 9 Copyright © 2004 Pearson Education, Inc. Example Original scores Scores arranged in order Ranks
Slide 10 Copyright © 2004 Pearson Education, Inc. Handling Ties in Ranks Find the mean of the ranks involved and assign this mean rank to each of the tied items. 2 and 3 are tied Original scores Ranks
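The tie-handling rule above can be sketched in Python. This is a minimal illustration rather than a reference implementation: the function name `mean_ranks` and the sample scores are assumptions made for the example, not data from the slides.

```python
def mean_ranks(values):
    """Return ranks (1 = smallest); tied values share the mean of their ranks."""
    # Indices of the values, ordered from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j (0-based) hold ranks i+1..j+1; share their mean.
        shared = (i + 1 + j + 1) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


# Hypothetical scores (not from the slides).
print(mean_ranks([5, 3, 40, 10, 12]))  # no ties
print(mean_ranks([3, 5, 5, 10, 12]))   # the two 5s occupy ranks 2 and 3
```

In the second call the two tied scores would otherwise take ranks 2 and 3, so each receives the mean rank 2.5.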
Slide 11 Copyright © 2004 Pearson Education, Inc. Sign Test Definition The sign test is a nonparametric (distribution free) test that uses plus and minus signs to test different claims, including: 1) Claims involving matched pairs of sample data; 2) Claims involving nominal data; 3) Claims about the median of a single population.
Slide 12 Copyright © 2004 Pearson Education, Inc. Figure 12-1 Sign Test Procedure
Slide 13 Copyright © 2004 Pearson Education, Inc. Figure 12-1 Sign Test Procedure
Slide 14 Copyright © 2004 Pearson Education, Inc. Figure 12-1 Sign Test Procedure
Slide 15 Copyright © 2004 Pearson Education, Inc. Assumptions 1. The sample data have been randomly selected. 2. There is no requirement that the sample data come from a population with a particular distribution, such as a normal distribution.
Slide 16 Copyright © 2004 Pearson Education, Inc. Notation for Sign Test x = the number of times the less frequent sign occurs n = the total number of positive and negative signs combined
Slide 17 Copyright © 2004 Pearson Education, Inc. Test Statistic for the Sign Test For n ≤ 25: x (the number of times the less frequent sign occurs). For n > 25: $z = \dfrac{(x + 0.5) - \frac{n}{2}}{\frac{\sqrt{n}}{2}}$. Critical values: For n ≤ 25, critical x values are in Table A-7. For n > 25, critical z values are in Table A-2.
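As a rough sketch of how the sign test statistic is chosen and computed, the Python below returns x when n ≤ 25 and otherwise the z value with the 0.5 continuity correction, n/2 as the expected count, and √n/2 as the standard deviation. The function name `sign_test_statistic` and its return format are assumptions made for illustration.

```python
import math


def sign_test_statistic(n_plus, n_minus):
    """Return ("x", x) for n <= 25, otherwise ("z", z) using the normal approximation.

    n_plus and n_minus are the counts of positive and negative signs
    (ties must already have been excluded).
    """
    x = min(n_plus, n_minus)   # number of times the less frequent sign occurs
    n = n_plus + n_minus       # total number of signs
    if n <= 25:
        return ("x", x)
    z = ((x + 0.5) - n / 2) / (math.sqrt(n) / 2)
    return ("z", z)


# 12 signs of one kind and 2 of the other: n = 14, so the statistic is x itself.
print(sign_test_statistic(12, 2))
# 30 signs of one kind and 70 of the other: n = 100, so the statistic is z.
print(sign_test_statistic(30, 70))
```

The returned x would then be compared with Table A-7 and the returned z with Table A-2; the lookup itself is not modeled here.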
Slide 18 Copyright © 2004 Pearson Education, Inc. Claims Involving Matched Pairs Convert the raw data to plus and minus signs as follows: 1. Subtract each value of the second variable from the corresponding value of the first variable 2. Record only the sign of the difference found in step 1. Exclude ties: that is, any matched pairs in which both values are equal.
Slide 19 Copyright © 2004 Pearson Education, Inc. Key Principle of Sign Test If the two sets of data have equal medians, the number of positive signs should be approximately equal to the number of negative signs.
Slide 20 Copyright © 2004 Pearson Education, Inc. Example: Intelligence in Children Use the data in Table 12-2 with a 0.05 significance level to test the claim that there is no difference between the times of the first and second trials.
Slide 21 Copyright © 2004 Pearson Education, Inc. Example: Intelligence in Children
Slide 22 Copyright © 2004 Pearson Education, Inc. H0: The median of the differences is equal to 0. H1: The median of the differences is not equal to 0. α = 0.05 x = minimum(12, 2) = 2 Critical value = 2 Use the data in Table 12-2 with a 0.05 significance level to test the claim that there is no difference between the times of the first and second trials. Example: Intelligence in Children
Slide 23 Copyright © 2004 Pearson Education, Inc. Example: Intelligence in Children Use the data in Table 12-2 with a 0.05 significance level to test the claim that there is no difference between the times of the first and second trials. We reject the null hypothesis. There is sufficient evidence to warrant rejection of the claim of no difference between the times (that is, the claim that the median of the differences is equal to 0).
Slide 24 Copyright © 2004 Pearson Education, Inc. Example: Gender Discrimination Hatters Restaurant Chain hired 30 men and 70 women. Use the sign test and a 0.05 significance level to test the null hypothesis that men and women are hired equally by this company. H0: p = 0.5 H1: p ≠ 0.5 x = minimum(30, 70) = 30
Slide 25 Copyright © 2004 Pearson Education, Inc. Example: Gender Discrimination Hatters Restaurant Chain hired 30 men and 70 women. Use the sign test and a 0.05 significance level to test the null hypothesis that men and women are hired equally by this company. With n = 30 + 70 = 100 and x = 30: $z = \dfrac{(x + 0.5) - \frac{n}{2}}{\frac{\sqrt{n}}{2}} = \dfrac{(30 + 0.5) - \frac{100}{2}}{\frac{\sqrt{100}}{2}} = -3.90$
Slide 26 Copyright © 2004 Pearson Education, Inc. Example: Gender Discrimination Hatters Restaurant Chain hired 30 men and 70 women. Use the sign test and a 0.05 significance level to test the null hypothesis that men and women are hired equally by this company. With α = 0.05, the critical values are z = ±1.96. We reject the null hypothesis. There is sufficient evidence to warrant rejection of the claim that hiring practices are fair.
Slide 27 Copyright © 2004 Pearson Education, Inc. Example: Body Temperature Use the 106 temperatures in Data Set 4 on Day 2 with the sign test to test the claim that the median is less than 98.6°F. There are 68 subjects with temperatures less than 98.6°F, 23 subjects with temperatures greater than 98.6°F, and 15 subjects with temperatures equal to 98.6°F. H0: Median is equal to 98.6°F. H1: Median is less than 98.6°F.
Slide 28 Copyright © 2004 Pearson Education, Inc. Example: Body Temperature Use the 106 temperatures in Data Set 4 on Day 2 with the sign test to test the claim that the median is less than 98.6°F. There are 68 subjects with temperatures less than 98.6°F, 23 subjects with temperatures greater than 98.6°F, and 15 subjects with temperatures equal to 98.6°F. Discarding the 15 ties gives n = 68 + 23 = 91 and x = minimum(68, 23) = 23. $z = \dfrac{(x + 0.5) - \frac{n}{2}}{\frac{\sqrt{n}}{2}} = \dfrac{(23 + 0.5) - \frac{91}{2}}{\frac{\sqrt{91}}{2}} = -4.61$
Slide 29 Copyright © 2004 Pearson Education, Inc. Example: Body Temperature Use the 106 temperatures in Data Set 4 on Day 2 with the sign test to test the claim that the median is less than 98.6°F. There are 68 subjects with temperatures less than 98.6°F, 23 subjects with temperatures greater than 98.6°F, and 15 subjects with temperatures equal to 98.6°F. We use Table A-2 to get the critical z value for this left-tailed test. We can see that the test statistic of z = –4.61 falls into the critical region. We therefore reject the null hypothesis. We support the claim that the median body temperature of healthy adults is less than 98.6°F.
Slide 30 Copyright © 2004 Pearson Education, Inc. Created by Erin Hodgess, Houston, Texas Section 12-3 Wilcoxon Signed-Ranks Test for Matched Pairs
Slide 31 Copyright © 2004 Pearson Education, Inc. The Wilcoxon signed-ranks test is a nonparametric test that uses ranks of sample data consisting of matched pairs. It is used to test for differences in the population distributions. Definition
Slide 32 Copyright © 2004 Pearson Education, Inc. Wilcoxon Signed-Ranks Tests H 0 : The two samples come from populations with the same distribution. H 1 : The two samples come from populations with different distributions.
Slide 33 Copyright © 2004 Pearson Education, Inc. Procedure for Finding the Value of the Test Statistic Step 1: For each pair of data, find the difference d by subtracting the second score from the first. Keep signs, but discard any pairs for which d = 0. Step 2: Ignore the signs of the differences, then sort the differences from lowest to highest and replace the differences by the corresponding rank value. When differences have the same numerical value, assign to them the mean of the ranks involved in the tie. Step 3: Attach to each rank the sign of the difference from which it came. That is, insert those signs that were ignored in step 2. Step 4: Find the sum of the absolute values of the negative ranks. Also find the sum of the positive ranks. (continued)
Slide 34 Copyright © 2004 Pearson Education, Inc. Step 5: Let T be the smaller of the two sums found in step 4. Either sum could be used, but for a simplified procedure we arbitrarily select the smaller of the two sums. Step 6: Let n be the number of pairs of data for which the difference d is not 0. Step 7: Determine the test statistic and critical values based on the sample size, as shown below. Step 8: When forming the conclusion, reject the null hypothesis if the sample data lead to a test statistic that is in the critical region, that is, the test statistic is less than or equal to the critical value(s). Otherwise, fail to reject the null hypothesis. Procedure for Finding the Value of the Test Statistic
Slide 35 Copyright © 2004 Pearson Education, Inc. 1. The sample data have been randomly selected. 2. The population of differences (found from the pairs of data) has a distribution that is approximately symmetric, meaning that the left half of its histogram is roughly a mirror image of its right half. (There is no requirement that the data have a normal distribution.) Wilcoxon Signed-Ranks Tests Assumptions
Slide 36 Copyright © 2004 Pearson Education, Inc. Notation T = the smaller of the following two sums: 1. The sum of the absolute values of the negative ranks 2. The sum of the positive ranks
Slide 37 Copyright © 2004 Pearson Education, Inc. Test Statistic for the Wilcoxon Signed-Ranks Test for Matched Pairs For n ≤ 30: T. For n > 30: $z = \dfrac{T - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24}}}$. Critical values: For n ≤ 30, critical T values are in Table A-8. For n > 30, critical z values are in Table A-2.
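Steps 1 through 7 of the signed-ranks procedure can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function name `signed_rank_statistic`, its return format, and the paired values in the usage lines are all hypothetical, and the table lookup for critical values is left out.

```python
import math


def signed_rank_statistic(first, second):
    """Return ("T", T, n) for n <= 30, otherwise ("z", z, n)."""
    # Step 1: differences (first minus second), discarding pairs with d = 0.
    d = [a - b for a, b in zip(first, second) if a != b]
    n = len(d)  # Step 6: number of pairs with a nonzero difference

    # Step 2: rank the absolute differences, giving ties the mean rank.
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j + 2) / 2  # mean of ranks i+1 .. j+1
        i = j + 1

    # Steps 3-5: sum the ranks by sign of the difference; T is the smaller sum.
    negative_sum = sum(r for r, di in zip(ranks, d) if di < 0)
    positive_sum = sum(r for r, di in zip(ranks, d) if di > 0)
    T = min(negative_sum, positive_sum)

    # Step 7: choose the test statistic based on n.
    if n <= 30:
        return ("T", T, n)
    z = (T - n * (n + 1) / 4) / math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return ("z", z, n)


# Hypothetical paired measurements (not the data of Table 12-3).
first_trial = [10, 12, 9, 15, 8, 11]
second_trial = [8, 12, 10, 11, 5, 9]
print(signed_rank_statistic(first_trial, second_trial))
```

Integer data are used in the example so that ties among the absolute differences are detected exactly; with floating-point measurements a tolerance-based comparison might be preferable.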
Slide 38 Copyright © 2004 Pearson Education, Inc. Example: Intelligence in Children Use the data in Table 12-3 with the Wilcoxon signed-ranks test and 0.05 significance level to test the claim that there is no difference between the times of the first and second trials.
Slide 39 Copyright © 2004 Pearson Education, Inc. H 0 : There is no difference between the times of the first and second trials. H 1 : There is a difference between the times of the first and second trials. The differences in row three of the table are found by computing the first time – second time. The ranks of differences in row four of the table are found by ranking the absolute differences, handling ties by assigning the mean of the ranks. The signed ranks in row five of the table are found by attaching the sign of the differences to the ranks.
Slide 40 Copyright © 2004 Pearson Education, Inc. H0: There is no difference between the times of the first and second trials. H1: There is a difference between the times of the first and second trials. Find the sum of the absolute values of the negative ranks: 5.5 Find the sum of the values of the positive ranks: 99.5 T = 5.5 (the smaller of the two sums) Let n be the number of pairs where d ≠ 0, so n = 14. Since n ≤ 30, T = 5.5 will be the test statistic. Using Table A-8, the critical value will be 21.
Slide 41 Copyright © 2004 Pearson Education, Inc. H 0 : There is no difference between the times of the first and second trials. H 1 : There is a difference between the times of the first and second trials. Since the test statistic ( T = 5.5) is less than the critical value of 21, we reject the null hypothesis (Step 8 of procedures). It appears that there is a difference between the times of the first and second trials.
Slide 42 Copyright © 2004 Pearson Education, Inc. Created by Erin Hodgess, Houston, Texas Section 12-4 Wilcoxon Rank-Sum Test for Two Independent Samples
Slide 43 Copyright © 2004 Pearson Education, Inc. Wilcoxon Rank-Sum Test for Two Independent Samples Definition The Wilcoxon rank-sum test is a nonparametric test that uses ranks of sample data from two independent populations. It is used to test the null hypothesis that the two independent samples come from populations with the same distribution. (That is, the two populations are identical.)
Slide 44 Copyright © 2004 Pearson Education, Inc. Key Idea If two samples are drawn from identical populations and the individual values are all ranked as one combined collection of values, then the high and low ranks should fall evenly between the two samples.
Slide 45 Copyright © 2004 Pearson Education, Inc. Assumptions 1. There are two independent samples that were randomly selected. 2. Each of the two samples has more than 10 values. 3. There is no requirement that the two populations have a normal distribution or any other particular distribution.
Slide 46 Copyright © 2004 Pearson Education, Inc. Procedure for Finding the Value of the Test Statistic 1. Temporarily combine the two samples into one big sample, then replace each sample value with its rank. 2. Find the sum of the ranks for either one of the two samples. 3. Calculate the value of the z test statistic as shown next, where either sample can be used as ‘sample 1’.
Slide 47 Copyright © 2004 Pearson Education, Inc. Notation for the Wilcoxon Rank-Sum Test $n_1$ = size of sample 1; $n_2$ = size of sample 2; $R_1$ = sum of ranks for sample 1; $R_2$ = sum of ranks for sample 2; $R$ = same as $R_1$ (sum of ranks for sample 1); $\mu_R$ = mean of the sample R values that is expected when the two populations are identical; $\sigma_R$ = standard deviation of the sample R values that is expected when the two populations are identical
Slide 48 Copyright © 2004 Pearson Education, Inc. Test Statistic for the Wilcoxon Rank-Sum Test for Two Independent Samples $z = \dfrac{R - \mu_R}{\sigma_R}$, where $\mu_R = \dfrac{n_1 (n_1 + n_2 + 1)}{2}$ and $\sigma_R = \sqrt{\dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$. Here $n_1$ = size of the sample from which the rank sum R is found, $n_2$ = size of the other sample, and $R$ = sum of ranks of the sample with size $n_1$.
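The rank-sum test statistic can be computed with a few lines of Python. The sketch below assumes the rank sum R has already been found from the combined ranking; the function name `rank_sum_z` and the value R = 236.5 used in the usage line are illustrative assumptions.

```python
import math


def rank_sum_z(R, n1, n2):
    """Return (z, mu_R, sigma_R) for the Wilcoxon rank-sum test.

    R is the sum of ranks for the sample of size n1; n2 is the other sample size.
    """
    mu_R = n1 * (n1 + n2 + 1) / 2                      # expected rank sum
    sigma_R = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # its standard deviation
    z = (R - mu_R) / sigma_R
    return z, mu_R, sigma_R


# Illustrative input: samples of sizes 13 and 12, with an assumed rank sum of 236.5.
z, mu_R, sigma_R = rank_sum_z(236.5, 13, 12)
print(round(mu_R, 3), round(sigma_R, 3), round(z, 2))
```

The resulting z would be compared with critical values from the standard normal distribution, for example ±1.96 for a two-tailed test at α = 0.05.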
Slide 49 Copyright © 2004 Pearson Education, Inc. Critical Values Can be found in Table A-2 (because the test statistic is based on the normal distribution) Test Statistic for the Wilcoxon Rank-Sum Test for Two Independent Samples
Slide 50 Copyright © 2004 Pearson Education, Inc. Example: Rowling and Tolstoy Use the data in Table 12-4 with the Wilcoxon rank-sum test and a 0.05 significance level to test the claim that reading scores for pages from the two books have the same distribution.
Slide 51 Copyright © 2004 Pearson Education, Inc. Example: Rowling and Tolstoy Use the data in Table 12-4 with the Wilcoxon rank-sum test and a 0.05 significance level to test the claim that reading scores for pages from the two books have the same distribution. H 0 : The Rowling and Tolstoy books have Flesch Reading Ease scores with the same distribution. H 1 : The Rowling and Tolstoy books have distributions of Flesch Reading Ease scores that are different in some way.
Slide 52 Copyright © 2004 Pearson Education, Inc. Example: Rowling and Tolstoy Use the data in Table 12-4 with the Wilcoxon rank-sum test and a 0.05 significance level to test the claim that reading scores for pages from the two books have the same distribution. $\mu_R = \dfrac{n_1 (n_1 + n_2 + 1)}{2} = \dfrac{13 (13 + 12 + 1)}{2} = 169$
Slide 53 Copyright © 2004 Pearson Education, Inc. Example: Rowling and Tolstoy Use the data in Table 12-4 with the Wilcoxon rank-sum test and a 0.05 significance level to test the claim that reading scores for pages from the two books have the same distribution. $\sigma_R = \sqrt{\dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}} = \sqrt{\dfrac{(13)(12)(13 + 12 + 1)}{12}} = 18.385$
Slide 54 Copyright © 2004 Pearson Education, Inc. Example: Rowling and Tolstoy Use the data in Table 12-4 with the Wilcoxon rank-sum test and a 0.05 significance level to test the claim that reading scores for pages from the two books have the same distribution. $z = \dfrac{R - \mu_R}{\sigma_R} = \dfrac{236.5 - 169}{18.385} = 3.67$
Slide 55 Copyright © 2004 Pearson Education, Inc. Example: Rowling and Tolstoy Use the data in Table 12-4 with the Wilcoxon rank-sum test and a 0.05 significance level to test the claim that reading scores for pages from the two books have the same distribution. We have a two-tailed test with α = 0.05, so the critical values are 1.96 and –1.96. The test statistic of 3.67 falls in the critical region, so we reject the null hypothesis that the Rowling and Tolstoy books have the same distribution of reading scores.
Slide 56 Copyright © 2004 Pearson Education, Inc. Example: Wednesday and Saturday Rain Use the data from the Chapter Problem (shown in the Minitab printout) with the Wilcoxon rank-sum test to test the claim that the rainfall amounts for Wednesdays and Saturdays have the same distribution.
Slide 57 Copyright © 2004 Pearson Education, Inc. Example: Wednesday and Saturday Rain Use the data from the Chapter Problem (shown in the Minitab printout) with the Wilcoxon rank-sum test to test the claim that the rainfall amounts for Wednesdays and Saturdays have the same distribution. H 0 : The Wednesday and Saturday rainfall amounts come from populations with the same distribution. H 1 : The two distributions are different in some way.
Slide 58 Copyright © 2004 Pearson Education, Inc. Example: Wednesday and Saturday Rain Use the data from the Chapter Problem (shown in the Minitab printout) with the Wilcoxon rank-sum test to test the claim that the rainfall amounts for Wednesdays and Saturdays have the same distribution. The Minitab printout gives the rank sum W and the P-value (both before and after adjustment for ties). We cannot reject the null hypothesis. The differences between Wednesday and Saturday are not significant.
Slide 59 Copyright © 2004 Pearson Education, Inc. Created by Erin Hodgess, Houston, Texas Section 12-5 Kruskal-Wallis Test
Slide 60 Copyright © 2004 Pearson Education, Inc. Kruskal-Wallis Test (also called the H test) Definition The Kruskal-Wallis test is a nonparametric test that uses ranks of sample data from three or more independent populations. It is used to test the null hypothesis that the independent samples come from populations with the same distribution.
Slide 61 Copyright © 2004 Pearson Education, Inc. Kruskal-Wallis Test (also called the H test) Hypotheses H0: The samples come from populations with the same distribution. H1: The samples come from populations with different distributions.
Slide 62 Copyright © 2004 Pearson Education, Inc. Kruskal-Wallis Test We compute the test statistic H, which has a distribution that can be approximated by the chi-square (χ²) distribution as long as each sample has at least 5 observations.
Slide 63 Copyright © 2004 Pearson Education, Inc. Procedure for Finding the Value of the Test Statistic 1. Temporarily combine all samples into one big sample and assign a rank to each sample value. (Sort from lowest to highest, and in cases of ties, assign each observation the mean of the ranks involved.) 2. For each sample, find the sum of the ranks and find the sample size. 3. Calculate H by using results of Step 2 and the following:
Slide 64 Copyright © 2004 Pearson Education, Inc. Assumptions 1. We have at least three independent samples, all of which are randomly selected. 2. Each sample has at least 5 observations. 3. There is no requirement that the populations have a normal distribution or any other particular distribution.
Slide 65 Copyright © 2004 Pearson Education, Inc. Notation for the Kruskal-Wallis Test N = total number of observations combined; k = number of samples; $R_1$ = sum of ranks for sample 1; $n_1$ = number of observations in sample 1. For sample 2, the sum of ranks is $R_2$ and the number of observations is $n_2$, and similar notation is used for the other samples.
Slide 66 Copyright © 2004 Pearson Education, Inc. Test Statistic for the Kruskal-Wallis Test $H = \dfrac{12}{N(N+1)} \left( \dfrac{R_1^2}{n_1} + \dfrac{R_2^2}{n_2} + \cdots + \dfrac{R_k^2}{n_k} \right) - 3(N+1)$, where degrees of freedom = k – 1
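A short Python sketch of the H computation may help make the formula concrete. It assumes the rank sums have already been obtained from the combined ranking; the function name `kruskal_wallis_h` and the internal consistency check are illustrative additions, not part of the slides.

```python
def kruskal_wallis_h(rank_sums, sizes):
    """Return the Kruskal-Wallis H statistic from per-sample rank sums and sizes."""
    N = sum(sizes)  # total number of observations combined
    # Sanity check: the ranks 1..N always add up to N(N+1)/2.
    if abs(sum(rank_sums) - N * (N + 1) / 2) > 1e-9:
        raise ValueError("rank sums are inconsistent with the sample sizes")
    weighted = sum(R ** 2 / n for R, n in zip(rank_sums, sizes))
    return 12 / (N * (N + 1)) * weighted - 3 * (N + 1)


# Three samples of 12 observations each; the rank sums must total 36(37)/2 = 666.
H = kruskal_wallis_h([201.5, 337, 127.5], [12, 12, 12])
print(round(H, 3))
```

H is then compared with a right-tailed chi-square critical value with k – 1 degrees of freedom, where k is the number of samples.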
Slide 67 Copyright © 2004 Pearson Education, Inc. Test Statistic for the Kruskal-Wallis Test Critical Values 1. Test is right-tailed. 2. Use Table A-4 (because the H test statistic can be approximated by the χ² distribution). 3. Degrees of freedom = k – 1
Slide 68 Copyright © 2004 Pearson Education, Inc. Example: Clancy, Rowling and Tolstoy Use the data in Table 12-5 with the Kruskal-Wallis test to test the claim that reading scores for pages from the three samples have the same distribution.
Slide 69 Copyright © 2004 Pearson Education, Inc. Example: Clancy, Rowling and Tolstoy Use the data in Table 12-5 with the Kruskal-Wallis test to test the claim that reading scores for pages from the three samples have the same distribution. H 0 : The populations of the readability scores for pages from the three books are identical. H 1 : The three populations are not identical.
Slide 70 Copyright © 2004 Pearson Education, Inc. Example: Clancy, Rowling and Tolstoy Use the data in Table 12-5 with the Kruskal-Wallis test to test the claim that reading scores for pages from the three samples have the same distribution. $n_1 = 12$, $n_2 = 12$, $n_3 = 12$, $N = 36$, $R_1 = 201.5$, $R_2 = 337$, $R_3 = 127.5$
Slide 71 Copyright © 2004 Pearson Education, Inc. Example: Clancy, Rowling and Tolstoy Use the data in Table 12-5 with the Kruskal-Wallis test to test the claim that reading scores for pages from the three samples have the same distribution. $H = \dfrac{12}{N(N+1)} \left( \dfrac{R_1^2}{n_1} + \dfrac{R_2^2}{n_2} + \cdots + \dfrac{R_k^2}{n_k} \right) - 3(N+1) = \dfrac{12}{36(36+1)} \left( \dfrac{201.5^2}{12} + \dfrac{337^2}{12} + \dfrac{127.5^2}{12} \right) - 3(36+1) = 16.949$
Slide 72 Copyright © 2004 Pearson Education, Inc. Example: Clancy, Rowling and Tolstoy Use the data in Table 12-5 with the Kruskal-Wallis test to test the claim that reading scores for pages from the three samples have the same distribution. The critical value is χ² = 5.991, which corresponds to 2 degrees of freedom and a 0.05 level of significance. Because the test statistic H exceeds this critical value, we reject the null hypothesis that the three populations are identical.
Slide 73 Copyright © 2004 Pearson Education, Inc. Example: Rains More on Weekends? Use the Data Set 11 in Appendix B to test the claim that the seven weekdays have distributions that are not all the same.
Slide 74 Copyright © 2004 Pearson Education, Inc. Example: Rains More on Weekends? Use the Data Set 11 in Appendix B to test the claim that the seven weekdays have distributions that are not all the same. H 0 : The populations of the weekday rainfall data are identical. H 1 : The populations of the weekday rainfall data are not identical.
Slide 75 Copyright © 2004 Pearson Education, Inc. Example: Rains More on Weekends? Use the Data Set 11 in Appendix B to test the claim that the seven weekdays have distributions that are not all the same. The test statistic is H = 3.85 (adjusted for ties), and the P-value is approximately 0.697 (chi-square approximation with 6 degrees of freedom). We fail to reject the null hypothesis. There is not enough evidence to support a claim that the rainfall amounts on the seven weekdays have distributions that are not all the same.
Slide 76 Copyright © 2004 Pearson Education, Inc. Created by Erin Hodgess, Houston, Texas Section 12-6 Rank Correlation
Slide 77 Copyright © 2004 Pearson Education, Inc. Rank Correlation Definition Rank correlation uses the ranks of sample data consisting of matched pairs. The rank correlation test is used to test for an association between two variables. H0: $\rho_s = 0$ (There is no correlation between the two variables.) H1: $\rho_s \neq 0$ (There is a correlation between the two variables.)
Slide 78 Copyright © 2004 Pearson Education, Inc. Advantages 1. The nonparametric method of rank correlation can be used in a wider variety of circumstances than the parametric method of linear correlation. With rank correlation, we can analyze paired data that are ranks or can be converted to ranks. 2. Rank correlation can be used to detect some (not all) relationships that are not linear. 3. The computations for rank correlation are much simpler than the computations for linear correlation, as can be readily seen by comparing the formulas used to compute these statistics.
Slide 79 Copyright © 2004 Pearson Education, Inc. Disadvantages A disadvantage of rank correlation is its efficiency rating of 0.91, as described in Section 12-1. This efficiency rating shows that with all other circumstances being equal, the nonparametric approach of rank correlation requires 100 pairs of sample data to achieve the same results as only 91 pairs of sample observations analyzed through parametric methods.
Slide 80 Copyright © 2004 Pearson Education, Inc. Assumptions 1. The sample data have been randomly selected. 2. Unlike the parametric methods of Section 9-2, there is no requirement that the sample pairs of data have a bivariate normal distribution. There is no requirement of a normal distribution for any population.
Slide 81 Copyright © 2004 Pearson Education, Inc. Notation $r_s$ = rank correlation coefficient for sample paired data ($r_s$ is a sample statistic); $\rho_s$ = rank correlation coefficient for all the population data ($\rho_s$ is a population parameter); n = number of pairs of data; d = difference between ranks for the two values within a pair. $r_s$ is often called Spearman’s rank correlation coefficient.
Slide 82 Copyright © 2004 Pearson Education, Inc. Test Statistic for the Rank Correlation Coefficient $r_s = 1 - \dfrac{6 \sum d^2}{n(n^2 - 1)}$, where each value of d is a difference between the ranks for a pair of sample data. Critical values: If n ≤ 30, refer to Table A-9. If n > 30, use Formula 12-1.
Slide 83 Copyright © 2004 Pearson Education, Inc. Formula 12-1 $r_s = \dfrac{\pm z}{\sqrt{n - 1}}$ (critical values when n > 30), where the value of z corresponds to the significance level
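The rank correlation statistic and the large-sample critical value can be sketched in Python as below. The function names and the example ranks are assumptions made for illustration; the sketch also assumes the data are already in rank form. The d² shortcut formula is exact only when there are no tied ranks.

```python
import math


def rank_correlation(ranks_x, ranks_y):
    """Return r_s = 1 - 6*sum(d^2) / (n(n^2 - 1)) for paired ranks."""
    n = len(ranks_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def large_sample_critical_value(n, z):
    """Return the positive critical value z / sqrt(n - 1), intended for n > 30."""
    return z / math.sqrt(n - 1)


# Hypothetical rankings of five items by two judges (not data from the slides).
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
print(rank_correlation(judge_a, judge_b))

# Critical values for n = 40 at a 0.05 significance level (two-tailed, z = 1.96).
critical = large_sample_critical_value(40, 1.96)
print(round(critical, 3))
```

A sample r_s whose absolute value exceeds the critical value would indicate a significant correlation.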
Slide 84 Copyright © 2004 Pearson Education, Inc. Figure 12-4 Rank Correlation for Testing H0: $\rho_s = 0$ Start. Are the n pairs of data in the form of ranks? If no: convert the data of the first sample to ranks from 1 to n and then do the same for the second sample. If yes (or after converting): calculate the difference d for each pair of ranks by subtracting the lower rank from the higher rank. Let n equal the total number of pairs of data. Square each difference d and then find the sum of those squares to get $\sum d^2$. Complete the computation of $r_s = 1 - \dfrac{6 \sum d^2}{n(n^2 - 1)}$ to get the sample statistic.
Slide 85 Copyright © 2004 Pearson Education, Inc. Figure 12-4 Rank Correlation for Testing H0: $\rho_s = 0$ (continued) Complete the computation of $r_s = 1 - \dfrac{6 \sum d^2}{n(n^2 - 1)}$ to get the sample statistic. Is n ≤ 30? If yes: find the critical values of $r_s$ in Table A-9. If no: calculate the critical values $r_s = \dfrac{\pm z}{\sqrt{n - 1}}$, where z corresponds to the significance level. If the sample statistic $r_s$ is positive and exceeds the positive critical value, there is a correlation. If the sample statistic $r_s$ is negative and is less than the negative critical value, there is a correlation. If the sample statistic $r_s$ is between the positive and negative critical values, there is no correlation.
Slide 86 Copyright © 2004 Pearson Education, Inc. Example: Perceptions of Beauty Use the data in Table 12-6 to determine if there is a correlation between the rankings of men and women in terms of what they find attractive. Use a significance level of α = 0.05.
Slide 87 Copyright © 2004 Pearson Education, Inc. Example: Perceptions of Beauty Use the data in Table 12-6 to determine if there is a correlation between the rankings of men and women in terms of what they find attractive. Use a significance level of α = 0.05. H0: $\rho_s = 0$ H1: $\rho_s \neq 0$ n = 10 $r_s = 1 - \dfrac{6 \sum d^2}{n(n^2 - 1)} = 1 - \dfrac{6(74)}{10(10^2 - 1)} = 0.552$
Slide 88 Copyright © 2004 Pearson Education, Inc. Example: Perceptions of Beauty Use the data in Table 12-6 to determine if there is a correlation between the rankings of men and women in terms of what they find attractive. Use a significance level of α = 0.05. We refer to Table A-9 to determine that the critical values are ±0.648. Because the test statistic of $r_s = 0.552$ does not exceed the critical value of 0.648, we fail to reject the null hypothesis. There is not sufficient evidence to support a claim of a correlation between the rankings of men and women.
Slide 89 Copyright © 2004 Pearson Education, Inc. Example: Perceptions of Beauty with Large Samples Assume that the preceding example is expanded by including a total of 40 women and that a new value of the test statistic $r_s$ is computed. If the significance level is α = 0.05, what do you conclude about the correlation?
Slide 90 Copyright © 2004 Pearson Education, Inc. Example: Perceptions of Beauty with Large Samples $r_s = \dfrac{\pm z}{\sqrt{n - 1}} = \dfrac{\pm 1.96}{\sqrt{40 - 1}} = \pm 0.314$ These are the critical values.
Slide 91 Copyright © 2004 Pearson Education, Inc. Example: Perceptions of Beauty with Large Samples The test statistic $r_s$ does not exceed the critical value of 0.314, so we fail to reject the null hypothesis. There is not sufficient evidence to support the claim of a correlation between the rankings of men and women.
Slide 92 Copyright © 2004 Pearson Education, Inc. The data in Table 12-7 are the numbers of games played and the last scores (in millions) of a Raiders of the Lost Ark pinball game. We expect that there should be an association between the number of games played and the pinball score. Is there sufficient evidence to support the claim that there is such an association? Example: Detecting a Nonlinear Pattern
Slide 93 Copyright © 2004 Pearson Education, Inc. Example: Detecting a Nonlinear Pattern
Slide 94 Copyright © 2004 Pearson Education, Inc. Example: Detecting a Nonlinear Pattern H0: $\rho_s = 0$ H1: $\rho_s \neq 0$ n = 9 $r_s = 1 - \dfrac{6 \sum d^2}{n(n^2 - 1)} = 1 - \dfrac{6(6)}{9(9^2 - 1)} = 0.950$
Slide 95 Copyright © 2004 Pearson Education, Inc. Example: Detecting a Nonlinear Pattern We use Table A-9 to get the critical values of ±0.683. The sample statistic of 0.950 exceeds the critical value of 0.683, so we conclude that there is significant correlation. Higher numbers of games played appear to be associated with higher scores.
Slide 96 Copyright © 2004 Pearson Education, Inc. Created by Erin Hodgess, Houston, Texas Section 12-7 Runs Test for Randomness
Slide 97 Copyright © 2004 Pearson Education, Inc. Runs Test for Randomness Definitions Run A run is a sequence of data having the same characteristic; the sequence is preceded and followed by data with a different characteristic or no data at all. Runs Test The runs test uses the number of runs in a sequence of sample data to test for randomness in the order of the data.
Slide 98 Copyright © 2004 Pearson Education, Inc. Fundamental Principles of the Runs Test Reject randomness if the number of runs is very low or very high.
Slide 99 Copyright © 2004 Pearson Education, Inc. Examples Sequence: D D D D R R D D D R. 1st run: D D D D; 2nd run: R R; 3rd run: D D D; 4th run: R. Total: 4 runs.
Slide 100 Copyright © 2004 Pearson Education, Inc. Examples If the number of runs is very low, randomness is lacking: D D D D D R R R R R (only 2 runs). If the number of runs is very high, randomness is lacking: D R D R D R D R D R (10 runs).
Slide 101 Copyright © 2004 Pearson Education, Inc. Assumptions 1. The sample data are arranged according to some ordering scheme, such as the order in which the sample values were obtained. 2. Each data value can be categorized into one of two separate categories. 3. The runs test for randomness is based on the order in which the data occur; it is not based on the frequency of the data.
Slide 102 Copyright © 2004 Pearson Education, Inc. Notation $n_1$ = number of elements in the sequence that have one particular characteristic (the characteristic chosen for $n_1$ is arbitrary); $n_2$ = number of elements in the sequence that have the other characteristic; G = number of runs
Slide 103 Copyright © 2004 Pearson Education, Inc. Small Sample Cases Table A-10 applies when: 1. We are using 5% as the cutoff for sequences that have too few or too many runs, 2. $n_1 \le 20$, and 3. $n_2 \le 20$
Slide 104 Copyright © 2004 Pearson Education, Inc. Large Sample Cases Formula 12-2: $\mu_G = \dfrac{2 n_1 n_2}{n_1 + n_2} + 1$ Formula 12-3: $\sigma_G = \sqrt{\dfrac{(2 n_1 n_2)(2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}}$, where $\mu_G$ = mean of the number of runs G, $\sigma_G$ = standard deviation of the number of runs G, and the distribution of the number of runs G is approximately normal
Slide 105 Copyright © 2004 Pearson Education, Inc. Test Statistic for the Runs Test for Randomness If α = 0.05 and $n_1 \le 20$ and $n_2 \le 20$, the test statistic is G. If α ≠ 0.05 or $n_1 > 20$ or $n_2 > 20$, the test statistic is $z = \dfrac{G - \mu_G}{\sigma_G}$. Critical values: If the test statistic is G, critical values are found in Table A-10. If the test statistic is z, critical values are found in Table A-2 by using the same procedures introduced in Chapter 6.
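To make the runs test concrete, here is a small Python sketch that counts runs in a two-category sequence and computes the large-sample z statistic from the mean and standard deviation of the number of runs. The function names `count_runs` and `runs_test_z` are assumptions made for illustration, and the Table A-10 lookup for small samples is not modeled.

```python
import math


def count_runs(sequence):
    """Count runs: a new run starts at the first item and at every change of category."""
    return sum(1 for i, s in enumerate(sequence) if i == 0 or s != sequence[i - 1])


def runs_test_z(n1, n2, G):
    """Return (z, mu_G, sigma_G) using the normal approximation for the number of runs."""
    mu_G = 2 * n1 * n2 / (n1 + n2) + 1
    sigma_G = math.sqrt(
        (2 * n1 * n2) * (2 * n1 * n2 - n1 - n2)
        / ((n1 + n2) ** 2 * (n1 + n2 - 1))
    )
    return (G - mu_G) / sigma_G, mu_G, sigma_G


# Counting runs in short sequences.
print(count_runs("DDDDRRDDDR"))    # runs: DDDD, RR, DDD, R
print(count_runs("HHHMHHHHMMMH"))  # runs: HHH, M, HHHH, MMM, H

# Large-sample case with n1 = 33, n2 = 19 and G = 30 runs.
z, mu_G, sigma_G = runs_test_z(33, 19, 30)
print(round(mu_G, 3), round(sigma_G, 3), round(z, 2))
```

For a two-tailed test at α = 0.05, a z value between –1.96 and 1.96 would not lead to rejection of randomness.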
Slide 106 Copyright © 2004 Pearson Education, Inc. Figure 12-5 Runs Test for Randomness
Slide 107 Copyright © 2004 Pearson Education, Inc. Figure 12-5 Runs Test for Randomness
Slide 108 Copyright © 2004 Pearson Education, Inc. Figure 12-5 Runs Test for Randomness
Slide 109 Copyright © 2004 Pearson Education, Inc. Example: Basketball Foul Shots In the course of a game, WNBA player Cynthia Cooper shoots 12 free throws. Denoting shots made by “H” and shots missed by “M”, her results are as follows: H, H, H, M, H, H, H, H, M, M, M, H. Use a 0.05 significance level to test for randomness in the sequence of hits and misses.
Slide 110 Copyright © 2004 Pearson Education, Inc. Example: Basketball Foul Shots There are 8 hits, 4 misses, and 5 runs, so we have n 1 = 8, n 2 = 4, and G = 5. The test statistic is G = 5, and we refer to Table A-10 to find the critical values of 3 and 10. We do not reject randomness. There is not sufficient evidence to warrant rejection of the claim that the hits and misses occur randomly.
Slide 111 Copyright © 2004 Pearson Education, Inc. Refer to the rainfall amounts for Boston as listed in Data Set 11 in Appendix B. Is there sufficient evidence to support the claim that rain on Mondays is not random? Example: Boston Rainfall on Mondays
Slide 112 Copyright © 2004 Pearson Education, Inc. H 0 : The sequence is random. H 1 : The sequence is not random. n 1 = 33 n 2 = 19 G = 30 Example: Boston Rainfall on Mondays
Slide 113 Copyright © 2004 Pearson Education, Inc. Example: Boston Rainfall on Mondays $\mu_G = \dfrac{2 n_1 n_2}{n_1 + n_2} + 1 = \dfrac{2(33)(19)}{33 + 19} + 1 = 25.115$
Slide 114 Copyright © 2004 Pearson Education, Inc. Example: Boston Rainfall on Mondays $\sigma_G = \sqrt{\dfrac{(2 n_1 n_2)(2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}} = \sqrt{\dfrac{2(33)(19)[2(33)(19) - 33 - 19]}{(33 + 19)^2 (33 + 19 - 1)}} = 3.306$
Slide 115 Copyright © 2004 Pearson Education, Inc. Example: Boston Rainfall on Mondays $z = \dfrac{G - \mu_G}{\sigma_G} = \dfrac{30 - 25.115}{3.306} = 1.48$
Slide 116 Copyright © 2004 Pearson Education, Inc. Example: Boston Rainfall on Mondays The critical values are z = ±1.96, since α = 0.05 and we have a two-tailed test. The test statistic of 1.48 does not fall within the critical region. We fail to reject the null hypothesis of randomness. The given sequence does appear to be random.