Slide © 2005 Thomson/South-Western
AK/ECON 3480 M & N, Winter 2006
PowerPoint Presentation
Professor Ying Kong, School of Analytic Studies and Information Technology, Atkinson Faculty of Liberal and Professional Studies, York University

Chapter 19: Nonparametric Methods
- Sign Test
- Wilcoxon Signed-Rank Test
- Mann-Whitney-Wilcoxon Test
- Kruskal-Wallis Test
- Rank Correlation

Nonparametric Methods
- Most of the statistical methods referred to as parametric require interval- or ratio-scaled data.
- Nonparametric methods are often the only way to analyze nominal or ordinal data and draw statistical conclusions.
- Nonparametric methods require no assumptions about the population probability distributions.
- Nonparametric methods are often called distribution-free methods.

Nonparametric Methods
In general, for a statistical method to be classified as nonparametric, it must satisfy at least one of the following conditions:
- The method can be used with nominal data.
- The method can be used with ordinal data.
- The method can be used with interval or ratio data when no assumption can be made about the population probability distribution.

Sign Test
- A common application of the sign test involves using a sample of n potential customers to identify a preference for one of two brands of a product.
- The objective is to determine whether there is a difference in preference between the two items being compared.
- To record the preference data, we use a plus sign if the individual prefers one brand and a minus sign if the individual prefers the other brand.
- Because the data are recorded as plus and minus signs, this test is called the sign test.

Sign Test: Small-Sample Case
- The small-sample case for the sign test should be used whenever $n \le 20$.
- The hypotheses are:
  $H_0: p = .50$ (No preference for one brand over the other exists.)
  $H_a: p \ne .50$ (A preference for one brand over the other exists.)
- The number of plus signs is our test statistic.
- Assuming $H_0$ is true, the sampling distribution for the test statistic is a binomial distribution with $p = .5$.
- $H_0$ is rejected if the p-value $\le$ level of significance, $\alpha$.

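As a rough illustration of the small-sample rule, the two-tailed p-value can be computed directly from the binomial distribution with p = .5. This is a minimal sketch, not part of the slides: the function name `sign_test_exact` and the counts (7 plus signs, 3 minus signs) are hypothetical, and the two-tailed p-value is taken as twice the smaller tail probability, capped at 1.

```python
from math import comb


def sign_test_exact(plus, minus):
    """Two-tailed exact sign-test p-value under H0: p = .5.

    Ties (no preference) are assumed to have been discarded already.
    """
    n = plus + minus
    total = 2 ** n  # number of equally likely sign patterns under H0

    # P(X <= plus) and P(X >= plus) for X ~ Binomial(n, .5)
    lower = sum(comb(n, i) for i in range(0, plus + 1)) / total
    upper = sum(comb(n, i) for i in range(plus, n + 1)) / total

    # Double the smaller tail, but never report more than 1
    return min(1.0, 2.0 * min(lower, upper))


# Hypothetical data: 7 plus signs and 3 minus signs (n = 10)
p_value = sign_test_exact(7, 3)
print(f"p-value = {p_value:.4f}")
```

With these hypothetical counts the p-value is well above .05, so H0 would not be rejected.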
Sign Test: Large-Sample Case
- Using $H_0: p = .5$ and $n > 20$, the sampling distribution for the number of plus signs can be approximated by a normal distribution.
- When no preference is stated ($H_0: p = .5$), the sampling distribution will have:
  Mean: $\mu = .50n$
  Standard deviation: $\sigma = \sqrt{.25n}$
- The test statistic is $z = (x - \mu)/\sigma$, where $x$ is the number of plus signs.
- $H_0$ is rejected if the p-value $\le$ level of significance, $\alpha$.

Sign Test: Large-Sample Case
Example: Ketchup Taste Test
As part of a market research study, a sample of 36 consumers were asked to taste two brands of ketchup (A and B) and indicate a preference. Do the data shown on the next slide indicate a significant difference in the consumer preferences for the two brands?

Sign Test: Large-Sample Case
Example: Ketchup Taste Test
- 18 preferred Brand A Ketchup (+ sign recorded)
- 12 preferred Brand B Ketchup (- sign recorded)
- 6 had no preference
The analysis will be based on a sample size of 18 + 12 = 30.

Sign Test: Large-Sample Case
Hypotheses
$H_0: p = .50$ (No preference for one brand over the other exists.)
$H_a: p \ne .50$ (A preference for one brand over the other exists.)

Sign Test: Large-Sample Case
Sampling Distribution for Number of Plus Signs
$\mu = .5(30) = 15$
$\sigma = \sqrt{.25(30)} = 2.74$

Sign Test: Large-Sample Case
- Rejection Rule: Using .05 level of significance, reject $H_0$ if p-value $\le .05$.
- Test Statistic: $z = (x - \mu)/\sigma = (18 - 15)/2.74 = 3/2.74 = 1.10$
- p-Value: p-value $= 2(.5000 - .3643) = .2714$

Sign Test: Large-Sample Case
Conclusion
Because the p-value $> \alpha$, we cannot reject $H_0$. There is insufficient evidence in the sample to conclude that a difference in preference exists for the two brands of ketchup.

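The ketchup calculation can be sketched in a few lines of Python. This is an illustrative sketch only; the helper names `normal_cdf` and `sign_test_large` are assumptions, not part of the slides. Because the code does not round z to 1.10 before looking up the normal area, its p-value differs slightly from the table-based .2714.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative probability, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def sign_test_large(plus, minus):
    """Normal-approximation sign test; ties are assumed already discarded."""
    n = plus + minus
    mean = 0.50 * n            # mu = .50 n
    sd = sqrt(0.25 * n)        # sigma = sqrt(.25 n)
    z = (plus - mean) / sd
    p_value = 2.0 * (1.0 - normal_cdf(abs(z)))  # two-tailed
    return z, p_value


# Ketchup example from the slides: 18 plus signs, 12 minus signs
z, p = sign_test_large(18, 12)
print(f"z = {z:.2f}, p-value = {p:.4f}")
print("Reject H0" if p <= 0.05 else "Do not reject H0")
```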
Hypothesis Test About a Median
We can apply the sign test by:
- Using a plus sign whenever the data in the sample are above the hypothesized value of the median
- Using a minus sign whenever the data in the sample are below the hypothesized value of the median
- Discarding any data exactly equal to the hypothesized median

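The three rules above can be sketched as a small counting routine. The function name `median_sign_counts` and the list of ages are hypothetical, chosen only to show how values equal to the hypothesized median drop out of the sample.

```python
def median_sign_counts(data, hypothesized_median):
    """Count plus and minus signs relative to a hypothesized median.

    Observations exactly equal to the hypothesized median are discarded,
    so the effective sample size is plus + minus.
    """
    plus = sum(1 for x in data if x > hypothesized_median)
    minus = sum(1 for x in data if x < hypothesized_median)
    return plus, minus


# Hypothetical ages, tested against a hypothesized median of 34
ages = [29, 41, 34, 38, 52, 27, 36, 34, 45, 31]
plus, minus = median_sign_counts(ages, 34)
print(f"plus = {plus}, minus = {minus}, effective n = {plus + minus}")
```

The resulting counts then feed the sign test exactly as the brand-preference counts do.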
Hypothesis Test About a Median
Example: Trim Fitness Center
A hypothesis test is being conducted about the median age of female members of the Trim Fitness Center.
$H_0$: Median Age $= 34$
$H_a$: Median Age $\ne 34$
In a sample of 40 female members, 25 are older than 34, 14 are younger than 34, and 1 is 34. Is there sufficient evidence to reject $H_0$? Assume $\alpha = .05$.

Hypothesis Test About a Median
- Mean and Standard Deviation: $\mu = .5(39) = 19.5$, $\sigma = \sqrt{.25(39)} = 3.12$
- Test Statistic: $z = (x - \mu)/\sigma = (25 - 19.5)/3.12 = 1.76$
- p-Value: p-value $= 2(.5000 - .4608) = .0784$

Hypothesis Test About a Median
- Rejection Rule: Using .05 level of significance, reject $H_0$ if p-value $\le .05$.
- Conclusion: Do not reject $H_0$. The p-value for this two-tail test is .0784. There is insufficient evidence in the sample to conclude that the median age is not 34 for female members of Trim Fitness Center.

Wilcoxon Signed-Rank Test
- This test is the nonparametric alternative to the parametric matched-sample test presented in Chapter 10.
- The methodology of the parametric matched-sample analysis requires:
  - interval data, and
  - the assumption that the population of differences between the pairs of observations is normally distributed.
- If the assumption of normally distributed differences is not appropriate, the Wilcoxon signed-rank test can be used.

Wilcoxon Signed-Rank Test
Example: Express Deliveries
A firm has decided to select one of two express delivery services to provide next-day deliveries to its district offices. To test the delivery times of the two services, the firm sends two reports to a sample of 10 district offices, with one report carried by one service and the other report carried by the second service. Do the data on the next slide indicate a difference in the two services?

Wilcoxon Signed-Rank Test
[Table: delivery times in hours for the OverNight and NiteFlite services at each district office (Seattle, Los Angeles, Boston, Cleveland, New York, Houston, Atlanta, St. Louis, Milwaukee, Denver). Only one entry, 32 hrs, survives; the remaining values are missing.]

Wilcoxon Signed-Rank Test
Preliminary Steps of the Test
- Compute the differences between the paired observations.
- Discard any differences of zero.
- Rank the absolute value of the differences from lowest to highest. Tied differences are assigned the average ranking of their positions.
- Give the ranks the sign of the original difference in the data.
- Sum the signed ranks.
Next we will determine whether the sum is significantly different from zero.

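The preliminary steps can be sketched as follows. The function name `signed_rank_sum` is hypothetical, and the list of differences is an assumption: it is illustrative data chosen so that the signed-rank sum comes out to 44, the value of T used in the test statistic later in the slides. It is not recovered from the slide table.

```python
def signed_rank_sum(differences):
    """Sum of signed ranks for a list of paired differences."""
    # Step 2: discard zero differences
    nonzero = [d for d in differences if d != 0]

    # Step 3: rank absolute differences, averaging ranks across ties
    ordered = sorted(abs(d) for d in nonzero)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Tied values occupy positions i+1 .. j; use their average
        rank_of[ordered[i]] = (i + 1 + j) / 2
        i = j

    # Steps 4 and 5: attach the original sign and sum
    return sum(rank_of[abs(d)] if d > 0 else -rank_of[abs(d)] for d in nonzero)


# Assumed illustrative differences (first service minus second), in hours
diffs = [7, 6, 4, 1, 2, 3, -1, 2, -2, 5]
T = signed_rank_sum(diffs)
print(f"T = {T}")
```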
Wilcoxon Signed-Rank Test
[Table: for each district office (Seattle, Los Angeles, Boston, Cleveland, New York, Houston, Atlanta, St. Louis, Milwaukee, Denver), the difference in delivery times, its absolute value, its rank, and its signed rank. The table values are missing.]

Wilcoxon Signed-Rank Test
Hypotheses
$H_0$: The delivery times of the two services are the same; neither offers faster service than the other.
$H_a$: Delivery times differ between the two services; recommend the one with the smaller times.

Wilcoxon Signed-Rank Test
Sampling Distribution of T for Identical Populations
$\mu_T = 0$
$\sigma_T = \sqrt{\dfrac{n(n+1)(2n+1)}{6}} = \sqrt{\dfrac{10(11)(21)}{6}} = 19.62$

Wilcoxon Signed-Rank Test
- Rejection Rule: Using .05 level of significance, reject $H_0$ if p-value $\le .05$.
- Test Statistic: $z = (T - \mu_T)/\sigma_T = (44 - 0)/19.62 = 2.24$
- p-Value: p-value $= 2(.5000 - .4875) = .025$

Wilcoxon Signed-Rank Test
Conclusion
Reject $H_0$. The p-value for this two-tail test is .025. There is sufficient evidence in the sample to conclude that a difference exists in the delivery times provided by the two services.

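The large-sample calculation for the express-delivery example can be checked with a short sketch. The helper names `normal_cdf` and `wilcoxon_signed_rank_z` are assumptions for illustration; the inputs T = 44 and n = 10 come from the slides.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative probability, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def wilcoxon_signed_rank_z(T, n):
    """z statistic and two-tailed p-value for a signed-rank sum T.

    n is the number of nonzero differences. Under identical populations,
    mu_T = 0 and sigma_T = sqrt(n(n+1)(2n+1)/6).
    """
    sigma_T = sqrt(n * (n + 1) * (2 * n + 1) / 6)
    z = (T - 0.0) / sigma_T
    p_value = 2.0 * (1.0 - normal_cdf(abs(z)))
    return z, p_value


z, p = wilcoxon_signed_rank_z(44, 10)
print(f"z = {z:.2f}, p-value = {p:.3f}")
print("Reject H0" if p <= 0.05 else "Do not reject H0")
```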
Mann-Whitney-Wilcoxon Test
- This test is another nonparametric method for determining whether there is a difference between two populations.
- This test, unlike the Wilcoxon signed-rank test, is not based on a matched sample.
- This test does not require interval data or the assumption that both populations are normally distributed.
- The only requirement is that the measurement scale for the data is at least ordinal.

Mann-Whitney-Wilcoxon Test
- Instead of testing for the difference between the means of two populations, this method tests to determine whether the two populations are identical.
- The hypotheses are:
  $H_0$: The two populations are identical
  $H_a$: The two populations are not identical

Mann-Whitney-Wilcoxon Test
Example: Westin Freezers
Manufacturer labels indicate the annual energy cost associated with operating home appliances such as freezers. The energy costs for a sample of 10 Westin freezers and a sample of 10 Easton freezers are shown on the next slide. Do the data indicate, using $\alpha = .05$, that a difference exists in the annual energy costs for the two brands of freezers?

Mann-Whitney-Wilcoxon Test
[Table: annual energy cost in dollars for 10 Westin freezers and 10 Easton freezers. The table values are missing.]

Mann-Whitney-Wilcoxon Test
Hypotheses
$H_0$: Annual energy costs for Westin freezers and Easton freezers are the same.
$H_a$: Annual energy costs differ for the two brands of freezers.

Mann-Whitney-Wilcoxon Test: Large-Sample Case
- First, rank the combined data from the lowest to the highest values, with tied values being assigned the average of the tied rankings.
- Then, compute T, the sum of the ranks for the first sample.
- Then, compare the observed value of T to the sampling distribution of T for identical populations. The value of the standardized test statistic z will provide the basis for deciding whether to reject $H_0$.

Mann-Whitney-Wilcoxon Test: Large-Sample Case
Sampling Distribution of T for Identical Populations
- Mean: $\mu_T = \tfrac{1}{2} n_1 (n_1 + n_2 + 1)$
- Standard Deviation: $\sigma_T = \sqrt{\tfrac{1}{12} n_1 n_2 (n_1 + n_2 + 1)}$
- Distribution Form: Approximately normal, provided $n_1 \ge 10$ and $n_2 \ge 10$

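The ranking step and the rank sum T can be sketched as below. The function name `rank_sum_first_sample` and the two tiny samples are hypothetical; they only demonstrate the combined ranking with averaged ties and are far too small for the normal approximation to apply.

```python
def rank_sum_first_sample(sample1, sample2):
    """Sum of the combined-sample ranks belonging to sample1 (the statistic T)."""
    combined = sorted(sample1 + sample2)

    # Assign each distinct value the average of the positions it occupies
    rank_of = {}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        rank_of[combined[i]] = (i + 1 + j) / 2  # positions i+1 .. j
        i = j

    return sum(rank_of[value] for value in sample1)


# Hypothetical annual costs in dollars; the value 54 is tied across samples
first = [55, 54, 53]
second = [56, 54, 57]
T = rank_sum_first_sample(first, second)
print(f"T = {T}")
```

Here the tied value 54 occupies positions 2 and 3, so both observations receive rank 2.5.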
Mann-Whitney-Wilcoxon Test
[Table: annual energy cost in dollars and combined-sample rank for each Westin and Easton freezer, with the sum of ranks for each brand. The table values are missing.]

Mann-Whitney-Wilcoxon Test
Sampling Distribution of T for Identical Populations
$\mu_T = \tfrac{1}{2}(10)(21) = 105$
$\sigma_T = \sqrt{\tfrac{1}{12}(10)(10)(21)} = 13.23$

Mann-Whitney-Wilcoxon Test
- Rejection Rule: Using .05 level of significance, reject $H_0$ if p-value $\le .05$.
- Test Statistic: $z = (T - \mu_T)/\sigma_T = (86.5 - 105)/13.23 = -1.40$
- p-Value: p-value $= 2(.5000 - .4192) = .1616$

Mann-Whitney-Wilcoxon Test
Conclusion
Do not reject $H_0$. The p-value $> \alpha$. There is insufficient evidence in the sample data to conclude that there is a difference in the annual energy cost associated with the two brands of freezers.

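The freezer calculation can be reproduced with a short sketch. The helper names `normal_cdf` and `mww_z` are assumptions for illustration; the inputs T = 86.5 and n1 = n2 = 10 are taken from the slides. The unrounded z gives a p-value very close to, but not exactly, the table-based .1616.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative probability, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def mww_z(T, n1, n2):
    """z statistic and two-tailed p-value for the rank sum T of sample 1."""
    mu_T = 0.5 * n1 * (n1 + n2 + 1)
    sigma_T = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (T - mu_T) / sigma_T
    p_value = 2.0 * (1.0 - normal_cdf(abs(z)))
    return z, p_value


z, p = mww_z(86.5, 10, 10)
print(f"z = {z:.2f}, p-value = {p:.4f}")
print("Reject H0" if p <= 0.05 else "Do not reject H0")
```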
Kruskal-Wallis Test
- The Mann-Whitney-Wilcoxon test has been extended by Kruskal and Wallis for cases of three or more populations.
  $H_0$: All populations are identical
  $H_a$: Not all populations are identical
- The Kruskal-Wallis test can be used with ordinal data as well as with interval or ratio data.
- Also, the Kruskal-Wallis test does not require the assumption of normally distributed populations.

Kruskal-Wallis Test
Test Statistic
$W = \left[\dfrac{12}{n_T(n_T+1)} \sum_{i=1}^{k} \dfrac{R_i^2}{n_i}\right] - 3(n_T+1)$
where:
  $k$ = number of populations
  $n_i$ = number of items in sample $i$
  $n_T = \sum n_i$ = total number of items in all samples
  $R_i$ = sum of the ranks for sample $i$

Kruskal-Wallis Test
- When the populations are identical, the sampling distribution of the test statistic W can be approximated by a chi-square distribution with $k - 1$ degrees of freedom.
- This approximation is acceptable if each of the sample sizes $n_i$ is $\ge 5$.
- The rejection rule is: Reject $H_0$ if p-value $\le \alpha$.

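A minimal sketch of the W statistic follows. The function name `kruskal_wallis_w` and the three samples are hypothetical, since the slides give no data for this test. The sketch stops at W; the p-value would then be read from a chi-square table with k - 1 degrees of freedom.

```python
def kruskal_wallis_w(samples):
    """Kruskal-Wallis W for a list of samples, averaging ranks across ties."""
    combined = sorted(x for sample in samples for x in sample)
    n_T = len(combined)

    # Average rank for each distinct value in the combined data
    rank_of = {}
    i = 0
    while i < n_T:
        j = i
        while j < n_T and combined[j] == combined[i]:
            j += 1
        rank_of[combined[i]] = (i + 1 + j) / 2  # positions i+1 .. j
        i = j

    # Sum over samples of R_i^2 / n_i
    total = 0.0
    for sample in samples:
        R_i = sum(rank_of[x] for x in sample)
        total += R_i ** 2 / len(sample)

    return 12.0 / (n_T * (n_T + 1)) * total - 3.0 * (n_T + 1)


# Hypothetical data: k = 3 samples of 5 observations each
samples = [
    [12, 15, 11, 18, 14],
    [22, 19, 25, 21, 20],
    [16, 17, 13, 23, 24],
]
W = kruskal_wallis_w(samples)
print(f"W = {W:.2f} with {len(samples) - 1} degrees of freedom")
```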
Rank Correlation
- The Pearson correlation coefficient, $r$, is a measure of the linear association between two variables for which interval or ratio data are available.
- The Spearman rank-correlation coefficient, $r_s$, is a measure of association between two variables when only ordinal data are available.
- Values of $r_s$ can range from -1.0 to +1.0, where
  - values near 1.0 indicate a strong positive association between the rankings, and
  - values near -1.0 indicate a strong negative association between the rankings.

Rank Correlation
Spearman Rank-Correlation Coefficient, $r_s$
$r_s = 1 - \dfrac{6 \sum d_i^2}{n(n^2 - 1)}$
where:
  $n$ = number of items being ranked
  $x_i$ = rank of item $i$ with respect to one variable
  $y_i$ = rank of item $i$ with respect to a second variable
  $d_i = x_i - y_i$

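The coefficient is simple to compute from two lists of ranks. In this sketch the function name `spearman_rs` and the two rankings of five items are hypothetical.

```python
def spearman_rs(x_ranks, y_ranks):
    """Spearman rank-correlation coefficient from two lists of ranks."""
    if len(x_ranks) != len(y_ranks):
        raise ValueError("both rankings must cover the same items")
    n = len(x_ranks)
    # d_i = x_i - y_i for each ranked item
    sum_d_squared = sum((x - y) ** 2 for x, y in zip(x_ranks, y_ranks))
    return 1.0 - 6.0 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical rankings of five items by two judges
judge_1 = [1, 2, 3, 4, 5]
judge_2 = [2, 1, 4, 3, 5]
r_s = spearman_rs(judge_1, judge_2)
print(f"r_s = {r_s:.2f}")
```

Identical rankings give r_s = 1, and exactly reversed rankings give r_s = -1.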
Test for Significant Rank Correlation
- We may want to use sample results to make an inference about the population rank correlation $\rho_s$.
- To do so, we must test the hypotheses:
  $H_0: \rho_s = 0$ (No rank correlation exists)
  $H_a: \rho_s \ne 0$ (Rank correlation exists)

Rank Correlation
Sampling Distribution of $r_s$ when $\rho_s = 0$
- Mean: $\mu_{r_s} = 0$
- Standard Deviation: $\sigma_{r_s} = \sqrt{\dfrac{1}{n-1}}$
- Distribution Form: Approximately normal, provided $n \ge 10$

Rank Correlation
Example: Crennor Investors
Crennor Investors provides a portfolio management service for its clients. Two of Crennor's analysts ranked ten investments as shown on the next slide. Use rank correlation, with $\alpha = .10$, to comment on the agreement of the two analysts' rankings.

Rank Correlation
Example: Crennor Investors
- Analysts' Rankings
  [Table: rankings of investments A through J by Analyst #1 and Analyst #2. The table values are missing.]
- Hypotheses
  $H_0: \rho_s = 0$ (No rank correlation exists)
  $H_a: \rho_s \ne 0$ (Rank correlation exists)

Rank Correlation
[Table: for investments A through J, the Analyst #1 ranking, the Analyst #2 ranking, the difference, and the squared difference. The individual values are missing; the sum of the squared differences is 92.]

Rank Correlation
Sampling Distribution of $r_s$ Assuming No Rank Correlation
$\mu_{r_s} = 0$
$\sigma_{r_s} = \sqrt{\dfrac{1}{10 - 1}} = .3333$

Rank Correlation
- Rejection Rule: With .10 level of significance, reject $H_0$ if p-value $\le .10$.
- Test Statistic: With $\sum d_i^2 = 92$ and $n = 10$, $r_s = 1 - 6(92)/(10(100 - 1)) = .4424$, so $z = (r_s - \mu_{r_s})/\sigma_{r_s} = (.4424 - 0)/.3333 = 1.33$
- p-Value: p-value $= 2(.5000 - .4082) = .1836$

Rank Correlation
Conclusion
Do not reject $H_0$. The p-value $> \alpha$. There is not a significant rank correlation. The two analysts are not showing agreement in their ranking of the risk associated with the different investments.

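The Crennor calculation can be reproduced from the two numbers the slides provide, the sum of squared rank differences (92) and n = 10. The helper names `normal_cdf` and `rank_correlation_test` are assumptions for illustration. Because z is not rounded to 1.33 here, the p-value differs slightly from the table-based .1836.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cumulative probability, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def rank_correlation_test(sum_d_squared, n):
    """Spearman r_s, its z statistic, and the two-tailed p-value."""
    r_s = 1.0 - 6.0 * sum_d_squared / (n * (n ** 2 - 1))
    sigma = sqrt(1.0 / (n - 1))   # standard deviation of r_s when rho_s = 0
    z = (r_s - 0.0) / sigma
    p_value = 2.0 * (1.0 - normal_cdf(abs(z)))
    return r_s, z, p_value


r_s, z, p = rank_correlation_test(92, 10)
print(f"r_s = {r_s:.4f}, z = {z:.2f}, p-value = {p:.4f}")
print("Reject H0" if p <= 0.10 else "Do not reject H0")
```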
End of Chapter 19