Published by Robyn Dickerson. Modified over 9 years ago.
1
Keller: Stats for Mgmt & Econ, 7th Ed Nonparametric Statistics
April 24, 2017 Chapter 19 Nonparametric Statistics Copyright © 2006 Brooks/Cole, a division of Thomson Learning, Inc.
2
Nonparametric Statistics
This chapter covers statistical techniques for ordinal data. Ordinal data are the result of a rating system such as excellent, good, fair, and poor. We can record the responses using any numbering system as long as the order is maintained. For example: Excellent = 4, Good = 3, Fair = 2, Poor = 1
3
Ordinal Data… or: Excellent = 85, Good = 40, Fair = 25, Poor = 10
Both numbering systems are valid.
4
Ordinal Data… The difference between interval and ordinal data is that with interval data the differences are meaningful and consistent. With ordinal data the differences between values have no meaning. For example, what is the difference between Excellent and Good? Is it 4 − 3 = 1, or 85 − 40 = 45? The answer is neither. All we can say about the difference between Excellent and Good is that Excellent is ranked higher. We cannot interpret the magnitude of the difference.
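A minimal sketch of why either coding works for rank-based methods: ranking the same responses under both numbering systems gives identical ranks. The response list and the helper name `average_ranks` are illustrative assumptions, not part of the slides.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upward, averaging the ranks of ties."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Tied values occupy ranks i+1 .. j; their average is (i + 1 + j) / 2.
        rank_of[ordered[i]] = (i + 1 + j) / 2
        i = j
    return [rank_of[v] for v in values]


# Hypothetical survey responses (not from the slides).
responses = ["Good", "Poor", "Excellent", "Fair", "Good"]

coding_a = {"Excellent": 4, "Good": 3, "Fair": 2, "Poor": 1}
coding_b = {"Excellent": 85, "Good": 40, "Fair": 25, "Poor": 10}

ranks_a = average_ranks([coding_a[r] for r in responses])
ranks_b = average_ranks([coding_b[r] for r in responses])

print(ranks_a)
print(ranks_b)
```

Because only the order matters, any test built on these ranks gives the same result under both codings.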
5
Nonparametric Statistics
When the data are ordinal, the mean is not an appropriate measure of central location. Instead, we will test characteristics of populations without referring to specific parameters, hence the term nonparametric. Although nonparametric methods are designed to test ordinal data, they have another area of application. The statistical tests described in Sections 13.1 and 13.3 and in Chapter 14 require that the populations be normally distributed.
6
Nonparametric Statistics
If the data are extremely nonnormal, the t-tests and F-test are invalid. Nonparametric techniques can be used instead. For this reason, nonparametric procedures are often (perhaps more accurately) called distribution-free statistics.
7
Nonparametric Statistics
In such circumstances we will treat the interval data as if they were ordinal. For this reason, even when the data are interval and the mean is the appropriate measure of location, we will choose instead to test population locations.
8
Population Locations These two populations have the same location…
9
Population Locations The location of population 1 is to the left of the location of population 2… The location of population 1 is to the right of the location of population 2… [Figure: two pairs of distribution curves labeled population 1 and population 2]
10
Problem Objectives When the problem objective is to compare two populations, the null hypothesis will state: H0: The two population locations are the same. The alternative hypothesis can take on any one of the following three forms:
1. H1: The location of population 1 is different from the location of population 2
2. H1: The location of population 1 is to the right of the location of population 2
3. H1: The location of population 1 is to the left of the location of population 2
11
The Alternative Hypotheses
1. H1: The location of population 1 is different from the location of population 2. Used when we want to know whether there is sufficient evidence to infer that there is a difference between the two populations.
12
The Alternative Hypotheses
2. H1: The location of population 1 is to the right of the location of population 2. Used when we want to know whether we can conclude that the random variable in population 1 is larger in general than the random variable in population 2, and, not surprisingly…
13
The Alternative Hypotheses
3. H1: The location of population 1 is to the left of the location of population 2. Used when we want to know whether we can conclude that the random variable in population 1 is smaller in general than the random variable in population 2. NOTE: all of our hypotheses are phrased in terms of “1 then 2”. This is for consistency. Rather than state: H1: The location of population 2 is to the left of the location of population 1, we would phrase this as: H1: The location of population 1 is to the right of the location of population 2
14
Wilcoxon Rank Sum Test We’ll use the Wilcoxon Rank Sum Test for problems where: — Problem objective is to compare two populations, — The data are ordinal or interval (where the normality requirement is unsatisfied) — The samples are independent.
15
Example 19.1 From these samples:
Sample 1: 22, 23, 20
Sample 2: 18, 27, 26
Can we conclude (at the 5% significance level) that the location of population 1 is to the left of (i.e. “smaller” than) the location of population 2? That is, we want to test: H0: The two population locations are the same. H1: The location of population 1 is to the left of the location of population 2. We can test this; we just need a test statistic…
16
Test Statistic Step #1… rank the observations from smallest to largest, assign a rank number, and add up the “rank sum”…

Sample 1   Rank   Sample 2   Rank
22         3      18         1
23         4      27         6
20         2      26         5
           T1=9              T2=12

*In the case of “ties” we average the ranks of the tied observations. We arbitrarily select T1 as the test statistic and label it “T”.
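The ranking step can be sketched in a few lines of Python. This is an illustration using the Example 19.1 samples; the helper name `average_ranks` is an assumption introduced here, not something from the slides.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upward, averaging the ranks of ties."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Tied values occupy ranks i+1 .. j; their average is (i + 1 + j) / 2.
        rank_of[ordered[i]] = (i + 1 + j) / 2
        i = j
    return [rank_of[v] for v in values]


sample_1 = [22, 23, 20]
sample_2 = [18, 27, 26]

# Rank all six observations together, then split the ranks back by sample.
ranks = average_ranks(sample_1 + sample_2)
t1 = sum(ranks[:len(sample_1)])
t2 = sum(ranks[len(sample_1):])

print(t1, t2)
```

The two rank sums always add up to n(n + 1)/2 for n pooled observations (here 21), which is a quick check on the arithmetic.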
17
Sampling Distribution of the Test Statistic
A small value of T indicates most of the smaller observations are in sample 1 which was drawn from population 1 — but how small is “small”? Is 9 “small” enough? We have our test statistic, T=9. We need to compare it to some critical value of “T” to know if we’re in the rejection region for H0 (or not). So, what then, does the sampling distribution of “ranks” look like?
18
Sampling Distribution of the Test Statistic
We can build up the sampling distribution of the test statistic in much the same way we built histograms for the outcomes of rolls of 2 and 3 dice…
1. Enumerate all possible combinations of ranks.
2. Calculate rank sums for the combinations.
3. The probability of any rank sum is the number of occurrences divided by the total number of combinations…
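For two samples of size 3 (as in Example 19.1), the three steps can be carried out by brute force. The sketch below assumes no ties, so the ranks are simply 1 through 6; the function name `prob_at_most` is illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

n1, n2 = 3, 3
all_ranks = range(1, n1 + n2 + 1)

# Steps 1 and 2: enumerate every set of ranks sample 1 could receive, and sum each.
counts = Counter(sum(combo) for combo in combinations(all_ranks, n1))
total = sum(counts.values())


# Step 3: probability = number of occurrences / total number of combinations.
def prob_at_most(t):
    """P(T <= t) under the null hypothesis that all rank assignments are equally likely."""
    return Fraction(sum(c for s, c in counts.items() if s <= t), total)


for rank_sum in sorted(counts):
    print(rank_sum, counts[rank_sum], Fraction(counts[rank_sum], total))
```

With these sample sizes, only one combination (ranks 1, 2, 3) gives the smallest rank sum of 6, so P(T ≤ 6) = 1/20, while P(T ≤ 9) = 7/20.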
19
Sampling Distribution of the Test Statistic
1. Enumerate, 2. Calculate & 3. Probabilities… [Figure: enumeration of rank combinations and their rank sums, annotated “1 combination” and “3 combinations”] Total of 20 combinations
20
Sampling Distribution of the Test Statistic
P(T ≤ 6) = 1/20 = .05, so at the 5% significance level our critical value of T is 6. Since T = 9 > TCritical = 6, we cannot reject H0…
21
Example 19.1… INTERPRET We cannot reject the null hypothesis; that is, there is not enough evidence to conclude that the location of population 1 is to the left of the location of population 2 (at 5% significance).
22
Critical Values: Wilcoxon Rank Sum Test
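Sample sizes less than 10 are unrealistic. For sample sizes larger than 10, the test statistic is approximately normally distributed with:
Mean: $E(T) = \dfrac{n_1(n_1 + n_2 + 1)}{2}$
Standard deviation: $\sigma_T = \sqrt{\dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$
Hence: $z = \dfrac{T - E(T)}{\sigma_T}$
where $n_i$ = size of sample i, i = 1, 2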
23
Example 19.2 A pharmaceutical company is planning to introduce a new painkiller. In a preliminary experiment to determine its effectiveness, 30 people were randomly selected, of whom 15 were given the new painkiller and 15 were given aspirin. All 30 were told to use the drug when headaches or other minor pains occurred and to indicate which of the following statements most accurately represented the effectiveness of the drug they took. 5 = The drug was extremely effective. 4 = The drug was quite effective. 3 = The drug was somewhat effective. 2 = The drug was slightly effective. 1 = The drug was not at all effective.
24
Example 19.2 The responses are listed here (and stored in Xm19-02) using the codes. Can we conclude at the 5% significance level that the new painkiller is perceived to be more effective? New painkiller: 3, 5, 4, 3, 2, 5, 1, 4, 5, 3, 3, 5, 5, 5, 4 Aspirin: 4, 1, 3, 2, 4, 1, 3, 4, 2, 2, 2, 4, 3, 4, 5
25
Example 19.2 IDENTIFY The problem objective is to compare two populations. The data are ordinal and the samples are independent. The appropriate technique is the Wilcoxon rank sum test. It’s important to note here that “5” is a “good” score, so if the drug is effective, we’d likely see its location “greater than” the location of aspirin users, hence: H1: The location of population 1 is to the right of the location of population 2, and so: H0: The two population locations are the same.
26
Example 19.2 COMPUTE (though not shown here) The rank sum for the new painkiller is T1=276.5, and the rank sum for aspirin: T2=188.5 Set T= T1=276.5, and begin calculating…
27
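Example 19.2 COMPUTE With $n_1 = n_2 = 15$: $E(T) = 15(31)/2 = 232.5$ and $\sigma_T = \sqrt{15 \cdot 15 \cdot 31 / 12} \approx 24.11$, so $z = (276.5 - 232.5)/24.11 \approx 1.83$. The p-value of the test is: p-value = P(Z > 1.83) = 1 − .9664 = .0336 (or Z = 1.83 > Zα = Z.05 = 1.645), hence: “There is sufficient evidence to infer that the new painkiller is perceived to be more effective than aspirin”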
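The manual calculation for Example 19.2 can be checked with a short script. This is a sketch using the usual large-sample normal approximation for the rank sum, with mean n1(n1 + n2 + 1)/2 and standard deviation sqrt(n1·n2·(n1 + n2 + 1)/12); the helper name `average_ranks` is an assumption introduced here.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Rank values from smallest (rank 1) upward, averaging the ranks of ties."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        i = j
    return [rank_of[v] for v in values]


new_painkiller = [3, 5, 4, 3, 2, 5, 1, 4, 5, 3, 3, 5, 5, 5, 4]
aspirin = [4, 1, 3, 2, 4, 1, 3, 4, 2, 2, 2, 4, 3, 4, 5]

n1, n2 = len(new_painkiller), len(aspirin)
ranks = average_ranks(new_painkiller + aspirin)
t1 = sum(ranks[:n1])  # rank sum for the new painkiller
t2 = sum(ranks[n1:])  # rank sum for aspirin

expected_t = n1 * (n1 + n2 + 1) / 2
sigma_t = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (t1 - expected_t) / sigma_t

# Right-tail test: H1 says population 1 is located to the right of population 2.
p_value = 1 - NormalDist().cdf(z)

print(t1, t2, round(z, 4), round(p_value, 4))
```

The script reproduces T1 = 276.5 and T2 = 188.5. Its p-value is computed from the unrounded z, so it differs slightly from the table-based .0336, which uses z rounded to 1.83.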
28
Example 19.2 COMPUTE We can use the Wilcoxon Rank Sum Test in the Data Analysis Plus set of tools to come to the same conclusion. Click Add-Ins, Data Analysis Plus, Wilcoxon Rank Sum Test.
29
Example 19.2 COMPUTE p-value
30
Example 19.2 INTERPRET There is enough evidence to infer that the new painkiller is more effective than aspirin.
31
Required Conditions The Wilcoxon rank sum test actually tests to determine whether the population distributions are identical. This means that it tests not only for identical locations, but for identical spreads (variances) and shapes (distributions) as well. The rejection of the null hypothesis may be due instead to a difference in distribution shapes and/or spreads. To avoid this problem, we will require that the two probability distributions be identical except with respect to location.
32
Identifying Factors Factors that identify the Wilcoxon Rank Sum…
33
Tests for Matched Pairs Experiments
We will now look at two nonparametric techniques (Sign Test and Wilcoxon Signed Rank Sum Test) that test hypotheses in problems with the following characteristics: — We want to compare two populations, — The data are either ordinal or interval (nonnormal), — and the samples are matched pairs. As before, we’ll compute matched pair differences and work from there…
34
The Sign Test We can use the Sign Test when we’re dealing with two populations of ordinal data in a matched pairs experiment. For each matched pair, take the differences and count up the number of positive differences and negative differences. If population locations are the same (say), we’d expect the number of positives and negatives to net out to zero. If we have more positives than negatives (or vice versa) what can we learn? Again, how many is enough to make a difference?
35
Sign Test We can think of the sign test in terms of a binomial experiment: getting a positive sign is like flipping heads on a coin. We use this notion along with previously developed statistics to come up with our standardized test statistic (assuming the null hypothesis is true): $z = \dfrac{x - 0.5n}{0.5\sqrt{n}}$, where x is the number of positive differences and n is the number of nonzero differences (the normal approximation requires n ≥ 10). Our null hypothesis: H0: the two population locations are the same is equivalent to: H0: p = .5 (i.e. equal proportions of +’s & –’s)
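As an illustration, the standardized sign-test statistic z = (x − 0.5n)/(0.5√n), with x positive differences out of n nonzero differences, can be computed as below. The ratings are hypothetical (they are not the Example 19.3 data, which are not reproduced in these slides), and the function name `sign_test_z` is an assumption.

```python
import math
from statistics import NormalDist


def sign_test_z(first, second):
    """Return (x, n, z): positive differences, nonzero differences, and z statistic."""
    diffs = [a - b for a, b in zip(first, second) if a != b]  # discard zero differences
    n = len(diffs)
    x = sum(1 for d in diffs if d > 0)
    z = (x - 0.5 * n) / (0.5 * math.sqrt(n))
    return x, n, z


# Hypothetical matched-pair ratings on a 5-point scale.
ratings_1 = [5, 4, 4, 3, 5, 4, 5, 3, 4, 5, 4, 5]
ratings_2 = [3, 4, 3, 2, 4, 5, 3, 2, 3, 4, 4, 3]

x, n, z = sign_test_z(ratings_1, ratings_2)

# Right-tail p-value for H1: population 1 is located to the right of population 2.
p_value = 1 - NormalDist().cdf(z)

print(x, n, round(z, 4), round(p_value, 4))
```

Only the signs of the differences are used, which is why the test is suitable for ordinal data: the size of a difference such as 5 − 3 carries no meaning.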
36
Sign Test Hypotheses Since our null hypothesis is:
H0: the two population locations are the same (i.e. p = .5) Our research hypothesis must be: H1: the two population locations are different which is the same as: H1: p ≠ .5
37
Example 19.3 In an experiment to determine which of two cars is perceived to have the more comfortable ride, 25 people rode (separately) in the back seat of an expensive European model and also in the back seat of a North American midsize car. Each of the 25 people was asked to rate the ride on the following 5-point scale. 1 = Ride is very uncomfortable. 2 = Ride is quite uncomfortable. 3 = Ride is neither uncomfortable nor comfortable. 4 = Ride is quite comfortable. 5 = Ride is very comfortable. The results are stored in Xm Do these data allow us to conclude at the 5% significance level that the European car is perceived to be more comfortable than the North American car?
38
Example 19.3 IDENTIFY The problem objective is to compare two populations. The data are ordinal and the experimental design is matched pairs. Thus the correct technique is the sign test. Because we want to know whether there is enough evidence to infer that the European car is perceived to have a smoother ride than the North American car, the hypotheses are: H0: The two population locations are the same. H1: The location of population 1 (European car rating) is to the right of the location of population 2 (North American car rating)
39
Example 19.3 COMPUTE Again, we can leverage Excel to reduce the amount of work that we have to do. Click Add-Ins, Data Analysis Plus, Sign Test.
40
Example 19.3 COMPUTE p-value
41
Example 19.3 INTERPRET There is enough evidence to infer that the European car is perceived to have a smoother ride than the North American car.
42
Checking the Required Conditions
The sign test requires that: the populations be similar in shape and spread; the sample size exceed 10 (here n = 23).
43
Wilcoxon Signed Rank Sum Test
We’ll use the Wilcoxon Signed Rank Sum test when we want to compare two populations of interval (but not normally distributed) data in a matched pairs experiment.
1. Compute paired differences, discard zeros.
2. Rank absolute values of differences smallest (1) to largest (n), averaging ranks of tied observations.
3. Sum the ranks of positive differences (T+) and of negative differences (T–).
4. Use T = T+ as our test statistic…
44
Wilcoxon Signed Rank Sum Test
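Now we have a test statistic, but what to compare it against? For large sample sizes, i.e. n > 30, T is approximately normally distributed, so we have: $E(T) = \dfrac{n(n+1)}{4}$, $\sigma_T = \sqrt{\dfrac{n(n+1)(2n+1)}{24}}$, and $z = \dfrac{T - E(T)}{\sigma_T}$, where n is the number of nonzero differences.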
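The mechanics of the signed rank sum test can be sketched as follows. The travel times are hypothetical and deliberately few, so the z value only illustrates the formula (the normal approximation itself calls for n > 30). It uses the standard mean n(n + 1)/4 and standard deviation sqrt(n(n + 1)(2n + 1)/24); the function names are assumptions.

```python
import math


def average_ranks(values):
    """Rank values from smallest (rank 1) upward, averaging the ranks of ties."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        i = j
    return [rank_of[v] for v in values]


def signed_rank_sums(first, second):
    """Return (T+, T-, n) after discarding zero differences."""
    diffs = [a - b for a, b in zip(first, second) if a != b]
    ranks = average_ranks([abs(d) for d in diffs])
    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return t_plus, t_minus, len(diffs)


def signed_rank_z(t, n):
    """Standardize T using the large-sample mean and standard deviation."""
    expected = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (t - expected) / sigma


# Hypothetical travel times in minutes (not the Example 19.4 data).
fixed_start = [33, 29, 44, 30, 28, 45, 46, 23]
flextime = [30, 30, 40, 30, 30, 40, 40, 30]

t_plus, t_minus, n = signed_rank_sums(fixed_start, flextime)
z = signed_rank_z(t_plus, n)

print(t_plus, t_minus, n, round(z, 4))
```

T+ and T– always add up to n(n + 1)/2, so either one determines the other.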
45
Example 19.4 Traffic congestion on roads and highways costs industry billions of dollars annually as workers struggle to get to and from work. Several suggestions have been made about how to improve this situation, one of which is called flextime, which involves allowing workers to determine their own schedules (provided they work a full shift). Such workers will likely choose an arrival and departure time to avoid rush-hour traffic.
46
Example 19.4 In a preliminary experiment designed to investigate such a program the general manager of a large company wanted to compare the times it took workers to travel from their homes to work at 8:00 A.M. with travel time under the flextime program. A random sample of 32 workers was selected. The employees recorded the time (in minutes) it took to arrive at work at 8:00 A.M. on Wednesday of one week. The following week, the same employees arrived at work at times of their own choosing. The travel time on Wednesday of that week was recorded.
47
Example 19.4 These results are listed in the Xm Can we conclude at the 5% significance level that travel times under the flextime program are different from travel times to arrive at work at 8:00 A.M.?
48
Example 19.4 IDENTIFY The problem objective is to compare two populations. The data are interval and the samples are matched. If the matched pairs differences are normally distributed, the correct method is the t-test of µD. Here is the histogram of the differences.
49
Example 19.4 IDENTIFY A histogram of the paired differences reveals a non-normal distribution, hence we must use a non-parametric technique.
50
Example 19.4 IDENTIFY The appropriate technique is the Wilcoxon signed rank sum test. Because we want to know whether the population locations differ we have H0: The two population locations are the same. H1: The two population locations are different This is a two-tail test.
51
Example 19.4 COMPUTE ranks of +ve differences…
The Original Data Rank Sums Sorted ascending by |difference|
52
Example 19.4 We compute our test statistic as follows…
Our rejection region is…
53
Example 19.4 COMPUTE INTERPRET Click Add-Ins, Data Analysis Plus, Wilcoxon Signed Rank Sum Test.
54
Example 19.4 COMPUTE p-value
55
Example 19.4 INTERPRET The Wilcoxon Signed Rank Sum Test tool in Data Analysis Plus yields the same result as the manual calculation; there is not enough evidence to infer that flextime commute times differ from 8:00 am start commute times.
56
Identifying Factors I Factors that Identify the Sign Test…
57
Identifying Factors II
Factors that Identify the Wilcoxon Signed Rank Sum Test…
58
Kruskal-Wallis Test So far we’ve been comparing locations of two populations; now we’ll look at comparing two or more populations. The Kruskal-Wallis test is applied to problems where we want to compare two or more populations of ordinal or interval (but nonnormal) data from independent samples. Our hypotheses will be: H0: The locations of all k populations are the same. H1: At least two population locations differ.
59
Test Statistic In order to calculate the Kruskal-Wallis test statistic, we need to:
1. Rank all the observations from smallest (1) to largest (n), and average the ranks in the case of ties.
2. Calculate rank sums for each sample: T1, T2, …, Tk
3. Lastly, calculate the test statistic (denoted H): $H = \left[\dfrac{12}{n(n+1)} \sum_{j=1}^{k} \dfrac{T_j^2}{n_j}\right] - 3(n+1)$, where $n_j$ is the size of sample j and $n = n_1 + \dots + n_k$.
60
Sampling Distribution of the Test Statistic:
For sample sizes greater than or equal to 5, the test statistic H is approximately Chi-squared distributed with k–1 degrees of freedom. Our rejection region is: $H > \chi^2_{\alpha,\,k-1}$. And our p-value is: $P(\chi^2 > H)$.
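A minimal sketch of the Kruskal-Wallis calculation, using the standard form H = [12/(n(n + 1)) · Σ Tj²/nj] − 3(n + 1). The two data sets are hypothetical extremes chosen to show how H behaves, and the function names are assumptions. It omits the tie correction that some software applies.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upward, averaging the ranks of ties."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        i = j
    return [rank_of[v] for v in values]


def kruskal_wallis_h(samples):
    """Kruskal-Wallis H for a list of independent samples (no tie correction)."""
    pooled = [v for sample in samples for v in sample]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for sample in samples:
        t_j = sum(ranks[start:start + len(sample)])  # rank sum of this sample
        weighted += t_j ** 2 / len(sample)
        start += len(sample)
    return 12 / (n * (n + 1)) * weighted - 3 * (n + 1)


# Hypothetical data: three completely separated samples ...
h_separated = kruskal_wallis_h([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]])
# ... and three thoroughly interleaved samples.
h_interleaved = kruskal_wallis_h([[1, 4, 7, 10, 13], [2, 5, 8, 11, 14], [3, 6, 9, 12, 15]])

print(h_separated, h_interleaved)
```

With k − 1 = 2 degrees of freedom the 5% critical value is 5.99, so the separated samples (H = 12.5) would lead to rejecting H0 while the interleaved samples (H = 0.5) would not.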
61
Example 19.5 The management of fast-food restaurants is extremely interested in knowing how their customers rate the quality of food and service and the cleanliness of the restaurants. Customers are given the opportunity to fill out customer comment cards. Suppose that one franchise wanted to compare how customers rate the three shifts 4:00 P.M. to midnight Midnight to 8:00 A.M. 8:00 A.M. to 4:00 P.M
62
Example 19.5 In a preliminary study, 10 customer cards were randomly selected from each shift. The responses to the question concerning speed of service were recorded where 4 = excellent, 3 = good, 2 = fair, and 1 = poor The data are listed next. Do these data provide sufficient evidence at the 5% significance level to indicate whether customers perceive the speed of service to be different between the three shifts?
63
Example 19.5 [Data table: speed-of-service ratings for the three shifts (4:00 P.M. to midnight, midnight to 8:00 A.M., 8:00 A.M. to 4:00 P.M.), stored in Xm19-05]
64
Example 19.5 IDENTIFY The problem objective is to compare three populations of ordinal data (the ratings of the three shifts), and the samples are independent. These factors are sufficient to determine the use of the Kruskal-Wallis test. The null and alternative hypotheses are H0:The locations of all three populations are the same. H1: At least two population locations differ
65
Example 19.5 COMPUTE One way to solve the problem is to take the original data, “stack” it, and then sort by customer response & rank bottom to top… sorted by response
66
Example 19.5 COMPUTE Once it’s in “stacked” format, put in straight rankings from 1 to 30, average the rankings for the same response, then parse them out by shift to come up with rank sum totals…
67
Example 19.5 COMPUTE Our critical value of Chi-squared (5% significance and k–1=2 degrees of freedom) is 5.99, hence there is not enough evidence to reject H0.
68
Example 19.5 COMPUTE Data Analysis Plus (Kruskal-Wallis) gives a similar finding… “There is not enough evidence to infer that a difference in speed of service exists between the three shifts; i.e. the data do not show the shifts to be rated differently, and any action to improve service should be applied to all three shifts.”
69
Identifying Factors Factors that Identify the Kruskal-Wallis Test…
70
Friedman Test The Friedman Test is a technique used to compare two or more populations of ordinal or interval (nonnormal) data that are generated from a randomized block experiment. The hypotheses are the same as in the Kruskal-Wallis test. H0: The locations of all k populations are the same. H1: At least two population locations differ.
71
Friedman Test – Test Statistic
Since this is a blocked experiment, we first rank each observation within each of b blocks from smallest to largest (i.e. from 1 to k), averaging any ties. We then compute the rank sums: T1, T2, …, Tk. Then we calculate our test statistic: $F_r = \left[\dfrac{12}{b\,k(k+1)} \sum_{j=1}^{k} T_j^2\right] - 3b(k+1)$. This test statistic is approximately Chi-squared distributed with k–1 degrees of freedom (provided either k or b ≥ 5). Our rejection region is $F_r > \chi^2_{\alpha,\,k-1}$ and our p-value is $P(\chi^2 > F_r)$.
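The within-block ranking and the statistic can be sketched as below, using the standard form Fr = [12/(b·k(k + 1)) · Σ Tj²] − 3b(k + 1). The block data are hypothetical and the function names are assumptions.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upward, averaging the ranks of ties."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        i = j
    return [rank_of[v] for v in values]


def friedman_fr(blocks):
    """Return (rank sums per treatment, Fr) for a list of blocks of k observations."""
    b = len(blocks)
    k = len(blocks[0])
    rank_sums = [0.0] * k
    for block in blocks:
        # Ranks run from 1 to k inside each block, never across blocks.
        for j, rank in enumerate(average_ranks(block)):
            rank_sums[j] += rank
    fr = 12 / (b * k * (k + 1)) * sum(t ** 2 for t in rank_sums) - 3 * b * (k + 1)
    return rank_sums, fr


# A single block with a tie: scores 4, 2, 3, 2 receive ranks 4, 1.5, 3, 1.5.
tied_block_ranks = average_ranks([4, 2, 3, 2])

# Hypothetical experiment: b = 5 blocks, k = 3 treatments, identical ordering in every block.
rank_sums, fr = friedman_fr([[1, 2, 3]] * 5)

print(tied_block_ranks, rank_sums, fr)
```

Each block's ranks add up to k(k + 1)/2, which gives a convenient checksum when ranking by hand.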
72
Example 19.6 The personnel manager of a national accounting firm has been receiving complaints from senior managers about the quality of recent hirings. All new accountants are hired through a process whereby four managers interview the candidate and rate her or him on several dimensions, including academic credentials, previous work experience, and personal suitability. Each manager then summarizes the results and produces an evaluation of the candidate. There are five possibilities: 1 The candidate is in the top 5% of applicants. 2 The candidate is in the top 10% of applicants, but not in the top 5%. 3 The candidate is in the top 25% of applicants, but not in the top 10%. 4 The candidate is in the top 50% of applicants, but not in the top 25%. 5 The candidate is in the bottom 50% of applicants.
73
Example 19.6 The evaluations are then combined in making the final decision. The personnel manager believes that the quality problem is caused by the evaluation system. However, she needs to know whether there is general agreement or disagreement between the interviewing managers in their evaluations. To test for differences between the managers, she takes a random sample of the evaluations of eight applicants. The results are shown below and stored in Xm What conclusions can the personnel manager draw from these data? Employ a 5% significance level.
74
Example 19.6
            Manager
Applicant   1   2   3   4
1           2   1   2   2
2           4   2   3   2
3           2   2   2   3
…           (applicants 4 to 8 not shown)
75
Example 19.6 IDENTIFY The problem objective is to compare the four populations of managers' evaluations, which we can see are ordinal data. This experiment is identified as a randomized block design because the eight applicants were evaluated by all four managers. (The treatments are the managers, and the blocks are the applicants.) The appropriate statistical technique is the Friedman test. The null and alternative hypotheses are as follows. H0: The locations of all four populations are the same H1: At least two population locations differ
76
Example 19.6 COMPUTE The data look like this: Applicant #1, for example, received a top score from manager 2 and next-to-top scores from the other three. Applicant #7 received a top score from manager 2 as well, but the other three scored this candidate very low… There are k = 4 populations (managers) and b = 8 blocks (applicants) in this set-up.
77
Example 19.6 COMPUTE “Rank each observation within block from smallest to largest (i.e. from 1 to k), averaging any ties”… For example, consider the case of candidate #2:

                      Manager 1   Manager 2        Manager 3   Manager 4        checksum
Original scores       4           2                3           2
“Straight” ranking    4           1                3           2                10
Averaged ranking      4           (1+2)/2 = 1.5    3           (1+2)/2 = 1.5    10

checksum = 1 + 2 + … + k
78
Example 19.6 COMPUTE Compute the rank sums: T1, T2, …, Tk and our test statistic…
79
Example 19.6 INTERPRET The value of our Friedman test statistic is compared to a critical value of Chi-squared (at 5% significance and 3 d.f.), which is 7.81. Thus, there is sufficient evidence to reject H0 in favor of H1. It appears that the managers’ evaluations of applicants do indeed differ.
80
Identifying Factors Factors that Identify the Friedman Test…
81
Spearman Rank Correlation Coefficient
Previously we looked at the t-test of the coefficient of correlation (ρ). In many situations, one or both variables may be ordinal; or if both variables are interval, the normality requirement may not be satisfied. In such cases, we measure and test to determine whether a relationship exists by employing a nonparametric technique, the Spearman rank correlation coefficient.
82
Spearman Rank Correlation Coefficient
We are interested in whether a relationship exists between the two variables, hence the hypotheses to be tested are: H0: ρs = 0 (no linear pattern, hence no correlation) H1: ρs ≠ 0 (correlation; we can also do one-tail tests). Since ρs is a population parameter, our sample statistic is rs, calculated as: $r_s = \dfrac{s_{ab}}{s_a s_b}$ (where a and b are the ranks of x and y respectively) [ρs is referred to as the Spearman correlation coefficient]
83
Spearman Rank Correlation Coefficient
The statistic rs is approximately normally distributed with a mean of zero and a standard deviation of $1/\sqrt{n-1}$. Hence our standardized test statistic is: $z = r_s\sqrt{n-1}$
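A short sketch of the calculation: rank each variable separately, take the ordinary correlation of the two sets of ranks, then standardize with z = rs·√(n − 1). The scores and ratings are hypothetical (not the Example 19.7 data) and far too few for the normal approximation; they only show the mechanics. The function names are assumptions.

```python
import math


def average_ranks(values):
    """Rank values from smallest (rank 1) upward, averaging the ranks of ties."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
        i = j
    return [rank_of[v] for v in values]


def spearman_rs(x, y):
    """Spearman rank correlation: covariance of the ranks over the product of their SDs."""
    a = average_ranks(x)
    b = average_ranks(y)
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    s_ab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)
    s_a = math.sqrt(sum((ai - mean_a) ** 2 for ai in a) / (n - 1))
    s_b = math.sqrt(sum((bi - mean_b) ** 2 for bi in b) / (n - 1))
    return s_ab / (s_a * s_b)


def spearman_z(r_s, n):
    """Standardized test statistic z = r_s * sqrt(n - 1)."""
    return r_s * math.sqrt(n - 1)


# Hypothetical aptitude scores (interval) and performance ratings (ordinal).
aptitude = [55, 62, 70, 81]
rating = [2, 2, 4, 4]

r_s = spearman_rs(aptitude, rating)
z = spearman_z(r_s, len(aptitude))

print(round(r_s, 4), round(z, 4))
```

Because only ranks enter the calculation, recoding the ratings with any order-preserving numbering leaves rs unchanged.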
84
Example 19.7 The production manager of a firm wants to examine the relationship between aptitude test scores given prior to hiring of production-line workers and performance ratings received by the employees 3 months after starting work. The results of the study would allow the firm to decide how much weight to give to these aptitude tests relative to other work-history information obtained, including references. The aptitude test results range from 0 to 100. The performance ratings are as follows: 1 = Employee has performed well below average. 2 = Employee has performed somewhat below average. 3 = Employee has performed at the average level. 4 = Employee has performed somewhat above average. 5 = Employee has performed well above average.
85
Example 19.7 A random sample of 20 production workers yielded the results listed here. Can the firm's manager infer at the 5% significance level that aptitude test scores are correlated with performance rating? Employee Aptitude Test Score Performance Rating Xm19-07
86
Example 19.7 IDENTIFY The problem is we’re trying to correlate interval & ordinal data. We’ll treat the aptitude scores as ordinal, and apply the Spearman rank correlation coefficient…
87
Example 19.7 IDENTIFY We specify our hypotheses as: H0: ρs = 0 H1: ρs ≠ 0
88
Example 19.7 COMPUTE As before, we rank each of the variables separately and average any ties. Now we compute the standard deviations of the ranks (sa, sb) and their covariance (sab).
89
Example 19.7 COMPUTE
90
Example 19.7 COMPUTE Using the short-cut calculation on we determine that the covariance of the ranks is
91
Example 19.7 COMPUTE The sample variances of the ranks (using the short-cut formula on) are
92
Example 19.7 COMPUTE The standard deviations of the ranks, together with their covariance, give rs. The value of the test statistic is z = 1.83, so p-value = 2P(Z > 1.83) = 2(1 − .9664) = .0672
93
Example 19.7 COMPUTE Click Add-Ins, Data Analysis Plus, Correlation (Spearman)
94
Example 19.7 COMPUTE
95
Example 19.7 INTERPRET Z = 1.83, p-value = .0674; there is not enough evidence to infer that aptitude test score and performance rating are related.
96
Identifying Factors Factors that Identify the Spearman Rank Correlation Coefficient Test…