Nonparametric Inference

Name: Nonparametric Inference
Uploaded: 2017-07-10T10:47:15+00:00
Duration: PTM30S57
Channel: Ferdinand Kelly
Description: Nonparametric Inference

Why Nonparametric Tests?
We have been primarily discussing parametric tests, i.e. tests that rely on certain assumptions in order to be valid. For example, t-tests and ANOVA both make assumptions about the shape of the distribution (normality) and about the necessity of having similar groups (homogeneity of variance). When these assumptions hold we can use standard sampling distributions (e.g. t-distribution, F-distribution) to find p-values.

Why Nonparametric Tests?
When these assumptions are violated it is necessary to turn to tests that do not have such stringent assumptions: nonparametric or "distribution-free" tests. Specifically, there are three cases which necessitate the use of nonparametric tests:
1) The data for the response is not at least interval scale. For example, the response might be ordinal.
2) The distribution of the data for the response is not normal. Recall that a relatively normal distribution is assumed for parametric tests.
3) There exist severely unequal variances between groups, i.e. there is obviously a violation of the homogeneity of variance assumption required for parametric tests.
In the last two cases, we have interval level data, but it violates our parametric assumptions. Therefore, we no longer treat this data as interval, but as ordinal. In a sense, we demote it because it fails to meet specific assumptions.

Table of Parametric & Nonparametric Tests
Parametric Test | Nonparametric Test | Purpose of Test
Two-Sample t-Test (either case) | Mann-Whitney/Wilcoxon Rank Sum Test | Compare two independent samples
Paired t-Test | Sign Test or Wilcoxon Signed-Rank Test | Compare dependent samples
Oneway ANOVA | Kruskal-Wallis Test | Compare k independent samples

Independent Samples
For two populations we use the Mann-Whitney/Wilcoxon Rank Sum Test.
For three or more populations we use the Kruskal-Wallis Test (at the end).

Mann-Whitney/Wilcoxon Rank Sum Test
Alternative to the two-sample t-Test. Use when…
- populations being sampled are not normally distributed.
- sample sizes are small so assessing normality is not possible (ni < 20).
- response is ordinal.

General Hypotheses
Ho: distribution of pop. A and pop. B are the same, i.e. A = B
HA: distribution of pop. A and pop. B are NOT the same, i.e. A ≠ B
HA: distribution of pop. A is shifted to the right of pop. B, i.e. A > B
HA: distribution of pop. A is shifted to the left of pop. B, i.e. A < B

Ho: A = B vs HA: A > B Q: Is there evidence that the values in population A are generally larger than those in population B?

Mann-Whitney/Wilcoxon Rank Sum Test (Test Procedure)
1) Rank all N = nA + nB observations in the combined sample from both populations in ascending order. Assign the average rank to tied observations.
2) Sum the ranks of the observations from populations A and B separately and denote the sums wA and wB.
3) For HA: A < B reject Ho if wA is “small” or wB is “big”. For HA: A > B reject Ho if wA is “big” or wB is “small”.
4) Use tables to determine how “big” or “small” the rank sums must be in order to reject Ho, or use software to conduct the test.

Mann-Whitney/Wilcoxon Rank Sum Test (Critical Value Table)
This table contains the value the smaller rank sum must be less than in order to reject Ho in a one-tailed test at two significance levels (α = .05 and .01). Tables exist for two-tailed tests as well. n is the sample size of the group with the smaller rank sum.

Example: Huntington’s Disease and Fasting Glucose Levels
Davidson et al. studied the responses to oral glucose in patients with Huntington’s disease and in a group of control subjects. The five-hour responses are shown below. Is there evidence to suggest the five-hour glucose (mg present) is greater for patients with Huntington’s disease?
Ho: Control = Huntington’s, i.e. C = H
HA: Control < Huntington’s, i.e. C < H

Example: Observations & Ranks
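Control Group (nA = 10) | Rank | Huntington’s Disease (nB = 11) | Rank
83 | 9 | 85 | 10.5
73 | 3 | 89 | 15
65 | 1.5 | 86 | 13
65 | 1.5 | 91 | 17
90 | 16 | 77 | 5.5
77 | 5.5 | 93 | 19
78 | 7 | 100 | 21
97 | 20 | 82 | 8
85 | 10.5 | 92 | 18
75 | 4 | 86 | 13
 | | 86 | 13
wA = 78 | | wB = 153 |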
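The ranking and the two rank sums can be checked with a short Python sketch. This is only an illustration, not the procedure used in the original analysis: the helper name `average_ranks` is made up for this example, and the observations are typed in from the table of observations and ranks.

```python
def average_ranks(values):
    """Return 1-based ascending ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


# Five-hour glucose responses, as read from the example table.
control = [83, 73, 65, 65, 90, 77, 78, 97, 85, 75]
huntington = [85, 89, 86, 91, 77, 93, 100, 82, 92, 86, 86]

# Rank the combined sample, then split the ranks back into the two groups.
ranks = average_ranks(control + huntington)
w_control = sum(ranks[:len(control)])
w_huntington = sum(ranks[len(control):])

print(w_control, w_huntington)
```

The two sums should agree with wA = 78 and wB = 153, and together they must equal N(N + 1)/2 = 231, which is a quick check that no rank was lost.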

Example: Critical Value Table
Here nC = 10 (control) and nH = 11 (Huntington’s). We will reject Ho: C = H in favor of HA: C < H if the rank sum for the control group is less than 86 at the α = .05 level and less than 77 at the α = .01 level.

Example: Decision/Conclusion
Using the Wilcoxon Rank Sum Test we have evidence to suggest that the five-hour glucose level for individuals with Huntington’s disease is greater than that for healthy controls (p < .05). Note: p < .05 because the observed rank sum for the control group, 78, is less than 86, which is the critical value for α = .05.

Rank Sum Test in JMP
The p-values reported are based upon large sample approximations, which generally should not be used when sample sizes are small. Here the conclusion reached is the same, but in general we should use tables if they are available.

Exact one-tailed p-value = .024/2 = .012

Rank Sum Test in SPSS
Exact one-tailed p-value = .024/2 = .012
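To give a sense of what a large sample approximation for the rank sum looks like, here is a hedged Python sketch. It assumes the usual normal approximation, in which the rank sum of a group of size nA has mean nA(N + 1)/2 and variance nA·nB(N + 1)/12 under Ho; no tie or continuity correction is applied, and software packages may apply either, so their output can differ slightly from this.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


n_a, n_b = 10, 11   # control and Huntington's sample sizes
w_a = 78            # observed rank sum for the control group
N = n_a + n_b

# Mean and standard deviation of the control rank sum if Ho is true.
mean_w = n_a * (N + 1) / 2
sd_w = math.sqrt(n_a * n_b * (N + 1) / 12)

z = (w_a - mean_w) / sd_w
p_lower = normal_cdf(z)  # one-tailed p-value for HA: C < H

print(round(z, 3), round(p_lower, 3))
```

Under these assumptions z is about -2.25 and the one-tailed p-value is about .012, in line with the software output quoted above.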

Wilcoxon Signed-Rank Test
Dependent Samples
- Sign Test
- Wilcoxon Signed-Rank Test

Sign Test
The sign test can be used in place of the paired t-test when we have evidence that the paired differences are NOT normally distributed. It can also be used when the response is ordinal. It is best used when the response is difficult to quantify and only improvement can be measured, i.e. subject got better, got worse, or no change. The magnitude of the paired difference is lost when using this test.

Sign Test
The sign test looks at the number of (+) and (-) differences amongst the nonzero paired differences. A preponderance of +’s or –’s can indicate that some type of change has occurred. If the null hypothesis of no change is true we expect +’s and –’s to be equally likely to occur, i.e. P(+) = P(-) = .50, and the number of each observed follows a binomial distribution.

Example: Sign Test
A study evaluated hepatic arterial infusion of floxuridine and cisplatin for the treatment of liver metastases of colorectal cancer. Performance scores for 29 patients were recorded before and after infusion. Is there evidence that patients had a better performance score after infusion?

Example: Sign Test
[Table: Patient (1 to 29), Before (B) Infusion, After (A) Infusion, Difference (A – B)]

Example: Sign Test
Ho: No change in performance score following infusion, or more specifically median change in performance score is 0.
HA: Performance scores improve following infusion, or more specifically median change in performance score > 0.
Intuitively we will reject Ho if there is a “large” number of +’s.

Example: Sign Test
[Table: Patient (1 to 29), Before (B) Infusion, After (A) Infusion, Difference (A – B), with the sign of each nonzero difference marked]
Signs of the nonzero differences: - + + - + - + + + + + - + - - + +
17 nonzero differences: 11 +’s and 6 –’s.

Example: Sign Test
If Ho is true, X = the number of +’s has a binomial distribution with n = 17 and p = P(+) = .50. Therefore the p-value is simply P(X ≥ 11 | n = 17, p = .50) = .166 > α. We fail to reject Ho; there is insufficient evidence to conclude the performance score improves following infusion (p = .166).
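The binomial tail probability behind this p-value is easy to verify. The sketch below is an illustration only; the function name `sign_test_upper_p` is a made-up helper, and it computes the one-tailed probability of seeing at least the observed number of +’s among the nonzero differences when P(+) = .50.

```python
from math import comb


def sign_test_upper_p(n_plus, n_nonzero):
    """P(X >= n_plus) for X ~ Binomial(n_nonzero, 0.5)."""
    favorable = sum(comb(n_nonzero, k) for k in range(n_plus, n_nonzero + 1))
    return favorable / 2 ** n_nonzero


# 11 plus signs among the 17 nonzero paired differences.
p_value = sign_test_upper_p(11, 17)
print(round(p_value, 3))  # 0.166
```

Note that the tail includes X = 11 itself: the p-value is the chance of a result at least as extreme as the one observed.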

Wilcoxon Signed-Rank Test
The problem with the sign test is that the magnitude or size of the paired differences is lost. The Wilcoxon Signed-Rank Test uses ranks of the paired differences to retain some sense of their size. Use it when the distribution of the paired differences is NOT normal or when the sample size is small. It can be used with an ordinal response.

Wilcoxon Signed Rank Test (Test Procedure)
1) Exclude any differences which are zero.
2) Put the rest of the differences in ascending order, ignoring their signs.
3) Assign them ranks. If any differences are equal, average their ranks.
4) Attach the sign of each difference to its rank to obtain the signed ranks.

Example: Wilcoxon Signed Rank Test
Resting Energy Expenditure (REE) for Patients with Cystic Fibrosis
A researcher believes that patients with cystic fibrosis (CF) expend greater energy during resting than those without CF. To obtain a fair comparison she matches 13 patients with CF to 13 patients without CF on the basis of age, sex, height, and weight.

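Pair | CF (C) | Healthy (H) | Difference d = C - H | Sign of Difference | Abs. Diff. |d| | Rank |d| | Signed Rank
1 | 1153 | 996 | 157 | + | 157 | 6 | 6
2 | 1132 | 1080 | 52 | + | 52 | 3 | 3
3 | 1165 | 1182 | -17 | - | 17 | 2 | -2
4 | 1460 | 1452 | 8 | + | 8 | 1 | 1
5 | 1634 | 1162 | 472 | + | 472 | 13 | 13
6 | 1493 | 1619 | -126 | - | 126 | 5 | -5
7 | 1358 | 1140 | 218 | + | 218 | 8 | 8
8 | 1453 | 1123 | 330 | + | 330 | 11 | 11
9 | 1185 | 1113 | 72 | + | 72 | 4 | 4
10 | 1824 | 1463 | 361 | + | 361 | 12 | 12
11 | 1793 | 1632 | 161 | + | 161 | 7 | 7
12 | 1930 | 1614 | 316 | + | 316 | 10 | 10
13 | 2075 | 1836 | 239 | + | 239 | 9 | 9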

We then calculate the sum of the positive ranks (T+) and the sum of the negative ranks (T-). Here we have T+ = 1 + 3 + 4 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + 13 = 84 and T- = 2 + 5 = 7.
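The signed-rank bookkeeping for this example can be reproduced with a small Python sketch. It is an illustration under stated assumptions rather than the software procedure used later: `average_ranks` is a made-up helper, and the CF and healthy values are typed in from the table of pairs.

```python
def average_ranks(values):
    """Return 1-based ascending ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


# Resting energy expenditure for the 13 matched pairs.
cf = [1153, 1132, 1165, 1460, 1634, 1493, 1358, 1453, 1185, 1824, 1793, 1930, 2075]
healthy = [996, 1080, 1182, 1452, 1162, 1619, 1140, 1123, 1113, 1463, 1632, 1614, 1836]

# Paired differences d = C - H, dropping any zero differences.
diffs = [c - h for c, h in zip(cf, healthy) if c != h]

# Rank the absolute differences, then split the ranks by the sign of d.
abs_ranks = average_ranks([abs(d) for d in diffs])
t_plus = sum(r for r, d in zip(abs_ranks, diffs) if d > 0)
t_minus = sum(r for r, d in zip(abs_ranks, diffs) if d < 0)
t_stat = min(t_plus, t_minus)

print(t_plus, t_minus, t_stat)
```

The two sums should come out as T+ = 84 and T- = 7; they always add up to n(n + 1)/2, here 13 × 14/2 = 91.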

Wilcoxon Signed Rank Test (Test Statistic)
Intuitively we will reject Ho, which states that there is no difference between the populations, if one of these rank sums is “large” and the other is “small”. The Wilcoxon Signed Rank Test uses the smaller rank sum, T = min(T+, T-), as the test statistic.

For the cystic fibrosis example we have the following hypotheses:
Ho: there is no difference in the resting energy expenditure of individuals with CF and healthy controls who are the same gender, age, height, and weight (MEDIAN PAIRED DIFFERENCE = 0).
HA: the resting energy expenditure of individuals with CF is greater than that of healthy individuals who are the same gender, age, height, and weight (MEDIAN PAIRED DIFFERENCE > 0).

HA: the resting energy expenditure of individuals with CF is greater than that of healthy individuals who are the same gender, age, height, and weight.
The alternative is clearly supported if T+ is “large” or T- is “small”. The test statistic is T = min(T+, T-) = 7. Is T = 7 considered small, i.e. what is the corresponding p-value? To answer this question we need a Wilcoxon Signed Rank Test table or statistical software.

This table gives the value that our observed T = min(T+, T-) must be less than in order to reject Ho, for both two- and one-tailed tests. Here we have n = 13 and T = 7. We can see that our test statistic is less than 21 (α = .05) and 12 (α = .01), so we reject Ho and estimate that our p-value < .01.

We conclude that individuals with cystic fibrosis (CF) have a larger resting energy expenditure when compared to healthy individuals who are the same gender, age, height, and weight (p < .01).

Analysis in JMP
Select Test Mean from the Difference pull-down menu, enter 0 for the null value, and check the Wilcoxon option. The test statistic is reported as (T+ - T-)/2 = (84 – 7)/2 = 38.5, but we only need the p-value.

Analysis in SPSS Click on CF first and then Healthy to specify that the paired difference will be defined as CF – Healthy & specify which tests to conduct. Note: the Difference column is not actually used in the SPSS analysis.

Analysis in SPSS
For the one-tailed Wilcoxon Signed Rank Test our p-value = .007/2 = .0035 (not exact!).
For the Sign Test we have a one-tailed p-value = .022/2 = .011.

Independent Samples
If we have three or more populations to compare we use the Kruskal-Wallis Test.

Kruskal-Wallis Test
One-way ANOVA for a completely randomized design is based on the assumption of normality and equality of variance. The nonparametric alternative not relying on these assumptions is called the Kruskal-Wallis Test. As with the Mann-Whitney/Wilcoxon Rank Sum Test, we rank the combined sample and use the sum of the ranks assigned to each group as the basis for our test statistic.

Kruskal-Wallis Test Basic Idea:
1) Looking at all observations together, rank them.
2) Let R1, R2, …, Rk be the sums of the ranks in each group.
3) If some Ri’s are much larger than others, it indicates the response values in different groups come from different populations.

Kruskal-Wallis Test
The test statistic is
$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)$$
where N = total sample size = n1 + n2 + … + nk.
Under the null hypothesis, this has an approximate chi-square distribution with df = k - 1, i.e. H ~ χ² with k - 1 degrees of freedom. The approximation is OK when each group contains at least 5 observations.

Chi-squared Distribution and p-value
[Figure: chi-square density curve; the shaded upper-tail area beyond the observed test statistic is the p-value.]

Example: Kruskal-Wallis Test
A clinical trial evaluating the fever reducing effects of aspirin, ibuprofen, and acetaminophen was conducted. Study subjects were adults seen in an ER with diagnoses of flu with body temperatures between 100°F and 100.9°F. Subjects were randomly assigned to treatment. Changes in body temperature were recorded 2 hrs. after administration of treatments.

Resulting Data: Temperature Decrease (deg. F)
Aspirin | Rank | Ibuprofen | Rank | Acetaminophen | Rank
.95 | 8 | .39 | 5 | .19 | 4
1.48 | 14 | .44 | 6 | 1.02 | 9
1.33 | 12 | 1.31 | 11 | .07 | 3
1.28 | 10 | 2.48 | 15 | .01 | 2
 | | 1.39 | 13 | .62 | 7
 | | | | -.39 (i.e. temp increase) | 1
R1 = 44 | | R2 = 50 | | R3 = 26 |
n1 = 4 | | n2 = 5 | | n3 = 6 |
N = 15

With N = 15, H = 12/(15 × 16) × (44²/4 + 50²/5 + 26²/6) – 3(16) ≈ 6.83.

Chi-squared Distribution and p-value
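[Figure: chi-square density curve with df = 2; the shaded upper-tail area beyond the observed test statistic is .033.]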
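The rank sums, the test statistic, and the p-value for this example can be reproduced with a short Python sketch. It assumes the standard Kruskal-Wallis statistic H = 12/(N(N + 1)) · Σ Ri²/ni – 3(N + 1) with no tie correction (there are no ties in these data), and it uses the fact that a chi-square upper-tail area with exactly 2 degrees of freedom equals exp(–H/2); that shortcut does not hold for other df.

```python
import math

# Temperature decreases (deg. F) for the three treatment groups.
groups = {
    "aspirin": [0.95, 1.48, 1.33, 1.28],
    "ibuprofen": [0.39, 0.44, 1.31, 2.48, 1.39],
    "acetaminophen": [0.19, 1.02, 0.07, 0.01, 0.62, -0.39],
}

# Rank the pooled sample; with no ties the rank is the sorted position.
pooled = sorted(x for values in groups.values() for x in values)
N = len(pooled)
assert len(set(pooled)) == N, "this sketch does not handle tied values"
rank_of = {x: i + 1 for i, x in enumerate(pooled)}

# Sum of ranks within each group (R1, R2, R3).
rank_sums = {name: sum(rank_of[x] for x in values) for name, values in groups.items()}

# Kruskal-Wallis statistic.
H = 12 / (N * (N + 1)) * sum(
    rank_sums[name] ** 2 / len(values) for name, values in groups.items()
) - 3 * (N + 1)

# Upper-tail chi-square area; this closed form is valid only for df = 2.
p_value = math.exp(-H / 2)

print(rank_sums, round(H, 3), round(p_value, 3))
```

The rank sums should be 44, 50 and 26, and the p-value should round to .033, matching the shaded area reported for this example.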

Kruskal-Wallis in JMP (Demo)
Analyze > Fit Y by X
RESULTS
R1 = 44, n1 = 4
R2 = 50, n2 = 5
R3 = 26, n3 = 6
H = 6.83, df = 2, p = .033

Kruskal-Wallis in SPSS (Demo)
RESULTS
R1/n1 = 11.00
R2/n2 = 10.00
R3/n3 = 4.33
H = 6.83, df = 2, p = .033

Decision/Conclusion
Using the Kruskal-Wallis test we have evidence to suggest that the temperature changes after taking the different drugs are not the same (p = .033). Now we might like to know which drugs significantly differ from one another.

Multiple Comparisons for Kruskal – Wallis Test
If we decide at least two populations differ in terms of what is typical of their values, we can use multiple comparisons to determine which populations differ. To do this we calculate an approximate p-value for each pair-wise comparison and then compare that p-value to a Bonferroni corrected significance level (α).

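To determine if group i significantly differs from group j we compute
$$z_{ij} = \frac{R_i/n_i - R_j/n_j}{\sqrt{\frac{N(N+1)}{12}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}}$$
and then compute p-value = P(Z > |z_ij|) and compare it to α/(2m), where m is the number of possible pair-wise comparisons, m = k(k – 1)/2.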

Comparing Aspirin to Acetaminophen
N = 15
Aspirin: R1 = 44, n1 = 4
Acetaminophen: R3 = 26, n3 = 6
Computing the Bonferroni corrected significance level we have .05/(2 × 3) = .0083.

As this comparison is not significant, none of the others will be either, so how can this be when the Kruskal-Wallis test itself was significant? The problem is that the Bonferroni correction is too conservative, and the approximate normality of the multiple comparison is valid only when sample sizes are “large”, while the sample sizes here are quite small. Thus the comparison shown is fine for a demonstration of the procedure, but the results cannot be trusted.
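The pair-wise comparisons can be sketched in Python as follows. This is a hedged illustration, not the exact computation from the slides: it assumes the usual large-sample z statistic for a difference in mean ranks, z = (Ri/ni – Rj/nj) / sqrt(N(N + 1)/12 · (1/ni + 1/nj)), with no tie correction, and compares the one-tailed p-value P(Z > |z|) to α/(2m).

```python
import math
from itertools import combinations

# Rank sums and sample sizes from the Kruskal-Wallis example.
rank_sums = {"aspirin": 44, "ibuprofen": 50, "acetaminophen": 26}
sizes = {"aspirin": 4, "ibuprofen": 5, "acetaminophen": 6}
N = sum(sizes.values())
alpha = 0.05

pairs = list(combinations(rank_sums, 2))
m = len(pairs)            # number of pair-wise comparisons, k(k - 1)/2
cutoff = alpha / (2 * m)  # Bonferroni corrected significance level


def upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


results = {}
for a, b in pairs:
    # Difference in mean ranks and its approximate standard error.
    diff = rank_sums[a] / sizes[a] - rank_sums[b] / sizes[b]
    se = math.sqrt(N * (N + 1) / 12 * (1 / sizes[a] + 1 / sizes[b]))
    z = abs(diff) / se
    results[(a, b)] = (z, upper_tail(z))

for (a, b), (z, p) in results.items():
    print(f"{a} vs {b}: z = {z:.3f}, p = {p:.4f}, significant = {p < cutoff}")
```

Under these assumptions the aspirin versus acetaminophen comparison has the largest z (about 2.31), and its p-value still exceeds the corrected level of .0083, so no pair is flagged.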

Nonparametric Multiple Comparisons in JMP
