Some Nonparametric Methods
stat120C UCI
Introduction: Why use nonparametric methods?
A statistical model usually rests on several assumptions.
Parametric models make more assumptions than nonparametric models.
As an example, consider the one-sample t-test. What would happen if
  the normality assumption does not hold?
  there are several outliers?
Parametric vs Nonparametric
Nonparametric methods are usually more robust to violations of assumptions and to outliers.
When the assumptions hold, parametric methods are usually more powerful.
Nonparametric methods do not assume that the data follow any particular parametric family of distributions (such as the normal).
In many tests, data values are replaced by their ranks, so the results are invariant under any monotonic (order-preserving) transformation.
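The invariance claim can be checked numerically. The sketch below uses two small made-up samples (the values of `x` and `y` are hypothetical, chosen only for illustration) and compares a rank-based test with the t-test before and after a log transformation.

```r
# Hypothetical positive-valued samples (not from the slides)
x <- c(1.2, 3.4, 2.2, 8.9)
y <- c(0.5, 1.9, 0.8, 2.0)

# log() is strictly increasing, so the ranks of the pooled data do not change
same_ranks <- identical(rank(c(x, y)), rank(log(c(x, y))))

# A rank-based test therefore gives the same p-value on either scale
p_rank_raw <- wilcox.test(x, y)$p.value
p_rank_log <- wilcox.test(log(x), log(y))$p.value

# The t-test uses the data values themselves, so its p-value generally changes
p_t_raw <- t.test(x, y)$p.value
p_t_log <- t.test(log(x), log(y))$p.value

print(c(same_ranks = same_ranks))
print(c(rank_raw = p_rank_raw, rank_log = p_rank_log))
print(c(t_raw = p_t_raw, t_log = p_t_log))
```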
A nonparametric test for comparing paired samples: the Wilcoxon signed rank test
An example: measurements before and after a treatment
  Before: 25 29 60 27
  After:  27 25 59 37
Question: is this treatment effective?
If normality can be assumed, we can use the paired t-test.
What if the distributional assumptions underlying the t-test are not satisfied?
The Wilcoxon signed rank test
An example: measurements before and after a treatment
  Before: 25 29 60 27
  After:  27 25 59 37
Step 1: calculate the differences Di = After_i - Before_i
Step 2: rank the absolute differences |Di| from smallest to largest
Step 3: calculate W+, which is defined as the sum of the ranks of the |Di| for which Di > 0
Step 4: make statistical inference
  based on the exact distribution
  based on the asymptotic distribution (for large n)
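Steps 1 to 3 can be sketched in a few lines of R for the before/after data above. The variable names are illustrative, and the sketch assumes no ties among the |Di| and no zero differences, which holds for this example.

```r
before <- c(25, 29, 60, 27)
after  <- c(27, 25, 59, 37)

# Step 1: differences
d <- after - before

# Step 2: ranks of the absolute differences (1 = smallest)
r <- rank(abs(d))

# Step 3: W+ is the sum of the ranks attached to positive differences
w_plus <- sum(r[d > 0])

print(rbind(d = d, rank = r))
print(w_plus)
```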
The example
Differences (After - Before): 2, -4, -1, 10; ranks of |Di|: 2, 3, 1, 4
The positive differences have ranks 2 and 4, so W+ = 2 + 4 = 6
Under the null, the sign of each difference is random, so W+ should be neither too large nor too small.
Based on exact distributions
Probabilities under the null hypothesis (each of the 2^4 = 16 sign assignments has probability 1/16 = 0.0625):
  signed ranks        w+   prob
   2   3   1   4      10   0.0625
   2   3  -1   4       9   0.0625
  -2   3   1   4       8   0.0625
   2  -3   1   4       7   0.0625
  -2   3  -1   4       7   0.0625
   2   3   1  -4       6   0.0625
   2  -3  -1   4       6   0.0625   (observed)
  -2  -3   1   4       5   0.0625
   2   3  -1  -4       5   0.0625
  -2   3   1  -4       4   0.0625
  -2  -3  -1   4       4   0.0625
   2  -3   1  -4       3   0.0625
  -2   3  -1  -4       3   0.0625
   2  -3  -1  -4       2   0.0625
  -2  -3   1  -4       1   0.0625
  -2  -3  -1  -4       0   0.0625
Large values of w+ are evidence of positive change.
P-value = 0.875 (two-sided: 2 x P(W+ >= 6) = 2 x 7/16)
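The exact null distribution can also be obtained by brute-force enumeration. The sketch below lists all 16 sign assignments for the ranks 2, 3, 1, 4 and doubles the smaller tail probability to get a two-sided p-value; that doubling convention is an assumption here, chosen because it reproduces the value reported for this example.

```r
ranks <- c(2, 3, 1, 4)   # ranks of |Di| for the example
observed <- 6            # observed W+

# All 2^4 assignments of a sign (-1 or +1) to the four ranks
signs <- as.matrix(expand.grid(rep(list(c(-1, 1)), length(ranks))))

# W+ for each assignment: sum of the ranks that carry a positive sign
w_plus <- apply(signs, 1, function(s) sum(ranks[s > 0]))

# Each assignment is equally likely under the null
p_upper <- mean(w_plus >= observed)
p_lower <- mean(w_plus <= observed)
p_two_sided <- min(1, 2 * min(p_upper, p_lower))

print(table(w_plus) / length(w_plus))
print(p_two_sided)
```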
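What is the null hypothesis?
H0: the random sample (here, the differences Di) is drawn from a distribution with zero median
H1: the random sample is drawn from a distribution whose median is not zero
The test additionally assumes that this distribution is symmetric about its median.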
Inference based on exact distribution
We can use the R function wilcox.test to obtain p-values (R reports W+ as V):
  > wilcox.test(c(2,-4,-1,10))

          Wilcoxon signed rank test

  data:  c(2, -4, -1, 10)
  V = 6, p-value = 0.875
  alternative hypothesis: true location is not equal to 0
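As a usage sketch, the same test can be run directly on the before/after measurements with the `paired = TRUE` argument of `wilcox.test`, instead of passing the differences by hand. The variable names are illustrative.

```r
before <- c(25, 29, 60, 27)
after  <- c(27, 25, 59, 37)

# Paired call: R forms the differences after - before internally
res_paired <- wilcox.test(after, before, paired = TRUE)

# One-sample call on the differences, as on the slide
res_diff <- wilcox.test(after - before)

print(res_paired$statistic)  # V, i.e. W+
print(res_paired$p.value)
```

With only four pairs and no ties, R uses the exact null distribution by default, so both calls agree with the enumeration of the 16 sign assignments.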
Inference based on asymptotic distributions
The asymptotic distribution of W+ under the null hypothesis of zero median: for large n, W+ is approximately normal with mean n(n+1)/4 and variance n(n+1)(2n+1)/24.
Proof
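The body of this slide did not survive in the text. What follows is a sketch of the standard argument for the null mean and variance of W+, assuming no ties and no zero differences. The indicator notation I_k is introduced here for illustration and may differ from the lecture's.

```latex
% I_k = 1 if the difference whose absolute value has rank k is positive, else 0.
% Under H_0 (distribution symmetric about 0), I_1, ..., I_n are independent
% Bernoulli(1/2) variables.
\begin{align*}
W_+ &= \sum_{k=1}^{n} k\, I_k, \\
E(W_+) &= \sum_{k=1}^{n} k \cdot \tfrac{1}{2} = \frac{n(n+1)}{4}, \\
\operatorname{Var}(W_+) &= \sum_{k=1}^{n} k^2 \cdot \tfrac{1}{4}
  = \frac{1}{4} \cdot \frac{n(n+1)(2n+1)}{6} = \frac{n(n+1)(2n+1)}{24}.
\end{align*}
% Since W_+ is a sum of independent (not identically distributed) terms,
% a central limit theorem for such sums gives, as n grows,
\[
Z = \frac{W_+ - n(n+1)/4}{\sqrt{n(n+1)(2n+1)/24}} \;\longrightarrow\; N(0, 1)
\quad \text{in distribution.}
\]
```

For the four-pair example, this gives E(W+) = 5 and Var(W+) = 7.5, which matches the mean of the 16 equally likely values of w+ (80/16 = 5).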
Asymptotic Wilcoxon signed rank test
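A minimal sketch of the large-sample test, using the standard null mean n(n+1)/4 and variance n(n+1)(2n+1)/24 of W+. The helper name `signed_rank_z` is hypothetical. The sketch assumes no ties, drops zero differences (a common convention, assumed here), and applies no continuity correction.

```r
signed_rank_z <- function(d) {
  d <- d[d != 0]                      # assumption: zero differences are dropped
  n <- length(d)
  r <- rank(abs(d))                   # ranks of |Di|
  w_plus <- sum(r[d > 0])             # W+
  mu <- n * (n + 1) / 4               # null mean of W+
  sigma <- sqrt(n * (n + 1) * (2 * n + 1) / 24)  # null standard deviation
  z <- (w_plus - mu) / sigma
  list(w_plus = w_plus, z = z, p_value = 2 * pnorm(-abs(z)))
}

# The slide's differences, used only to compare against R's built-in version;
# n = 4 is far too small for the normal approximation to be trusted.
d <- c(2, -4, -1, 10)
res <- signed_rank_z(d)
builtin <- wilcox.test(d, exact = FALSE, correct = FALSE)

print(res)
print(builtin$p.value)
```

Setting `exact = FALSE` and `correct = FALSE` asks `wilcox.test` for the normal approximation without a continuity correction, so the two p-values should agree when there are no ties.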
A nonparametric test for comparing two samples: the Mann-Whitney-Wilcoxon test
In a study of four subjects, two are randomly assigned to the treatment group and two to the control group.
The following values are observed
When the sample size is small (as in the above example) or the normality assumption does not hold, the two-sample t-test may be unreliable.
An example
Two-sided p-value = 1/6 + 1/6 + 1/6 + 1/6 = 2/3
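The observed values for this example are not reproduced in the text, so the sketch below works with ranks only. It enumerates the 6 equally likely ways of assigning two of the four ranks to the treatment group and computes a two-sided p-value for a given treatment rank sum. The helper `two_sided_p` and its "at least as far from the center" convention are assumptions made for illustration.

```r
# All ways to choose which 2 of the 4 ranks belong to the treatment group
treatment_ranks <- combn(4, 2)          # 2 x 6 matrix, one column per assignment
rank_sum <- colSums(treatment_ranks)    # treatment rank sum for each assignment

# Null distribution: each assignment has probability 1/6
null_dist <- table(rank_sum) / ncol(treatment_ranks)

# Null mean of the rank sum: n1 * (N + 1) / 2 with n1 = 2, N = 4
center <- 2 * (4 + 1) / 2

# Two-sided p-value: probability of a rank sum at least as far from the center
two_sided_p <- function(obs) mean(abs(rank_sum - center) >= abs(obs - center))

print(null_dist)
print(sapply(3:7, two_sided_p))
```

Under this convention a two-sided p-value of 2/3 arises when the treatment rank sum is 4 or 6, in which case the four assignments with rank sums 3, 4, 6, and 7 each contribute 1/6.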
The null hypothesis of the Mann-Whitney-Wilcoxon test
The null hypothesis for the two-sample t-test says that the two population means are the same.
The null hypothesis for the Mann-Whitney-Wilcoxon test says that the two random samples were drawn from the same distribution.
Asymptotic Mann-Whitney-Wilcoxon test
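The formulas for this slide did not survive in the text. The sketch below uses the standard large-sample approximation for the rank sum of the first sample, with null mean n1(N+1)/2 and null variance n1 n2 (N+1)/12 where N = n1 + n2. It may not match the lecture's exact formulation. The helper name `rank_sum_z` and the data are hypothetical, and the sketch assumes no ties and no continuity correction.

```r
rank_sum_z <- function(x, y) {
  n1 <- length(x)
  n2 <- length(y)
  N <- n1 + n2
  r <- rank(c(x, y))                      # ranks in the pooled sample
  w <- sum(r[seq_len(n1)])                # rank sum of the first sample
  mu <- n1 * (N + 1) / 2                  # null mean of the rank sum
  sigma <- sqrt(n1 * n2 * (N + 1) / 12)   # null standard deviation (no ties)
  z <- (w - mu) / sigma
  list(rank_sum = w, z = z, p_value = 2 * pnorm(-abs(z)))
}

# Hypothetical data, chosen only to illustrate the calculation
x <- c(1.1, 2.3, 0.7, 3.9, 2.8, 4.4)
y <- c(0.2, 1.5, 0.9, 0.4, 1.9, 0.1)

res <- rank_sum_z(x, y)
builtin <- wilcox.test(x, y, exact = FALSE, correct = FALSE)

print(res)
print(builtin$p.value)
```

R's `wilcox.test` reports the statistic W, which equals the rank sum minus n1(n1+1)/2. The shift does not affect the standardized value, so the two p-values should agree when there are no ties.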