Wilcoxon's Rank-Sum Test (two independent samples), n1 + n2 ≤ 25: Same Distributions
Table columns (Labor data): Runs; Naïve Bayes Acc (n1); Ranks; Naïve Bayes Acc (n2); Ranks
Summary rows: Sample Size (9, 7); Mean; Rank Sum (W) (accept H0)
H0: mean(Acc1) = mean(Acc2)
Critical Values V (Wilcoxon table), by significance and test type: 0.05, two-tailed; 0.01, two-tailed; 0.05, one-tailed; 0.01, one-tailed
Wilcoxon's Rank-Sum Test (two independent samples), n1 + n2 ≤ 25: Different Distributions
Table columns (Labor data): Runs; Naïve Bayes Acc (n1); Ranks; J48 Acc (n2); Ranks
Summary rows: Sample Size (9, 7); Mean; Rank Sum (W) (reject H0)
H0: mean(Acc1) = mean(Acc2)
Critical Values V (Wilcoxon table), by significance and test type: 0.05, two-tailed; 0.01, two-tailed; 0.05, one-tailed; 0.01, one-tailed
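The ranking step behind the two small-sample slides can be sketched as follows. This is a minimal illustration, not the procedure used to produce the slides: the accuracy values are hypothetical (the Labor data values are not available here), and the helper names `average_ranks` and `rank_sums` are made up for this example. Tied values receive the average of the ranks they span.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sums(sample1, sample2):
    """Rank the pooled samples and return the rank sum of each sample."""
    ranks = average_ranks(list(sample1) + list(sample2))
    w1 = sum(ranks[:len(sample1)])
    w2 = sum(ranks[len(sample1):])
    return w1, w2


# Hypothetical accuracies, NOT the Labor data from the slides.
acc1 = [0.90, 0.85, 0.88, 0.92]
acc2 = [0.80, 0.85, 0.78]
w1, w2 = rank_sums(acc1, acc2)
print(w1, w2)
```

The resulting rank sum is then compared against the critical value in a Wilcoxon table for the given n1, n2, significance level, and test type; which of the two sums is looked up depends on the table's convention.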
Wilcoxon's Rank-Sum Test (two independent samples), n1 + n2 > 25: Different Distributions
Table columns (Adult data, each classifier's runs split across two columns): n1: Naïve Bayes Acc (rank); n2: J48 Acc (rank)
Recovered ranks and accuracies: (1.0) (2.0) (3.0) (4.0) (5.0) (6.0) 83.1 (7.0) (8.0) (9.0) (10.0) (11.0) (12.0) (13.0) (14.0) (15.0) (16.0) 83.4 (17.0) (18.0) (19.5) (21.0) (22.0) (23.0) (24.0) (25.0) (26.0) (27.0) (28.0) (29.0) (30.0) 85.7 (31.0) (32.0) (33.0) (34.0) (35.0) (36.5) (38.0) (39.0) (40.0) (41.0) (42.0) (43.0) (44.5) (46.5) 86.1 (48.5) (50.5) 86.2 (52.0) (53.0) (54.0) (55.0) (56.0) (57.0) (58.0) (59.0) 86.7 (60.0)
Summary rows: Sample Size (30 each); Mean; Rank Sum (W)
Mean(W) = n1(n1 + n2 + 1)/2 = 915, STD(W) = sqrt(n1 n2 (n1 + n2 + 1)/12) ≈ 67.64
|Z| > 1.96 (critical z at alpha = 0.05, two-tailed): reject H0: mean(Acc1) = mean(Acc2)
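For n1 + n2 > 25 the rank sum is treated as approximately normal. The sketch below reproduces Mean(W) = 915 for n1 = n2 = 30 using the standard formulas. The rank sum `w=700.0` is a hypothetical placeholder, since the slide's W value is not available, and the function name `rank_sum_z` is made up for this example. No tie correction is applied.

```python
import math


def rank_sum_z(w, n1, n2):
    """Normal approximation for the rank sum W of the first sample."""
    mean_w = n1 * (n1 + n2 + 1) / 2
    std_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / std_w
    return mean_w, std_w, z


# w=700.0 is a hypothetical rank sum, NOT the value from the slides.
mean_w, std_w, z = rank_sum_z(w=700.0, n1=30, n2=30)
print(mean_w, round(std_w, 2), round(z, 2))

# Two-tailed decision at alpha = 0.05.
reject_h0 = abs(z) > 1.96
```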
Wilcoxon's Matched Pairs Signed Ranks Test (for paired scores), n ≤ 50
Data example, table columns: Classifier 1 scores (A); Classifier 2 scores (B); A-B; |A-B|; Rank(|A-B|); Signed Rank(|A-B|)
Recovered table cells: -3, remove, remove, +1, -2. Pairs with A-B = 0 are marked "remove" and excluded from ranking.
Sum of Signed Ranks: W+ = +86, W- = -19. Select W = min(|W+|, |W-|) = 19 (reject H0)
H0: mean(signed_rank(|A-B|)) = 0
Critical Values V (Wilcoxon table), by significance and test type: 0.05, two-tailed; 0.01, two-tailed; 0.05, one-tailed; 0.01, one-tailed
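The signed-rank procedure on this slide can be sketched as below. The paired scores are hypothetical (the slide's data did not survive), and `signed_rank_sums` is a name invented for this example. As a consistency check on the slide's own numbers, 86 + 19 = 105 = 14 * 15 / 2, which matches 14 non-zero differences.

```python
def signed_rank_sums(a, b):
    """Return (W+, W-, n) for paired scores, dropping zero differences."""
    diffs = [x - y for x, y in zip(a, b) if x != y]
    abs_d = [abs(d) for d in diffs]
    order = sorted(range(len(diffs)), key=lambda i: abs_d[i])
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Tied |A-B| values share the average of the ranks they span.
        while j + 1 < len(order) and abs_d[order[j + 1]] == abs_d[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, len(diffs)


# Hypothetical paired scores, NOT the data from the slide.
scores_a = [85, 80, 78, 90, 70, 88]
scores_b = [80, 82, 78, 85, 71, 80]
w_plus, w_minus, n = signed_rank_sums(scores_a, scores_b)
w = min(w_plus, w_minus)  # statistic compared against the Wilcoxon table
print(w_plus, w_minus, n, w)
```

H0 is rejected when the selected W is at or below the tabulated critical value for n and the chosen significance level.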
Wilcoxon's Matched Pairs Signed Ranks Test (for paired scores), n > 50
Randomly split the Adult data set 50/50 into training and testing sets, 100 times.
For each split, run Naïve Bayes and J48 and record their accuracies as a pair, then compute the difference in accuracy.
Determine the signed ranks of the differences (as in the previous example; data omitted due to space constraints).
We get W+ = 0 and W- = 5050 (J48 always produces higher accuracy), N = 100.
mean(W) = N(N + 1)/4 = 2525, STD(W) = sqrt(N(N + 1)(2N + 1)/24) ≈ 290.84
Z = (0 - 2525)/290.84 ≈ -8.68 < -1.96 (at alpha = 0.05), so H0 is rejected.
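The large-sample calculation on this slide can be checked with a few lines. This sketch uses the standard normal approximation for the signed-rank statistic with the slide's W+ = 0 and N = 100; the function name `signed_rank_z` is made up for this example, and no tie correction is applied.

```python
import math


def signed_rank_z(w, n):
    """Normal approximation for the Wilcoxon signed-rank statistic."""
    mean_w = n * (n + 1) / 4
    std_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean_w) / std_w
    return mean_w, std_w, z


# Values taken from the slide: W+ = 0, N = 100.
mean_w, std_w, z = signed_rank_z(w=0, n=100)
print(mean_w, round(std_w, 2), round(z, 2))
```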
What is the Effect Size? (The effect of using Laplace smoothing on the accuracy of J48)
Table columns (runs on Adult data): Runs; Acc J48 (no Laplace); Acc J48 (Laplace). Summary rows: Mean; Standard Deviation
SP^2 = (9 * (s1)^2 + 9 * (s2)^2) / 18 = 0.0365, where s1 and s2 are the two standard deviations
SP = Sqrt(0.0365) ≈ 0.191
d = (86.05 - 86.04) / 0.191 ≈ 0.052
This is less than 0.2, so d indicates a very small effect or none.
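The effect-size arithmetic can be sketched as Cohen's d with a pooled standard deviation. Two assumptions are made here because the slide's per-group standard deviations did not survive: n = 10 runs per group is inferred from the factors 9 and 18 in the pooled-variance formula, and both standard deviations are set to sqrt(0.0365) so that the pooled variance matches the slide's 0.0365. The function name `cohens_d` is illustrative.

```python
import math


def cohens_d(mean1, mean2, sd1, sd2, n1, n2):
    """Cohen's d using the pooled standard deviation of two groups."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


# ASSUMPTION: equal SDs chosen so the pooled variance equals 0.0365;
# the individual SDs are not given in the text.
sd = math.sqrt(0.0365)
d = cohens_d(86.05, 86.04, sd, sd, n1=10, n2=10)
print(round(d, 3))
```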
One-Way ANOVA (J48 on three domains)
Table columns: Runs; J48 Acc Adult; J48 Acc Pima; J48 Acc Credit
Results: high F and very low p, so the groups are significantly different (see plot).
ANOVA table columns: Source of Variability; Sum Squares; Degrees of Freedom; Mean Squares; F Statistic = MS_G/MS_E; Prob. > F (p-value)
ANOVA table rows: Groups (p-value on the order of E-14); Error; Total
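The quantities in the one-way ANOVA table can be computed as in the sketch below. The accuracy values are hypothetical stand-ins for the three domains (the slide's data is not available), and `one_way_anova` is a name invented for this example. It returns the sums of squares, degrees of freedom, and F = MS_G/MS_E; the p-value is left out because it needs the F distribution.

```python
def one_way_anova(groups):
    """Return sums of squares, degrees of freedom, and F for the groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]

    # Between-group and within-group (error) sums of squares.
    ss_groups = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_error = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_groups = len(groups) - 1
    df_error = len(all_values) - len(groups)
    ms_groups = ss_groups / df_groups
    ms_error = ss_error / df_error
    return {
        "ss_groups": ss_groups, "ss_error": ss_error,
        "df_groups": df_groups, "df_error": df_error,
        "F": ms_groups / ms_error,
    }


# Hypothetical accuracies, NOT the slide's J48 results.
adult = [85, 86, 87]
pima = [73, 74, 75]
credit = [84, 85, 86]
table = one_way_anova([adult, pima, credit])
print(table)
```

If SciPy is available, `scipy.stats.f_oneway` returns both the F statistic and the p-value for the same input.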
One-Way ANOVA (J48 on three domains)
Two-Way ANOVA (J48 & N.B. on 3 domains)
Table columns: Classifier; Runs; Acc Adult; Acc Pima; Acc Credit. Classifier rows: J48 (A); NB (B)
p-values are low: Columns (H0A) and Interactions (H0AB) are significantly different, while Rows (H0B) are the least different.
ANOVA table columns: Source of Variability; Sum Squares; Degrees of Freedom; Mean Squares; F Statistic = MS_G/MS_E; Prob. > F (p-value)
ANOVA table rows: Columns H0A (p-value on the order of E-10); Rows H0B; Interactions H0AB (p-value on the order of E-05); Error; Total
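A minimal sketch of the two-way decomposition into Columns, Rows, Interactions, and Error follows. It assumes a balanced design with the same number of runs in every classifier/domain cell. The accuracy values are hypothetical, and `two_way_anova` is a name invented for this example. Rows are classifiers and columns are domains, as in the table above.

```python
def mean(xs):
    return sum(xs) / len(xs)


def two_way_anova(cells):
    """cells[i][j] holds the replicate scores for row i, column j (balanced)."""
    a, b, r = len(cells), len(cells[0]), len(cells[0][0])
    grand = mean([x for row in cells for cell in row for x in cell])
    row_means = [mean([x for cell in row for x in cell]) for row in cells]
    col_means = [mean([x for row in cells for x in row[j]]) for j in range(b)]
    cell_means = [[mean(cell) for cell in row] for row in cells]

    ss = {
        "rows": b * r * sum((m - grand) ** 2 for m in row_means),
        "columns": a * r * sum((m - grand) ** 2 for m in col_means),
        "interaction": r * sum(
            (cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
            for i in range(a) for j in range(b)),
        "error": sum(
            (x - cell_means[i][j]) ** 2
            for i in range(a) for j in range(b) for x in cells[i][j]),
    }
    df = {"rows": a - 1, "columns": b - 1,
          "interaction": (a - 1) * (b - 1), "error": a * b * (r - 1)}
    ms_error = ss["error"] / df["error"]
    # Each effect's F statistic is its mean square over the error mean square.
    f = {k: (ss[k] / df[k]) / ms_error for k in ("rows", "columns", "interaction")}
    return ss, df, f


# Hypothetical accuracies, NOT the slide's data: [Adult, Pima, Credit] per row.
j48 = [[85, 87], [73, 75], [84, 86]]
nb = [[83, 85], [75, 77], [78, 80]]
ss, df, f = two_way_anova([j48, nb])
print(ss, df, f)
```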