Three Papers: AUC, PFA and Bioinformatics. The three papers are posted online.


1 Three Papers: AUC, PFA and Bioinformatics. The three papers are posted online.

2 Learning Algorithms for Better Ranking
Jin Huang, Charles X. Ling: Using AUC and Accuracy in Evaluating Learning Algorithms. IEEE Trans. Knowl. Data Eng. 17(3): 299-310 (2005)
Find the citations online (Google Scholar)
Goal: accuracy vs. ranking
Secondary goal: decision trees vs. Bayesian networks in ranking – design algorithms that directly optimize ranking

3 Accuracy: not good enough
Two classifiers, each ranking ten test examples from least to most likely positive; everything to the right of the cutoff line (|) is classified positive:
Classifier 1: – – – – + | – + + + +
Classifier 2: + – – – – | + + + + –
Accuracy of Classifier 1: 4/5. Accuracy of Classifier 2: 4/5. But intuitively, Classifier 1 is better!
Higher ranking: more desirable.

4 Accuracy vs. ranking
Accuracy-based evaluation makes two assumptions: a balanced class distribution and equal costs for misclassification.
Ranking sidesteps these assumptions.
– Problem: training examples are labeled, not ranked. How do we evaluate a ranking?

5 ROC curve (Provost & Fawcett, AAAI’97)

6 How to calculate AUC
Rank the test examples in increasing order of predicted score (left: low rank r_i; right: high rank r_i, which is better for positives).
Let r_i be the rank of the i-th positive example, and S_0 = ∑ r_i.
With n_+ positive and n_- negative examples (Hand & Till, 2001, MLJ):
AUC = (S_0 − n_+(n_+ + 1)/2) / (n_+ × n_-)

7 An example
Classifier 1: – – – – + – + + + +
The ranks of the positive examples are r_i = 5, 7, 8, 9, 10, so S_0 = 5+7+8+9+10 = 39
AUC = (39 – 5×6/2) / 25 = 24/25, the better of the two results (see slide 11)
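A minimal sketch of this computation in Python (names are illustrative, not from the slides): the function applies the rank formula from slide 6 to the two rankings from slide 3, encoded as 0/1 labels in increasing rank order.

```python
def auc_from_ranking(labels):
    """AUC via the Hand & Till rank formula: labels is a list of
    0 (negative) / 1 (positive) in increasing rank order."""
    pos_ranks = [i + 1 for i, y in enumerate(labels) if y == 1]
    n_pos = len(pos_ranks)
    n_neg = len(labels) - n_pos
    s0 = sum(pos_ranks)                      # sum of positive ranks
    return (s0 - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

classifier1 = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
classifier2 = [1, 0, 0, 0, 0, 1, 1, 1, 1, 0]
print(auc_from_ranking(classifier1))  # 0.96 = 24/25, as on this slide
print(auc_from_ranking(classifier2))  # 0.64 = 16/25, as on slide 11
```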

8 ROC curve and AUC
If curve A dominates curve D, then A is better than D. Often two curves A and B do not dominate each other.
AUC (the area under the ROC curve) summarizes overall performance, so it can be used to evaluate ranking.
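When two ROC curves cross, the AUC still gives a single overall number to compare. A minimal sketch with hypothetical (fpr, tpr) points for two crossing curves, using the trapezoidal rule for the area (the data points are assumptions for illustration):

```python
import numpy as np

def roc_auc(fpr, tpr):
    """Area under an ROC curve given its (fpr, tpr) points in
    increasing fpr order, via the trapezoidal rule."""
    fpr, tpr = np.asarray(fpr), np.asarray(tpr)
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

# Hypothetical points for two curves that cross, so neither dominates.
auc_a = roc_auc([0.0, 0.1, 0.3, 1.0], [0.0, 0.6, 0.8, 1.0])
auc_b = roc_auc([0.0, 0.2, 0.4, 1.0], [0.0, 0.4, 0.9, 1.0])
print(auc_a, auc_b)  # 0.80 vs 0.74: one overall number per curve
```

With real classifiers, the (fpr, tpr) points would come from sweeping the decision threshold over the ranked test examples.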

9 ROC curve and AUC
Traditional learning algorithms produce poor probability estimates as a by-product.
– Decision tree algorithms
– Strategies to improve them
How about Bayesian network learning algorithms?

10 Evaluation of Classifiers
– Classification accuracy or error rate
– ROC curve and AUC

11 AUC
Two classifiers:
Classifier 1: – – – – + – + + + +
Classifier 2: + – – – – + + + + –
The AUC of Classifier 1: 24/25. The AUC of Classifier 2: 16/25. Classifier 1 is better than Classifier 2!

12 AUC is more discriminating
For N examples there are (N+1) different possible accuracy values, but N(N+1)/2 different possible AUC values.
AUC is a better, more discriminating evaluation measure than accuracy.
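The exact counts depend on the class balance and on how the accuracy cutoff is chosen; as a small illustration of the discriminating-power point (not the slide's exact counts), the following enumerates every ranking of two positives and two negatives, assuming a fixed midpoint cutoff for accuracy:

```python
from itertools import permutations

labels = (1, 1, 0, 0)                    # two positives, two negatives
aucs, accs = set(), set()
for order in set(permutations(labels)):  # every possible ranking
    n = len(order)
    pos_ranks = [i + 1 for i, y in enumerate(order) if y == 1]
    n_pos, n_neg = len(pos_ranks), n - len(pos_ranks)
    s0 = sum(pos_ranks)
    aucs.add((s0 - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))
    # Accuracy with a midpoint cutoff: right half predicted positive.
    mid = n // 2
    accs.add(sum((i >= mid) == (y == 1) for i, y in enumerate(order)) / n)

print(sorted(accs))  # 3 distinct accuracy values: 0.0, 0.5, 1.0
print(sorted(aucs))  # 5 distinct AUC values: 0.0, 0.25, 0.5, 0.75, 1.0
```

Across the six possible rankings, accuracy takes only three distinct values while AUC takes five, so AUC separates rankings that accuracy ties.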

13 Naïve Bayes vs. C4.4
Overall, Naïve Bayes outperforms C4.4 in AUC (Ling & Zhang, submitted, 2002).

14 PCA in Face Recognition

15 Problem with PCA
The features are principal components.
– Thus they do not correspond directly to the original features.
– Problem for face recognition: we wish to pick a subset of the original features rather than composed ones.
Principal Feature Analysis (PFA): pick the best uncorrelated subset of features of a data set.
– Equivalent to finding the best q dimensions of a random variable X = [x_1, x_2, …, x_n]^T.

16 How to find the q features?
Consider the rows [q_1, q_2, q_3, …, q_n]^T: the i-th row, q_i, is the q-dimensional representation of the i-th original feature in the principal subspace.

17 The subspace

18 Algorithm
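The algorithm slide itself is an image in the original deck. Below is a minimal sketch of the PFA procedure as it is commonly described, consistent with slides 15-16: project each original feature into the q-dimensional principal subspace, cluster those feature rows, and keep one representative feature per cluster. The function name, the use of exactly q clusters, and the nearest-to-centre selection rule are assumptions, not necessarily the slide's exact algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

def principal_feature_analysis(X, q):
    """PFA sketch: select q original features (columns of X) whose
    rows in the principal subspace best represent the data."""
    # Top-q eigenvectors of the covariance matrix (principal subspace).
    Xc = X - X.mean(axis=0)
    _, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))  # ascending
    A_q = eigvecs[:, -q:]                  # shape: n_features x q
    # Row i of A_q is feature i projected into the q-dim subspace.
    # Cluster the rows; features in the same cluster are highly
    # correlated, so keep one representative per cluster: the
    # feature whose row lies closest to the cluster centre.
    km = KMeans(n_clusters=q, n_init=10).fit(A_q)
    selected = []
    for c in range(q):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(A_q[members] - km.cluster_centers_[c], axis=1)
        selected.append(int(members[np.argmin(dists)]))
    return sorted(selected)

# Example use: pick 3 of 10 features from random data.
X = np.random.randn(200, 10)
print(principal_feature_analysis(X, 3))
```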

19 Result

20 When PCA does not work

21 PCA + Clustering = Bad Idea

22 More…

23 Rand Index for Clusters (Partitions)
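The Rand index slide is an image in the original deck; for reference, a minimal sketch of the standard definition: the fraction of item pairs on which two partitions agree (grouped together in both, or separated in both). The function name and example labelings are illustrative.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Rand index of two partitions over the same items: the
    fraction of item pairs the partitions agree on (together in
    both, or apart in both)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)

# Identical partitions up to relabelling score 1.0.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
print(rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.333...
```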

24 Results

