1
Three Papers: AUC, PFA, and Bioinformatics. The three papers are posted online.
2
Learning Algorithms for Better Ranking. Jin Huang, Charles X. Ling: Using AUC and Accuracy in Evaluating Learning Algorithms. IEEE Trans. Knowl. Data Eng. 17(3): 299-310 (2005). Find the citations online (Google Scholar). Goal: accuracy vs. ranking. Secondary goal: decision trees vs. Bayesian networks in ranking; design algorithms that directly optimize ranking.
3
Accuracy: not good enough. Two classifiers rank ten test examples (a higher rank, i.e., further right, is more desirable; the cutoff line splits the ranking in the middle):
Classifier 1: – – – – + | – + + + +
Classifier 2: + – – – – | + + + + –
Accuracy of Classifier 1: 4/5. Accuracy of Classifier 2: 4/5. But intuitively, Classifier 1 is better!
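A minimal sketch (assuming, as the slide's cutoff line suggests, that the left half is predicted negative and the right half positive) reproducing the two accuracies:

```python
# Rankings from the slide, left (lowest rank) to right (highest rank).
# 0 = negative example, 1 = positive example.
clf1 = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
clf2 = [1, 0, 0, 0, 0, 1, 1, 1, 1, 0]

def accuracy(ranking, cutoff=5):
    """Everything left of the cutoff is predicted negative,
    everything right of it is predicted positive."""
    correct = sum(1 for y in ranking[:cutoff] if y == 0)
    correct += sum(1 for y in ranking[cutoff:] if y == 1)
    return correct / len(ranking)

print(accuracy(clf1), accuracy(clf2))  # both 0.8 (= 4/5)
```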
4
Accuracy vs. ranking. Accuracy-based evaluation makes two assumptions: a balanced class distribution and equal misclassification costs. Ranking sets these assumptions aside. Problem: training examples are labeled, not ranked, so how do we evaluate a ranking?
5
ROC curve (Provost & Fawcett, AAAI’97)
6
How to calculate AUC. Rank the test examples in increasing order and let $r_i$ be the rank of the $i$-th positive example (left means a low $r_i$; right means a high $r_i$, which is better). Let $S_0 = \sum_i r_i$. Then, with $n_+$ positive and $n_-$ negative examples (Hand & Till, 2001, MLJ):

$$\mathrm{AUC} = \frac{S_0 - n_+(n_+ + 1)/2}{n_+ \, n_-}$$
7
An example. Classifier 1: – – – – + – + + + +. The positive examples sit at ranks $r_i = 5, 7, 8, 9, 10$, so $S_0 = 5+7+8+9+10 = 39$ and $\mathrm{AUC} = (39 - 5 \times 6/2)/25 = 24/25$. A better result.
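A minimal sketch of this rank computation (the function name is illustrative):

```python
def auc_from_ranking(ranking):
    """AUC via the Hand & Till rank formula.
    `ranking` lists labels from lowest rank to highest (0 = neg, 1 = pos)."""
    ranks = [i + 1 for i, y in enumerate(ranking) if y == 1]  # 1-based ranks of positives
    n_pos = len(ranks)
    n_neg = len(ranking) - n_pos
    s0 = sum(ranks)
    return (s0 - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

clf1 = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
print(auc_from_ranking(clf1))  # 0.96 = 24/25, matching the slide
```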
8
ROC curve and AUC. If curve A dominates curve D, then A is better than D; but often two curves A and B do not dominate each other. The AUC (area under the ROC curve) summarizes overall performance, which makes it suitable for evaluating ranking.
9
ROC curve and AUC. Traditional learning algorithms produce poor probability estimates as a by-product; decision tree algorithms are one example, and there are strategies to improve them. How about Bayesian network learning algorithms?
10
Evaluation of Classifiers Classification accuracy or error rate. ROC curve and AUC.
11
AUC. Two classifiers:
Classifier 1: – – – – + – + + + +
Classifier 2: + – – – – + + + + –
The AUC of Classifier 1: 24/25. The AUC of Classifier 2: 16/25. Classifier 1 is better than Classifier 2!
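As a cross-check, a library implementation gives the same numbers; this sketch assumes scikit-learn is available and uses each example's rank position as its score:

```python
from sklearn.metrics import roc_auc_score

clf1 = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]  # labels ordered by increasing rank
clf2 = [1, 0, 0, 0, 0, 1, 1, 1, 1, 0]
scores = list(range(1, 11))             # rank position doubles as the score

print(roc_auc_score(clf1, scores))  # 0.96 = 24/25
print(roc_auc_score(clf2, scores))  # 0.64 = 16/25
```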
12
AUC is more discriminating. For $N$ examples there are only $N+1$ possible accuracy values but $N(N+1)/2$ different AUC values, so AUC is a better and more discriminating evaluation measure than accuracy.
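To see the claim concretely, one can enumerate every ranking of a tiny test set and count the distinct values each measure takes (a sketch assuming 3 positives, 3 negatives, and a mid-point cutoff):

```python
from itertools import permutations

def auc_from_ranking(ranking):
    ranks = [i + 1 for i, y in enumerate(ranking) if y == 1]
    n_pos, n_neg = len(ranks), len(ranking) - len(ranks)
    return (sum(ranks) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def accuracy(ranking, cutoff=3):
    return (sum(y == 0 for y in ranking[:cutoff]) +
            sum(y == 1 for y in ranking[cutoff:])) / len(ranking)

rankings = set(permutations([0, 0, 0, 1, 1, 1]))  # all orderings of 3 pos, 3 neg
print(len({accuracy(r) for r in rankings}))           # 4 distinct accuracy values
print(len({auc_from_ranking(r) for r in rankings}))   # 10 distinct AUC values
```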
13
Naïve Bayes vs. C4.4. Overall, Naïve Bayes outperforms C4.4 in AUC (Ling & Zhang, submitted, 2002).
14
PCA in Face Recognition
15
Problem with PCA. The features are principal components, so they do not correspond directly to the original features. This is a problem for face recognition, where we wish to pick a subset of the original features rather than composite ones. Principal Feature Analysis (PFA): pick the best uncorrelated subset of features of a data set, which is equivalent to finding $q$ dimensions of a random variable $X = [x_1, x_2, \ldots, x_n]^T$.
16
How to find the $q$ features? Form the eigenvector matrix $[q_1, q_2, q_3, \ldots, q_n]$ and keep its first $q$ columns; the $i$-th row of this matrix corresponds to the $i$-th original feature.
17
The subspace
18
Algorithm
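A rough sketch of the PFA selection procedure as commonly described: cluster the rows of the truncated eigenvector matrix with k-means and keep, for each cluster, the original feature whose row lies closest to the center. The function names and the KMeans choice are assumptions, not taken from the slide:

```python
import numpy as np
from sklearn.cluster import KMeans

def principal_feature_analysis(X, q):
    """Pick q of the original features (columns of X) whose rows in the
    truncated eigenvector matrix best represent each feature cluster.
    X: (samples, n_features) data matrix."""
    Xc = X - X.mean(axis=0)                     # center the data
    cov = np.cov(Xc, rowvar=False)              # n x n covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    A_q = eigvecs[:, ::-1][:, :q]               # rows = features, cols = top-q PCs
    km = KMeans(n_clusters=q, n_init=10).fit(A_q)
    selected = []
    for center in km.cluster_centers_:          # feature whose row is closest
        selected.append(int(np.argmin(np.linalg.norm(A_q - center, axis=1))))
    return sorted(set(selected))                # dedupe in case clusters collide

# Usage: indices of (up to) q representative original features
# X = np.random.randn(200, 12); print(principal_feature_analysis(X, 4))
```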
19
Result
20
When PCA does not work
21
PCA + Clustering = Bad Idea
22
More…
23
Rand Index for Clusters (Partitions)
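The Rand index compares two partitions by counting the point pairs on which they agree: if $a$ is the number of pairs placed in the same cluster by both partitions and $b$ the number placed in different clusters by both, then $RI = (a+b)/\binom{n}{2}$. A minimal sketch:

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two partitions agree."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: same partition, relabeled
print(rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # ~0.33: partitions disagree
```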
24
Results