1
Three Papers: AUC, PFA, and Bioinformatics. The three papers are posted online.
2
Learning Algorithms for Better Ranking. Jin Huang, Charles X. Ling: Using AUC and Accuracy in Evaluating Learning Algorithms. IEEE Trans. Knowl. Data Eng. 17(3): 299-310 (2005). Find the citations online (Google Scholar). Goal: accuracy vs. ranking. Secondary goal: decision trees vs. Bayesian networks in ranking; design algorithms that directly optimize ranking.
3
Accuracy: not good enough. Two classifiers rank ten test examples (a higher rank, i.e., further right, is more desirable; the cutoff line splits the ranking in the middle):
Classifier 1: – – – – + | – + + + +
Classifier 2: + – – – – | + + + + –
Accuracy of Classifier 1: 4/5. Accuracy of Classifier 2: 4/5. But intuitively, Classifier 1 is better!
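A minimal sketch (assuming, as the slide's cutoff line suggests, that the left half is predicted negative and the right half positive) reproducing the two accuracies:

```python
# Rankings from the slide, left (lowest rank) to right (highest rank).
# 0 = negative example, 1 = positive example.
clf1 = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
clf2 = [1, 0, 0, 0, 0, 1, 1, 1, 1, 0]

def accuracy(ranking, cutoff=5):
    """Everything left of the cutoff is predicted negative,
    everything right of it is predicted positive."""
    correct = sum(1 for y in ranking[:cutoff] if y == 0)
    correct += sum(1 for y in ranking[cutoff:] if y == 1)
    return correct / len(ranking)

print(accuracy(clf1), accuracy(clf2))  # both 0.8 (= 4/5)
```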
4
Accuracy vs. ranking. Accuracy-based evaluation makes two assumptions: a balanced class distribution and equal misclassification costs. Ranking sets these assumptions aside. Problem: training examples are labeled, not ranked, so how do we evaluate a ranking?
5
ROC curve (Provost & Fawcett, AAAI’97)
6
How to calculate AUC. Rank the test examples in increasing order and let $r_i$ be the rank of the $i$-th positive example (left means a low $r_i$; right means a high $r_i$, which is better). Let $S_0 = \sum_i r_i$. Then, with $n_+$ positive and $n_-$ negative examples (Hand & Till, 2001, MLJ):

$$\mathrm{AUC} = \frac{S_0 - n_+(n_+ + 1)/2}{n_+ \, n_-}$$
7
An example. Classifier 1: – – – – + – + + + +. The positive examples sit at ranks $r_i = 5, 7, 8, 9, 10$, so $S_0 = 5+7+8+9+10 = 39$ and $\mathrm{AUC} = (39 - 5 \times 6/2)/25 = 24/25$. A better result.
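A minimal sketch of this rank computation (the function name is illustrative):

```python
def auc_from_ranking(ranking):
    """AUC via the Hand & Till rank formula.
    `ranking` lists labels from lowest rank to highest (0 = neg, 1 = pos)."""
    ranks = [i + 1 for i, y in enumerate(ranking) if y == 1]  # 1-based ranks of positives
    n_pos = len(ranks)
    n_neg = len(ranking) - n_pos
    s0 = sum(ranks)
    return (s0 - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

clf1 = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
print(auc_from_ranking(clf1))  # 0.96 = 24/25, matching the slide
```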
8
ROC curve and AUC. If curve A dominates curve D, then A is better than D; but often two curves A and B do not dominate each other. The AUC (area under the ROC curve) summarizes overall performance, which makes it suitable for evaluating ranking.
9
ROC curve and AUC. Traditional learning algorithms produce poor probability estimates as a by-product; decision tree algorithms are one example, and there are strategies to improve them. How about Bayesian network learning algorithms?
10
Evaluation of Classifiers Classification accuracy or error rate. ROC curve and AUC.
11
AUC. Two classifiers:
Classifier 1: – – – – + – + + + +
Classifier 2: + – – – – + + + + –
The AUC of Classifier 1: 24/25. The AUC of Classifier 2: 16/25. Classifier 1 is better than Classifier 2!
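As a cross-check, a library implementation gives the same numbers; this sketch assumes scikit-learn is available and uses each example's rank position as its score:

```python
from sklearn.metrics import roc_auc_score

clf1 = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]  # labels ordered by increasing rank
clf2 = [1, 0, 0, 0, 0, 1, 1, 1, 1, 0]
scores = list(range(1, 11))             # rank position doubles as the score

print(roc_auc_score(clf1, scores))  # 0.96 = 24/25
print(roc_auc_score(clf2, scores))  # 0.64 = 16/25
```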
12
AUC is more discriminating. For $N$ examples there are only $N+1$ possible accuracy values but $N(N+1)/2$ different AUC values, so AUC is a better and more discriminating evaluation measure than accuracy.
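To see the claim concretely, one can enumerate every ranking of a tiny test set and count the distinct values each measure takes (a sketch assuming 3 positives, 3 negatives, and a mid-point cutoff):

```python
from itertools import permutations

def auc_from_ranking(ranking):
    ranks = [i + 1 for i, y in enumerate(ranking) if y == 1]
    n_pos, n_neg = len(ranks), len(ranking) - len(ranks)
    return (sum(ranks) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def accuracy(ranking, cutoff=3):
    return (sum(y == 0 for y in ranking[:cutoff]) +
            sum(y == 1 for y in ranking[cutoff:])) / len(ranking)

rankings = set(permutations([0, 0, 0, 1, 1, 1]))  # all orderings of 3 pos, 3 neg
print(len({accuracy(r) for r in rankings}))           # 4 distinct accuracy values
print(len({auc_from_ranking(r) for r in rankings}))   # 10 distinct AUC values
```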
13
Naïve Bayes vs. C4.4. Overall, Naïve Bayes outperforms C4.4 in AUC (Ling & Zhang, submitted, 2002).
14
PCA in Face Recognition
15
Problem with PCA. The features are principal components, so they do not correspond directly to the original features. This is a problem for face recognition, where we wish to pick a subset of the original features rather than composite ones. Principal Feature Analysis (PFA): pick the best uncorrelated subset of features of a data set, which is equivalent to finding $q$ dimensions of a random variable $X = [x_1, x_2, \ldots, x_n]^T$.
16
How to find the $q$ features? Form the eigenvector matrix $[q_1, q_2, q_3, \ldots, q_n]$ and keep its first $q$ columns; the $i$-th row of this matrix corresponds to the $i$-th original feature.
17
The subspace
18
Algorithm
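A rough sketch of the PFA selection procedure as commonly described: cluster the rows of the truncated eigenvector matrix with k-means and keep, for each cluster, the original feature whose row lies closest to the center. The function names and the KMeans choice are assumptions, not taken from the slide:

```python
import numpy as np
from sklearn.cluster import KMeans

def principal_feature_analysis(X, q):
    """Pick q of the original features (columns of X) whose rows in the
    truncated eigenvector matrix best represent each feature cluster.
    X: (samples, n_features) data matrix."""
    Xc = X - X.mean(axis=0)                     # center the data
    cov = np.cov(Xc, rowvar=False)              # n x n covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    A_q = eigvecs[:, ::-1][:, :q]               # rows = features, cols = top-q PCs
    km = KMeans(n_clusters=q, n_init=10).fit(A_q)
    selected = []
    for center in km.cluster_centers_:          # feature whose row is closest
        selected.append(int(np.argmin(np.linalg.norm(A_q - center, axis=1))))
    return sorted(set(selected))                # dedupe in case clusters collide

# Usage: indices of (up to) q representative original features
# X = np.random.randn(200, 12); print(principal_feature_analysis(X, 4))
```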
19
Result
20
When PCA does not work
21
PCA + Clustering = Bad Idea
22
More…
23
Rand Index for Clusters (Partitions)
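The Rand index compares two partitions by counting the point pairs on which they agree: if $a$ is the number of pairs placed in the same cluster by both partitions and $b$ the number placed in different clusters by both, then $RI = (a+b)/\binom{n}{2}$. A minimal sketch:

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two partitions agree."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0: same partition, relabeled
print(rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # ~0.33: partitions disagree
```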
24
Results