Estimating the Predictive Accuracy of a Classifier
Hilan Bensusan and Alexandros Kalousis
ECML
Why do we need to estimate classifier performance?
- To perform model selection without a previously established pool of classifiers.
- To make meta-learning more automatic and less dependent on human experts.
- To gain insight into the areas of expertise of different classifiers.
Meta-learning
Meta-learning is the endeavour to learn something about the expected performance of a classifier from its previous applications. It depends heavily on the way datasets are characterised. So far it has concentrated on predicting whether a classifier is suitable for a dataset and on selecting a classifier from a pool.
Regression to predict performance
In this paper we examine an approach that estimates performance directly, through regression. The work is related to zooming for ranking, but zooming gains no knowledge about the classifiers themselves.
Previous work
- João Gama and Pavel Brazdil, in work related to StatLog (a single dataset characterisation only; poor results, reported in NMSE).
- So Young Sohn (StatLog datasets, with boosting; better results).
- A recent paper by Christian Koepf (good results, but with few classifiers and artificial datasets only).
Our approach
We broaden the research by comparing different dataset characterisation strategies and different regression methods. A meta-dataset for each classifier is composed of a set of dataset characterisation attributes together with the performance of that classifier on each dataset (see the sketch below).
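A minimal sketch of the meta-dataset construction, assuming hypothetical helpers characterise() and accuracy_of() (neither is named in the slides):

    import numpy as np

    # Hypothetical sketch: one meta-dataset per classifier. Each row describes
    # one base dataset; the target is that classifier's accuracy on it.
    def build_metadataset(datasets, classifier, characterise, accuracy_of):
        X_meta = np.array([characterise(d) for d in datasets])              # dct, histo or land features
        y_meta = np.array([accuracy_of(classifier, d) for d in datasets])   # mean 10-fold CV accuracy
        return X_meta, y_meta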
We concentrate on 8 classifiers: two decision tree learners (C5.0tree and Ltree), Naive Bayes, linear discriminant, two rule-based methods (C5.0rules and ripper), nearest neighbour, and a combination method (C5.0boost).
Strategies of dataset characterisation:
- dct: a set of information-theoretic and statistical features of the datasets, developed after StatLog.
- histo: a finer-grained development of the StatLog characteristics, where histograms describe the distributions of features computed for each attribute of a dataset (see the sketch after this list).
- land: landmarking (next slide).
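The histo idea, sketched under assumptions (the slides do not give the per-attribute statistics used or the bin count):

    import numpy as np

    # Assumed sketch: summarise a per-attribute statistic (e.g. each attribute's
    # correlation with the class) by a fixed-bin histogram across attributes,
    # rather than collapsing it to a single aggregate as the StatLog features do.
    def histo_feature(per_attribute_values, bins=10, value_range=(0.0, 1.0)):
        counts, _ = np.histogram(per_attribute_values, bins=bins, range=value_range)
        return counts / counts.sum()   # normalised bin frequencies as meta-features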
Landmarking
A characterisation technique in which the performance of simple, bare-bones learners on a dataset is used to characterise the dataset itself. In this paper we use seven landmarkers: decision node, worst node, randomly chosen node, Naive Bayes, 1-nearest neighbour, elite 1-nearest neighbour and linear discriminant (see the sketch below).
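A sketch of the landmark computation for the four landmarkers with stock scikit-learn counterparts (worst node, randomly chosen node and elite 1-NN would need custom code and are omitted); scikit-learn itself is an assumption here:

    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.tree import DecisionTreeClassifier

    # Each landmark value is simply the CV accuracy of a bare-bones learner
    # on the dataset being characterised.
    def landmark_features(X, y, cv=10):
        landmarkers = {
            "decision_node": DecisionTreeClassifier(max_depth=1),  # best single split
            "naive_bayes": GaussianNB(),
            "one_nn": KNeighborsClassifier(n_neighbors=1),
            "lin_disc": LinearDiscriminantAnalysis(),
        }
        return {name: cross_val_score(clf, X, y, cv=cv).mean()
                for name, clf in landmarkers.items()}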
Regression on accuracies
The quality of an estimate depends on its closeness to the accuracy actually achieved by the classifier, measured by the Mean Absolute Deviation (MAD) under 10-fold cross-validation. MAD is the sum of the absolute differences between real and predicted values, divided by the number of test items. dMAD, the MAD obtained by always predicting the mean of the target values, is used as a reference.
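The two quantities in code form, as a direct transcription of the definitions above:

    import numpy as np

    # MAD: mean absolute difference between real and predicted values.
    def mad(y_true, y_pred):
        return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

    # dMAD: MAD of the default rule that always predicts the mean of the targets.
    def dmad(y_true):
        y_true = np.asarray(y_true)
        return mad(y_true, np.full(y_true.shape, y_true.mean()))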
Regression methods and datasets
We used a kernel method and Cubist for regression. 65 datasets from the UCI repository and the METAL project were used. Classifier performance is the mean accuracy over the 10 cross-validation folds.
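A stand-in sketch of the estimation experiment: the slides do not name the specific kernel method, so scikit-learn's KernelRidge with an RBF kernel is used purely as an illustration, on placeholder data, with mad and dmad from the previous sketch:

    import numpy as np
    from sklearn.kernel_ridge import KernelRidge
    from sklearn.model_selection import cross_val_predict

    rng = np.random.default_rng(0)
    X_meta = rng.random((65, 10))   # placeholder meta-features for 65 datasets
    y_meta = rng.random(65)         # placeholder accuracies of one classifier

    reg = KernelRidge(kernel="rbf")                        # stand-in kernel method
    y_hat = cross_val_predict(reg, X_meta, y_meta, cv=10)  # 10-fold CV estimates
    print(mad(y_meta, y_hat), dmad(y_meta))                # compare against the default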
Estimating with kernel
MAD of the kernel estimates for each classifier (C5.0boost, C5.0rules, C5.0tree, lindisc, ltree, Near.Nei, NaiBayes, ripper) under the dct, histo and land characterisations, against the dMAD reference. [table of results]
Estimating with Cubist
The same comparison for Cubist: MAD per classifier under dct, histo and land, against dMAD. [table of results]
Using estimates to rank
Rankings are compared for similarity using Spearman's rank correlation. Zooming cannot be applied to land, since we should not use a classifier to rank itself (we therefore use land-, which drops the offending landmarkers). We compare the ranking estimates with the true ranking. The default ranking is computed over all datasets (from best to worst: C5.0boost, C5.0rules, C5.0tree, Ltree, ripper, nearest neighbour, Naive Bayes, linear discriminant).
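Computing the similarity between an estimated and the true ranking, with scipy as an assumed tool and placeholder accuracies for the eight classifiers:

    from scipy.stats import spearmanr

    # Placeholder accuracies of the 8 classifiers on one test dataset.
    true_acc = [0.91, 0.88, 0.87, 0.72, 0.86, 0.80, 0.75, 0.84]
    est_acc  = [0.89, 0.87, 0.85, 0.70, 0.88, 0.78, 0.74, 0.83]

    # spearmanr ranks both lists internally, so the rank correlation of the
    # accuracy estimates is exactly the ranking similarity used here.
    rho, _ = spearmanr(true_acc, est_acc)
    print(rho)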
Average Spearman's correlation coefficients with the true ranking
Rows: the default ranking and the dct, histo and land characterisations (land- in the zooming case); columns: kernel regression, Cubist and zooming. [table of results]
Gaining insight about classifiers
The regression rules themselves tell us something about a classifier. Example of a land rule (34 cases, mean error 0.218): IF Rand_Node <= 0.57 AND Elite_Node > … THEN mlcnb = … (a linear combination of Rand_Node, Worst_Node and Elite_Node).
Conclusions
- Regression can be used to estimate classifier performance.
- Meta-learning needs good dataset characterisation.
- Landmarking is the best dataset characterisation strategy for performance estimation, but not the best one for ranking.
- Future work includes further exploration of dataset characterisation strategies and of the results of combining them, as well as explaining the still-odd result of landmarking in ranking.