Model Selection and Assessment Using Cross-indexing
Juha Reunanen
ABB, Web Imaging Systems, Finland
Model Selection Using Cross-Validation
Choose a search algorithm – for example: hill-climbing, grid search, genetic algorithm
Evaluate the candidate models using cross-validation
Select the model that gives the best CV score
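A minimal sketch of this selection scheme (assuming scikit-learn, a synthetic dataset, and a small hand-picked candidate set, none of which appear in the slides):

```python
# Minimal sketch: pick the candidate model with the best cross-validation score.
# scikit-learn and the synthetic data are assumptions for illustration only.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

candidates = {
    "logreg_C=1": LogisticRegression(C=1.0, max_iter=1000),
    "logreg_C=0.01": LogisticRegression(C=0.01, max_iter=1000),
    "rbf_svc": SVC(kernel="rbf", C=1.0),
}

# Evaluate each candidate with 5-fold CV and keep the best mean score.
cv_scores = {name: cross_val_score(model, X, y, cv=5).mean()
             for name, model in candidates.items()}
best_name = max(cv_scores, key=cv_scores.get)
print(best_name, cv_scores[best_name])
```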
Multiple-Comparison Procedure (D. D. Jensen and P. R. Cohen: Multiple Comparisons in Induction Algorithms, Machine Learning, volume 38, pages 309–338, 2000)
Example: Choosing an investment advisor
Criterion: Predict the stock market change (+/–) correctly on at least 11 out of 14 days
You evaluate 10 candidates; your friend evaluates 30 candidates
If everyone is just guessing, the probability that you accept someone is 0.253, while your friend's is 0.583
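The probabilities on this slide follow from the binomial tail; a short check in plain Python (an illustration, not part of the original slides) reproduces them:

```python
# A guessing advisor is right on a given day with probability 1/2;
# acceptance requires at least 11 correct predictions out of 14 days.
from math import comb

# P(a single guesser reaches >= 11 correct out of 14)
p_single = sum(comb(14, k) for k in range(11, 15)) / 2**14   # ~0.0287

# P(at least one of n independent guessers gets accepted)
for n in (10, 30):
    print(n, 1 - (1 - p_single) ** n)   # ~0.253 and ~0.583
```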
The Problem
Overfitting on the first level of inference: increasing model complexity may decrease the training error while the test error goes up
Overfitting on the second level of inference: making the search more intense may decrease the CV error estimate, even though the test error actually goes up
Overfitting Visualized
[Figure: error curves plotted against model complexity, or the number of models evaluated]
Solutions
First level of inference:
Regularization – penalize complex models
Model selection – welcome to the second level...
Second level of inference:
Regularization! (G. C. Cawley and N. L. C. Talbot: Preventing over-fitting during model selection via Bayesian regularisation of the hyper-parameters, Journal of Machine Learning Research, volume 8, pages 841–861, 2007)
Another layer of (cross-)validation...
Another Layer of Validation
There is a lot of variance, so the estimate for the winning model becomes biased (in the MCP sense)
Cross-validation makes the estimate smoother, but does not remove the problem
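One common way to add that extra layer is nested cross-validation; the sketch below uses scikit-learn's grid search merely as a stand-in for the search procedures discussed on the other slides:

```python
# Nested cross-validation sketch: the inner loop selects hyper-parameters,
# the outer loop assesses the selected model on data never used for selection.
# scikit-learn and the synthetic data are assumptions, not part of the slides.
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

inner_search = GridSearchCV(SVC(),
                            param_grid={"C": [0.1, 1, 10],
                                        "gamma": [1e-3, 1e-2, 1e-1]},
                            cv=3)

# Each outer fold runs a full inner search and tests the winner exactly once,
# so the outer estimate is not inflated by the comparisons made inside.
outer_scores = cross_val_score(inner_search, X, y, cv=5)
print(outer_scores.mean())
```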
The Cross-indexing Trick
Assume an outer loop of cross-validation with five folds
Use (for example) three folds to determine the best search depth, and the remaining two to assess it
This essentially removes the multiple-comparison effect
Rotate the roles of the folds and average (or create an ensemble)
Previously shown to work in feature selection (Juha Reunanen: Less Biased Measurement of Feature Selection Benefits, SLSFS 2005, LNCS 3940, pages 198–208, 2006)
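A sketch of the trick, under the assumption that the outer 5-fold CV has already produced a score for every fold at every search depth; the score matrix below is simulated noise around a hump-shaped curve, standing in for real per-fold CV scores:

```python
# Cross-indexing sketch: some folds ("selection") choose the best search depth,
# the other folds ("assessment") estimate performance at that depth; the roles
# are rotated over all 3-vs-2 splits and the estimates are averaged.
from itertools import combinations
import numpy as np

rng = np.random.default_rng(0)
n_folds, n_depths = 5, 40
true_curve = -((np.arange(n_depths) - 15) / 20.0) ** 2       # peaks at depth 15
scores = true_curve + rng.normal(scale=0.05, size=(n_folds, n_depths))

estimates = []
for select in combinations(range(n_folds), 3):               # 3 folds select...
    assess = [f for f in range(n_folds) if f not in select]  # ...2 folds assess
    best_depth = scores[list(select)].mean(axis=0).argmax()
    estimates.append(scores[assess, best_depth].mean())

print("cross-indexing estimate:", np.mean(estimates))
# The naive estimate lets every fold both select and assess, and is therefore
# optimistically biased by the multiple comparisons over depths:
print("naive estimate:", scores.mean(axis=0).max())
```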
Competition Entries
Stochastic search guided by cross-validation
Several candidate models (and corresponding search processes running pseudo-parallel): Prepro+naiveBayes, PCA+kernelRidge, GS+kernelRidge, Prepro+linearSVC, Prepro+nonlinearSVC, Relief+neuralNet, RF, and Boosting (with neuralNet, SVC and kernelRidge)
Final selection and assessment using the cross-indexing criterion
Milestone Results
[Leaderboard: agnostic learning ranks as of December 1st, 2006; yellow rows mark CLOP models]
CLOP prize winner: Juha Reunanen (both ave. rank and ave. BER)
Best ave. BER held by Reference (Gavin Cawley) with "the bad"
Models Selected
Conclusions
Because of multiple-comparison procedures (MCPs) on the different levels of inference, a separate layer of validation is often used to estimate the final performance
On the second level, the cross-indexing trick may give estimates that are less biased (compared to a straightforward outer loop of CV)