1
Model Selection and Assessment Using Cross-indexing
Juha Reunanen
ABB, Web Imaging Systems, Finland
2
Model Selection Using Cross-Validation
- Choose a search algorithm – for example: hill-climbing, grid search, a genetic algorithm
- Evaluate the models using cross-validation
- Select the model that gives the best CV score
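To make these steps concrete, here is a minimal sketch (not from the slides) of cross-validation-based model selection with scikit-learn; the toy dataset and the candidate list are arbitrary assumptions, and the "search" is just an exhaustive loop over a handful of candidates.

```python
# A minimal sketch of CV-based model selection: evaluate each candidate with
# 5-fold cross-validation and keep the one with the best mean CV score.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Toy data and a toy candidate set (assumptions for illustration only).
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "svc_C1": SVC(C=1.0, kernel="rbf"),
    "svc_C10": SVC(C=10.0, kernel="rbf"),
}

cv_scores = {name: cross_val_score(model, X, y, cv=5).mean()
             for name, model in candidates.items()}
best = max(cv_scores, key=cv_scores.get)
print(cv_scores, "-> selected:", best)
```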
3
Multiple-Comparison Procedure
(D. D. Jensen and P. R. Cohen: Multiple Comparisons in Induction Algorithms, Machine Learning, volume 38, pages 309–338, 2000)
Example: choosing an investment advisor
- Criterion: predict the stock market change (+/–) correctly on at least 11 out of 14 days
- You evaluate 10 candidates; your friend evaluates 30 candidates
- If everyone is just guessing, the probability that you accept someone is 0.253, your friend's 0.583
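The quoted probabilities follow from a simple binomial calculation: a guessing advisor passes the 11-of-14 criterion with probability p = P(Binomial(14, 0.5) ≥ 11), and with n independent candidates the chance that at least one passes is 1 − (1 − p)^n. A short Python check (not from the slides):

```python
from math import comb

# Probability that a single guessing advisor gets at least 11 of 14 calls right.
p = sum(comb(14, k) for k in range(11, 15)) / 2 ** 14   # about 0.0287

# Probability that at least one of n guessing candidates passes the criterion.
for n in (10, 30):
    print(n, round(1 - (1 - p) ** n, 3))   # roughly 0.253 and 0.582
```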
4
The Problem
- Overfitting on the first level of inference: increasing model complexity may decrease the training error while the test error goes up
- Overfitting on the second level of inference: making the search more intense may decrease the CV error estimate, even if the test error would actually go up
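A quick Monte-Carlo illustration of the second-level effect (not from the slides): every "model" below is pure random guessing, yet the validation score of the winner climbs as more candidates are evaluated, while its accuracy on fresh test data stays near 0.5. The sample sizes are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
y_val = rng.integers(0, 2, 100)      # labels of a small validation set
y_test = rng.integers(0, 2, 10000)   # labels of a large test set

for n_models in (1, 10, 100, 1000):
    best_val, test_of_best = -1.0, None
    for _ in range(n_models):
        # Each candidate "model" just guesses at random (true accuracy 0.5).
        val_acc = (rng.integers(0, 2, y_val.size) == y_val).mean()
        if val_acc > best_val:
            best_val = val_acc
            test_of_best = (rng.integers(0, 2, y_test.size) == y_test).mean()
    # The winner's validation score grows with n_models; its test score does not.
    print(n_models, best_val, round(test_of_best, 3))
```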
5
Overfitting Visualized
[Figure; x-axis: Model Complexity, or Number of Models Evaluated]
6
Solutions
First level of inference:
- Regularization – penalize complex models
- Model selection – welcome to the second level...
Second level of inference:
- Regularization! (G. C. Cawley and N. L. C. Talbot: Preventing over-fitting during model selection via Bayesian regularisation of the hyper-parameters, Journal of Machine Learning Research, volume 8, pages 841–861, 2007)
- Another layer of (cross-)validation...
7
Another Layer of Validation
- A lot of variance: the estimate related to the winner gets biased (in the MCP sense)
- Cross-validation makes it smoother, but does not remove the problem
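The standard way to add such a layer is nested cross-validation, sketched below with scikit-learn (not the code behind the slides); the dataset, model family and hyper-parameter grid are arbitrary assumptions. The inner loop selects, the outer loop assesses on folds the selection never touched.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Inner loop: model selection over a toy hyper-parameter grid.
inner = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10, 100]}, cv=5)

# Outer loop: assess the selected model on folds it never saw during selection.
outer_scores = cross_val_score(inner, X, y, cv=5)
print(outer_scores.mean())   # nearly unbiased, but can have a lot of variance
```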
8
The Cross-indexing Trick
- Assume an outer loop of cross-validation using five folds
- Use (for example) three folds to determine the best search depth, and the remaining two to assess it
- This essentially removes the multiple-comparison effect
- Revolve, and average (or, create an ensemble)
- Previously shown to work in feature selection (Juha Reunanen: Less Biased Measurement of Feature Selection Benefits, SLSFS 2005, LNCS 3940, pages 198–208, 2006)
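A minimal sketch of the cross-indexing estimate as described on this slide, under two assumptions not stated here: the search results are available as a matrix scores[d, f] (validation score of the model found at search depth d, measured on outer fold f), and "revolve" means cycling through the five rotations of the fold indices.

```python
import numpy as np

def cross_indexing_estimate(scores, n_select=3):
    """scores[d, f]: CV score of the model at search depth d on outer fold f."""
    n_depths, n_folds = scores.shape
    estimates = []
    for r in range(n_folds):                                 # revolve over folds
        sel = [(r + i) % n_folds for i in range(n_select)]   # selection folds
        ass = [f for f in range(n_folds) if f not in sel]    # assessment folds
        best_depth = scores[:, sel].mean(axis=1).argmax()    # pick the depth
        estimates.append(scores[best_depth, ass].mean())     # assess it
    return float(np.mean(estimates))

# Call with made-up scores just to show the expected input shape (hypothetical).
rng = np.random.default_rng(0)
print(cross_indexing_estimate(rng.uniform(0.7, 0.9, size=(20, 5))))
```

Instead of only averaging the assessment scores, the per-rotation winners could also be combined into an ensemble, as the slide suggests.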
9
Competition Entries
- Stochastic search guided by cross-validation
- Several candidate models (and corresponding search processes running pseudo-parallel): Prepro+naiveBayes, PCA+kernelRidge, GS+kernelRidge, Prepro+linearSVC, Prepro+nonlinearSVC, Relief+neuralNet, RF, and Boosting (with neuralNet, SVC and kernelRidge)
- Final selection and assessment using the cross-indexing criterion
10
Milestone Results
[Table: agnostic learning ranks as of December 1st, 2006; yellow marks CLOP models]
- CLOP prize winner: Juha Reunanen (both ave. rank and ave. BER)
- Best ave. BER held by Reference (Gavin Cawley) with “the bad”
11
Models Selected
12
Conclusions
- Because of multiple-comparison procedures (MCPs) on the different levels of inference, validation is often used to estimate the final performance
- On the second level, the cross-indexing trick may give estimates that are less biased (compared to straightforward outer-loop CV)