RESULTS OF THE WCCI 2006 PERFORMANCE PREDICTION CHALLENGE
Isabelle Guyon, Amir Reza Saffari Azar Alamdari, Gideon Dror

Part I INTRODUCTION

Model selection
Selecting models (neural net, decision tree, SVM, …)
Selecting hyperparameters (number of hidden units, weight decay/ridge, kernel parameters, …)
Selecting variables or features (dimensionality reduction)
Selecting patterns (data cleaning, data reduction, e.g. by clustering)
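To make the hyperparameter part concrete, here is a minimal sketch (not challenge code) of picking a hyperparameter value on a held-out validation set; trainFn, predictFn, hyperGrid, and the data matrices are assumed placeholder names:

  % Minimal sketch, assuming generic trainFn/predictFn handles and a
  % candidate grid hyperGrid; Xtrain/ytrain and Xvalid/yvalid are held-out splits.
  bestErr = inf;
  for h = 1:numel(hyperGrid)
    model = trainFn(Xtrain, ytrain, hyperGrid(h));   % fit with the candidate value
    yhat  = predictFn(model, Xvalid);                % predict on the validation set
    err   = mean(yhat ~= yvalid);                    % validation error rate
    if err < bestErr
      bestErr   = err;
      bestHyper = hyperGrid(h);                      % keep the best candidate
    end
  end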

Performance prediction
How good are you at predicting how good you are?
Practically important in pilot studies.
Good performance predictions make model selection trivial: simply pick the model with the best predicted performance.

Why a challenge?
Stimulate research and push the state of the art.
Move towards fair comparisons and give a voice to methods that work but may not be backed up by theory (yet).
Find practical solutions to real problems.
Have fun…

History
USPS/NIST.
Unipen (with Lambert Schomaker): 40 institutions share 5 million handwritten characters.
KDD Cup, TREC, CASP, CAMDA, ICDAR, etc.
NIPS challenge on unlabeled data.
Feature selection challenge (with Steve Gunn): success! ~75 entrants, thousands of entries.
Pascal challenges.
Performance prediction challenge…

Challenge
Date started: Friday September 30, 2005.
Date ended: Monday March 1, 2006.
Duration: 21 weeks.
Estimated number of entrants: 145.
Number of development entries:
Number of ranked participants: 28.
Number of ranked submissions: 117.

Datasets
Dataset   Domain            Type           Features   Training ex.   Validation ex.   Test ex.
ADA       Marketing         Dense
GINA      Digits            Dense
HIVA      Drug discovery    Dense
NOVA      Text classif.     Sparse binary
SYLVA     Ecology           Dense

BER distribution
[Figure: distribution of the test BER over the challenge submissions; x-axis: test BER.]

Results
Overall winners for ranked entries:
Ave rank: Roman Lutz with LB tree mix cut adapted
Ave score: Gavin Cawley with Final #2
ADA: Marc Boullé with SNB(CMA)+10k F(2D) tv or SNB(CMA)+100k F(2D) tv
GINA: Kari Torkkola & Eugene Tuv with ACE+RLSC
HIVA: Gavin Cawley with Final #3 (corrected)
NOVA: Gavin Cawley with Final #1
SYLVA: Marc Boullé with SNB(CMA)+10k F(3D) tv
Best AUC: Radford Neal with Bayesian Neural Networks

Part II PROTOCOL and SCORING

Protocol
Data split: training/validation/test.
Data proportions: 10/1/100.
Online feedback on validation data.
Validation labels released one month before the end of the challenge.
Final ranking on test data using the five last complete submissions of each entrant.
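As an illustration of the 10/1/100 proportions, a minimal sketch (assumed variable names, not the organizers' scripts) of partitioning a pool of n labeled examples:

  % Minimal sketch: split n examples into training/validation/test
  % sets in the 10/1/100 proportions used by the challenge.
  idx = randperm(n);                          % shuffle the example indices
  nTrain = round(n * 10/111);
  nValid = round(n * 1/111);
  trainIdx = idx(1 : nTrain);
  validIdx = idx(nTrain+1 : nTrain+nValid);
  testIdx  = idx(nTrain+nValid+1 : end);      % remaining ~100/111 of the pool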

Performance metrics
Balanced Error Rate (BER): average of the error rates of the positive class and the negative class.
Guess error: ΔBER = abs(testBER − guessedBER).
Area Under the ROC Curve (AUC).
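As a minimal sketch (assumed code, not the organizers' implementation), the BER and the guess error can be computed for labels in {−1, +1} as follows:

  % Minimal sketch: balanced error rate and guess error for labels in {-1,+1}.
  function [ber, deltaBER] = ber_and_guess_error(y, yhat, guessedBER)
    pos = (y == +1);
    neg = (y == -1);
    errPos = mean(yhat(pos) ~= +1);     % error rate on the positive class
    errNeg = mean(yhat(neg) ~= -1);     % error rate on the negative class
    ber = 0.5 * (errPos + errNeg);      % balanced error rate
    deltaBER = abs(ber - guessedBER);   % guess error
  end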

Optimistic guesses
[Figure: per-dataset plots (ADA, GINA, HIVA, NOVA, SYLVA) showing that guessed BERs tended to be optimistic.]

Scoring method
E = testBER + ΔBER × [1 − exp(−ΔBER/σ)], where ΔBER = abs(testBER − guessedBER).
[Figure: challenge score as a function of guessed BER and test BER.]

[Figure: score behavior per dataset (ADA, GINA, HIVA, NOVA, SYLVA); for large ΔBER/σ, E ≈ testBER + ΔBER.]

Score
E = testBER + ΔBER × [1 − exp(−ΔBER/σ)]
[Figure: E as a function of the guess error, lying between testBER and testBER + ΔBER.]
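A minimal sketch of this score computation (assumed code; σ is an uncertainty scale whose dataset-dependent value is not reproduced here):

  % Minimal sketch of the challenge score; sigma is an uncertainty scale
  % (assumed input), testBER and guessedBER are error rates in [0, 1].
  function E = challenge_score(testBER, guessedBER, sigma)
    deltaBER = abs(testBER - guessedBER);                  % guess error
    E = testBER + deltaBER * (1 - exp(-deltaBER/sigma));   % smooth penalty for bad guesses
  end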

Score (continued)
[Figure: per-dataset score plots for ADA, GINA, SYLVA, HIVA, and NOVA.]

Part III RESULT ANALYSIS

What did we expect?
Learn about new competitive machine learning techniques.
Identify competitive methods of performance prediction, model selection, and ensemble learning (theory put into practice).
Drive research in the direction of refining such methods (ongoing benchmark).

Method comparison
[Figure: ΔBER versus test BER for the compared methods.]

Danger of overfitting
[Figure: BER versus time (days) for ADA, GINA, HIVA, NOVA, and SYLVA; full line: test BER, dashed line: validation BER.]

How to estimate the BER?
Statistical tests (Stats): compute the BER on training data and compare it with a "null hypothesis", e.g. the results obtained with a random permutation of the labels.
Cross-validation (CV): split the training data many times into training and validation sets; average the validation results.
Guaranteed risk minimization (GRM): use theoretical performance bounds.
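A minimal k-fold cross-validation sketch for estimating the BER (assumed helper handles trainFn and predictFn; not the participants' code):

  % Minimal sketch: k-fold cross-validation estimate of the BER,
  % for generic trainFn/predictFn handles and labels in {-1,+1}.
  function cvBER = cv_ber(X, y, k, trainFn, predictFn)
    n = numel(y);
    idx = randperm(n);                      % shuffle the examples once
    edges = round(linspace(0, n, k+1));     % fold boundaries
    bers = zeros(k, 1);
    for f = 1:k
      testIdx  = idx(edges(f)+1 : edges(f+1));
      trainIdx = setdiff(idx, testIdx);
      model = trainFn(X(trainIdx, :), y(trainIdx));
      yhat  = predictFn(model, X(testIdx, :));
      pos = (y(testIdx) == +1);
      neg = (y(testIdx) == -1);
      bers(f) = 0.5 * (mean(yhat(pos) ~= +1) + mean(yhat(neg) ~= -1));
    end
    cvBER = mean(bers);                     % average validation-fold BER
  end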

Stats / CV / GRM ???

Top ranking methods
Performance prediction:
– CV with many splits, 90% train / 10% validation
– Nested CV loops
Model selection:
– Use of a single model family
– Regularized risk / Bayesian priors
– Ensemble methods
– Nested CV loops, made computationally efficient with virtual leave-one-out (VLOO); a nested CV sketch follows below
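A minimal nested cross-validation sketch (reusing the cv_ber sketch above; trainFn, predictFn, hyperGrid, and outerFolds are assumed placeholders):

  % Minimal sketch of nested CV: the inner loop picks a hyperparameter by CV,
  % the outer loop measures the performance of the whole selection procedure.
  outerBER = zeros(numel(outerFolds), 1);
  for o = 1:numel(outerFolds)
    trIdx = outerFolds(o).train;
    teIdx = outerFolds(o).test;
    bestBER = inf;
    for h = 1:numel(hyperGrid)              % inner CV over candidate hyperparameters
      innerBER = cv_ber(X(trIdx, :), y(trIdx), 5, ...
                        @(Xa, ya) trainFn(Xa, ya, hyperGrid(h)), predictFn);
      if innerBER < bestBER
        bestBER   = innerBER;
        bestHyper = hyperGrid(h);
      end
    end
    model = trainFn(X(trIdx, :), y(trIdx), bestHyper);   % refit with the selected value
    yhat  = predictFn(model, X(teIdx, :));
    pos = (y(teIdx) == +1);
    neg = (y(teIdx) == -1);
    outerBER(o) = 0.5 * (mean(yhat(pos) ~= +1) + mean(yhat(neg) ~= -1));
  end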

Other methods
Use of training data only:
– Training BER
– Statistical tests
Bayesian evidence.
Performance bounds.
Bilevel optimization.

Part IV CONCLUSIONS AND FURTHER WORK

Open problems
Bridge the gap between theory and practice…
What are the best estimators of the variance of CV?
What should k be in k-fold?
Are other cross-validation methods better than k-fold (e.g. bootstrap, 5x2 CV)?
Are there better "hybrid" methods?
What search strategies are best?
More than 2 levels of inference?

Future work Game of model selection. JMLR special topic on model selection. IJCNN 2007 challenge!

Benchmarking model selection?
Performance prediction: participants just need to provide a guess of their test performance. If they can solve that problem, they can perform model selection efficiently. Easy and motivating.
Selection of a model from a finite toolbox: in principle a more controlled benchmark, but less attractive to participants.

CLOP
CLOP = Challenge Learning Object Package.
Based on the Spider package developed at the Max Planck Institute.
Two basic abstractions:
– Data object
– Model object

CLOP tutorial
At the Matlab prompt:
  > D = data(X, Y);                              % wrap the data matrix X and labels Y in a data object
  > hyper = {'degree=3', 'shrinkage=0.1'};       % hyperparameter settings
  > model = kridge(hyper);                       % kernel ridge regression learning object
  > [resu, model] = train(model, D);             % train the model on D
  > tresu = test(model, testD);                  % apply the trained model to test data
  > model = chain({standardize, kridge(hyper)}); % chain preprocessing and learner into one model

Conclusions
Twice the volume of participation of the feature selection challenge.
Top methods as before (in a different order):
– Ensembles of trees
– Kernel methods (RLSC/LS-SVM, SVM)
– Bayesian neural networks
– Naïve Bayes
Danger of overfitting.
Triumph of cross-validation?