
Active Learning Challenge
Isabelle Guyon (Clopinet, California), Gavin Cawley (University of East Anglia, UK), Olivier Chapelle (Yahoo!, California), Gideon Dror (Academic College of Tel-Aviv-Yaffo, Israel), Vincent Lemaire (Orange, France), Amir Reza Saffari Azar (Graz University of Technology), Alexander Statnikov (New York University, USA)

What is the problem?

Labeling data is expensive ($$$).

Examples of domains: chemo-informatics, handwriting and speech recognition, image processing, text processing, marketing, ecology, embryology.

What is active learning?

What is out there?

Scenarios. Burr Settles, Active Learning Literature Survey, Computer Sciences Technical Report 1648, University of Wisconsin–Madison, 2009.

De novo queries. These implicitly assume interventions on the system under study: not for this challenge.

Focus on pool-based AL, the simplest scenario for a challenge. Training data: labels can be queried. Test data: labels unknown. Methods developed for pool-based AL should also be useful for stream-based AL.

Example. (a) Toy 2-class problem, 400 instances, Gaussian distributed. (b) Linear logistic regression model trained with 30 random instances (accuracy = 0.7). (c) Linear logistic regression model trained with 30 actively queried instances using uncertainty sampling (accuracy = 0.9). [Burr Settles, 2009]
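The uncertainty-sampling strategy of the example can be sketched in a few lines. This is a minimal illustration using scikit-learn on a synthetic two-Gaussian pool in the spirit of the figure, not challenge code; the seed set of one example per class is an assumption made so that the first classifier can be fit.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic pool in the spirit of the figure: two Gaussian clouds, 400 points.
X_pool = np.vstack([rng.normal(-1.0, 1.0, (200, 2)),
                    rng.normal(1.0, 1.0, (200, 2))])
y_pool = np.array([0] * 200 + [1] * 200)  # the oracle's labels

labeled = [0, 200]  # assumption: seed with one labeled example per class

for _ in range(28):  # query 28 more labels, for 30 in total
    clf = LogisticRegression().fit(X_pool[labeled], y_pool[labeled])
    p = clf.predict_proba(X_pool)[:, 1]
    # Uncertainty sampling: query the pool point closest to p = 0.5 ...
    uncertainty = -np.abs(p - 0.5)
    uncertainty[labeled] = -np.inf  # ... that has not been labeled yet
    labeled.append(int(np.argmax(uncertainty)))
```

The queried points concentrate near the decision boundary, which is why 30 actively chosen labels beat 30 random ones in the figure.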

Learning curve [Burr Settles, 2009]

Other methods [Burr Settles, 2009]:
– Expected model change (greatest gradient if the sample were used for training)
– Query by committee (query the sample with the largest disagreement)
– Bayesian active learning (maximize the change in the revised posterior distribution)
– Expected error reduction (maximize the generalization performance improvement)
– Information density (ask for examples that are both informative and representative)
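Query by committee, for instance, can be sketched as follows. This is an illustrative implementation (bootstrap committees of logistic regressions scored by vote entropy); the function name and parameters are assumptions for the sketch, not part of the challenge.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

def qbc_query(X_pool, y_known, labeled, n_members=5):
    """Pick the next pool index to label by query-by-committee:
    train a small bootstrap committee on the labeled set and query
    the point with the highest vote entropy (most disagreement)."""
    votes = np.zeros((len(X_pool), 2))  # binary problem: class 0 / class 1
    for m in range(n_members):
        # Bootstrap the labeled indices, stratified so both classes survive.
        idx = resample(labeled, random_state=m, stratify=y_known[labeled])
        clf = LogisticRegression().fit(X_pool[idx], y_known[idx])
        votes[np.arange(len(X_pool)), clf.predict(X_pool)] += 1
    frac = votes / n_members
    with np.errstate(divide="ignore", invalid="ignore"):
        entropy = -np.sum(np.where(frac > 0, frac * np.log(frac), 0.0), axis=1)
    entropy[labeled] = -np.inf  # never re-query an already-labeled point
    return int(np.argmax(entropy))
```

Committee disagreement plays the role that the distance to p = 0.5 plays in uncertainty sampling: both are cheap proxies for how informative a label would be.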

Datasets

Data donors. This project would not have been possible without generous donations of data:
Chemoinformatics -- Charles Bergeron, Kristin Bennett and Curt Breneman (Rensselaer Polytechnic Institute, New York) contributed a dataset, which will be used for final testing.
Embryology -- Emmanuel Faure, Thierry Savy, Louise Duloquin, Miguel Luengo Oroz, Benoit Lombardot, Camilo Melani, Paul Bourgine, and Nadine Peyriéras (Institut des systèmes complexes, France) contributed the ZEBRA dataset.
Handwriting recognition -- Reza Farrahi Moghaddam, Mathias Adankon, Kostyantyn Filonenko, Robert Wisnovsky, and Mohamed Chériet (École de technologie supérieure de Montréal, Quebec) contributed the IBN_SINA dataset.
Marketing -- Vincent Lemaire, Marc Boullé, Fabrice Clérot, Raphael Féraud, Aurélie Le Cam, and Pascal Gouzien (Orange, France) contributed the ORANGE dataset, previously used in the KDD Cup 2009.
We also reused data made publicly available on the Internet:
Chemoinformatics -- The National Cancer Institute (USA) for the HIVA dataset.
Ecology -- Jock A. Blackard, Denis J. Dean, and Charles W. Anderson (US Forest Service, USA) for the SYLVA dataset (Forest cover type).
Text processing -- Tom Mitchell (USA) and Ron Bekkerman (Israel) for the NOVA dataset (derived from the Twenty Newsgroups).

Development datasets

Difficulties: sparse data, missing values, unbalanced classes, categorical variables, noisy data, large datasets.

Final test datasets. Will serve for the final ranking. Will be from the same domains. May have different data representations and distributions. No feedback: the results will not be revealed until the end of the challenge.

Protocol

Virtual Lab. Joint work with: Constantin Aliferis, New York University; Gregory F. Cooper, University of Pittsburgh; André Elisseeff, Nhumi, Zürich; Jean-Philippe Pellet, IBM Zürich; Alexander Statnikov, New York University; Peter Spirtes, Carnegie Mellon. Virtual cash.

Step-by-step instructions. Download the data; you get 1 labeled example. Then: 1. Predict. 2. Sample. 3. Submit a query. 4. Retrieve the labels.

Two phases.
Development phase:
– 6 datasets available
– Can try as many times as you want
– Matlab users can run queries on their computers
– Others can use the labels (provided)
Final test phase:
– 6 new datasets available
– A single try
– No feedback

Evaluation

AUC score. For each set of samples queried, we assess the predictions of the learning machine with the area under the ROC curve.

Area under the Learning Curve (ALC). Linear interpolation; horizontal extrapolation. (Figure: learning curves after one query, five queries, and thirteen queries; the "lazy" strategy asks for all labels at once.)
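The ALC idea (linear interpolation between measured points, horizontal extrapolation of the last AUC) can be sketched as a short function. This is a simplified, unnormalized version; it ignores any axis rescaling and normalization the official challenge score may apply on top of the raw area.

```python
import numpy as np

def alc(n_labels, auc_scores, n_max):
    """Area under the learning curve: AUC plotted against the number of
    labels queried, with linear interpolation between measured points and
    horizontal extrapolation of the last AUC out to n_max labels."""
    x = np.append(np.asarray(n_labels, dtype=float), n_max)
    y = np.asarray(auc_scores, dtype=float)
    y = np.append(y, y[-1])  # horizontal extrapolation: last AUC holds
    # Trapezoidal rule over the piecewise-linear curve.
    return float(np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0))
```

For example, `alc([1, 10, 100], [0.5, 0.8, 0.9], 100)` gives 82.35: a strategy that climbs to a high AUC with few labels accumulates more area than the "lazy" strategy, which only scores well at the far right of the curve.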

Prizes. If you win on… 1 dataset: $100. 2 datasets: $200. 3 datasets: $400. 4 datasets: $800. 5 datasets: $1600. 6 datasets: $3200! Plus travel awards for top-ranking students.
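The prize schedule is a simple doubling: $100 for one dataset, doubled for each additional dataset won.

```python
# Prize for winning on n datasets, following the schedule above:
# $100, doubled for each additional dataset, i.e. P(n) = $100 * 2**(n - 1).
def prize(n_datasets: int) -> int:
    return 100 * 2 ** (n_datasets - 1)

print([prize(n) for n in range(1, 7)])  # [100, 200, 400, 800, 1600, 3200]
```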

Schedule

Conclusion. Try our new challenge, learn, and win!
– Workshops: AISTATS 2010, Sardinia, May 2010; WCCI 2010 Workshop, Barcelona, July 2010. Travel awards for top-ranking students.
– Proceedings published by JMLR & IEEE.
– Prizes: P(n) = $100 × 2^(n−1) for winning on n datasets.
– Your problem solved by dozens of research groups: help us organize the next challenge!