AISTATS 2010 Active Learning Challenge: A Fast Active Learning Algorithm Based on Parzen Window Classification
L. Lan, H. Shi, Z. Wang, S. Vucetic, Temple University

2 Introduction
Pool-based Active Learning
- Data labeling is expensive
- Large amounts of unlabeled data are available at low cost
- The goal is to label as few of the unlabeled examples as possible while achieving the highest possible accuracy
2010 Active Learning Challenge
- Provides an opportunity for practitioners to evaluate active learning algorithms within an unbiased setup
- Data sets come from six different application domains

3 Challenge Data Sets
Common properties
- Binary classification
- Class-imbalanced
Differences
- Features
- Concept

Data Set | Domain | Feature Type | Label
A | Handwriting recognition | mixed | binary
B | Marketing | mixed | binary
C | Chemo-informatics | mixed | binary
D | Text classification | binary | binary
E | Embryology | continuous | binary
F | Ecology | mixed | binary

4 Challenge Setup
Given 1 positive seed example
Repeat
- Select which unlabeled examples to label
- Train a classifier
- Evaluate its accuracy (AUC: Area Under the ROC Curve)
Evaluate the active learning algorithm
- using ALC (Area under the Learning Curve)
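A minimal sketch of an ALC computation under this setup. It assumes a trapezoidal area on a log2-scaled label axis, normalized between a perfect learner (AUC = 1) and random guessing (AUC = 0.5); the official challenge scoring tool may differ in details, and the function name and example values below are illustrative only.

import numpy as np

def alc_score(num_labels, auc_values):
    """Area under the Learning Curve (ALC), a minimal sketch.

    num_labels : increasing counts of labeled examples at which AUC was measured
    auc_values : AUC of the classifier at each of those points
    Assumed normalization: 1.0 for a perfect learner, 0.0 for random guessing.
    """
    x = np.log2(np.asarray(num_labels, dtype=float))
    y = np.asarray(auc_values, dtype=float)
    area = np.trapz(y, x)
    a_max = np.trapz(np.ones_like(y), x)          # perfect learner (AUC = 1 everywhere)
    a_rand = np.trapz(0.5 * np.ones_like(y), x)   # random guessing (AUC = 0.5 everywhere)
    return (area - a_rand) / (a_max - a_rand)

# Example: AUC measured after 1, 2, 4, 8, and 16 labeled examples (made-up numbers)
print(alc_score([1, 2, 4, 8, 16], [0.55, 0.62, 0.71, 0.80, 0.86]))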

Algorithm Design Issues
Querying strategy
- How many examples to label at each stage
- Which unlabeled examples to select
Classification algorithm
- Simple vs. powerful
- Easy to implement vs. involved
Preprocessing and feature selection
- Often the critical issue

6 Components of Our Approach
Data preprocessing
- Normalization
Feature selection filtering
- Pearson correlation; Kruskal-Wallis test
Regularized Parzen Window classifier
- Parameter tuning by cross-validation
Ensemble of classifiers
- Classifiers differ by the selected features
Active learning strategy
- Uncertainty sampling + clustering-based random sampling

7 Algorithm Details
Data preprocessing
- Missing values (did not address this issue)
- Normalization (mean = 0 and std = 1 for all non-binary features)
Feature selection filters
- Pearson correlation test
- Kruskal-Wallis test
- Calculated a p-value for each feature
- Selected the M features with the lowest p-values
- Selected all features with p-value below 0.05
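A minimal sketch of the two feature-scoring filters, assuming dense NumPy arrays and using scipy.stats; the function name and the M/alpha parameters are illustrative, not taken from the authors' code.

import numpy as np
from scipy.stats import pearsonr, kruskal

def filter_features(X, y, test="pearson", M=10, alpha=0.05):
    """Score each feature against the binary label and return selected column indices.

    test  : "pearson" or "kruskal" (the two filters used in the approach)
    M     : if not None, keep the M features with the smallest p-values
    alpha : otherwise keep every feature with p-value below alpha
    """
    pvals = np.ones(X.shape[1])
    for j in range(X.shape[1]):
        col = X[:, j]
        if test == "pearson":
            _, pvals[j] = pearsonr(col, y)
        else:  # Kruskal-Wallis: compare feature values across the two classes
            _, pvals[j] = kruskal(col[y == 1], col[y == 0])
    if M is not None:
        return np.argsort(pvals)[:M]
    return np.where(pvals < alpha)[0]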

8 Algorithm Details
Classification Model
- Regularized Parzen Window Classifier (RPWC), of the form
  p(y = 1 | x) = (Σ_{i: yi = 1} K(x, xi) + ε) / (Σ_i K(x, xi) + 2ε)
- ε is a regularizing parameter (set to a fixed value in our experiments)
- K is the Gaussian kernel, K(x, x') = exp(−‖x − x'‖² / σ²), where σ is the kernel size
- RPWC is easy to implement and can learn highly nonlinear problems
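A minimal sketch of an RPWC prediction function following these definitions; the exact kernel scaling and the default ε below are assumptions, not values taken from the slides.

import numpy as np

def rpwc_predict_proba(X_train, y_train, X_test, sigma, eps=1e-6):
    """Regularized Parzen Window Classifier: p(y = 1 | x) for each test point.

    sigma : Gaussian kernel width (kernel size)
    eps   : regularizer; keeps the estimate defined when all kernel weights are near 0
            (1e-6 is an assumed default, not the value used in the challenge entry)
    """
    # Pairwise squared Euclidean distances between test and training points
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-d2 / sigma ** 2)            # Gaussian kernel weights
    pos = K[:, y_train == 1].sum(axis=1)    # kernel mass from positive training examples
    total = K.sum(axis=1)
    return (pos + eps) / (total + 2 * eps)  # regularized posterior for class 1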

9 Algorithm Details
Classification Model
- Ensemble of RPWC classifiers
Base classifiers differ in the features used
- all features
- features with Pearson correlation p-value < 0.05 (filter_data1)
- the 10 features with the smallest Pearson correlation p-values (filter_data2)
- features with Kruskal-Wallis p-value < 0.05 (filter_data3)
- the 10 features with the smallest Kruskal-Wallis p-values (filter_data4)
Resulting ensemble classifier
- Average of the 5 base RPWCs
Base RPWC parameter tuning
- Examined 4 different values for the kernel width σ: [M/9, M/3, M, 3M]
- Used leave-one-out cross-validation to select σ for each base classifier
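A minimal sketch of kernel-width selection by leave-one-out cross-validation and of ensemble averaging, reusing the hypothetical rpwc_predict_proba and filter-style feature subsets sketched above; scoring the leave-one-out predictions by AUC, and scaling the candidate widths by the size of each feature subset, are assumptions.

import numpy as np
from sklearn.metrics import roc_auc_score

def select_sigma_loo(X, y, candidate_sigmas):
    """Pick the kernel width whose leave-one-out predictions score best (AUC assumed)."""
    best_sigma, best_score = candidate_sigmas[0], -np.inf
    for sigma in candidate_sigmas:
        preds = np.empty(len(y))
        for i in range(len(y)):                       # leave example i out
            mask = np.arange(len(y)) != i
            preds[i] = rpwc_predict_proba(X[mask], y[mask], X[i:i + 1], sigma)[0]
        score = roc_auc_score(y, preds) if len(np.unique(y)) > 1 else 0.0
        if score > best_score:
            best_sigma, best_score = sigma, score
    return best_sigma

def ensemble_predict(X_train, y, X_test, feature_sets):
    """Average the posteriors of one RPWC per feature subset (the base classifiers)."""
    probs = []
    for feats in feature_sets:
        Xf, Xt = X_train[:, feats], X_test[:, feats]
        d = Xf.shape[1]  # assumed: candidate widths scale with the selected feature count
        sigma = select_sigma_loo(Xf, y, [d / 9, d / 3, d, 3 * d])
        probs.append(rpwc_predict_proba(Xf, y, Xt, sigma))
    return np.mean(probs, axis=0)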

10 Algorithm Details
Active Learning Strategy
Uncertainty sampling (EXPLOITATION)
- The uncertainty score for example x is defined as score(x) = |p(y|x) − 0.5|
- Examples with the smallest score are selected
- Advantage: focuses on improving accuracy near the decision boundary
- Disadvantage: overlooks important underexplored regions
Clustering-based random sampling (EXPLORATION)
- Partition the unlabeled data into k clusters
- Select the same number of random examples from each cluster
- Advantage: does not miss any important region
- Disadvantage: fails to focus on the uncertain regions
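A minimal sketch of the two querying components, using scikit-learn's KMeans for the clustering step; the number of clusters, the helper names, and the one-example-per-cluster simplification are illustrative assumptions.

import numpy as np
from sklearn.cluster import KMeans

def uncertainty_query(p_unlabeled, n_queries):
    """Indices of the unlabeled examples whose p(y|x) is closest to 0.5."""
    scores = np.abs(p_unlabeled - 0.5)
    return np.argsort(scores)[:n_queries]

def cluster_random_query(X_unlabeled, n_queries, n_clusters=10, seed=0):
    """Spread random queries across clusters of the unlabeled pool (exploration).

    Picks one random example per cluster until the budget is reached,
    a simplification of "the same number of examples from each cluster".
    """
    rng = np.random.default_rng(seed)
    n_clusters = min(n_clusters, len(X_unlabeled))
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X_unlabeled)
    picked = []
    for c in rng.permutation(n_clusters):        # visit clusters in random order
        members = np.where(labels == c)[0]
        if len(members) and len(picked) < n_queries:
            picked.append(rng.choice(members))
    return np.array(picked[:n_queries], dtype=int)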

11 Algorithm Outline
Input: labeled set L, unlabeled set U
Q ← randomly select 20 unlabeled examples from U
for t = 1 to 10
    U ← U − Q ; L ← L + labeled(Q)
    for j = 1 to F
        F_j ← feature_filter(L)          // feature selection filter
        C_j ← train_classifier(L, F_j)   // train classifier C_j from L using features F_j; determine model parameters by CV on L
        A_j ← accuracy(C_j, L)           // estimate accuracy of classifier C_j by CV on L
    end for
    C_avg ← average(C_j)                 // build ensemble classifier C_avg by averaging
    Q ← (2|L|/3 of the most uncertain examples in U)
    Q ← Q + (|L|/3 random examples chosen from randomly selected clusters of U)
end for
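A minimal sketch of this outer querying loop with its geometric schedule, tying together the hypothetical helpers from the earlier sketches (filter-style feature-set builders, ensemble_predict, uncertainty_query, cluster_random_query); the oracle callback and all parameter defaults are illustrative assumptions.

import numpy as np

def active_learning_loop(X_pool, oracle, feature_set_builders, n_rounds=10, seed=0):
    """Each round labels roughly |L| new examples:
    2|L|/3 most uncertain + |L|/3 from randomly selected clusters."""
    rng = np.random.default_rng(seed)
    unlabeled = np.arange(len(X_pool))
    queried = rng.choice(unlabeled, size=20, replace=False)   # initial random batch
    labeled_idx, labels = np.array([], dtype=int), np.array([], dtype=int)

    for _ in range(n_rounds):
        labeled_idx = np.concatenate([labeled_idx, queried])
        labels = np.concatenate([labels, oracle(queried)])    # ask the oracle for labels
        unlabeled = np.setdiff1d(unlabeled, queried)
        if len(unlabeled) == 0:
            break

        # Build the feature subsets and the ensemble posterior on the remaining pool
        feature_sets = [f(X_pool[labeled_idx], labels) for f in feature_set_builders]
        p = ensemble_predict(X_pool[labeled_idx], labels, X_pool[unlabeled], feature_sets)

        n_uncertain = (2 * len(labeled_idx)) // 3
        n_random = len(labeled_idx) // 3
        q1 = unlabeled[uncertainty_query(p, n_uncertain)]
        q2 = unlabeled[cluster_random_query(X_pool[unlabeled], n_random)]
        queried = np.unique(np.concatenate([q1, q2]))
    return labeled_idx, labels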

Competition Results
Official ALC scores (reported per data set A-F): best ALC, INTEL ALC (best team), TUCIS ALC, and TUCIS rank.
Overall, our TUCIS team ranked in 5th place.

Competition Results
AUC accuracy of the final classifier (reported per data set A-F): best AUC, INTEL AUC (best team), and TUCIS AUC.
Our final predictors are less accurate than those of the best challenge algorithms. This indicates that Parzen Window classifiers are not the best choice for the challenge data sets.

Competition Results

Post-Competition Experiments
1. Early Start
- select the first 20 examples randomly one by one (alternative) vs.
- select the first 20 at once (submitted)

Data Set | One by one (alternative) | Begin with 20 (submitted)
A | 0.537 ± … | …
B | 0.267 ± … | …
C | 0.242 ± … | …
D | 0.576 ± … | …
E | 0.445 ± … | …
F | 0.750 ± … | …

Overall, querying beginning with 20 random examples is slightly better.

16 Post-Competition Experiments
2. Comparison of Ensembles of Classifiers

Data Set | One classifier (all features) | One classifier (Pearson corr.) | Two classifiers | Five classifiers
A | 0.462 ± … | … | … | … ± 0.016
B | 0.260 ± … | … | … | … ± 0.023
C | 0.321 ± … | … | … | … ± 0.049
D | 0.670 ± … | … | … | … ± 0.050
E | 0.426 ± … | … | … | … ± 0.006
F | 0.620 ± … | … | … | … ± 0.025

The ensemble of 5 classifiers is the best overall choice.

17 Post-Competition Experiments
3. Comparison of two querying strategies
- 2/3 uncertainty + 1/3 random vs.
- pre-clustering

Data Set | 2/3 + 1/3 | Pre-clustering
A | 0.466 ± … | … ± 0.011
B | 0.273 ± … | … ± 0.057
C | 0.260 ± … | … ± 0.048
D | 0.600 ± … | … ± 0.036
E | 0.432 ± … | … ± 0.058
F | 0.757 ± … | … ± 0.062

Pre-clustering does not improve performance.

Conclusions
Our active learning algorithm
- Uses an ensemble of Parzen Window classifiers
- Uses feature selection filters
- Combines uncertainty sampling and random sampling
- Follows a geometric sampling schedule
Our team ranked 5th overall
- The gap from the best-performing algorithms was significant
- This indicates PW classifiers are not appropriate for the challenge data sets
- Building larger PW ensembles could improve performance
Our exploration-exploitation querying approach was successful