Published by Laurence Glenn. Modified over 9 years ago.
2
AISTATS 2010 Active Learning Challenge: A Fast Active Learning Algorithm Based on Parzen Window Classification
L. Lan, H. Shi, Z. Wang, S. Vucetic
Temple University
3
Introduction
Pool-based Active Learning
  Data labeling is expensive
  Large amounts of unlabeled data are available at low cost
  The goal is to label as few unlabeled examples as possible while achieving as high an accuracy as possible
2010 Active Learning Challenge
  Provides an opportunity for practitioners to evaluate active learning algorithms within an unbiased setup
  Data sets came from 6 different application domains
4
Challenge Data Sets
Common Properties
  Binary classification
  Class-imbalanced
Differences
  Features
  Concept

Data Set  Domain               Feat. Type  Feat. Num.  Sparsity %  Missing %  Label   Train Num.  Train Pos:Neg  Test Num.
A         Handwriting Rec.     mixed       92          79.02       0          binary  17535       1267:16268     17535
B         Marketing            mixed       250         46.89       25.76      binary  25000       2289:22711     25000
C         Chemo-informatics    mixed       851         8.6         0          binary  25720       2095:23625     25720
D         Text classification  binary      12000       99.67       0          binary  10000       2519:7481      10000
E         Embryology           continuous  154         0.04        0.0004     binary  32252       2912:29340     32252
F         Ecology              mixed       12          0           0          binary  67628       5194:62434     67628
5
Challenge Setup
Given 1 positive seed example
Repeat
  Select which unlabeled examples to label
  Train a classifier
  Evaluate its accuracy (AUC: Area Under the ROC Curve)
Evaluate the active learning algorithm using ALC (Area under the Learning Curve)
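The two evaluation measures named on this slide can be illustrated with a minimal sketch. It assumes that the learning curve plots AUC against the log2 of the number of labels and that the area is rescaled so that constant random guessing (AUC 0.5) scores 0 and a constant perfect classifier scores 1; the slide does not give these details, so the axis scaling and the normalization are assumptions, and `auc` and `normalized_alc` are hypothetical helper names.

```python
import math

def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) statistic."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def normalized_alc(n_labels, aucs):
    """Trapezoidal area under AUC vs. log2(number of labels).

    Assumed rescaling: a constant AUC of 0.5 maps to 0 and a constant
    AUC of 1.0 maps to 1.
    """
    x = [math.log2(n) for n in n_labels]
    area = sum((x[i + 1] - x[i]) * (aucs[i] + aucs[i + 1]) / 2
               for i in range(len(x) - 1))
    width = x[-1] - x[0]
    random_area = 0.5 * width   # area under a flat AUC = 0.5 curve
    perfect_area = 1.0 * width  # area under a flat AUC = 1.0 curve
    return (area - random_area) / (perfect_area - random_area)
```

Under these assumptions, a learner that reaches a high AUC after only a few labels earns a larger area than one that reaches the same final AUC late, which is why the querying schedule matters as much as the final classifier.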
6
Algorithm Design Issues
Querying Strategy
  How many examples to label at each stage
  Which unlabeled examples to select
Classification Algorithm
  Simple vs. powerful
  Easy to implement vs. involved
Preprocessing and Feature Selection
  Often the critical issue
7
Components of Our Approach
Data preprocessing
  Normalization
Feature selection filtering
  Pearson correlation; Kruskal-Wallis test
Regularized Parzen Window Classifier
  Parameter tuning by cross-validation
Ensemble of classifiers
  Classifiers differ by the selected features
Active learning strategy
  Uncertainty sampling + clustering-based random sampling
8
Algorithm Details
Data preprocessing
  Missing values (we did not address this issue)
  Normalization (all non-binary features rescaled to mean = 0 and std = 1)
Feature selection filters
  Pearson correlation test
  Kruskal-Wallis test
  Calculated a p-value for each feature
  Selected the M features with the lowest p-values
  Selected all features with a p-value below 0.05
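The normalization and the "M lowest p-values" filter described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: for a fixed number of examples, the p-value of the standard Pearson correlation test decreases as |r| grows, so the sketch ranks features by |r| instead of computing p-values. The function names and the column-wise data layout are hypothetical.

```python
import math

def zscore(column):
    """Rescale one feature column to mean 0 and standard deviation 1."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    if std == 0:
        return [0.0] * n  # a constant feature carries no information
    return [(v - mean) / std for v in column]

def pearson_r(xs, ys):
    """Pearson correlation coefficient between a feature and the labels."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def top_m_features(columns, labels, m):
    """Indices of the m features with the largest |r| (smallest p-value)."""
    strength = [abs(pearson_r(col, labels)) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: -strength[j])
    return order[:m]
```

The 0.05 threshold variant would need an actual p-value, for example from a statistics library; the ranking shortcut only covers the fixed-size selection.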
9
Algorithm Details
Classification Model
  Regularized Parzen Window Classifier (RPWC)
    [classifier formula not preserved in this transcript]
    ε is the regularization parameter (set to 10^-5 in our experiments)
    K is the Gaussian kernel
    [kernel formula not preserved in this transcript]
    σ is the kernel size
  RPWC
    is easy to implement
    can learn highly nonlinear problems
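Because the formulas on this slide did not survive, the following is only a sketch of one common way to write a regularized Parzen window classifier. It assumes the Gaussian kernel K(x, z) = exp(-||x - z||^2 / (2σ^2)) and the estimate p(y=1|x) = (sum of K over positive examples + ε) / (sum of K over all examples + 2ε). Both forms, and the names `gaussian_kernel` and `rpwc_predict`, are assumptions rather than the authors' exact definitions.

```python
import math

def gaussian_kernel(x, z, sigma):
    """Assumed kernel form: exp(-||x - z||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def rpwc_predict(x, train_x, train_y, sigma, eps=1e-5):
    """Estimate p(y=1|x) as (positive weight + eps) / (total weight + 2*eps).

    The eps terms are the assumed regularization: they keep the ratio
    defined when every kernel value underflows to zero.
    """
    weights = [gaussian_kernel(x, z, sigma) for z in train_x]
    pos = sum(w for w, y in zip(weights, train_y) if y == 1)
    return (pos + eps) / (sum(weights) + 2.0 * eps)
```

With this form, a query point far from all labeled examples gets an estimate close to 0.5, so it looks maximally uncertain instead of causing a division by zero.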
10
Algorithm Details
Classification Model
  Ensemble of RPWC classifiers
  Base classifiers differ in the features used
    all features
    features with p-value of Pearson correlation < 0.05 (filter_data1)
    10 features with the smallest p-values of Pearson correlation < 0.05 (filter_data2)
    features with p-value of Kruskal-Wallis test < 0.05 (filter_data3)
    10 features with the smallest p-values of Kruskal-Wallis test (filter_data4)
  Resulting ensemble classifier
    Average of the 5 base RPWCs
  Base RPWC parameter tuning
    Examined 4 different values for the kernel width σ: [M/9, M/3, M, 3M]
    Used leave-one-out cross-validation to select σ for each base classifier
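The tuning and averaging steps on this slide can be sketched as follows. The slide does not say which score the leave-one-out procedure maximizes, so plain accuracy is assumed here; the regularized Parzen estimate is also an assumed form, and every function name is hypothetical.

```python
import math

def parzen_prob(x, xs, ys, sigma, eps=1e-5):
    """p(y=1|x) from a Gaussian Parzen window (assumed regularized form)."""
    w = [math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / (2 * sigma ** 2))
         for z in xs]
    pos = sum(wi for wi, y in zip(w, ys) if y == 1)
    return (pos + eps) / (sum(w) + 2 * eps)

def loo_accuracy(xs, ys, sigma):
    """Leave-one-out accuracy: predict each example from all the others."""
    hits = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        pred = 1 if parzen_prob(xs[i], rest_x, rest_y, sigma) > 0.5 else 0
        hits += pred == ys[i]
    return hits / len(xs)

def select_sigma(xs, ys, candidates):
    """Pick the candidate kernel width with the best leave-one-out accuracy."""
    return max(candidates, key=lambda s: loo_accuracy(xs, ys, s))

def project(x, feature_idx):
    """Keep only the features chosen for one base classifier."""
    return [x[j] for j in feature_idx]

def ensemble_prob(x, xs, ys, members):
    """Average the base predictions; each member is (feature_idx, sigma)."""
    probs = [parzen_prob(project(x, idx), [project(z, idx) for z in xs], ys, s)
             for idx, s in members]
    return sum(probs) / len(probs)
```

A Parzen window classifier has no training step beyond storing the labeled examples, so leave-one-out evaluation costs one pass over the pairwise kernel values per candidate σ, which keeps this kind of tuning cheap.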
11
Algorithm Details
Active Learning Strategy
  Uncertainty sampling (EXPLOITATION)
    The uncertainty score for example x is defined as score(x) = |p(y|x) - 0.5|
    Examples with the smallest score are selected
    Advantage: focuses on improving accuracy near the decision boundary
    Disadvantage: overlooks important underexplored regions
  Clustering-based random sampling (EXPLORATION)
    Partition the unlabeled data into k clusters
    Select the same number of random examples from each cluster
    Advantage: does not miss any important region
    Disadvantage: fails to focus on the uncertain regions
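The two selection rules above can be sketched as follows. The slide does not name the clustering method, so the sketch assumes that cluster assignments for the unlabeled pool are already available as a list; `most_uncertain` and `sample_per_cluster` are hypothetical names.

```python
import random

def most_uncertain(probs, k):
    """Indices of the k examples whose p(y|x) is closest to 0.5."""
    order = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return order[:k]

def sample_per_cluster(cluster_of, per_cluster, rng):
    """Draw the same number of random examples from every cluster.

    cluster_of[i] is the (assumed precomputed) cluster id of example i.
    A cluster smaller than per_cluster contributes all of its members.
    """
    members = {}
    for i, c in enumerate(cluster_of):
        members.setdefault(c, []).append(i)
    picked = []
    for c in sorted(members):
        pool = members[c]
        picked.extend(rng.sample(pool, min(per_cluster, len(pool))))
    return picked
```

For example, `most_uncertain([0.9, 0.52, 0.1, 0.45, 0.7], 2)` returns `[1, 3]`, the two examples nearest the decision boundary, while `sample_per_cluster` spreads its picks across clusters regardless of the current predictions.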
12
Algorithm Outline
Input: labeled set L, unlabeled set U
Q ← randomly select 20 unlabeled examples from U
for t = 1 to 10
  U ← U - Q; L ← L + labeled(Q)
  for j = 1 to F
    F_j ← feature_filter(L)          // feature selection filter
    C_j ← train_classifier(L, F_j)   // train classifier C_j from L using features F_j; determine model parameters by CV on L
    A_j ← accuracy(C_j, L)           // estimate accuracy of classifier C_j by CV on L
  end for
  C_avg ← average(C_j)               // build ensemble classifier C_avg by averaging
  Q ← (2|L|/3 of the most uncertain examples in U)
  Q ← Q + (|L|/3 random examples chosen from randomly selected clusters of U)
end for
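The batch sizes in this outline imply that the labeled set roughly doubles every round. The short sketch below traces that schedule; it assumes L starts with the single seed example, uses integer division for the 2/3 and 1/3 shares, and ignores the size of the unlabeled pool. The function name is hypothetical.

```python
def labeled_set_sizes(n_seed=1, first_batch=20, rounds=10):
    """Size of the labeled set L after each round of the outline."""
    sizes = []
    labeled, batch = n_seed, first_batch
    for _ in range(rounds):
        labeled += batch               # L <- L + labeled(Q)
        sizes.append(labeled)
        uncertain = 2 * labeled // 3   # share chosen by uncertainty sampling
        explore = labeled // 3         # share chosen by cluster-based sampling
        batch = uncertain + explore    # next query set Q
    return sizes

print(labeled_set_sizes())
# [21, 42, 84, 168, 336, 672, 1344, 2688, 5376, 10752]
```

Under these assumptions each query set is as large as the current labeled set, so ten rounds cover label budgets spanning almost three orders of magnitude. In practice the last batches would be capped by the number of examples left in U.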
13
Competition Results
Official ALC Scores:

Data Set  Best ALC  INTEL ALC (best team)  TUCIS ALC  TUCIS Rank
A         0.629     0.527                  0.470      7
B         0.376     0.317                  0.261      8
C         0.427     0.381                  0.239      7
D         0.861     0.640                  0.652      5
E         0.627     0.473                  0.443      7
F         0.802                            0.674      8

Overall, our TUCIS team ranked 5th.
14
Competition Results
AUC Accuracy of the Final Classifier:

Data Set  Best AUC  INTEL AUC (best team)  TUCIS AUC
A         0.962     0.952                  0.899
B         0.767     0.754                  0.668
C         0.833                            0.707
D         0.973                            0.939
E         0.925                            0.834
F         0.999                            0.987

Our final predictors are less accurate than the best challenge algorithms. This indicates that Parzen Window Classifiers are not the best choice for the challenge data sets.
15
Competition Results
16
Post-Competition Experiments
1. Early Start
  Select the first 20 examples randomly one by one vs. select the first 20 at once

Data Set  One by one (alternative)  Begin with 20 (submitted)
A         0.537 ± 0.036             0.466 ± 0.015
B         0.267 ± 0.022             0.273 ± 0.021
C         0.242 ± 0.034             0.261 ± 0.039
D         0.576 ± 0.035             0.601 ± 0.027
E         0.445 ± 0.028             0.433 ± 0.017
F         0.750 ± 0.046             0.757 ± 0.062

Overall, querying beginning with 20 random examples is slightly better.
17
Post-Competition Experiments
2. Comparison of Ensembles of Classifiers

Data Set  One classifier (all features)  One classifier (Pearson Corr)  Two Classifiers  Five Classifiers
A         0.462 ± 0.08                   0.433 ± 0.039                  0.466 ± 0.015    0.458 ± 0.016
B         0.260 ± 0.004                  0.246 ± 0.034                  0.273 ± 0.021    0.304 ± 0.023
C         0.321 ± 0.023                  0.292 ± 0.038                  0.260 ± 0.039    0.336 ± 0.049
D         0.670 ± 0.026                  0.540 ± 0.028                  0.600 ± 0.026    0.551 ± 0.050
E         0.426 ± 0.006                  0.416 ± 0.023                  0.432 ± 0.016    0.449 ± 0.006
F         0.620 ± 0.023                  0.785 ± 0.026                  0.757 ± 0.062    0.779 ± 0.025

The ensemble of 5 classifiers is the best overall choice.
18
Post-Competition Experiments
3. Comparison of 2 Querying Strategies
  2/3 uncertainty + 1/3 random vs. pre-clustering

Data Set  2/3 + 1/3      Pre-clustering
A         0.466 ± 0.015  0.447 ± 0.011
B         0.273 ± 0.021  0.169 ± 0.057
C         0.260 ± 0.039  0.292 ± 0.048
D         0.600 ± 0.026  0.506 ± 0.036
E         0.432 ± 0.016  0.378 ± 0.058
F         0.757 ± 0.062  0.708 ± 0.062

Pre-clustering does not improve the performance.
19
Conclusions
Our active learning algorithm
  Uses an ensemble of Parzen Window Classifiers
  Uses feature selection filters
  Combines uncertainty and random sampling
  Has a geometric sampling schedule
Our team was ranked 5th overall
  The gap from the best-performing algorithms was significant
  This indicates that PW classifiers are not appropriate for the challenge data sets
  Building larger PW ensembles could improve performance
  Our exploration-exploitation querying approach was successful