
1  An Ensemble of Three Classifiers for KDD Cup 2009: Expanded Linear Model, Heterogeneous Boosting, and Selective Naive Bayes
Members: Hung-Yi Lo, Kai-Wei Chang, Shang-Tse Chen, Tsung-Hsien Chiang, Chun-Sung Ferng, Cho-Jui Hsieh, Yi-Kuang Ko, Tsung-Ting Kuo, Hung-Che Lai, Ken-Yi Lin, Chia-Hsuan Wang, Hsiang-Fu Yu, Chih-Jen Lin, Hsuan-Tien Lin, Shou-de Lin
Team: CSIE Department, National Taiwan University

2  Observations on Training
Positive labels of the 3 tasks are mutually exclusive
– Transform to 4-class classification (Churn / Appetency / Up-selling / Null)
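A minimal sketch of this label transformation, assuming the three binary label vectors (+1/−1) are already loaded as numpy arrays; all names here are hypothetical:

```python
import numpy as np

# Hypothetical binary label vectors (+1 / -1) for the three tasks.
churn     = np.array([ 1, -1, -1, -1])
appetency = np.array([-1,  1, -1, -1])
upselling = np.array([-1, -1,  1, -1])

# Since at most one task is positive per example, map each example to a
# single 4-class label: 0 = Churn, 1 = Appetency, 2 = Up-selling, 3 = Null.
labels = np.full(len(churn), 3)          # default: Null
labels[churn == 1]     = 0
labels[appetency == 1] = 1
labels[upselling == 1] = 2
print(labels)                            # [0 1 2 3]
```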

3  Observations on 10% Testing
Cross-validation (CV) results differ significantly from the 10% testing results (online feedback) for certain classifiers
– Use CV rather than the 10% results to tune parameters
[Table: validation vs. testing AUC for Churn / Appetency / Up-selling; values not preserved in the transcript]
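A minimal sketch of tuning a parameter by cross-validated AUC, assuming scikit-learn and a hypothetical feature matrix X and label vector y for one task; the candidate C values are illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical data: X is the feature matrix, y the binary labels for one task.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = rng.integers(0, 2, size=200)

# Pick the penalty parameter C by cross-validated AUC rather than by
# probing the 10% online test set.
for C in [0.01, 0.1, 1.0, 10.0]:
    auc = cross_val_score(LogisticRegression(C=C, max_iter=1000),
                          X, y, cv=5, scoring="roc_auc").mean()
    print(f"C={C}: CV AUC = {auc:.3f}")
```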

4  Challenges
1. Noisy
– Redundant or irrelevant features → feature selection
– Significant amount of missing values
2. Heterogeneous
– Number of distinct numerical values: 1 to ~50,000
– Number of distinct categorical values: 1 to ~15,000
→ These challenges drive the choice of classifiers and pre-processing methods

5  System Overview
Pre-processing: missing values, numerical features, categorical features, feature selection
Classification: Maximum Entropy, Heterogeneous AdaBoost, Selective Naïve Bayes
Post-processing: score adjustment, score ensemble

6  Maximum Entropy
Transform to joint multi-class classification
– Maximum entropy model → probability estimation
Feature selection
– L1-regularized solver
Notation: x = example, y = label, w = model, k = # of classes, n = # of iterations, l = # of examples, C = penalty parameter
L1-regularized maximum entropy objective (first term: L1 regularization; second term: maximum entropy loss):

    \min_{w_1,\dots,w_k} \sum_{j=1}^{k} \|w_j\|_1 + C \sum_{i=1}^{l} \Big( \log \sum_{j=1}^{k} e^{w_j^\top x_i} - w_{y_i}^\top x_i \Big)
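A minimal stand-in for this step, assuming scikit-learn rather than the authors' actual solver: L1-regularized multinomial logistic regression, i.e. the maximum entropy model above, on synthetic 4-class data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical 4-class data (Churn / Appetency / Up-selling / Null).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 20))
y = rng.integers(0, 4, size=400)

# L1-regularized multinomial logistic regression (maximum entropy).
# The L1 penalty drives many weights to zero, which acts as feature selection.
clf = LogisticRegression(penalty="l1", solver="saga", C=1.0, max_iter=5000)
clf.fit(X, y)

proba = clf.predict_proba(X[:3])         # per-class probability estimates
selected = np.flatnonzero(np.any(clf.coef_ != 0, axis=0))
print(proba.round(3))
print("features kept:", selected)
```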

7  Max-Ent: Missing Values
Fill missing values with zeroes (numerical) or "missing" (categorical)
Add a binary feature to indicate "missing"

ID  Label  Num1  Num2  Cat1  Cat2     Mis1  Mis2
1   +1     10    0.2   A     D        0     0
2          2     0.4   A     missing  0     1
3          100   0.5   B     missing  0     1
4   +1     0     0.3   C     E        1     0
5          20    0.1   B     missing  0     1
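A minimal pandas sketch of this pre-processing on the slide's toy columns (labels omitted since they are not needed for the fill step):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame mirroring the slide's example.
df = pd.DataFrame({
    "Num1": [10, 2, 100, np.nan, 20],
    "Num2": [0.2, 0.4, 0.5, 0.3, 0.1],
    "Cat1": ["A", "A", "B", "C", "B"],
    "Cat2": ["D", None, None, "E", None],
})

# One binary indicator per column that has missing values.
df["Mis1"] = df["Num1"].isna().astype(int)
df["Mis2"] = df["Cat2"].isna().astype(int)

# Fill numerical missing values with 0 and categorical ones with "missing".
df["Num1"] = df["Num1"].fillna(0)
df["Cat2"] = df["Cat2"].fillna("missing")
print(df)
```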

8  Max-Ent: Numerical Features
Log scaling
Linear scaling to [0, 1]

ID  Num1  Num2     Log1   Log2     Lin1   Lin2
1   10    0.2      1.000  -0.699   0.100  0.400
2   2     0.4      0.301  -0.398   0.020  0.800
3   100   0.5      2.000  -0.301   1.000  1.000
4   0     0.3      0.000  -0.523   0.000  0.600
5   20    0.1      1.301  -1.000   0.200  0.200
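A minimal numpy sketch of the two scalings. The zero entry is left at 0 under log scaling to mirror the table; how the original system handled zeros is an assumption here:

```python
import numpy as np

num1 = np.array([10.0, 2.0, 100.0, 0.0, 20.0])

# Log scaling (base 10); zeros are kept at 0, mirroring the slide's table.
log1 = np.where(num1 > 0, np.log10(np.where(num1 > 0, num1, 1.0)), 0.0)

# Linear scaling to [0, 1] by dividing by the column maximum.
lin1 = num1 / num1.max()

print(np.round(log1, 3))   # [1.    0.301 2.    0.    1.301]
print(np.round(lin1, 3))   # [0.1   0.02  1.    0.    0.2  ]
```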

9  Max-Ent: Categorical Features
Add a binary feature for each category
Also for numerical features with < 5 distinct values

ID  Cat1  Cat2       A  B  C    D  E
1   A     D          1  0  0    1  0
2   A     missing    1  0  0    0  0
3   B     missing    0  1  0    0  0
4   C     E          0  0  1    0  1
5   B     missing    0  1  0    0  0
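A minimal pandas sketch of this one-hot expansion on the slide's toy columns. Dropping the "missing" indicator is an assumption to match the table, which keeps only the D and E columns:

```python
import pandas as pd

df = pd.DataFrame({
    "Cat1": ["A", "A", "B", "C", "B"],
    "Cat2": ["D", "missing", "missing", "E", "missing"],
})

# One binary indicator column per category value.  (get_dummies also creates
# a column for the "missing" category; drop it to match the slide's table.)
onehot = pd.get_dummies(df, columns=["Cat1", "Cat2"], dtype=int)
onehot = onehot.drop(columns=["Cat2_missing"])
print(onehot)
```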

10  Heterogeneous AdaBoost
Feature selection
– Inherent in boosting
Missing values
– Treated as a category
Numerical features
– Numerical tree base learner
Categorical features
– Categorical tree base learner
– Height limitation for complexity regularization
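A minimal stand-in, not the authors' heterogeneous variant: standard AdaBoost over height-limited decision trees with scikit-learn (>= 1.2 for the estimator parameter name), on synthetic, already-encoded features:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data; categorical columns are assumed to be encoded already,
# so this is plain AdaBoost rather than the heterogeneous version above.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 15))
y = rng.integers(0, 2, size=300)

# Height-limited trees as base learners keep each round's model simple.
base = DecisionTreeClassifier(max_depth=2)
boost = AdaBoostClassifier(estimator=base, n_estimators=200, learning_rate=0.5)
boost.fit(X, y)
print(boost.predict_proba(X[:3]).round(3))
```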

11  Selective Naïve Bayes
Feature selection
– Heuristic search [Boullé, 2007]
Missing values
– No assumption required
Numerical features
– Discretization [Hue and Boullé, 2007]
Categorical features
– Grouping [Hue and Boullé, 2007]
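A much-simplified stand-in, assuming scikit-learn: equal-frequency discretization followed by a categorical naive Bayes. This is not Boullé's MODL-based discretization, value grouping, or heuristic feature selection, only a sketch of the overall shape:

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import KBinsDiscretizer

# Hypothetical numerical data for one task.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = rng.integers(0, 2, size=300)

# Equal-frequency binning turns each numerical feature into a small set of
# ordinal categories, which the categorical naive Bayes then models directly.
disc = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="quantile")
X_binned = disc.fit_transform(X).astype(int)

nb = CategoricalNB()
nb.fit(X_binned, y)
print(nb.predict_proba(X_binned[:3]).round(3))
```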

12  Score Adjustment
Train a linear SVM to select one of the 4 classes
For each classifier, generate features using
– Scores of the 3 classes
– Entropy of the prediction scores
– Ranks of the 3 classes
Use the true labels for training
Output the adjusted scores
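A minimal sketch of the meta-feature construction and the linear SVM, assuming scikit-learn/scipy and hypothetical base-classifier scores; the exact feature layout used in the original system is an assumption:

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.svm import LinearSVC

# Hypothetical per-example class scores from one base classifier
# (columns: Churn, Appetency, Up-selling), plus true 4-class labels.
rng = np.random.default_rng(0)
scores = rng.random(size=(500, 3))
y = rng.integers(0, 4, size=500)

# Meta-features per example: raw scores, entropy of the normalized scores,
# and within-example ranks of the three scores.
p = scores / scores.sum(axis=1, keepdims=True)
entropy = -(p * np.log(p)).sum(axis=1, keepdims=True)
ranks = np.apply_along_axis(rankdata, 1, scores)
meta = np.hstack([scores, entropy, ranks])

svm = LinearSVC(C=1.0)
svm.fit(meta, y)
adjusted = svm.decision_function(meta)   # one adjusted score per class
print(adjusted.shape)                    # (500, 4)
```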

13  Score Ensemble
Refer to the adjusted scores from CV
Select the best 2 classifiers for each task
Average the ranks of the scores from the 2 classifiers
Output the averaged rank as the final result
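A minimal sketch of the rank averaging for one task, assuming the two selected classifiers' adjusted scores are available as numpy arrays (names hypothetical):

```python
import numpy as np
from scipy.stats import rankdata

# Hypothetical adjusted scores for one task from the two best classifiers.
rng = np.random.default_rng(0)
scores_a = rng.random(1000)
scores_b = rng.random(1000)

# Rank-average ensemble: AUC depends only on the ordering of the scores,
# so averaging ranks puts the two classifiers on a common scale.
final = (rankdata(scores_a) + rankdata(scores_b)) / 2.0
print(final[:5])
```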

14  Results
AUC results
– Train: CV
– Test: 10% testing
Appetency improves the most with post-processing

15  Other Methods We Have Tried
Rank logistic regression
– Maximizing AUC = maximizing pair-wise ranking accuracy
– Adopted pair-wise rank logistic regression
– Not as good as the other classifiers
Tree-based composite classifier
– Categorize examples by their missing-value pattern
– Train a classifier for each of the 85 groups
– Not significantly better than the other classifiers
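A minimal sketch of pair-wise rank logistic regression, assuming scikit-learn and hypothetical data: difference vectors between positive and negative examples are fed to a logistic model, so maximizing its accuracy on pairs maximizes pair-wise ranking accuracy (and hence AUC). This is only an illustration, not the authors' implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data for one task.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.integers(0, 2, size=100)

# Build difference vectors x_pos - x_neg for all positive/negative pairs
# and train a logistic model to score the positive example higher.
pos, neg = X[y == 1], X[y == 0]
pairs = np.array([p - n for p in pos for n in neg])
X_pair = np.vstack([pairs, -pairs])
y_pair = np.hstack([np.ones(len(pairs)), np.zeros(len(pairs))])

ranker = LogisticRegression(max_iter=1000, fit_intercept=False)
ranker.fit(X_pair, y_pair)
scores = X @ ranker.coef_.ravel()        # higher score = ranked higher
print(scores[:5].round(3))
```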

16  Conclusions
Identify 2 challenges in the data
– Noisy → feature selection + missing-value processing
– Heterogeneous → numerical + categorical pre-processing
Combine 3 classifiers to solve the challenges
– Maximum Entropy → convert data into numerical form
– Heterogeneous AdaBoost → combine heterogeneous information
– Selective Naïve Bayes → discover probabilistic relations
Observe 2 properties of the tasks
– Training → model design and post-processing
– 10% Testing → overfitting prevention using CV
Thanks!

17  References
[Boullé, 2007] M. Boullé. Compression-based averaging of selective naive Bayes classifiers. Journal of Machine Learning Research, 8:1659–1685, 2007.
[Hue and Boullé, 2007] C. Hue and M. Boullé. A new probabilistic approach in rank regression with optimal Bayesian partitioning. Journal of Machine Learning Research, 8:2727–2754, 2007.

