An Ensemble of Three Classifiers for KDD Cup 2009: Expanded Linear Model, Heterogeneous Boosting, and Selective Naive Bayes
Members: Hung-Yi Lo, Kai-Wei Chang, Shang-Tse Chen, Tsung-Hsien Chiang, Chun-Sung Ferng, Cho-Jui Hsieh, Yi-Kuang Ko, Tsung-Ting Kuo, Hung-Che Lai, Ken-Yi Lin, Chia-Hsuan Wang, Hsiang-Fu Yu, Chih-Jen Lin, Hsuan-Tien Lin, Shou-de Lin
Team: CSIE Department, National Taiwan University

Observations on Training
Positive labels of the 3 tasks are mutually exclusive
– Transform to 4-class classification (C / A / U / Null); see the sketch below
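A minimal sketch of the label transformation, assuming three aligned label vectors with values in {-1, +1} (the variable names are hypothetical, not from the slides):

```python
# Merge the three binary label vectors into one 4-class target, relying on the
# observation that positives of the three tasks are mutually exclusive.
import numpy as np

def to_multiclass(churn, appetency, upselling):
    y = np.full(len(churn), "Null", dtype=object)   # default: negative for all tasks
    y[np.asarray(churn) == 1] = "C"                 # churn positive
    y[np.asarray(appetency) == 1] = "A"             # appetency positive
    y[np.asarray(upselling) == 1] = "U"             # up-selling positive
    return y

y4 = to_multiclass([-1, 1, -1, -1], [-1, -1, 1, -1], [-1, -1, -1, -1])
print(y4)   # ['Null' 'C' 'A' 'Null']
```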

Observations on 10% Testing
Cross-validation (CV) results differ significantly from the 10% testing results (online feedback) for certain classifiers
– Use CV instead of the 10% results to tune parameters (see the sketch below)
(Slide chart: validation vs. testing AUC for Churn, Appetency, and Up-selling; values not recoverable from the transcript)
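As a hedged illustration of CV-based tuning (the estimator and parameter grid below are placeholders, not the team's actual settings), one could select a penalty parameter by stratified cross-validated AUC rather than by the 10% online feedback:

```python
# Pick the parameter that maximizes mean cross-validated AUC on the training set.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

def tune_C(X, y, candidates=(0.01, 0.1, 1.0, 10.0)):
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    aucs = {}
    for C in candidates:
        clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
        aucs[C] = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc").mean()
    best_C = max(aucs, key=aucs.get)
    return best_C, aucs
```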

Challenges
1. Noisy
– Redundant or irrelevant features: feature selection
– Significant amount of missing values
2. Heterogeneous
– Number of distinct numerical values: 1 to ~50,000
– Number of distinct categorical values: 1 to ~15,000
→ Drives the choice of classifiers and pre-processing methods!

System Overview
– Pre-processing: missing values, numerical features, categorical features
– Feature selection
– Classification: Maximum Entropy, Heterogeneous AdaBoost, Selective Naïve Bayes
– Post-processing: score adjustment, score ensemble

Maximum Entropy
Transform to joint multi-class classification
– Maximum entropy model → probability estimation
Feature selection
– L1-regularized solver
Notation: x = example, y = label, w = model weights, k = number of classes, n = number of iterations, l = number of examples, C = penalty parameter
(The slide's formula image, labeled "L1 Regularization" and "Maximum Entropy", is not preserved in the transcript; a reconstruction follows below)
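The objective appears only as an image in the original slide, so the following is a hedged reconstruction using the notation above: the standard L1-regularized maximum entropy (multinomial logistic regression) form, which may differ in detail from the exact expression shown.

```latex
% Hedged reconstruction of the L1-regularized maximum entropy objective:
\min_{w_1,\dots,w_k} \; \sum_{j=1}^{k} \lVert w_j \rVert_1
  \;+\; C \sum_{i=1}^{l} \log\!\left(
        \frac{\sum_{j=1}^{k} \exp\bigl(w_j^{\top} x_i\bigr)}
             {\exp\bigl(w_{y_i}^{\top} x_i\bigr)} \right),
\qquad
P(y = j \mid x) \;=\;
  \frac{\exp\bigl(w_j^{\top} x\bigr)}{\sum_{j'=1}^{k} \exp\bigl(w_{j'}^{\top} x\bigr)}
```

The probability estimate on the right is what makes the model usable both for the 4-class decision and for per-task scoring.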

Max-Ent: Missing Values
Fill missing values with zeroes or “missing”
Add a binary feature to indicate “missing”
(Example table with columns ID, Label, Num1, Num2, Cat1, Cat2; cell values not recoverable from the transcript. Sketch below.)
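A minimal sketch of this pre-processing step (column names are hypothetical; filling numerical gaps with 0 and categorical gaps with a "missing" token follows the bullets above):

```python
import numpy as np
import pandas as pd

def handle_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing values and add a binary "is missing" indicator per column."""
    out = df.copy()
    for col in df.columns:
        out[col + "_missing"] = df[col].isna().astype(int)    # 1 where the value was missing
        if np.issubdtype(df[col].dtype, np.number):
            out[col] = df[col].fillna(0)                      # numerical: fill with zero
        else:
            out[col] = df[col].fillna("missing")              # categorical: explicit category
    return out

# Hypothetical example mirroring the slide's Num/Cat columns:
data = pd.DataFrame({"Num1": [2.0, None, 0.4], "Cat1": ["A", None, "B"]})
print(handle_missing(data))
```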

Max-Ent: Numerical Feature
Log scaling
Linear scaling to [0, 1]
(Example table: Num1/Num2 alongside their log-scaled and [0, 1]-scaled versions; cell values not recoverable from the transcript. Sketch below.)
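A short sketch of both scalings (how non-positive values are log-scaled is an assumption, since the slide does not specify it):

```python
import numpy as np

def log_scale(x: np.ndarray) -> np.ndarray:
    """Log scaling; log1p on the magnitude keeps zeros and negatives defined (assumed handling)."""
    return np.sign(x) * np.log1p(np.abs(x))

def linear_scale(x: np.ndarray) -> np.ndarray:
    """Linear (min-max) scaling of one feature column to [0, 1]."""
    lo, hi = x.min(), x.max()
    if hi == lo:
        return np.zeros_like(x, dtype=float)   # constant column
    return (x - lo) / (hi - lo)

num1 = np.array([2.0, 0.0, 30000.0, 7.5])      # hypothetical feature column
print(log_scale(num1))
print(linear_scale(num1))
```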

Max-Ent: Categorical Feature
Add a binary feature for each category (one-hot encoding)
Also applied to numerical features with <5 distinct values
(Example table: Cat1 ∈ {A, B, C} and Cat2 ∈ {D, E, missing} expanded into one binary column per category; see the sketch below)
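A small sketch of the one-hot expansion, including low-cardinality numerical columns (the example frame and column names are hypothetical):

```python
import pandas as pd

# Hypothetical frame mirroring the slide's example categories.
df = pd.DataFrame({
    "Cat1": ["A", "A", "B", "C", "B"],
    "Cat2": ["D", "missing", "missing", "E", "missing"],
    "Num3": [1, 2, 1, 2, 1],          # numerical but only 2 distinct values
})

# One binary column per category; numerical columns with fewer than 5 distinct
# values are treated the same way, as described above.
low_card_numeric = [c for c in df.select_dtypes(include="number").columns if df[c].nunique() < 5]
categorical = list(df.select_dtypes(exclude="number").columns)
encoded = pd.get_dummies(df, columns=categorical + low_card_numeric)
print(encoded.head())
```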

Heterogeneous AdaBoost
Feature selection
– Inherent
Missing value
– Treated as a category
Numerical feature
– Numerical tree base learner
Categorical feature
– Categorical tree base learner
– Height limitation for complexity regularization
(An illustrative sketch follows below)
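The slide only names the idea, so the following is an illustrative sketch rather than the team's implementation: a discrete AdaBoost loop that, in each round, fits one height-limited tree on the numerical block and one on the (one-hot-encoded) categorical block, and keeps whichever has the lower weighted error.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def heterogeneous_adaboost(X_num, X_cat, y, n_rounds=100, max_depth=2):
    """y in {-1, +1}; X_num / X_cat are the numerical and (encoded) categorical feature blocks."""
    y = np.asarray(y)
    n = len(y)
    w = np.full(n, 1.0 / n)                     # uniform example weights
    learners = []                               # list of (alpha, tree, which_block)
    for _ in range(n_rounds):
        candidates = []
        for X, block in ((X_num, "num"), (X_cat, "cat")):
            tree = DecisionTreeClassifier(max_depth=max_depth)   # height limitation
            tree.fit(X, y, sample_weight=w)
            pred = tree.predict(X)
            err = np.sum(w * (pred != y))
            candidates.append((err, tree, block, pred))
        err, tree, block, pred = min(candidates, key=lambda c: c[0])  # pick better block
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * pred)          # re-weight: boost misclassified examples
        w /= w.sum()
        learners.append((alpha, tree, block))
    return learners

def predict_score(learners, X_num, X_cat):
    score = np.zeros(len(X_num))
    for alpha, tree, block in learners:
        X = X_num if block == "num" else X_cat
        score += alpha * tree.predict(X)
    return score
```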

Selective Naïve Bayes
Feature selection
– Heuristic search [Boullé, 2007]
Missing value
– No assumption required
Numerical feature
– Discretization [Hue and Boullé, 2007]
Categorical feature
– Grouping [Hue and Boullé, 2007]

Score Adjustment
Train a linear SVM to select one of the 4 classes
For each classifier, generate features using
– Scores of the 3 classes
– Entropy of the prediction scores
– Ranks of the 3 classes
Use the true label for training
Output the adjusted scores (see the sketch below)
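A hedged sketch of the meta-feature construction (the softmax normalization and exact feature layout are assumptions; `scores_per_clf` and `y_4class` are hypothetical names):

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.svm import LinearSVC

def meta_features(scores):
    """scores: (n_examples, 3) prediction scores for C / A / U from one classifier."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)                        # softmax (assumed)
    ent = -(probs * np.log(probs + 1e-12)).sum(axis=1, keepdims=True)  # prediction entropy
    ranks = np.apply_along_axis(rankdata, 1, scores)                 # per-example class ranks
    return np.hstack([scores, ent, ranks])

def fit_adjuster(scores_per_clf, y_4class):
    """scores_per_clf: list of (n, 3) arrays, one per base classifier; y_4class: true labels."""
    X_meta = np.hstack([meta_features(s) for s in scores_per_clf])
    return LinearSVC().fit(X_meta, y_4class)
```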

Score Ensemble
Refer to the adjusted scores from CV
Select the best 2 classifiers for each task
Average the ranks of the scores from the 2 classifiers
Output the averaged rank as the final result (see the sketch below)
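A minimal sketch of rank averaging for a single task; ranks put the two classifiers' scores on a common scale while preserving the within-classifier ordering that AUC measures (the score arrays are hypothetical):

```python
import numpy as np
from scipy.stats import rankdata

def rank_average(score_a, score_b):
    """Average the per-example ranks of two classifiers' scores."""
    return (rankdata(score_a) + rankdata(score_b)) / 2.0

final_scores = rank_average(np.array([0.9, 0.2, 0.5]), np.array([12.0, 3.0, 7.5]))
print(final_scores)   # [3. 1. 2.] — both classifiers agree on the ordering here
```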

Results
AUC results
– Train: cross-validation
– Test: 10% testing
Appetency improves most with post-processing
(Slide table of AUC values not recoverable from the transcript)

Other methods we have tried
Rank logistic regression
– Maximizing AUC = maximizing pair-wise ranking accuracy
– Adopt pair-wise rank logistic regression (see the loss sketch below)
– Not as good as the other classifiers
Tree-based composite classifier
– Categorize examples by their missing-value pattern
– Train a classifier for each of the 85 groups
– Not significantly better than the other classifiers
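For reference, a hedged sketch of a pair-wise rank logistic regression objective (a RankNet-style pairwise logistic loss; the team's exact formulation may differ):

```latex
% Pairwise logistic loss over positive examples x_i^{+} and negative examples x_j^{-}:
\min_{w} \; \sum_{i:\, y_i = +1} \; \sum_{j:\, y_j = -1}
    \log\!\left( 1 + \exp\!\bigl( -\, w^{\top} \bigl(x_i^{+} - x_j^{-}\bigr) \bigr) \right)
% AUC is the fraction of (positive, negative) pairs ranked correctly,
% so minimizing this pairwise loss directly targets AUC.
```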

Conclusions
Identify 2 challenges in the data
– Noisy → feature selection + missing-value processing
– Heterogeneous → numerical + categorical pre-processing
Combine 3 classifiers to solve the challenges
– Maximum Entropy → convert data into numerical features
– Heterogeneous AdaBoost → combine heterogeneous information
– Selective Naïve Bayes → discover probabilistic relations
Observe 2 properties of the tasks
– Training → model design and post-processing
– 10% Testing → overfitting prevention using CV
Thanks!

References
[Boullé, 2007] M. Boullé. Compression-based averaging of selective naive Bayes classifiers. Journal of Machine Learning Research, 8:1659–1685, 2007.
[Hue and Boullé, 2007] C. Hue and M. Boullé. A new probabilistic approach in rank regression with optimal Bayesian partitioning. Journal of Machine Learning Research, 8:2727–2754, 2007.