Implementing AdaBoost


1 Implementing AdaBoost
Jonathan Boardman

2 Boosting: The General Idea
Boosting is a bit like a game of “Guess Who?”: each question on its own provides only limited insight, much like a weak learner, but many questions taken together allow for much stronger predictions.

3 AdaBoost Algorithm Overview
Choose how many iterations to run. For each iteration:
Train a classifier on a weighted sample (weights initially equal) to obtain a weak hypothesis
Generate a strength for this hypothesis based on how well the learner did
Reweight the sample: incorrectly classified observations get upweighted, correctly classified observations get downweighted
The final classifier is a weighted sum of the weak hypotheses (see the update sketch below).
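A minimal sketch of one boosting round's bookkeeping, assuming labels in {-1, +1} and the standard AdaBoost formulas for hypothesis strength and reweighting (the slides do not spell these out):

import numpy as np

def boost_round(y_true, y_pred, w):
    # Weighted error of the weak hypothesis under the current weights
    err = np.sum(w * (y_pred != y_true)) / np.sum(w)
    # Hypothesis strength: larger when the weak learner does better
    alpha = 0.5 * np.log((1 - err) / err)
    # Upweight mistakes, downweight correct predictions, then renormalize
    w = w * np.exp(-alpha * y_true * y_pred)
    return alpha, w / np.sum(w)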

4 The Classifier: Support Vector Machine
svm.SVC from sklearn, used with its default settings:
RBF kernel
Gamma equal to 1 / n_features
C set to 1
Gamma is roughly the inverse of the “radius of influence” of a single training example. C acts as a regularization parameter: a lower C yields a larger margin and a simpler decision function, while a higher C yields a smaller margin and a more complex decision function (see the snippet below).
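A one-line sketch of that configuration; gamma equal to 1 / n_features was sklearn's old 'auto' default, so recent versions need it requested explicitly:

from sklearn import svm

# RBF kernel and C=1 are the defaults; gamma='auto' gives 1 / n_features
clf = svm.SVC(kernel='rbf', C=1.0, gamma='auto')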

5 The Dataset: Credit Card Fraud
284,807 credit card transactions collected over a period of two days, of which only 492 were fraudulent. 30 predictor variables: Time, Amount, and 28 principal components, ‘V1’ through ‘V28’. Target variable: Class, a binary label with Fraud = 1 and Not Fraud = 0.
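A minimal loading sketch, assuming the standard Kaggle/ULB CSV; the file name is an assumption, as the slides do not give one:

import pandas as pd

df = pd.read_csv('creditcard.csv')  # file name assumed, not from the slides
print(df['Class'].value_counts())   # 284,315 non-fraud vs. 492 fraud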

6 Removing Class Imbalance: Subsetting, Undersampling, and Shuffling the Data
Undersampled the majority class (non-fraud): a random sample of 492 drawn without replacement from the observations with Class = 0
Concatenated the 492 fraud and the 492 sampled non-fraud observations to create a balanced dataset
Shuffled the observations (see the sketch below)
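A pandas sketch of this balancing step, assuming the df loaded above; the random_state values are illustrative, not from the slides:

# 492 non-fraud rows drawn without replacement to match the 492 fraud rows
fraud = df[df['Class'] == 1]
non_fraud = df[df['Class'] == 0].sample(n=492, replace=False, random_state=0)

# Concatenate, then shuffle the balanced dataset
balanced = pd.concat([fraud, non_fraud]).sample(frac=1, random_state=0)
balanced = balanced.reset_index(drop=True)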

7 Further Preprocessing
Drop all predictor fields except principal components V1 and V2 (note: in just these two dimensions, the data is not linearly separable)
Separate the label from the predictor data
Apply z-score normalization to V1 and V2
Split the data into 5 disjoint folds (see the sketch below)
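A sketch of these steps, continuing from the balanced frame above; StandardScaler and KFold are standard sklearn tools, though the slides may have coded the steps by hand:

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import KFold

# Keep only V1 and V2 as predictors; Class is the label
X = balanced[['V1', 'V2']].values
y = 2 * balanced['Class'].values - 1   # remap {0, 1} to {-1, +1} for boosting

# z-score normalization: zero mean, unit variance per column
X = StandardScaler().fit_transform(X)

# 5 disjoint folds (the data was already shuffled above)
folds = list(KFold(n_splits=5).split(X))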

8 The Code
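The slide's code appears as an image and is not in the transcript; below is a minimal reconstruction of the training loop described on slide 3, with SVM weak hypotheses. Every name and detail here is an assumption:

import numpy as np
from sklearn import svm

def adaboost_fit(X, y, n_iterations=18):
    # Assumes labels y in {-1, +1}; sample weights start out equal
    n = len(y)
    w = np.full(n, 1.0 / n)
    learners, alphas = [], []
    for _ in range(n_iterations):
        # Weak hypothesis: SVM trained on the weighted sample
        clf = svm.SVC(kernel='rbf', C=1.0, gamma='auto')
        clf.fit(X, y, sample_weight=w)
        pred = clf.predict(X)
        # Weighted error and hypothesis strength
        err = np.sum(w * (pred != y)) / np.sum(w)
        if err >= 0.5:              # weak learner no better than chance; stop
            break
        err = max(err, 1e-10)       # guard against a perfect fit
        alpha = 0.5 * np.log((1 - err) / err)
        # Upweight mistakes, downweight correct predictions, renormalize
        w *= np.exp(-alpha * y * pred)
        w /= np.sum(w)
        learners.append(clf)
        alphas.append(alpha)
    return learners, alphas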

9 The Code (Cont.)
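Also an image in the original; a matching sketch of the final weighted-sum classifier and its 5-fold CV accuracy, reusing the folds and the adaboost_fit function assumed above:

def adaboost_predict(learners, alphas, X):
    # Final classifier: sign of the weighted sum of the weak hypotheses
    votes = sum(a * clf.predict(X) for clf, a in zip(learners, alphas))
    return np.sign(votes)

# Cross-validated accuracy of the boosted ensemble
accs = []
for train_idx, test_idx in folds:
    learners, alphas = adaboost_fit(X[train_idx], y[train_idx])
    preds = adaboost_predict(learners, alphas, X[test_idx])
    accs.append(np.mean(preds == y[test_idx]))
print('AdaBoost-ed SVM 5-fold CV accuracy:', np.mean(accs))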

10 Results: Boosting is Better than a Lone SVM
Lone SVM 5-fold CV accuracy: 0.817
AdaBoost-ed SVM 5-fold CV accuracy (18 iterations):

