Implementing AdaBoost
Jonathan Boardman
Boosting: The General Idea
Kind of like a game of "Guess Who?":
- Each question on its own provides only limited insight, much like a weak learner
- Many questions taken together allow for much stronger predictions
AdaBoost Algorithm Overview
- Choose how many iterations to run
- For each iteration:
  - Train a classifier on a weighted sample (weights initially equal) to obtain a weak hypothesis
  - Generate a strength for this hypothesis based on how well the learner did
  - Reweight the sample: incorrectly classified observations get upweighted; correctly classified observations get downweighted
- Final classifier is a weighted sum of the weak hypotheses
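The reweighting step above can be sketched in a few lines of NumPy. This is a generic discrete-AdaBoost update (labels in {-1, +1}), not the presenter's exact code; the function name is hypothetical.

```python
import numpy as np

def adaboost_round(weights, y_true, y_pred):
    """One AdaBoost reweighting step (discrete AdaBoost, labels in {-1, +1})."""
    # Weighted error of the weak hypothesis
    miss = (y_pred != y_true)
    err = np.sum(weights[miss]) / np.sum(weights)
    # Hypothesis strength (alpha): large when the weighted error is low
    alpha = 0.5 * np.log((1 - err) / err)
    # Upweight misclassified, downweight correctly classified observations
    weights = weights * np.exp(-alpha * y_true * y_pred)
    weights = weights / weights.sum()  # renormalize to a distribution
    return weights, alpha

# Toy usage: 4 observations with equal initial weights, one misclassified
w = np.full(4, 0.25)
y = np.array([1, 1, -1, -1])
pred = np.array([1, 1, -1, 1])  # last observation is wrong
w, alpha = adaboost_round(w, y, pred)
```

After one round the misclassified observation carries more weight than any correctly classified one, which is exactly what drives the next weak learner to focus on it.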
The Classifier: Support Vector Machine
- svm.SVC from sklearn, used with default settings:
  - RBF kernel
  - Gamma equal to 1 / n_features
  - C set to 1
- Gamma is roughly the inverse of the "radius of influence" of a single training example
- C acts as a regularization parameter:
  - Lower C -> larger margin -> simpler decision function
  - Higher C -> smaller margin -> more complex decision function
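A minimal sketch of the classifier as described on the slide. Note that `gamma="auto"` reproduces the old sklearn default of 1 / n_features (newer versions default to `gamma="scale"`); the toy data here is illustrative, and `sample_weight` is what lets AdaBoost train the weak learner on a weighted sample.

```python
import numpy as np
from sklearn import svm

# Default-settings SVC per the slide: RBF kernel, C = 1, gamma = 1/n_features
clf = svm.SVC(kernel="rbf", C=1.0, gamma="auto")

# Tiny illustrative fit with per-observation weights (toy data, not the
# credit card set)
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.9]])
y = np.array([0, 1, 0, 1])
clf.fit(X, y, sample_weight=np.full(4, 0.25))
preds = clf.predict(X)
```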
The Dataset: Credit Card Fraud
- 284,807 credit card transactions collected over a period of two days, of which only 492 were fraudulent
- 30 predictor variables: Time, Amount, and 28 principal components ('V1' through 'V28')
- Target variable: Class (binary: Fraud = 1, Not Fraud = 0)
Removing Class Imbalance: Subsetting, Undersampling, and Shuffling the Data
- Undersampled the majority class (non-fraud): random sample of 492 without replacement from observations with Class = 0
- Concatenated the 492 fraud and the 492 sampled non-fraud observations to create a balanced dataset
- Shuffled the observations
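These steps can be sketched with pandas. The function name and random seed are assumptions added for reproducibility, and a small synthetic frame stands in for the real data:

```python
import numpy as np
import pandas as pd

def balance_by_undersampling(df, target="Class", seed=0):
    """Undersample the majority class to match the minority class, then shuffle.

    Mirrors the slide's steps; function name and seed are hypothetical.
    """
    minority = df[df[target] == 1]
    majority = df[df[target] == 0].sample(
        n=len(minority), replace=False, random_state=seed  # without replacement
    )
    balanced = pd.concat([minority, majority])              # concatenate
    return balanced.sample(frac=1, random_state=seed).reset_index(drop=True)  # shuffle

# Synthetic stand-in for the credit card data (the real set has 492 frauds)
rng = np.random.RandomState(0)
toy = pd.DataFrame({"V1": rng.randn(1000), "Class": [1] * 50 + [0] * 950})
balanced = balance_by_undersampling(toy)
```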
Further Preprocessing
- Drop all predictor fields except principal components V1 and V2 (NOTE: in just these 2 dimensions, the data is not linearly separable)
- Separate label and predictor data
- Apply z-score normalization to V1 and V2
- Split the data into 5 disjoint folds
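A sketch of this preprocessing with sklearn, assuming a pandas DataFrame with the dataset's column names; the `balanced` frame here is synthetic so the snippet runs on its own:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import KFold

# Synthetic stand-in for the undersampled, shuffled dataset
rng = np.random.RandomState(0)
balanced = pd.DataFrame({
    "Time": rng.rand(100), "Amount": rng.rand(100),
    "V1": rng.randn(100), "V2": rng.randn(100),
    "Class": rng.randint(0, 2, 100),
})

X = balanced[["V1", "V2"]].to_numpy()   # drop all predictors except V1, V2
y = balanced["Class"].to_numpy()        # separate label from predictors
X = StandardScaler().fit_transform(X)   # z-score normalization

folds = list(KFold(n_splits=5).split(X))  # 5 disjoint folds
```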
The Code
The Code (Cont.)
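The code screenshots from these slides do not survive the transcript. The following is a sketch of how the algorithm outlined earlier might be implemented with sklearn's SVC, assuming labels in {-1, +1}; the function name is hypothetical and this is not the presenter's exact code.

```python
import numpy as np
from sklearn.svm import SVC

def adaboost_svm(X, y, n_iter=18):
    """AdaBoost with default-settings SVC weak learners (y in {-1, +1})."""
    n = len(X)
    w = np.full(n, 1.0 / n)                 # weights initially equal
    learners, alphas = [], []
    for _ in range(n_iter):
        clf = SVC(kernel="rbf", C=1.0, gamma="auto")
        clf.fit(X, y, sample_weight=w)      # train on the weighted sample
        pred = clf.predict(X)
        err = np.sum(w[pred != y])          # weighted error (w sums to 1)
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)  # hypothesis strength
        w = w * np.exp(-alpha * y * pred)      # reweight the sample
        w = w / w.sum()
        learners.append(clf)
        alphas.append(alpha)

    def predict(X_new):
        # Final classifier: sign of the weighted sum of weak hypotheses
        scores = sum(a * h.predict(X_new) for a, h in zip(alphas, learners))
        return np.sign(scores)

    return predict

# Toy usage on two well-separated clusters (synthetic, not the fraud data)
rng = np.random.RandomState(1)
X = np.vstack([rng.randn(20, 2) + 3, rng.randn(20, 2) - 3])
y = np.array([1] * 20 + [-1] * 20)
predict = adaboost_svm(X, y, n_iter=5)
train_acc = (predict(X) == y).mean()
```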
Results: Boosting is Better than a Lone SVM
- Lone SVM 5-fold CV accuracy: 0.817
- AdaBoosted SVM 5-fold CV accuracy (18 iterations):
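The lone-SVM baseline comparison can be reproduced with sklearn's built-in cross-validation. Synthetic data stands in here; the slide's 0.817 comes from the real V1/V2 features and is not reproduced by this sketch.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# 5-fold CV accuracy for a default-settings SVC on synthetic stand-in data
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2) + 2, rng.randn(50, 2) - 2])
y = np.array([1] * 50 + [0] * 50)
scores = cross_val_score(SVC(kernel="rbf", C=1.0, gamma="auto"),
                         X, y, cv=5, scoring="accuracy")
mean_acc = scores.mean()
```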