Talking Data Click Fraud Detection

1 Talking Data Click Fraud Detection
Andrew Cudworth 04/23/18

2 Introduction
TalkingData: Chinese data service company (70% of Chinese mobile devices); builds IP blacklists
“3 billion clicks per day, 90% potentially fraudulent”
FAKE!
Objective: Does Click = Download?
Kaggle data: 184M training rows, 100k sample for modeling
All data is anonymized
Metric: ROC_AUC score

3 EDA – The Data!
Data sizes: 100k sample, 187M full data, 18.8M predictions
Workflow diagram labels: MODEL, Predict, Apply, Submit, Score + Rank

4 EDA – What is Unique?
Unique Count
ip 34857
app 161
device 100
OS 130
Channel
2 OS make up 45% of traffic (iOS? Android?)
***100k training sample represented
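The unique-value counts on this slide can be reproduced with a few lines of code. The sketch below is a minimal illustration using only the Python standard library; the rows in `SAMPLE_CSV` are made up, the lowercase column names are assumed to match the sample file, and `top_share` is a hypothetical helper for the "share of traffic from the top k values" question.

```python
import csv
import io
from collections import Counter

# Made-up stand-in rows; the real 100k sample is not reproduced here.
SAMPLE_CSV = """ip,app,device,os,channel
1001,3,1,13,280
1001,3,1,19,280
1002,12,1,13,245
1003,3,2,13,280
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Number of distinct values per column.
unique_counts = {col: len({row[col] for row in rows}) for col in rows[0]}


def top_share(values, k):
    """Fraction of records covered by the k most frequent values."""
    counts = Counter(values)
    covered = sum(n for _, n in counts.most_common(k))
    return covered / len(values)


# Share of traffic from the single most common OS code in the toy rows.
os_top1_share = top_share([row["os"] for row in rows], k=1)

print(unique_counts)
print(os_top1_share)
```

Calling `top_share` with `k=2` on the real OS column would be one way to check a claim like "2 OS make up 45% of traffic".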

5 EDA – Unique Continued

6 EDA – Data Imbalance
Very unbalanced data: 227 attributed values in 100k total records
Null accuracy: hard to improve
.778 null ROC_AUC with logistic regression: room to improve
.5000 Kaggle score if you submit all 0
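The two baselines on this slide follow from simple arithmetic. The sketch below, written in plain Python as an illustration rather than the presenter's code, computes the null accuracy for 227 positives in 100k records and shows why a constant prediction scores 0.5 on ROC AUC: every positive/negative pair is a tie. The `roc_auc` helper is a hypothetical pairwise implementation meant for small inputs only.

```python
def null_accuracy(n_positive, n_total):
    """Accuracy obtained by always predicting the majority class (0)."""
    return 1 - n_positive / n_total


def roc_auc(labels, scores):
    """Pairwise ROC AUC: P(positive scored above negative), ties count 0.5.

    Quadratic in the number of records, so only suitable for toy inputs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Figures from the slide: 227 attributed clicks in 100k records.
acc = null_accuracy(227, 100_000)

# Toy labels with an all-zero submission: every pair ties.
toy_labels = [1, 0, 0, 0, 1, 0]
all_zero_auc = roc_auc(toy_labels, [0] * len(toy_labels))

print(round(acc, 5))
print(all_zero_auc)
```

An accuracy near 99.8% from a model that never predicts a download is why accuracy is hard to improve on here, and why the competition metric is ROC AUC instead.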

7 Modeling Process
Models: KNN, Decision Tree, Logistic Regression
Features/Transformations: time included, upsample, downsample
Review
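The transformations listed on this slide can be sketched as follows. This is a minimal standard-library illustration, not the presenter's pipeline: the column names `click_time` and `is_attributed`, the timestamp format, and the choice of hour and weekday as time features are all assumptions, and the rows are made up.

```python
import random
from datetime import datetime


def add_time_features(row):
    """Return a copy of row with hour and weekday parsed from click_time."""
    ts = datetime.strptime(row["click_time"], "%Y-%m-%d %H:%M:%S")
    return {**row, "hour": ts.hour, "weekday": ts.weekday()}


def split_by_size(rows, label_key):
    """Return (minority, majority) lists of rows for a binary label."""
    pos = [r for r in rows if r[label_key] == 1]
    neg = [r for r in rows if r[label_key] == 0]
    return (pos, neg) if len(pos) < len(neg) else (neg, pos)


def upsample(rows, label_key, seed=0):
    """Repeat minority rows (with replacement) until the classes balance."""
    rng = random.Random(seed)
    minority, majority = split_by_size(rows, label_key)
    extra = rng.choices(minority, k=len(majority) - len(minority))
    return majority + minority + extra


def downsample(rows, label_key, seed=0):
    """Keep a random subset of majority rows equal in size to the minority."""
    rng = random.Random(seed)
    minority, majority = split_by_size(rows, label_key)
    return minority + rng.sample(majority, k=len(minority))


# Made-up rows: one attributed click out of six.
rows = [
    {"click_time": "2017-11-07 09:30:38", "is_attributed": int(i == 0)}
    for i in range(6)
]
featured = add_time_features(rows[0])
up = upsample(rows, "is_attributed")
down = downsample(rows, "is_attributed")
print(featured["hour"], len(up), len(down))
```

Resampling is usually applied to the training split only, so that the held-out score still reflects the original class balance.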

8 Modeling Results – Lots of choices, lots of overfitting

9 Conclusions / Further Work
Null score on Kaggle is .500
Selected model (Random Forest GS) score: .5122
Leaderboard 1st place: .9827
Further investigation:
Overfitting appears to be a problem
Spend more time tuning parameters
Minimize train/test split delta
Explore attribution time vs. click time relationships
IP addresses in test data not in sample data
Scale to full data
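One of the further-investigation items, IP addresses in the test data that never appear in the sample, can be quantified with a set difference. The sketch below is a hypothetical check using made-up IP codes; `unseen_fraction` is an illustrative helper, not part of the original analysis.

```python
def unseen_fraction(train_ips, test_ips):
    """Fraction of distinct test IPs that never occur in the training IPs."""
    train_set = set(train_ips)
    test_set = set(test_ips)
    unseen = test_set - train_set
    return len(unseen) / len(test_set)


# Made-up anonymized IP codes.
train_ips = [1, 2, 3, 3]
test_ips = [2, 3, 4, 5]

fraction = unseen_fraction(train_ips, test_ips)
print(fraction)
```

A high fraction would suggest that a model leaning on the raw `ip` value learned from the sample may not transfer well to the test set.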

