Talking Data Click Fraud Detection

Talking Data Click Fraud Detection. Andrew Cudworth, 04/23/18.

Introduction. TalkingData is a Chinese data service company that covers 70% of Chinese mobile devices and builds IP blacklists. Objective: Does Click = Download? Kaggle data: 184M training rows, with a 100k sample used for modeling. All data is anonymized. Submissions are scored by ROC_AUC. “3 billion clicks per day, 90% potentially fraudulent.” FAKE!

EDA – The Data! Data sizes: 100k sample, 187M full data, 18.8M predictions. Workflow: model, predict, apply, submit, then score + rank.

EDA – What is Unique? (100k training sample.) Unique counts: ip 34857, app 161, device 100, OS 130, Channel (count missing). 2 OS make up 45% of traffic: iOS? Android?
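The unique-count check above can be sketched in a few lines. This is a minimal, dependency-free illustration, not the presenter's code: the tiny `rows` list is made up, and the helper names `unique_counts` and `top_share` are hypothetical.

```python
from collections import Counter

# Hypothetical miniature of the click log; the real sample has 100k rows.
rows = [
    {"ip": 1, "app": 3, "device": 1, "os": 13, "channel": 379},
    {"ip": 2, "app": 3, "device": 1, "os": 19, "channel": 379},
    {"ip": 1, "app": 12, "device": 1, "os": 13, "channel": 245},
    {"ip": 3, "app": 3, "device": 2, "os": 17, "channel": 379},
]

def unique_counts(rows):
    """Number of distinct values seen in each column."""
    columns = rows[0].keys()
    return {col: len({row[col] for row in rows}) for col in columns}

def top_share(rows, column, k):
    """Fraction of rows covered by the k most frequent values of a column."""
    counts = Counter(row[column] for row in rows)
    covered = sum(n for _, n in counts.most_common(k))
    return covered / len(rows)

uniques = unique_counts(rows)        # distinct values per column
os_share = top_share(rows, "os", 2)  # traffic share of the two most common OS codes
```

The same `top_share` call, run on the real sample, is the kind of check that would surface a statement such as "2 OS make up 45% of traffic".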

EDA – Unique Continued

EDA – Data Imbalance. Very unbalanced data: 227 attributed values in 100k total records. Null accuracy is .99760003495 (hard to improve). Null ROC_AUC with logistic regression is .778 (room to improve). Kaggle score if you submit all 0 is .5000.
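To see why accuracy is uninformative here and why an all-zero submission scores .5000, the two baselines can be sketched as follows. This is a pure-Python sketch under stated assumptions: the helper names are hypothetical, the labels are rebuilt from the 227-in-100k counts alone, and the small `demo_labels` list is made up. Null accuracy computed this way on the whole sample differs slightly from the slide's figure, which may have come from a different split.

```python
def null_accuracy(labels):
    """Accuracy of always predicting the majority class."""
    positives = sum(labels)
    majority = max(positives, len(labels) - positives)
    return majority / len(labels)

def roc_auc(labels, scores):
    """ROC AUC as the chance that a positive outranks a negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# 227 positives in 100k rows, as on the slide; row order does not matter here.
labels = [1] * 227 + [0] * (100_000 - 227)
baseline_accuracy = null_accuracy(labels)

# A small made-up label set keeps the pairwise AUC loop cheap.
demo_labels = [1] * 3 + [0] * 97
all_zero_auc = roc_auc(demo_labels, [0.0] * len(demo_labels))
```

A constant prediction ties every positive with every negative, so its AUC is exactly 0.5 regardless of how unbalanced the labels are. That is why ROC_AUC, not accuracy, is the useful yardstick for this data.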

Modeling Process. Models: KNN, Decision Tree, Logistic Regression. Features/Transformations: time included, up-sample, down-sample. Review.
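The up-sample and down-sample transformations can be sketched without any libraries. This is an illustration under assumptions, not the presenter's pipeline: the label column name `is_attributed`, the helper names, and the made-up 5-in-100 rows are all hypothetical stand-ins.

```python
import random

def split_by_class(rows, label_key="is_attributed"):
    """Separate rows into (positives, negatives) by their label."""
    pos = [r for r in rows if r[label_key] == 1]
    neg = [r for r in rows if r[label_key] == 0]
    return pos, neg

def down_sample(rows, label_key="is_attributed", seed=0):
    """Keep the whole minority class plus an equal-sized random subset of the majority."""
    rng = random.Random(seed)
    pos, neg = split_by_class(rows, label_key)
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return minority + rng.sample(majority, len(minority))

def up_sample(rows, label_key="is_attributed", seed=0):
    """Repeat minority rows (drawn with replacement) until both classes are the same size."""
    rng = random.Random(seed)
    pos, neg = split_by_class(rows, label_key)
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    extra = rng.choices(minority, k=len(majority) - len(minority))
    return majority + minority + extra

# Made-up training rows: 5 downloads among 100 clicks.
train_rows = [{"id": i, "is_attributed": 1 if i < 5 else 0} for i in range(100)]
down = down_sample(train_rows)
up = up_sample(train_rows)
```

Resampling should be applied to the training split only. Resampling before the train/test split puts copies of the same rows on both sides, which inflates validation scores and can hide overfitting.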

Modeling Results – Lots of choices, lots of overfitting.

Conclusions. Null score on Kaggle is .500. Selected model (Random Forest GS) scored .5122. Leaderboard 1st place: .9827. Further work: overfitting appears to be a problem; spend more time tuning parameters; minimize the train/test split delta; explore attribution time vs click time relationships; investigate IP addresses in the test data that are not in the sample data; scale to the full data.
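Two of the further-work items lend themselves to a quick check: the gap between click time and attribution time, and how many test IPs never appear in the sample. The sketch below is a starting point only. The column names `click_time` and `attributed_time`, the timestamp layout, the helper names, and all values are assumptions made for illustration.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout

def attribution_delay_seconds(click_time, attributed_time):
    """Seconds between a click and its attributed download, or None if there was no download."""
    if not attributed_time:
        return None
    clicked = datetime.strptime(click_time, TIME_FORMAT)
    attributed = datetime.strptime(attributed_time, TIME_FORMAT)
    return (attributed - clicked).total_seconds()

def unseen_share(sample_ips, test_ips):
    """Fraction of distinct test IPs that never occur in the sample."""
    test_set = set(test_ips)
    return len(test_set - set(sample_ips)) / len(test_set)

# Made-up rows: one click that led to a download, one that did not.
rows = [
    {"click_time": "2017-11-07 09:30:38", "attributed_time": "2017-11-07 10:05:08"},
    {"click_time": "2017-11-07 13:40:27", "attributed_time": ""},
]
delays = [attribution_delay_seconds(r["click_time"], r["attributed_time"]) for r in rows]
share = unseen_share([1, 2, 3], [2, 3, 4, 5])
```

A high unseen share would suggest that features keyed on raw IP values learned from the 100k sample will not transfer well to the test data, which is one plausible contributor to the overfitting noted above.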