Fraud Detection Experiments

Chase Credit Card
– 500,000 records spanning one year
– Evenly distributed
– 20% fraud, 80% non-fraud

First Union Credit Card
– 500,000 records spanning one year
– Unevenly distributed
– 15% fraud, 85% non-fraud

Intra-bank experiments

Classifier Selection Algorithm: Coverage / TP-FP
– Let V be the validation set
– Until no more examples in V can be covered:
  – Select the classifier with the highest TP-FP rate on V
  – Remove the covered examples from V

Setting
– 12 subsets
– 5 algorithms (Bayes, C4.5, CART, ID3, Ripper)
– 6-fold cross-validation
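This greedy loop is simple enough to sketch directly. A minimal Python sketch, assuming each base classifier exposes a scikit-learn-style `predict` and labels are 1 for fraud, 0 for non-fraud; the function names are illustrative, not from the original system:

```python
import numpy as np

def tp_minus_fp(y_true, y_pred):
    """TP rate minus FP rate (labels: 1 = fraud, 0 = non-fraud)."""
    pos = y_true == 1
    neg = ~pos
    tp_rate = np.mean(y_pred[pos] == 1) if pos.any() else 0.0
    fp_rate = np.mean(y_pred[neg] == 1) if neg.any() else 0.0
    return tp_rate - fp_rate

def select_classifiers(classifiers, X_val, y_val):
    """Greedy coverage: repeatedly pick the classifier with the highest
    TP-FP rate on the still-uncovered validation examples, then remove
    the examples it classifies correctly."""
    remaining = np.ones(len(y_val), dtype=bool)
    selected = []
    while remaining.any():
        preds = [clf.predict(X_val[remaining]) for clf in classifiers]
        scores = [tp_minus_fp(y_val[remaining], p) for p in preds]
        best = int(np.argmax(scores))
        covered = preds[best] == y_val[remaining]
        if not covered.any():   # no progress: stop rather than loop forever
            break
        selected.append(classifiers[best])
        idx = np.flatnonzero(remaining)
        remaining[idx[covered]] = False
    return selected
```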

TP-FP vs. number of classifiers
– Input base classifiers: Chase
– Test data set: Chase
– Best meta-classifier: Naïve Bayes with base classifiers

TP-FP vs. number of classifiers
– Input base classifiers: First Union
– Test data set: First Union
– Best meta-classifier: Naïve Bayes with base classifiers

Accuracy vs. number of classifiers
– Input base classifiers: Chase
– Test data set: Chase
– Best meta-classifier: Ripper with 50 base classifiers; comparable performance is attained with fewer classifiers

Accuracy vs. number of classifiers
– Input base classifiers: First Union
– Test data set: First Union
– Best meta-classifier: Ripper with 13 base classifiers

Intra-bank experiments

Coverage / cost-model combined metric algorithm:
– Let V be the validation set
– Until no more examples in V can be covered:
  – Select the classifier C_j that achieves the highest savings on V
  – Remove the covered examples from V
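Relative to the TP-FP version above, the only change is the scoring function: the same greedy loop runs with savings in place of TP-FP. A hedged sketch of one plausible savings metric, assuming each transaction carries a dollar amount and challenges incur a fixed overhead (the cost model itself is specified only later in the deck):

```python
def savings(y_true, y_pred, amounts, overhead):
    """Dollar savings on the remaining validation examples (all inputs
    are NumPy arrays): a challenged fraud saves its transaction amount
    minus the challenge overhead; a false alarm costs the overhead."""
    caught = (y_true == 1) & (y_pred == 1)
    false_alarm = (y_true == 0) & (y_pred == 1)
    return float((amounts[caught] - overhead).sum()
                 - overhead * false_alarm.sum())
```

In `select_classifiers` above, this `savings(...)` score would simply replace `tp_minus_fp(...)`.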

Savings vs. number of classifiers
– Input base classifiers: Chase
– Test data set: Chase
– Best meta-classifier: a single Naïve Bayes base classifier (~$820K)

Savings of base classifiers
– Input base classifiers: Chase
– Test data set: Chase
– Conclusion: the learning algorithms focus on the binary classification problem; if the base classifiers fail to detect the expensive fraud, meta-learning cannot improve savings

Savings vs. number of classifiers
– Input base classifiers: First Union
– Test data set: First Union
– Best meta-classifier: Naïve Bayes with 22 base classifiers (~$945K)

Savings of base classifiers
– Input base classifiers: First Union
– Test data set: First Union
– Conclusion: the majority of base classifiers are able to detect transactions that are both fraudulent and expensive; meta-learning saves an additional $100K

Different-distributions experiments
– Number of data sites: 6
– Training sets: 50%-50% fraud/non-fraud
– Testing sets: 20%-80% fraud/non-fraud
– Base classifier algorithms: ID3, CART
– Meta-classifier algorithms: ID3, CART, Bayes, Ripper
– Base classifier results: 81% TP, 29% FP
– Meta-classifier results: 86% TP, 25% FP

Inter-bank experiments

Chase includes 2 attributes not present in the First Union data:
– Add two fictitious fields to the First Union records
– Classifier agents support unknown values

Chase and First Union define an attribute with different semantics:
– Project the Chase values onto the First Union semantics
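Both bridging steps are plain data transformations. A sketch using pandas; every column name and value mapping below is a made-up placeholder, since the actual attributes are not named on the slide:

```python
import numpy as np
import pandas as pd

def add_fictitious_fields(fu_df, chase_only_cols=("chase_attr_1", "chase_attr_2")):
    """Give First Union records the two Chase-only attributes, filled
    with unknown values (NaN) that the classifier agents must tolerate."""
    out = fu_df.copy()
    for col in chase_only_cols:
        out[col] = np.nan  # unknown value
    return out

def project_semantics(chase_df, col="shared_attr", mapping=None):
    """Re-encode a Chase attribute whose values mean something different
    at First Union; unmapped values are left unchanged."""
    mapping = mapping or {"chase_code_1": "fu_code_a", "chase_code_2": "fu_code_b"}
    out = chase_df.copy()
    out[col] = out[col].map(mapping).fillna(out[col])
    return out
```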

Inter-bank experiments
– Input base classifiers: Chase
– Test data sets: Chase and First Union
– Task: compare the TP and FP rates of a classifier on the different test sets
– Conclusion: Chase classifiers CAN be applied to First Union data, but not without penalty

Inter-bank experiments
– Input base classifiers: First Union
– Test data sets: First Union and Chase
– Task: compare the TP and FP rates of a classifier on the different test sets
– Conclusion: First Union classifiers CAN be applied to Chase data, but not without penalty

TP-FP vs. number of classifiers
– Input base classifiers: First Union and Chase
– Test data set: Chase
– Result:
  – Ripper, CART: comparable
  – Naïve Bayes: slightly superior
  – C4.5, ID3: inferior

Accuracy vs. number of classifiers
– Input base classifiers: First Union and Chase
– Test data set: Chase
– Result:
  – CART, Ripper: comparable
  – Naïve Bayes, C4.5, ID3: inferior

TP-FP vs. number of classifiers
– Input base classifiers: First Union and Chase
– Test data set: First Union
– Result:
  – Naïve Bayes, C4.5, CART: comparable only when using all classifiers
  – Ripper: superior only when using all classifiers
  – ID3: inferior

Accuracy vs. number of classifiers
– Input base classifiers: First Union and Chase
– Test data set: First Union
– Result:
  – Naïve Bayes, C4.5, CART, Ripper: comparable only when using all classifiers
  – ID3: inferior

Chase max fraud loss: $1,470K. Overhead: $75.

First Union max fraud loss: $1,085K. Overhead: $75.

Aggregate Cost Model: $X overhead to challenge a predicted fraud.
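The slide only names the model, so the following sketch assumes one common formulation of this credit-card cost model: challenging a predicted fraud costs a fixed overhead, a fraud that goes unchallenged loses the full transaction amount, and transactions whose amount does not exceed the overhead are not worth challenging. Treat these exact rules as an assumption, not the deck's definition.

```python
import numpy as np

def aggregate_cost(y_true, y_pred, amounts, overhead):
    """Total dollar cost over a batch of transactions (NumPy arrays).
    Assumed rules:
      - a predicted fraud is challenged only if its amount exceeds the
        overhead, at a cost of `overhead` per challenge;
      - any fraud that goes unchallenged loses the full amount;
      - legitimate, unchallenged transactions cost nothing."""
    challenged = (y_pred == 1) & (amounts > overhead)
    missed_fraud = (y_true == 1) & ~challenged
    return float(overhead * challenged.sum() + amounts[missed_fraud].sum())
```

With the $75 overhead from the preceding slides, the cost of a predictor on a month of data would be `aggregate_cost(y_true, y_pred, amounts, 75.0)`.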

Experiment Set-up
– Training data sets: 10/1995 – 7/1996
– Testing data set: 9/1996
– Each data point is the average of the 10 classifiers (Oct 1995 to July 1996)
– Training set size: 6,400 transactions (to allow up to 90% fraud)

Average Aggregate Cost (C4.5)

Accuracy (C4.5)

Average Aggregate Cost (CART)

Accuracy (CART)

Average Aggregate Cost (RIPPER)

Accuracy (RIPPER)

Average Aggregate Cost (BAYES)

Accuracy (BAYES)

Amount Saved
– Overhead = $100
– Fraud in training data: 30.00%
– Fraud in training data: 23.14%
– Maximum saving: $1,337K
– Losses/transaction if no detection: $40.81

Do patterns change over time?
– Entire Chase credit card data set
– Original fraud rate (20% fraud / 80% non-fraud)
– Due to billing-cycle and fraud-investigation delays, training data are 2 months older than testing data
– Two experiments were conducted with different training data sets
– Test data set: 9/1996 (the last month)

Training data sets

Back-in-time experiment:
– July 1996
– June + July 1996
– …
– October 1995 + … + July 1996

Forward-in-time experiment:
– October 1995
– October + November 1995
– …
– October 1995 + … + July 1996

Patterns don’t change: Accuracy

Patterns don’t change: Savings

Divide-and-Conquer Conflict Resolving

Conflicts: base-level examples with different class labels yet identical predicted classifications from the base classifiers.

Class-combiner meta-level training data

Prevalence of Conflicts in Meta-level Training Data

Note: entries read True Label : ID3 : CART : RIPPER, where 1 = fraud and 0 = non-fraud. (The table of conflict-pattern frequencies is not reproduced here.)

Divide-and-Conquer Conflict Resolving (cont'd)
– Divide the training set into subsets, one per conflict pattern
– For each subset, recursively apply divide-and-conquer until a stopping criterion is met
– A rote table is used to learn from the meta-level training data
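A sketch of the rote-table step and conflict detection; grouping examples by their tuple of base predictions and taking the majority label is one plausible reading of "rote table", and the names below are illustrative:

```python
from collections import Counter, defaultdict

def build_rote_table(base_preds, y_true):
    """base_preds: iterable of base-prediction tuples, e.g. (id3, cart,
    ripper); y_true: the true labels.  Returns a lookup table mapping
    each prediction pattern to its majority label, plus the set of
    conflict patterns (same pattern, both labels present) on which the
    divide-and-conquer step would recurse."""
    groups = defaultdict(list)
    for pattern, label in zip(base_preds, y_true):
        groups[pattern].append(label)
    table, conflicts = {}, set()
    for pattern, labels in groups.items():
        table[pattern] = Counter(labels).most_common(1)[0][0]
        if len(set(labels)) > 1:
            conflicts.add(pattern)
    return table, conflicts
```

For each conflict pattern, the corresponding base-level examples would be split out and the procedure re-applied on that subset until a stopping criterion (e.g., purity or a minimum subset size) is met.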

Experiment Set-up
– A full year's Chase credit card data
– Natural fraud percentage (20%)
– Fields not available at authorization time were removed
– Each month from October 1995 to July 1996 was used as a training set
– The testing set was the month 2 months after the training month; in the real world, billing and fraud investigation take 2 months
– Results were averaged over 10 runs

Results

Without the conflict-resolving technique, but using a rote table to learn the meta-level data:
– Overall accuracy: 88.8%
– True positive: 59.8%
– False positive: 3.81%

With the conflict-resolving technique:
– Overall accuracy: 89.1% (increase of 0.3%)
– True positive: 61.2% (increase of 1.4%)
– False positive: 3.88% (increase of 0.07%)

Achievable Maximum Accuracy
– A nearest-neighbor approach is used to estimate a loose upper bound on the accuracy we can achieve
– The algorithm estimates the percentage of noise in the training data
– The bound is approximately 91.0%, so we are within 1.9% of the maximum accuracy
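One plausible reading of this estimate, sketched with scikit-learn: treat an example as label noise when its nearest neighbor (excluding itself) carries a different label, and bound achievable accuracy by one minus the noise fraction. The exact procedure used in the experiments is an assumption.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def max_accuracy_bound(X, y):
    """Loose upper bound on achievable accuracy: the fraction of examples
    whose nearest neighbor (other than the point itself) shares their
    label; the remainder is treated as label noise."""
    nn = NearestNeighbors(n_neighbors=2).fit(X)
    _, idx = nn.kneighbors(X)   # column 0 is (normally) the point itself
    return float(np.mean(y[idx[:, 1]] == y))
```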

Accuracy Result