Fraud Detection Experiments
Chase Credit Card
– 500,000 records spanning one year
– Evenly distributed
– 20% fraud, 80% non-fraud
First Union Credit Card
– 500,000 records spanning one year
– Unevenly distributed
– 15% fraud, 85% non-fraud
Intra-bank Experiments
Classifier selection algorithm: coverage/TP-FP (sketched below)
– Let V be the validation set
– Until no more examples in V can be covered:
  – Select the classifier with the highest TP-FP rate on V
  – Remove the covered examples from V
Setting:
– 12 subsets
– 5 algorithms (Bayes, C4.5, CART, ID3, Ripper)
– 6-fold cross-validation
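A minimal sketch of this greedy loop, assuming a simple `predict(x)` classifier interface, labels encoded as 1 = fraud / 0 = non-fraud, and "covered" meaning correctly classified; none of these interface details appear in the slides:

```python
# Greedy coverage-based classifier selection; interfaces are assumptions.

def tp_fp_rate(clf, records):
    """TP rate minus FP rate on records (features, label, ...); label 1 = fraud."""
    pos = [r[0] for r in records if r[1] == 1]
    neg = [r[0] for r in records if r[1] == 0]
    tp = sum(clf.predict(x) == 1 for x in pos) / max(len(pos), 1)
    fp = sum(clf.predict(x) == 1 for x in neg) / max(len(neg), 1)
    return tp - fp

def select_by_coverage(classifiers, V, metric=tp_fp_rate):
    """Select classifiers greedily until no record in V remains coverable."""
    selected, remaining = [], list(V)
    while remaining:
        # Pick the classifier that scores best on the still-uncovered records.
        best = max(classifiers, key=lambda c: metric(c, remaining))
        uncovered = [r for r in remaining if best.predict(r[0]) != r[1]]
        if len(uncovered) == len(remaining):
            break  # the best candidate covers nothing further
        selected.append(best)
        remaining = uncovered  # drop the records best now covers
    return selected
```

The `metric` hook lets the same loop drive the savings-based variant used on a later slide.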
TP-FP vs. Number of Classifiers
Input base classifiers: Chase
Test data set: Chase
Best meta-classifier: Naïve Bayes with 25-32 base classifiers.
TP-FP vs. Number of Classifiers
Input base classifiers: First Union
Test data set: First Union
Best meta-classifier: Naïve Bayes with 10-17 base classifiers.
Accuracy vs. Number of Classifiers
Input base classifiers: Chase
Test data set: Chase
Best meta-classifier: Ripper with 50 base classifiers. Comparable performance is attained with 25-30 classifiers.
Accuracy vs. Number of Classifiers
Input base classifiers: First Union
Test data set: First Union
Best meta-classifier: Ripper with 13 base classifiers.
Intra-bank Experiments
Coverage/cost-model combined metric algorithm (sketched below):
– Let V be the validation set
– Until no more examples in V can be covered:
  – Select the classifier C_j that achieves the highest savings on V
  – Remove the covered examples from V
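A hedged sketch of the savings metric that replaces TP-FP in the same greedy loop; the $75 per-challenge overhead and the (features, label, amount) record layout are assumptions carried over from later slides:

```python
OVERHEAD = 75.0  # dollars to challenge one flagged transaction (from a later slide)

def savings(clf, records):
    """Net dollars saved vs. no detection; records are (features, label, amount)."""
    saved = 0.0
    for x, y, amount in records:
        if clf.predict(x) == 1:
            # Challenging always costs OVERHEAD; a caught fraud recovers its amount.
            saved += (amount - OVERHEAD) if y == 1 else -OVERHEAD
    return saved

# Reuse the greedy loop above with the cost-based metric:
# selected = select_by_coverage(classifiers, V, metric=savings)
```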
Savings vs. Number of Classifiers
Input base classifiers: Chase
Test data set: Chase
Best meta-classifier: a single naïve Bayes base classifier (~$820K)
Savings of Base Classifiers
Input base classifiers: Chase
Test data set: Chase
Conclusion: learning algorithms focus on the binary classification problem; if the base classifiers fail to detect the expensive fraud, meta-learning cannot improve savings.
Savings vs. Number of Classifiers
Input base classifiers: First Union
Test data set: First Union
Best meta-classifier: Naïve Bayes with 22 base classifiers (~$945K)
Savings of Base Classifiers
Input base classifiers: First Union
Test data set: First Union
Conclusion: the majority of base classifiers are able to detect transactions that are both fraudulent and expensive; meta-learning saves an additional $100K.
Different Distributions Experiments
– Number of data sites: 6
– Training sets: 50%-50% fraud/non-fraud
– Testing sets: 20%-80% fraud/non-fraud
– Base classifiers: ID3, CART
– Meta-classifiers: ID3, CART, Bayes, Ripper
Results:
– Base classifiers: 81% TP, 29% FP
– Meta-classifiers: 86% TP, 25% FP
Inter-bank Experiments
Chase includes 2 attributes not present in the First Union data:
– Add two fictitious fields
– Classifier agents support unknown values
Chase and First Union define an attribute with different semantics:
– Project the Chase values onto the First Union semantics
(Both bridging steps are sketched below.)
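A hypothetical sketch of the two bridging steps; all column names and the value mapping are invented for illustration, and pandas is only one way to express it:

```python
import pandas as pd

CHASE_ONLY = ["chase_attr_1", "chase_attr_2"]   # the two Chase-only attributes (names invented)
CHASE_TO_FU = {0: 0, 1: 1, 2: 1}                # placeholder projection of Chase codes

def add_fictitious_fields(fu: pd.DataFrame) -> pd.DataFrame:
    """Give First Union records the Chase-only attributes as unknown values;
    the classifier agents already know how to handle unknowns."""
    out = fu.copy()
    for col in CHASE_ONLY:
        out[col] = pd.NA
    return out

def project_shared_attribute(chase: pd.DataFrame) -> pd.DataFrame:
    """Project the Chase values of the shared attribute onto the First Union
    semantics via an (assumed) value mapping."""
    out = chase.copy()
    out["shared_attr"] = out["shared_attr"].map(CHASE_TO_FU)
    return out
```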
Inter-bank Experiments
Input base classifiers: Chase
Test data sets: Chase and First Union
Task: compare the TP and FP rates of a classifier on the different test sets.
Conclusion: Chase classifiers CAN be applied to First Union data, but not without penalty.
Inter-bank Experiments
Input base classifiers: First Union
Test data sets: First Union and Chase
Task: compare the TP and FP rates of a classifier on the different test sets.
Conclusion: First Union classifiers CAN be applied to Chase data, but not without penalty.
TP-FP vs. Number of Classifiers
Input base classifiers: First Union and Chase
Test data set: Chase
Result:
– Ripper, CART: comparable
– Naïve Bayes: slightly superior
– C4.5, ID3: inferior
Accuracy vs. Number of Classifiers
Input base classifiers: First Union and Chase
Test data set: Chase
Result:
– CART, Ripper: comparable
– Naïve Bayes, C4.5, ID3: inferior
TP-FP vs. Number of Classifiers
Input base classifiers: First Union and Chase
Test data set: First Union
Result:
– Naïve Bayes, C4.5, CART: comparable only when using all classifiers
– Ripper: superior only when using all classifiers
– ID3: inferior
Accuracy vs. Number of Classifiers
Input base classifiers: First Union and Chase
Test data set: First Union
Result:
– Naïve Bayes, C4.5, CART, Ripper: comparable only when using all classifiers
– ID3: inferior
Chase maximum fraud loss: $1,470K; overhead: $75
First Union maximum fraud loss: $1,085K; overhead: $75
Aggregate Cost Model
$X overhead to challenge a fraud (one plausible formalization is sketched below)
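One plausible formalization of this aggregate cost model, assuming each challenged transaction costs the fixed overhead $X and each undetected fraud costs its full transaction amount; the slides do not spell out these rules:

```python
def aggregate_cost(predictions, labels, amounts, X=75.0):
    """Total cost: X per challenged transaction, full amount per missed fraud."""
    cost = 0.0
    for pred, y, amount in zip(predictions, labels, amounts):
        if pred == 1:
            cost += X        # overhead to challenge, whether fraud or not
        elif y == 1:
            cost += amount   # missed fraud: lose the whole transaction
    return cost
```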
Experiment Set-up
Training data set: 10/1995 - 7/1996
Testing data set: 9/1996
Each data point is the average of the 10 classifiers (Oct. 1995 to July 1996)
Training set size: 6,400 transactions (to allow a fraud distribution of up to 90%)
[Chart slides: Average Aggregate Cost and Accuracy, one pair per learning algorithm (C4.5, CART, RIPPER, BAYES)]
Amount Saved (overhead = $100)
Fraud in training data: 30.00%
Fraud in training data: 23.14%
Maximum saving: $1,337K
Losses per transaction if no detection: $40.81
Do Patterns Change Over Time?
Entire Chase credit card data set
Original fraud rate (20% fraud, 80% non-fraud)
Due to the billing cycle and fraud investigation delays, training data are 2 months older than testing data
Two experiments were conducted with different training data sets
Test data set: 9/1996 (the last month)
Training Data Sets (see the sketch below)
Back-in-time experiment:
– July 1996
– June + July 1996
– ...
– October 1995 + ... + July 1996
Forward-in-time experiment:
– October 1995
– October 1995 + November 1995
– ...
– October 1995 + ... + July 1996
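The two sequences of training windows are easy to generate programmatically; a small sketch with the months as string labels, purely to make the slide's enumeration concrete:

```python
MONTHS = ["1995-10", "1995-11", "1995-12", "1996-01", "1996-02",
          "1996-03", "1996-04", "1996-05", "1996-06", "1996-07"]

# Back in time: July 1996, June + July 1996, ..., all ten months.
back_in_time = [MONTHS[i:] for i in range(len(MONTHS) - 1, -1, -1)]

# Forward in time: October 1995, October + November 1995, ..., all ten months.
forward_in_time = [MONTHS[:i + 1] for i in range(len(MONTHS))]
```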
Patterns don’t change: Accuracy
Patterns don’t change: Savings
Divide-and-Conquer Conflict Resolution
Conflicts: base-level examples that receive the same predicted classifications from the base classifiers yet have different true class labels.
Class-combiner meta-level training data
Prevalence of Conflicts in Meta-level Training Data
[Table slide. Legend: each pattern reads True Label : ID3 : CART : RIPPER, where 1 = fraud and 0 = non-fraud.]
Divide-and-Conquer Conflict Resolution (cont'd)
– We divide the training set into subsets of training data according to each conflict pattern
– For each subset, we recursively apply divide-and-conquer until the stopping criterion is met
– We use a rote table to learn the meta-level training data (sketched below)
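A hedged sketch of the rote-table meta-learner: it memorizes each base-classifier prediction vector and collects the conflict patterns that the divide-and-conquer step would split further on base-level attributes. The data layout, and majority vote as the fallback before splitting, are assumptions:

```python
from collections import Counter, defaultdict

def build_rote_table(meta_examples):
    """meta_examples: iterable of (prediction_vector, true_label) pairs.

    Returns a lookup table {vector: label} plus the conflict patterns,
    i.e. vectors whose examples carry mixed true labels."""
    groups = defaultdict(list)
    for vec, label in meta_examples:
        groups[tuple(vec)].append(label)

    table, conflicts = {}, {}
    for vec, labels in groups.items():
        counts = Counter(labels)
        table[vec] = counts.most_common(1)[0][0]  # majority vote as fallback
        if len(counts) > 1:
            conflicts[vec] = labels  # to be split further on base-level data
    return table, conflicts
```

In the slides' technique, each subset in `conflicts` is re-learned recursively on the base-level attributes until the stopping criterion is met, rather than settled by the vote.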
Experiment Set-up
– A full year's Chase credit card data
– Natural fraud percentage (20%)
– Fields not available at authorization time were removed
– Each month from Oct. 1995 to July 1996 was used as a training set
– The testing set was the month two months later; in the real world, billing and fraud investigation take two months
– Results are averages over the 10 runs
Results
Without the conflict-resolving technique, using a rote table to learn the meta-level data:
– Overall accuracy: 88.8%
– True positive: 59.8%
– False positive: 3.81%
With the conflict-resolving technique:
– Overall accuracy: 89.1% (an increase of 0.3%)
– True positive: 61.2% (an increase of 1.4%)
– False positive: 3.88% (an increase of 0.07%)
Achievable Maximum Accuracy
A nearest-neighbor approach estimates a loose upper bound on the maximum accuracy we can achieve: the algorithm calculates the percentage of noise in the training data (sketched below).
The bound is approximately 91.0%, so we are within 1.9% of the maximum accuracy.
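A hypothetical sketch of a nearest-neighbor noise estimate of the kind described: examples whose nearest neighbor carries a different label count as noise, and one minus the noise rate gives the loose upper bound. The distance metric and this exact formulation are assumptions:

```python
import numpy as np

def accuracy_upper_bound(X: np.ndarray, y: np.ndarray) -> float:
    """Loose upper bound on achievable accuracy via 1-NN label agreement."""
    # Pairwise squared Euclidean distances; O(n^2) memory, fine for a sketch.
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)         # a point is not its own neighbor
    nn = d.argmin(axis=1)               # nearest neighbor of each example
    noise = float((y[nn] != y).mean())  # fraction with a conflicting neighbor
    return 1.0 - noise
```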
Accuracy Result