Data Mining For Credit Card Fraud: A Comparative Study

Presentation transcript:

Data Mining For Credit Card Fraud: A Comparative Study
Xxxxxxxx | DSCI 5240 | Dr. Nick Evangelopoulos | Graduate Presentation

Overview
- Credit Card Fraud
- Data Mining Techniques
- Data
- Experimental Setup
- Results

Credit Card Fraud
Two types:
- Application fraud: obtaining new cards using false information
- Behavioral fraud: mail theft, stolen/lost cards, counterfeit cards

Credit Card Fraud
[Chart: online revenue loss due to fraud (source: cybersource.com)]

Data Mining Techniques
- Logistic regression: used to predict the outcome of a categorical dependent variable; the fraud variable here is binary (see the sketch below)
- Support Vector Machines
- Random Forest
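A minimal, illustrative sketch of the logistic regression step, assuming a scikit-learn style workflow with synthetic stand-in data; the study's actual features and tooling are not specified in the slides.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 5))                 # stand-in transaction features
    y = (rng.random(1000) < 0.02).astype(int)      # binary fraud flag, roughly 2% positives

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0)

    log_reg = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    # Predicted fraud probabilities can be used to rank transactions for review.
    log_scores = log_reg.predict_proba(X_test)[:, 1]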

Support Vector Machines (SVM)
- Supervised learning models with associated learning algorithms that analyze data and recognize patterns
- Linear classifiers that operate in a high-dimensional feature space obtained by a non-linear mapping of the input space
- Two key properties of SVMs: kernel representation and margin optimization (a kernel sketch follows below)
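A minimal sketch of an SVM with a Gaussian (RBF) kernel, matching the kernel choice named later in the experimental setup; it reuses the synthetic split from the logistic regression sketch, and the C/gamma values are illustrative rather than the study's settings.

    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    svm_clf = make_pipeline(
        StandardScaler(),                                   # SVMs are sensitive to feature scale
        SVC(kernel="rbf", C=1.0, gamma="scale", probability=True),
    )
    svm_clf.fit(X_train, y_train)                           # split from the logistic regression sketch
    svm_scores = svm_clf.predict_proba(X_test)[:, 1]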

Random Forest (RF)
- An ensemble of classification trees
- Performs well when the individual member trees are dissimilar (see the sketch below)
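A minimal random forest sketch in the same scikit-learn style; the number of trees and the attributes considered per split are illustrative placeholders, not the values the authors tuned.

    from sklearn.ensemble import RandomForestClassifier

    rf_clf = RandomForestClassifier(
        n_estimators=200,        # number of trees in the ensemble
        max_features="sqrt",     # attributes considered at each split
        random_state=0,
    )
    rf_clf.fit(X_train, y_train)                 # split from the logistic regression sketch
    rf_scores = rf_clf.predict_proba(X_test)[:, 1]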

Data: Datasets
- 13 months of data (Jan 2006 – Jan 2007)
- 50 million credit card transactions on 1 million credit cards
- 2,420 known fraudulent transactions involving 506 credit cards (roughly 0.005% of transactions, a highly imbalanced class distribution)

[Chart: percentage of transactions by transaction type]

Data Selection

Primary attributes in Dataset

Derived Attributes

Experimental Setup
- For SVM, a Gaussian radial basis function was used as the kernel function
- For random forest, the number of attributes considered at each node and the number of trees were set
- Data were sampled at different rates using random undersampling of the majority (non-fraud) class; a minimal sketch follows below
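A minimal sketch of random undersampling of the majority (non-fraud) class, written with plain NumPy under the assumption of array-valued features and binary labels; the target fraud rate is illustrative, not one of the study's sampling rates.

    import numpy as np

    def undersample_majority(X, y, fraud_rate=0.10, seed=0):
        """Return a training set in which fraud makes up roughly `fraud_rate`."""
        rng = np.random.default_rng(seed)
        fraud_idx = np.flatnonzero(y == 1)
        legit_idx = np.flatnonzero(y == 0)
        # Legitimate transactions needed to hit the target fraud proportion.
        n_legit = int(len(fraud_idx) * (1 - fraud_rate) / fraud_rate)
        keep_legit = rng.choice(legit_idx, size=min(n_legit, len(legit_idx)), replace=False)
        keep = np.concatenate([fraud_idx, keep_legit])
        rng.shuffle(keep)
        return X[keep], y[keep]

    # Example: build a training set with roughly 10% fraud.
    X_bal, y_bal = undersample_majority(X_train, y_train, fraud_rate=0.10)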

Training and testing data

Results

Proportion of fraud captured at different depths
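"Depth" here is typically the fraction of scored transactions reviewed after ranking them by model score, from highest to lowest. A minimal sketch of how the capture rate at a given depth can be computed, reusing score and label names from the earlier sketches (an illustration, not the authors' evaluation code):

    import numpy as np

    def fraud_capture_rate(scores, y_true, depth=0.01):
        """Fraction of all fraud found in the top `depth` share of scored cases."""
        order = np.argsort(scores)[::-1]             # highest scores first
        n_top = max(1, int(len(scores) * depth))
        captured = y_true[order[:n_top]].sum()
        return captured / max(1, y_true.sum())

    for depth in (0.005, 0.01, 0.05, 0.10):
        print(f"depth {depth:.1%}: capture {fraud_capture_rate(rf_scores, y_test, depth):.1%}")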

Fraud capture rate with different fraud rates in the training data

Conclusion
- Examined the performance of two data mining techniques, SVM and RF, together with logistic regression
- Used a real-life data set covering Jan 2006 – Jan 2007
- Used a random undersampling approach to sample the data
- Random forest showed much higher performance at the upper file depths
- SVM performance at the upper file depths tended to increase with a lower proportion of fraud in the training data
- Overall, random forest demonstrated the best performance

Questions