Credit Card Fraudulent Transaction Detection

Presentation transcript:

Credit Card Fraudulent Transaction Detection
As a part of CSC 219: Final Project Presentation
Team Members (Group #10):
- Darshit Pandya
- Sreeteja K.
Guided By:
- Dr. Meiliu Lu

Abstract
Financial fraud is a growing threat with serious consequences for the finance industry, corporations, and government organizations. Among the criminal activities occurring in the financial industry, credit card fraud is the most prevalent. Credit card companies need to detect fraudulent transactions so that customers are not charged for items they did not purchase.

Why is it important?
Credit card fraud detection is challenging for the following reasons:
- The profiles of genuine users and fraudulent behaviors change constantly.
- The rate of online transactions has grown exponentially.
- Credit card fraud data sets are highly skewed.
- Detecting fraudulent transactions manually is time-consuming and inefficient.
Hence, it is necessary to develop a credit card fraud detection technique as a countermeasure against illegal activity.

What will we be doing?
In this term project, we analyze 280k transactions with different attributes. (The names of the attributes are kept secret to protect the privacy of the user data.)
Itinerary:
- Analyze the correlation between attributes
- Analyze the effect of attribute values on the target
- Feature engineering
- Balancing/sampling the skewed dataset
- Applying machine learning algorithms
- Trying deep neural nets
- Comparing the models
- Improvement techniques

What does the data look like?
- The original data has 280K instances and 33 attributes.
- The Class attribute identifies a transaction as Fraud [1] or Normal [0].
- The class distribution (a chart on the original slide) is highly skewed, as expected.
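A minimal sketch of how the skew in the Class attribute could be quantified. The data frame here is a small synthetic stand-in (10,000 rows, 20 of them fraud), not the project's data, and the helper name class_distribution is hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the transaction table (assumption: not the real data).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Amount": rng.exponential(scale=80.0, size=10_000),
    # 20 fraud rows (1) and 9,980 normal rows (0)
    "Class": np.concatenate([np.ones(20, dtype=int), np.zeros(9_980, dtype=int)]),
})

def class_distribution(frame, target="Class"):
    """Return the count and percentage share of each class label."""
    counts = frame[target].value_counts().sort_index()
    return pd.DataFrame({"count": counts, "percent": 100 * counts / len(frame)})

summary = class_distribution(df)
print(summary)
```

A table like this makes the imbalance explicit before any model is trained, which matters because plain accuracy is misleading when one class dominates.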

Step 1: Data Visualization
The original data has 280K instances and 33 attributes. In this step, we visualized the data by examining:
- Correlation
- Target value impact
- Distribution of the attribute values
- Density distribution
- Outliers
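The numbers behind the correlation and target-impact views can be sketched as follows. The column names V1 and V2, the synthetic data, and the helper target_correlation are assumptions for illustration; the slides do not show the actual plotting code.

```python
import numpy as np
import pandas as pd

# Synthetic data: V1 is shifted for the positive class, V2 is pure noise.
rng = np.random.default_rng(42)
n = 2_000
y = (rng.random(n) < 0.05).astype(int)
df = pd.DataFrame({
    "V1": rng.normal(size=n) - 3.0 * y,
    "V2": rng.normal(size=n),
    "Amount": rng.exponential(scale=50.0, size=n),
    "Class": y,
})

def target_correlation(frame, target="Class"):
    """Rank features by the absolute value of their correlation with the target."""
    corr = frame.corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)

ranking = target_correlation(df)
class_means = df.groupby("Class").mean()  # per-class averages: target value impact

print(ranking)
print(class_means)
```

Either table can then be passed to a plotting library (for example as a heatmap or bar chart) to produce the visual versions.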

Step 2: Data Preprocessing
For the data preprocessing step, we performed the following:
- Check for missing values and remove them, if any
- Remove unnecessary features
- Remove the outliers
- Scale the values of attributes such as Time and Amount
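The four preprocessing steps could look roughly like this. The slides do not say which outlier rule or scaler was used, so the 1.5 × IQR rule and scikit-learn's StandardScaler are assumptions, as are the row_id column and the preprocess helper.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with two injected missing values (assumption: not the real data).
rng = np.random.default_rng(1)
n = 1_000
df = pd.DataFrame({
    "Time": np.arange(n, dtype=float),
    "V1": rng.normal(size=n),
    "Amount": rng.exponential(scale=60.0, size=n),
    "row_id": np.arange(n),  # hypothetical uninformative column
    "Class": (rng.random(n) < 0.03).astype(int),
})
df.loc[[5, 17], "Amount"] = np.nan

def preprocess(frame, drop_cols=("row_id",), outlier_cols=("V1",),
               scale_cols=("Time", "Amount"), k=1.5):
    # 1) drop rows with missing values, 2) drop unnecessary features
    out = frame.dropna().drop(columns=list(drop_cols))
    # 3) keep only rows inside the [Q1 - k*IQR, Q3 + k*IQR] fence
    for col in outlier_cols:
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        out = out[out[col].between(q1 - k * iqr, q3 + k * iqr)]
    # 4) scale the selected columns to zero mean and unit variance
    out = out.copy()
    out[list(scale_cols)] = StandardScaler().fit_transform(out[list(scale_cols)])
    return out

clean = preprocess(df)
print(clean.describe())
```

Two cautions apply to a real pipeline: outlier removal can discard genuine fraud rows, and the scaler should be fitted on the training split only to avoid leaking test information.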

Step 3: Application of Naïve ML Algorithms
On the unbalanced dataset, we apply standard ("naive") machine learning algorithms:
- Logistic Regression
- K-Nearest Neighbors
- Support Vector Machine
- Decision Tree
- Random Forest
- GridSearchCV (for hyperparameter tuning)
For model evaluation, we use the confusion matrix and the F1-score.
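A minimal sketch of this step with scikit-learn, run on a synthetic imbalanced dataset from make_classification rather than the project's data. The hyperparameter grid is an assumption chosen only to keep the example fast.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with roughly 5% positives (assumption: stand-in for the real set).
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           weights=[0.95, 0.05], class_sep=1.5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Support Vector Machine": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = f1_score(y_test, pred)      # F1 on the positive (fraud) class
    print(name, round(scores[name], 3))
    print(confusion_matrix(y_test, pred))      # rows: true class, columns: predicted

# Hyperparameter search for one model, scored by F1 with 3-fold cross-validation.
grid = GridSearchCV(RandomForestClassifier(random_state=0),
                    param_grid={"n_estimators": [50, 100], "max_depth": [None, 5]},
                    scoring="f1", cv=3)
grid.fit(X_train, y_train)
print(grid.best_params_, round(grid.best_score_, 3))
```

Stratifying the split keeps the fraud ratio the same in the training and test sets, which matters when positives are this rare.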

Step 4: Deep Neural Networks
We used a dense (fully connected) neural network, traditionally called an artificial neural network, built with the Keras framework. For this approach, we used only the unbalanced dataset. The network consists only of Dense layers; we tried several optimizers, such as Adam, and varied the number of neurons in the layers.
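The idea of varying layer widths while training with Adam can be sketched as below. Note the swap: the project used Keras, whereas this example uses scikit-learn's MLPClassifier as a stand-in, since it is also a stack of fully connected layers. The layer sizes and synthetic data are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic unbalanced data (assumption: stand-in for the real set).
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           weights=[0.95, 0.05], class_sep=1.5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

results = {}
for hidden in [(8,), (16, 8), (32, 16)]:       # hypothetical layer widths
    net = MLPClassifier(hidden_layer_sizes=hidden, solver="adam",
                        max_iter=500, random_state=0)
    net.fit(X_train, y_train)
    results[hidden] = f1_score(y_test, net.predict(X_test))

for hidden, score in results.items():
    print(hidden, round(score, 3))
```

In Keras the same comparison would be a loop over Sequential models built from Dense layers and compiled with the Adam optimizer.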

Step 5: Data Balancing
- Because the data is unbalanced, the predictions tend to be biased.
- Naïve machine learning algorithms tend to be affected by skewed data.
For random sampling, the following equation was implemented:
value_count = minimum_dist_value + ((total_count_cat / minimum_dist_value * 2) - 2)
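One possible reading of the sampling equation, sketched as random undersampling. The slides do not define the symbols, so this assumes minimum_dist_value is the row count of the rarest class and total_count_cat is the row count of the class being sampled, with the expression evaluated left to right. The helpers target_count and undersample are hypothetical.

```python
import numpy as np
import pandas as pd

def target_count(total_count_cat, minimum_dist_value):
    """Slide formula: minimum + ((total / minimum * 2) - 2), truncated to an int."""
    return int(minimum_dist_value + ((total_count_cat / minimum_dist_value * 2) - 2))

def undersample(frame, target="Class", random_state=0):
    """Randomly sample each class down to the size given by target_count."""
    counts = frame[target].value_counts()
    minimum = counts.min()
    parts = []
    for label, total in counts.items():
        keep = int(min(total, target_count(total, minimum)))
        parts.append(frame[frame[target] == label].sample(n=keep, random_state=random_state))
    # Shuffle so the classes are interleaved again.
    return pd.concat(parts).sample(frac=1.0, random_state=random_state).reset_index(drop=True)

# Synthetic stand-in: 5,000 normal rows and 50 fraud rows.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Amount": rng.exponential(scale=80.0, size=5_050),
    "Class": np.concatenate([np.zeros(5_000, dtype=int), np.ones(50, dtype=int)]),
})

balanced = undersample(df)
print(balanced["Class"].value_counts())
```

Under this reading the rarest class keeps all of its rows (the bracketed term is zero when the two counts are equal), while the larger class is cut sharply, so the imbalance is reduced rather than removed.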

Step 5 (contd.)
The balanced dataset is shown in a chart on the original slide.

Project Demo
We will demo a Google Colab notebook that covers every detail implemented in the project. Let's go!

Conclusion
- Data imbalance can cause bias in the predictions.
- SVM, K-NN, and Random Forest perform comparatively better.
- The best F1-score (our evaluation metric), 0.99, was obtained with Random Forest.
- Applying a dense neural network can help even when the dataset is unbalanced.
- Applying data sampling techniques can help remove the bias in the predictions.

THANK YOU!