1
Credit Card Applicants’ Credibility Prediction with Decision Tree
• Dan Xiao
• Jerry Yang
2
Agenda
• Project Goal
• Project Domains
• Project Method
• Implementation
• Conclusions
• Experience
3
Project Goal
• The objective of this project is to build a decision tree to predict the classification (good or bad) of credit card applicants.
4
Project Domains
• There are around 4000 records
• There are 15 attributes
– Credit_Card_Debt
– Highest_Credit_Card_APR
– Monthly_Car_Pmt
– Monthly_Income
– Monthly_Mortage
– Martial_Status
5
Project Domains
• Attributes continued
– No_of_Credit_Cards
– Year_of_Employment
– Citizenship
– Home_Ownership
– Accounts
– Sex
– Race
– Results
6
Project Method
• Classification and Prediction
– Construct a model (Decision Tree) with the training data
– Apply the testing data to the model (Decision Tree) to predict the applicants’ credibility
7
Implementation
• Software: WEKA
– Classifier
– Filter
– Learning schemes: ZeroR, OneR, M5, J48
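The two simplest learning schemes named above can be sketched in a few lines. This is a minimal illustration of the ZeroR and OneR ideas in plain Python, not WEKA's implementation; the helper names `zero_r` and `one_r` and the toy records are hypothetical, and the attribute values are invented.

```python
from collections import Counter, defaultdict

def zero_r(labels):
    """ZeroR idea: always predict the most frequent training class."""
    return Counter(labels).most_common(1)[0][0]

def one_r(rows, labels):
    """OneR idea: choose the single nominal attribute whose
    value -> class rule makes the fewest training errors."""
    best = None
    for attr in rows[0]:
        # Count class frequencies for every value of this attribute.
        counts = defaultdict(Counter)
        for row, label in zip(rows, labels):
            counts[row[attr]][label] += 1
        # Each value predicts its majority class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(rule[row[attr]] != label for row, label in zip(rows, labels))
        if best is None or errors < best[0]:
            best = (errors, attr, rule)
    return best[1], best[2]

# Hypothetical toy records (values invented for illustration only).
rows = [
    {"Home_Ownership": "own", "Sex": "F"},
    {"Home_Ownership": "own", "Sex": "M"},
    {"Home_Ownership": "rent", "Sex": "F"},
    {"Home_Ownership": "rent", "Sex": "M"},
    {"Home_Ownership": "own", "Sex": "F"},
]
labels = ["good", "good", "bad", "bad", "good"]

baseline = zero_r(labels)
attr, rule = one_r(rows, labels)
```

Schemes like these are useful mainly as baselines: a decision tree such as J48 should beat them to justify its extra complexity.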
8
Implementation (Continued)
• Relevance Analysis
• Data Cleaning
– File conversion
– Missing data
– Outlier
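The slides do not say how missing data and outliers were handled. As one possible sketch, assuming median imputation and the common 1.5 × IQR rule (both are assumptions, not the authors' stated method), the cleaning step could look like this. The function names and the income values are hypothetical.

```python
import statistics

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def iqr_outliers(values, k=1.5):
    """Return indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]

# Hypothetical Monthly_Income values; None marks a missing entry.
income = [3200, 4100, None, 3900, 4500, 3700, 98000, 4000]
filled = impute_median(income)
flagged = iqr_outliers(filled)
```

Flagged records would still need a manual decision: drop them, cap them, or keep them if they turn out to be genuine.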
9
Implementation (Continued)
• Testing Dataset
– File conversion
– Data testing: percentage split, supplied test set, cross-validation
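Two of the testing options listed above, percentage split and cross-validation, differ only in how record indices are partitioned. The sketch below shows that partitioning in plain Python. It is not WEKA's code: it skips stratification, the function names are hypothetical, and it assumes the 25% split ratio refers to the training share, which the slides do not state.

```python
import random

def percentage_split(n_records, train_fraction, seed=0):
    """Shuffle record indices and cut them into train and test lists."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    cut = int(n_records * train_fraction)
    return indices[:cut], indices[cut:]

def k_fold_indices(n_records, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        # Training data is every fold except the held-out one.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Roughly 4000 records, as stated in the slides.
train, test = percentage_split(4000, 0.25)
folds = list(k_fold_indices(4000, 10))
```

With cross-validation every record is used for testing exactly once, which is why the reported accuracy is an average over the folds rather than a single measurement.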
10
Percentage split
Split Ratio  Confidence  Number of Leaves  Number of Nodes  Correct Classification
25%          0.25        30                54               94.77%
25%          0.15        3                 5                94.77%
11
Supplied Data test
Confidence  Number of Leaves  Number of Nodes  Correct Classification
0.15        3                 5                94.67%
0.25        30                54               96.08%
12
Cross-validation
Fold Number  Confidence  Number of Leaves  Number of Nodes  Correct Classification
5            0.15        3                 5                94.28%
10           0.15        3                 5                94.21%
15           0.15        3                 5                94.46%
13
Cross-validation (Continued)
Fold Number  Confidence  Number of Leaves  Number of Nodes  Correct Classification
5            0.25        30                54               94.28%
10           0.25        30                54               94.21%
15           0.25        30                54               94.46%
14
Conclusions
• We are satisfied with the accuracy (correct classification), which stays above 94% in every test
• Cross-validation is a good choice when the dataset is small
• The pruned trees differ greatly between confidence 0.15 and 0.25: 3 leaves and 5 nodes versus 30 leaves and 54 nodes
15
Future Work
• Entropy-based discretization can reduce overfitting
• Accuracy is not satisfactory on large datasets
• Data bias cannot be avoided completely
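The core step of entropy-based discretization is choosing the cut point on a numeric attribute that leaves the purest class mix on each side. The sketch below shows a single binary cut only; a full discretizer would apply it recursively with a stopping rule. The function names and the debt values are hypothetical and invented for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def best_cut_point(values, labels):
    """Return (cut, weighted_entropy) for the binary split of a numeric
    attribute that minimizes the class entropy of the two intervals."""
    pairs = sorted(zip(values, labels))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * entropy(left)
                 + len(right) * entropy(right)) / len(pairs)
        if best is None or score < best[1]:
            best = (cut, score)
    return best

# Hypothetical Credit_Card_Debt values and outcomes (invented).
debt = [500, 1200, 1800, 2500, 6000, 7500, 9000]
result = ["good", "good", "good", "good", "bad", "bad", "bad"]
cut, score = best_cut_point(debt, result)
```

Replacing a numeric attribute with a few such intervals gives the tree fewer, coarser split candidates, which is the mechanism by which discretization can limit overfitting.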
16
Experience
• Original dataset
• New dataset
• Results
17
Thank You