1
Mining Financial Data: Histograms & Contingency Tables
Shishir Gupta
Under the guidance of Dr. Mirsad Hadzikadic
2
Agenda
Database
Task goals
Tool & technique used
Data preparation and cleaning
Attribute selection
Data transformation
Data Mining/Pattern Evaluation
Knowledge presentation
Pros/Cons
Questions & Demonstration
3
Database
Financial Dataset from PKDD 1999
Financial Dataset from a Czech Bank
Relational Dataset
8 Relations
–ACCOUNT
–LOAN
–DEMOGRAPH
–ORDER
–TRANSACTION
–CARD
–DISPOSITION
–CLIENT
4
Task Goal
Determine Good Clients, to offer them additional services
Determine Bad Clients, to watch carefully and minimize bank loss
Services offered:
–Loan
–Credit Card
5
Technique Used - Histogram
SQL statement used:
    SELECT age, COUNT(age)
    FROM table_x
    GROUP BY age
    ORDER BY age
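As a rough illustration of the histogram query, the sketch below runs it against an in-memory SQLite database and prints a text histogram. The table contents and the use of SQLite are assumptions made for the example; the slides do not say which RDBMS the tool used.

```python
import sqlite3

# Hypothetical stand-in for table_x: one row per client, with an age column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_x (id INTEGER PRIMARY KEY, age INTEGER)")
conn.executemany(
    "INSERT INTO table_x (age) VALUES (?)",
    [(a,) for a in [25, 31, 25, 40, 31, 25]],  # made-up sample ages
)

# The histogram query from the slide: one row per distinct value, with its frequency.
histogram = conn.execute(
    "SELECT age, COUNT(age) FROM table_x GROUP BY age ORDER BY age"
).fetchall()

# Render each frequency as a bar of '#' characters.
for age, count in histogram:
    print(f"{age:>3} {'#' * count} ({count})")
```

The same query shape works for any discrete attribute; a continuous attribute would need to be bucketed first.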
6
Technique Used – C-Tables
SQL statement used:
    SELECT sex, COUNT(sex), age
    FROM table_x a, table_y b
    WHERE a.id = b.fid
    GROUP BY sex, age
    ORDER BY sex, age
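The contingency-table query returns one row per (sex, age) combination. A minimal sketch of how those rows might be pivoted into a two-way table follows; the sample rows, the SQLite backend, and the pivoting step are all assumptions for illustration.

```python
import sqlite3

# Hypothetical stand-ins for table_x and table_y, joined on id = fid.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_x (id INTEGER PRIMARY KEY, sex TEXT);
CREATE TABLE table_y (fid INTEGER, age INTEGER);
INSERT INTO table_x VALUES (1, 'F'), (2, 'M'), (3, 'F'), (4, 'M');
INSERT INTO table_y VALUES (1, 30), (2, 30), (3, 40), (4, 30);
""")

# The contingency-table query from the slide.
rows = conn.execute("""
    SELECT sex, COUNT(sex), age
    FROM table_x a, table_y b
    WHERE a.id = b.fid
    GROUP BY sex, age
    ORDER BY sex, age
""").fetchall()

# Pivot the (sex, count, age) rows into table[sex][age] = count.
table = {}
for sex, count, age in rows:
    table.setdefault(sex, {})[age] = count

# Print the table, filling combinations that never occur with 0.
ages = sorted({age for _, _, age in rows})
print("sex " + " ".join(f"{a:>4}" for a in ages))
for sex in sorted(table):
    print(f"{sex:<3} " + " ".join(f"{table[sex].get(a, 0):>4}" for a in ages))
```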
7
Technique Used – Correlation
SQL statement used:
    SELECT x, y
    FROM table_x a, table_y b
    WHERE a.id = b.fid
    ORDER BY x, y
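The slide shows only the query that fetches the (x, y) pairs, not the correlation function itself. Assuming a Pearson correlation coefficient is what was computed, a minimal sketch over the fetched pairs could look like this; the function name and sample data are hypothetical.

```python
import math

def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sums of cross-products and squared deviations (the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant attribute")
    return cov / math.sqrt(var_x * var_y)

# Made-up pairs, as they might come back from the SELECT x, y query.
pairs = [(1, 2), (2, 4), (3, 6), (4, 8)]
r = pearson(pairs)
print(f"r = {r:.3f}")
```

Values of r near +1 or -1 suggest a strong linear relationship between the two continuous attributes; values near 0 suggest none.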
8
Tool - Architecture
9
Tool - Description
10
Data Cleaning
Missing Values
–Relation DEMOGRAPHIC
Incorrect Values
–Relation TRANSACTION
(Data reduced by 10% after cleaning)
11
Data Preparation
Relation CLIENT
–Separating SEX & BDATE from BIRTHNUMBER
All Date fields converted to AGE
–Reference date: 199901
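The slide does not show how BIRTHNUMBER was split. The sketch below assumes the encoding commonly documented for this dataset (YYMMDD for men, with 50 added to the month for women), that all birth years fall in the 1900s, and that the reference 199901 means year 1999, month 01. The function names are hypothetical.

```python
def split_birth_number(birth_number):
    """Split a birth number into (sex, year, month, day).

    Assumed encoding: YYMMDD for men, YYMM+50DD for women.
    """
    yy = birth_number // 10000
    mm = (birth_number // 100) % 100
    dd = birth_number % 100
    sex = "M"
    if mm > 50:  # month shifted by 50 marks a female client (assumption)
        sex = "F"
        mm -= 50
    return sex, 1900 + yy, mm, dd  # assumes all births are in the 1900s

def age_at(year, month, ref=199901):
    """Age in whole years at a YYYYMM reference date (day of month ignored)."""
    ref_year, ref_month = divmod(ref, 100)
    age = ref_year - year
    if month > ref_month:  # birthday not yet reached in the reference year
        age -= 1
    return age

sex, year, month, day = split_birth_number(706213)
print(sex, year, month, day, age_at(year, month))
```

For the made-up birth number 706213 this yields a female client born on 13 December 1970, aged 28 at the reference date.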
12
Data Preparation (cont.)
Creating table definitions
Setting up data in a table-compatible format
Loading data into the database
Evaluating loading errors and changing attribute definitions accordingly
13
Attribute Selection
Decision Relation
–LOAN
Decision Attributes
–STATUS
Classification Attributes
–All other attributes that do not belong to the LOAN relation
[Figure: decision tree with test nodes A4?, A6?, A1?, Y/N branches, and leaves Class1/Class2]
14
Data Transformation
Discretization
–Continuous attributes into 4 to 10 buckets
Only transactions performed in 1997 were considered for relation TRANSACTION
–Due to resource limitations
–The largest number of loans was approved during this period
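The slides do not say how bucket boundaries were chosen. One simple possibility is equal-width binning, sketched below; the function name, the default bucket count, and the sample amounts are assumptions for illustration.

```python
def discretize(values, n_buckets=5):
    """Map each value to an equal-width bucket index in 0..n_buckets-1."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # A constant attribute: everything lands in a single bucket.
        return [0 for _ in values]
    width = (hi - lo) / n_buckets
    # The maximum value would fall just past the last bucket, so clamp it.
    return [min(int((v - lo) / width), n_buckets - 1) for v in values]

amounts = [100, 250, 400, 900, 1000]  # made-up transaction amounts
buckets = discretize(amounts, n_buckets=4)
print(list(zip(amounts, buckets)))
```

Equal-width buckets are easy to read on a histogram but can leave some buckets nearly empty on skewed data such as transaction amounts; equal-frequency buckets are a common alternative.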
15
Data Mining/Pattern Evaluation
Run a histogram on every non-key attribute to study its distribution.
Discretize continuous attributes.
Run a contingency table to study the relationship between two attributes.
Check significance with the correlation function if both attributes are continuous.
16
Knowledge Presentation - 1 All loans on accounts where a second person is allowed to dispose are GOOD LOANS (100%)
17
Knowledge Presentation - 2 Permanent orders of type household & leasing indicate financial stability
18
Knowledge Presentation - 3 Accounts with Cash withdrawals are more likely to repay their loans
19
Knowledge Presentation - 4 Accounts with low transaction amounts indicate good loans
20
Knowledge Presentation - 5 Accounts that are in debt indicate BAD LOANS
21
Pros
Flexibility to alter the data presentation to understand the nature of the data
Customers with no background in data mining can appreciate the output because of its simplicity
Since results can be stored in a file, subsequent analysis on a subset of the data becomes very easy
22
Cons
Needs a capability for multi-variable analysis.
Some form of quantification of the results needs to be added.
Performance issues with using an RDBMS.
23
Questions & Demonstration