Mining Financial Data Histograms & Contingency Tables Shishir Gupta Under the guidance of Dr. Mirsad Hadzikadic.


1 Mining Financial Data Histograms & Contingency Tables Shishir Gupta Under the guidance of Dr. Mirsad Hadzikadic

2 Agenda: Database; Task goals; Tool & technique used; Data preparation and cleaning; Attribute selection; Data transformation; Data mining/pattern evaluation; Knowledge presentation; Pros/cons; Questions & demonstration

3 Database: Financial dataset from PKDD 1999. Financial dataset from a Czech bank. Relational dataset with 8 relations: ACCOUNT, LOAN, DEMOGRAPH, ORDER, TRANSACTION, CARD, DISPOSITION, CLIENT

4 Task Goal: Identify good clients to whom additional services can be offered. Identify bad clients to watch carefully in order to minimize bank losses. Services offered: loan, credit card

5 Technique Used - Histogram SQL Statement used SELECT age, COUNT(age) FROM table_x GROUP BY age ORDER BY age
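The histogram query on this slide can be tried end to end with a small sketch. The in-memory SQLite database, the sample ages, and the helper name `age_histogram` are assumptions made for illustration; the slides do not say which RDBMS or data the tool used.

```python
import sqlite3

def age_histogram(conn, table="table_x"):
    """Run the slide's GROUP BY query and return (age, count) pairs.

    The table name is interpolated because SQL parameters cannot name
    tables; "table_x" is the placeholder name used on the slide.
    """
    query = f"SELECT age, COUNT(age) FROM {table} GROUP BY age ORDER BY age"
    return conn.execute(query).fetchall()

def print_histogram(rows):
    """Print one text bar per distinct value."""
    for value, count in rows:
        print(f"{value:>3} | {'#' * count}")

# Hypothetical sample data, only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_x (id INTEGER PRIMARY KEY, age INTEGER)")
conn.executemany(
    "INSERT INTO table_x (age) VALUES (?)",
    [(25,), (25,), (30,), (41,), (30,), (25,)],
)

hist = age_histogram(conn)
print_histogram(hist)
```

Each distinct value becomes one bar, so a continuous attribute would need to be discretized first to give a readable histogram.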

6 Technique Used – C-Tables SQL Statement used SELECT sex, COUNT(sex), age FROM table_x a, table_y b WHERE a.id = b.fid GROUP BY sex, age ORDER BY sex, age
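As a rough illustration, the rows returned by this query can be pivoted into a two-way table. The sketch assumes that `sex` lives in `table_x` and `age` (already discretized into ranges) in `table_y`; that column placement, the sample rows, and the helper name `pivot` are all hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical schema and data; only the query itself comes from the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_x (id INTEGER PRIMARY KEY, sex TEXT);
    CREATE TABLE table_y (fid INTEGER, age TEXT);
""")
conn.executemany(
    "INSERT INTO table_x VALUES (?, ?)",
    [(1, "F"), (2, "M"), (3, "F"), (4, "M")],
)
conn.executemany(
    "INSERT INTO table_y VALUES (?, ?)",
    [(1, "20-29"), (2, "20-29"), (3, "30-39"), (4, "20-29")],
)

rows = conn.execute(
    "SELECT sex, COUNT(sex), age "
    "FROM table_x a, table_y b "
    "WHERE a.id = b.fid "
    "GROUP BY sex, age ORDER BY sex, age"
).fetchall()

def pivot(rows):
    """Turn (row_value, count, column_value) triples into a nested dict."""
    table = defaultdict(dict)
    for row_value, count, column_value in rows:
        table[row_value][column_value] = count
    return dict(table)

ctable = pivot(rows)
for sex, counts in ctable.items():
    print(sex, counts)
```

Combinations that never occur are simply absent from the nested dict; a display layer would typically show them as zero.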

7 Technique Used – Correlation SQL Statement used SELECT x, y FROM table_x a, table_y b WHERE a.id = b.fid ORDER BY x, y
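The slide shows only the query that fetches the (x, y) pairs, not how the correlation is computed. A minimal sketch, assuming Pearson's correlation coefficient is meant and using made-up data and column placement, could look like this:

```python
import math
import sqlite3

def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical tables: x in table_x, y in table_y, joined as on the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_x (id INTEGER PRIMARY KEY, x REAL);
    CREATE TABLE table_y (fid INTEGER, y REAL);
""")
conn.executemany("INSERT INTO table_x VALUES (?, ?)",
                 [(1, 1.0), (2, 2.0), (3, 3.0), (4, 4.0)])
conn.executemany("INSERT INTO table_y VALUES (?, ?)",
                 [(1, 2.0), (2, 4.0), (3, 6.0), (4, 8.0)])

pairs = conn.execute(
    "SELECT x, y FROM table_x a, table_y b WHERE a.id = b.fid ORDER BY x, y"
).fetchall()

r = pearson(pairs)
print(f"r = {r:.3f}")
```

The function divides by zero if either attribute is constant, so a real tool would need to guard against that case.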

8 Tool - Architecture

9 Tool - Description

10 Data Cleaning: Missing values in relation DEMOGRAPHIC. Incorrect values in relation TRANSACTION. (Data reduced by 10% after cleaning.)

11 Data Preparation: In relation CLIENT, SEX and BDATE are separated from BIRTHNUMBER. All date fields are converted to AGE, relative to the reference 199901.
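The split of BIRTHNUMBER can be sketched as follows. The sketch relies on the encoding given in the PKDD'99 dataset description (YYMMDD for men, YYMM+50DD for women) and assumes that all birth years fall in the 1900s and that 199901 means January 1999, with age counted in whole years at month granularity. The function names are hypothetical.

```python
def split_birth_number(birth_number):
    """Decode a birth number into (sex, year, month, day).

    Assumes the PKDD'99 encoding: YYMMDD for men and YYMM+50DD for
    women, with every birth year in the 1900s.
    """
    yy = birth_number // 10000
    mm = (birth_number // 100) % 100
    dd = birth_number % 100
    sex = "F" if mm > 50 else "M"
    if mm > 50:
        mm -= 50
    return sex, 1900 + yy, mm, dd

def age_at(year, month, ref=199901):
    """Whole years between a birth month and a YYYYMM reference month."""
    ref_year, ref_month = divmod(ref, 100)
    # Subtract one year if the birthday month has not been reached yet.
    return ref_year - year - (1 if month > ref_month else 0)

sex, year, month, day = split_birth_number(706213)
print(sex, year, month, day, age_at(year, month))
```

Ignoring the day of the month keeps the arithmetic simple; for clients born in the reference month the result can be off by one year.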

12 Data Preparation (cont.): Create table definitions. Convert the data into a table-compatible format. Load the data into the database. Evaluate loading errors and change attribute definitions accordingly.

13 Attribute Selection: Decision relation: LOAN. Decision attribute: STATUS. Classification attributes: all other attributes that do not belong to the LOAN relation. [Figure: decision tree with test nodes A4?, A6? and A1?, Y/N branches, and leaves Class1 and Class2]

14 Data Transformation: Discretization: continuous attributes are split into 4 to 10 buckets. For relation TRANSACTION, only transactions performed in 1997 were considered, because of resource limitations and because the largest number of loans was approved during this period.
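The slide does not say how the bucket boundaries were chosen. One simple option, shown here purely as an assumption, is equal-width binning, where the range of the attribute is cut into k intervals of the same size:

```python
def equal_width_buckets(values, k=4):
    """Map each value to a bucket index in 0..k-1 using equal-width bins.

    Equal-width binning is an assumption; the slides only state that
    continuous attributes were split into 4 to 10 buckets.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        # All values identical: everything falls into the first bucket.
        return [0] * len(values)
    # The maximum value would land in bucket k, so clamp it to k - 1.
    return [min(int((v - lo) / width), k - 1) for v in values]

amounts = [0, 10, 25, 50, 75, 99, 100]  # hypothetical sample values
buckets = equal_width_buckets(amounts, k=4)
print(buckets)
```

Equal-width bins are sensitive to outliers; equal-frequency (quantile) bins are a common alternative when the attribute is heavily skewed.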

15 Data Mining/Pattern Evaluation: Run a histogram on every non-key attribute to study its distribution. Discretize continuous attributes. Run a contingency table to study the relationship between two attributes. If both attributes are continuous, check the significance with the correlation function.

16 Knowledge Presentation - 1 All loans on accounts where a second person is allowed to dispose are GOOD LOANS (100%)

17 Knowledge Presentation - 2 Permanent orders of type household & leasing indicate financial stability

18 Knowledge Presentation - 3 Accounts with Cash withdrawals are more likely to repay their loans

19 Knowledge Presentation - 4 Accounts with low transaction amounts indicate good loans

20 Knowledge Presentation - 5 Accounts that are in debt indicate BAD LOANS

21 Pros: Flexibility to alter the data presentation to understand the nature of the data. Customers with no background in data mining can appreciate the output because of its simplicity. Since results can be stored in a file, subsequent analysis on a subset of the data becomes very easy.

22 Cons: Needs capability for multi-variable analysis. Some kind of quantification needs to be put in. Performance issues with using RDBMS.

23 Questions & Demonstration

