Mining Financial Data: Histograms & Contingency Tables
Shishir Gupta, under the guidance of Dr. Mirsad Hadzikadic


Agenda
– Database
– Task goals
– Tool & technique used
– Data preparation and cleaning
– Attribute selection
– Data transformation
– Data Mining/Pattern Evaluation
– Knowledge presentation
– Pros/Cons
– Questions & Demonstration

Database
– Financial dataset from PKDD 1999
– Financial dataset from a Czech bank
– Relational dataset
– 8 relations: ACCOUNT, LOAN, DEMOGRAPH, ORDER, TRANSACTION, CARD, DISPOSITION, CLIENT

Task Goal
– Determine good clients, to offer them additional services
– Determine bad clients, to watch them carefully and minimize bank loss
– Services offered:
  – Loan
  – Credit card

Technique Used – Histogram
SQL statement used:
SELECT age, COUNT(age) FROM table_x GROUP BY age ORDER BY age;
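The histogram query can be tried end to end with a small sketch. This is only an illustration, not the tool from the presentation: it assumes Python's built-in `sqlite3` module as a stand-in RDBMS, and the rows loaded into `table_x` are made up.

```python
import sqlite3

# Hypothetical in-memory stand-in for table_x; the ages are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_x (id INTEGER PRIMARY KEY, age INTEGER)")
conn.executemany(
    "INSERT INTO table_x (age) VALUES (?)",
    [(25,), (31,), (25,), (40,), (31,), (25,)],
)

# The histogram query from the slide: one row per distinct age with its frequency.
histogram = conn.execute(
    "SELECT age, COUNT(age) FROM table_x GROUP BY age ORDER BY age"
).fetchall()

# Render a simple text histogram, one bar per distinct age value.
for age, count in histogram:
    print(f"{age:>3} | {'#' * count} ({count})")
```

Each result row is an `(age, frequency)` pair, so the same loop works for any attribute once it has been discretized into a manageable number of values.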

Technique Used – C-Tables
SQL statement used:
SELECT sex, COUNT(sex), age FROM table_x a, table_y b WHERE a.id = b.fid GROUP BY sex, age ORDER BY sex, age;
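The contingency-table query returns one row per (sex, age) combination; pivoting those rows gives the familiar two-way grid. The sketch below assumes `sqlite3` as a stand-in database and invented sample rows. The pivot step is an illustrative addition and is not described in the slides.

```python
import sqlite3
from collections import defaultdict

# Hypothetical stand-ins for table_x and table_y with invented rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_x (id INTEGER PRIMARY KEY, sex TEXT)")
conn.execute("CREATE TABLE table_y (fid INTEGER, age INTEGER)")
conn.executemany(
    "INSERT INTO table_x VALUES (?, ?)",
    [(1, "F"), (2, "M"), (3, "F"), (4, "M")],
)
conn.executemany(
    "INSERT INTO table_y VALUES (?, ?)",
    [(1, 30), (2, 30), (3, 40), (4, 30)],
)

# The contingency-table query from the slide: counts per (sex, age) pair.
rows = conn.execute(
    "SELECT sex, COUNT(sex), age FROM table_x a, table_y b "
    "WHERE a.id = b.fid GROUP BY sex, age ORDER BY sex, age"
).fetchall()

# Pivot the (sex, count, age) rows into a grid: one row per sex, one column per age.
table = defaultdict(dict)
for sex, count, age in rows:
    table[sex][age] = count

ages = sorted({age for _, _, age in rows})
print("sex", *ages, sep="\t")
for sex in sorted(table):
    # Combinations that never occur are absent from the query result, so fill with 0.
    print(sex, *(table[sex].get(age, 0) for age in ages), sep="\t")
```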

Technique Used – Correlation
SQL statement used:
SELECT x, y FROM table_x a, table_y b WHERE a.id = b.fid ORDER BY x, y;
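The slides show only the query that fetches the (x, y) pairs and do not say which correlation measure is computed from them. As one possibility, the sketch below assumes the Pearson coefficient, uses `sqlite3` as a stand-in database, and loads invented data. The `pearson` helper is hypothetical and not part of the presented tool.

```python
import math
import sqlite3

# Hypothetical stand-ins for table_x and table_y with invented numeric data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_x (id INTEGER PRIMARY KEY, x REAL)")
conn.execute("CREATE TABLE table_y (fid INTEGER, y REAL)")
conn.executemany(
    "INSERT INTO table_x VALUES (?, ?)",
    [(1, 1.0), (2, 2.0), (3, 3.0), (4, 4.0)],
)
conn.executemany(
    "INSERT INTO table_y VALUES (?, ?)",
    [(1, 2.0), (2, 4.0), (3, 6.0), (4, 8.0)],
)

# The query from the slide: fetch the paired values of the two attributes.
pairs = conn.execute(
    "SELECT x, y FROM table_x a, table_y b WHERE a.id = b.fid ORDER BY x, y"
).fetchall()


def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs (assumed measure)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)


r = pearson(pairs)
print(f"r = {r:.3f}")
```

A value of `r` near +1 or -1 indicates a strong linear relationship between the two continuous attributes, while a value near 0 indicates little linear relationship.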

Tool - Architecture

Tool - Description

Data Cleaning
– Missing values: relation DEMOGRAPHIC
– Incorrect values: relation TRANSACTION (data reduced by 10% after cleaning)

Data Preparation
– Relation CLIENT: SEX and BDATE separated from BIRTHNUMBER
– All date fields converted to AGE
  –Ref
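The split of BIRTHNUMBER into SEX and BDATE can be sketched as follows. This assumes the encoding described in the PKDD 1999 dataset documentation (YYMMDD for men, YYMM+50DD for women) and that all births fall in the 1900s. The slides do not show the reference date used for the AGE conversion, so `REFERENCE_DATE` is a hypothetical placeholder, as are the function names.

```python
from datetime import date
from typing import Tuple

# Hypothetical reference date; the slides do not state which date was used.
REFERENCE_DATE = date(1999, 1, 1)


def split_birth_number(birth_number: int) -> Tuple[str, date]:
    """Split a birth number into (sex, birth date).

    Assumed encoding: YYMMDD for men, YYMM+50DD for women, births in the 1900s.
    """
    yy, rest = divmod(birth_number, 10000)
    mm, dd = divmod(rest, 100)
    sex = "F" if mm > 50 else "M"
    if sex == "F":
        mm -= 50  # undo the +50 month offset that marks female clients
    return sex, date(1900 + yy, mm, dd)


def age_on(birth_date: date, reference: date) -> int:
    """Age in whole years on the reference date."""
    years = reference.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred in the reference year.
    if (reference.month, reference.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


sex, bdate = split_birth_number(706213)
print(sex, bdate, age_on(bdate, REFERENCE_DATE))
```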

Data Preparation (cont.)
– Creating table definitions
– Setting up data in a table-compatible format
– Loading data into the database
– Evaluating loading errors and changing attribute definitions accordingly

Attribute Selection
– Decision relation: LOAN
– Decision attribute: STATUS
– Classification attributes: all other attributes that do not belong to the LOAN relation
(Figure: example decision tree testing attributes A1, A4, and A6, with leaves labeled Class1 and Class2.)

Data Transformation
– Discretization: continuous attributes are split into 4 to 10 buckets
– For relation TRANSACTION, only transactions performed in 1997 are considered
  – Due to resource limitations
  – The largest number of loans was approved during this period
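The slides give the number of buckets (4 to 10) but not how the bucket boundaries were chosen. The sketch below assumes equal-width buckets as one simple option. The function name and the sample amounts are hypothetical.

```python
from typing import List, Sequence


def equal_width_buckets(values: Sequence[float], n_buckets: int) -> List[int]:
    """Assign each value a bucket index in 0..n_buckets-1 using equal-width bins.

    Equal-width binning is an assumption; the slides only give the bucket count.
    """
    if not 4 <= n_buckets <= 10:
        raise ValueError("the slides use between 4 and 10 buckets")
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values are identical, so everything falls into the first bucket.
        return [0] * len(values)
    width = (hi - lo) / n_buckets
    # The maximum value would land in bucket n_buckets, so clamp it to the last one.
    return [min(int((v - lo) / width), n_buckets - 1) for v in values]


amounts = [0, 250, 500, 750, 1000]  # invented transaction amounts
buckets = equal_width_buckets(amounts, 4)
print(list(zip(amounts, buckets)))
```

Once an attribute is replaced by its bucket index, it can be used directly in the histogram and contingency-table queries, which group by discrete values.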

Data Mining/Pattern Evaluation
– Run a histogram on every non-key attribute to study its distribution.
– Discretize continuous attributes.
– Run a contingency table to study the relationship between two attributes.
– Check significance with the correlation function if both attributes are continuous.

Knowledge Presentation - 1
All loans on accounts where a second person has disposition rights are good loans (100%).

Knowledge Presentation - 2
Permanent orders of type household and leasing indicate financial stability.

Knowledge Presentation - 3
Clients whose accounts show cash withdrawals are more likely to repay their loans.

Knowledge Presentation - 4
Accounts with low transaction amounts indicate good loans.

Knowledge Presentation - 5
Accounts that are in debt indicate bad loans.

Pros
– Flexibility to alter the data presentation to understand the nature of the data.
– Customers with no background in data mining can appreciate the output because of its simplicity.
– Since the results can be stored in a file, subsequent analysis on a subset of the data becomes very easy.

Cons
– Needs capability for multi-variable analysis.
– Some kind of quantification needs to be put in.
– Performance issues with using an RDBMS.

Questions & Demonstration