PKDD Discovery Challenge (not only) on Financial Data

Presentation transcript:

PKDD Discovery Challenge (not only) on Financial Data
Petr Berka
Laboratory for Intelligent Systems
University of Economics, Prague
berka@vse.cz

Cups, Challenges, Competitions
- KDD Cups (since 1997)
- KDD Sisyphus at ECML 1998
- PKDD Discovery Challenges (since 1999)
- COIL Competition 2000
- PAKDD Challenge 2000
- PT Challenge 2000, 2001
- JSAI KDD Challenge 2001
- EUNITE Competition 2001, 2002
- . . .
Bold type: events I participated in or was involved in organizing.
DMLL Workshop, ICML 2002 Petr Berka, LISp, 2002

PKDD Discovery Challenge Idea
Realistic data mining conditions
- collaborative rather than competitive nature
- rather vague specification of the problem
Differences from real KDD projects
- short time for analysis (2-3 months)
- only indirect access to domain and data experts during the KDD process
The idea originates from Jan Zytkow, who suggested organizing at PKDD99 in Prague an event somewhat different from the KDD Cups. An ideal contribution should describe the goals of the challenge (in business terms), the method used (preprocessing, data mining) and the achieved results. 2-3 months is enough to build a model that classifies a single table, but too short to understand a complex domain and the data collected.

Challenge Settings
- Data and their full description available on the web for all participants
- Submissions evaluated by domain experts (but no ordering, no winners and losers)
- Workshop at PKDD to present the results and discuss them with domain experts
- Results and comments of experts available on the web (after the workshop)
Web-based access: registration serves only to keep track of interested people.

PKDD Challenges
http://lisp.vse.cz/challenge
- 1999, Prague: financial data, thrombosis data
- 2000, Lyon: financial data, modified thrombosis data
- 2001, Freiburg: modified thrombosis data
- 2002, Helsinki: atherosclerosis data, hepatitis data
I'd like to acknowledge the contribution of Shusaku Tsumoto, who provided the challenge with the medical (thrombosis, hepatitis) data.

Financial Challenge Background
Czech bank offering private accounts
Available data for pilot study (29000 clients)
- personal characteristics
- basic info about accounts
- transactions for three months
Proposed tasks
- segmentation (defining different types of clients w.r.t. debt)
- early detection of debts

Financial Challenge Data

Contributions
Method oriented
- show a method/system working on the data
Problem oriented (prototype solutions)
- loan and/or credit cards description
- loan and/or credit cards classification
- initial exploration
- relation between branches
- clients segmentation

Description of loans
Relations between loan category and account characteristics
- [Coufal et al, 1999 - GUHA]
- [Mikšovský et al, 1999 - EXCEL]

Classification of loans
Detecting risky clients before they are granted a loan [Mikšovský et al, 1999 - C5.0]
- decision tree to find the relevance of attributes
- decision tree for classification (using misclassification costs)
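To illustrate what "using misclassification costs" can mean in practice, here is a minimal sketch of a cost-sensitive decision. It is not the C5.0 procedure from the cited contribution; the function name, the probability estimate `p_bad`, and the cost values are all hypothetical. The idea shown is only that, once granting a bad loan is priced higher than rejecting a good client, the decision minimizing expected cost can differ from the majority class.

```python
def min_cost_decision(p_bad: float, cost_bad_granted: float, cost_good_rejected: float) -> str:
    """Pick the action with the lower expected cost.

    p_bad is an estimated probability that the loan turns bad (for example,
    the share of bad loans in a decision-tree leaf). Both cost values are
    illustrative assumptions, not figures from the challenge.
    """
    expected_cost_grant = p_bad * cost_bad_granted
    expected_cost_reject = (1.0 - p_bad) * cost_good_rejected
    return "reject" if expected_cost_reject < expected_cost_grant else "grant"


# With equal costs a 20% risk is accepted; with a 5:1 cost ratio it is not.
print(min_cost_decision(0.2, cost_bad_granted=1.0, cost_good_rejected=1.0))
print(min_cost_decision(0.2, cost_bad_granted=5.0, cost_good_rejected=1.0))
```

In this sketch ties are resolved in favour of granting the loan; a real application would set that policy together with the domain experts.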

Credit Cards Promotion
Description - find characteristics of a card holder
- deviation detection
Classification - predict score for "card value"
- k-nearest neighbour [Putten, 1999]
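As a rough illustration of k-nearest-neighbour scoring, the sketch below averages the scores of the k closest known clients. It is a generic textbook version, not the method of [Putten, 1999]; the feature vectors, the score values and the function name are hypothetical, and features are assumed to be already scaled to comparable ranges.

```python
import math


def knn_score(query, examples, k=3):
    """Return the mean score of the k examples nearest to `query`.

    `examples` is a list of (feature_vector, score) pairs; distance is
    plain Euclidean distance, so features should be on comparable scales.
    """
    nearest = sorted(examples, key=lambda ex: math.dist(query, ex[0]))[:k]
    return sum(score for _, score in nearest) / len(nearest)


# Hypothetical clients: (scaled features, 1.0 = card holder, 0.0 = not).
clients = [((0.0, 0.0), 1.0), ((1.0, 0.0), 1.0), ((0.0, 1.0), 0.0), ((10.0, 10.0), 0.0)]
print(knn_score((0.0, 0.0), clients, k=3))
```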

Clients Segmentation
Description - segmentation of clients according to transactions [Hotho, Maedche, 2000]
- Kohonen map + decision trees
Rule #1 for Cluster 3
If ATTR5 > 9945 and ATTR13 > 0 Then -> Cluster 3 (115, 0.983)
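The rule above can be read as a simple predicate over a client record. The sketch below applies it and counts how many records it covers and what share of those fall into the target cluster. The slide does not explain the pair (115, 0.983); treating it as (covered cases, share in Cluster 3) is an assumption, and the record layout and function names are hypothetical.

```python
def rule_1_cluster_3(record: dict) -> bool:
    """Rule #1 from the slide: ATTR5 > 9945 and ATTR13 > 0."""
    return record["ATTR5"] > 9945 and record["ATTR13"] > 0


def rule_stats(records, clusters, target_cluster):
    """Return (number of covered records, share of them in target_cluster)."""
    covered = [c for r, c in zip(records, clusters) if rule_1_cluster_3(r)]
    if not covered:
        return 0, 0.0
    hits = sum(1 for c in covered if c == target_cluster)
    return len(covered), hits / len(covered)


# Toy data only; the attribute meanings are not given in the source.
records = [
    {"ATTR5": 10000, "ATTR13": 1},
    {"ATTR5": 12000, "ATTR13": 2},
    {"ATTR5": 500, "ATTR13": 1},
    {"ATTR5": 9946, "ATTR13": 0},
]
clusters = [3, 1, 3, 3]
print(rule_stats(records, clusters, target_cluster=3))
```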

Challenge Organizing Lessons
- Getting and preparing real data is difficult
- The time for analyses should be as long as possible
- The response rate was rather low (~ 10%)
- No synergy effect observed

DM Lessons (1/4)
Cooperate with experts
- domain experts
- data experts
- . . .
… and with users

DM Lessons (2/4)
Use knowledge intensive preprocessing methods …
- compute age and sex from birth_number
- set flags for different types of operations
- compute monthly characteristics of transactions (sum, avg, min, max)
- $lbalance = \frac{1}{30} \sum_i balance(i) \cdot days(i)$
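The preprocessing steps listed above can be sketched as follows. Several details are assumptions not stated on the slide: that birth_number is a YYMMDD string in which women have 50 added to the month, that all birth years are 19YY, and that days(i) is the number of days the account held balance(i) within a 30-day month. The function names are hypothetical.

```python
from datetime import date


def decode_birth_number(birth_number: str):
    """Split a YYMMDD birth number into (birth date, sex).

    Assumed encoding: month + 50 marks a woman; the century is 1900.
    """
    yy = int(birth_number[0:2])
    mm = int(birth_number[2:4])
    dd = int(birth_number[4:6])
    sex = "F" if mm > 50 else "M"
    if mm > 50:
        mm -= 50
    return date(1900 + yy, mm, dd), sex


def age_on(birth: date, reference: date) -> int:
    """Age in completed years on the reference date."""
    years = reference.year - birth.year
    if (reference.month, reference.day) < (birth.month, birth.day):
        years -= 1
    return years


def monthly_stats(amounts):
    """Sum, average, minimum and maximum of one month's transaction amounts."""
    return {
        "sum": sum(amounts),
        "avg": sum(amounts) / len(amounts),
        "min": min(amounts),
        "max": max(amounts),
    }


def time_weighted_balance(segments, period_days=30):
    """lbalance: sum of balance(i) * days(i) divided by the period length.

    `segments` is a list of (balance, days) pairs.
    """
    return sum(balance * days for balance, days in segments) / period_days


birth, sex = decode_birth_number("706213")
print(birth, sex, age_on(birth, date(1999, 1, 1)))
print(time_weighted_balance([(1000.0, 10), (4000.0, 20)]))
```

The time-weighted average is less sensitive to a single large deposit or withdrawal late in the month than a plain average over transactions would be, which is presumably why it appears among the knowledge intensive features.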

DM Lessons (3/4)
Make the results understandable [Werner, Fogarty 2001]

DM Lessons (4/4)
Show some (even preliminary) results soon
- experts are interested in solutions, not in applying sophisticated methods

Discovery Challenge Benefits
Experts
- deeper insight into the data
Participants
- experience with analyzing large real data
- motivations for further research
ML/KDD Community
- prototype tasks/solutions (like the MiningMart project?)
Organizers
- … invitation to DMLL Workshop :-)

Thank You
