University of Economics, Prague
MLNET-related activities of the Laboratory for Intelligent Systems and the Dept. of Information and Knowledge Engineering
(c) Petr Berka, LISp
Research
- probabilistic methods: decomposable probability models and Bayesian networks
- symbolic methods: generalized association rules and decision rules
- logical calculi for knowledge discovery in databases
People
- Jiří Ivánek
- Radim Jiroušek
- Petr Berka
- Jan Rauch
- Tomáš Kočka
- Vojtěch Svátek
Software: LISp-Miner
- two data mining procedures: 4FT-Miner (generalized association rules) and KEX (decision rules)
- large preprocessing module, including SQL
- rules are output in database format, so users can implement their own interpretation procedures
LISp-Miner procedures
- 4FT-Miner (GUHA procedure): generalized association rules in the form Ant ~ Suc / Cond
- KEX: weighted decision rules in the form Ant ==> C (weight)
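How several weighted rules might contribute to one decision can be illustrated with a small sketch. It assumes a PROSPECTOR-style pseudo-Bayesian combination, w1 ⊕ w2 = w1·w2 / (w1·w2 + (1 − w1)(1 − w2)), which these slides do not state. The rule base, the names `combine` and `classify`, and the neutral starting weight 0.5 are all hypothetical and are not taken from LISp-Miner.

```python
from functools import reduce

def combine(w1: float, w2: float) -> float:
    """Assumed PROSPECTOR-style combination of two weights in [0, 1]."""
    num = w1 * w2
    den = num + (1.0 - w1) * (1.0 - w2)
    if den == 0.0:
        # Happens only when one weight is 0 and the other is 1.
        raise ValueError("cannot combine contradictory weights 0 and 1")
    return num / den

def classify(example: dict, rules: list, neutral: float = 0.5) -> float:
    """Combine the weights of all rules whose antecedent matches the example.

    Each rule is (antecedent, weight); an antecedent is a dict of
    attribute -> required value. 0.5 is neutral: combine(0.5, w) == w.
    """
    applicable = [
        weight
        for antecedent, weight in rules
        if all(example.get(attr) == value for attr, value in antecedent.items())
    ]
    return reduce(combine, applicable, neutral)

# Hypothetical rules of the form Ant ==> C (weight) for one class C.
rules = [
    ({"Salary": "low"}, 0.8),
    ({"Sex": "F"}, 0.8),
    ({"Sex": "M"}, 0.2),
]
score = classify({"Sex": "F", "Salary": "low"}, rules)
print(round(score, 3))
```

Under this assumed operation, weights above 0.5 reinforce each other and weights below 0.5 pull the result down, so two rules of weight 0.8 yield a combined weight higher than 0.8.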
4FT-Miner
Data matrix (CLIENTS: Id, Age, Sex, Salary, District; LOANS: Amount, Payment, Months, Quality):
Id   Age  Sex  Salary  District  Amount  Payment  Months  Quality
1    45   F    ...     Prague    ...     ...      ...     good
...  ...  M    ...     Brno      ...     ...      ...     bad
Problem: Are there segments of clients SC and segments of loans SL such that being in SC is at 90% equivalent to having a loan from SL, and there are at least 100 such clients?
Ant is at 90% equivalent to Suc: $\mathrm{Ant} \sim_{0.90,\,100} \mathrm{Suc}$ is true iff $\frac{a}{a+b+c} \ge 0.9 \;\wedge\; a \ge 100$
Four-fold table:
         Suc  not Suc
Ant      a    b
not Ant  c    d
- a: number of objects satisfying Ant and Suc
- b: number of objects satisfying Ant and not satisfying Suc
- c: number of objects not satisfying Ant and satisfying Suc
- d: number of objects satisfying neither Ant nor Suc
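A minimal sketch of the quantifier check defined on this slide, assuming Ant and Suc have already been evaluated to one truth value per row of the data matrix. The function names `four_fold_table` and `quantifier_holds` are illustrative and are not part of LISp-Miner.

```python
from typing import Sequence, Tuple

def four_fold_table(ant: Sequence[bool], suc: Sequence[bool]) -> Tuple[int, int, int, int]:
    """Count the four-fold table (a, b, c, d) from two truth-value columns."""
    if len(ant) != len(suc):
        raise ValueError("columns must have the same length")
    a = b = c = d = 0
    for in_ant, in_suc in zip(ant, suc):
        if in_ant and in_suc:
            a += 1      # satisfies Ant and Suc
        elif in_ant:
            b += 1      # satisfies Ant, not Suc
        elif in_suc:
            c += 1      # satisfies Suc, not Ant
        else:
            d += 1      # satisfies neither
    return a, b, c, d

def quantifier_holds(a: int, b: int, c: int, p: float = 0.9, base: int = 100) -> bool:
    """True iff a/(a+b+c) >= p and a >= base; d plays no role."""
    total = a + b + c
    return total > 0 and a >= base and a / total >= p

# Tiny usage example with made-up truth values.
table = four_fold_table([True, True, False, False], [True, False, True, False])
print(table, quantifier_holds(*table[:3], p=0.9, base=1))
```

Note that d does not enter the condition: objects satisfying neither Ant nor Suc do not count as evidence for the equivalence.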
4FT-Miner
Input: data matrix, quantifier $\sim_{0.90,\,100}$
Derived attributes for SC (possible Ant): Age (7 values), Sex (2 values), Salary (3 values), District (77 values)
Derived attributes for SL (possible Suc): Amount (6 values), Duration (5 values), Quality (2 values)
Output: all relations $\mathrm{Ant} \sim_{0.90,\,100} \mathrm{Suc}$ that are true in the data matrix (5 equivalences out of about 5 million possible relations)
Example: Age( ) ∧ Sex(F) ∧ Salary(low) ∧ District(Prague) $\sim_{0.90,\,100}$ Amount<20,50) ∧ Quality(Bad)
Check: $\frac{a}{a+b+c} = 0.95 \ge 0.9$ and $950 \ge 100$
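The search described on this slide can be sketched as a naive enumeration: generate every candidate Ant and Suc as a conjunction of attribute(value) literals, count a, b and c, and keep the pairs that satisfy the quantifier. This is an illustration only and not the actual 4FT-Miner implementation. The toy rows, the `max_len` limit, and the lowered threshold `base=3` are assumptions made so the example fits on a page.

```python
from itertools import combinations, product

def conjunctions(rows, attributes, max_len):
    """Yield conjunctions of 1..max_len literals, at most one literal per attribute."""
    for k in range(1, max_len + 1):
        for attrs in combinations(attributes, k):
            # Candidate values are those that actually occur in the data.
            domains = [sorted({row[a] for row in rows}) for a in attrs]
            for values in product(*domains):
                yield tuple(zip(attrs, values))

def satisfies(row, conjunction):
    """True if the row has the required value for every literal."""
    return all(row[attr] == value for attr, value in conjunction)

def mine(rows, ant_attrs, suc_attrs, p, base, max_len=2):
    """Return (Ant, Suc, a, b, c) for every pair with a/(a+b+c) >= p and a >= base."""
    found = []
    sucs = list(conjunctions(rows, suc_attrs, max_len))
    for ant in conjunctions(rows, ant_attrs, max_len):
        ant_mask = [satisfies(row, ant) for row in rows]
        for suc in sucs:
            a = b = c = 0
            for in_ant, row in zip(ant_mask, rows):
                in_suc = satisfies(row, suc)
                if in_ant and in_suc:
                    a += 1
                elif in_ant:
                    b += 1
                elif in_suc:
                    c += 1
            if a >= base and a + b + c > 0 and a / (a + b + c) >= p:
                found.append((ant, suc, a, b, c))
    return found

# Hypothetical toy data: 7 clients with one loan-quality attribute.
rows = (
    [{"Sex": "F", "Salary": "low", "Quality": "bad"}] * 3
    + [{"Sex": "M", "Salary": "high", "Quality": "good"}] * 3
    + [{"Sex": "F", "Salary": "high", "Quality": "good"}]
)
found = mine(rows, ["Sex", "Salary"], ["Quality"], p=0.9, base=3)
for ant, suc, a, b, c in found:
    print(ant, "~", suc, (a, b, c))
```

Exhaustive enumeration like this grows quickly with the number of attributes and values, which is why only a handful of relations survive out of millions of candidates in the slide's example.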
KEX - classification
KEX - learning
LISp-Miner
4FT-Miner and KEX Applications
- truck reliability assessment
- quality control in a brewery
- segmentation of clients of a bank
- short-term electric load prediction
LISp-Miner References
- Berka, P., Ivanek, J.: Automated Knowledge Acquisition for PROSPECTOR-like Expert Systems. In: Bergadano, De Raedt (eds.): Proc. ECML'94, Springer, 1994.
- Berka, P., Rauch, J.: Data Mining using GUHA and KEX. In: Callaos, Yang, Aguilar (eds.): 4th Int. Conf. on Information Systems, Analysis and Synthesis ISAS'98, 1998, Vol. 2.
- Rauch, J.: Classes of Four Fold Table Quantifiers. In: Zytkow, Quafafou (eds.): Principles of Data Mining and Knowledge Discovery, Springer, 1998.
Datasets
PKDD'99 Discovery Challenge data
- financial data: clients of a bank, their accounts, transactions, loans, etc.
- medical data: patients with collagen disease
Financial data
Medical data
Other activities
- Organized conferences
- Teaching (in Czech)
- KDD
- KDD seminar
- ML
New projects
SOL-EU-NET project "Data Mining and Decision Support for Business Competitiveness: A European Virtual Enterprise" (supported by EU grant IST )