Active subgroup mining for descriptive induction tasks. Dragan Gamberger, Rudjer Bošković Institute, Zagreb; Zdenko Sonicki, University of Zagreb.

Talk overview:
- descriptive induction
- active subgroup mining
- subgroup discovery
- data mining server
- a real medical example

Descriptive induction aims to generate (induce) knowledge that humans can understand and interpret. It differs from classification-oriented induction, where the main goal is high classification quality and the induced classification schemes are typically too complex for human interpretation.

Main properties of descriptive induction:
- simple rules
- reasonable prediction quality (both on available and future cases)
Main problem: overfitting. For example, a functional genomics domain has 150 examples with measured attribute values.

Active subgroup mining is a data analysis approach developed specifically for medical applications (but also applicable to other domains). It is based on the observation that expert knowledge (in medical domains, the knowledge and experience of medical doctors) is very important for the quality of the results obtained.

In active subgroup mining the expert is placed at the center of the process, and machine learning (subgroup discovery) is only a tool that helps the expert in the data analysis process.

[Figure: the active subgroup mining process. Labels: expert; subgroup discovery; definition of task(s); induction of models; presentation; visualization; integration; statistical evaluation; selection of models]

Classical induction versus subgroup discovery induction

Generality is the main parameter of the subgroup induction process. It ranges from a very specific subgroup to a very sensitive subgroup.
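The effect of a generality parameter can be illustrated with a small sketch. The slides do not state the quality measure, so the form TP / (FP + g) used here is an assumption (it is one measure found in the subgroup discovery literature), and the two candidate subgroups and their counts are hypothetical.

```python
def rule_quality(tp: int, fp: int, g: float) -> float:
    """Assumed quality measure TP / (FP + g); a larger g tolerates more false positives."""
    return tp / (fp + g)

# Two hypothetical candidate subgroups (counts are invented for illustration).
specific = {"tp": 10, "fp": 0}    # few target cases covered, no false positives
sensitive = {"tp": 45, "fp": 30}  # most target cases covered, many false positives

def preferred(g: float) -> str:
    """Return which of the two candidates scores higher for a given g."""
    q_specific = rule_quality(specific["tp"], specific["fp"], g)
    q_sensitive = rule_quality(sensitive["tp"], sensitive["fp"], g)
    return "specific" if q_specific > q_sensitive else "sensitive"

for g in (1, 5, 20, 100):
    print(g, preferred(g))
```

Under this assumed measure, small values of g favor the specific subgroup and large values favor the sensitive one.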

Subgroup discovery is a beam search algorithm that generates short rules in the form of conjunctions of conditions. Conditions are based on the values of the available attributes.
Example: CHD 53 AND T.CH > 6.1 AND BMI < 30
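A beam search over conjunctions of conditions could be sketched as follows. This is a minimal illustration, not the authors' implementation: the quality measure TP / (FP + g), the midpoint thresholds, the attribute names (`age`, `tch`, `bmi`), and the toy data are all assumptions made for the example.

```python
import operator

OPS = {">": operator.gt, "<": operator.lt}

def covers(rule, row):
    """A rule is a tuple of (attribute, op, threshold) conditions joined by AND."""
    return all(OPS[op](row[attr], thr) for attr, op, thr in rule)

def quality(rule, rows, labels, g):
    """Assumed quality measure TP / (FP + g), where g controls generality."""
    tp = sum(1 for r, y in zip(rows, labels) if y and covers(rule, r))
    fp = sum(1 for r, y in zip(rows, labels) if not y and covers(rule, r))
    return tp / (fp + g)

def candidate_conditions(rows):
    """One '>' and one '<' condition per midpoint between consecutive distinct values."""
    conds = []
    for attr in rows[0]:
        values = sorted({r[attr] for r in rows})
        for lo, hi in zip(values, values[1:]):
            mid = (lo + hi) / 2
            conds += [(attr, ">", mid), (attr, "<", mid)]
    return conds

def beam_search(rows, labels, g=5.0, beam_width=3, max_conditions=3):
    conds = candidate_conditions(rows)
    beam = [()]  # the empty rule covers every example
    best, best_q = (), quality((), rows, labels, g)
    for _ in range(max_conditions):
        # Extend every rule in the beam by one condition; sorting removes order duplicates.
        extended = {tuple(sorted(rule + (c,))) for rule in beam for c in conds if c not in rule}
        if not extended:
            break
        # Keep the beam_width best rules; ties are broken by the rule itself for determinism.
        beam = sorted(extended, key=lambda r: (-quality(r, rows, labels, g), r))[:beam_width]
        q = quality(beam[0], rows, labels, g)
        if q > best_q:
            best, best_q = beam[0], q
    return best, best_q

# Hypothetical toy data: three target cases followed by four non-target cases.
rows = [
    {"age": 60, "tch": 6.5, "bmi": 27},
    {"age": 58, "tch": 7.0, "bmi": 25},
    {"age": 65, "tch": 6.8, "bmi": 29},
    {"age": 45, "tch": 6.9, "bmi": 26},
    {"age": 62, "tch": 5.0, "bmi": 28},
    {"age": 61, "tch": 6.6, "bmi": 33},
    {"age": 40, "tch": 4.8, "bmi": 22},
]
labels = [True, True, True, False, False, False, False]

rule, q = beam_search(rows, labels, g=5.0)
print(" AND ".join(f"{a} {op} {t}" for a, op, t in rule), q)
```

Limiting `max_conditions` keeps the rules short, and `beam_width` bounds how many partial rules are carried from one step to the next. Both values here are arbitrary choices for the sketch.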

Data mining server: dms.irb.hr

Meningoencephalitis domain: a subgroup describing the bacterial type of the disease in contrast to the viral type.

Conclusions:
- descriptive induction and active subgroup mining are novel concepts that are potentially very interesting for data analysis and knowledge induction in medical applications
- an active and central role for medical experts is essential
- we have extensive and positive experience with this methodology in different medical domains, but no experience in constructing medical guidelines

For such applications the following might be useful:
- detection of decision points for numerical attributes
- detection of apparent but significant contradictions
- explicit noise detection