Active subgroup mining for descriptive induction tasks
Dragan Gamberger, Rudjer Bošković Institute, Zagreb
Zdenko Sonicki, University of Zagreb
Talk overview: - descriptive induction - active subgroup mining - subgroup discovery - data mining server - a real medical example
Descriptive induction aims at generating (inducing) knowledge that is understandable (interpretable) by humans. It differs from classification-oriented induction, where the main goal is high classification quality and the induced classification schemes are typically too complex for human interpretation.
Main properties of descriptive induction: - simple rules - reasonable prediction quality (on both available and future cases)
Main problem: overfitting. A functional genomics domain has 150 examples with measured attribute values
Active subgroup mining is a data analysis approach developed specifically for medical applications (but also applicable to other domains). It is based on the observation that expert knowledge (in medical domains, the knowledge and experience of medical doctors) is very important for the quality of the obtained results.
In active subgroup mining the expert is positioned at the center of the process, and machine learning (subgroup discovery) is only a tool that helps the expert in the data analysis process.
[Figure: the active subgroup mining process. Diagram labels: definition of task(s); induction of models; presentation; visualization; integration; statistical evaluation; selection of models; expert; subgroup discovery]
classical versus subgroup discovery induction
[Figure labels: very specific subgroup; very sensitive subgroup]
Generality is the main parameter of the subgroup induction process.
Subgroup discovery is a beam search algorithm which generates short rules in the form of conjunctions of conditions. Conditions are based on the values of available attributes. example: CHD 53 AND T.CH > 6.1 AND BMI < 30
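The beam search over conjunctions of conditions can be sketched as follows. This is a minimal illustration, not the authors' implementation: the rule quality measure TP / (FP + g), with g as the generality parameter, is an assumption chosen for the sketch, and the patient records, the SMOKER attribute, and the function name beam_search are hypothetical. Only the conditions T.CH > 6.1 and BMI < 30 come from the example above.

```python
# Toy records (made up for illustration); "chd" is the target class.
PATIENTS = [
    {"tch": 6.5, "bmi": 27, "smoker": False, "chd": True},
    {"tch": 7.0, "bmi": 25, "smoker": True,  "chd": True},
    {"tch": 6.8, "bmi": 29, "smoker": False, "chd": True},
    {"tch": 6.4, "bmi": 33, "smoker": True,  "chd": False},
    {"tch": 5.0, "bmi": 24, "smoker": False, "chd": False},
    {"tch": 5.5, "bmi": 31, "smoker": True,  "chd": False},
    {"tch": 5.8, "bmi": 26, "smoker": False, "chd": True},
    {"tch": 6.9, "bmi": 32, "smoker": False, "chd": False},
]

# Candidate conditions on attribute values, keyed by a readable name.
CONDITIONS = {
    "T.CH > 6.1": lambda p: p["tch"] > 6.1,
    "BMI < 30": lambda p: p["bmi"] < 30,
    "SMOKER": lambda p: p["smoker"],  # hypothetical attribute
}


def beam_search(examples, target, conditions, g=1.0, beam_width=3, max_len=3):
    """Return the best conjunction found, as a frozenset of condition names."""

    def quality(rule):
        # Assumed heuristic: TP / (FP + g). A larger g tolerates more false
        # positives, so it favors more general (more sensitive) subgroups.
        tp = fp = 0
        for ex in examples:
            if all(conditions[name](ex) for name in rule):
                if ex[target]:
                    tp += 1
                else:
                    fp += 1
        return tp / (fp + g)

    beam = [frozenset()]  # the empty rule covers every example
    best = frozenset()
    for _ in range(max_len):
        # Extend every rule in the beam by one condition it does not yet use.
        candidates = {rule | {name}
                      for rule in beam
                      for name in conditions
                      if name not in rule}
        if not candidates:
            break
        # Keep the beam_width best candidates; ties are broken by name.
        ranked = sorted(candidates, key=lambda r: (-quality(r), sorted(r)))
        beam = ranked[:beam_width]
        if quality(beam[0]) > quality(best):
            best = beam[0]
    return best


if __name__ == "__main__":
    for g in (1.0, 10.0):
        rule = beam_search(PATIENTS, "chd", CONDITIONS, g=g)
        print(f"g={g}: CHD <- {' AND '.join(sorted(rule))}")
```

On this toy data a small g selects the more specific rule (BMI < 30 AND T.CH > 6.1), while a large g selects the more general single condition BMI < 30, which illustrates why generality acts as the main tuning parameter.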
dms.irb.hr
Meningoencephalitis domain: a subgroup describing the bacterial type of the disease in contrast to the viral type
Conclusions: - descriptive induction and active subgroup mining are novel concepts, potentially very interesting for data analysis and knowledge induction in medical applications - an active and central role of medical experts is essential
- we have extensive and positive experience with this methodology in different medical domains, but no experience in constructing medical guidelines. For such applications, the following might be useful: - detection of decision points for numerical attributes - detection of apparent but significant contradictions - explicit noise detection