Published by Baldric Harvey. Modified over 9 years ago.
1
Intelligent Database Systems Lab
國立雲林科技大學 National Yunlin University of Science and Technology
Knowledge discovery with classification rules in a cardiovascular dataset
Advisor: Dr. Hsu
Presenter: Zih-Hui Lin
Authors: Vili Podgorelec a,*, Peter Kokol a, Milojka Molan Štiglic b, Marjan Heričko a, Ivan Rozman a
Computer Methods and Programs in Biomedicine, Volume 80, Supplement 1, December 2005, Pages S39-S49
2
Outline
─ Motivation
─ Objective
─ Introduction
─ The AREX algorithm
─ Experiment
─ Conclusions
3
Motivation
─ Modern medicine generates huge amounts of data, and there is an acute and widening gap between data collection and data comprehension.
─ It is very difficult for a human to make use of such an amount of information (e.g. hundreds of attributes, thousands of images, several channels of 24 hours of ECG or EEG signals).
4
Objective
─ Enable searching for new facts that reveal interesting new patterns and possibly improve existing medical knowledge.
5
Introduction
Decision tree
─ Advantage: transparency of the classification process, which one can easily interpret, understand and criticize.
─ Disadvantages: poor processing of incomplete or noisy data, inability to build several trees for the same dataset, inability to use the preferred attributes, etc.
6
The AREX algorithm
Components:
1. A multi-population self-adapting genetic algorithm for the induction of decision trees.
2. Evolution of programs in an arbitrary programming language, which is used to evolve classification rules.
Steps:
1.1 Build N decision trees upon objects from S.
1.2 Classify each object with nt randomly chosen trees from all N trees.
1.3 For each object Oi, check whether the frequency of the most frequent decision class assigned by the nt trees > nt - ct (initially ct = nt/2).
1.4 From all N decision trees, create M initial classification rules.
2.1 Create M/2+1 rules (randomly).
2.2 (text not recoverable)
2.3 If S is not empty: add |S| randomly chosen objects from S* to S, set ct = ct + 1, and repeat from 1.1.
2.4 An optimal set of classification rules is determined with a simple genetic algorithm.
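The agreement test in step 1.3 can be sketched in Python as follows. This is only an illustration under stated assumptions: trees are modeled as plain callables that return a class label, and the names `vote_is_confident` and `split_by_agreement` are hypothetical. The slide does not preserve what happens to objects that pass or fail the test, so the sketch simply returns the two groups.

```python
import random
from collections import Counter


def vote_is_confident(predictions, ct):
    """Step 1.3 test: is the top class frequency greater than nt - ct?"""
    nt = len(predictions)
    top_class, top_count = Counter(predictions).most_common(1)[0]
    return top_class, top_count > nt - ct


def split_by_agreement(objects, trees, nt, ct, rng):
    """Classify each object with nt randomly chosen trees (step 1.2)
    and group objects by whether the vote passes the step 1.3 test.
    Returning two lists is an assumption of this sketch."""
    passed, failed = [], []
    for obj in objects:
        chosen = rng.sample(trees, nt)          # nt trees out of all N
        predictions = [tree(obj) for tree in chosen]
        label, ok = vote_is_confident(predictions, ct)
        (passed if ok else failed).append((obj, label))
    return passed, failed


if __name__ == "__main__":
    # Hypothetical "trees": simple threshold tests on a single number.
    trees = [lambda x, t=t: "high" if x > t else "low" for t in (3, 4, 5, 6, 7)]
    rng = random.Random(0)
    nt = 4
    passed, failed = split_by_agreement([1, 5, 9], trees, nt, nt / 2, rng)
    print(len(passed), "objects passed,", len(failed), "failed")
```

With ct = nt/2 the test demands a strict majority; each increment of ct in step 2.3 loosens the threshold by one vote.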
7
Genetic algorithm for the construction of decision trees
Population (construction of a tree):
1. Choose the number of attribute nodes M that will be in the tree.
2. Select an attribute Xi.
3. Select an empty node (the deeper the node is in the tree, the lower its probability of being selected).
4. Randomly select an attribute Xi (attributes not yet selected have a higher probability).
Test for the selected attribute:
(1) Continuous attributes → split constant
(2) Discrete attributes → two randomly defined disjunctive sets
For each empty leaf (unused leaf node), the following algorithm determines the appropriate decision class.
Diagram labels: M attributes, root, Xi, null.
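A minimal Python sketch of the random tree construction described on this slide follows. The concrete weights are assumptions made for illustration (an empty node at depth d gets weight 1/(d+1), an unused attribute gets twice the weight of a used one); the slide only states the direction of each bias. Split constants, discrete value sets and the assignment of decision classes to leaves are left out, and all names are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    attribute: Optional[str] = None   # None marks an empty (null) node
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    depth: int = 0


def pick_attribute(attributes, used, rng):
    # Assumed weights: unused attributes are twice as likely to be picked.
    weights = [1.0 if a in used else 2.0 for a in attributes]
    return rng.choices(attributes, weights=weights, k=1)[0]


def build_random_tree(attributes, m, rng):
    """Grow a tree with m attribute nodes by repeatedly filling empty nodes."""
    root = Node(depth=0)
    empty = [root]
    used = set()
    for _ in range(m):
        # Assumed weights: deeper empty nodes are less likely to be chosen.
        weights = [1.0 / (n.depth + 1) for n in empty]
        index = rng.choices(range(len(empty)), weights=weights, k=1)[0]
        node = empty.pop(index)
        node.attribute = pick_attribute(attributes, used, rng)
        used.add(node.attribute)
        node.left = Node(depth=node.depth + 1)
        node.right = Node(depth=node.depth + 1)
        empty.extend([node.left, node.right])
    return root


def count_attribute_nodes(node):
    if node is None or node.attribute is None:
        return 0
    return 1 + count_attribute_nodes(node.left) + count_attribute_nodes(node.right)


def count_empty_leaves(node):
    if node is None:
        return 0
    if node.attribute is None:
        return 1
    return count_empty_leaves(node.left) + count_empty_leaves(node.right)


if __name__ == "__main__":
    tree = build_random_tree(["age", "pulse", "ecg"], 5, random.Random(0))
    print(count_attribute_nodes(tree), count_empty_leaves(tree))
```

The empty leaves that remain after the loop are the ones that would receive a decision class in the final step.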
8
proGenesys system & Finding the optimal set of rules
Criteria noted for a rule:
─ Most of the objects covered by the rule should fall into the same decision class.
─ It covers many objects; otherwise it tends to be too specific.
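The two rule criteria on this slide can be made concrete with a small Python sketch. It is not the fitness function used by proGenesys or AREX, which the slide does not preserve; `rule_quality`, "purity" and "coverage" are hypothetical names for the two ideas, and a rule is modeled as a callable that returns True when it covers an object.

```python
from collections import Counter


def rule_quality(rule, objects, labels):
    """Return (purity, coverage) for a rule.

    purity   = share of covered objects in the most frequent class
    coverage = share of all objects that the rule covers
    """
    covered = [label for obj, label in zip(objects, labels) if rule(obj)]
    if not covered:
        return 0.0, 0.0
    majority_count = Counter(covered).most_common(1)[0][1]
    purity = majority_count / len(covered)
    coverage = len(covered) / len(objects)
    return purity, coverage


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    objects = [{"x": 1}, {"x": 5}, {"x": 7}, {"x": 9}]
    labels = ["a", "b", "b", "a"]
    purity, coverage = rule_quality(lambda o: o["x"] > 3, objects, labels)
    print(purity, coverage)
```

A rule that covers a single object is trivially pure, which is why coverage has to be considered alongside purity.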
9
Dataset
─ Contains data on 100 patients from Maribor Hospital.
The attributes include
─ general data (age, sex, etc.)
─ health status (data from family history and the child's previous illnesses)
─ general cardiovascular data (blood pressure, pulse, chest pain, etc.)
─ more specialized cardiovascular data: data from the child's cardiac history and clinical examinations (with findings of ultrasound, ECG, etc.)
Five different diagnoses are possible in the dataset:
─ innocent heart murmur
─ congenital heart disease with left-to-right shunt
─ aortic valve disease with aorta coarctation
─ arrhythmias
─ chest pain
10
Classification result: training set
Overfitting
11
Classification result: testing set
12
Classification result
13
Conclusions
One of the most evident advantages of AREX is that it achieves both of the following at once:
─ Generalization → high and similar overall accuracy on both the training set and the test set
─ Specialization → high and very similar accuracy for all decision classes, including the least frequent ones
It equips physicians with a powerful technique to
─ (1) confirm their existing knowledge about a medical problem
─ (2) search for new facts that reveal interesting new patterns and possibly improve existing medical knowledge.
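The generalization and specialization criteria named in the conclusions correspond to two standard measurements, overall accuracy and per-class accuracy. The Python sketch below shows how both could be computed from true and predicted labels; the function name and the toy labels are hypothetical, and no values from the paper are reproduced.

```python
from collections import defaultdict


def overall_and_per_class_accuracy(y_true, y_pred):
    """Return (overall accuracy, {class: accuracy within that class})."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, predicted in zip(y_true, y_pred):
        total[truth] += 1
        correct[truth] += int(truth == predicted)
    overall = sum(correct.values()) / len(y_true)
    per_class = {c: correct[c] / total[c] for c in total}
    return overall, per_class


if __name__ == "__main__":
    # Toy labels, invented for illustration only.
    y_true = ["a", "a", "a", "b"]
    y_pred = ["a", "a", "b", "b"]
    overall, per_class = overall_and_per_class_accuracy(y_true, y_pred)
    print(overall, per_class)
```

Computing both on the training set and on the test set makes the two claims checkable: similar overall values indicate generalization, and similar per-class values, including for rare classes, indicate specialization.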
14
My opinion
─ Advantage: weights are assigned according to class
─ Disadvantage:
─ Apply: practical use in clinical settings