Medical Diagnosis via Genetic Programming

AI Project #2
Biointelligence Lab
Cho, Dong-Yeon

© 2006 SNU CSE Biointelligence Lab
Project Purpose
- Medical diagnosis
  - Predict the presence or absence of a disease from the results of various medical tests carried out on a patient.
  - Human experts (M.D.) vs. machine (GP)
- Two data sets
  - Heart Disease
  - Diabetes

Heart Disease Data Description
- Number of patients: 270
  - Absence: 150
  - Presence: 120
- 13 attributes
  1. age
  2. sex
  3. chest pain type (4 values)
  4. resting blood pressure
  5. serum cholesterol in mg/dl
  6. fasting blood sugar > 120 mg/dl
  7. resting electrocardiographic results (values 0, 1, 2)
  8. maximum heart rate achieved
  9. exercise-induced angina
  10. oldpeak = ST depression induced by exercise relative to rest
  11. the slope of the peak exercise ST segment
  12. number of major vessels (0-3) colored by fluoroscopy
  13. thal: 3 = normal; 6 = fixed defect; 7 = reversible defect

Learning a Classifier
- GP settings
  - Functions
    - Numerical and conditional operators: {+, -, *, /, exp, log, sin, cos, sqrt, iflte, ifltz, …}
    - Some operators must be protected against illegal operations (e.g., division by zero).
  - Terminals
    - Input attributes and constants: {x1, x2, …, x13, R}, where R ∈ [a, b]
  - Additional parameters
    - Threshold value
    - For preprocessing (normalization)
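A minimal sketch of what "protected" operators could look like. The slides do not define the fallback values or the exact semantics of iflte and ifltz, so the choices below (returning 1.0 for division by zero, reading iflte as "if a <= b" and ifltz as "if a < 0", a default threshold of 0.0) are assumptions for illustration only.

```python
import math

EPS = 1e-9  # assumed tolerance for "too close to zero"


def protected_div(a, b):
    """Return a / b, or 1.0 (assumed fallback) when b is near zero."""
    return a / b if abs(b) > EPS else 1.0


def protected_log(a):
    """Return log|a|, or 0.0 (assumed fallback) when a is near zero."""
    return math.log(abs(a)) if abs(a) > EPS else 0.0


def protected_sqrt(a):
    """Square root of |a|, so negative inputs stay legal."""
    return math.sqrt(abs(a))


def protected_exp(a):
    """Clamp the argument to [-50, 50] so exp cannot overflow."""
    return math.exp(max(-50.0, min(50.0, a)))


def iflte(a, b, c, d):
    """Assumed meaning: if a <= b then c else d."""
    return c if a <= b else d


def ifltz(a, b, c):
    """Assumed meaning: if a < 0 then b else c."""
    return b if a < 0 else c


def classify(output, threshold=0.0):
    """Map the numeric output of a GP tree to a class label (1 = presence)."""
    return 1 if output > threshold else 0
```

The threshold is one of the "additional parameters" on the slide; it can be fixed in advance or tuned together with the rest of the GP settings.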

Cross Validation (1/3)
- K-fold cross validation
  - The data set is randomly divided into k subsets.
  - One of the k subsets is used as the test set, and the other k-1 subsets are combined to form the training set.
  - Example with k = 6 (45 patients per subset); the test subset is listed last:
      D1 D2 D3 D4 D5 | D6
      D1 D2 D3 D4 D6 | D5
      D2 D3 D4 D5 D6 | D1
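The splitting step can be sketched as follows. This is an illustrative implementation using only the standard library; the function name k_fold_splits and the seed parameter are hypothetical and not part of the assignment.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random division of the data set
    # Fold i takes every k-th shuffled index starting at position i.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    # 270 heart-disease patients, k = 6: six test folds of 45 patients each.
    for train, test in k_fold_splits(270, 6):
        print(len(train), len(test))
```

Each patient appears in exactly one test fold, so every record is used for testing once and for training k-1 times.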

Cross Validation (3/3)
- Cross validation and confusion matrix
  - Perform at least 10 runs for your k value.
  - Show the confusion matrix for the best result of your experiments.

  Run      Accuracy
  1
  2
  …
  10
  Average
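For a two-class problem such as presence/absence of a disease, the confusion matrix and the accuracy reported per run could be computed as in this sketch. The label encoding (1 = presence, 0 = absence) and the dictionary layout are assumptions made for the example.

```python
def confusion_matrix(actual, predicted):
    """Count TP, FP, TN, FN for binary labels (1 = presence, 0 = absence)."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1
        elif a == 0 and p == 1:
            counts["FP"] += 1
        elif a == 0 and p == 0:
            counts["TN"] += 1
        else:
            counts["FN"] += 1
    return counts


def accuracy(cm):
    """Fraction of correctly classified patients."""
    total = sum(cm.values())
    return (cm["TP"] + cm["TN"]) / total


if __name__ == "__main__":
    cm = confusion_matrix([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
    print(cm, accuracy(cm))
```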

Initialization
- A maximum initial tree depth Dmax is set.
- Full method (each branch has depth = Dmax):
  - Nodes at depth d < Dmax are randomly chosen from the function set F.
  - Nodes at depth d = Dmax are randomly chosen from the terminal set T.
- Grow method (each branch has depth ≤ Dmax):
  - Nodes at depth d < Dmax are randomly chosen from F ∪ T.
  - Nodes at depth d = Dmax are randomly chosen from T.
- Common GP initialization: ramped half-and-half, where the grow and full methods each deliver half of the initial population.
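The full and grow methods above can be sketched with trees stored as nested tuples. The reduced function set, the constant range [-1, 1], and the ramp over depth limits 1..Dmax are assumptions for illustration; the slide only states that grow and full each supply half of the population.

```python
import random

FUNCTIONS = {"+": 2, "-": 2, "*": 2, "sin": 1}  # name -> arity (illustrative subset)
TERMINALS = ["x1", "x2", "x3", "R"]             # "R" stands for a random constant


def random_terminal(rng):
    t = rng.choice(TERMINALS)
    return rng.uniform(-1.0, 1.0) if t == "R" else t


def build_tree(depth, max_depth, method, rng):
    """Build a tree as a nested tuple (function_name, child, ...)."""
    if depth == max_depth:
        return random_terminal(rng)          # d = Dmax: terminals only
    if method == "grow":
        # d < Dmax: choose uniformly from F ∪ T
        n_f, n_t = len(FUNCTIONS), len(TERMINALS)
        if rng.random() < n_t / (n_f + n_t):
            return random_terminal(rng)
    name = rng.choice(list(FUNCTIONS))       # full method always lands here
    children = tuple(build_tree(depth + 1, max_depth, method, rng)
                     for _ in range(FUNCTIONS[name]))
    return (name,) + children


def tree_depth(tree):
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(tree_depth(child) for child in tree[1:])


def ramped_half_and_half(pop_size, max_depth, seed=0):
    """Alternate full/grow while cycling the depth limit over 1..max_depth."""
    rng = random.Random(seed)
    population = []
    for i in range(pop_size):
        method = "full" if i % 2 == 0 else "grow"
        depth_limit = 1 + (i // 2) % max_depth
        population.append(build_tree(0, depth_limit, method, rng))
    return population
```

Note that, following the slide's definition literally, the grow method may return a single terminal as the whole tree; some GP systems force the root to be a function instead.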

Fitness Function
- Maximization problem
  - Number of correctly classified patients
- Minimization problem
  - Number of incorrectly classified patients
  - Mean squared error (N: number of training data)
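The three fitness measures could be written as below. The slide's own formula for the mean squared error did not survive in this transcript, so the sketch uses the standard definition, MSE = (1/N) * sum of (target - output)^2 over the N training cases; treat that as an assumption about what the slide showed.

```python
def num_correct(targets, predictions):
    """Maximization fitness: number of correctly classified patients."""
    return sum(1 for t, p in zip(targets, predictions) if t == p)


def num_incorrect(targets, predictions):
    """Minimization fitness: number of incorrectly classified patients."""
    return sum(1 for t, p in zip(targets, predictions) if t != p)


def mean_squared_error(targets, outputs):
    """Minimization fitness: standard MSE over the N training cases."""
    n = len(targets)
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / n
```

The two counting measures work on thresholded class labels, while the MSE works on the raw numeric output of the GP tree.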

Selection (1/2)
- Fitness-proportional (roulette wheel) selection
  - The roulette wheel can be constructed as follows:
    1. Calculate the total fitness of the population.
    2. Calculate the selection probability pk for each chromosome vk.
    3. Calculate the cumulative probability qk for each chromosome vk.

Procedure: Proportional_Selection
- Generate a random number r from the range [0, 1].
- If r ≤ q1, select the first chromosome v1; otherwise, select the kth chromosome vk (2 ≤ k ≤ pop_size) such that q(k-1) < r ≤ qk.
[Figure: selection probabilities pk and cumulative probabilities qk for chromosomes 1 to 10]
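A sketch of the roulette wheel construction and the selection procedure, assuming non-negative fitness values where larger is better. The function names are hypothetical; the binary search via bisect_left is an implementation choice, equivalent to scanning for the first k with r ≤ qk.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def build_wheel(fitnesses):
    """Return (selection probabilities p_k, cumulative probabilities q_k)."""
    total = sum(fitnesses)                    # total fitness of the population
    probs = [f / total for f in fitnesses]    # p_k
    cumulative = list(accumulate(probs))      # q_k
    cumulative[-1] = 1.0                      # guard against rounding error
    return probs, cumulative


def select_index(cumulative, r):
    """Return the 0-based index k of the first chromosome with r <= q_k."""
    return bisect_left(cumulative, r)


def proportional_selection(population, fitnesses, rng=random):
    """Select one chromosome with probability proportional to its fitness."""
    _, cumulative = build_wheel(fitnesses)
    r = rng.random()
    return population[select_index(cumulative, r)]
```

For a minimization fitness (for example, the number of misclassified patients), the values would first have to be transformed so that better individuals receive larger shares of the wheel.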

Bloat
- Bloat = "survival of the fattest", i.e., tree sizes in the population increase over time.
- The reasons are the subject of ongoing research and debate.
- Countermeasures are needed, e.g.:
  - Prohibiting variation operators that would deliver "too big" children
  - Parsimony pressure: a penalty for being oversized
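Both countermeasures can be sketched for trees stored as nested tuples (function_name, child, ...). The linear penalty alpha * size and the node limit are assumptions chosen for illustration; the slides do not prescribe a particular penalty form.

```python
def tree_size(tree):
    """Count the nodes of a tree given as a nested tuple or a bare terminal."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + sum(tree_size(child) for child in tree[1:])


def penalized_fitness(raw_error, tree, alpha=0.01):
    """Parsimony pressure for a minimization fitness: error plus a size penalty."""
    return raw_error + alpha * tree_size(tree)


def within_size_limit(tree, max_nodes):
    """Reject children that crossover or mutation made 'too big'."""
    return tree_size(tree) <= max_nodes


if __name__ == "__main__":
    tree = ("+", "x1", ("*", "x2", 2.0))
    print(tree_size(tree), penalized_fitness(10, tree, alpha=0.5))
```

The weight alpha trades accuracy against size, which is what the "effects of the penalty term" experiment is meant to explore.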

Experiments
- Two problems
  - Heart Disease
  - Pima Indian diabetes
- Various experimental setups
  - Termination condition: maximum_generation
  - Various settings
    - Effects of the penalty term
    - Different function and terminal sets
    - Selection methods and their parameters
    - Crossover and mutation probabilities

Results
- For each problem
  - Result table and your analysis
  - Present the optimal classifier.
  - Draw a learning curve for the run where the best solution was found.
  - Compare with the results of neural networks (optional).
  - Different k for cross validation (optional)

               Training                        Test
               Average ± SD   Best   Worst     Average ± SD   Best   Worst
  Setting 1
  Setting 2
  Setting 3

Pay Attention!
- Due: May 11, 2006
- Submission
  - Source code and executable file(s)
    - Proper comments in the source code
    - Via
  - Report: Hardcopy!!
    - Running environments and libraries (or packages) that you used
    - Results for many experiments with various parameter settings
    - Analysis and explanation of the results in your own way
