
1 Medical Diagnosis via Genetic Programming Project #2 Artificial Intelligence: Biointelligence Computational Neuroscience Connectionist Modeling of Cognitive Processes

2 Project Purpose (© 2007 SNU CSE Biointelligence Lab)

Medical Diagnosis
- To predict whether a breast cancer case is benign or malignant
- Human experts (M.D.) vs. machine (GP)

Data Sets
- Taken from the Wisconsin Diagnostic Breast Cancer (WDBC) data in the UCI Machine Learning Repository (http://www.ics.uci.edu/~mlearn/databases/breast-cancer-wisconsin/)
- Two text files, one for training data and one for test data
- You can download them from the course web page.

3 Wisconsin Diagnostic Breast Cancer Data

Description
- Number of patients: 569
- Benign (0): 357, Malignant (1): 212
- Training: 456, Test: 113
- Features: 10 attributes × 3 kinds = 30 features
- Real-valued features are computed from the digitized images.
  1) Attributes: radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry, fractal dimension
  2) Kinds: mean value, standard error (SE), worst or largest value (LV)

Data layout

            Mean of attributes   SE of attributes   LV of attributes   Class
Patient 1   10 real values       10 real values     10 real values     0 or 1
Patient 2   10 real values       10 real values     10 real values     0 or 1
…           …                    …                  …                  …
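The feature layout above can be sketched in Python as follows. This is only an illustration: the kind-major column order, the position of the class label in the last column, and the names `FEATURE_NAMES` and `split_record` are assumptions, since the slides do not specify the exact format of the course data files.

```python
from itertools import product

# The 10 attributes and 3 kinds listed on the slide.
ATTRIBUTES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal dimension",
]
KINDS = ["mean", "se", "worst"]

# Assumed column order: all means first, then all SEs, then all worst values.
FEATURE_NAMES = [f"{kind}_{attr}" for kind, attr in product(KINDS, ATTRIBUTES)]


def split_record(values):
    """Split one record of 31 numbers into (features, label).

    Assumes the class label (0 = benign, 1 = malignant) is the last value.
    """
    if len(values) != len(FEATURE_NAMES) + 1:
        raise ValueError(f"expected {len(FEATURE_NAMES) + 1} values, got {len(values)}")
    features = [float(v) for v in values[:-1]]
    label = int(float(values[-1]))
    return features, label
```

If the actual files use a different column order or delimiter, only `FEATURE_NAMES` and the slicing in `split_record` need to change.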

4 Evolving a Classifier

GP settings
- Functions
  - Numerical operators: {+, -, *, /, exp, log, sin, cos, sqrt, …}
  - Some operators must be protected against illegal operations (e.g., division by zero).
- Terminals
  - Input features and constants: {x1, x2, …, x30, R}, where R ∈ [a, b]
- Additional parameters
  - Threshold value for the decision
  - Crossover and mutation rates
  - Population size and the maximum number of generations
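As a rough sketch of what "protected" operators and a decision threshold might look like in Python: the fallback values (1.0 for division, 0.0 for log), the clamp on exp, and all function names below are assumptions chosen for illustration, not settings prescribed by the project.

```python
import math

EPSILON = 1e-9  # assumed tolerance for "too close to zero"


def protected_div(a, b):
    """Division that returns 1.0 instead of failing when b is (near) zero."""
    return a / b if abs(b) > EPSILON else 1.0


def protected_log(a):
    """Natural log of |a|; returns 0.0 when a is (near) zero."""
    return math.log(abs(a)) if abs(a) > EPSILON else 0.0


def protected_sqrt(a):
    """Square root of |a|, so negative inputs do not raise an error."""
    return math.sqrt(abs(a))


def protected_exp(a):
    """exp with the argument clamped at 50 to avoid OverflowError."""
    return math.exp(min(a, 50.0))


def classify(tree_output, threshold=0.0):
    """Map the real-valued GP tree output to a class: 1 (malignant) or 0 (benign)."""
    return 1 if tree_output > threshold else 0
```

The threshold is one of the additional parameters listed above; whether outputs above it mean malignant or benign is a convention you choose and should state in the report.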

5 Fitness Function

Maximization problem
- Classification accuracy
- Confusion matrix for the training data

Minimization problem
- Classification error
- Number of incorrectly classified patients: q + r

Confusion matrix

True / Predict   Positive   Negative
Positive         p          q
Negative         r          s
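Both fitness variants can be computed from the confusion matrix. The sketch below is one possible Python formulation; the function names are hypothetical, and the matrix is stored as counts keyed by (true label, predicted label) so that the error is the sum of the two off-diagonal cells (q + r) and the accuracy is (p + s) divided by the total.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels):
    """Count (true, predicted) label pairs; labels are 0 or 1."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    return Counter(zip(true_labels, predicted_labels))


def error_count(cm):
    """Minimization fitness: number of misclassified patients (q + r)."""
    return cm[(1, 0)] + cm[(0, 1)]


def accuracy(cm):
    """Maximization fitness: fraction of correctly classified patients."""
    total = sum(cm.values())
    return (cm[(1, 1)] + cm[(0, 0)]) / total if total else 0.0
```

The two formulations rank classifiers identically on a fixed data set, since accuracy equals 1 minus the error count divided by the number of patients.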

6 Bloat

- Bloat = "survival of the fattest", i.e., the tree sizes in the population increase over time.
- Many studies are devoted to understanding why bloat occurs.
- To reduce tree growth, we need countermeasures, e.g.:
  - Prohibiting variation operators that would deliver "too big" children → discard big children and perform crossover again
  - Parsimony pressure: a penalty for being oversized
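The two countermeasures can be sketched in Python as below. This is a minimal illustration under stated assumptions: trees are represented as nested tuples `(operator, child, ...)` with plain values as leaves, the penalty is linear in the node count with weight `alpha`, and the size limit `max_size` is an arbitrary example value. None of these choices come from the slides.

```python
def tree_size(tree):
    """Count the nodes of a tree stored as nested tuples: (op, child, ...) or a leaf."""
    if isinstance(tree, tuple):
        return 1 + sum(tree_size(child) for child in tree[1:])
    return 1


def penalized_fitness(error, tree, alpha=0.01):
    """Parsimony pressure (assumed linear form): error plus alpha times tree size.

    Lower is better, so larger trees are penalized.
    """
    return error + alpha * tree_size(tree)


def accept_child(child, max_size=100):
    """Size limit: reject children that are 'too big' so crossover can be retried."""
    return tree_size(child) <= max_size
```

For example, the tree `("+", "x1", ("*", "x2", 3.0))` has five nodes, so with `alpha=0.5` an error of 10 becomes a penalized fitness of 12.5.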

7 Experiments

One problem
- WDBC Diagnostics

Various experimental setups
- Termination condition: maximum_generation
  - A GP run is stopped when the number of generations reaches a given limit.
- Various settings
  - Effects of the penalty term: adjusting α
  - Different function sets: different models (e.g., polynomial vs. complex functions)
  - Selection methods and their parameters
  - Crossover and mutation rates
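The termination condition can be illustrated with a bare generational loop in Python. This is a skeleton only: `run_gp`, `evaluate`, and `breed` are hypothetical names, and selection, crossover, and mutation are assumed to live inside the `breed` callable that you supply.

```python
def run_gp(population, evaluate, breed, maximum_generation):
    """Run a generational loop for exactly maximum_generation generations.

    evaluate(individual) -> error (lower is better)
    breed(scored) -> next population, where scored is a list of (error, individual)
    Returns the final population and the best error of each generation.
    """
    history = []
    for _generation in range(maximum_generation):
        scored = [(evaluate(individual), individual) for individual in population]
        history.append(min(error for error, _ in scored))
        population = breed(scored)
    return population, history
```

The returned `history` holds one best-error value per generation, which is the data needed to draw a learning curve for the run.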

8 Results

For each problem
- Show the result table and write your own analysis.
- At least 10 runs for each setting
- Present the optimal classifier (the best GP tree).
- Write the confusion matrix for the test data by using the best tree.
- Draw learning curves of your experiments.
- Compare with the results of neural networks (optional).

Result table

            Training                       Test
            Average ± SD   Best   Worst    Average ± SD   Best   Worst
Setting 1
Setting 2
Setting 3
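One way to fill a row of the result table from the per-run errors of a setting is sketched below in Python. The function names are hypothetical, the sample standard deviation (n - 1 in the denominator) is an assumption, and "best" is taken as the minimum because the values are errors; swap `min` and `max` if you report accuracy instead.

```python
from statistics import mean, stdev


def summarize_runs(errors):
    """Summarize the final errors of several independent runs of one setting."""
    if len(errors) < 2:
        raise ValueError("need at least two runs to compute a standard deviation")
    return {
        "average": mean(errors),
        "sd": stdev(errors),  # sample standard deviation (assumed convention)
        "best": min(errors),
        "worst": max(errors),
    }


def format_row(name, summary):
    """Format one table row as 'name: average ± sd, best, worst'."""
    return (
        f"{name}: {summary['average']:.2f} ± {summary['sd']:.2f}, "
        f"best {summary['best']}, worst {summary['worst']}"
    )
```

Call `summarize_runs` once with the training errors and once with the test errors of the same runs to obtain both halves of a row.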

9 [Figure: plot of Fitness (Error) against Generation]

10 References

Source Codes
- GP libraries (C, C++, JAVA, …)
- MATLAB toolbox

Web sites
- http://www.cs.bham.ac.uk/~cmf/GPLib/GPLib.html
- http://cs.gmu.edu/~eclab/projects/ecj/
- http://www.geneticprogramming.com/GPpages/software.html
- …

11 Pay Attention!

Due: May 17, 2007

Submission
- Source code and executable file(s)
  - Proper comments in the source code
  - Via e-mail (jwha@bi.snu.ac.kr)
- Report: hardcopy!! (Submit to 301-419)
  - Running environments and libraries (or packages) that you used
  - Results of many experiments with various parameter settings
  - Analysis and explanation of the results in your own way

