Published by Jessie Rich. Modified over 9 years ago.
Medical Diagnosis via Genetic Programming
Project #2
Artificial Intelligence: Biointelligence
Computational Neuroscience
Connectionist Modeling of Cognitive Processes
© 2007 SNU CSE Biointelligence Lab

Project Purpose

Medical diagnosis
  Predict whether a breast cancer case is benign or malignant.
  Human experts (M.D.) vs. machine (GP)
Data sets
  Taken from the Wisconsin Diagnostic Breast Cancer (WDBC) data in the UCI Machine Learning Repository (http://www.ics.uci.edu/~mlearn/databases/breast-cancer-wisconsin/)
  Two text files, one for training data and one for test data
  You can download them from the course web page.
Wisconsin Diagnostic Breast Cancer Data

Description
  Number of patients: 569
    Benign (0): 357, Malignant (1): 212
    Training: 456, Test: 113
  Features: 10 attributes × 3 kinds = 30 features
    Real-valued features are computed from the digitized images.
    1) Attributes: radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry, fractal dimension
    2) Kinds: mean value, standard error (SE), worst or largest value (LV)

             Mean of attributes   SE of attributes   LV of attributes   class
Patient 1    10 real values       10 real values     10 real values     0 or 1
Patient 2    10 real values       10 real values     10 real values     0 or 1
…            …                    …                  …                  …
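The feature layout described above can be sketched in Python. The attribute and kind names come from the slide; the `<kind>_<attribute>` naming scheme, the column order, and the whitespace-separated row format handled by the hypothetical `parse_row` helper are assumptions, since the slides do not specify the file format.

```python
# Attribute and kind names as listed on the slide. The "<kind>_<attribute>"
# naming scheme and the column order are assumptions for illustration.
ATTRIBUTES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry",
    "fractal_dimension",
]
KINDS = ["mean", "se", "worst"]

# 10 attributes x 3 kinds = 30 feature names.
FEATURE_NAMES = [f"{kind}_{attr}" for kind in KINDS for attr in ATTRIBUTES]


def parse_row(line):
    """Split one whitespace-separated row into (features, label).

    Assumes 30 real values followed by a 0/1 class label; the actual
    format of the course data files may differ.
    """
    fields = line.split()
    features = [float(v) for v in fields[:-1]]
    label = int(fields[-1])
    if len(features) != len(FEATURE_NAMES):
        raise ValueError(
            f"expected {len(FEATURE_NAMES)} features, got {len(features)}"
        )
    if label not in (0, 1):
        raise ValueError(f"label must be 0 or 1, got {label}")
    return features, label
```

If the course files use a different delimiter or put the label first, only `parse_row` would need to change.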
Evolving a Classifier

GP settings
  Functions
    Numerical operators {+, -, *, /, exp, log, sin, cos, sqrt, …}
    Some operators must be protected against illegal operations (e.g., division by zero or the logarithm of a non-positive number).
  Terminals
    Input features and constants {x1, x2, …, x30, R}, where R ∈ [a, b]
  Additional parameters
    Threshold value for the decision
    Crossover and mutation rates
    Population size and the maximum number of generations
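A minimal sketch of protected operators and the threshold decision, in Python. The slides only say that some operators should be protected; the specific fallback values (1.0 for division, 0.0 for log), the clamp range for exp, the default threshold of 0.0, and all function names here are assumptions chosen for illustration.

```python
import math

EPS = 1e-9  # assumed tolerance for "too close to zero"


def protected_div(a, b):
    """Divide a by b, returning 1.0 when the divisor is (nearly) zero."""
    return a / b if abs(b) > EPS else 1.0


def protected_log(a):
    """Return log|a|, or 0.0 when the argument is (nearly) zero."""
    return math.log(abs(a)) if abs(a) > EPS else 0.0


def protected_sqrt(a):
    """Return the square root of |a| so negative inputs are legal."""
    return math.sqrt(abs(a))


def protected_exp(a):
    """Return exp(a) with the argument clamped to [-50, 50] to avoid overflow."""
    return math.exp(max(-50.0, min(50.0, a)))


def classify(tree_output, threshold=0.0):
    """Map a real-valued tree output to a class: 1 (malignant) or 0 (benign)."""
    return 1 if tree_output > threshold else 0
```

The threshold is one of the "additional parameters" listed above; treating outputs above it as malignant is a convention, and the opposite convention works equally well as long as it is applied consistently.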
Fitness Function

Maximization problem
  Classification accuracy
Minimization problem
  Classification error
  Number of incorrectly classified patients: q + r

Confusion matrix for the training data

True \ Predict   Positive   Negative
Positive         p          q
Negative         r          s
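The two fitness formulations can be sketched as follows. The counts p, q, r, s follow the confusion matrix on the slide, read with rows as true classes and columns as predictions. Treating malignant (1) as the positive class is an assumption, as are the function names.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (p, q, r, s) for true labels y_true and predictions y_pred.

    p: true positive, predicted positive
    q: true positive, predicted negative
    r: true negative, predicted positive
    s: true negative, predicted negative
    """
    p = q = r = s = 0
    for t, y in zip(y_true, y_pred):
        if t == positive and y == positive:
            p += 1
        elif t == positive:
            q += 1
        elif y == positive:
            r += 1
        else:
            s += 1
    return p, q, r, s


def error_count(y_true, y_pred):
    """Minimization fitness: number of misclassified patients, q + r."""
    _, q, r, _ = confusion_counts(y_true, y_pred)
    return q + r


def accuracy(y_true, y_pred):
    """Maximization fitness: fraction of correctly classified patients."""
    p, q, r, s = confusion_counts(y_true, y_pred)
    return (p + s) / (p + q + r + s)
```

Both formulations rank classifiers identically on a fixed data set, because accuracy equals 1 minus the error count divided by the number of patients.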
Bloat

Bloat = "survival of the fattest", i.e., the tree sizes in the population increase over time.
There are many studies devoted to understanding why bloat occurs.
To limit tree growth, we need countermeasures, e.g.:
  Prohibiting variation operators that would deliver "too big" children → discard big children and perform crossover again
  Parsimony pressure: a penalty for being oversized
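Both countermeasures can be sketched in a few lines of Python. The slides do not give the form of the penalty, so the linear form (error plus alpha times tree size), the nested-tuple tree representation, the default values, and the function names are all assumptions.

```python
def tree_size(tree):
    """Count the nodes of a tree written as nested tuples.

    An internal node is (operator, child, child, ...); anything else is a leaf.
    """
    if isinstance(tree, tuple):
        return 1 + sum(tree_size(child) for child in tree[1:])
    return 1


def penalized_error(num_errors, tree, alpha=0.1):
    """Parsimony pressure: add a size penalty to the error (lower is better)."""
    return num_errors + alpha * tree_size(tree)


def accept_child(tree, max_size=50):
    """Size limit: reject children that are too big, so crossover is retried."""
    return tree_size(tree) <= max_size


# Example: x1 + (x2 * 2.0) has 5 nodes.
example_tree = ("+", "x1", ("*", "x2", 2.0))
```

With alpha set to 0 the penalty disappears and only the size limit restrains growth. Larger alpha values favor smaller trees at some cost in training error.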
Experiments

One problem
  WDBC diagnostics
Various experimental setups
  Termination condition: maximum_generation
    A GP run is stopped when the number of generations reaches a given limit.
  Various settings
    Effects of the penalty term: adjusting α
    Different function sets: different models (e.g., polynomial vs. complex functions)
    Selection methods and their parameters
    Crossover and mutation rates
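One way to organize these experiments is to enumerate the settings as a grid. The sketch below is only an illustration: every numeric value is a placeholder rather than a recommended setting, the split into "polynomial" and "complex" function sets is one possible reading of the slide, and `build_settings` is a hypothetical helper.

```python
from itertools import product

# All values below are illustrative placeholders, not recommended settings.
ALPHAS = [0.0, 0.01, 0.1]  # weight of the size penalty
FUNCTION_SETS = {
    "polynomial": ["+", "-", "*"],
    "complex": ["+", "-", "*", "/", "exp", "log", "sin", "cos", "sqrt"],
}
CROSSOVER_RATES = [0.7, 0.9]


def build_settings(max_generation=100, mutation_rate=0.1):
    """Return one dict per combination of alpha, function set, and crossover rate."""
    settings = []
    for alpha, fs_name, cx_rate in product(ALPHAS, FUNCTION_SETS, CROSSOVER_RATES):
        settings.append({
            "alpha": alpha,
            "function_set_name": fs_name,
            "function_set": FUNCTION_SETS[fs_name],
            "crossover_rate": cx_rate,
            "mutation_rate": mutation_rate,
            "max_generation": max_generation,  # termination condition
        })
    return settings
```

Each returned dict would then drive repeated GP runs, so that results can be compared per setting.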
Results

For each problem
  Show the result table and write your own analysis.
  At least 10 runs for each setting
  Present the optimal classifier (the best GP tree).
  Write the confusion matrix for the test data, using the best tree.
  Draw learning curves of your experiments.
  Compare with the results of neural networks (optional).

            Training                         Test
            Average ± SD   Best   Worst      Average ± SD   Best   Worst
Setting 1
Setting 2
Setting 3
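The entries of one table row can be computed from the per-run results as sketched below. This assumes the reported quantity is an error (so the best run is the minimum) and uses the sample standard deviation; the slides do not say which SD variant is expected, and `summarize` is a hypothetical helper.

```python
import statistics


def summarize(errors):
    """Summarize per-run errors for one setting (lower error is better).

    Needs at least two runs, because the sample standard deviation is
    undefined for a single value.
    """
    return {
        "average": statistics.mean(errors),
        "sd": statistics.stdev(errors),  # sample standard deviation
        "best": min(errors),
        "worst": max(errors),
    }
```

Calling `summarize` once on the training errors and once on the test errors of the runs for a setting fills one row of the table. If accuracy is reported instead of error, `best` and `worst` swap roles.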
[Figure: learning curve of Fitness (Error) over Generation]
References

Source codes
  GP libraries (C, C++, JAVA, …)
  MATLAB toolbox
Web sites
  http://www.cs.bham.ac.uk/~cmf/GPLib/GPLib.html
  http://cs.gmu.edu/~eclab/projects/ecj/
  http://www.geneticprogramming.com/GPpages/software.html
  …
Pay Attention!

Due: May 17, 2007
Submission
  Source code and executable file(s)
    Proper comments in the source code
    Via e-mail (jwha@bi.snu.ac.kr)
  Report: hardcopy!! (Submit to 301-419)
    Running environments and the libraries (or packages) you used
    Results of many experiments with various parameter settings
    Analysis and explanation of the results in your own way