Biological data mining by Genetic Programming AI Project #2 Biointelligence lab Cho, Dong-Yeon


© 2006 SNU CSE Biointelligence Lab

Project Purpose
Medical Diagnosis
  - To predict the presence or absence of a disease given the results of various medical tests carried out on a patient
  - Human experts (M.D.) vs. machine (GP)
Two Data Sets
  - Heart Disease
  - Diabetes

Heart Disease
Data Description
  - Number of patients (270)
      - Absence (150)
      - Presence (120)
  - 13 attributes
      - age
      - sex
      - chest pain type (4 values)
      - resting blood pressure
      - serum cholesterol in mg/dl
      - fasting blood sugar > 120 mg/dl
      - resting electrocardiographic results (values 0, 1, 2)
      - maximum heart rate achieved
      - exercise induced angina
      - oldpeak = ST depression induced by exercise relative to rest
      - the slope of the peak exercise ST segment
      - number of major vessels (0-3) colored by fluoroscopy
      - thal: 3 = normal; 6 = fixed defect; 7 = reversible defect

Learning a Classifier
GP settings
  - Functions
      - Numerical and condition operators
      - {+, -, *, /, exp, log, sin, cos, sqrt, iflte, ifltz, …}
      - Some operators must be protected against illegal operations (for example, division by zero).
  - Terminals
      - Input attributes and constants
      - {x_1, x_2, …, x_13, R} where R ∈ [a, b]
  - Additional parameters
      - Threshold value
      - For preprocessing (normalization)
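The slide asks for protected operators but does not define them. A minimal sketch of one common way to do it is given here; the fallback values (1.0 for division by zero, 0.0 for log of zero), the overflow clamp, and the argument order of `iflte` and `ifltz` are all assumptions, not the course's definitions.

```python
import math

EPS = 1e-9  # assumed tolerance for "too close to zero"


def protected_div(a: float, b: float) -> float:
    """Divide a by b, returning 1.0 when b is (near) zero."""
    return a / b if abs(b) > EPS else 1.0


def protected_log(a: float) -> float:
    """Natural log of |a|, returning 0.0 when a is (near) zero."""
    return math.log(abs(a)) if abs(a) > EPS else 0.0


def protected_sqrt(a: float) -> float:
    """Square root of |a|, so negative inputs never raise."""
    return math.sqrt(abs(a))


def protected_exp(a: float) -> float:
    """exp with the argument clamped at 50 to avoid overflow."""
    return math.exp(min(a, 50.0))


def iflte(a, b, c, d):
    """Assumed meaning: if a <= b then c else d."""
    return c if a <= b else d


def ifltz(a, b, c):
    """Assumed meaning: if a < 0 then b else c."""
    return b if a < 0 else c
```

Whatever fallback values are chosen, they should be reported with the results, since they change what the evolved expressions compute.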

Cross Validation (1/3)
K-fold Cross Validation
  - The data set is randomly divided into k subsets.
  - One of the k subsets is used as the test set and the other k-1 subsets are put together to form a training set.
[Figure: subsets D1 to D6 in three different arrangements; the value 45 appears as a label]
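The k-fold procedure can be sketched as follows. The function name `k_fold_splits`, the fixed seed, and the round-robin assignment of shuffled indices to folds are illustrative choices, not part of the assignment.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Return k (train_indices, test_indices) pairs over range(n_samples)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # random division of the data
    folds = [indices[i::k] for i in range(k)]     # k nearly equal subsets
    splits = []
    for i in range(k):
        test = folds[i]                           # one subset is the test set
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


# Example: 270 patients and k = 6 give 45 test patients per fold.
splits = k_fold_splits(270, 6)
```

Each patient appears in exactly one test set, so averaging the k test accuracies uses every record once for testing.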

Cross Validation (2/3)
Confusion Matrix for test data sets
  - Number of patients = p + q + r + s
  - Accuracy = (p + s) / (p + q + r + s)

  True / Predict   Positive   Negative
  Positive         p          q
  Negative         r          s
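A small sketch of how the four cells and the accuracy could be computed from true and predicted labels. Encoding presence as 1 and absence as 0, and the function names, are assumptions made for this example.

```python
def confusion_counts(y_true, y_pred):
    """Return (true_pos, false_pos, false_neg, true_neg) for 0/1 labels."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Fraction of correctly classified patients (the diagonal cells)."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0
```

Accuracy uses only the two agreeing cells, so it does not depend on which axis of the table holds the true labels.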

Cross Validation (3/3)
Cross validation and Confusion Matrix
  - At least 10 runs for your k value.
  - Show the confusion matrix for the best result of your experiments.

  Run       Accuracy
  1
  2
  …
  10
  Average

Initialization
Maximum initial depth of trees D_max is set.
Full method (each branch has depth = D_max):
  - nodes at depth d < D_max randomly chosen from function set F
  - nodes at depth d = D_max randomly chosen from terminal set T
Grow method (each branch has depth ≤ D_max):
  - nodes at depth d < D_max randomly chosen from F ∪ T
  - nodes at depth d = D_max randomly chosen from T
Common GP initialisation: ramped half-and-half, where grow and full method each deliver half of initial population
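The full and grow methods can be sketched as one recursive generator. The nested-list tree representation, the reduced function set, the constant range [-1, 1], and the way depth limits are ramped from 2 up to D_max are assumptions made for illustration.

```python
import random

# Illustrative function set: name -> arity (a subset of the operators).
FUNCTIONS = {"+": 2, "-": 2, "*": 2, "/": 2, "sin": 1}
# Terminal set: 13 input attributes plus "R", a placeholder for a constant.
TERMINALS = [f"x{i}" for i in range(1, 14)] + ["R"]


def random_terminal(rng, low=-1.0, high=1.0):
    """Pick a terminal; 'R' becomes a constant drawn from [low, high]."""
    name = rng.choice(TERMINALS)
    return rng.uniform(low, high) if name == "R" else name


def gen_tree(rng, max_depth, method, depth=0):
    """Build a tree as nested lists [function, child, ...]; leaves are terminals."""
    if depth >= max_depth:
        return random_terminal(rng)            # d = D_max: terminals only
    if method == "grow":
        # Grow: choose uniformly from F ∪ T, so a branch may stop early.
        if rng.randrange(len(FUNCTIONS) + len(TERMINALS)) >= len(FUNCTIONS):
            return random_terminal(rng)
    name = rng.choice(list(FUNCTIONS))         # full, or grow that drew from F
    return [name] + [gen_tree(rng, max_depth, method, depth + 1)
                     for _ in range(FUNCTIONS[name])]


def tree_depth(tree):
    """Depth of a nested-list tree; a lone terminal has depth 0."""
    if not isinstance(tree, list):
        return 0
    return 1 + max(tree_depth(child) for child in tree[1:])


def ramped_half_and_half(pop_size, max_depth, seed=0):
    """Alternate full and grow while cycling the depth limit over 2..max_depth."""
    rng = random.Random(seed)
    depths = list(range(min(2, max_depth), max_depth + 1))
    population = []
    for i in range(pop_size):
        method = "full" if i % 2 == 0 else "grow"
        depth_limit = depths[(i // 2) % len(depths)]
        population.append(gen_tree(rng, depth_limit, method))
    return population
```

With this function set grow draws a terminal far more often than a function, so grow trees tend to be shallow; weighting the choice toward F is a common adjustment.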

Fitness Function
Maximization problem
  - Number of the correctly classified patients
Minimization problem
  - Number of the incorrectly classified patients
  - Mean Squared Error
      - N: number of training data
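The three fitness measures can be sketched as below, working on the raw outputs of an evolved expression. The threshold rule (output above the threshold means presence, encoded as 1) and the standard form of mean squared error, the sum of squared differences divided by N, are assumptions, since the slide's formula did not survive.

```python
def classify(output, threshold=0.0):
    """Map a raw tree output to a class: 1 (presence) or 0 (absence)."""
    return 1 if output > threshold else 0


def n_correct(outputs, labels, threshold=0.0):
    """Maximization fitness: number of correctly classified patients."""
    return sum(classify(o, threshold) == y for o, y in zip(outputs, labels))


def n_incorrect(outputs, labels, threshold=0.0):
    """Minimization fitness: number of incorrectly classified patients."""
    return len(labels) - n_correct(outputs, labels, threshold)


def mean_squared_error(outputs, labels):
    """Minimization fitness: average squared difference over N training data."""
    n = len(labels)
    return sum((y - o) ** 2 for o, y in zip(outputs, labels)) / n
```

The count-based measures are coarse (many trees tie), while mean squared error also rewards outputs that move toward the target without yet crossing the threshold.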

Selection (1/2)
Fitness proportional (roulette wheel) selection
  - The roulette wheel can be constructed as follows.
      - Calculate the total fitness for the population.
      - Calculate selection probability p_k for each chromosome v_k.
      - Calculate cumulative probability q_k for each chromosome v_k.

Procedure: Proportional_Selection
  - Generate a random number r from the range [0, 1].
  - If r ≤ q_1, then select the first chromosome v_1; else, select the kth chromosome v_k (2 ≤ k ≤ pop_size) such that q_{k-1} < r ≤ q_k.
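The roulette wheel construction and the selection procedure can be sketched as follows. The sketch assumes a maximization fitness that is non-negative with a positive total, takes p_k as fitness divided by total fitness and q_k as the running sum of p_1..p_k, and uses 0-based indices; the function names are hypothetical.

```python
import bisect
import itertools
import random


def build_wheel(fitnesses):
    """Return (p, q): selection probabilities and cumulative probabilities."""
    total = sum(fitnesses)                       # total fitness of the population
    p = [f / total for f in fitnesses]           # p_k for each chromosome
    q = list(itertools.accumulate(p))            # q_k = p_1 + ... + p_k
    q[-1] = 1.0                                  # guard against rounding error
    return p, q


def select_index(q, r):
    """Return the first index k with r <= q[k], i.e. q[k-1] < r <= q[k]."""
    return bisect.bisect_left(q, r)


def select_parents(fitnesses, n, rng):
    """Spin the wheel n times and return the selected indices."""
    _, q = build_wheel(fitnesses)
    return [select_index(q, rng.random()) for _ in range(n)]


parents = select_parents([1.0, 3.0, 6.0], 4, random.Random(0))
```

For a minimization fitness such as an error count, the values would first need to be transformed (for example, subtracted from a maximum) before building the wheel.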

Selection (2/2)
Tournament selection
  - Tournament size q
Ranking-based selection
  - 2 ≤ ? ≤ POP_SIZE
  - 1 ≤ η+ ≤ 2 and η- = 2 - η+
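Both selection schemes can be sketched briefly. The slide gives only the parameter constraints, so the linear ranking formula used here (probability falling linearly from η+/N for the best individual to η-/N for the worst) is the commonly used form and is an assumption, as are sampling tournament contenders without replacement and treating higher fitness as better.

```python
import random


def tournament_select(fitnesses, q, rng):
    """Draw q distinct contenders at random and return the index of the fittest."""
    contenders = rng.sample(range(len(fitnesses)), q)
    return max(contenders, key=lambda i: fitnesses[i])


def linear_ranking_probs(pop_size, eta_plus=1.5):
    """Selection probability per rank (rank 0 = best); needs pop_size >= 2."""
    eta_minus = 2.0 - eta_plus
    return [
        (eta_plus - (eta_plus - eta_minus) * rank / (pop_size - 1)) / pop_size
        for rank in range(pop_size)
    ]
```

Both schemes depend only on the ordering of fitness values, so they behave the same whether the raw fitness is an accuracy, a count, or a rescaled error.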

GP Flowchart
[Figure: flowcharts of the GA loop and the GP loop]

Bloat
Bloat = “survival of the fattest”, i.e., the tree sizes in the population increase over time
Ongoing research and debate about the reasons
Needs countermeasures, e.g.
  - Prohibiting variation operators that would deliver “too big” children
  - Parsimony pressure: penalty for being oversized
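Both countermeasures can be sketched for trees stored as nested lists `[function, child, ...]`. The representation, the linear penalty form `error + alpha * size`, and the default `alpha` are assumptions; the slide does not specify the penalty.

```python
def tree_size(tree):
    """Number of nodes in a nested-list tree; a lone terminal counts as 1."""
    if not isinstance(tree, list):
        return 1
    return 1 + sum(tree_size(child) for child in tree[1:])


def penalized_error(error, size, alpha=0.01):
    """Parsimony pressure for a minimization fitness: add a size penalty."""
    return error + alpha * size


def within_size_limit(child, max_size):
    """Hard limit: reject offspring larger than max_size nodes."""
    return tree_size(child) <= max_size


tree = ["+", "x1", ["*", "x2", 0.5]]   # x1 + (x2 * 0.5), 5 nodes
fitness = penalized_error(10.0, tree_size(tree), alpha=0.1)
```

A larger `alpha` favors smaller trees at the cost of accuracy, which is one way to study the effect of the penalty term.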


Experiments
Two problems
  - Heart Disease
  - Pima Indian diabetes
Various experimental setups
  - Termination condition: maximum_generation
  - Various settings
      - Effects of the penalty term
      - Different function and terminal sets
      - Selection methods and their parameters
      - Crossover and mutation probabilities

Results
For each problem
  - Result table and your analysis
  - Present the optimal classifier
  - Draw a learning curve for the run where the best solution was found.
  - Compare with the results of neural networks (optional).
  - Different k for cross validation (optional)

              Training                        Test
              Average ± SD   Best   Worst     Average ± SD   Best   Worst
  Setting 1
  Setting 2
  Setting 3
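One row of the result table can be filled from a list of per-run scores with a small helper. This sketch assumes the scores are accuracies (higher is better, so best is the maximum) and uses the sample standard deviation; the name `summarize` is hypothetical.

```python
import statistics


def summarize(scores):
    """Return average, standard deviation, best and worst of per-run scores."""
    return {
        "average": statistics.mean(scores),
        "sd": statistics.stdev(scores) if len(scores) > 1 else 0.0,
        "best": max(scores),
        "worst": min(scores),
    }


row = summarize([0.8, 0.9, 0.7])
print(f"{row['average']:.3f} ± {row['sd']:.3f}  {row['best']:.3f}  {row['worst']:.3f}")
```

If the table reports errors instead of accuracies, best and worst swap to the minimum and maximum.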

[Figure: plot with axes labeled Generation and Fitness (Error)]

References
Source Codes
  - GP libraries (C, C++, JAVA, …)
  - MATLAB toolbox
Web sites
  - e.html
  - …

Pay Attention!
Due: Nov. 16, 2006
Submission
  - Source code and executable file(s)
      - Proper comments in the source code
      - Via
  - Report: Hardcopy!!
      - Running environments and libraries (or packages) which you used.
      - Results for many experiments with various parameter settings
      - Analysis and explanation about the results in your own way