Published by Ashley Craig. Modified over 9 years ago.

Symbolic Regression via Genetic Programming
AI Project #2
Biointelligence Lab
Cho, Dong-Yeon (dycho@bi.snu.ac.kr)

© 2005 SNU CSE Biointelligence Lab

Example (1/2)

Data: relationship between A and P

    A       P
    0.39    0.24
    0.72    0.61
    1.00    1.00
    1.52    1.84
    5.20    11.9
    9.53    29.4
    19.1    83.5

Example (2/2)

Kepler's Third Law: the square of any planet's (sidereal) orbital period P is proportional to the cube of its mean distance A (semi-major axis) from the Sun.

    Planet     A       P
    Mercury    0.39    0.24
    Venus      0.72    0.61
    Earth      1.00    1.00
    Mars       1.52    1.84
    Jupiter    5.20    11.9
    Saturn     9.53    29.4
    Uranus     19.1    83.5
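
As a quick check of the law against the table, the ratio P^2 / A^3 should come out roughly constant across the planets. The following is a minimal Python sketch; the function and variable names are illustrative, and the units (astronomical units for A, Earth years for P) are an assumption, since the slide does not state them.

```python
# Data from the slide: (planet, A = mean distance, P = orbital period).
# Units are assumed to be AU and Earth years; the slide does not state them.
PLANETS = [
    ("Mercury", 0.39, 0.24),
    ("Venus", 0.72, 0.61),
    ("Earth", 1.00, 1.00),
    ("Mars", 1.52, 1.84),
    ("Jupiter", 5.20, 11.9),
    ("Saturn", 9.53, 29.4),
    ("Uranus", 19.1, 83.5),
]


def kepler_ratios(rows):
    """Return P^2 / A^3 per planet; Kepler's third law predicts a constant."""
    return {name: (p ** 2) / (a ** 3) for name, a, p in rows}


if __name__ == "__main__":
    for name, ratio in kepler_ratios(PLANETS).items():
        print(f"{name:8s} P^2/A^3 = {ratio:.3f}")
```

With these units the constant c is close to 1 for every planet, which is the relationship symbolic regression is expected to rediscover from the raw (A, P) pairs.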

Evolving the Programs (1/2)

Koza's Algorithm
1. Choose a set of possible functions and terminals for the program.
   F = {+, -, *, /, √}, T = {A}
2. Generate an initial population of random trees (programs) using the set of possible functions and terminals.
3. Calculate the fitness of each program in the population by running it on a set of "fitness cases" (a set of inputs for which the correct output is known).
4. Apply selection, crossover, and mutation to the population to form a new population.
5. Repeat steps 3 and 4 for some number of generations.
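
The five steps can be sketched in a few dozen lines of Python. This is only an illustration under stated assumptions, not the reference implementation for the project: trees are nested tuples, fitness is the sum of squared errors on the (A, P) pairs, selection is a size-3 tournament with one elite copy, and the depth limit, population size, and rates are arbitrary choices that do not come from the slides.

```python
import math
import random

# Fitness cases (A, P) from the Kepler example.
CASES = [(0.39, 0.24), (0.72, 0.61), (1.00, 1.00), (1.52, 1.84),
         (5.20, 11.9), (9.53, 29.4), (19.1, 83.5)]
FUNCTIONS = {"+": 2, "-": 2, "*": 2, "/": 2, "sqrt": 1}  # step 1: name -> arity
TERMINALS = ["A"]
MAX_DEPTH = 6  # assumed limit, keeps offspring from growing without bound


def random_tree(depth, rng):
    """Step 2: build a random tree as a nested tuple (op, child, ...)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(sorted(FUNCTIONS))
    return (name,) + tuple(random_tree(depth - 1, rng) for _ in range(FUNCTIONS[name]))


def evaluate(tree, a):
    if tree == "A":
        return a
    op, args = tree[0], [evaluate(t, a) for t in tree[1:]]
    if op == "+":
        return args[0] + args[1]
    if op == "-":
        return args[0] - args[1]
    if op == "*":
        return args[0] * args[1]
    if op == "/":  # protected division (assumed convention: return 1)
        return args[0] / args[1] if abs(args[1]) > 1e-9 else 1.0
    return math.sqrt(abs(args[0]))  # protected sqrt (assumed convention)


def error(tree):
    """Step 3: sum of squared errors over the fitness cases (lower is better)."""
    try:
        total = sum((evaluate(tree, a) - p) ** 2 for a, p in CASES)
    except OverflowError:
        return float("inf")
    return total if math.isfinite(total) else float("inf")


def paths(tree, prefix=()):
    """Yield the position of every node as a tuple of child indices."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))


def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def replace(tree, path, new):
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace(tree[i], path[1:], new),) + tree[i + 1:]


def depth(tree):
    return 1 + max(depth(c) for c in tree[1:]) if isinstance(tree, tuple) else 0


def crossover(a, b, rng):
    """Subtree crossover: graft a random subtree of b into a random point of a."""
    child = replace(a, rng.choice(list(paths(a))), get(b, rng.choice(list(paths(b)))))
    return child if depth(child) <= MAX_DEPTH else a


def mutate(tree, rng):
    """Subtree mutation: replace a random node with a fresh random subtree."""
    child = replace(tree, rng.choice(list(paths(tree))), random_tree(2, rng))
    return child if depth(child) <= MAX_DEPTH else tree


def tournament(pop, rng, k=3):
    return min(rng.sample(pop, k), key=error)


def evolve(pop_size=200, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [random_tree(4, rng) for _ in range(pop_size)]
    best = min(pop, key=error)
    for _ in range(generations):  # step 5
        nxt = [best]              # keep the best program (elitism, an assumption)
        while len(nxt) < pop_size:  # step 4
            child = crossover(tournament(pop, rng), tournament(pop, rng), rng)
            if rng.random() < 0.1:
                child = mutate(child, rng)
            nxt.append(child)
        pop = nxt
        best = min(pop, key=error)
    return best, error(best)


if __name__ == "__main__":
    print(evolve())
```

Because the best program is copied unchanged into each new generation, the best error never increases from one generation to the next; whether a particular run finds sqrt(A * A * A) depends on the seed and settings.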

Evolving Lisp Programs (2/2)

Kepler's Third Law: $P^2 = cA^3$

FORTRAN:

    PROGRAM ORBITAL_PERIOD
      ! Mars
      A = 1.52
      P = SQRT(A * A * A)
      PRINT *, P
    END PROGRAM ORBITAL_PERIOD

LISP:

    (defun orbital_period ()
      ; Mars
      (setf A 1.52)
      (sqrt (* A (* A A))))

Parse tree of the LISP expression:

        sqrt
         |
         *
        / \
       A   *
          / \
         A   A
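
To make the link between the Lisp expression and its parse tree concrete, here is a small Python sketch that stores the tree as a nested tuple, evaluates it, and prints it back in Lisp form. The tuple encoding and the helper names `evaluate` and `to_lisp` are illustrative assumptions, not part of the slides.

```python
import math

# (sqrt (* A (* A A))) as a nested tuple: (operator, operand, ...).
TREE = ("sqrt", ("*", "A", ("*", "A", "A")))

OPS = {
    "+": lambda x, y: x + y,
    "-": lambda x, y: x - y,
    "*": lambda x, y: x * y,
    "/": lambda x, y: x / y,
    "sqrt": math.sqrt,
}


def evaluate(tree, env):
    """Evaluate a tree bottom-up; leaves are variable names looked up in env."""
    if isinstance(tree, str):
        return env[tree]
    op, *args = tree
    return OPS[op](*(evaluate(arg, env) for arg in args))


def to_lisp(tree):
    """Render the tree as a Lisp S-expression string."""
    if isinstance(tree, str):
        return tree
    return "(" + " ".join([tree[0]] + [to_lisp(t) for t in tree[1:]]) + ")"


if __name__ == "__main__":
    print(to_lisp(TREE))                 # the original S-expression
    print(evaluate(TREE, {"A": 1.52}))   # about 1.87 for Mars
```

Crossover and mutation in GP operate on exactly this kind of structure: swapping or replacing a subtree always yields another syntactically valid expression.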

Symbolic Regression by GP
- Objective
  - Find the function f for the given data (x, y)
- Data sets
  - Set 1 and 2: 11 pairs
  - Set 3: 50 pairs

Functions and Terminals
- Functions
  - Numerical operators: {+, -, *, /, exp, log, sin, cos, sqrt}
  - Some operators must be protected against illegal operations (e.g., division by zero, log of a non-positive number, sqrt of a negative number).
- Terminals
  - Input and constants: {x, R}, where $R \in [a, b]$
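
One possible way to protect the operators is sketched below in Python. The fallback values (1 for division by a near-zero denominator, 0 for log of a near-zero argument, absolute value inside log and sqrt, clamping the argument of exp) are common conventions assumed here for illustration; the slides do not prescribe any particular choice, and the function names are hypothetical.

```python
import math


def protected_div(x, y, eps=1e-9):
    """Division that returns 1.0 instead of failing on a near-zero denominator."""
    return x / y if abs(y) > eps else 1.0


def protected_log(x, eps=1e-9):
    """Log of |x|; returns 0.0 when |x| is too close to zero."""
    return math.log(abs(x)) if abs(x) > eps else 0.0


def protected_sqrt(x):
    """Square root of |x|, so negative inputs do not raise an error."""
    return math.sqrt(abs(x))


def protected_exp(x, limit=50.0):
    """exp with the argument clamped to [-limit, limit] to avoid overflow."""
    return math.exp(max(-limit, min(limit, x)))
```

Whatever convention is chosen, it should be documented in the report, because it changes what function an evolved tree actually computes near the protected points.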

Initialization
- A maximum initial depth of trees, $D_{max}$, is set.
- Full method (each branch has depth $= D_{max}$):
  - nodes at depth $d < D_{max}$ are randomly chosen from the function set $F$
  - nodes at depth $d = D_{max}$ are randomly chosen from the terminal set $T$
- Grow method (each branch has depth $\le D_{max}$):
  - nodes at depth $d < D_{max}$ are randomly chosen from $F \cup T$
  - nodes at depth $d = D_{max}$ are randomly chosen from $T$
- Common GP initialisation: ramped half-and-half, where the grow and full methods each deliver half of the initial population
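
The three initialisation schemes can be sketched as follows in Python. The function set, the constant range, the uniform choice over all primitives in the grow method, and the ramp over depths from `d_min` to `d_max` are assumptions made for this example; the slide only fixes the full/grow rules and the half-and-half split.

```python
import random

FUNCTIONS = {"+": 2, "-": 2, "*": 2, "/": 2, "sin": 1, "cos": 1}  # name -> arity
TERMINALS = ["x", "R"]  # "R" stands for a random constant from [lo, hi]


def make_terminal(rng, lo=-1.0, hi=1.0):
    choice = rng.choice(TERMINALS)
    return rng.uniform(lo, hi) if choice == "R" else choice


def make_function(rng, build_child):
    name = rng.choice(sorted(FUNCTIONS))
    return (name,) + tuple(build_child() for _ in range(FUNCTIONS[name]))


def full(d_max, rng, d=0):
    """Full method: functions above depth d_max, terminals exactly at d_max."""
    if d == d_max:
        return make_terminal(rng)
    return make_function(rng, lambda: full(d_max, rng, d + 1))


def grow(d_max, rng, d=0):
    """Grow method: any primitive above depth d_max, terminals at d_max."""
    if d == d_max:
        return make_terminal(rng)
    n_f, n_t = len(FUNCTIONS), len(TERMINALS)
    if rng.random() < n_t / (n_f + n_t):  # uniform pick from F union T
        return make_terminal(rng)
    return make_function(rng, lambda: grow(d_max, rng, d + 1))


def depth(tree):
    return 1 + max(depth(c) for c in tree[1:]) if isinstance(tree, tuple) else 0


def ramped_half_and_half(pop_size, d_max, rng, d_min=2):
    """Alternate full and grow while cycling the depth limit d_min..d_max."""
    depths = list(range(d_min, d_max + 1))
    population = []
    for i in range(pop_size):
        limit = depths[(i // 2) % len(depths)]
        method = full if i % 2 == 0 else grow
        population.append(method(limit, rng))
    return population


if __name__ == "__main__":
    for tree in ramped_half_and_half(6, 3, random.Random(0)):
        print(depth(tree), tree)
```

Note that this grow variant may return a single terminal as the whole tree; some implementations force the root to be a function to avoid such trivial individuals.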

Fitness Functions
- Relative squared error
- The number of outputs that are within a given percentage of the correct value
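
The formulas for these two measures did not survive in the slide text, so the Python sketch below uses assumed definitions: relative squared error as the sum of squared errors divided by the squared deviation of the targets from their mean, and a count of predictions whose absolute error is within a given percentage of the target's magnitude. The function names and the default percentage are hypothetical.

```python
def relative_squared_error(y_true, y_pred):
    """Sum of squared errors, normalised by the spread of the targets.

    0.0 is a perfect fit; 1.0 matches always predicting the mean target.
    Assumes the targets are not all identical (otherwise the denominator is 0).
    """
    mean = sum(y_true) / len(y_true)
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sst = sum((t - mean) ** 2 for t in y_true)
    return sse / sst


def hit_count(y_true, y_pred, percent=1.0):
    """Number of predictions within `percent` % of the correct value."""
    return sum(abs(p - t) <= abs(t) * percent / 100.0
               for t, p in zip(y_true, y_pred))
```

The first measure is minimised and the second is maximised, so a selection scheme must be told which direction counts as fitter.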

Selection (1/2)
- Fitness proportional (roulette wheel) selection
- The roulette wheel can be constructed as follows:
  - Calculate the total fitness for the population: $F = \sum_{k=1}^{pop\_size} eval(v_k)$
  - Calculate the selection probability $p_k$ for each chromosome $v_k$: $p_k = eval(v_k) / F$
  - Calculate the cumulative probability $q_k$ for each chromosome $v_k$: $q_k = \sum_{j=1}^{k} p_j$

Procedure: Proportional_Selection
- Generate a random number $r$ from the range [0, 1].
- If $r \le q_1$, then select the first chromosome $v_1$; else, select the $k$th chromosome $v_k$ ($2 \le k \le pop\_size$) such that $q_{k-1} < r \le q_k$.

    k     p_k         q_k
    1     0.082407    0.082407
    2     0.110652    0.193059
    3     0.131931    0.324989
    4     0.121423    0.446412
    5     0.072597    0.519009
    6     0.128834    0.647843
    7     0.077959    0.725802
    8     0.102013    0.827802
    9     0.083663    0.911479
    10    0.088521    1.000000
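
The procedure can be sketched in Python using the p_k column of the table. The cumulative values are recomputed from p_k rather than copied, so they may differ from the printed q_k in the last digits; the function names are illustrative, and the binary search via `bisect` is an implementation choice, not something the slide specifies.

```python
import bisect
import itertools
import random

# Selection probabilities p_k for chromosomes 1..10, from the table.
P = [0.082407, 0.110652, 0.131931, 0.121423, 0.072597,
     0.128834, 0.077959, 0.102013, 0.083663, 0.088521]


def selection_probabilities(fitnesses):
    """p_k = fitness_k / total fitness (fitness must be positive, higher = better)."""
    total = sum(fitnesses)
    return [f / total for f in fitnesses]


def cumulative(probabilities):
    """q_k = p_1 + ... + p_k."""
    return list(itertools.accumulate(probabilities))


def select(q, r):
    """Return the 1-based k with q[k-1] < r <= q[k]; k = 1 if r <= q[1]."""
    k = bisect.bisect_left(q, r)         # first position whose q is >= r
    return min(k, len(q) - 1) + 1        # clamp guards against rounding in q[-1]


if __name__ == "__main__":
    rng = random.Random(0)
    q = cumulative(P)
    print([select(q, rng.random()) for _ in range(10)])
```

Spinning the wheel pop_size times with fresh random numbers produces the mating pool; the same chromosome may be selected more than once.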

Selection (2/2)
- Tournament selection
  - Tournament size $q$, $2 \le q \le POP\_SIZE$
- Ranking-based selection
  - $1 \le \eta^+ \le 2$ and $\eta^- = 2 - \eta^+$
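
Both schemes are short to write down in Python. The sketch below assumes that higher fitness is better, that the tournament draws its q contestants without replacement, and that the ranking scheme is the standard linear ranking in which the best individual gets probability eta_plus / POP_SIZE and the worst gets eta_minus / POP_SIZE with eta_minus = 2 - eta_plus; the slide's own formula is not recoverable, so treat that formula as an assumption.

```python
import random


def tournament_select(population, fitness, q, rng):
    """Draw q distinct individuals at random and return the fittest one."""
    contestants = rng.sample(population, q)
    return max(contestants, key=fitness)


def linear_ranking_probabilities(pop_size, eta_plus=1.5):
    """Selection probability per rank (index 0 = best) under linear ranking."""
    if not 1.0 <= eta_plus <= 2.0:
        raise ValueError("eta_plus must lie in [1, 2]")
    eta_minus = 2.0 - eta_plus
    return [
        (eta_plus - (eta_plus - eta_minus) * rank / (pop_size - 1)) / pop_size
        for rank in range(pop_size)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    print(tournament_select([3, 1, 4, 1, 5, 9, 2, 6], lambda v: v, 3, rng))
    print(linear_ranking_probabilities(5, eta_plus=2.0))
```

A larger tournament size or a larger eta_plus increases selection pressure; eta_plus = 1 makes every rank equally likely.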

GP Flowchart
[Figure: flowcharts of the GA loop and the GP loop]

Bloat
- Bloat = "survival of the fattest", i.e., the tree sizes in the population increase over time
- Ongoing research and debate about the reasons
- Needs countermeasures, e.g.
  - Prohibiting variation operators that would deliver "too big" children
  - Parsimony pressure: a penalty for being oversized
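
Both countermeasures reduce to a size measure on trees. The Python sketch below assumes trees stored as nested tuples, a linear penalty `alpha * size` added to an error that is being minimised, and a node-count cap; the penalty form, `alpha`, and `max_nodes` are illustrative assumptions, not values from the slides.

```python
def size(tree):
    """Number of nodes in a tree stored as nested tuples (op, child, ...)."""
    if isinstance(tree, tuple):
        return 1 + sum(size(child) for child in tree[1:])
    return 1


def penalized_error(raw_error, tree, alpha=0.01):
    """Parsimony pressure: add a size-proportional penalty to the raw error."""
    return raw_error + alpha * size(tree)


def too_big(tree, max_nodes=50):
    """Size check for rejecting offspring produced by crossover or mutation."""
    return size(tree) > max_nodes
```

A typical use is to keep the parent whenever `too_big(child)` is true and to rank individuals by `penalized_error` instead of the raw error; the value of `alpha` is one of the parameters whose effect the experiments can study.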

Experiments
- At least three problems (+ your own data)
- Various experimental setups
  - Termination condition: maximum_generation
  - 2 models × 3 settings × 20 runs
    - Models: polynomial and general
  - Effects of the penalty term
  - Selection methods and their parameters
  - Crossover rate $p_c$ and mutation rate $p_m$

Results
- For each problem
  - Result table and your analysis
  - Present the optimal function.
    - Readable form and predicted function graph with data
  - Draw a learning curve for the run where the best solution was found.
    - You can draw all learning curves in one plot.

                 Polynomial                       General
                 Average ± SD   Best   Worst      Average ± SD   Best   Worst
    Setting 1
    Setting 2
    Setting 3
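
Each cell group of the table summarises the final errors of the repeated runs for one model and one setting. A minimal Python sketch of that summary follows; it assumes that the reported quantity is an error (so the smallest value is the best) and that SD means the sample standard deviation, neither of which the slide states explicitly.

```python
import statistics


def summarize(final_errors):
    """Average, SD, best, and worst of the final errors from repeated runs."""
    return {
        "average": statistics.mean(final_errors),
        "sd": statistics.stdev(final_errors),  # sample SD, needs >= 2 runs
        "best": min(final_errors),             # lowest error
        "worst": max(final_errors),            # highest error
    }


def format_row(label, final_errors):
    """One table row in the layout: label, average ± SD, best, worst."""
    s = summarize(final_errors)
    return (f"{label:<10} {s['average']:.4f} ± {s['sd']:.4f}   "
            f"{s['best']:.4f}   {s['worst']:.4f}")
```

Calling `format_row("Setting 1", errors)` with the list of final errors collected from the 20 runs of that setting yields one line of the result table.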

[Figure: learning curve; x-axis: Generation, y-axis: Fitness (Error)]

References
- Source codes
  - GP libraries (C, C++, JAVA, …)
  - MATLAB toolbox
- Web sites
  - http://www.cs.bham.ac.uk/~cmf/GPLib/GPLib.html
  - http://cs.gmu.edu/~eclab/projects/ecj/
  - http://www.geneticprogramming.com/GPpages/software.html
  - …

Pay Attention!
- Due: May 3, 2005
- Submission
  - Source code and executable file(s)
    - Proper comments in the source code
    - Via e-mail
  - Report: hardcopy!!
    - Running environments
    - Results for many experiments with various parameter settings
    - Analysis and explanation of the results in your own way