The Assessment and Application of Lineage Information in Genetic Programs for Producing Better Models Gary D. Boetticher Univ. of Houston.

1 The Assessment and Application of Lineage Information in Genetic Programs for Producing Better Models
Gary D. Boetticher, Boetticher@uhcl.edu, Univ. of Houston - Clear Lake, Houston, TX, USA
Kim Kaminsky, Kaminsky@uhcl.edu, Univ. of Houston - Clear Lake, Houston, TX, USA
http://nas.cl.uh.edu/boetticher/publications.html
The 2006 IEEE International Conference on Information Reuse and Integration

2 About the Author: Gary D. Boetticher
- Ph.D. in Machine Learning and Software Engineering (A neural network-based software reuse economic model)
- Executive member of IEEE Reuse Standard Committees (1990s)
- Commercial consultant: U.S. Olympic Committee, LDDS Worldcom, Mellon Mortgage, …
- Currently: Associate Professor, Department of Comp. Science/Software Engineering, University of Houston - Clear Lake, Houston, TX, USA, boetticher@uhcl.edu
- Research interests: Data mining, ML, Computational Bioinformatics, and Software metrics

3 Motivating Questions
Does chromosome lineage information within a Genetic Program (GP) provide any insight into how effectively the GP solves problems?
If so, how could these insights be used to make better breeding decisions?

4 Genetic Program Overview
Problem: X, Y, and Z -> RESULT?
    X    Y    Z    RESULT
    2    4    5    30
    5    3    2    16
    :    :    :    :
    1    3    6    24
1) Create a population of equations
    Eq#     Equation
    1       X+Y
    2       (Z-X)*Y+X
    :       :
    1000    (X*X)-Z
2) Determine the fitness for each (1 / Stand. Error)
    Eq#     Fitness
    1       87
    2       84
    :       :
    1000    57
3) Breed Equations
    Parents:    X + Y  and  (Z-X) * Y+X
    Offspring:  (Z-X) + Y  and  X * Y+X
4) Generate new populations and breed until a solution is found
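The fitness and breeding steps on the overview slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: equations are stored as nested tuples, "standard error" is taken to be the root-mean-square residual (the slide does not define it), and the names evaluate, fitness, replace_at, subtree_at, and crossover are hypothetical helpers. The crossover points are chosen by hand so that the two parents from the slide produce the two offspring shown there.

```python
import math
import operator

# Binary operators allowed inside an equation tree.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(tree, row):
    """Evaluate a nested-tuple equation such as ("+", "X", "Y") on one data row."""
    if isinstance(tree, str):              # a leaf is a variable name
        return row[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, row), evaluate(right, row))

def fitness(tree, rows, target="RESULT"):
    """Fitness = 1 / standard error; the error is the RMS residual (assumption)."""
    squared = [(evaluate(tree, r) - r[target]) ** 2 for r in rows]
    std_error = math.sqrt(sum(squared) / len(squared))
    return math.inf if std_error == 0 else 1.0 / std_error

def subtree_at(tree, path):
    """Return the node reached by following a tuple of child indices."""
    for index in path:
        tree = tree[index]
    return tree

def replace_at(tree, path, new_subtree):
    """Return a copy of tree with the node at path replaced by new_subtree."""
    if not path:
        return new_subtree
    children = list(tree)
    children[path[0]] = replace_at(tree[path[0]], path[1:], new_subtree)
    return tuple(children)

def crossover(parent_a, path_a, parent_b, path_b):
    """Swap the subtree at path_a in parent_a with the one at path_b in parent_b."""
    sub_a, sub_b = subtree_at(parent_a, path_a), subtree_at(parent_b, path_b)
    return replace_at(parent_a, path_a, sub_b), replace_at(parent_b, path_b, sub_a)

# The three data rows that are legible on the slide.
rows = [
    {"X": 2, "Y": 4, "Z": 5, "RESULT": 30},
    {"X": 5, "Y": 3, "Z": 2, "RESULT": 16},
    {"X": 1, "Y": 3, "Z": 6, "RESULT": 24},
]

eq1 = ("+", "X", "Y")                          # X + Y
eq2 = ("+", ("*", ("-", "Z", "X"), "Y"), "X")  # (Z-X) * Y + X

# Swap the leaf X of eq1 with the subtree (Z-X) of eq2.
child1, child2 = crossover(eq1, (1,), eq2, (1, 1))
```

Here child1 is the tree for (Z-X) + Y and child2 is the tree for X * Y + X. A real GP would pick the crossover points at random and would also need a selection scheme and a stopping rule, which are omitted here.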

5 Genetic Program Overview
Generation N
    Equation              Fitness
    (X+Y)                 87
    (X - Z) * (Y * Y)     86
    ZY                    75
    :                     :
    Y                     22
    Y - X                 18
Generation N+1
    Equation              Fitness
    (X - Z)
    (X + Y) * (Y * Y)
    Z + Y
    :
    X
    Y + Y
Why discard legacy information?

6 Goal: Examine fitness patterns over time
The same ranked table appears under each of Generation 1, Generation 2, and Generation 3:
    Equation                        Fitness
    (X+Y)                           87
    (X - Z) * (Y * Y)               86
    ZY                              85
    (X - Z) * (Y * Y)               84
    Y                               79
    Y - X                           75
    Z + Y                           75
    (X - Z) * (Y * Y)               75
    Y                               73
    Y - X                           71
    (X - Z) * (Y * Y) + W + W       68
    Y - X                           67
    ZY                              66
    (X - Z) * (Y * Y)               66
    Y                               65
    Y - X                           65
    (X - Z) * (Y * Y) + W + W       64
    Y - X                           64
    Z - Y                           62
    (X - Z) * (Y * Y)               59
    Y                               58
    Y - X                           55
    (X - Z) * (Y * Y) + W + W       44
Localized? Volatile?

7 Proof of Concept Experiments - 1
5 experiments using synthetic equations:
- Z = W + X + Y
- Z = 2 * X + Y - W
- Z = X / Y
- Z = X^3
- Z = W^2 + W * X - Y
Data slightly perturbed to prevent premature convergence
Genetic Program:
- 1000 Chromosomes (Equations)
- 50 Generations
- Breeding based on fitness rank
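As an illustration of what "slightly perturbed" synthetic data could look like, the sketch below generates one dataset per target equation. The slides do not state the perturbation size, the input ranges, or the number of rows, so the 1% multiplicative noise, the [1, 10] input range, the 100 rows, and the names TARGETS and make_dataset are all assumptions made for this example.

```python
import random

# The five synthetic target equations from the slide.
TARGETS = {
    "W + X + Y": lambda w, x, y: w + x + y,
    "2 * X + Y - W": lambda w, x, y: 2 * x + y - w,
    "X / Y": lambda w, x, y: x / y,
    "X^3": lambda w, x, y: x ** 3,
    "W^2 + W * X - Y": lambda w, x, y: w ** 2 + w * x - y,
}

def make_dataset(target, n_rows=100, noise=0.01, seed=0):
    """Sample inputs and perturb each exact output by up to +/- noise (relative).

    Inputs are drawn from [1, 10] so that X / Y never divides by zero.
    All numeric settings here are assumptions, not values from the slides.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        w, x, y = (rng.uniform(1.0, 10.0) for _ in range(3))
        z = target(w, x, y) * (1.0 + rng.uniform(-noise, noise))
        rows.append({"W": w, "X": x, "Y": y, "Z": z})
    return rows

datasets = {name: make_dataset(fn) for name, fn in TARGETS.items()}
```

Each entry of datasets is a list of rows that a GP could be trained on; because the outputs are never exact, no equation reaches a perfect fit in the first few generations.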

8 Proof of Concept Experiments - 2
For the 1000 Chromosomes:
- Divide into 5 groups of 200 (by fitness)
- Focus on the best, middle, and worst groups
- See where each group's offspring occur in the next generation
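The grouping and tracking procedure above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each child is attributed to a single parent group (for example, the group of its fitter parent), and the names group_of_rank and offspring_placement are hypothetical.

```python
from collections import Counter

def group_of_rank(rank, population_size=1000, n_groups=5):
    """Map a 0-based fitness rank (0 = fittest) to a group index (0 = best group)."""
    return rank * n_groups // population_size

def offspring_placement(offspring, n_groups=5):
    """Count where each parent group's offspring land in the next generation.

    offspring: list of (parent_group, fitness) pairs, one per child.
    Returns {parent_group: Counter({child_group: count})}.
    """
    ranked = sorted(offspring, key=lambda pair: pair[1], reverse=True)
    placement = {}
    for rank, (parent_group, _fitness) in enumerate(ranked):
        child_group = group_of_rank(rank, len(ranked), n_groups)
        placement.setdefault(parent_group, Counter())[child_group] += 1
    return placement

# Toy next generation of 10 children from the best (0), middle (2),
# and worst (4) parent groups; the fitness values are made up.
toy_offspring = [
    (0, 90), (0, 85), (0, 80), (0, 40),
    (2, 70), (2, 60), (2, 30),
    (4, 50), (4, 20), (4, 10),
]
toy_placement = offspring_placement(toy_offspring)
```

In this toy run, three of the four children from the best parent group land in the top two groups of the next generation, while two of the three children from the worst group land in the bottom group. Plotting these counts per parent group, generation by generation, gives the kind of Best / Middle / Worst view the experiments examine.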

9 Results for Z = W + X + Y (figure panels: Best, Middle, Worst)

10 Results for Z = 2 * X + Y - W (figure panels: Best, Middle, Worst)

11 Results for Z = X / Y (figure panels: Best, Middle, Worst)

12 Results for Z = X^3 (figure panels: Best, Middle, Worst)

13 Results for Z = W^2 + W * X - Y (figure panels: Best, Middle, Worst)

14 Applied Experiments
The best class produces the best offspring. Now what?
Compare 2 Genetic Programs (GPs):
1) Use a vanilla-based GP
2) Use a GP that breeds only the top 20% of a population and replicates it 5 times
Genetic Program:
- 1000 Chromosomes (Equations)
- 50 Generations
- 20 Trials
Equations to model:
- Z = Sin(W) + Sin(X) + Sin(Y)
- Z = log10(W^X) + (Y * Z)
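One possible reading of the second GP is sketched below: keep the top 20% of the population by fitness and copy each survivor 5 times, so that the breeding pool is again the size of the full population. The slides do not spell out the exact mechanics, so this interpretation, the function name lineage_breeding_pool, and its parameters are assumptions.

```python
def lineage_breeding_pool(population, fitness, top_percent=20, copies=5):
    """Return a breeding pool made only of the fittest individuals.

    population:  list of individuals (any objects)
    fitness:     function mapping an individual to a number (higher is better)
    top_percent: share of the population allowed to breed
    copies:      how many times each surviving individual is replicated
    """
    ranked = sorted(population, key=fitness, reverse=True)
    elite = ranked[: len(ranked) * top_percent // 100]
    return [individual for individual in elite for _ in range(copies)]

# Stand-in population: the integers 0..999, each acting as its own fitness.
population = list(range(1000))
pool = lineage_breeding_pool(population, fitness=lambda value: value)
```

With 1000 individuals the pool contains 200 distinct survivors, each present 5 times, for 1000 entries in total. A vanilla GP would instead draw parents from the whole ranked population.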

15 Results for Z = Sin(W) + Sin(X) + Sin(Y)
                                          Vanilla-Based GP    Lineage-Based GP
    Average Fitness                       591.8               740.9
    Average r^2                           0.8734              0.9315
    Ave. Generations needed to complete   29.1                28.5

16 Results for Z = log10(W^X) + (Y * Z)
                                          Vanilla-Based GP    Lineage-Based GP
    Average Fitness                       210.9               346.5
    Average r^2                           0.7244              0.8069
    Ave. Generations needed to complete   50.0                48.6

17 Conclusions
- Proof of concept experiments demonstrate the viability of considering lineage in GPs
- Applied experiments show that lineage-based GP modeling produces better results faster

