The Assessment and Application of Lineage Information in Genetic Programs for Producing Better Models
  Gary D. Boetticher, Univ. of Houston - Clear Lake, Houston, TX, USA
  Kim Kaminsky, Univ. of Houston - Clear Lake, Houston, TX, USA
  IEEE International Conference on Information Reuse and Integration
About the Author: Gary D. Boetticher
  Ph.D. in Machine Learning and Software Engineering
  A neural network-based software reuse economic model
  Executive member of IEEE Reuse Standard Committees (1990s)
  Commercial consultant: U.S. Olympic Committee, LDDS Worldcom, Mellon Mortgage, …
  Currently: Associate Professor, Department of Comp. Science/Software Engineering, University of Houston - Clear Lake, Houston, TX, USA
  Research interests: Data mining, ML, Computational Bioinformatics, and Software metrics
Motivating Questions
  Does chromosome lineage information within a Genetic Program (GP) provide any insight into the effectiveness of solving problems?
  If so, how could these insights be utilized to make better breeding decisions?
Genetic Program Overview
  Data: X, Y, and Z; RESULT?
    X   Y   Z   RESULT
    :   :   :   :
  1) Create a population of equations
    Eq#    Equation
    1      X+Y
    2      (Z-X)*Y+X
    :      :
    1000   (X*X)-Z
  2) Determine the fitness for each (1 / Stand. Error)
    :  57
  3) Breed equations
    Parents:   X + Y   and   (Z-X) * Y+X
    Offspring: (Z-X) + Y   and   X * Y+X
  4) Generate new populations and breed until a solution is found
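The fitness and breeding steps above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the nested-tuple tree representation, the helper names (`evaluate`, `fitness`, `crossover`), and the reading of "Stand. Error" as the root-mean-square residual are all assumptions made for the example. The crossover points are chosen by hand so that the result reproduces the breeding example on the slide.

```python
import math
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(tree, env):
    """Evaluate a nested-tuple expression tree against variable bindings."""
    if isinstance(tree, str):          # leaf: a variable name such as "X"
        return env[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, env), evaluate(right, env))

def fitness(tree, rows, eps=1e-9):
    """Fitness = 1 / standard error (assumed here to be the RMS residual)."""
    squared = [(evaluate(tree, row) - row["RESULT"]) ** 2 for row in rows]
    return 1.0 / (math.sqrt(sum(squared) / len(rows)) + eps)

def get_subtree(tree, path):
    """Follow a sequence of child indices down the tree."""
    for index in path:
        tree = tree[index]
    return tree

def replace_subtree(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace_subtree(tree[path[0]], path[1:], new)
    return tuple(parts)

def crossover(a, path_a, b, path_b):
    """Swap the subtree of a at path_a with the subtree of b at path_b."""
    sub_a, sub_b = get_subtree(a, path_a), get_subtree(b, path_b)
    return replace_subtree(a, path_a, sub_b), replace_subtree(b, path_b, sub_a)

parent1 = ("+", "X", "Y")                            # X + Y
parent2 = ("+", ("*", ("-", "Z", "X"), "Y"), "X")    # (Z-X) * Y + X

# Swap the leaf X in parent1 with the subtree (Z-X) in parent2.
child1, child2 = crossover(parent1, (1,), parent2, (1, 1))
# child1 is (Z-X) + Y and child2 is X * Y + X, matching the slide.
```

In a full GP the crossover points would be picked at random and the loop would repeat over many generations; the fixed paths here only make the slide's example easy to check.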
Genetic Program Overview
  Generation N
    Equation              Fitness
    (X+Y)                 87
    (X - Z) * (Y * Y)     86
    ZYZY                  75
    :                     :
    Y                     22
    Y - X                 18
  Generation N+1
    Equation              Fitness
    (X - Z)
    (X + Y) * (Y * Y)
    Z + Y
    :
    X
    Y + Y
  Why discard legacy information?
Goal: Examine fitness patterns over time
  Generation 1, Generation 2, Generation 3: Localized? Volatile?
  Ranked population (the same listing appears under each generation):
    Equation                        Fitness
    (X+Y)                           87
    (X - Z) * (Y * Y)               86
    ZY                              85
    (X - Z) * (Y * Y)               84
    Y                               79
    Y - X                           75
    Z + Y                           75
    (X - Z) * (Y * Y)               75
    Y                               73
    Y - X                           71
    (X - Z) * (Y * Y) + W + W       68
    Y - X                           67
    ZY                              66
    (X - Z) * (Y * Y)               66
    Y                               65
    Y - X                           65
    (X - Z) * (Y * Y) + W + W       64
    Y - X                           64
    Z - Y                           62
    (X - Z) * (Y * Y)               59
    Y                               58
    Y - X                           55
    (X - Z) * (Y * Y) + W + W       44
Proof of Concept Experiments
  Experiments using synthetic equations:
    Z = W + X + Y
    Z = 2 * X + Y - W
    Z = X / Y
    Z = X^3
    Z = W^2 + W * X - Y
  Data slightly perturbed to prevent premature convergence
  Genetic Program
    1000 Chromosomes (Equations)
    50 Generations
    Breeding based on fitness rank
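One way to build such perturbed synthetic data is sketched below. The slide does not give the sample size, the input ranges, or the size of the perturbation, so `n_rows`, the range [1, 10], and the 1% relative noise are placeholder assumptions. The sketch also assumes the flattened exponents on the slide read X^3 and W^2 + W * X - Y.

```python
import random

# The five target equations, as functions of W, X, and Y.
TARGETS = {
    "Z = W + X + Y": lambda w, x, y: w + x + y,
    "Z = 2 * X + Y - W": lambda w, x, y: 2 * x + y - w,
    "Z = X / Y": lambda w, x, y: x / y,
    "Z = X^3": lambda w, x, y: x ** 3,
    "Z = W^2 + W * X - Y": lambda w, x, y: w ** 2 + w * x - y,
}

def make_dataset(target, n_rows=100, noise=0.01, seed=0):
    """Sample W, X, Y uniformly from [1, 10] and perturb Z by up to +/- noise (relative)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        w, x, y = (rng.uniform(1.0, 10.0) for _ in range(3))
        z = target(w, x, y) * (1.0 + rng.uniform(-noise, noise))
        rows.append((w, x, y, z))
    return rows

datasets = {name: make_dataset(fn) for name, fn in TARGETS.items()}
```

Keeping Y at or above 1 avoids division by zero in Z = X / Y; a real experiment might use a different range.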
Proof of Concept Experiments - 2
  For the 1000 Chromosomes:
    Divide into 5 groups of 200 (by fitness)
    Focus on the best, middle, and worst groups
    See where each group’s offspring occur in the next generation
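The grouping and offspring tracking described above might be sketched as follows. The function names and data layout are hypothetical. For simplicity the sketch also records a single parent per offspring, whereas crossover normally involves two parents, so a real lineage record would need to attribute each child to both.

```python
from collections import Counter

def quintile_of(rank, population_size, n_groups=5):
    """Map a 0-based fitness rank (0 = fittest) to a group index (0 = best group)."""
    return rank * n_groups // population_size

def offspring_placement(parent_fitness, offspring):
    """Count, per parent group, which group each offspring lands in.

    parent_fitness: dict of parent id -> fitness in generation N
    offspring: list of (parent id, offspring fitness) for generation N+1
    Returns a Counter keyed by (parent group, offspring group).
    """
    parent_ranked = sorted(parent_fitness, key=parent_fitness.get, reverse=True)
    parent_group = {
        pid: quintile_of(rank, len(parent_ranked))
        for rank, pid in enumerate(parent_ranked)
    }
    # Rank the offspring by their own fitness in the next generation.
    order = sorted(range(len(offspring)), key=lambda i: offspring[i][1], reverse=True)
    placement = Counter()
    for rank, i in enumerate(order):
        parent_id = offspring[i][0]
        child_group = quintile_of(rank, len(offspring))
        placement[(parent_group[parent_id], child_group)] += 1
    return placement

# Toy example: 10 parents, each with one offspring that inherits its fitness.
parents = {i: 10 - i for i in range(10)}
children = [(i, 10 - i) for i in range(10)]
placement = offspring_placement(parents, children)
```

With 1000 chromosomes, `quintile_of` yields groups of 200, and groups 0, 2, and 4 correspond to the best, middle, and worst groups.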
Results for Z = W + X + Y
  [Figure panels: Best, Middle, Worst]
Results for Z = 2 * X + Y - W
  [Figure panels: Best, Middle, Worst]
Results for Z = X / Y
  [Figure panels: Best, Middle, Worst]
Results for Z = X^3
  [Figure panels: Best, Middle, Worst]
Results for Z = W^2 + W * X - Y
  [Figure panels: Best, Middle, Worst]
Applied Experiments
  Best class produces best offspring. Now what?
  Compare 2 Genetic Programs (GPs)
    1) Use a vanilla-based GP
    2) Use a GP that breeds only the top 20% of a population and replicates 5 times
  Genetic Program
    1000 Chromosomes (Equations)
    50 Generations
    20 Trials
  Equations to model
    Z = Sin(W) + Sin(X) + Sin(Y)
    Z = log10(W^X) + (Y * Z)
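The second GP's selection step could look roughly like the sketch below. The slide does not spell out what "replicates 5 times" means; this example assumes each chromosome in the top 20% contributes five copies to the breeding pool, which keeps the pool the same size as the population. The function name and parameters are hypothetical.

```python
def lineage_breeding_pool(population, fitness, top_fraction=0.2, copies=5):
    """Keep the fittest top_fraction of the population and repeat each survivor `copies` times."""
    ranked = sorted(population, key=fitness, reverse=True)
    keep = max(1, round(len(ranked) * top_fraction))
    return [chromosome for chromosome in ranked[:keep] for _ in range(copies)]

# Toy example: chromosomes are integers and fitness is the value itself.
population = list(range(10))
pool = lineage_breeding_pool(population, fitness=lambda c: c)
```

With 1000 chromosomes this gives 200 survivors and a pool of 1000, from which breeding would then proceed as in the vanilla GP.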
Results for Z = Sin(W) + Sin(X) + Sin(Y)
  [Table: Vanilla-Based GP vs. Lineage-Based GP; rows: Average Fitness, Average r, Ave. Generations needed to complete]
Results for Z = log10(W^X) + (Y * Z)
  [Table: Vanilla-Based GP vs. Lineage-Based GP; rows: Average Fitness, Average r, Ave. Generations needed to complete]
Conclusions
  Proof of concept experiments demonstrate the viability of considering lineage in GPs
  Applied experiments show that lineage-based GP modeling produces better results faster