Building a Genetically Engineerable Evolvable Program (GEEP) Using Breadth-Based Explicit Knowledge for Predicting Software Defects
Kim Kaminsky, Gary D. Boetticher
Department of Computer Science, University of Houston - Clear Lake, Houston, Texas, USA
Key Concept: Duality in Research
A Genetically Engineerable Evolvable Program (GEEP): Genetic Program Process - 1
Fitness Value = Model performance on data.
[Figure: Data applied to 2 (of many) Chromosomes, drawn as expression trees with nodes "+ A B" and "* - 3 D"; fitness values shown: 888 out of 1000 and 913 out of 1000]
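A minimal sketch of how a chromosome can be scored against data, assuming chromosomes are stored as nested prefix tuples. The names `evaluate` and `hits`, the tolerance-based hit count, the sample rows, and the second tree `A * (3 - D)` are illustrative assumptions, not taken from the slides.

```python
import operator

# Binary operators allowed at internal nodes (an assumed, minimal function set).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(tree, row):
    """Evaluate a prefix expression tree against one data row (a dict)."""
    if isinstance(tree, tuple):          # internal node: (op, left, right)
        op, left, right = tree
        return OPS[op](evaluate(left, row), evaluate(right, row))
    if isinstance(tree, str):            # variable: look it up in the row
        return row[tree]
    return tree                          # numeric constant

def hits(tree, rows, targets, tol=0.5):
    """Count rows whose prediction falls within tol of the target.

    This hit count is an assumed stand-in for "model performance on data".
    """
    return sum(
        1 for row, target in zip(rows, targets)
        if abs(evaluate(tree, row) - target) <= tol
    )

# Two hypothetical chromosomes: A + B and A * (3 - D).
chrom_1 = ("+", "A", "B")
chrom_2 = ("*", "A", ("-", 3, "D"))

# Hypothetical data, for illustration only.
rows = [{"A": 1, "B": 2, "D": 1}, {"A": 2, "B": 0, "D": 2}]
targets = [3, 2]

score_1 = hits(chrom_1, rows, targets)
score_2 = hits(chrom_2, rows, targets)
```

Under this reading, a score such as "888 out of 1000" would be the hit count over 1000 rows; the slides do not say how a hit is defined.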
A Genetically Engineerable Evolvable Program (GEEP): Genetic Program Process - 2
[Figure: Mutation and Crossover applied to 2 Chromosomes; expression trees shown, in prefix order: "+ B - 3 D", "* A + A B", "+ B - D 3.1", "* A - 3 D"]
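Subtree crossover and mutation can be sketched as follows, assuming chromosomes are nested prefix tuples and that the operators act on whole subtrees. The helper names and the fixed paths are hypothetical; a real genetic program would pick the crossover and mutation points at random. The example trees are one reading of the slide's figure.

```python
def get_subtree(tree, path):
    """Follow a path of child indices (1 = left, 2 = right) into the tree."""
    for index in path:
        tree = tree[index]
    return tree

def replace_subtree(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    children = list(tree)
    children[path[0]] = replace_subtree(tree[path[0]], path[1:], new)
    return tuple(children)

def crossover(parent_a, parent_b, path_a, path_b):
    """Swap the subtrees found at path_a in parent_a and path_b in parent_b."""
    sub_a = get_subtree(parent_a, path_a)
    sub_b = get_subtree(parent_b, path_b)
    return (replace_subtree(parent_a, path_a, sub_b),
            replace_subtree(parent_b, path_b, sub_a))

def mutate(tree, path, new_subtree):
    """Replace the subtree at path with a new subtree."""
    return replace_subtree(tree, path, new_subtree)

# Parents: B + (3 - D) and A * (A + B).
parent_a = ("+", "B", ("-", 3, "D"))
parent_b = ("*", "A", ("+", "A", "B"))

# Swap the right-hand subtrees of both parents.
child_a, child_b = crossover(parent_a, parent_b, (2,), (2,))

# Mutate the right-hand subtree of parent_a into (D - 3.1).
mutant = mutate(parent_a, (2,), ("-", "D", 3.1))
```

Because tuples are immutable, each operator returns new trees and leaves the parents unchanged, which keeps the previous generation available for selection.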
A Genetically Engineerable Evolvable Program (GEEP)
GEEP uses Domain-Specific Equations
Examples: Pythagorean Theorem, Cosine
Breadth-Based Explicit Knowledge
Akiyama: D = 4.86 + 0.018 * L
Akiyama: D = 0.12 * C - 0.84
Compton: D = 0.069 + 0.00156 * L + 0.00000047 * L^2
Gaffney: D = 4.2 + 0.0015 * L^(4/3)
Halstead: D = Volume / 3,000
Lipow: D = L * (0.000844 + 0.0007842 * ln(L) + 0.00001546 * (ln(L))^2)
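The defect equations above can be written as plain functions that a genetic program could call as extra nodes in its function set. This is a sketch under stated assumptions: the function names are illustrative, L is taken to be lines of code, C a complexity count, and the Lipow log terms are assumed to be ln(L) and (ln(L))^2.

```python
import math

def akiyama_loc(loc):
    """Akiyama, size form: D = 4.86 + 0.018 * L."""
    return 4.86 + 0.018 * loc

def akiyama_complexity(c):
    """Akiyama, complexity form: D = 0.12 * C - 0.84."""
    return 0.12 * c - 0.84

def compton(loc):
    """Compton: D = 0.069 + 0.00156 * L + 0.00000047 * L^2."""
    return 0.069 + 0.00156 * loc + 0.00000047 * loc ** 2

def gaffney(loc):
    """Gaffney: D = 4.2 + 0.0015 * L^(4/3)."""
    return 4.2 + 0.0015 * loc ** (4 / 3)

def halstead(volume):
    """Halstead: D = Volume / 3,000."""
    return volume / 3000

def lipow(loc):
    """Lipow, assuming the log terms are ln(L) and (ln(L))^2; requires L > 0."""
    log_l = math.log(loc)
    return loc * (0.000844 + 0.0007842 * log_l + 0.00001546 * log_l ** 2)

# Predicted defect counts for a hypothetical 1000-line module.
predictions = {
    "akiyama_loc": akiyama_loc(1000),
    "compton": compton(1000),
    "gaffney": gaffney(1000),
}
```

Each function maps a product metric to a defect estimate, so a node of this kind carries a published model into the search as a single reusable building block.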
The GEEP Experiments
[Figure: experiment layout with labels Akiyama, Compton, Gaffney, Halstead, Lipow, All, No External; All Models; No External Knowledge]
NASA Data Description
NASA Defect Data (KC2 Dataset)
379 unique tuples
Input: Product Metrics (Size, Complexity, Vocabulary)
Output: Defect Count
Experiment Configuration
30 Trials
128 Generations
Max. 512 Chromosomes
Initial Maximum Tree Height = 6
Fitness = 1 - Standard error
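The fitness measure "1 - Standard error" could be computed as in the sketch below. The slides do not define the standard error, so this assumes the root of the mean squared residual between actual and predicted defect counts; the function names and sample values are hypothetical.

```python
import math

def standard_error(actual, predicted):
    """Root of the mean squared residual (an assumed definition)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal length")
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(sse / len(actual))

def fitness(actual, predicted):
    """Fitness = 1 - standard error; a perfect model scores 1.0."""
    return 1.0 - standard_error(actual, predicted)

# Hypothetical defect counts, for illustration only.
perfect = fitness([1, 2, 3], [1, 2, 3])
off_by_one = fitness([1, 2, 3], [1, 2, 4])
```

Under this definition fitness goes negative once the standard error exceeds 1, so a selection scheme would need to rank by fitness rather than assume a value between 0 and 1.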
Experimental Results
Labels: Akiyama, Compton, Gaffney, Halstead, Lipow, All, No External
All Models: 0.0169 0.0009 0.0012 0.0003 0.0823 0.1626
No External Knowledge: 0.0006 8.023E-06 0.0246 1.733E-06
Discussion
Conclusions
Feasibility of leveraging explicit knowledge
Data Mining
Algorithm Mining
Future Directions
Apply to other NASA Defect Datasets
Equation Reduction
((((Lipow((Compton((9 + 8) ^ ( iv(g) - 9)) + d )) + (AkiyamaLoc((Compton(Gaffney(Lipow(9) ^ lOBlank )) ^ Lipow(b) ^ Lipow(((6 * AkiyamaLoc ((Gaffney((6 * AkiyamaLoc((Lipow(9)) + Lipow((Compton(AkiyamaComp((( Halstead((Compton(( Halstead(Lipow(9) + AkiyamaLoc( AkiyamaLoc(l)) - Halstead(AkiyamaComp( iv(g) ) * Halstead(Compton( AkiyamaComp( AkiyamaLoc(Compton(3)) ^ AkiyamaLoc(Halstead( AkiyamaComp( iv(g) ) ^ Gaffney((lOComment ^ (Gaffney(Lipow(2)) ^ (Compton(6) ^ AkiyamaLoc((6 - 10)) + Compton(Gaffney(9) * Compton(Lipow(iv(g))) ^ Lipow(((6 * AkiyamaLoc((Gaffney(Lipow(((6 * AkiyamaLoc((Gaffney( Lipow((Compton(AkiyamaComp(AkiyamaLoc(Compton(AkiyamaLoc(l) ^ Gaffney(((Halstead((AkiyamaLoc(AkiyamaLoc((3 ^ lOBlank)) - 7)) * Halstead(AkiyamaComp( iv(g))^ (Gaffney(Lipow(2)) ^ (Compton(6) ^ AkiyamaLoc((6 - 10)) + Compton(3) ^ Gaffney(10))) + Compton(l) * Compton(l )))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
Performance Improvement