1
How to Predict More with Less: Defect Prediction Using Machine Learners in an Implicitly Data Starved Domain
Kim Kaminsky, Gary D. Boetticher
Department of Computer Science, University of Houston - Clear Lake, Houston, Texas, USA
2
Preamble
The maturing of Software Engineering as a discipline requires a better understanding of the complexity of the software process. Empirically based modeling is one mechanism for improving that understanding, and thus the management, of the software process.
3
Data Starvation Issues in Software Engineering
- Heavily context dependent: Measure A from Project X, Measure B from Project Y
- Unreliable data due to poor processes
- Organizations do not share data
- Projects are large, so project estimation data occurs infrequently
4
Implicitly Data Starved Domains
Lots of this: number of modules. Little of that: defect counts.
5
Equalized Learning: Balance Data by Replicating Sparse Instances [Mizuno99]
Example: a raw dataset holds 300 instances with 0 defects, 20 instances with 5 defects, and 10 instances with 9 defects (three different kinds of instances, shown in three colors on the slide). Equalization replicates the two sparse classes until every defect count has 300 instances.
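The replication step can be sketched in a few lines of Python. This is a minimal illustration of the balancing idea credited to [Mizuno99] above, not the authors' implementation; the names equalize and get_defect_count are placeholders.

```python
def equalize(instances, get_defect_count):
    """Replicate sparse instances so that every defect-count class ends up
    as large as the biggest class (sketch; names are illustrative)."""
    by_class = {}
    for inst in instances:
        by_class.setdefault(get_defect_count(inst), []).append(inst)
    target = max(len(group) for group in by_class.values())
    equalized = []
    for group in by_class.values():
        copies, remainder = divmod(target, len(group))
        equalized.extend(group * copies + group[:remainder])
    return equalized
```

With the class sizes on the slide (300, 20, and 10 instances), each class grows to 300 instances after equalization.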
6
Genetic Programming Process - 1
Fitness value = model performance on the data. The slide shows two (of many) chromosomes, each an arithmetic expression tree over the input metrics (e.g., A + B and a tree built from 3 - D), scoring 888 out of 1000 and 913 out of 1000 against the data.
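To make the fitness idea concrete, the sketch below evaluates an expression-tree chromosome and counts how many instances it predicts acceptably. The tuple-based tree representation, the operator set, and the tolerance-based scoring are assumptions for illustration; the slide only states that fitness is model performance on the data.

```python
# A chromosome is either a terminal (a metric name or a numeric constant)
# or a tuple (operator, left_subtree, right_subtree), e.g. ('+', 'A', 'B').
OPS = {'+': lambda a, b: a + b,
       '-': lambda a, b: a - b,
       '*': lambda a, b: a * b}

def evaluate(tree, metrics):
    """Evaluate an expression tree on one module's metric values."""
    if isinstance(tree, tuple):              # internal node: apply operator
        op, left, right = tree
        return OPS[op](evaluate(left, metrics), evaluate(right, metrics))
    if isinstance(tree, str):                # terminal: input metric, e.g. 'A'
        return metrics[tree]
    return tree                              # terminal: numeric constant

def fitness(tree, data, tolerance=0.5):
    """Count the (metrics, defect_count) pairs predicted within tolerance
    (a hypothetical criterion; the slide reports only scores such as
    888 out of 1000 and 913 out of 1000)."""
    return sum(1 for metrics, defect_count in data
               if abs(evaluate(tree, metrics) - defect_count) <= tolerance)
```

For example, fitness(('+', 'A', 'B'), data) counts how many instances in data the chromosome A + B predicts within the tolerance.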
7
Genetic Programming Process - 2
Two parent chromosomes are recombined by crossover (exchanging subtrees between their expression trees) and altered by mutation (a small change to a single node, e.g., the constant 3 becoming 3.1).
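The sketch below shows standard subtree crossover and a constant-perturbing mutation over the same tuple-based trees as the previous sketch. The exact operators and rates used in the paper are not given on the slide; this is only an illustration of the two steps named there.

```python
import random

def subtrees(tree, path=()):
    """Yield (path, subtree) for every node in the expression tree."""
    yield path, tree
    if isinstance(tree, tuple):
        yield from subtrees(tree[1], path + (1,))
        yield from subtrees(tree[2], path + (2,))

def replace(tree, path, new_subtree):
    """Return a copy of tree with the node at `path` swapped out."""
    if not path:
        return new_subtree
    op, left, right = tree
    if path[0] == 1:
        return (op, replace(left, path[1:], new_subtree), right)
    return (op, left, replace(right, path[1:], new_subtree))

def crossover(parent_a, parent_b):
    """Exchange randomly chosen subtrees between two parent chromosomes."""
    path_a, sub_a = random.choice(list(subtrees(parent_a)))
    path_b, sub_b = random.choice(list(subtrees(parent_b)))
    return replace(parent_a, path_a, sub_b), replace(parent_b, path_b, sub_a)

def mutate(tree, jitter=0.1):
    """Nudge one numeric constant, e.g. 3 -> 3.1 as shown on the slide."""
    constants = [(p, s) for p, s in subtrees(tree)
                 if isinstance(s, (int, float))]
    if not constants:
        return tree
    path, value = random.choice(constants)
    return replace(tree, path, value + jitter)
```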
8
NASA KC2 Defect Dataset: 379 unique tuples; equalization produces 3013 samples.
Input: product metrics (size, complexity, vocabulary). Output: defect count.
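One way to represent a single KC2 tuple in code is shown below; the field names are placeholders grouped by the metric categories named on the slide, not the dataset's actual metric names.

```python
from typing import NamedTuple

class KC2Tuple(NamedTuple):
    """One module from the NASA KC2 dataset (illustrative field names)."""
    size: float         # product metric: size
    complexity: float   # product metric: complexity
    vocabulary: float   # product metric: vocabulary
    defect_count: int   # output the learner predicts
```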
9
Original versus Equalized Data Experiment Configuration
- 2000 characters
- 1000 chromosomes
- 50 generations maximum
- 20 trials
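A minimal way to hold this configuration in code, assuming the 2000-character figure bounds chromosome length and the 1000 chromosomes form the population (both readings are inferred from the slide, not stated on it):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GPConfig:
    """GP experiment settings from the slide (field names are illustrative)."""
    max_chromosome_chars: int = 2000  # assumed: per-chromosome length limit
    population_size: int = 1000       # assumed: chromosomes per generation
    max_generations: int = 50
    trials: int = 20
```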
10
Original versus Equalized Data t-test Results
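The results table itself is not reproduced in this transcript. As an illustration of how such a comparison is commonly computed, the sketch below runs an independent two-sample t-test (via SciPy) on the per-trial scores; the paper's actual trial scores, test variant, and significance level are not given here.

```python
from scipy import stats

def compare_trials(original_scores, equalized_scores, alpha=0.05):
    """Test whether GP trials on the equalized data differ significantly in
    mean performance from trials on the original data (illustrative sketch)."""
    t_stat, p_value = stats.ttest_ind(original_scores, equalized_scores)
    return t_stat, p_value, p_value < alpha
```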
11
Conclusions
- Equalized learning spawns large datasets
- Equalized learning produces better models
12
Future Directions
- Apply to other NASA datasets
- Improve performance: distributed GP