How to Predict More with Less: Defect Prediction Using Machine Learners in an Implicitly Data Starved Domain Kim Kaminsky Gary D. Boetticher Department.


1 How to Predict More with Less: Defect Prediction Using Machine Learners in an Implicitly Data Starved Domain
Kim Kaminsky, Gary D. Boetticher
Department of Computer Science, University of Houston - Clear Lake
Houston, Texas, USA

2 Preamble
The maturing of Software Engineering as a discipline requires a better understanding of the complexity of the software process. Empirically based modeling is one mechanism for improving the understanding, and thus the management, of the software process.

3 Data Starvation Issues in Software Engineering
Heavily context dependent: Measure A from Project X ≠ Measure B from Project Y
Unreliable data due to poor processes
Organizations do not share data
Projects are large, so project estimation data occurs infrequently

4 Implicitly Data Starved Domains
Lots of this: number of modules
Little of that: defect counts

5 Equalized Learning
Balance data by replicating sparse instances [Mizuno99]
Before: 300 instances of 0 defects, 20 instances of 5 defects, 10 instances of 9 defects
After: 300 instances each of 0 defects, 5 defects, and 9 defects
(Figure legend: 3 colors = 3 different instances)
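A minimal sketch of balancing by replication, using the instance counts on this slide. The slide does not spell out the exact replication scheme of [Mizuno99], so the round-robin copying below, the `equalize` function, and the row layout are assumptions made for illustration.

```python
from collections import Counter


def equalize(rows, label_of):
    """Replicate rows of sparse classes until every class matches the largest one.

    Hypothetical helper: cycles through each class's rows so copies are
    spread evenly over the distinct instances in that class.
    """
    groups = {}
    for row in rows:
        groups.setdefault(label_of(row), []).append(row)
    target = max(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group[i % len(group)] for i in range(target))
    return balanced


# Counts from the slide: 300 / 20 / 10 instances with 0 / 5 / 9 defects.
# Each row is (made-up module name, defect count).
rows = (
    [("m%d" % i, 0) for i in range(300)]
    + [("n%d" % i, 5) for i in range(20)]
    + [("p%d" % i, 9) for i in range(10)]
)
balanced = equalize(rows, label_of=lambda row: row[1])
counts = Counter(row[1] for row in balanced)
print(counts)
```

Replicating to the size of the largest class is only one possible target; other balancing targets would change the final sample count.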

6 Genetic Programming Process - 1
Fitness value = model performance on data.
[Figure: data and 2 (of many) chromosomes shown as expressions (node labels: + A B; * - 3 D), with fitness values of 888 out of 1000 and 913 out of 1000]
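To make the fitness idea concrete, here is a small sketch that scores a chromosome by counting how many samples it predicts within a tolerance. The tuple encoding, the `evaluate` and `fitness` functions, the tolerance, the sample values, and the reading of the second chromosome as A * (3 - D) are all assumptions; the slide only states that fitness is model performance on the data.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(tree, env):
    """Evaluate an expression tree given variable values in env."""
    if isinstance(tree, tuple):          # (operator, left, right)
        op, left, right = tree
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(tree, str):            # variable name
        return env[tree]
    return tree                          # numeric constant


def fitness(tree, samples, tolerance=0.5):
    """Count samples whose prediction is within tolerance of the target."""
    return sum(
        1 for env, target in samples
        if abs(evaluate(tree, env) - target) <= tolerance
    )


chrom_a = ("+", "A", "B")                # A + B
chrom_b = ("*", "A", ("-", 3, "D"))      # A * (3 - D), an assumed reading

# Made-up samples: (input metrics, observed value).
samples = [
    ({"A": 1, "B": 2, "D": 1}, 3),
    ({"A": 2, "B": 0, "D": 2}, 2),
    ({"A": 0, "B": 5, "D": 0}, 5),
]
print(fitness(chrom_a, samples), "out of", len(samples))
print(fitness(chrom_b, samples), "out of", len(samples))
```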

7 Genetic Programming Process - 2
[Figure: mutation and crossover applied to 2 chromosomes; node labels as extracted: + B - 3 D * A + A B + B - D 3.1 * A - 3 D]
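The two operators named on this slide can be sketched on tuple-encoded expression trees as follows. This is an illustrative subtree crossover and leaf mutation, not necessarily the operators used in the presentation; the encoding, the helper names (`paths`, `get`, `put`, `size`), the terminal set, and the constant-nudging range are assumptions.

```python
import random


def paths(tree, prefix=()):
    """Yield the path to every node; a path is a tuple of child indices."""
    yield prefix
    if isinstance(tree, tuple):          # (operator, left, right)
        for i in (1, 2):
            yield from paths(tree[i], prefix + (i,))


def get(tree, path):
    """Return the subtree found by following path from the root."""
    for i in path:
        tree = tree[i]
    return tree


def put(tree, path, subtree):
    """Return a copy of tree with the node at path replaced by subtree."""
    if not path:
        return subtree
    node = list(tree)
    node[path[0]] = put(tree[path[0]], path[1:], subtree)
    return tuple(node)


def size(tree):
    """Number of nodes in the tree."""
    return sum(1 for _ in paths(tree))


def crossover(a, b, rng):
    """Swap one randomly chosen subtree of a with one of b."""
    pa = rng.choice(list(paths(a)))
    pb = rng.choice(list(paths(b)))
    return put(a, pa, get(b, pb)), put(b, pb, get(a, pa))


def mutate(tree, rng, terminals=("A", "B", "D", 3)):
    """Change one leaf: nudge a constant or swap in another terminal."""
    leaves = [p for p in paths(tree) if not isinstance(get(tree, p), tuple)]
    p = rng.choice(leaves)
    old = get(tree, p)
    if isinstance(old, (int, float)):
        new = old + rng.uniform(-0.5, 0.5)
    else:
        new = rng.choice([t for t in terminals if t != old])
    return put(tree, p, new)


rng = random.Random(0)                   # fixed seed for repeatability
parent_a = ("+", "A", "B")
parent_b = ("*", "A", ("-", 3, "D"))
child_a, child_b = crossover(parent_a, parent_b, rng)
mutant = mutate(parent_b, rng)
print(child_a, child_b, mutant)
```

Trees are immutable tuples here, so both operators return new chromosomes and leave the parents untouched.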

8 NASA KC2 Defect Dataset
379 unique tuples
Equalized produces 3013 samples
Input: product metrics (size, complexity, vocabulary)
Output: defect count

9 Original versus Equalized Data Experiment Configuration
2000 characters
1000 chromosomes
50 generations max.
20 trials

10 Original versus Equalized Data t-test Results
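The results table for this slide did not survive in the transcript. As a sketch of how such a comparison could be computed, the code below calculates a Welch two-sample t statistic over per-trial fitness scores. The slide does not say which t-test variant was used, and the scores are placeholders invented for illustration, not the presentation's results; `welch_t` is a hypothetical helper.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(xs, ys):
    """Return (t statistic, approximate degrees of freedom) for two samples."""
    vx = variance(xs) / len(xs)
    vy = variance(ys) / len(ys)
    t = (mean(xs) - mean(ys)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(xs) - 1) + vy ** 2 / (len(ys) - 1))
    return t, df


# Placeholder per-trial fitness scores (NOT from the presentation).
original_scores = [880, 885, 890, 895, 900]
equalized_scores = [905, 910, 915, 920, 925]

t, df = welch_t(original_scores, equalized_scores)
print("t = %.2f, df = %.1f" % (t, df))
```

A negative t here means the first sample's mean is lower than the second's; a p-value would additionally require the t distribution, which the standard library does not provide.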

11 Conclusions
Equalized learning spawns large datasets
Equalized learning produces better models

12 Future Directions
Apply to other NASA datasets
Improve performance: distributed GP

