Published by Hortense Walters. Modified over 9 years ago.
1
Introduction to Defect Prediction Cmpe 589 Spring 2008
2
Problem 1 How to tell if the project is on schedule and within budget? Earned-value charts.
3
Problem 2 How hard will it be for another organization to maintain this software? McCabe Complexity
4
Problem 3 How to tell when the subsystems are ready to be integrated? Defect density metrics.
5
Problem Definition
Software development lifecycle:
  Requirements
  Design
  Development
  Test (takes ~50% of overall time)
Detect and correct defects before delivering software.
Test strategies:
  Expert judgment
  Manual code reviews
  Oracles/predictors as secondary tools
6
Problem Definition
7
Testing
8
Defect Prediction
2-class classification problem:
  Non-defective if error = 0
  Defective if error > 0
2 things needed:
  Raw data: source code
  Software metrics -> static code attributes
10
Static Code Attributes

    void main()
    {
        //This is a sample code
        //Declare variables
        int a, b, c;
        // Initialize variables
        a=2;
        b=5;
        //Find the sum and display c if greater than zero
        c=sum(a,b);
        if c < 0
            printf("%d\n", a);
        return;
    }

    int sum(int a, int b)
    {
        // Returns the sum of two numbers
        return a+b;
    }

Annotations on main(): the condition should read c > 0, and printf should print c. These are the two errors counted in the table.

Module   LOC  LOCC  V  CC  Error
main()    16     4  5   2      2
sum()      5     1  3   1      0

LOC: Lines of code
LOCC: Lines of commented code
V: Number of unique operands and operators
CC: Cyclomatic complexity
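As a rough illustration of how the LOC and LOCC columns could be derived, the sketch below counts non-blank lines and comment lines in a C-like snippet. The function name `count_loc` and its counting rules (blank lines ignored, a line is a comment if it starts with `//`) are assumptions made for this example; the slides do not state the exact rules used, so other modules may not reproduce the table.

```python
def count_loc(source: str) -> dict:
    """Count lines and comment lines in a C-like snippet (hypothetical rules)."""
    lines = [ln.strip() for ln in source.splitlines()]
    lines = [ln for ln in lines if ln]                      # ignore blank lines
    comment_lines = [ln for ln in lines if ln.startswith("//")]
    return {"LOC": len(lines), "LOCC": len(comment_lines)}


# The sum() module from the slide.
SUM_SOURCE = """
int sum(int a, int b)
{
    // Returns the sum of two numbers
    return a+b;
}
"""

metrics = count_loc(SUM_SOURCE)
print(metrics)  # {'LOC': 5, 'LOCC': 1}
```

Under these rules the counts for sum() agree with its row in the table (LOC 5, LOCC 1).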
14
Defect Prediction
Machine learning based models:
  Defect density estimation
  Regression models: error proneness
  First classification, then regression
  Defect prediction between versions
  Defect prediction for embedded systems
15
Constructing Predictors
Baseline: Naive Bayes.
Why? Best reported results so far (Menzies et al., 2007).
Remove assumptions and construct different models:
  Independent attributes -> multivariate distribution
  Attributes of equal importance
16
Weighted Naive Bayes
[Equations: Naive Bayes; Weighted Naive Bayes]
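The equations on this slide did not survive extraction. The sketch below shows one common way to weight Naive Bayes, assuming Gaussian likelihoods and a per-attribute weight applied as a multiplier on each log-likelihood term; it may differ from the formulation the slide used. The function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate class priors and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model


def log_gaussian(x, mean, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict_wnb(model, x, weights):
    """Return the class with the highest weighted log-posterior score."""
    scores = {}
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for xi, m, v, w in zip(x, means, variances, weights):
            score += w * log_gaussian(xi, m, v)  # w = 1 for all gives plain Naive Bayes
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up data: two attributes per module, label 1 = defective.
X = [[10, 1], [12, 2], [11, 1], [40, 5], [42, 6], [45, 5]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X, y)
print(predict_wnb(model, [41, 5], [1.0, 1.0]))
print(predict_wnb(model, [11, 1], [0.5, 1.0]))
```

Setting every weight to 1 recovers standard Naive Bayes, which is what makes the equal-importance assumption explicit: unequal weights let some attributes count for more than others.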
17
Datasets

Name  # Features  # Modules  Defect Rate (%)
CM1           38        505                9
PC1           38       1107                6
PC2           38       5589              0.6
PC3           38       1563               10
PC4           38       1458               12
KC3           38        458                9
KC4           38        125               40
MW1           38        403                9
18
Performance Measures

Confusion matrix (rows: predicted, columns: actual defects):

               Actual no  Actual yes
Predicted no       A          B
Predicted yes      C          D

Accuracy: (A+D)/(A+B+C+D)
Pd (hit rate): D/(B+D)
Pf (false alarm rate): C/(A+C)
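The measures above translate directly into code. In this sketch the arguments `a`, `b`, `c`, `d` are the confusion matrix cells A, B, C, D, and the counts in the usage example are made up. The `bal` value is not defined on this slide; it is included as an assumption, using the distance-to-ideal form from Menzies et al. (2007), because the results slides report a bal column.

```python
import math


def performance(a, b, c, d):
    """Compute the measures from cells A, B, C, D of the confusion matrix."""
    accuracy = (a + d) / (a + b + c + d)
    pd = d / (b + d)   # hit rate: detected defective / all defective
    pf = c / (a + c)   # false alarm rate: flagged non-defective / all non-defective
    # Assumption: balance as normalized distance from the ideal point pd=1, pf=0.
    bal = 1 - math.sqrt((0 - pf) ** 2 + (1 - pd) ** 2) / math.sqrt(2)
    return {"accuracy": accuracy, "pd": pd, "pf": pf, "bal": bal}


# Made-up counts for illustration only.
m = performance(a=80, b=5, c=20, d=15)
print(m)
```

The function does not guard against empty rows or columns, so a dataset with no defective (or no non-defective) modules would raise a division error.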
20
Results: InfoGain & GainRatio

       WNB+IG (%)     WNB+GR (%)     IG+NB (%)
Data   pd  pf  bal    pd  pf  bal    pd  pf  bal
CM1    82  39   70    82  39   70    83  32   74
PC1    69  35   67    69  35   67    40  12   57
PC2    72  15   77    66  20   72        15   77
PC3    80  35   71    81  35   72    60  15   70
PC4    88  27   79    87  24   81    92  29   78
KC3    80  27   76    83  30   76    48  15   62
KC4    77  35   70    78  35   71    79  33   72
MW1    70  38   66    68  34   67    44  07   60
Avg:   77  31   72    77  32   72    65  20   61
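For readers unfamiliar with the two scores named in the title, the sketch below computes information gain and gain ratio for a discrete attribute. It assumes attributes have already been discretized; how the slides turn these scores into Naive Bayes weights is not stated, so no normalization is shown. The function names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def info_gain(feature_values, labels):
    """Reduction in class entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for v, lab in zip(feature_values, labels):
        groups.setdefault(v, []).append(lab)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(feature_values, labels):
    """Information gain divided by the entropy of the feature itself."""
    split_info = entropy(feature_values)
    return info_gain(feature_values, labels) / split_info if split_info else 0.0


# Toy data: one attribute separates the classes perfectly, the other not at all.
labels = [0, 0, 1, 1]
perfect = ["lo", "lo", "hi", "hi"]
useless = ["lo", "hi", "lo", "hi"]
print(info_gain(perfect, labels), info_gain(useless, labels))
print(gain_ratio(perfect, labels))
```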
21
Results: Weight Assignments
22
Benefiting from defect data in practice
Within Company vs Cross Company Data
Investigated in cost estimation literature
No studies in defect prediction!
No conclusions in cost estimation…
Straightforward interpretation of results in defect prediction.
Possible reason: well-defined features.
23
How much data do we need?
Consider:
  Dataset size: 1000
  Defect rate: 8%
  Training instances: 90%
1000 * 8% * 90% = 72 defective training instances
900 - 72 = 828 non-defective training instances
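A quick check of this arithmetic, assuming the 90% training split preserves the 8% defect rate: the training set then holds 900 instances, of which 72 are defective and 828 are not. The variable names are illustrative.

```python
dataset_size = 1000
defect_rate = 0.08
train_fraction = 0.90

# round() absorbs floating-point noise in the products.
train_size = round(dataset_size * train_fraction)
train_defective = round(dataset_size * defect_rate * train_fraction)
train_non_defective = train_size - train_defective

print(train_size, train_defective, train_non_defective)  # 900 72 828
```

The imbalance is the point of the slide: even a 1000-module dataset gives the learner only a few dozen defective examples.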
24
Intelligent data sampling
With random sampling of 100 instances we can learn as well as with thousands.
Can we increase the performance with wiser sampling strategies?
Which data?
Practical aspects: industrial case study.
25
ICSOFT’07
WC (within-company) vs CC (cross-company) data?
When to use WC or CC?
How much data do we need to construct a model?
26
ICSOFT’07
28
Module Structure vs Defect Rate
Fan-in, fan-out
PageRank algorithm
Call graph information on the code
“small is beautiful”
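Fan-in and fan-out can be read straight off a call graph. The sketch below assumes the graph is given as a dictionary from caller to callees, which is a representation chosen for this example rather than one the slides specify; it counts distinct callers and callees per function and does not attempt the PageRank variant.

```python
from collections import defaultdict


def fan_in_out(call_graph):
    """Return {function: (fan_in, fan_out)} for a caller -> callees mapping."""
    fan_out = {f: len(set(callees)) for f, callees in call_graph.items()}
    fan_in = defaultdict(int)
    for caller, callees in call_graph.items():
        for callee in set(callees):   # count each caller once per callee
            fan_in[callee] += 1
    functions = set(call_graph) | set(fan_in)
    return {f: (fan_in.get(f, 0), fan_out.get(f, 0)) for f in sorted(functions)}


# Call graph of the earlier sample program: main() calls sum() and printf().
graph = {"main": ["sum", "printf"], "sum": []}
result = fan_in_out(graph)
print(result)
```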
29
Performance vs. Granularity