Software Quality Ranking: Bringing Order to Software Modules in Testing Fei Xing Michael R. Lyu Ping Guo
2 Outline Background Support Vector Machine Basic theory Ranking SVM Other types of SVM Our proposed framework Experiments Conclusions
3 Background Modern society is fast becoming dependent on software products and systems. Achieving high reliability is one of the most important challenges facing the software industry. Software quality models are urgently needed.
4 Background Software quality model A software quality model is a tool for focusing software enhancement efforts. Such a model yields timely predictions on a module-by-module basis, enabling one to target high-risk modules.
5 Background Software complexity metrics A quantitative description of program attributes. Closely related to the distribution of faults in program modules. Playing a critical role in predicting the quality of the resulting software.
6 Background Software quality prediction Software quality prediction aims to evaluate software quality level periodically and to indicate software quality problems early. Investigating the relationship between the number of faults in a program and its software complexity metrics
7 Related work Several different techniques have been proposed to develop predictive software metrics for the classification of software program modules into fault-prone and non-fault-prone categories: discriminant analysis, factor analysis, classification trees, pattern recognition, the EM algorithm, feedforward neural networks, and random forests.
8 The limitation of current models Two categories cannot fully reflect the characteristics of modules. Resources (human, time, equipment, etc.) are limited, so some fault-prone modules should be tested with higher priority. An ideal approach is to rank all the modules according to their fault-prone level.
9 Research Objectives Search for a well-accepted mathematical model for software quality ranking. Lay out an integrated solution for software quality prediction in real-world projects. Perform an experimental comparison to assess the proposed model.
10 Support Vector Machine Introduced by Vapnik in the late 1960s on the foundation of statistical learning theory. Traced back to the classical structural risk minimization (SRM) approach. Generalizes well even in high-dimensional spaces under small training sample conditions.
11 Basic theory of SVM The current state-of-the-art classifier. [Figure: decision plane, support vectors, margin]
12 Basic theory of SVM The Optimal Separating Hyperplane Place a linear boundary between the two classes, and orient the boundary so that the margin is maximized. The optimal hyperplane is the solution of the following constrained minimization:
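The formula on this slide did not survive extraction. As a reference sketch, the textbook hard-margin formulation is given here; the symbols (training pairs $(x_i, y_i)$ with labels $y_i \in \{-1, +1\}$, weight vector $w$, bias $b$, sample count $n$) are assumptions and may differ from the notation on the original slide.

```latex
\min_{w,\,b} \; \frac{1}{2}\,\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \,( w \cdot x_i + b ) \ge 1, \qquad i = 1, \dots, n .
```

Minimizing $\lVert w \rVert$ is equivalent to maximizing the margin, whose width is $2 / \lVert w \rVert$.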
13 Basic theory of SVM The Generalized Optimal Separating Hyperplane For the linearly non-separable case, positive slack variables are introduced. C is used to weight the penalizing variables, and a larger C corresponds to assigning a higher penalty to errors.
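The slack-variable formula is missing from the extracted slide. A sketch of the standard soft-margin formulation follows, with the slack variables written as $\xi_i$; apart from $C$, which the slide names, the notation is an assumption.

```latex
\min_{w,\,b,\,\xi} \; \frac{1}{2}\,\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i \,( w \cdot x_i + b ) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \dots, n .
```

Each $\xi_i$ measures how far sample $i$ falls short of the margin, so increasing $C$ makes such violations more costly.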
14 Ranking SVM Rank each sample to an appropriate position. For the linear case, find a weight vector w that makes the maximum number of the following inequalities hold: Constrained optimization problem:
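Both formulas on this slide were lost in extraction. The sketch below gives the commonly cited linear Ranking SVM formulation; the pair set $P$ (all pairs $(i, j)$ in which module $i$ should be ranked above module $j$) and the slack variables $\xi_{ij}$ are assumed notation, not taken from the slide.

```latex
% Inequalities that the weight vector should satisfy for as many pairs as possible
w \cdot x_i > w \cdot x_j \qquad \forall\, (i, j) \in P

% Relaxed constrained optimization problem
\min_{w,\,\xi} \; \frac{1}{2}\,\lVert w \rVert^{2} + C \sum_{(i,j) \in P} \xi_{ij}
\quad \text{subject to} \quad
w \cdot x_i \ge w \cdot x_j + 1 - \xi_{ij}, \qquad \xi_{ij} \ge 0 .
```

Because each constraint can be rewritten as $w \cdot (x_i - x_j) \ge 1 - \xi_{ij}$, the problem has the same shape as a soft-margin SVM trained on pairwise difference vectors.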
15 Other types of SVM SVM with risk control Transductive Support Vector Machines Support Vector Regression
16 Our framework
17 Experiments Data Description Medical Imaging System (MIS) data set. Eleven software complexity metrics were measured for each of the modules. Change Reports (CRs) represent the faults detected.
18 Metrics of MIS data Total lines of code including comments (LOC), Total code lines (CL), Total character count (TChar), Total comments (TComm), Number of comment characters (MChar), Number of code characters (DChar), Halstead’s program length (N), Halstead’s estimated program length (N̂), Jensen’s estimator of program length (N_F), McCabe’s cyclomatic complexity (v(G)), Belady’s bandwidth metric (BW), ……
19 Experiments on Model Selection The later the errors are found, the higher the risk will be. Risk increases as time goes by, e.g. r(t) = bt^2 or r(t) = ae^{bt}.
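The two example risk curves can be written out directly. This is a minimal sketch: the function names are hypothetical, and the slide does not give values for the constants a and b, so any values passed in are purely illustrative.

```python
import math


def quadratic_risk(t, b):
    """Quadratic risk curve r(t) = b * t^2."""
    return b * t ** 2


def exponential_risk(t, a, b):
    """Exponential risk curve r(t) = a * e^(b * t)."""
    return a * math.exp(b * t)
```

With positive constants, both functions grow with t, which matches the statement that errors found later carry a higher risk.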
20 Experiments on Model Selection Measure of risk
21 Experiments on Model Selection Software Development Process Simulation, Case 1 The number of developed software modules increases by 40 at each time step. 10 percent of all the modules have fault data available. The modules with fault data are used to train the model. The 40 newly developed modules are used for testing.
22 Experiments on Model Selection
23 Experiments on Model Selection Software Development Process Simulation, Case 2 The number of developed software modules increases by 40 at each time step. The fault data of all the previous modules can be obtained. The modules with fault data are used to train the model. The 40 newly developed modules are used for testing.
24 Experiments on Model Selection
25 Comparison of ranking models Applied models: LOC (lines of code), PCA (Principal Component Analysis), regression tree, SVR (Support Vector Regression), Ranking SVM. Evaluation criteria: Normalized Discounted Cumulative Gain (nDCG), Average Distance Measure (ADM).
26 Normalized Discounted Cumulative Gain (nDCG) The Gain (G) of each software module is its fault-prone score
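The nDCG formula itself is not preserved on this slide. As a rough sketch, the code below uses one widely used variant in which the gain at rank position p is discounted by log2(p + 1). The discount base, the function names, and the use of the true fault-prone score as the gain are assumptions, and the original slides may use a different discount.

```python
import math


def dcg(gains):
    """Discounted cumulative gain of gains listed in ranked order."""
    return sum(g / math.log2(pos + 1) for pos, g in enumerate(gains, start=1))


def ndcg(true_gains, predicted_scores):
    """nDCG of the ranking induced by predicted_scores.

    true_gains[i] is the actual fault-prone score of module i and
    predicted_scores[i] is the model's score for the same module.
    """
    # Rank modules from highest to lowest predicted score.
    order = sorted(range(len(true_gains)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked_gains = [true_gains[i] for i in order]
    # The ideal ranking sorts modules by their true gain.
    ideal = dcg(sorted(true_gains, reverse=True))
    return dcg(ranked_gains) / ideal if ideal > 0 else 0.0
```

A model that orders the modules exactly by their true fault-prone score reaches the maximum value of 1, and placing highly fault-prone modules lower in the ranking reduces the score.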
27 Comparison on nDCG measure
28 Average Distance Measure (ADM)
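The ADM definition is missing from the extracted slide. The sketch below follows the usual definition from the information retrieval literature, one minus the mean absolute difference between predicted and actual scores. It assumes both score lists are normalized to [0, 1]; the function name and that normalization are assumptions and are not stated on the slide.

```python
def average_distance_measure(predicted, actual):
    """ADM = 1 - mean(|predicted_i - actual_i|), with scores assumed in [0, 1]."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("score lists must be non-empty and of equal length")
    total_distance = sum(abs(p - a) for p, a in zip(predicted, actual))
    return 1.0 - total_distance / len(predicted)
```

Under this assumption, identical score lists give an ADM of 1, and the value falls as predictions move away from the actual scores.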
29 Comparison on ADM measure
30 Discussion Features of this work: Introduce a ranking model instead of a classification model into software quality prediction. Propose an integrated framework for software quality prediction on real-world projects.
31 Conclusions Ranking SVM offers a promising technique for software module ranking. The ranking model is more efficient than the classification model when enough fault data is available. When fault data is limited, the classification model is better than the ranking model.
The end Thanks Q&A