Software Quality Ranking: Bringing Order to Software Modules in Testing. Fei Xing, Michael R. Lyu, Ping Guo.

2 Outline Background; Support Vector Machine (basic theory, Ranking SVM, other types of SVM); Our proposed framework; Experiments; Conclusions

3 Background Modern society is fast becoming dependent on software products and systems. Achieving high reliability is one of the most important challenges facing the software industry. Software quality models are badly needed.

4 Background Software quality model: A software quality model is a tool for focusing software enhancement efforts. Such a model yields timely predictions on a module-by-module basis, enabling one to target high-risk modules.

5 Background Software complexity metrics: A quantitative description of program attributes. They are closely related to the distribution of faults in program modules and play a critical role in predicting the quality of the resulting software.

6 Background Software quality prediction: Software quality prediction aims to evaluate the software quality level periodically and to indicate software quality problems early. It investigates the relationship between the number of faults in a program and its software complexity metrics.

7 Related work Several different techniques have been proposed to develop predictive software metrics for the classification of software program modules into fault-prone and non-fault-prone categories: discriminant analysis, factor analysis, classification trees, pattern recognition, the EM algorithm, feedforward neural networks, and random forests.

8 The limitation of current models Two categories cannot fully reflect the characteristics of the modules. Because resources (human, time, equipment, etc.) are limited, some of the fault-prone modules should be tested with higher priority. An ideal approach is to rank all the modules according to their fault-prone level.

9 Research Objectives Search for a well-accepted mathematical model for software quality ranking. Lay out an integrated solution for software quality prediction in real-world projects. Perform an experimental comparison to assess the proposed model.

10 Support Vector Machine Introduced by Vapnik in the late 1960s on the foundation of statistical learning theory. Traced back to the classical structural risk minimization (SRM) approach. Generalizes well even in high-dimensional spaces under small training sample conditions.

11 Basic theory of SVM The current state-of-the-art classifier. [Figure: decision plane, support vectors, and margin]

12 Basic theory of SVM The Optimal Separating Hyperplane: Place a linear boundary between the two classes, and orient the boundary so that the margin is maximized. The optimal hyperplane is obtained by solving a constrained minimization problem.
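The standard textbook hard-margin formulation is sketched below. The notation is an assumption and may differ from the slide's: training pairs $(x_i, y_i)$ with labels $y_i \in \{-1, +1\}$, weight vector $w$, and bias $b$.

```latex
\begin{aligned}
\min_{w,\,b}\quad & \tfrac{1}{2}\,\lVert w \rVert^{2} \\
\text{subject to}\quad & y_i \,(w \cdot x_i + b) \ge 1, \qquad i = 1, \dots, n .
\end{aligned}
```

Under these constraints the margin equals $2 / \lVert w \rVert$, so minimizing $\lVert w \rVert^{2}$ maximizes the margin.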

13 Basic theory of SVM The Generalized Optimal Separating Hyperplane: For the linearly non-separable case, positive slack variables are introduced. C is used to weight the penalizing variables, and a larger C corresponds to assigning a higher penalty to errors.
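The usual soft-margin formulation with slack variables is sketched below. The symbols $\xi_i$ for the slack variables, and $w$, $b$, $x_i$, $y_i$ for the hyperplane and training data, are assumed notation.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\,\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i \\
\text{subject to}\quad & y_i \,(w \cdot x_i + b) \ge 1 - \xi_i, \\
& \xi_i \ge 0, \qquad i = 1, \dots, n .
\end{aligned}
```

Each $\xi_i > 0$ measures how far sample $i$ falls short of the margin, and $C$ sets how heavily those shortfalls count against a wide margin.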

14 Ranking SVM Rank each sample to an appropriate position. For the linear case, find a weight vector w that makes the maximum number of pairwise ordering inequalities hold. This is posed as a constrained optimization problem.
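A common way to write the linear Ranking SVM is sketched below. The notation is assumed: $P$ is the set of index pairs $(i, j)$ for which sample $x_i$ should be ranked above sample $x_j$, and $\xi_{ij}$ are slack variables. The ordering inequalities are

```latex
w \cdot x_i > w \cdot x_j \qquad \text{for all } (i, j) \in P ,
```

and relaxing them with slack variables gives the constrained optimization problem

```latex
\begin{aligned}
\min_{w,\,\xi}\quad & \tfrac{1}{2}\,\lVert w \rVert^{2} + C \sum_{(i,j) \in P} \xi_{ij} \\
\text{subject to}\quad & w \cdot (x_i - x_j) \ge 1 - \xi_{ij}, \\
& \xi_{ij} \ge 0, \qquad (i, j) \in P .
\end{aligned}
```

This has the same form as a soft-margin classification SVM applied to the difference vectors $x_i - x_j$.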

15 Other types of SVM: SVM with risk control; Transductive Support Vector Machines; Support Vector Regression

16 Our framework

17 Experiments Data Description: Medical Imaging System (MIS) data set. 11 software complexity metrics were measured for each of the modules. Change Reports (CRs) represent faults detected.

18 Metrics of MIS data: Total lines of code including comments (LOC); Total code lines (CL); Total character count (TChar); Total comments (TComm); Number of comment characters (MChar); Number of code characters (DChar); Halstead's program length (N); Halstead's estimated program length (N̂); Jensen's estimator of program length (N_F); McCabe's cyclomatic complexity (v(G)); Belady's bandwidth metric (BW); ……

19 Experiments on Model Selection The later the errors are found, the higher the risk will be. Risk increases as time goes by, e.g. r(t) = bt^2 or r(t) = ae^{bt}.
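The two example risk curves can be sketched as follows. The function names and the default values of a and b are hypothetical, since the slide gives no constants.

```python
import math


def quadratic_risk(t, b=1.0):
    """Risk r(t) = b * t^2; b is an assumed scale constant."""
    return b * t ** 2


def exponential_risk(t, a=1.0, b=0.5):
    """Risk r(t) = a * e^(b * t); a and b are assumed constants."""
    return a * math.exp(b * t)


# Hypothetical detection times: a fault found later carries a higher risk.
times = [1, 2, 3, 4]
quadratic_values = [quadratic_risk(t) for t in times]
exponential_values = [exponential_risk(t) for t in times]
```

With positive constants, both curves increase for t >= 0, which matches the statement that later detection means higher risk.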

20 Experiments on Model Selection Measure of risk

21 Experiments on Model Selection Software Development Process Simulation, Case 1: The number of developed software modules increases by 40 at each time advancement. 10 percent of all the modules have fault data available. The modules with fault data are used to train the model. The 40 newly developed modules are used for testing.

22 Experiments on Model Selection

23 Experiments on Model Selection Software Development Process Simulation, Case 2: The number of developed software modules increases by 40 at each time advancement. The fault data of all the previous modules can be obtained. The modules with fault data are used to train the model. The 40 newly developed modules are used for testing.
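The two simulation cases can be sketched as a generator of train/test splits. This is a minimal illustration, not the authors' code. It assumes that modules are numbered in development order, that in Case 1 the 10 percent with fault data are a random sample of the previously developed modules, and that the first step is skipped because no earlier modules exist. The function name `simulate_splits` and the total of 200 modules are hypothetical.

```python
import random


def simulate_splits(total_modules, step=40, labeled_fraction=1.0, seed=0):
    """Yield (train_ids, test_ids) for each time advancement.

    labeled_fraction=0.1 mimics Case 1 (10 percent of earlier modules have
    fault data); labeled_fraction=1.0 mimics Case 2 (all earlier modules do).
    """
    rng = random.Random(seed)
    developed = 0
    while developed + step <= total_modules:
        previous = list(range(developed))
        newly_developed = list(range(developed, developed + step))
        if previous:  # nothing to train on at the very first step
            n_labeled = max(1, round(labeled_fraction * len(previous)))
            train_ids = sorted(rng.sample(previous, n_labeled))
            yield train_ids, newly_developed
        developed += step


# Hypothetical project of 200 modules.
case1 = list(simulate_splits(200, labeled_fraction=0.1))
case2 = list(simulate_splits(200, labeled_fraction=1.0))
```

In both cases the test set is always the 40 newest modules; only the amount of training data differs between the two cases.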

24 Experiments on Model Selection

25 Comparison of ranking models Applied models: LOC (lines of code); PCA (Principal Component Analysis); regression tree; SVR (Support Vector Regression); Ranking SVM. Evaluation criteria: Normalized Discounted Cumulative Gain (nDCG); Average Distance Measure (ADM).

26 Normalized Discounted Cumulative Gain (nDCG) The Gain (G) of each software module is its fault-prone score
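A minimal sketch of nDCG for module ranking follows. It assumes the widely used log2(rank + 1) discount, which may differ from the exact variant on the slide, and it uses hypothetical fault counts as the gains.

```python
import math


def dcg(gains):
    """Discounted cumulative gain of gains listed in ranked order (rank 1 first)."""
    # Assumed discount: 1 / log2(rank + 1), with ranks starting at 1.
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(predicted_scores, true_gains):
    """nDCG of the ordering induced by predicted_scores, judged by true_gains."""
    order = sorted(range(len(true_gains)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked_gains = [true_gains[i] for i in order]
    ideal = dcg(sorted(true_gains, reverse=True))
    return dcg(ranked_gains) / ideal if ideal > 0 else 0.0


# Hypothetical data: fault counts of five modules and two predicted score lists.
faults = [0, 3, 1, 7, 2]
well_ordered = ndcg([0.1, 0.8, 0.3, 0.9, 0.5], faults)    # matches the fault order
badly_ordered = ndcg([0.9, 0.2, 0.7, 0.1, 0.5], faults)   # reverses the fault order
```

A ranking that places the most fault-prone modules first scores 1, and worse orderings score lower because high gains are discounted at late ranks.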

27 Comparison on nDCG measure

28 Average Distance Measure (ADM)
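A minimal sketch of ADM follows, assuming the definition common in the information retrieval literature: one minus the mean absolute difference between predicted and actual scores, with both sets of scores scaled to [0, 1]. How the slide normalizes fault-prone scores is not stated, so the inputs here are hypothetical.

```python
def adm(predicted, actual):
    """Average Distance Measure: 1 - mean(|predicted_i - actual_i|)."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and equal in length")
    if any(not 0.0 <= v <= 1.0 for v in (*predicted, *actual)):
        raise ValueError("scores are assumed to be scaled to [0, 1]")
    total_distance = sum(abs(p - a) for p, a in zip(predicted, actual))
    return 1.0 - total_distance / len(predicted)


# Hypothetical normalized fault-prone scores for four modules.
actual_scores = [0.0, 0.5, 1.0, 0.25]
exact_match = adm(actual_scores, actual_scores)
partial_match = adm([0.5, 0.5, 0.5, 0.25], actual_scores)
```

Unlike nDCG, this measure compares the score values themselves rather than only the order they induce.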

29 Comparison on ADM measure

30 Discussion Features of this work: Introduces a ranking model instead of a classification model into software quality prediction. Proposes an integrated framework for software quality prediction on real-world projects.

31 Conclusions Ranking SVM offers a promising technique for software module ranking. The ranking model is more efficient than the classification model when enough fault data is available. When fault data is limited, the classification model is better than the ranking model.

The end Thanks Q&A