Presentation transcript:

Quality prediction model for object oriented software using UML metrics
Ana Erika Camargo, Koichiro Ochimizu
Japan Advanced Institute of Science and Technology
4th World Congress for Software Quality, Bethesda, Maryland, USA, September 2008

Outline: Objective, Scope, Our Approach, Related Work, Design Complexity Metrics and UML, Prediction Technique, Case Study, Conclusions and Future Work.

Objective To create a model that is able: to predict fault-prone* code in early phases of the software life cycle, and to detect possible defects in the software. (*) Fault-prone code: code likely to contain bugs.

Scope Causal-Loop Diagram: its variables include Fault-prone code, Wrong Implementation, Complex Implementation, Complex Specifications, Complex Design, Wrong Design, Misunderstanding of Requirements, Developers' experience, and Designers' experience. Legend: S = change in the same direction, O = change in the opposite direction. The diagram marks the portion of these relationships that is the scope of this study.

Our Approach Related existing works predict fault-prone code using design complexity metrics measured from code. Our approximation: obtain the same design complexity metrics from UML artifacts, as good candidates for fault-proneness prediction, so that prediction can be made before coding.

Related work: Fault prediction Prediction models of fault-proneness (Study / Input: design complexity metrics / Output / Prediction technique):
Basili et al. [1996]: CK metrics, among others; fault-prone classes; multivariate logistic regression
Briand et al. [2000]
Kanmani et al. [2004]: fault ratio; general regression neural network
Nachiappan et al. [2005]: multiple linear regression
Olague et al. [2007]: CK, QMOOD
CK: Chidamber & Kemerer; QMOOD: Quality Metrics for Object Oriented Design

Related work: Fault prediction From these studies, we identified useful metrics for predicting fault-proneness of code:
Chidamber and Kemerer (CK): Depth of Inheritance Tree (DIT), Number of Children (NOC), Weighted Methods per Class (WMC), Coupling Between Objects (CBO), Response For a Class (RFC), Lack of Cohesion of Methods (LCOM)
Bansiya and Davis's quality metrics (QMOOD): Average of DIT for all classes in the system (ANA), Class Interface Size (CIS), Data Access Metric (DAM), Direct Class Coupling (DCC), Measure of Aggregation (MOA), Measure of Functional Abstraction (MFA), Number of Methods (NOM, same as WMC)

Related work: UML & Design Complexity Metrics Tang et al. [2002]: measures CK metrics from data structures created from Rational Rose class, collaboration, and activity diagrams. Issue: to obtain accurate measures, assumptions are made about the level of detail in the diagrams; for example, one activity diagram per operation in the system is required.

Related work: UML & Design Complexity Metrics Baroni [2002]: formal definitions of CK and QMOOD metrics, among others, based on UML class diagrams. Issues: the RFC and LCOM calculations are code dependent, and the CBO calculation does not clearly include the methods used or variables instantiated from other classes within each method of a class.

UML & Design Complexity Metrics Related existing works predict fault-prone code using design complexity metrics measured from code. Our approximation: obtain the same design complexity metrics from UML artifacts, as good candidates for fault-proneness prediction, so that prediction can be made before coding.

UML & Design Complexity Metrics Design complexity metrics that can be approximated using UML class diagrams:
Chidamber and Kemerer (CK): Weighted Methods per Class (WMC), Depth of Inheritance Tree (DIT), Number of Children (NOC), Coupling Between Objects (CBO), Response For a Class (RFC), Lack of Cohesion of Methods (LCOM)
Bansiya and Davis's quality metrics (QMOOD): Average of DIT for all classes in the system (ANA), Class Interface Size (CIS), Data Access Metric (DAM), Direct Class Coupling (DCC), Measure of Aggregation (MOA), Measure of Functional Abstraction (MFA), Number of Methods (NOM, same as WMC)
Some of these metrics can be obtained straightforwardly from class diagrams; others cannot be calculated precisely from class diagrams, since the implementation of the bodies of the classes is needed.

UML & Design Complexity Metrics CBO Approximation CBO-code: number of classes coupled to a given class*. CBO-UML Approach 1 (UML collaboration diagram): a count of all messages sent to different objects. CBO-UML Approach 2 (UML collaboration diagram): the same as Approach 1, but eliminating the messages that return a value. (*) If a method within a class uses a method or an instance variable of a different class, this pair of classes is said to be coupled.
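A minimal sketch of these two approximations, following the slide's literal wording (counting messages sent to objects of other classes); the Message record and method names are assumptions for illustration, not the authors' tooling:

import java.util.List;

class CboUmlSketch {
    // One message taken from a UML collaboration diagram (assumed representation).
    record Message(String senderClass, String receiverClass, boolean returnsValue) {}

    // Approach 1: count every message a class sends to objects of other classes.
    static int cboUmlApproach1(String cls, List<Message> messages) {
        int count = 0;
        for (Message m : messages) {
            if (m.senderClass().equals(cls) && !m.receiverClass().equals(cls)) {
                count++;
            }
        }
        return count;
    }

    // Approach 2: as Approach 1, but skip messages that return a value.
    static int cboUmlApproach2(String cls, List<Message> messages) {
        int count = 0;
        for (Message m : messages) {
            if (m.senderClass().equals(cls) && !m.receiverClass().equals(cls)
                    && !m.returnsValue()) {
                count++;
            }
        }
        return count;
    }
}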

UML & Design Complexity Metrics CBO Approximation Example (collaboration diagram fragment): message R7: fundsStatus := CommitFunds(), involving the object aCustomer.

UML & Design Complexity Metrics CBO Evaluation, using an e-commerce system*. (*) Described in: Hassan Gomaa, Designing Concurrent, Distributed, and Real-Time Applications with UML, Addison-Wesley Object Technology Series, July 2000.

UML & Design Complexity Metrics CBO Evaluation For CBO-code and CBO-UML Approach 1, correlation coefficient = 0.81. For CBO-code and CBO-UML Approach 2, correlation coefficient = 0.89. CBO-UML Approach 2 has a slightly stronger linear relationship with CBO-code.
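Correlation coefficients of this kind can be computed as Pearson's r over the paired per-class measurements; a minimal sketch, with placeholder values rather than the study's data:

class CorrelationSketch {
    // Pearson correlation between paired per-class metric values,
    // e.g. CBO-code vs. CBO-UML for each class of the system.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double sumX = 0, sumY = 0, sumXY = 0, sumX2 = 0, sumY2 = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXY += x[i] * y[i];
            sumX2 += x[i] * x[i];
            sumY2 += y[i] * y[i];
        }
        double numerator = n * sumXY - sumX * sumY;
        double denominator = Math.sqrt(n * sumX2 - sumX * sumX)
                           * Math.sqrt(n * sumY2 - sumY * sumY);
        return numerator / denominator;
    }

    public static void main(String[] args) {
        // Illustrative values only, not the measurements from the study.
        double[] cboCode = {2, 4, 1, 5, 3};
        double[] cboUml  = {3, 4, 1, 6, 2};
        System.out.println(pearson(cboCode, cboUml));
    }
}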

UML & Design Complexity Metrics RFC Approximation RFC-code: number of methods of a given class + number of methods of other classes directly called by any of the methods of the given class. RFC-UML Approach 1 (UML collaboration diagrams): messages received + messages sent. RFC-UML Approach 2 (UML collaboration & class diagrams): (messages received + number of attributes * 2) + messages sent, where (messages received + number of attributes * 2) approximates the number of methods of the given class, considering two public methods per attribute to get and set its value.
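A minimal sketch of the two RFC-UML approximations, assuming the per-class counts have already been read off the diagrams; the class and method names are illustrative:

class RfcUmlSketch {
    // Approach 1: messages received + messages sent (collaboration diagram only).
    static int rfcUmlApproach1(int messagesReceived, int messagesSent) {
        return messagesReceived + messagesSent;
    }

    // Approach 2: also use the class diagram; assume two accessor methods
    // (get/set) per attribute to approximate the class's own method count.
    static int rfcUmlApproach2(int messagesReceived, int messagesSent, int numAttributes) {
        return (messagesReceived + 2 * numAttributes) + messagesSent;
    }
}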

UML & Design Complexity Metrics RFC Approximation Example:
class C {
    A a;
    void m() {
        D d = new D();
        d.dosth();   // call to a method of another class
        // ...
    }
    void setA(A a) { this.a = a; }
    A getA() { return a; }
}
RFC(C) = 3 + 1 = 4 (three methods of C: m, setA, getA, plus one method of another class called from them: D.dosth). The accompanying collaboration diagram fragment shows the messages m() and dosth().

UML & Design Complexity Metrics RFC Evaluation using the same e-commerce system.

UML & Design Complexity Metrics RFC Evaluation For RFC-code and RFC-UML Approach 1, correlation coefficient = -0.07. For RFC-code and RFC-UML Approach 2, correlation coefficient = 0.67. RFC-UML Approach 2 has a stronger linear relationship with RFC-code.

UML & Design Complexity Metrics Remark Even if our second approach's assumption is not entirely valid, it still achieved acceptable performance. This might be explained by the fact that the number of private attributes in a class is moderately correlated with its number of methods, according to Olague's research [2007].

UML & Design Complexity Metrics Design complexity metrics that can be approximated using UML diagrams:
Chidamber and Kemerer (CK): Weighted Methods per Class (WMC), Depth of Inheritance Tree (DIT), Number of Children (NOC), Coupling Between Objects (CBO), Response For a Class (RFC), Lack of Cohesion of Methods (LCOM)
Bansiya and Davis's quality metrics (QMOOD): Average of DIT for all classes in the system (ANA), Class Interface Size (CIS), Data Access Metric (DAM), Direct Class Coupling (DCC), Measure of Aggregation (MOA), Measure of Functional Abstraction (MFA), Number of Methods (NOM, same as WMC)
Some of these metrics can be obtained straightforwardly from class diagrams, some can be approximated using collaboration diagrams, and the rest cannot be approximated precisely using UML diagrams.

Prediction Technique Related existing works predict fault-prone code from 13 design complexity metrics measured from code; our approximation obtains 12 design complexity metrics from UML artifacts. Predict: how?

Prediction Technique Logistic Regression Use. When we have one variable (y) with two values (e.g. faulty/not faulty, 1/0) and one or more measurement variables (the x's). Goal. To predict the probability of getting a particular value of y, given the x variables, through a logit model. Key points. No assumptions about the distribution of the variables are made.

Prediction Technique Logistic Regression
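A standard statement of the logit model, consistent with the worked example on the following slides (y is the binary outcome, the x's are the measurement variables):

\ln\left[\frac{p}{1-p}\right] = A + B_1 x_1 + B_2 x_2 + \cdots + B_k x_k,
\qquad
p = P(y = 1 \mid x_1,\ldots,x_k) = \frac{1}{1 + e^{-(A + B_1 x_1 + \cdots + B_k x_k)}}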

Prediction Technique Example. We want to estimate the probability of a class being highly FAULTY, in terms of a design complexity metric Mx.

Prediction Technique Faulty: Most Faulty (MF) = 1, Least Faulty (LF) = 2. Design complexity metric: Mx. Data for 24 classes: classes 1-12 are MF (FAULTY = 1), with Mx = 1 for classes 1-10 and Mx = 0 for classes 11-12; classes 13-24 are LF (FAULTY = 2), with Mx = 1 for class 13 and Mx = 0 for classes 14-24. Summarized:
CLASS   Mx=1  Mx=0  Total
MF=1      10     2     12
LF=2       1    11     12
Total     11    13     24

Prediction Technique Probabilities
CLASS   Mx=1  Mx=0  Total
MF        10     2     12
LF         1    11     12
Total     11    13     24
The probability that any given CLASS is MF: P(MF) = 12/24 = 0.50. The probability that any given CLASS is MF given that Mx=1: P(MF|Mx=1) = 10/11 = 0.909. The probability that any given CLASS is MF given that Mx=0: P(MF|Mx=0) = 2/13 = 0.154.

Prediction Technique Odds
CLASS   Mx=1  Mx=0  Total
MF        10     2     12
LF         1    11     12
Total     11    13     24
The odds of a CLASS being MF: Odds(MF) = 12/12 = 1. The odds of a CLASS being MF given that Mx=1: Odds(MF|Mx=1) = 10/1 = 10 .... (1). The odds of a CLASS being MF given that Mx=0: Odds(MF|Mx=0) = 2/11 = 0.182 ... (2)

Prediction Technique Odds and probabilities provide the same information in different ways. It is easy to convert odds to probabilities and vice versa, e.g.:
P(MF|Mx=1) = Odds(MF|Mx=1) / (1 + Odds(MF|Mx=1)) = 10 / (1 + 10) = 0.909
Odds(MF|Mx=1) = P(MF|Mx=1) / (1 - P(MF|Mx=1)) = 0.909 / (1 - 0.909) = 10

Prediction Technique Applying the natural log to (1) and (2):
ln[ Odds(MF|Mx=1) ] = ln(10) = 2.303 .........(3)
ln[ Odds(MF|Mx=0) ] = ln(0.182) = -1.704 ......(4)
We can generalize (3) and (4) as:
ln[ Odds(MF|Mx) ] = A + B*Mx ..........(5)
From (3) and (5), when Mx = 1: ln[ Odds(MF|Mx) ] = A + B = 2.303 ....(6)
From (4) and (5), when Mx = 0: ln[ Odds(MF|Mx) ] = A = -1.704 ......(7)
From (6) and (7): A = -1.704, B = 4.007
Finally, we can rewrite (5) as: ln[ Odds(MF|Mx) ] = -1.704 + 4.007*Mx

Prediction Technique Given ln[ Odds(MF|Mx) ] = -1.704 + 4.007*Mx, and writing Odds(MF|Mx) = p / (1 - p) with p = P(MF|Mx), we can rewrite our final equations as:
ln[ p / (1 - p) ] = -1.704 + 4.007*Mx
p = P(MF|Mx) = 1 / (1 + e^-(-1.704 + 4.007*Mx))
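As a usage sketch, the fitted model above can be applied directly to score a class from its Mx value (the coefficients are the ones derived in the worked example; class and method names are illustrative):

class LogitModelSketch {
    static final double A = -1.704;  // intercept from the worked example
    static final double B = 4.007;   // coefficient of Mx from the worked example

    // Predicted probability that a class is Most Faulty, given its Mx value.
    static double probabilityMostFaulty(double mx) {
        double logit = A + B * mx;
        return 1.0 / (1.0 + Math.exp(-logit));
    }

    public static void main(String[] args) {
        System.out.println(probabilityMostFaulty(1));  // about 0.909
        System.out.println(probabilityMostFaulty(0));  // about 0.154
    }
}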

Case study Related existing works predict fault-prone code from 13 design complexity metrics measured from code; our approximation obtains 12 design complexity metrics from UML artifacts and predicts using logistic regression. Are the candidate UML metrics good enough to predict fault-proneness?

Case study Objective: Estimate the probability of having a faulty class during the testing phase, using Logistic Regression.

Case study Description. Using the design and implementation of the e-commerce system described in Gomaa's book, this case study was carried out as follows: (1) collection of UML and code metrics (the Xs); (2) collection of data on the faults of the e-commerce system from the logs of the CVS repository used (the Y); (3) evaluation of the relationship of each metric to fault-proneness, using univariate logistic models; a fit sketch is shown below.
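Step (3) relies on fitting univariate logistic models; the study would typically use a statistical package for this, but as a minimal sketch a univariate fit can be done by gradient ascent on the log-likelihood. The data below are the 24 classes of the earlier worked example, so the fit should approach A = -1.704, B = 4.007:

class UnivariateLogisticFit {
    public static void main(String[] args) {
        // x: one design complexity metric per class; y: 1 = faulty, 0 = not faulty.
        double[] x = {1,1,1,1,1,1,1,1,1,1, 0,0, 1, 0,0,0,0,0,0,0,0,0,0,0};
        int[]    y = {1,1,1,1,1,1,1,1,1,1, 1,1, 0, 0,0,0,0,0,0,0,0,0,0,0};

        double a = 0, b = 0;          // intercept and slope
        double learningRate = 0.1;
        for (int iter = 0; iter < 50000; iter++) {
            double gradA = 0, gradB = 0;
            for (int i = 0; i < x.length; i++) {
                double p = 1.0 / (1.0 + Math.exp(-(a + b * x[i])));
                gradA += (y[i] - p);          // d(logLik)/dA
                gradB += (y[i] - p) * x[i];   // d(logLik)/dB
            }
            a += learningRate * gradA;
            b += learningRate * gradB;
        }
        System.out.printf("A = %.3f, B = %.3f%n", a, b);  // approx. -1.704 and 4.007
    }
}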

Case study Metrics to evaluate. Due to the manner in which the e-commerce system was designed and implemented (without inheritance between classes), the candidate metrics were classified by suite, code metric, level, whether they are inheritance metrics, and the UML metric to evaluate:
QMOOD: Average Number of Ancestors (ANA, system level, inheritance metric), Measure of Aggregation (MOA), Class Interface Size (CIS)*, Data Access Metric (DAM), Direct Class Coupling (DCC), Measure of Functional Abstraction (MFA)
CK: Number of Methods (NOM) = Weighted Methods per Class (WMC)*, Depth of Inheritance (DIT), Number of Children (NOC), Response For Class (RFC)*, Coupling Between Objects (CBO)
(*) Found to be good predictors of fault-prone code in Olague's work [2007].

Case study Estimation of the probability of a class being faulty, using CBO-code. Correctness: 12/13 classes, 92.3% of classes correctly classified. Sensitivity: 8/9 faulty classes, 88.8% of faulty classes correctly classified. Specificity: 4/4 non-faulty classes, 100% of non-faulty classes correctly classified.
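Correctness, sensitivity and specificity follow directly from the counts of correctly and incorrectly classified classes; a minimal sketch (a class is taken as predicted faulty when its estimated probability exceeds a cutoff such as 0.5, which is an assumed, common choice; names are illustrative):

class ClassificationMeasures {
    // actual[i] / predicted[i]: true = faulty, false = not faulty.
    static void report(boolean[] actual, boolean[] predicted) {
        int tp = 0, tn = 0, fp = 0, fn = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] && predicted[i]) tp++;
            else if (!actual[i] && !predicted[i]) tn++;
            else if (!actual[i] && predicted[i]) fp++;
            else fn++;
        }
        double correctness = (double) (tp + tn) / actual.length;  // all classes
        double sensitivity = (double) tp / (tp + fn);             // faulty classes
        double specificity = (double) tn / (tn + fp);             // non-faulty classes
        System.out.printf("correctness=%.3f sensitivity=%.3f specificity=%.3f%n",
                correctness, sensitivity, specificity);
    }
}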

Case study Results. From the univariate models, using each one of the proposed metrics (Correctness [classes], Sensitivity [faulty classes], Specificity [no-faulty classes]):
CBO-code 92.3% 88.88% 100%
CBO-UML(1) 69.2% 66.66% 75%
CBO-UML(2) 55.55%
RFC-code 84.61%
RFC-UML(1) 76.92% 77.77%
RFC-UML(2)
WMC-code 90.9% 85.7%
WMC-UML 72.7% 71.42%
CIS-code
CIS-UML
DAM-code 36.3% 57.14% 0%
DAM-UML 85.7% 50%
CIS: public methods in a class. DAM: ratio of the number of private and protected attributes to the total number of attributes. DCC measures were not significant for this study.

Case study Results Our second approach to approximating RFC with UML diagrams performed as well as the RFC metric measured from code. The UML CIS approximation performed similarly to the CIS metric measured from code. The performance of the rest of the UML metrics was somewhat acceptable.

Case study Can we apply the obtained models to other case studies? (System, Metrics, Correctness [classes], Sensitivity [faulty classes], Specificity [no-faulty classes]):
E-commerce CBO-UML(1) 69.2% 66.66% 75%
Banking 72.7% 100% 50%
RFC-UML(1) 76.92% 77.77%
CBO-UML(2) 55.55% 63.6% 80%
RFC-UML(2) 84.61% 88.88% 66.6%

Conclusions and Future work UML metrics can be acceptable predictors of fault-prone code. The UML CIS and UML RFC metrics showed a strong relationship to the fault-proneness of code. We might be able to create a more robust model to predict fault-prone code before its implementation.

Conclusions and Future work Further study and evaluation of other metrics using other UML artifacts (e.g. sequence diagrams, state diagrams, and use-case descriptions) is needed. Construction of a more robust model using multivariate logistic regression. Evaluation of the final model obtained, using different case studies.

Quality prediction model for object oriented software using UML metrics
Camargo Ana Erika
erika.camargo@jaist.ac.jp