Ehsan Salamati Taba, Foutse Khomh, Ying Zou, Meiyappan Nagappan, Ahmed E. Hassan

Presentation transcript:

Predict Bugs: Model
• Past Defects, History of Churn (Zimmermann, Hassan et al.)
• Topic Modeling (Chen et al.)

• Not technically incorrect and do not prevent a system from functioning
• Weaknesses in design

Indicate a deeper problem in the system

Antipatterns indicate weaknesses in the design that may increase the risk for bugs in the future. (Fowler 1999)

There is not much refactoring activity when developing a system. (Olbrich et al.)

Approach overview:
• Mining Source Code Repositories (CVS Repository)
• Mining Bug Repositories (Bugzilla)
• Detecting Antipatterns
• Calculating Metrics
• Analyzing: RQ1, RQ2, RQ3

Studied Systems

Systems    Release (#)    Churn      LOCs
Eclipse    (12)           148,454    26,209,669
ArgoUML    (9)            21,427     2,025,730

• 13 different antipatterns
• DECOR (Moha et al.)

[Figure; axis labels: # of Antipatterns, # Files]

Systems    #Antipatterns
Eclipse    273,766
ArgoUML    15,100

Mining Bug Repositories (Bugzilla)

Systems    #Post Bugs    #Pre Bugs
Eclipse    27,406        23,554
ArgoUML    2,549         2,569

RQ1: Do antipatterns affect the density of bugs in files?
RQ2: Do the proposed antipattern based metrics provide additional explanatory power over traditional metrics?
RQ3: Can we improve traditional bug prediction models with antipattern information?

Density of bugs in files with antipatterns (D_A) versus density of bugs in files without antipatterns (D_NA)

Systems    Releases (#)    D_A - D_NA > 0    p-value < 0.05
Eclipse    12              8                 8
ArgoUML    9               6                 6
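
The comparison in this table can be sketched as follows. This is only a minimal illustration: the slides do not define bug density, so bugs per line of code is an assumption here, and the function names and the per-release data are hypothetical.

```python
from statistics import mean

def bug_density(bugs: int, loc: int) -> float:
    """Bugs per line of code (assumed definition; the slides do not give one)."""
    return bugs / loc if loc else 0.0

def density_gap(files):
    """Return D_A - D_NA for one release.

    `files` is a list of (has_antipattern, post_release_bugs, loc) tuples.
    """
    with_ap = [bug_density(b, l) for has, b, l in files if has]
    without_ap = [bug_density(b, l) for has, b, l in files if not has]
    return mean(with_ap) - mean(without_ap)

# Hypothetical data for two releases (not taken from the study).
releases = {
    "r1": [(True, 4, 200), (True, 1, 100), (False, 1, 500), (False, 0, 300)],
    "r2": [(True, 0, 400), (False, 2, 100), (False, 1, 100)],
}
gaps = {name: density_gap(files) for name, files in releases.items()}
positive = sum(1 for gap in gaps.values() if gap > 0)
print(positive, "of", len(gaps), "releases have D_A - D_NA > 0")
```

Counting the releases with a positive gap corresponds to the "D_A - D_NA > 0" column; the significance column would need a statistical test on top of this.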

• Average Number of Antipatterns (ANA)
• Antipattern Cumulative Pairwise Differences (ACPD)
• Antipattern Recurrence Length (ARL)
• Antipattern Complexity Metric (ACM)

Example (files a.java, b.java, c.java): ANA(a.java) = 2.16, ARL(a.java) = 18.76, ACPD(a.java) = 0
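
The slides name these metrics but do not give their formulas. The sketch below is one plausible reading of two of the names, not the authors' definitions: ANA is taken as the mean antipattern count over a file's releases, and ACPD as the sum of signed differences between consecutive releases. Both readings, the function names, and the sample history are assumptions.

```python
def ana(counts):
    """Average number of antipatterns over a file's release history (assumed)."""
    return sum(counts) / len(counts) if counts else 0.0

def acpd(counts):
    """Sum of signed differences between consecutive releases (assumed)."""
    return sum(later - earlier for earlier, later in zip(counts, counts[1:]))

# Hypothetical antipattern counts for one file across four releases.
history = [2, 3, 2, 2]
print(ana(history))
print(acpd(history))
```

Under this reading, a file whose count rises and later falls back to its starting value has an ACPD of 0 even though its ANA is positive.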

• The antipattern based metrics provide additional explanatory power over traditional metrics
• ARL shows the biggest improvement

Step-wise analysis
1) Removing Independent Variables
2) Collinearity Analysis

Metric name    Description
LOC            Source lines of code
MLOC           Executable lines of code
PAR            Number of parameters
NOF            Number of attributes
NOM            Number of methods
NOC            Number of children
VG             Cyclomatic complexity
DIT            Depth of inheritance tree
LCOM           Lack of cohesion of methods
NOT            Number of classes
WMC            Number of weighted methods per class
PRE            Number of pre-release bugs
Churn          Number of lines of code added, modified, or deleted
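
The slides do not say how collinearity was assessed. As a rough illustration only, the sketch below screens metric pairs by Pearson correlation, which is a simple stand-in and not necessarily the technique used in the study. The 0.8 threshold, the function names, and the metric values are all assumptions.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant metric carries no linear relationship
    return cov / sqrt(var_x * var_y)

def collinear_pairs(metrics, threshold=0.8):
    """Return metric pairs whose absolute correlation exceeds the threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(metrics), 2)
        if abs(pearson(metrics[a], metrics[b])) > threshold
    ]

# Hypothetical per-file metric values (not from the study).
metrics = {
    "LOC": [100, 250, 400, 800, 1200],
    "MLOC": [80, 200, 330, 650, 1000],
    "DIT": [3, 1, 4, 2, 2],
}
print(collinear_pairs(metrics))
```

A flagged pair such as LOC and MLOC suggests keeping only one of the two in the model; a metric that is not flagged against any other has low collinearity in this simplified sense.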

• ARL remained statistically significant and had low collinearity with the other metrics

[Figure; axis label: # Versions]

[Figure; axis label: F-measure]

• ARL can improve cross-system bug prediction on the two studied systems
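
For reference, the F-measure is the harmonic mean of precision and recall. The sketch below computes it for binary "buggy / not buggy" predictions; the function name and the sample labels are hypothetical and not taken from the study.

```python
def f_measure(predicted, actual):
    """F-measure (harmonic mean of precision and recall) for binary labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical predictions for five files: 1 = buggy, 0 = not buggy.
predicted = [1, 1, 0, 1, 0]
actual = [1, 0, 0, 1, 1]
score = f_measure(predicted, actual)
print(round(score, 3))
```

In a cross-system setting, `predicted` would come from a model trained on one system and `actual` from the bug data of the other.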

Backup Slides

Detected antipatterns:
• Anti Singleton
• Blob
• Class Data Should be Private
• Complex Class
• Large Class
• Lazy Class
• LPL
• Long Method
• Message Chain
• RPB
• Spaghetti Code
• SG
• SwissArmyKnife

Hypothesis: There is no difference between the density of future bugs in files with antipatterns and in files without antipatterns.
Wilcoxon rank sum test: We perform a Wilcoxon rank sum test to accept or reject the hypothesis at the 5% level (i.e., p-value < 0.05).
Findings: In general, the density of bugs in a file with antipatterns is higher than the density of bugs in a file without antipatterns.
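
A Wilcoxon rank sum test can be sketched in plain Python as below. This is a simplified version under stated assumptions: it uses the normal approximation without a tie correction, so it is only approximate for small or heavily tied samples, and it is not the authors' tooling. The function names and the density values are hypothetical.

```python
from math import erfc, sqrt

def average_ranks(values):
    """Rank values starting at 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def rank_sum_test(a, b):
    """Two-sided rank sum test using the normal approximation.

    Returns (z, p). No tie correction is applied to the variance.
    """
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    w = sum(ranks[:n1])                      # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = erfc(abs(z) / sqrt(2))               # two-sided p-value
    return z, p

# Hypothetical bug densities (not from the study).
with_antipatterns = [0.020, 0.015, 0.012, 0.030, 0.018, 0.025]
without_antipatterns = [0.002, 0.000, 0.004, 0.001, 0.003, 0.005]
z, p = rank_sum_test(with_antipatterns, without_antipatterns)
print("reject the hypothesis" if p < 0.05 else "cannot reject the hypothesis")
```

A positive `z` together with `p < 0.05` would indicate that the files with antipatterns tend to have the higher bug density in that release.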