Predicting Classes in Need of Refactoring – An Application of Static Metrics
Liming Zhao, Jane Hayes
23 September
International PROMISE Workshop

Predicting Classes in Need of Refactoring: An Application of Static Metrics
- Motivation
- Related work
- Design of the tool
- Study design
- Study results
- Conclusions and future work

Motivation
- Cost of software maintenance
- Refactoring
- Importance of refactoring planning
- Need for improving multiple classes
- Deadlines, human resources, budgets

Refactoring steps
- Identify code segments
- Evaluate possible costs and benefits
- Develop a refactoring plan
- Apply the refactorings

Why use a class-based approach?
- Common programmer(s)
- Cohesive code
- Learning effect
- Comprehension overhead

Proposed approach
Figure: components of the proposed approach: Code Repository Analysis (complexity, size, coupling, history); Maintainability Measurement & Prediction Component (MMPC) with a Maintainability Prediction Model; Candidate Classes; Cost & Benefit Estimation Component (CBEC) with Cost-Benefits Estimation; Refactoring Planning; Visualizer; Prioritized Class List with costs and benefits, for managers.

Design of the tool
Figure: Code Repository Analysis; size, complexity, coupling, and history metrics; Maintainability Prediction Model; Candidate Classes; Refactoring Planning.

Related work: Maintainability assessment
- OO metrics: Chidamber and Kemerer [1994]
- MI: Welker [1995]
- PM and MP: Hayes et al. [2004]
- Assessment of UML artifacts: Hassan et al. [2005]
- RDC ratio: Hayes and Zhao [2005]

Related work: Refactoring
- Preconditions: Opdyke [1992]
- Bad smells: Fowler et al. [1999]
- Detecting bad smells: Mens et al. [2003]
- Member similarity: Simon et al. [2001]

Metrics examined
- Halstead metrics
- Cyclomatic complexity
- Weighted methods per class (WMC)
- Maintainability index (MI)

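The metrics listed above have standard textbook definitions, sketched below in Python from already-extracted counts. This is a minimal sketch under stated assumptions: it uses the classic Halstead formulas (volume V = N log2 n, effort E = D * V), McCabe's "decision points + 1" rule, WMC weighted by cyclomatic complexity, and the three-metric SEI form of the maintainability index. The slides do not say which MI variant or WMC weighting the tool used, and all function and class names here are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class HalsteadCounts:
    n1: int  # distinct operators
    n2: int  # distinct operands
    N1: int  # total operator occurrences
    N2: int  # total operand occurrences


def halstead(c: HalsteadCounts) -> dict[str, float]:
    """Classic Halstead measures computed from operator/operand counts."""
    length = c.N1 + c.N2                     # N
    vocabulary = c.n1 + c.n2                 # n
    volume = length * math.log2(vocabulary)  # V = N * log2(n)
    difficulty = (c.n1 / 2) * (c.N2 / c.n2)  # D = (n1 / 2) * (N2 / n2)
    effort = difficulty * volume             # E = D * V
    return {"length": length, "vocabulary": vocabulary, "volume": volume,
            "difficulty": difficulty, "effort": effort}


def cyclomatic_complexity(decision_points: int) -> int:
    """McCabe complexity of a single method: decision points + 1."""
    return decision_points + 1


def weighted_methods_per_class(method_complexities: list[int]) -> int:
    """WMC, using each method's cyclomatic complexity as its weight (assumed)."""
    return sum(method_complexities)


def maintainability_index(ave_volume: float, ave_cc: float, ave_loc: float) -> float:
    """Three-metric MI from average Halstead volume, complexity, and LOC (assumed variant)."""
    return (171
            - 5.2 * math.log(ave_volume)
            - 0.23 * ave_cc
            - 16.2 * math.log(ave_loc))
```

Lower MI and higher Halstead effort both point to code that is harder to maintain, which is why the two appear together when a class is singled out.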
Class-based rank
- Cost concern
- Priority class list
- Individual rank
- Comprehensive rank

Code repository analysis and metrics collection
- Code repository analyzer
  - Java front end
  - Abstract syntax tree
- Metrics
  - Complexity
  - Size
  - Other
- Weighted maintainability rank (WMR)

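One way to combine per-metric (individual) ranks into a single comprehensive rank such as WMR is a weighted sum of ranks. The slides do not give the WMR formula, so the Python sketch below is an assumption, not the tool's method: each class is ranked on each metric (rank 1 = worst), the ranks are combined with hypothetical weights, and the class with the lowest weighted score heads the priority class list. The function names, metric keys, and example values are illustrative only.

```python
# Per-class metric values, e.g. {"ClassA": {"effort": 900.0, "mi": 40.0}}.
Metrics = dict[str, dict[str, float]]


def individual_ranks(metrics: Metrics, metric: str, higher_is_worse: bool) -> dict[str, int]:
    """Rank classes on one metric; rank 1 = worst (most in need of refactoring)."""
    ordered = sorted(metrics, key=lambda cls: metrics[cls][metric],
                     reverse=higher_is_worse)
    return {cls: i + 1 for i, cls in enumerate(ordered)}


def weighted_rank(metrics: Metrics, weights: dict[str, float],
                  higher_is_worse: dict[str, bool]) -> list[str]:
    """Hypothetical WMR: weighted sum of individual ranks, lowest score first."""
    score = {cls: 0.0 for cls in metrics}
    for metric, weight in weights.items():
        ranks = individual_ranks(metrics, metric, higher_is_worse[metric])
        for cls, rank in ranks.items():
            score[cls] += weight * rank
    # sorted() is stable, so ties keep the input order of the classes.
    return sorted(score, key=lambda cls: score[cls])


if __name__ == "__main__":
    # Made-up values for illustration; not data from the study.
    example = {"ClassA": {"effort": 900.0, "mi": 40.0},
               "ClassB": {"effort": 300.0, "mi": 80.0},
               "ClassC": {"effort": 1000.0, "mi": 95.0}}
    # Higher Halstead effort is worse; lower MI is worse.
    print(weighted_rank(example, {"effort": 0.8, "mi": 0.2},
                        {"effort": True, "mi": False}))
```

Ranking rather than summing raw values keeps metrics with very different scales (effort in the thousands, MI near 100) from dominating one another; the weights then express how much each metric matters to the planner.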
Initial validation on student projects
- Objective of the study
- Measures examined
- Tool's performance
- Programmers' decisions
- Tool vs. programmer
- Time spent

Study design
- Participants: graduate students in computer science
- Subjects: Java source code (20 classes) from a software engineering class project
- Pre-reading: questionnaire about background, smell list, etc.

Study design: procedure
Participants were instructed to:
- Fill in the questionnaire and read about the smell list
- Read the code and look for "bad smells"; select a smell from a provided list, or describe an observed problem that was not in the list
- Find the class with the most serious problems, i.e., the one that should be refactored first
- Record the time spent reviewing/reading each class

Study result: comparison of reviewers' selections with the tool's selection

Priority  rv1  rv2  rv3  rv4  rv5  rv6  Tool
#1        I1   A    G    A    A    G    A
#2        F    W    T    P    E    H    LO
#3        W, LO, A, S, G, H
#4        G, V, L, W, H, P
#5        P, S, M, LO, F
#6        P, L

Legend of class name abbreviations: A - AnimationScreen, E - ErrorWindow, F - FileOutPut, F - FileInputHandler, G - Graphic, H - HistoryReport, I1 - ImagePanel1, I2 - ImagePanel2, L - LoginScreen, LO - LoudnessAnalyzer, M - MenuScreen, P - PitchAnalyzer, R - RegistrationScreen, S - SessionReport, T - TestScreen, W - Welcome

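The "ranked 1st by 50% of the reviewers" figure for class Animation can be checked against row #1 of the comparison table: three of the six reviewers (rv2, rv4, rv5) put AnimationScreen (A) first, as the tool did. The short Python sketch below does that arithmetic on the row #1 selections; the helper names are illustrative only.

```python
from collections import Counter

# Priority #1 selections from the comparison table (A = AnimationScreen).
REVIEWER_TOP = {"rv1": "I1", "rv2": "A", "rv3": "G",
                "rv4": "A", "rv5": "A", "rv6": "G"}
TOOL_TOP = "A"


def agreement_with_tool(reviewer_top: dict[str, str], tool_top: str) -> float:
    """Fraction of reviewers whose top-ranked class matches the tool's."""
    matches = sum(1 for cls in reviewer_top.values() if cls == tool_top)
    return matches / len(reviewer_top)


def most_common_top(reviewer_top: dict[str, str]) -> tuple[str, int]:
    """Class most often ranked first by reviewers, with its vote count."""
    return Counter(reviewer_top.values()).most_common(1)[0]


if __name__ == "__main__":
    print(f"{agreement_with_tool(REVIEWER_TOP, TOOL_TOP):.0%}")  # 50%
    print(most_common_top(REVIEWER_TOP))  # ('A', 3)
```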
Class Animation
- Noted by 67% of the reviewers and by the tool (ranked 1st by 50% of the reviewers and by the tool)
- Largest halstead_effort and smallest MI
- Long method per class (LMC) and complex method per class (CMC) above average (Slide 43)

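The slides do not define LMC and CMC precisely. A plausible reading, assumed here, is the number of methods in a class whose length or cyclomatic complexity exceeds a threshold. The Python sketch below computes both counts and flags classes whose counts are above the mean over all classes; the thresholds (30 lines, complexity 10) and all names are hypothetical, not the tool's actual settings.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Method:
    loc: int         # lines of code in the method body
    complexity: int  # cyclomatic complexity


# Hypothetical thresholds; the slides do not state the tool's cut-offs.
LONG_METHOD_LOC = 30
COMPLEX_METHOD_CC = 10


def lmc(methods: list[Method]) -> int:
    """Long methods per class: methods longer than LONG_METHOD_LOC."""
    return sum(1 for m in methods if m.loc > LONG_METHOD_LOC)


def cmc(methods: list[Method]) -> int:
    """Complex methods per class: methods above COMPLEX_METHOD_CC."""
    return sum(1 for m in methods if m.complexity > COMPLEX_METHOD_CC)


def above_average(classes: dict[str, list[Method]]) -> dict[str, tuple[bool, bool]]:
    """Per class: (LMC above mean LMC, CMC above mean CMC)."""
    avg_lmc = mean(lmc(ms) for ms in classes.values())
    avg_cmc = mean(cmc(ms) for ms in classes.values())
    return {name: (lmc(ms) > avg_lmc, cmc(ms) > avg_cmc)
            for name, ms in classes.items()}
```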
Metrics of class Animation
Table: Halstead-L, Halstead-V, Halstead-E, MI, LMC, and CMC for class Animation (max) compared with the average.

Study results
- The tool and half of the reviewers identified class Animation as the worst
- Reviewers spent a significant amount of time (1-3 hours)
- Most problems found were "easy" ones (90%)
- Reviewers looked for different smells (28% not in the list)

Conclusions
- Complexity and size are among the major factors that make code hard to comprehend
- Reviewers look for different problems
- Code reviewing is time-consuming
- Automation can improve consistency, efficiency, and effectiveness

Future work
- Predict the possible cost of a refactoring and its impact on code maintainability
- Use metrics from the evolution history
- Use metrics reflecting inter-class relationships

Acknowledgements
- Thanks to Edison Design Group for providing JFE, the Java front end
- Thanks to the graduate student volunteers for participating in the study

Questions?