Estimating Code Size After a Complete Code-Clone Merge Buford Edwards III, Yuhao Wu, Makoto Matsushita, Katsuro Inoue 1 Graduate School of Information.

Slides:



Advertisements
Similar presentations
Duplicate code detection using Clone Digger Peter Bulychev Lomonosov Moscow State University CS department.
Advertisements

ANTLR in SSP Xingzhong Xu Hong Man Aug Outline ANTLR Abstract Syntax Tree Code Equivalence (Code Re-hosting) Future Work.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Extraction of.
Reverse Engineering © SERG Code Cloning: Detection, Classification, and Refactoring.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Extracting Code.
Code recognition & CL modeling through AST Xingzhong Xu Hong Man.
A Tool Support to Merge Similar Methods with a Cohesion Metric COB ○ Masakazu Ioka 1, Norihiro Yoshida 2, Tomoo Masai 1,Yoshiki Higo 1, Katsuro Inoue 1.
13/07/2015Dr Andy Brooks1 Fyrirlestrar 9 & 10 CCFinder: A Tool to Detect Clones “I can just copy these lines. That is the safest thing to do. The code.
Refactoring Support Tool: Cancer Yoshiki Higo Osaka University.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Automatic Categorization.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Industrial Application.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University ICSE 2003 Java.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Finding Similar.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University What Kinds of.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 1 Refactoring.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University A Criterion for.
Department of Computer Science, Graduate School of Information Science and Technology, Osaka University DCCFinder: A Very- Large Scale Code Clone Analysis.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Investigation.
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University A clone detection approach for a collection of similar.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University What Do Practitioners.
Chapter 10: Compilers and Language Translation Invitation to Computer Science, Java Version, Third Edition.
COMPUTER-ASSISTED PLAGIARISM DETECTION PRESENTER: CSCI 6530 STUDENT.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 1 ARIES: Refactoring.
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University A Method to Detect License Inconsistencies for Large-
Mining and Analysis of Control Structure Variant Clones Guo Qiao.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Code-Clone Analysis.
2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji.
CMCD: Count Matrix based Code Clone Detection Yang Yuan and Yao Guo Key Laboratory of High-Confidence Software Technologies (Ministry of Education) Peking.
1 Gemini: Maintenance Support Environment Based on Code Clone Analysis *Graduate School of Engineering Science, Osaka Univ. **PRESTO, Japan Science and.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Applying Clone.
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University Inoue Laboratory Eunjong Choi 1 Investigating Clone.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University How to extract.
Software Engineering Research Group, Graduate School of Engineering Science, Osaka University 1 Evaluation of a Business Application Framework Using Complexity.
Effective Testing and Debugging Methods and Its Supporting System with Program Deltas Makoto Matsushita, Masayoshi Teraguchi, and Katsuro Inoue Osaka University.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Technology and Science, Osaka University Dependence-Cache.
Presented by: Ashgan Fararooy Referenced Papers and Related Work on:
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Development of.
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University Retrieving Similar Code Fragments based on Identifier.
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University 1 Towards an Assessment of the Quality of Refactoring.
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University 1 Towards an Investigation of Opportunities for Refactoring.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Finding Code Clones.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University IWPSE 2003 Program.
1 Measuring Similarity of Large Software System Based on Source Code Correspondence Tetsuo Yamamoto*, Makoto Matsushita**, Toshihiro Kamiya***, Katsuro.
Experience of Finding Inconsistently-Changed Bugs in Code Clones of Mobile Software Katsuro Inoue†, Yoshiki Higo†, Norihiro Yoshida†, Eunjong Choi†, Shinji.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 1 Classification.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 1 Extracting Sequence.
What kind of and how clones are refactored? A case study of three OSS projects WRT2012 June 1, Eunjong Choi†, Norihiro Yoshida‡, Katsuro Inoue†
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 1 コードクローン解析に基づくリファクタリング支援.
Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Towards a Collection of Refactoring Patterns Based.
1 Gemini: Code Clone Analysis Tool †Graduate School of Engineering Science, Osaka Univ., Japan ‡ Graduate School of Information Science and Technology,
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 1 Aries: Refactoring.
The inference and accuracy We learned how to estimate the probability that the percentage of some subjects in the sample would be in a given interval by.
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University Detection of License Inconsistencies in Free and.
Module Road Map Refactoring Why Refactoring? Examples
Source File Set Search for Clone-and-Own Reuse Analysis
MultiRefactor: Automated Refactoring To Improve Software Quality
Ruru Yue1, Na Meng2, Qianxiang Wang1 1Peking University 2Virginia Tech
○Yuichi Semura1, Norihiro Yoshida2, Eunjong Choi3, Katsuro Inoue1
: Clone Refactoring Davood Mazinanian Nikolaos Tsantalis Raphael Stein
Objective of This Course
Predicting Fault-Prone Modules Based on Metrics Transitions
Refactoring Support Tool: Cancer
Quaid-i-Azam University
Yuhao Wu1, Yuki Manabe2, Daniel M. German3, Katsuro Inoue1
Daniel Kim Software Engineering Laboratory Professor Katsuro Inoue
On Refactoring Support Based on Code Clone Dependency Relation
Chapter 10: Compilers and Language Translation
Research Activities of Software Engineering Lab in Osaka University
Dotri Quoc†, Kazuo Kobori†, Norihiro Yoshida
Unit 1 Programming - Assignment 3
Presentation transcript:

Estimating Code Size After a Complete Code-Clone Merge Buford Edwards III, Yuhao Wu, Makoto Matsushita, Katsuro Inoue 1 Graduate School of Information Science and Technology, Osaka University

Outline  Review Code Clones  Prior Code Clone Research  Refactoring/Merging Code Clones  Complete Code-Clone Merge Explanation  Basic Case and Illustration  Expand to Difficult Case (Overlapping and Embedded Code Clones)  Prototype tool and its application  Conclusions 2

What are code clones?  Code clones – sections of code that are the same or very similar to each other  How similar they must be depends on what kind of clone and how one measures their similarity. 3 Image:

Types of Code Clones  Type 1 – Identical  Type 2 – Different variable names/values  Type 3 – May have additions, deletions, altered statements due to editing  Type 4 – Semantic, has same function but different structure or syntax 4

Why do code clones matter?  Code clones increase maintenance costs  Inconsistent changes lead to bugs [1]  “Nearly every second unintentionally inconsistent change to a code clone leads to a fault” [2]  As project increases in size, more likely for unintentional code clones to appear [3] 5 [1] Chanchal K. Roy, James R. Cordy, Rainer Koschke, Comparison and evaluation of code clone detection techniques and tools: A qualitative approach, Sci. Comput. Program., Vol.74, No.7, pp (2007). [2] Elmar Juergens, Florian Deissenboeck, Benjamin Hummel, Stefan Wagner, Do code clones matter?, In Proceedings of the 31st Inter-national Conference on Software Engineering (ICSE ’09), pp (2009). [3] Michel Dagenais, Ettore Merlo, Bruno Lagu¨e, and Daniel Proulx. Clones occurrence in large object oriented software packages. In Pro-ceedings of the 8th IBM Centre for Advanced Studies Conference (CASCON ’98), pp (1998).

Should we get rid of clones?  Quantitative evaluation of code clones may help us decide  How much of the software system is made of code clones?  How much of the system size will be reduced if we merge all code clones?  Code clone detection tools exist to answer the first question. 6

What is Merging?  Merging – we mean a kind of refactoring  Code refactoring – restructuring preexistent code without changing external behavior or final execution result [4]  Code clone refactor technique [5] –  Extract clones from the code  Create shared function that contains cloned portion  Create calls to that shared function 7 [4] Martin Fowler, Refactoring: Improving the Design of Existing Code, Addison-Wesley (1999). [5] Yoshiki Higo, Toshihiro Kamiya, Shinji Kusumoto, Katsuro Inoue, Refactoring Support Based on Code Clone Analysis, In Proceedings of 5th International Conference on Product Focused Software Process Improvement, pp (2004).

Complete Code-Clone Merge  How much of the system size will be reduced if we merge all code clones?  Complete Code-Clone Merge (CCM) is an algorithm designed to help answer that question 8

CCM Explained  We have a source file S of a certain line length |S|  Each code clone will have a unique ID.  Each unique code clone will be extracted to a shared function. 9

CCM Explained  Within S, each clone will be replaced with a call to their respective shared functions.  Merging all code clones creates S’ of a certain line length |S’|  We expect |S’| < |S| 10

Basic Case and Illustration  |S| = 100 lines  Recognize clones A and B.  A = 15 lines, B = 10 lines  POP of A = 2, POP of B = 2  POP (population) – number of times a clone appears  Merge clones into individual shared functions 11

12 Clone Detection Software Clone Pair Data CCM Source Code: S |S| = 100 Lines A: 15 Lines B: 10 Lines A: 15 Lines B: 10 Lines 1 A: Function Call B: Function Call S’ - 1 Line 83 A: 15 Lines B: 10 Lines A: Initialization A: Termination B: Initialization B: Termination - 1 Line |S’| = 83 Lines

Basic Case and Illustration Result Summary Initial Size |S|100 Lines Total Clone Length50 Lines Reduced Size |S’|83 Lines Lines of Code Reduced17 Lines Percent Reduction17% 13

Basic Case and Illustration Result Summary Initial Size |S|100 Lines Total Clone Length50 Lines Reduced Size |S’|83 Lines Lines of Code Reduced17 Lines Percent Reduction17% 14 Sum of all Unique Code Clone Lengths x POP Clone IDAB Lines1510 POP22 Total Size302050

Basic Case and Illustration Result Summary Initial Size |S|100 Lines Total Clone Length50 Lines Reduced Size |S’|83 Lines Lines of Code Reduced17 Lines Percent Reduction17% 15 (|S| - Total Clone Length) + Total Function Calls + Total Shared Function Size 50 Lines + 4 Lines + 29 Lines Function(Clone ID)AB Core Lines1510 Initialization Lines11 Termination Lines11 Total Size Note: Initialization and Termination may be configured to be a value other than the 1 Line default value.

Basic Case and Illustration Result Summary Initial Size |S|100 Lines Total Clone Length50 Lines Reduced Size |S’|83 Lines Lines of Code Reduced17 Lines Percent Reduction17% 16 |S| - |S’| = Lines of Code Reduced = 17

Basic Case and Illustration Result Summary Initial Size |S|100 Lines Total Clone Length50 Lines Reduced Size |S’|83 Lines Lines of Code Reduced17 Lines Percent Reduction17% 17 (Lines of Code Reduced / |S|) x 100 = Percent Reduction (17 Lines / 100 Lines) x 100 = 17%

Overlapping and Embedded Code Clones B: 15 Lines A: 15 Lines B: 15 Lines  Sections of code, identified as code clones that share a portion of their code with another unique code clone  Not uncommon, must be accounted for.

Overlapping and Embedded Code Clones B: 15 Lines A: 15 Lines B: 15 Lines  Can no longer simply create shared function for A and B  We decide to use the “Chunking Method”

Overlapping and Embedded Code Clones B: 15 Lines A: 15 Lines B: 15 Lines C: 5 Lines |S| = B’: 10 Lines A’: 10 Lines B’: 10 Lines C: 5 Lines

B’: 10 Lines A’: 10 Lines B’: 10 Lines C: 5 Lines Overlapping and Embedded Code Clones  After creating “chunks” can create a shared method for each  Create calls as normal  Overlaps increase the number of lines required in |S’|

CCM Size Estimation Prototype Tool  Tool used to estimate system size after merging all code clones.  Tool uses CCFinderX as part of the required input [6]  Generates clone pair data used by the algorithm  Source code S is also required input.  Removal of whitespace/comments before running CCFinderX and tool. 22 [6] CCFinderX Official site,

Application of the Tool  Three examples of source codes used as part of CCM Prototype application  Multilap.java  Java JDK [7]  Quake Engine [8]  Java JDK and Quake Engine chosen due to large size. [7] Java SE j Oracle Technology Network j Oracle, Java. SE Development Kit 8, Update 77 Release Notes, [8] GitHub - id-Software/Quake: Quake GPL Source Release, ©

Multilap.java  Control to show multiple overlapping code clones.  Can follow the calculations for this step- by-step in paper. 24

Java JDK Code clone volume:  Calculated via: (Total Clone Length/|S|) x Result Summary Initial Size |S|813,546 Lines Total Clone Length207,072 Lines Code Clone Volume25.45% Reduced Size |S’|708,139 Lines Lines of Code Reduced105,407 Lines Percent Reduction12.96% Java JDK 1.8.0_77-b03

Java JDK  Code clone volume: Approx. 25%  Most common POP is 2  If we assume every clone has POP of 2, expected reduction percent would be about half of code clone volume. (12.73%)  Actual Reduction: 12.96% 26

Quake Engine 27 Result Summary Initial Size |S|216,722 Lines Total Clone Length49,098 Lines Code Clone Volume22.66% Reduced Size |S’|194,324 Lines Lines of Code Reduced22,398 Lines Percent Reduction10.33%

Quake Engine  Code clone volume: Approx %  POP 2 is again most frequent, although to a lesser extent.  Expected reduction: 11.33%  Actual reduction: 10.33% 28

Conclusions  Quantitative evaluation:  What percentage of the source code could theoretically be reduced?  Application results seem reasonable  Analyzing the POP frequencies, reduction seems consistent with what is expected  Code clones with POP value of 2 most common in large sources analyzed by prototype 29