Published by Caitlin Bradford. Modified over 8 years ago.
1
Estimating Code Size After a Complete Code-Clone Merge
Buford Edwards III, Yuhao Wu, Makoto Matsushita, Katsuro Inoue
Graduate School of Information Science and Technology, Osaka University
2
Outline
- Review
  - Code Clones
  - Prior Code Clone Research
  - Refactoring/Merging Code Clones
- Complete Code-Clone Merge Explanation
  - Basic Case and Illustration
  - Expand to Difficult Case (Overlapping and Embedded Code Clones)
- Prototype tool and its application
- Conclusions
3
What are code clones?
- Code clones: sections of code that are the same as, or very similar to, each other
- How similar they must be depends on the kind of clone and on how similarity is measured
Image: http://learn.genetics.utah.edu/content/cloning/whyclone/images/clones.jpg
4
Types of Code Clones
- Type 1 – Identical
- Type 2 – Different variable names/values
- Type 3 – May have added, deleted, or altered statements due to editing
- Type 4 – Semantic: same function but different structure or syntax
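To make the difference between Type 1, Type 2, and Type 3 concrete, here is a minimal sketch that normalizes identifiers and literals before comparing two fragments. It is only an illustration: the `normalize` helper and the three snippets are hypothetical, and this is not claimed to be how any particular clone detector works.

```python
import io
import keyword
import tokenize


def normalize(source: str) -> list[str]:
    """Return the token stream with identifiers and literals abstracted away."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME:
            # Keep keywords (for, in, if ...) but hide variable names.
            out.append(tok.string if keyword.iskeyword(tok.string) else "ID")
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            out.append("LIT")
        elif tok.type == tokenize.OP:
            out.append(tok.string)
        # Whitespace, comments, and layout tokens are ignored.
    return out


original = "total = 0\nfor x in items:\n    total = total + x * 2\n"
renamed = "acc = 0\nfor v in values:\n    acc = acc + v * 3\n"
edited = "acc = 0\nfor v in values:\n    if v > 0:\n        acc = acc + v * 3\n"

# Not Type 1: the raw text differs.
print(original == renamed)
# Type 2: identical once names and values are abstracted.
print(normalize(original) == normalize(renamed))
# Type 3: an added statement remains after normalization.
print(normalize(original) == normalize(edited))
```

Type 4 clones are not covered by this sketch, because recognizing equal behavior behind different structure takes more than token comparison.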
5
Why do code clones matter?
- Code clones increase maintenance costs
- Inconsistent changes lead to bugs [1]
- “Nearly every second unintentionally inconsistent change to a code clone leads to a fault” [2]
- As a project grows in size, unintentional code clones become more likely to appear [3]
[1] Chanchal K. Roy, James R. Cordy, Rainer Koschke, Comparison and evaluation of code clone detection techniques and tools: A qualitative approach, Sci. Comput. Program., Vol.74, No.7, pp.470-497 (2007).
[2] Elmar Juergens, Florian Deissenboeck, Benjamin Hummel, Stefan Wagner, Do code clones matter?, In Proceedings of the 31st International Conference on Software Engineering (ICSE ’09), pp.485-495 (2009).
[3] Michel Dagenais, Ettore Merlo, Bruno Laguë, Daniel Proulx, Clones occurrence in large object oriented software packages, In Proceedings of the 8th IBM Centre for Advanced Studies Conference (CASCON ’98), pp.192-200 (1998).
6
Should we get rid of clones?
- Quantitative evaluation of code clones may help us decide
  - How much of the software system is made of code clones?
  - How much will the system size be reduced if we merge all code clones?
- Code clone detection tools exist to answer the first question.
7
What is Merging?
- Merging: here, a kind of refactoring
- Code refactoring: restructuring existing code without changing its external behavior or final execution result [4]
- Code clone refactoring technique [5]
  - Extract clones from the code
  - Create a shared function that contains the cloned portion
  - Create calls to that shared function
[4] Martin Fowler, Refactoring: Improving the Design of Existing Code, Addison-Wesley (1999).
[5] Yoshiki Higo, Toshihiro Kamiya, Shinji Kusumoto, Katsuro Inoue, Refactoring Support Based on Code Clone Analysis, In Proceedings of 5th International Conference on Product Focused Software Process Improvement, pp.220-233 (2004).
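The three refactoring steps above (extract the clone, create a shared function, replace each occurrence with a call) can be illustrated with a small before/after sketch. All function names and the tax rate are invented for the example; they do not come from the presentation.

```python
# Before merging: the same three statements are cloned in two functions.
def invoice_line_before(prices):
    subtotal = sum(prices)
    tax = subtotal * 0.1
    total = subtotal + tax
    return f"Invoice: {total:.2f}"


def quote_line_before(prices):
    subtotal = sum(prices)
    tax = subtotal * 0.1
    total = subtotal + tax
    return f"Quote: {total:.2f}"


# After merging: the cloned portion lives in one shared function ...
def total_with_tax(prices):
    subtotal = sum(prices)
    tax = subtotal * 0.1
    return subtotal + tax


# ... and each former clone occurrence becomes a call to it.
def invoice_line_after(prices):
    return f"Invoice: {total_with_tax(prices):.2f}"


def quote_line_after(prices):
    return f"Quote: {total_with_tax(prices):.2f}"


sample = [10.0, 20.0, 5.5]
# External behavior is unchanged, which is what makes this a refactoring.
print(invoice_line_before(sample) == invoice_line_after(sample))
print(quote_line_before(sample) == quote_line_after(sample))
```

Here the shared function needs one definition line and one return line around the cloned statements, which loosely corresponds to the kind of per-function overhead a size estimate has to account for.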
8
Complete Code-Clone Merge
- How much will the system size be reduced if we merge all code clones?
- Complete Code-Clone Merge (CCM) is an algorithm designed to help answer that question
9
CCM Explained
- We have a source file S whose length in lines is |S|
- Each code clone has a unique ID
- Each unique code clone is extracted into a shared function
10
CCM Explained
- Within S, each clone occurrence is replaced with a call to its respective shared function
- Merging all code clones creates S’, whose length in lines is |S’|
- We expect |S’| < |S|
11
Basic Case and Illustration
- |S| = 100 lines
- Recognize clones A and B: A = 15 lines, B = 10 lines
- POP of A = 2, POP of B = 2
- POP (population): the number of times a clone appears
- Merge the clones into individual shared functions
12
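[Diagram: Clone detection software produces clone pair data, which CCM takes as input together with the source code S (|S| = 100 lines, lines 1 to 100). S contains two occurrences of A (15 lines each) and two occurrences of B (10 lines each). In S’, each occurrence is replaced by a 1-line function call, and two shared functions are added: A (15 lines plus initialization and termination) and B (10 lines plus initialization and termination). |S’| = 83 lines.]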
13
Basic Case and Illustration

Result Summary
Initial Size |S|           100 Lines
Total Clone Length          50 Lines
Reduced Size |S’|           83 Lines
Lines of Code Reduced       17 Lines
Percent Reduction           17%
14
Basic Case and Illustration
Total Clone Length = sum over all unique code clones of (Length x POP)

Clone ID       A    B    Total
Lines         15   10
POP            2    2
Total Size    30   20    50
15
Basic Case and Illustration
|S’| = (|S| - Total Clone Length) + Total Function Calls + Total Shared Function Size
     = 50 Lines + 4 Lines + 29 Lines
     = 83 Lines
Total Function Calls is 4 lines because each of the 4 clone occurrences becomes a 1-line call.

Function (Clone ID)      A    B    Total
Core Lines              15   10
Initialization Lines     1    1
Termination Lines        1    1
Total Size              17   12    29

Note: Initialization and Termination may each be configured to a value other than the default of 1 line.
16
Basic Case and Illustration
Lines of Code Reduced = |S| - |S’| = 100 - 83 = 17
17
Basic Case and Illustration
Percent Reduction = (Lines of Code Reduced / |S|) x 100 = (17 Lines / 100 Lines) x 100 = 17%
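The basic-case arithmetic (|S| = 100, clone A of 15 lines and clone B of 10 lines, each with POP 2) can be reproduced with a short sketch. The `Clone` class, the function name, and the parameter names are hypothetical; only the formula and the 1-line defaults come from the slides, and the sketch covers non-overlapping clones only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clone:
    length: int  # lines in one occurrence of the clone
    pop: int     # POP: number of times the clone appears


def estimate_merged_size(total_lines, clones,
                         call_lines=1, init_lines=1, term_lines=1):
    """Estimate |S'| for non-overlapping clones (basic case only)."""
    total_clone_length = sum(c.length * c.pop for c in clones)
    total_function_calls = sum(c.pop * call_lines for c in clones)
    total_shared_size = sum(c.length + init_lines + term_lines for c in clones)
    return (total_lines - total_clone_length
            + total_function_calls + total_shared_size)


size_before = 100
clones = [Clone(length=15, pop=2), Clone(length=10, pop=2)]  # A and B

size_after = estimate_merged_size(size_before, clones)
lines_reduced = size_before - size_after
percent_reduction = lines_reduced / size_before * 100

print(size_after, lines_reduced, percent_reduction)
```

Keeping the call, initialization, and termination costs as parameters mirrors the remark that these values are configurable rather than fixed at 1 line.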
18
Overlapping and Embedded Code Clones
- Sections of code identified as code clones that share a portion of their code with another unique code clone
- Not uncommon; must be accounted for
[Diagram: S (lines 1 to 100) with clone occurrences B: 15 Lines, A: 15 Lines, B: 15 Lines]
19
Overlapping and Embedded Code Clones
- We can no longer simply create a shared function for A and one for B
- We decide to use the “Chunking Method”
20
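Overlapping and Embedded Code Clones
[Diagram: Before chunking, S (|S| = 100, lines 1 to 100) contains B: 15 Lines, A: 15 Lines, B: 15 Lines, with a shared region C: 5 Lines. After chunking, it contains B’: 10 Lines, A’: 10 Lines, B’: 10 Lines, and C: 5 Lines.]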
21
Overlapping and Embedded Code Clones
- After creating “chunks”, we can create a shared method for each chunk
- Create calls as normal
- Overlaps increase |S’|, the number of lines required after merging
[Diagram: S (lines 1 to 100) with chunks B’: 10 Lines, A’: 10 Lines, B’: 10 Lines, C: 5 Lines]
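The slides do not spell out how chunks are computed, so the following is only one possible reading of the chunking idea: cut each clone where another clone's boundary falls inside it, then apply the same cuts to every occurrence of that clone. The function name, the `(clone_id, start, end)` representation, and the line positions are all assumptions made for illustration. The sketch makes a single pass and does not handle cuts that cascade from one clone to another.

```python
from collections import defaultdict


def chunk(occurrences):
    """Split clone occurrences into smaller line ranges at overlap boundaries.

    occurrences: list of (clone_id, start, end) with end exclusive; all
    occurrences of one clone_id are assumed to have the same length.
    Returns the sorted list of distinct (start, end) chunk ranges.
    """
    # Step 1: record where a foreign boundary falls strictly inside a clone,
    # as an offset relative to the start of that clone.
    cuts = defaultdict(set)
    for clone_id, start, end in occurrences:
        for _, other_start, other_end in occurrences:
            for boundary in (other_start, other_end):
                if start < boundary < end:
                    cuts[clone_id].add(boundary - start)

    # Step 2: apply each clone's cut offsets to all of its occurrences,
    # so that matching occurrences are split in the same places.
    pieces = set()
    for clone_id, start, end in occurrences:
        offsets = [0, *sorted(cuts[clone_id]), end - start]
        for lo, hi in zip(offsets, offsets[1:]):
            pieces.add((start + lo, start + hi))
    return sorted(pieces)


# Hypothetical layout: B alone, then A overlapping a second B by 5 lines.
occurrences = [("B", 10, 25), ("A", 50, 65), ("B", 60, 75)]
chunks = chunk(occurrences)
print(chunks)
print([hi - lo for lo, hi in chunks])
```

With this layout the chunk lengths are 5, 10, 10, 5, and 10 lines: the 5-line overlap appears twice (playing the role of C), the 10-line remainder of B appears twice (B’), and the 10-line remainder of A appears once (A’).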
22
CCM Size Estimation Prototype Tool
- The tool estimates system size after merging all code clones
- The tool takes CCFinderX output as part of its required input [6]
  - CCFinderX generates the clone pair data used by the algorithm
- The source code S is also required input
- Whitespace and comments are removed before running CCFinderX and the tool
[6] CCFinderX Official site, http://www.ccfinder.net/.
23
Application of the Tool
- Three example sources were used in applying the CCM prototype
  - Multilap.java
  - Java JDK [7]
  - Quake Engine [8]
- Java JDK and Quake Engine were chosen because of their large size
[7] Java SE | Oracle Technology Network | Oracle, http://www.oracle.com/technetwork/java/javase; Java SE Development Kit 8, Update 77 Release Notes, http://www.oracle.com/technetwork/java/javase/8u77-relnotes-2944725.html.
[8] GitHub - id-Software/Quake: Quake GPL Source Release, https://github.com/id-Software/Quake.
24
Multilap.java
- A control example that shows multiple overlapping code clones
- The calculations for it can be followed step by step in the paper
25
Java JDK (Java JDK 1.8.0_77-b03)
Code clone volume is calculated via: (Total Clone Length / |S|) x 100

Result Summary
Initial Size |S|           813,546 Lines
Total Clone Length         207,072 Lines
Code Clone Volume          25.45%
Reduced Size |S’|          708,139 Lines
Lines of Code Reduced      105,407 Lines
Percent Reduction          12.96%
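As a quick check, the two percentages in the Java JDK summary can be recomputed from the reported line counts. The function names are hypothetical; the formulas and the line counts are the ones given on the slides.

```python
def clone_volume(total_clone_length, total_lines):
    """Code clone volume: share of |S| covered by clones, in percent."""
    return total_clone_length / total_lines * 100


def percent_reduction(total_lines, merged_lines):
    """Percent reduction: (|S| - |S'|) / |S|, in percent."""
    return (total_lines - merged_lines) / total_lines * 100


# Line counts reported for the Java JDK.
jdk_size = 813_546
jdk_clone_length = 207_072
jdk_merged_size = 708_139

jdk_volume = clone_volume(jdk_clone_length, jdk_size)
jdk_reduction = percent_reduction(jdk_size, jdk_merged_size)

print(f"Code clone volume: {jdk_volume:.2f}%")
print(f"Percent reduction: {jdk_reduction:.2f}%")
```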
26
Java JDK
- Code clone volume: approx. 25%
- The most common POP is 2
- If we assume every clone has a POP of 2, the expected percent reduction would be about half of the code clone volume (12.73%)
- Actual reduction: 12.96%
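The "about half" expectation can be sketched from the basic-case formula. The symbols are introduced here for illustration and are not from the slides: $L_i$ is the length of clone $i$, $p_i$ is its POP, and the call, initialization, and termination costs are taken at their 1-line defaults. Overlapping clones are ignored.

```latex
\begin{align*}
\text{lines removed for clone } i &= p_i L_i \\
\text{lines added for clone } i &= \underbrace{p_i \cdot 1}_{\text{calls}}
    + \underbrace{L_i + 1 + 1}_{\text{shared function}} \\
\text{net reduction for clone } i &= p_i L_i - p_i - (L_i + 2)
    = (p_i - 1)\,L_i - p_i - 2 \\[4pt]
p_i = 2:\quad \text{net reduction} &= L_i - 4,
\qquad
\frac{\text{net reduction}}{\text{clone lines}} = \frac{L_i - 4}{2 L_i}
    \;\longrightarrow\; \frac{1}{2} \text{ as } L_i \text{ grows}
\end{align*}
```

For the basic case this gives $(15 - 4) + (10 - 4) = 17$ lines, matching the worked example. Under these assumptions, clones with POP greater than 2 would push the reduction above half of the clone volume, while the fixed 4-line overhead per clone would pull it below.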
27
Quake Engine

Result Summary
Initial Size |S|           216,722 Lines
Total Clone Length          49,098 Lines
Code Clone Volume          22.66%
Reduced Size |S’|          194,324 Lines
Lines of Code Reduced       22,398 Lines
Percent Reduction          10.33%
28
Quake Engine
- Code clone volume: approx. 22.66%
- A POP of 2 is again the most frequent, although to a lesser extent
- Expected reduction if every clone had a POP of 2: 11.33%
- Actual reduction: 10.33%
29
Conclusions
- Quantitative evaluation: what percentage of the source code could theoretically be reduced?
- The application results seem reasonable
  - Given the POP frequencies, the reduction seems consistent with what is expected
- Code clones with a POP of 2 were the most common in the large sources analyzed by the prototype