
1 Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Extracting Code Clones for Refactoring Using Combinations of Clone Metrics
Eunjong Choi†, Norihiro Yoshida‡, Takashi Ishio†, Katsuro Inoue†, and Tateki Sano*
†Osaka University, Japan; ‡Nara Institute of Science and Technology, Japan; *NEC Corporation, Japan

2 Background: Clone Set
A clone set is a set of code clones that are similar or identical to each other.
Clone sets in the example:
- S1 = {Code Clone 1, Code Clone 3}
- S2 = {Code Clone 2, Code Clone 4, Code Clone 5}
[Figure: Code Clones 1 to 5, linked by "similar" and "identical" relations]

3 Background: Refactoring Code Clone
Merge code clones into a single program unit.
[Figure: Code Clones 1, 2, and 3 before and after refactoring]

4 Background: Language-dependent Code Clone
A language-dependent code clone unavoidably exists in source code because of features of the programming language used.

    /* Code Clone A */
    replacement.setTaskType(taskType);
    replacement.setTaskName(taskName);
    replacement.setLocation(location);
    replacement.setOwningTarget(target);
    replacement.setRuntime (wrapper);
    wrapper.setProxy(replacement);
    /* … */

    /* Code Clone B */
    def.setName(name);
    def.setClassName(classname);
    def.setClass(cl);
    def.setAdapterClass(adapterClass);
    def.setAdaptToClass(adaptToClass);
    def.setClassLoader(al);
    /* … */

Example of language-dependent code clones (consecutive setter invocations)

5 Background: Clone Metrics [Higo2007]
Quantitative information on clone sets
- E.g., LEN(S), RNR(S), POP(S)
Purposes
- To check features of code clones in software
- To extract code clones for several purposes
  - E.g., refactoring, defect-prone code clones
[Higo2007] Yoshiki Higo, Toshihiro Kamiya, Shinji Kusumoto, Katsuro Inoue, "Method and Implementation for Investigating Code Clones in a Software System", Information and Software Technology, pp. 985-998 (2007-9)

6 Clone Metrics: LEN(S)
The average length of the token sequences of the code clones in a clone set S.
Example (clone set S): the token sequence [c c*] is detected as a code clone within a longer token sequence, so LEN(S) = 2.
A superscript * indicates that the token is in a repeated token sequence.

7 Clone Metrics: RNR(S)
The ratio of non-repeated token sequences in the code clones of a clone set S.
$\mathrm{RNR}(S) = \frac{\text{length of non-repeated token sequence}}{\text{length of whole token sequence}} \times 100$
Example (clone set S): the token sequence [c c*] is detected as a code clone, so $\mathrm{RNR}(S) = \frac{1}{2} \times 100 = 50$.

8 Clone Metrics: POP(S)
The number of code clones in a clone set S.
Example: a clone set S containing six code clones (numbered 1 to 6) has POP(S) = 6.
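The three metrics above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by the authors or by CCFinder: it assumes each code clone is given as a whitespace-separated token string in the slide notation, where a trailing `*` marks a token inside a repeated token sequence. The function names and the two-clone set `S` are hypothetical.

```python
from typing import List

Clone = List[str]  # one code clone as a list of tokens; "c*" marks a repeated token


def parse_clone(text: str) -> Clone:
    """Split a clone written in the slide notation, e.g. "c c*", into tokens."""
    return text.split()


def len_metric(clone_set: List[Clone]) -> float:
    """LEN(S): average number of tokens per code clone in the clone set."""
    return sum(len(clone) for clone in clone_set) / len(clone_set)


def rnr_metric(clone_set: List[Clone]) -> float:
    """RNR(S): percentage of tokens that are not marked as repeated."""
    whole = sum(len(clone) for clone in clone_set)
    non_repeated = sum(
        1 for clone in clone_set for token in clone if not token.endswith("*")
    )
    return 100.0 * non_repeated / whole


def pop_metric(clone_set: List[Clone]) -> int:
    """POP(S): number of code clones in the clone set."""
    return len(clone_set)


# Hypothetical clone set: two occurrences of the clone [c c*] from the slides.
S = [parse_clone("c c*"), parse_clone("c c*")]
print(len_metric(S), rnr_metric(S), pop_metric(S))  # 2.0 50.0 2
```

With this input, LEN(S) and RNR(S) reproduce the values 2 and 50 shown on the LEN and RNR slides; POP(S) is 2 only because the hypothetical set holds two clones.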

9 Single Clone Metric (1/2)
Clone sets whose RNR(S) is high
- They do not form a single semantic unit
  - Semantic unit: many instructions forming a single functionality

    /* Code Clone in a clone set whose RNR(S) is the second highest in Ant 1.7.0 */
    else {
        // is the zip file in the cache
        ZipFile zipFile = (ZipFile) zipFiles.get(file);
        if (zipFile == null) {
            zipFile = new ZipFile(file);
            zipFiles.put(file, zipFile);
        }
        ZipEntry entry = zipFile.getEntry(resourceName);
        if (entry != null) {

The clone is only a part of a semantic unit: not appropriate for refactoring!

10 Single Clone Metric (2/2)
Clone sets whose POP(S) is high
- They include many language-dependent code clones

    /* Code Clone in a clone set whose POP(S) is the highest in Ant 1.7.0 */
    out.println("\">");
    out.println("");
    out.print("<!ELEMENT project (target | ");
    out.print(TASKS);
    out.print(" | ");
    out.print(TYPES);

Not appropriate for refactoring!

11 Key Idea
Using just a single clone metric is not appropriate for extracting refactorable code clones
- According to our experience
We propose a method based on combined clone metrics
- To address the weakness of single-metric-based extraction

12 Combined Clone Metrics
Clone sets whose RNR(S) and POP(S) are both high
- Each code clone forms a single semantic unit

    /* Code Clone in a clone set whose RNR(S) and POP(S) are higher than others */
    if (ifProperty != null && p.getProperty(ifProperty) == null) {
        return false;
    } else if (unlessProperty != null && p.getProperty(unlessProperty) != null) {
        return false;
    }
    return true;
    }

Appropriate for refactoring!

13 Case Study (1/2)
Goal: validating our key idea
- Using combined clone metrics is a feasible method to extract code clones for refactoring
Target system
- Industrial Java software developed by NEC
- 110 KLOC, 736 clone sets

14 Case Study (2/2)
Experimental steps
1. Selected 62 clone sets from CCFinder's output using clone metrics.
2. Conducted a survey about these clone sets and got feedback from a developer.
[Figure: source files → CCFinder → clone sets selected using clone metrics → survey → feedback]

15 Subject Code Clones (1/2)
Clone sets for which a single clone metric value is high
- Clone sets whose LEN(S) value is in the top 10
- Clone sets whose RNR(S) value is in the top 10
- Clone sets whose POP(S) value is in the top 10

16 Subject Code Clones (2/2)
Clone sets for which combined clone metric values are high
- 15 clone sets whose LEN(S) and RNR(S) values both rank in the top 15
- 7 clone sets whose LEN(S) and POP(S) values both rank in the top 15
- 18 clone sets whose RNR(S) and POP(S) values both rank in the top 15
- 1 clone set whose LEN(S), RNR(S), and POP(S) values all rank in the top 15
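The slides do not spell out how "rank in the top 15" is computed. One plausible reading, sketched below in Python, is to take the clone sets that fall within the top k values of every metric in the combination, keeping ties at the cutoff (which would explain how 18 clone sets can qualify for a top-15 filter). The function names, the tie handling, and the sample metric values are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Set

# clone-set id -> metric name -> value (sample values are hypothetical)
Metrics = Dict[str, Dict[str, float]]


def top_k(metrics: Metrics, name: str, k: int) -> Set[str]:
    """Ids of clone sets whose `name` value is among the k highest (ties at the cutoff are kept)."""
    if k <= 0 or not metrics:
        return set()
    values = sorted((m[name] for m in metrics.values()), reverse=True)
    cutoff = values[min(k, len(values)) - 1]
    return {cid for cid, m in metrics.items() if m[name] >= cutoff}


def top_k_combined(metrics: Metrics, names: Iterable[str], k: int) -> Set[str]:
    """Ids of clone sets that are in the top k for every metric in `names`."""
    result = set(metrics)
    for name in names:
        result &= top_k(metrics, name, k)
    return result


metrics: Metrics = {
    "S1": {"LEN": 40, "RNR": 90, "POP": 2},
    "S2": {"LEN": 35, "RNR": 20, "POP": 9},
    "S3": {"LEN": 60, "RNR": 85, "POP": 6},
    "S4": {"LEN": 10, "RNR": 95, "POP": 7},
}

print(top_k_combined(metrics, ["LEN", "RNR"], 2))  # {'S1'}
print(top_k_combined(metrics, ["RNR", "POP"], 2))  # {'S4'}
```

Intersecting per-metric top-k sets keeps each metric's ranking independent, so a clone set must score well on every metric in the combination rather than on a weighted sum.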

17 Results of Case Study (1/2)

| Filtering | #Selected Clone Sets | #Refactoring | Precision |
|---|---|---|---|
| Each single clone metric | 30 | 14 | 0.47 |
| Combined clone metrics | 41 | 34 | 0.87 |

#Selected Clone Sets: the number of selected clone sets
#Refactoring: the number of clone sets marked as "Perform refactoring" in the survey

18 Results of Case Study (2/2)
Precision: "How many refactoring candidates were accepted by a developer?"
$\text{Precision} = \frac{\#\text{Refactoring}}{\#\text{Selected Clone Sets}}$
Clone sets selected by combined clone metrics were accepted as refactoring candidates by the developer more often than those selected by a single clone metric (precision 0.87 versus 0.47).

19 Summary and Future Work
Summary
- Our industrial case study shows that our key idea is appropriate.
Future work
- Investigate recall
- Conduct case studies of open source software
- Suggest a new metric

20 Thank You

21 Clone sets whose RNR(S) is higher than others
Each code clone in such a clone set S consists of more non-repeated token sequences.

    /* Code Clone in a clone set whose RNR(S) is the second highest in Ant 1.7.0 */
    else {
        // is the zip file in the cache
        ZipFile zipFile = (ZipFile) zipFiles.get(file);
        if (zipFile == null) {
            zipFile = new ZipFile(file);
            zipFiles.put(file, zipFile);
        }
        ZipEntry entry = zipFile.getEntry(resourceName);
        if (entry != null) {
    /* … */

22 Clone sets whose RNR(S) is lower than others
Each code clone consists of more repeated token sequences
- Such clone sets involve language-dependent code clones

    /* Code Clone in a clone set whose RNR(S) is the lowest in Ant 1.7.0 */
    String sosCmdDir = null;
    /* … code skipped … */
    private String filename = null;
    private boolean noCompress = false;
    private boolean noCache = false;
    private boolean recursive = false;
    private boolean verbose = false;
    /* … */

Consecutive variable declarations

23 Survey Format: About Clone Set XXX
(1) Do you think that this clone set needs a practice?
    [] Yes
    [] No (→ Jump to the next clone set)
(2) If you marked "Yes" in your answer to (1), what practice is appropriate for this clone set?
    [] Refactoring
    [] Write comments about the code clones, but don't perform refactoring.
    [] Change nothing.
    [] Others. ( )
(3) Write the reason for your answer to (2).
    Reason:

24 Results and Precision of Each Clone Set Group in the Survey

| Filtering | #Selected Clone Sets | #Refactoring | Precision |
|---|---|---|---|
| Clone sets whose LEN(S) value is in the top 10 | 10 | 7 | 0.70 |
| Clone sets whose RNR(S) value is in the top 10 | 10 | 4 | 0.40 |
| Clone sets whose POP(S) value is in the top 10 | 10 | 3 | 0.30 |
| Clone sets whose LEN(S) and RNR(S) values both rank in the top 15 | 15 | 13 | 0.87 |
| Clone sets whose LEN(S) and POP(S) values both rank in the top 15 | 7 | 6 | 0.86 |
| Clone sets whose RNR(S) and POP(S) values both rank in the top 15 | 18 | 14 | 0.78 |
| Clone set whose LEN(S), RNR(S), and POP(S) values all rank in the top 15 | 1 | 1 | 1.00 |
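The per-filter precision values can be recomputed from the counts on this slide. The short Python sketch below does that; the row labels are abbreviated, and the `ROWS` list and `precision` function are names chosen here for illustration only.

```python
from typing import List, Tuple

# (filter, #selected clone sets, #refactoring) as reported in the survey results table.
ROWS: List[Tuple[str, int, int]] = [
    ("LEN top 10", 10, 7),
    ("RNR top 10", 10, 4),
    ("POP top 10", 10, 3),
    ("LEN and RNR top 15", 15, 13),
    ("LEN and POP top 15", 7, 6),
    ("RNR and POP top 15", 18, 14),
    ("LEN, RNR and POP top 15", 1, 1),
]


def precision(selected: int, refactoring: int) -> float:
    """Share of selected clone sets that the developer marked for refactoring."""
    return refactoring / selected


for label, selected, refactoring in ROWS:
    print(f"{label}: {precision(selected, refactoring):.2f}")
```

Each printed value agrees with the Precision column above. Summing the three single-metric rows gives 14/30 ≈ 0.47, which matches the summary table on the results slides. Summing the four combined rows gives 34/41 ≈ 0.83, whereas the summary table reports 0.87; the slides do not explain the difference.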

25 Clone metric: RNR(S) (1/2)
Files:
- F1: a b c a b
- F2: c c* c* a b
- F3: d a b, e f
- F4: c c* d e f
A superscript * indicates that the token is in a repeated token sequence.
Clone set S1: {a b, a b, a b, a b}
RNR(S1) of clone set S1 is
$\mathrm{RNR}(S_1) = \frac{2 + 2 + 2 + 2}{2 + 2 + 2 + 2} \times 100 = 100$

26 Clone metric: RNR(S) (2/2)
Files:
- F1: a b c a b
- F2: c c* c* a b
- F3: d a b, e f
- F4: c c* d e f
A superscript * indicates that the token is in a repeated token sequence.
Clone set S2: {c c*, c* c*, c c*}
RNR(S2) of clone set S2 is
$\mathrm{RNR}(S_2) = \frac{1 + 0 + 1}{2 + 2 + 2} \times 100 = 33.3$
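Both worked examples follow the same pattern, which can be written as one general formula. In the sketch below, $\ell(f)$ is the number of tokens in code clone $f$ and $\ell_{\mathrm{nr}}(f)$ is the number of those tokens not marked with *; both symbols are introduced here for illustration and do not appear in the slides.

```latex
\mathrm{RNR}(S) = \frac{\sum_{f \in S} \ell_{\mathrm{nr}}(f)}{\sum_{f \in S} \ell(f)} \times 100
\qquad
\mathrm{RNR}(S_1) = \frac{2+2+2+2}{2+2+2+2} \times 100 = 100,
\quad
\mathrm{RNR}(S_2) = \frac{1+0+1}{2+2+2} \times 100 \approx 33.3
```

For S2, the clone c* c* contributes 0 to the numerator because both of its tokens lie in a repeated sequence.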

27 Subject Code Clones
62 clone sets
- Clone sets whose individual clone metric value is high
  - S_LEN: clone sets whose LEN(S) value is in the top 10
  - S_RNR: clone sets whose RNR(S) value is in the top 10
  - S_POP: clone sets whose POP(S) value is in the top 10
- Clone sets whose combined clone metric values are high
  - S_LEN∙RNR: 15 clone sets whose LEN(S) and RNR(S) values both rank in the top 15
  - S_LEN∙POP: 7 clone sets whose LEN(S) and POP(S) values both rank in the top 15
  - S_RNR∙POP: 18 clone sets whose RNR(S) and POP(S) values both rank in the top 15
  - S_LEN∙RNR∙POP: 1 clone set whose LEN(S), RNR(S), and POP(S) values all rank in the top 15

28 The Number of Duplicate Clone Sets
- |S_RNR ∩ S_POP ∩ S_RNR∙POP| = 1
- |S_RNR ∩ S_RNR∙POP| = 2
- |S_POP ∩ S_RNR∙POP| = 2
- |S_LEN∙RNR ∩ S_LEN∙POP ∩ S_RNR∙POP ∩ S_LEN∙RNR∙POP| = 1
CS Seminar, 2010/12/01

29 Example of a clone set that was not selected…
- It is too short to form a semantic unit.
- The RNR metric sometimes extracts unintended code clones
  - E.g., language-dependent code clones

    boolean isEqual(final DeweyDecimal other) {
        final int max = Math.max(other.components.length, components.length);
        for (int i = 0; i < max; i++) {
            final int component1 = (i < components.length) ? components[i] : 0;
            final int component2 = (i < other.components.length) ? other.components[i] : 0;
            if (

