Published by Joella Jennings. Modified over 9 years ago.
1
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Finding Code Clones for Refactoring with Clone Metrics: A Case Study of Open Source Software
Eunjong Choi†, Norihiro Yoshida‡, Takashi Ishio†, Katsuro Inoue†, and Tateki Sano*
† Osaka University, Japan
‡ Nara Institute of Science and Technology, Japan
* NEC Corporation, Japan
2
Contents
1. Background
2. Clone Metrics
3. Industrial Case Study
4. Case Study of Open Source Software
5. Summary and Future Work
3
Background: Code Clones
- Code clone: an identical or similar code fragment in source code.
- The presence of code clones is an indication of low software maintainability: if a bug is found in one code clone, the other code clones have to be checked for the same defect.
(Figure label: Similar)
4
Background: Refactoring [Fowler1999] (1/2)
- Refactoring is the process of restructuring existing code.
- It alters the software's internal structure without changing its external behavior.
- It improves the maintainability of the software.
[Fowler1999] M. Fowler, et al., Refactoring: Improving the Design of Existing Code, Addison-Wesley, 1999.
5
Background: Refactoring [Fowler1999] (2/2)
- Refactoring code clones: merge the code clones into a single program unit.
(Figure: after refactoring, each code clone is replaced by a call statement.)
[Fowler1999] M. Fowler, et al., Refactoring: Improving the Design of Existing Code, Addison-Wesley, 1999.
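As an illustration of merging code clones into a single program unit, here is a minimal sketch. Every name in it (MergeClonesSketch, labelBefore, normalize, labelAfter) is hypothetical and does not come from the slides or from any of the studied systems.

```java
// Hypothetical example: none of these names come from the slides.
class MergeClonesSketch {

    // Before refactoring: the normalization logic is duplicated (two code clones).
    static String labelBefore(String taskName, String ownerName) {
        String task = (taskName == null) ? "" : taskName.trim().toLowerCase();
        String owner = (ownerName == null) ? "" : ownerName.trim().toLowerCase();
        return task + "@" + owner;
    }

    // After refactoring: the clones are merged into one program unit...
    static String normalize(String text) {
        return (text == null) ? "" : text.trim().toLowerCase();
    }

    // ...and each former clone becomes a call statement.
    static String labelAfter(String taskName, String ownerName) {
        return normalize(taskName) + "@" + normalize(ownerName);
    }

    public static void main(String[] args) {
        // Both versions should print the same label (external behavior is unchanged).
        System.out.println(labelBefore(" Build ", null));
        System.out.println(labelAfter(" Build ", null));
    }
}
```

The external behavior stays the same, while a future bug fix in the normalization logic only has to be made in one place.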
6
Background: Language-dependent Code Clone
- Some code clones unavoidably exist in source code because of the specifications of the programming language used.
Example of a language-dependent code clone (consecutive setter invocations):
    replacement.setTaskType(taskType);
    replacement.setTaskName(taskName);
    replacement.setLocation(location);
    replacement.setOwningTarget(target);
    replacement.setRuntime(wrapper);
    wrapper.setProxy(replacement);
7
Background: Clone Set
- A clone set is a set of code clones.
(Figure: Code Clone 1, Code Clone 2, and Code Clone 3 form one clone set.)
8
Background: Clone Metrics [Higo2007]
- Quantitative information on clone sets, e.g., LEN(S), RNR(S), POP(S).
- Purposes:
  - To check the features of code clones in software.
  - To extract code clones for several purposes, e.g., the code clones with the greatest length.
[Higo2007] Y. Higo, T. Kamiya, S. Kusumoto, K. Inoue, "Method and Implementation for Investigating Code Clones in a Software System", Information and Software Technology, pp. 985-998 (2007-9)
9
Clone Metrics: LEN(S)
- The average length of the token sequences of the code clones in a clone set S.
Example: in clone set S, the token sequence [a b b] is detected as a code clone, so LEN(S) = 3.
10
Clone Metrics: RNR(S)
- The ratio of non-repeated token sequences of the code clones in a clone set S.
- Selecting clone sets with a high RNR value eliminates language-dependent code clones.
$\mathrm{RNR}(S) = \frac{\text{length of non-repeated token sequence}}{\text{length of whole token sequence}} \times 100$
Example: for clone set S with the token sequence [a b b], $\mathrm{RNR}(S) = \frac{1}{3} \times 100 = 33.3$.
11
Clone Metrics: POP(S)
- The number of code clones in a clone set S.
Example: clone set S contains three code clones (1, 2, 3), so POP(S) = 3.
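The LEN(S) and POP(S) definitions can be sketched in a few lines of Java. This is only an illustration under assumptions: a clone set is modeled as a list of token lists, and the class and method names (LenPopSketch, len, pop) are hypothetical, not part of CCFinder or any tool named in the slides.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model: a clone set is a list of code clones,
// and each code clone is a list of tokens.
class LenPopSketch {

    // LEN(S): average length of the token sequences of the clones in S.
    static double len(List<List<String>> cloneSet) {
        return cloneSet.stream().mapToInt(List::size).average().orElse(0.0);
    }

    // POP(S): number of code clones in S.
    static int pop(List<List<String>> cloneSet) {
        return cloneSet.size();
    }

    public static void main(String[] args) {
        // Assumed example: three clones, each with the token sequence [a b b].
        List<List<String>> s = Arrays.asList(
                Arrays.asList("a", "b", "b"),
                Arrays.asList("a", "b", "b"),
                Arrays.asList("a", "b", "b"));
        System.out.println("LEN(S) = " + len(s));
        System.out.println("POP(S) = " + pop(s));
    }
}
```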
12
Single Clone Metric (1/3)
- Clone sets whose LEN(S) is higher include many consecutive if (or if-else) blocks that involve similar but different conditional expressions.
Code clone in a clone set whose POP(S) is the highest in Ant 1.7.0:
    if ((p = getProject().getProperty("ant.netrexxc.binary")) != null) {
        this.binary = Project.toBoolean(p);
    }
    // classpath makes no sense
    if ((p = getProject().getProperty("ant.netrexxc.comments")) != null) {
        this.comments = Project.toBoolean(p);
    }
    … (the last part is omitted) …
13
Single Clone Metric (2/3)
- Clone sets whose RNR(S) is higher do not form a single semantic unit.
- Semantic unit: a group of instructions that forms a single functionality.
Code clone in a clone set whose RNR(S) is the second highest in Ant 1.7.0 (only a part of a semantic unit):
    else {
        // is the zip file in the cache
        ZipFile zipFile = (ZipFile) zipFiles.get(file);
        if (zipFile == null) {
            zipFile = new ZipFile(file);
            zipFiles.put(file, zipFile);
        }
        ZipEntry entry = zipFile.getEntry(resourceName);
        if (entry != null) {
14
Single Clone Metric (3/3)
- Clone sets whose POP(S) is higher include many language-dependent code clones.
Code clone in a clone set whose POP(S) is higher than others:
    out.println("\">");
    out.println("");
    out.print("<!ELEMENT project (target | ");
    out.print(TASKS);
    out.print(" | ");
    out.print(TYPES);
15
Key Idea
- According to our experience, it is not appropriate to extract code clones for refactoring using just a single clone metric.
- We propose a method based on combined clone metrics to overcome the weakness of single-metric-based extraction.
16
Combined Clone Metrics
- In clone sets whose RNR(S) and POP(S) are higher, each code clone forms a single semantic unit.
Code clone in a clone set whose RNR(S) and POP(S) are higher than others:
        if (ifProperty != null && p.getProperty(ifProperty) == null) {
            return false;
        } else if (unlessProperty != null && p.getProperty(unlessProperty) != null) {
            return false;
        }
        return true;
    }
Appropriate for refactoring!
17
Industrial Case Study (1/2)
- Goal: validate our key idea that using combined clone metrics is a feasible method to extract code clones for refactoring.
- Target system: industrial Java software developed by NEC (110 KLOC, 736 clone sets).
18
Industrial Case Study (2/2)
Experimental steps:
1. Selected 62 clone sets from CCFinder's output using clone metrics.
2. Conducted a survey about these clone sets and got feedback from a developer.
(Figure: source files -> CCFinder -> clone sets -> selection using clone metrics -> survey -> feedback.)
19
Subject Code Clones (1/2)
Clone sets for which a single clone metric value is high:
- S_LEN: the 10 clone sets with the highest LEN(S) values
- S_RNR: the 10 clone sets with the highest RNR(S) values
- S_POP: the 10 clone sets with the highest POP(S) values
20
Subject Code Clones (2/2)
Clone sets whose combined clone metric values are high:
- S_LENRNR: 15 clone sets whose LEN(S) and RNR(S) values rank high, in the top 15
- S_LENPOP: 7 clone sets whose LEN(S) and POP(S) values rank high, in the top 15
- S_RNRPOP: 18 clone sets whose RNR(S) and POP(S) values rank high, in the top 15
- S_LENRNRPOP: 1 clone set whose LEN(S), RNR(S), and POP(S) values rank high, in the top 15
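One possible reading of the combined selection is "take the highest-ranked clone sets for each metric, then intersect the results". The sketch below follows that reading only as an assumption: the slides do not state how the "top 15" cutoff is defined (a rank count, a percentage, or something else), and a strict rank count could not yield 18 clone sets, so the cutoff n is left as a parameter. All names and the sample values are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.ToDoubleFunction;
import java.util.stream.Collectors;

// Hypothetical sketch of combined-metric filtering; not the authors' implementation.
class CombinedMetricFilter {

    // A clone set with its three metric values (field names are assumptions).
    static final class CloneSet {
        final String id;
        final double len;
        final double rnr;
        final int pop;

        CloneSet(String id, double len, double rnr, int pop) {
            this.id = id;
            this.len = len;
            this.rnr = rnr;
            this.pop = pop;
        }
    }

    // IDs of the n clone sets with the highest value of the given metric.
    static Set<String> topN(List<CloneSet> sets, ToDoubleFunction<CloneSet> metric, int n) {
        return sets.stream()
                .sorted(Comparator.comparingDouble(metric).reversed())
                .limit(n)
                .map(cs -> cs.id)
                .collect(Collectors.toSet());
    }

    // Clone sets that rank high for both metrics.
    static Set<String> intersect(Set<String> first, Set<String> second) {
        Set<String> result = new HashSet<>(first);
        result.retainAll(second);
        return result;
    }

    public static void main(String[] args) {
        // Made-up metric values, for illustration only.
        List<CloneSet> sets = Arrays.asList(
                new CloneSet("A", 50, 90, 2),
                new CloneSet("B", 40, 30, 9),
                new CloneSet("C", 10, 95, 8),
                new CloneSet("D", 5, 20, 3));
        Set<String> byLen = topN(sets, cs -> cs.len, 2);
        Set<String> byRnr = topN(sets, cs -> cs.rnr, 2);
        Set<String> byPop = topN(sets, cs -> cs.pop, 2);
        System.out.println("S_LENRNR = " + intersect(byLen, byRnr));
        System.out.println("S_RNRPOP = " + intersect(byRnr, byPop));
    }
}
```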
21
In Survey: About Clone Set XXX
Q. Which practice is appropriate for this clone set?
[ ] Perform refactoring.
[ ] Write comments about the code clones, but don't perform refactoring.
[ ] Change nothing.
[ ] Others. ( )
22
In Survey: About Clone Set XXX
A check (√) on "Perform refactoring" means the clone set is appropriate for refactoring.
23
In Survey: About Clone Set XXX
A check (√) on any of the other three options ("Write comments about the code clones, but don't perform refactoring", "Change nothing", "Others") means the clone set is inappropriate for refactoring.
24
Results of Case Study (1/2)
Filtering                 #Selected Clone Sets  #Refactoring  Precision
Each single clone metric  30                    14            0.47
Combined clone metrics    41                    34            0.87
#Selected Clone Sets: the number of selected clone sets
#Refactoring: the number of clone sets marked as "Perform refactoring" in the survey
25
Results of Case Study (2/2)
Precision: "How many refactoring candidates were accepted by a developer?"
$\text{Precision} = \frac{\#\text{Refactoring}}{\#\text{Selected Clone Sets}}$
- Clone sets selected with combined clone metrics were accepted as refactoring candidates by the developer more often (precision 0.87) than those selected with a single clone metric (precision 0.47).
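The precision definition amounts to a single division; a minimal sketch follows, with a hypothetical class and method name. It is applied here to the single-metric row of the industrial case study (14 of 30 selected clone sets marked "Perform refactoring").

```java
// Hypothetical helper; the name PrecisionSketch is not from the slides.
class PrecisionSketch {

    // Precision = #Refactoring / #Selected Clone Sets.
    static double precision(int refactoring, int selected) {
        if (selected <= 0) {
            throw new IllegalArgumentException("selected must be positive");
        }
        return (double) refactoring / selected;
    }

    public static void main(String[] args) {
        // Single-metric row of the industrial case study: 14 accepted out of 30 selected.
        System.out.printf("%.2f%n", precision(14, 30));
    }
}
```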
26
Case Study of Open Source Software
- Goal: validate, using open source software, our key idea that using combined clone metrics is a feasible method to extract code clones for refactoring.
Experimental steps:
1. Selected clone sets from CCFinder's output using clone metrics.
2. Checked whether the clone sets are appropriate for refactoring.
27
Target Systems
Implemented in Java:
- Apache Ant: 198 KLOC, 998 clone sets
- JBoss: 633 KLOC, 4284 clone sets
28
Subject Clone Sets
- Apache Ant: 87 clone sets
- JBoss: 299 clone sets
Selection criteria:
- Clone sets for which a single clone metric value is in the top 10
- Clone sets whose combined clone metric values rank high, in the top 15
29
Subject Code Clones (Apache Ant)
Filtering                 #Selected Clone Sets  #Refactoring  Precision
Each single clone metric  30                    6             0.20
Combined clone metrics    60                    31            0.53
30
Subject Code Clones (JBoss)
Filtering                 #Selected Clone Sets  #Refactoring  Precision
Each single clone metric  30                    9             0.30
Combined clone metrics    298                   76            0.25
Q. Why do the results differ between the two systems? Is it because the open source software does not enforce coding rules?
31
Analysis of Results: Defects of the RNR Metric (1/2)
- The RNR metric sometimes extracts unintentional code clones, e.g., language-dependent code clones.
32
Analysis of Results: Defects of the RNR Metric (2/2)
Code clone in a clone set whose LEN(S) and RNR(S) (= 96) values rank high, in the top 15, in JBoss:
    lIndex = lReturn.indexOf( "*" );
    while( lIndex >= 0 ) {
        lReturn = ( lIndex > 0 ? lReturn.substring( 0, lIndex ) : "" ) + "%2a" + ( ( lIndex + 1 ) < lReturn.length() ? lReturn.substring( lIndex + 1 ) : "" );
        lIndex = lReturn.indexOf( "*" );
    }
    lIndex = lReturn.indexOf( ":" );
    while( lIndex >= 0 ) {
        lReturn = ( lIndex > 0 ? lReturn.substring( 0, lIndex ) : "" ) + "%3a" + ( ( lIndex + 1 ) < lReturn.length() ? lReturn.substring( lIndex + 1 ) : "" );
        lIndex = lReturn.indexOf( ":" );
    }
33
Analysis of Results: Defects of the RNR Metric (2/2)
Question about the code clone on the previous slide: is the value of RNR really 96?
35
Analysis of Results: Defects of the RNR Metric (2/2)
- Code clone in a clone set whose LEN(S) and RNR(S) (= 96) values rank high, in the top 15, in JBoss.
- Annotation on the RNR value of this clone set: RNR(S) (= 50).
36
Summary and Future Work
Summary:
- We conducted a case study to validate our key idea and discussed its results.
Future work:
- Update the metrics used.
- Investigate recall.
- Use more metrics.
- Conduct case studies of open source software.
37
Thank You for Your Attention!
38
Example of a clone set that is not selected
- It is too short to form a semantic unit.
- The RNR metric sometimes extracts unintentional code clones, e.g., language-dependent code clones.
    boolean isEqual(final DeweyDecimal other) {
        final int max = Math.max(other.components.length, components.length);
        for (int i = 0; i < max; i++) {
            final int component1 = (i < components.length) ? components[ i ] : 0;
            final int component2 = (i < other.components.length) ? other.components[ i ] : 0;
            if (
39
Clone sets whose RNR(S) is higher than others
- Each code clone in a clone set S consists of more non-repeated token sequences.
    /* Code clone in a clone set whose RNR(S) is the second highest in Ant 1.7.0 */
    else {
        // is the zip file in the cache
        ZipFile zipFile = (ZipFile) zipFiles.get(file);
        if (zipFile == null) {
            zipFile = new ZipFile(file);
            zipFiles.put(file, zipFile);
        }
        ZipEntry entry = zipFile.getEntry(resourceName);
        if (entry != null) {
    /* … */
40
Clone sets whose RNR(S) is lower than others
- They consist of more repeated token sequences.
- They involve language-dependent code clones (here, consecutive variable declarations).
    /* Code clone in a clone set whose RNR(S) is the lowest in Ant 1.7.0 */
    String sosCmdDir = null;
    …… skip code….
    private String filename = null;
    private boolean noCompress = false;
    private boolean noCache = false;
    private boolean recursive = false;
    private boolean verbose = false;
    /* … */
41
Clone Metric: RNR(S) (1/2)
Files:
    F1: a b c a b
    F2: c c* c* a b
    F3: d a b e f
    F4: c c* d e f
A superscript * indicates that the token is in a repeated token sequence.
Clone set S1: {a b, a b, a b, a b}
$\mathrm{RNR}(S_1) = \frac{2 + 2 + 2 + 2}{2 + 2 + 2 + 2} \times 100 = 100$
42
Clone Metric: RNR(S) (2/2)
Files:
    F1: a b c a b
    F2: c c* c* a b
    F3: d a b e f
    F4: c c* d e f
A superscript * indicates that the token is in a repeated token sequence.
Clone set S2: {c c*, c* c*, c c*}
$\mathrm{RNR}(S_2) = \frac{1 + 0 + 1}{2 + 2 + 2} \times 100 = 33.3$
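The two RNR calculations can be reproduced with a small sketch. It assumes each code clone is written as a whitespace-separated token string in which a trailing * marks a token that belongs to a repeated token sequence, mirroring the notation on the slides. The names (RnrSketch, Clone, cloneSet, rnr) are hypothetical, and detecting which tokens are repeated is outside the scope of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the RNR(S) calculation; not the authors' implementation.
class RnrSketch {

    // A code clone: its token count and how many of its tokens are non-repeated.
    static final class Clone {
        final int length;
        final int nonRepeated;

        // Parses e.g. "c c*": a trailing '*' marks a token in a repeated sequence.
        Clone(String fragment) {
            String[] tokens = fragment.trim().split("\\s+");
            int count = 0;
            for (String token : tokens) {
                if (!token.endsWith("*")) {
                    count++;
                }
            }
            this.length = tokens.length;
            this.nonRepeated = count;
        }
    }

    static List<Clone> cloneSet(String... fragments) {
        List<Clone> clones = new ArrayList<>();
        for (String fragment : fragments) {
            clones.add(new Clone(fragment));
        }
        return clones;
    }

    // RNR(S) = (sum of non-repeated token counts / sum of token counts) * 100.
    static double rnr(List<Clone> cloneSet) {
        int whole = 0;
        int nonRepeated = 0;
        for (Clone clone : cloneSet) {
            whole += clone.length;
            nonRepeated += clone.nonRepeated;
        }
        return whole == 0 ? 0.0 : 100.0 * nonRepeated / whole;
    }

    public static void main(String[] args) {
        System.out.println("RNR(S1) = " + rnr(cloneSet("a b", "a b", "a b", "a b")));
        System.out.println("RNR(S2) = " + rnr(cloneSet("c c*", "c* c*", "c c*")));
    }
}
```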
43
The Number of Duplicate Clone Sets (Industrial)
|S_RNR ∩ S_POP ∩ S_RNR∙POP| = 1
|S_RNR ∩ S_RNR∙POP| = 2
|S_POP ∩ S_RNR∙POP| = 2
|S_LEN∙RNR ∩ S_LEN∙POP ∩ S_RNR∙POP ∩ S_LEN∙RNR∙POP| = 1
CS Seminar 2010/12/01
44
The Number of Duplicate Clone Sets (Apache Ant)
|S_RNR ∩ S_RNR∙POP| = 1
|S_POP ∩ S_RNR∙POP| = 1
|S_POP ∩ S_LEN∙POP| = 1
45
The Number of Duplicate Clone Sets (JBoss)
|S_RNR ∩ S_LEN∙RNR| = 3
|S_RNR ∩ S_RNR∙POP| = 1
|S_LEN∙RNR ∩ S_LEN∙POP ∩ S_RNR∙POP ∩ S_LEN∙RNR∙POP| = 2
46
Clone Set Metrics
- LEN(C): length of the token sequence of each element in clone set C
- POP(C): number of elements in clone set C
- RAD(C): distribution in the file system of the elements in clone set C
- DFL(C): estimate of how many tokens would be removed from the source files if all code fragments of clone set C were replaced with caller statements of a new identical routine
(Figure: new subroutine and caller statements.)
47
Results and Precision of Each Clone Set in the Survey
Filtering (clone sets whose ...)                  #Selected Clone Sets  #Refactoring  Precision
LEN(S) value is in the top 10                     10                    7             0.70
RNR(S) value is in the top 10                     10                    4             0.40
POP(S) value is in the top 10                     10                    3             0.30
LEN(S) and RNR(S) values are in the top 15        15                    13            0.87
LEN(S) and POP(S) values are in the top 15        7                     6             0.86
RNR(S) and POP(S) values are in the top 15        18                    14            0.78
LEN(S), RNR(S), and POP(S) values are in top 15   1                     1             1.00
48
Subject Code Clones (Apache Ant)
Clone Sets   #Selected Clone Sets  #Refactoring  Precision
S_LEN        10                    0             0.00
S_RNR        10                    6             0.60
S_POP        10                    0             0.00
S_LENRNR     8                     6             0.75
S_LENPOP     18                    9             0.50
S_RNRPOP     34                    16            0.47
S_LENRNRPOP  -                     -             -
49
Subject Code Clones (JBoss)
Clone Sets   #Selected Clone Sets  #Refactoring  Precision
S_LEN        10                    2             0.20
S_RNR        10                    7             0.60
S_POP        10                    0             0.00
S_LENRNR     63                    37            0.59
S_LENPOP     104                   5             0.05
S_RNRPOP     129                   32            0.25
S_LENRNRPOP  2                     2             1.00