
Aries: Refactoring Support Environment Based on Code Clone Analysis
Yoshiki Higo, Toshihiro Kamiya, Shinji Kusumoto, Katsuro Inoue
Graduate School of Information Science and Technology, Osaka University
PRESTO, Japan Science and Technology Agency
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Background
What is a code clone?
- A code fragment that has identical or similar fragments in the same or different files in a system.
- Introduced into the source program for various reasons, such as reusing code by copy-and-paste.
- Makes software maintenance more difficult.

Requirements for Code Clone Detection
The code clones that should be detected depend on the purpose of the analysis.
- To understand the amount and distribution of code clones, it is desirable to detect all code clones.
- To remove code clones (restructuring or refactoring), it is useful to detect the code clones that can be removed and whose removal improves software maintainability.

Research Objective and Approach
Objective: we aim to extract code clones that can be easily refactored.
Approach:
- To detect code clones efficiently, we use a code clone detection tool, CCFinder.
- We then extract the code clones that are easy to refactor and provide applicable refactoring patterns for them.
- Finally, we develop a refactoring support tool and apply it to an open source program.

Refactoring Process Support
Commonly used refactoring process:
- Step 1: Determine where refactoring should be applied
- Step 2: Determine which refactoring patterns can/should be applied
- Step 3: Investigate the effectiveness of the refactoring patterns
- Step 4: Modify source code
- Step 5: Conduct regression tests
The proposed method supports Steps 1 and 2.
- High scalability: it avoids high time complexity.
- Fine-grained detection: it detects code clones that are finer-grained than the method unit.

Outline of CCFinder
CCFinder compares source code directly, token by token, and detects code clones. Before matching, it applies:
- Normalization of name space
- Replacement of names defined by the user
- Removal of table initialization
- Consideration of module delimiters
CCFinder can analyze systems of millions of lines in practical time.

CCFinder: Clone Detection Process
Source files → Lexical analysis → Token sequence → Transformation → Transformed token sequence → Match detection → Clones on transformed sequence → Formatting → Clone pairs
Example source:
    static void foo() throws RESyntaxException {
        String a[] = new String [] { "123,400", "abc", "orange 100" };
        org.apache.regexp.RE pat = new org.apache.regexp.RE("[0-9,]+");
        int sum = 0;
        for (int i = 0; i < a.length; ++i)
            if (pat.match(a[i]))
                sum += Sample.parseNumber(pat.getParen(0));
        System.out.println("sum = " + sum);
    }
    static void goo(String [] a) throws RESyntaxException {
        RE exp = new RE("[0-9,]+");
        int sum = 0;
        for (int i = 0; i < a.length; ++i)
            if (exp.match(a[i]))
                sum += parseNumber(exp.getParen(0));
        System.out.println("sum = " + sum);
    }

Definitions: Clone Pair and Clone Set
- Clone pair: a pair of identical or similar fragments.
- Clone set: a set of identical or similar fragments.
- CCFinder detects code clones as clone pairs.
- After the detection process, clone pairs are transformed into clone sets.
Example with fragments C1 to C5:
- Clone pairs: (C1, C4), (C1, C5), (C2, C3), (C4, C5)
- Clone sets: {C1, C4, C5}, {C2, C3}
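The slide does not say how clone pairs are merged into clone sets. One plausible way is to treat each pair as an edge and group connected fragments with a union-find structure. The sketch below assumes fragments are identified by plain names; `CloneSetBuilder` and its methods are hypothetical and are not part of CCFinder or Aries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not CCFinder's or Aries's actual code.
class CloneSetBuilder {
    // Union-find forest: each fragment name points toward its representative.
    private final Map<String, String> parent = new HashMap<>();

    private String find(String fragment) {
        parent.putIfAbsent(fragment, fragment);
        String root = fragment;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        parent.put(fragment, root); // shorten the path for later lookups
        return root;
    }

    // Record that two fragments form a clone pair.
    void addPair(String a, String b) {
        String rootA = find(a);
        String rootB = find(b);
        if (!rootA.equals(rootB)) {
            parent.put(rootA, rootB);
        }
    }

    // Group all fragments by representative; each group is one clone set.
    List<Set<String>> cloneSets() {
        Map<String, Set<String>> byRoot = new HashMap<>();
        for (String fragment : new ArrayList<>(parent.keySet())) {
            byRoot.computeIfAbsent(find(fragment), k -> new TreeSet<>()).add(fragment);
        }
        return new ArrayList<>(byRoot.values());
    }

    static List<Set<String>> fromPairs(String[][] pairs) {
        CloneSetBuilder builder = new CloneSetBuilder();
        for (String[] pair : pairs) {
            builder.addPair(pair[0], pair[1]);
        }
        return builder.cloneSets();
    }
}
```

Feeding in the four clone pairs from the slide yields two groups, {C1, C4, C5} and {C2, C3}; the order in which the groups are returned is not specified.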

Extraction of code clones that are easily refactored
Structural code clones are regarded as the target of refactoring.
- Detect clone pairs with CCFinder.
- Transform the detected clone pairs into clone sets.
- Extract structural parts as structural code clones from the detected clone sets.
What is a structural code clone? Example for the Java language:
- Declaration: class declaration, interface declaration
- Method: method body, constructor, static initializer
- Statement: do, for, if, switch, synchronized, try, while

Example: code clones which CCFinder detects and code clones which the proposed method detects
fragment 1:
    reset();
    grammar = g;
    // Lookup make-switch threshold in the grammar generic options
    if (grammar.hasOption("codeGenMakeSwitchThreshold")) {
        try {
            makeSwitchThreshold = grammar.getIntegerOption("codeGenMakeSwitchThreshold");
            //System.out.println("setting codeGenMakeSwitchThreshold to " + makeSwitchThreshold);
        } catch (NumberFormatException e) {
            tool.error(
                "option 'codeGenMakeSwitchThreshold' must be an integer",
                grammar.getClassName(),
                grammar.getOption("codeGenMakeSwitchThreshold").getLine()
            );
        }
    }
    // Lookup bitset-test threshold in the grammar generic options
    if (grammar.hasOption("codeGenBitsetTestThreshold")) {
        try {
            bitsetTestThreshold = grammar.getIntegerOption("codeGenBitsetTestThreshold");
fragment 2:
    }
    // Lookup bitset-test threshold in the grammar generic options
    if (grammar.hasOption("codeGenBitsetTestThreshold")) {
        try {
            bitsetTestThreshold = grammar.getIntegerOption("codeGenBitsetTestThreshold");
            //System.out.println("setting codeGenBitsetTestThreshold to " + bitsetTestThreshold);
        } catch (NumberFormatException e) {
            tool.error(
                "option 'codeGenBitsetTestThreshold' must be an integer",
                grammar.getClassName(),
                grammar.getOption("codeGenBitsetTestThreshold").getLine()
            );
        }
    }
    // Lookup debug code-gen in the grammar generic options
    if (grammar.hasOption("codeGenDebug")) {
        Token t = grammar.getOption("codeGenDebug");
        if (t.getText().equals("true")) {
Legend: code clones which CCFinder detects; code clones which the proposed method detects.

Example: code clones which CCFinder detects
fragment 3:
    if ( inputState.guessing==0 ) {
        t=a.getText();
    }
    {
    _loop84:
    do {
        if ((LA(1)==COMMA)) {
            match(COMMA);
            id();
            if ( inputState.guessing==0 ) {
                t+=","+b.getText();
            }
        }
fragment 4:
    if ( inputState.guessing==0 ) {
        buf.append(a.getText());
    }
    {
    _loop144:
    do {
        if ((LA(1)==WILDCARD)) {
            match(WILDCARD);
            a=id();
            if ( inputState.guessing==0 ) {
                buf.append('.'); buf.append(a.getText());
            }
        }

Provision of applicable refactoring patterns
The following refactoring patterns [1][2] can be used to remove clone sets that include structural code clones:
- Extract Class
- Extract Method
- Extract Super Class
- Form Template Method
- Move Method
- Parameterize Method
- Pull Up Constructor
- Pull Up Method
For each clone set, the proposed method determines which refactoring pattern is applicable by using several metrics.
[1]: M. Fowler: Refactoring: Improving the Design of Existing Code, Addison-Wesley,
[2]:

Metrics (1): Volume Metrics for Clone Set: LEN, POP, DFL
- LEN(S): the average length of the token sequences in a clone set S.
- POP(S): the number of elements (code fragments) in a clone set S.
- DFL(S): an estimate of how many tokens would be removed from the source files if all code fragments in a clone set S were replaced with caller statements of a new subroutine.
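A small sketch can make the three volume metrics concrete. LEN and POP follow the definitions directly. The slide gives no formula for DFL, so the estimate below is an assumption: every fragment is replaced by a caller statement costing a fixed number of tokens (`callerTokens`, a hypothetical parameter), and one new subroutine of average length LEN is added. `VolumeMetrics` is an illustrative name, not part of Aries.

```java
// Hypothetical helper for illustration; the DFL estimate is an assumption, not the tool's formula.
class VolumeMetrics {
    // POP(S): number of code fragments in the clone set.
    static int pop(int[] fragmentLengths) {
        return fragmentLengths.length;
    }

    // LEN(S): average token-sequence length over the fragments.
    static double len(int[] fragmentLengths) {
        long total = 0;
        for (int length : fragmentLengths) {
            total += length;
        }
        return (double) total / fragmentLengths.length;
    }

    // Assumed DFL(S): tokens removed = all fragment tokens
    // minus one caller statement per fragment minus the new subroutine body.
    static double dfl(int[] fragmentLengths, int callerTokens) {
        long total = 0;
        for (int length : fragmentLengths) {
            total += length;
        }
        return total - (double) pop(fragmentLengths) * callerTokens - len(fragmentLengths);
    }
}
```

For three fragments of 30 tokens each and an assumed 5-token caller statement, this gives POP = 3, LEN = 30 and an estimated DFL of 90 - 15 - 30 = 45 tokens.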

Metrics (2): Coupling Metrics for Clone Set: NRV, NSV
- NRV(S): the average number of externally defined variables referred to in the fragments of a clone set S.
- NSV(S): the average number of externally defined variables assigned to in the fragments of a clone set S.
Definition:
- Clone set S includes fragments f_1, f_2, ..., f_n.
- s_i is the number of externally defined variables which fragment f_i refers to.
- t_i is the number of externally defined variables which fragment f_i assigns to.
- $NRV(S) = \frac{1}{n} \sum_{i=1}^{n} s_i$ and $NSV(S) = \frac{1}{n} \sum_{i=1}^{n} t_i$
Fragment f_1 (b is a reference, a is an assignment):
    int a, b;
    …
    if( … ){
        … ;
        … = b;
        a = … ;
        … ;
    }
    …
Example:
- Clone set S includes fragments f_1 and f_2.
- In fragment f_1, the externally defined variable b is referred to and a is assigned to.
- Fragment f_2 is the same as f_1.
- Then NRV(S) = (1 + 1) / 2 = 1 and NSV(S) = (1 + 1) / 2 = 1.
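The averages can be sketched in a few lines once the externally defined variables of each fragment are known. The sketch assumes that this information has already been extracted by some analysis, which is not shown; `CoupledFragment` and `CouplingMetrics` are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of one fragment: the external variables it reads and writes.
class CoupledFragment {
    final Set<String> referred;
    final Set<String> assigned;

    CoupledFragment(String[] referred, String[] assigned) {
        this.referred = new HashSet<>(Arrays.asList(referred));
        this.assigned = new HashSet<>(Arrays.asList(assigned));
    }
}

class CouplingMetrics {
    // NRV(S): average number of externally defined variables referred to.
    static double nrv(List<CoupledFragment> cloneSet) {
        double sum = 0;
        for (CoupledFragment f : cloneSet) {
            sum += f.referred.size();
        }
        return sum / cloneSet.size();
    }

    // NSV(S): average number of externally defined variables assigned to.
    static double nsv(List<CoupledFragment> cloneSet) {
        double sum = 0;
        for (CoupledFragment f : cloneSet) {
            sum += f.assigned.size();
        }
        return sum / cloneSet.size();
    }
}
```

With two fragments that each refer to `b` and assign to `a`, both metrics evaluate to 1, matching the example on the slide.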

Metrics (3): Inheritance Metric for Clone Set: DCH
DCH(S): represents the position of, and distance between, the fragments of a clone set S in the class hierarchy.
Definition:
- Clone set S includes fragments f_1, f_2, ..., f_n.
- Fragment f_i exists in class C_i.
- Class C_p is the common parent class of C_1, C_2, ..., C_n that is located lowest in the class hierarchy.
- If no common parent class of C_1, C_2, ..., C_n exists, the value of DCH(S) is -1.
- This metric is measured only for the class hierarchy of the target software.
Example 1 (fragments f_1 and f_2 both in class A):
- Clone set S includes fragments f_1 and f_2.
- If all fragments of clone set S are included in the same class, then DCH(S) = 0.
Example 2 (class A with child classes B and C):
- Clone set S includes fragments f_1 and f_2.
- If all fragments of clone set S are included in a class and its direct child classes, then DCH(S) = 1.
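The slide text gives only the boundary cases of DCH. The sketch below assumes that DCH(S) is the largest distance from any class C_i up to the lowest common parent C_p, which is consistent with the examples (0, 1 and -1) but is not stated in the transcript. The class hierarchy is modeled as a child-to-parent map that contains only classes of the target software; `InheritanceMetric` is a hypothetical name.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; assumes single inheritance and an acyclic parent map.
class InheritanceMetric {
    // Number of steps from a class up to an ancestor, or -1 if it is not an ancestor.
    static int distance(String from, String ancestor, Map<String, String> parentOf) {
        int steps = 0;
        for (String current = from; current != null; current = parentOf.get(current)) {
            if (current.equals(ancestor)) {
                return steps;
            }
            steps++;
        }
        return -1;
    }

    // Assumed DCH(S): maximum distance from the fragments' classes to their lowest common parent.
    static int dch(List<String> classesOfFragments, Map<String, String> parentOf) {
        // Walk up from the first class; the first candidate reachable from every class is C_p.
        for (String candidate = classesOfFragments.get(0); candidate != null;
                candidate = parentOf.get(candidate)) {
            int max = 0;
            boolean common = true;
            for (String c : classesOfFragments) {
                int d = distance(c, candidate, parentOf);
                if (d < 0) {
                    common = false;
                    break;
                }
                max = Math.max(max, d);
            }
            if (common) {
                return max;
            }
        }
        return -1; // no common parent inside the target software
    }
}
```

With `B` and `C` both extending `A`, fragments in `A` and `A` give 0, fragments in `B` and `C` give 1, and a fragment in an unrelated class gives -1.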

Aries: Refactoring Support Tool
Overview:
- Target: Java programs
- Runtime environment: JDK 1.4 or above
Implementation:
- Analysis component: Java, 32,000 lines
  - CCFinder is used as the code clone detection component.
  - JavaCC is used to construct the syntax and semantic analysis component.
- GUI component: Java, 14,000 lines
  - The user can specify target clone sets through GUI operations.

Case Study: Ant
Overview:
- Ant is a build tool like 'make'.
Input for Aries:
- Source files of Ant: 627
- LOC: about 180,000
Result:
- It took 30 seconds to extract structural code clones.
- We got 151 clone sets.
Environment:
- OS: FreeBSD 4.9
- CPU: Xeon 2.8 GHz x 2
- Memory: 4 GB

Case Study: Ant, Extract Method (conditions)
To apply the 'Extract Method' pattern, we filtered clone sets using the following conditions:
- The unit of the clone is a statement (do, for, if, …).
- DCH(S) = 0: all fragments of the clone set are included in the same class.
- NSV(S) < 2: each fragment of the clone set assigns a value to at most one externally defined variable.
32 clone sets satisfied these conditions.
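The three conditions amount to a simple predicate over per-clone-set metrics. The following sketch assumes the metric values have already been computed; `CloneSetMetrics` and `ExtractMethodFilter` are hypothetical names and do not reflect how Aries stores or queries its data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical record of the metrics the filter needs for one clone set.
class CloneSetMetrics {
    enum Unit { DECLARATION, METHOD, STATEMENT }

    final String id;
    final Unit unit;
    final int dch;
    final double nsv;

    CloneSetMetrics(String id, Unit unit, int dch, double nsv) {
        this.id = id;
        this.unit = unit;
        this.dch = dch;
        this.nsv = nsv;
    }
}

class ExtractMethodFilter {
    // Conditions from the slide: statement unit, DCH(S) = 0, NSV(S) < 2.
    static boolean isCandidate(CloneSetMetrics s) {
        return s.unit == CloneSetMetrics.Unit.STATEMENT && s.dch == 0 && s.nsv < 2.0;
    }

    static List<CloneSetMetrics> filter(List<CloneSetMetrics> all) {
        List<CloneSetMetrics> candidates = new ArrayList<>();
        for (CloneSetMetrics s : all) {
            if (isCandidate(s)) {
                candidates.add(s);
            }
        }
        return candidates;
    }
}
```

A clone set is kept only when all three conditions hold, so a statement-level clone spread over sibling classes (DCH = 1) or one that assigns to two external variables would be dropped.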

Case Study: Ant, Extract Method (result)
The 32 clone sets can be categorized as follows:
- No parameter, no return value: 3
- Addition of some parameters, no return value: 18
- Addition of some parameters and a return value: 7
- Others: 4
Example fragments:
    if (!isChecked()) {
        // make sure we don't have a circular reference here
        Stack stk = new Stack();
        stk.push(this);
        dieOnCircularReference(stk, getProject());
    }
    if (iSaveMenuItem == null) {
        try {
            iSaveMenuItem = new MenuItem();
            iSaveMenuItem.setLabel("Save BuildInfo To Repository");
        } catch (Throwable iExc) {
            handleException(iExc);
        }
    }
Slide annotation: assignment
    if (name == null) {
        if (other.name != null) {
            return false;
        }
    } else if (!name.equals(other.name)) {
        return false;
    }
    // javacopts
    if (javacopts != null && !javacopts.equals("")) {
        genicTask.createArg().setValue("-javacopts");
        genicTask.createArg().setLine(javacopts);
    }
Slide annotation: local variable

Conclusion
We have:
- proposed a refactoring support method,
- implemented a refactoring support tool, Aries,
- conducted a case study on Ant, an open source program, in which most of the filtered clone sets could be removed.

Future Work
As future work, we are going to:
- evaluate, from the viewpoint of software quality, whether or not each refactoring should be done (support for Step 3),
- find groups of clone sets that can be refactored at once, to conduct refactoring more effectively.
Commonly used refactoring process:
- Step 1: Determine where refactoring should be applied
- Step 2: Determine which refactoring patterns can/should be applied
- Step 3: Investigate the effectiveness of the refactoring patterns
- Step 4: Modify source code
- Step 5: Conduct regression tests


Code clone detection for refactoring: Related Work
Detecting similar sub-graphs of a program dependency graph as clones [1]:
- High accuracy: this approach finds data dependences and control dependences in source code.
- High time complexity: it takes O(n^2) time to construct the program dependency graph.
Detecting similar methods and functions as clones using metrics [2]:
- Low accuracy: if the target method or function is small, the metric values make no difference.
- Detection unit restriction: only clones at the method and function unit can be detected.
[1] R. Komondoor and S. Horwitz, "Using slicing to identify duplication in source code", In Proc. of the 8th International Symposium on Static Analysis, Paris, France, July 16-18,
[2] Magdalena Balazinska, Ettore Merlo, Michel Dagenais, Bruno Lague, and Kostas Kontogiannis, "Advanced Clone-Analysis to Support Object-Oriented System Refactoring", WCRE 2000, pp

The difference between 'diff' and clone detection tools
- Diff finds the longest common sub-string. Given a code portion, diff does not report two or more identical code portions (clones).
- A clone detection tool finds all identical or similar code portions.

Suffix tree
A suffix tree is a tree that satisfies the following conditions:
1. A leaf node represents the starting position of a sub-string.
2. A path from the root node to a leaf node represents a sub-string.
3. The first characters of the labels of all the edges from one node are different from each other.
→ A common path means a clone.
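The idea that a shared path corresponds to a repeated token sequence can be illustrated without building a full suffix tree. The sketch below swaps in a simpler technique: it sorts all suffix start positions and compares neighbours, which finds the same longest repeated sequence but is far slower than a real suffix tree on large inputs. `RepeatedTokenFinder` is a hypothetical name, and overlapping repeats are not filtered out.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; uses sorted suffixes instead of a suffix tree.
class RepeatedTokenFinder {
    // Length of the common prefix of the suffixes starting at i and j.
    static int commonPrefix(String[] tokens, int i, int j) {
        int k = 0;
        while (i + k < tokens.length && j + k < tokens.length
                && tokens[i + k].equals(tokens[j + k])) {
            k++;
        }
        return k;
    }

    // Lexicographic comparison of two suffixes, token by token.
    static int compareSuffixes(String[] tokens, int i, int j) {
        int k = commonPrefix(tokens, i, j);
        boolean endI = i + k == tokens.length;
        boolean endJ = j + k == tokens.length;
        if (endI && endJ) {
            return 0;
        }
        if (endI) {
            return -1;
        }
        if (endJ) {
            return 1;
        }
        return tokens[i + k].compareTo(tokens[j + k]);
    }

    // Longest token sequence that occurs at least twice in the input.
    static String[] longestRepeat(String[] tokens) {
        Integer[] starts = new Integer[tokens.length];
        for (int i = 0; i < starts.length; i++) {
            starts[i] = i;
        }
        Arrays.sort(starts, (a, b) -> compareSuffixes(tokens, a, b));
        int best = 0;
        int bestStart = 0;
        // Suffixes sharing a long prefix end up adjacent after sorting.
        for (int r = 1; r < starts.length; r++) {
            int length = commonPrefix(tokens, starts[r - 1], starts[r]);
            if (length > best) {
                best = length;
                bestStart = starts[r];
            }
        }
        return Arrays.copyOfRange(tokens, bestStart, bestStart + best);
    }
}
```

For the token sequence `a b c x a b c y`, the two suffixes that start with `a b c` become neighbours after sorting, so the reported repeat is `a b c`.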

Example of transformation rules in Java
- All identifiers defined by the user are transformed to the same token.
- A unique identifier is inserted at each end of the top-level definitions and declarations.
  - This prevents detecting clones that begin in the middle of one class definition and end in the middle of another.
- "java.lang.Math.PI" is transformed to "Math.PI".
  - With an import statement, a class can be referred to by either its full package name or a shorter name.
- "new int[] {1, 2, 3}" is transformed to "new int[] {$}".
  - This eliminates table initialization code.
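The first rule can be illustrated with a small tokenizer. This is a rough sketch under strong assumptions: it uses a regular expression instead of a real lexer, knows only a handful of keywords, and treats every other identifier as user-defined, replacing it with the placeholder `$`. `IdentifierNormalizer` is a hypothetical name and does not reproduce CCFinder's actual transformation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not CCFinder's lexer or rule set.
class IdentifierNormalizer {
    // Deliberately incomplete keyword list, enough for the examples here.
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
            "if", "else", "for", "while", "do", "int", "static", "void",
            "new", "return", "throws", "try", "catch"));

    // Identifiers, integer literals, simple string literals, or any single symbol.
    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_]*|\\d+|\"[^\"]*\"|\\S");

    // Split the source into tokens and replace non-keyword identifiers with "$".
    static List<String> normalize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(source);
        while (matcher.find()) {
            String token = matcher.group();
            boolean isIdentifier = Character.isJavaIdentifierStart(token.charAt(0))
                    && !KEYWORDS.contains(token);
            tokens.add(isIdentifier ? "$" : token);
        }
        return tokens;
    }
}
```

After normalization, `if (pat.match(a[i]))` and `if (exp.match(a[i]))` produce the same token sequence, which is why fragments that differ only in variable names can be matched as clones.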

The output of CCFinder
    #version: ccfinder 3.1
    #langspec: JAVA
    #option: -b 30,1
    #option: -k +
    #option: -r abcdfikmnprsv
    #option: -c wfg
    #begin{file description}
    C:\Gemini.java
    C:\GeneralManager.java
    :
    #end{file description}
    #begin{clone}
    ,9 63, ,9 553, ,9 63, ,9 633, ,9 152, ,9 216,51 42
    :
    #end{clone}
Annotations on the slide:
- Object file ID (file 0 in Group 0)
- Location of a clone pair (Lines in file 0.1 and Lines in file 1.10 are identical or similar to each other)
It is difficult to analyze source code using only this text-based information about the location of clone pairs.

The analysis of comparison among students (non-gapped clones only)
The corresponding code:
- A (2 students): similar code fragments came from the source code of a sample compiler described in the textbook.
- B (4 students): many code fragments were similar even with respect to the names of variables or comments.

Clone class metrics
- LEN(C): length of the token sequence of each element in clone class C.
- LNR(C): length of the non-repetitive token sequence of LEN(C).
- POP(C): number of elements in clone class C.
- DFL(C): estimate of how many tokens would be removed from source files when all code fragments of clone class C are replaced with caller statements of a new identical routine.
- RAD(C): distribution in the file system of the elements in clone class C.

Comparison with the AST approach
Features of the AST approach:
- Extracts identical sub-trees of the AST as clones.
- The result is precise because of strict syntax analysis.
- High space and time complexity.
Features of our approach:
- A hybrid of CCFinder's quick but inaccurate clone detection and CCShaper's filtering, which considers syntax structure.

The other approaches
AST (abstract syntax tree) approach:
- Clone = the same sub-trees in an AST
- Deep dependence on the programming language
PDG (program dependency graph) approach:
- Clone = the same sub-graph in a PDG
- Graph comparison is difficult
Code metric approach:
- Clone = the routines which have the same metric values
- Severe restriction in granularity
CCFinder & CCShaper:
- Clone = the code fragments which have the same syntax structure
- Limited precision

Why I chose "a"
I selected the clones by the following criteria:
- All clone code fragments appear in the same class.
- The metric LEN is high.
- The code fragment includes a whole method body.

Metrics (3): Inheritance Metric for Clone Set: DCH
Example 3 (fragment f_1 in class A, fragment f_2 in class B):
- Clone set S includes fragments f_1 and f_2.
- If the classes which include f_1 and f_2 do not have a common parent class, then DCH(S) = -1.