2002/12/11 PROFES2002
On software maintenance process improvement based on code clone analysis
Yoshiki Higo*, Yasushi Ueda*, Toshihiro Kamiya**, Shinji Kusumoto*, Katsuro Inoue*
* Graduate School of Information Science and Technology, Osaka University
** PRESTO, Japan Science and Technology Corporation

Contents
Background: Code Clone
Objective
Code Clone Detection Tool: CCFinder
Proposed Clone Removal Technique
Case Studies
Summaries and Future Works

Background: Code Clone
A code clone is a code portion in source files that is identical or similar to another.
(Figure: clone pair and clone class.)
Code clones are one of the factors that make software maintenance more difficult.
If a fault is found in a code clone, it must be corrected in all of its clones.

Code Clone Detection Tool: CCFinder
We have been developing a code clone detection tool, CCFinder.
We have delivered CCFinder to software companies and evaluated its usefulness through several case studies.

Case studies of CCFinder
Open source software: JDK libraries (Java, 570 KLOC); Linux, FreeBSD (C, MLOC); FreeBSD, OpenBSD, NetBSD (C); Qt (C++, 240 KLOC)
Commercial software (about 30 companies): NTT Data Corp., Hitachi Ltd., Hitachi GP, NEC soft Ltd., ASTEC Inc., SRA Inc., NASDA, Daiwa Computer, etc.
Student exercises at Osaka University
Filed in court as evidence in a software copyright suit

Purpose of our research
As a practical application of CCFinder, we want to use code clone analysis in the refactoring process.
However, the code clones detected by CCFinder are sequences of tokens, so they are not appropriate to be directly replaced by one module (subroutine, function, etc.).

Objective
We propose a method that extracts, from the code clones detected by CCFinder, those that are well suited to the refactoring process ([Extract Method], [Pull Up Method])*.
We apply the proposed method to open source software and evaluate its usefulness.
* M. Fowler: Refactoring: Improving the Design of Existing Code, Addison-Wesley, 1999.

Extract Method
Before:
    void methodA(int i) {
        methodZ();
        System.out.println("name:" + name);
        System.out.println("amount:" + i);
    }
    void methodB(int i) {
        methodY();
        System.out.println("name:" + name);
        System.out.println("amount:" + i);
    }
After:
    void methodA(int i) {
        methodZ();
        methodC(i);
    }
    void methodB(int i) {
        methodY();
        methodC(i);
    }
    void methodC(int i) {
        System.out.println("name:" + name);
        System.out.println("amount:" + i);
    }

Pull Up Method
(Figure: class A, class B, class C and method A, before and after the method is pulled up.)

Outline of CCFinder
The clone detection process consists of four steps.
Step 1: Lexical analysis (source files → token sequence)
Step 2: Transformation (token sequence → transformed token sequence)
Step 3: Match detection (transformed token sequence → clones on transformed sequence)
Step 4: Formatting (clones on transformed sequence → clone pairs)
Target programs: C/C++, Java, FORTRAN, COBOL, LISP, plain text

Process of CCFinder (1/4)
(Figure: CCFinder pipeline, steps Lexical analysis and Transformation.)
Example source files:
    static void foo() throws RESyntaxException {
        String a[] = new String [] { "123,400", "abc", "orange 100" };
        org.apache.regexp.RE pat = new org.apache.regexp.RE("[0-9,]+");
        int sum = 0;
        for (int i = 0; i < a.length; ++i)
            if (pat.match(a[i]))
                sum += Sample.parseNumber(pat.getParen(0));
        System.out.println("sum = " + sum);
    }
    static void goo(String [] a) throws RESyntaxException {
        RE exp = new RE("[0-9,]+");
        int sum = 0;
        for (int i = 0; i < a.length; ++i)
            if (exp.match(a[i]))
                sum += parseNumber(exp.getParen(0));
        System.out.println("sum = " + sum);
    }

Process of CCFinder (3/4)
(Figure: CCFinder pipeline from source files to clone pairs.)


Process of CCFinder (4/4)
(Figure: CCFinder pipeline from source files to clone pairs.)
    static void foo() throws RESyntaxException {
        String a[] = new String [] { "123,400", "abc", "orange 100" };
        org.apache.regexp.RE pat = new org.apache.regexp.RE("[0-9,]+");
        int sum = 0;
        for (int i = 0; i < a.length; ++i)
            if (pat.match(a[i]))
                sum += Sample.parseNumber(pat.getParen(0));
        System.out.println("sum = " + sum);
    }
    static void goo(String [] a) throws RESyntaxException {
        RE exp = new RE("[0-9,]+");
        int sum = 0;
        for (int i = 0; i < a.length; ++i)
            if (exp.match(a[i]))
                sum += parseNumber(exp.getParen(0));
        System.out.println("sum = " + sum);
    }

Issues in refactoring process
Since the code clones detected by CCFinder are sequences of tokens, they are not appropriate to be directly replaced by one module (subroutine, function, etc.).
Some of them do not suit refactoring.

Example 1 (Code clones including needless statements for refactoring)
        righttokennumber = c.getEndNumber() - c.getStartNumber() + 1;
    }
    string getLeftClone() const {
        char temp[STRLENGTH];
        snprintf(temp, STRLENGTH, "%s\t%d,%d,%d\t%d,%d,%d\t", leftID.c_str(),
                 leftstartline, leftstartcolumn, leftstartnumber,
                 leftendline, leftendcolumn, leftendnumber);
        string clone(temp);
        return clone;
    }
    string getRightClone() const {
        char temp[STRLENGTH];
        snprintf(temp, STRLENGTH, "%s\t%d,%d,%d\t%d,%d,%d\t", rightID.c_str(),
                 rightstartline, rightstartcolumn, rightstartnumber,
                 rightendline, rightendcolumn, rightendnumber);
        string clone(temp);
        return clone;
    }
    int getLeftTokenNumber() const {
        return lefttokennumber;
    }

Example 1 (Code clones including needless statements for refactoring)
Only the marked parts should be detected.

Example 2 (Code clones not suited to refactoring)
CCFinder extracts the marked parts as code clones, but these are not suited to refactoring.

Outline of proposed method
It extracts meaningful code clones from the output of CCFinder.
(Figure: source files → CCFinder → clone data → Filter → meaningful clone data → GUI interface)

Processes executed by the filter
Syntax analysis unit: It performs syntax analysis on the source code that includes code clones.
Clone extraction unit: It extracts meaningful code clones from the result of syntax analysis and the output of CCFinder.
Clone management unit: It sorts and merges the code clones detected by the clone extraction unit.
(Figure: output of CCFinder and source files → syntax analysis unit, clone extraction unit, clone management unit → output)
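
The clone extraction step can be pictured as intersecting CCFinder's token ranges with the block boundaries found by syntax analysis. The following is a minimal sketch under stated assumptions: the class `BlockFilter`, the `Range` type, the inclusive token positions, and the `minTokens` parameter are all hypothetical and are not taken from CCShaper.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a filter that keeps only whole blocks inside a clone. */
final class BlockFilter {
    /** Inclusive token range [start, end]; an assumed representation. */
    static final class Range {
        final int start;
        final int end;

        Range(int start, int end) {
            this.start = start;
            this.end = end;
        }

        /** True when the other range lies entirely inside this one. */
        boolean contains(Range other) {
            return start <= other.start && other.end <= end;
        }

        int length() {
            return end - start + 1;
        }
    }

    /** Keeps the syntactic blocks that lie entirely inside the clone and are long enough. */
    static List<Range> blocksInside(Range clone, List<Range> blocks, int minTokens) {
        List<Range> kept = new ArrayList<>();
        for (Range block : blocks) {
            if (clone.contains(block) && block.length() >= minTokens) {
                kept.add(block);
            }
        }
        return kept;
    }
}
```

With a clone covering tokens 10 to 100 and candidate blocks 5 to 40, 20 to 90 and 30 to 35, a threshold of 50 tokens keeps only the block 20 to 90. A block that merely overlaps the clone is dropped, which is the sense in which a token sequence is "shaped" into a unit that could become a method.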

Implementation of proposed method
CCShaper (Code Clone Shaper)
CCShaper extracts meaningful blocks from the output of CCFinder.
Target program: Java
Description language: C++
Source size: about 4000 LOC
Working environment: Windows 2000/XP

Outline of experiments
We conducted two experiments on the Java source code of two open source programs, ANTLR and Ant.
Experimental environment: Pentium 4 1.5 GHz, 512 MB SDRAM
We deal with code clones that include more than 50 tokens.
(Figure: source code of ANTLR and Ant → CCFinder → code clones → CCShaper → meaningful code clones)

Two criteria
A clone pair is a pair of code portions that are clones of each other.
A clone class is a collection of code portions that are all clones of each other.
In this experiment, we use two criteria: the number of clone pairs and the number of clone classes.
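
To make the relation between the two criteria concrete, clone pairs can be treated as links between fragments and clone classes as the groups those links form. This is an illustrative sketch only: the class `CloneClasses`, the integer fragment ids, and the union-find grouping are assumptions, not how CCFinder or CCShaper counts classes.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that groups code fragments into clone classes from clone pairs. */
final class CloneClasses {
    private final Map<Integer, Integer> parent = new HashMap<>();

    /** Finds the representative fragment of the class containing x. */
    private int find(int x) {
        parent.putIfAbsent(x, x);
        int root = x;
        while (true) {
            int p = parent.get(root);
            if (p == root) {
                break;
            }
            root = p;
        }
        parent.put(x, root); // shortcut x directly to its representative
        return root;
    }

    /** Records that fragments a and b form a clone pair. */
    void addPair(int a, int b) {
        int ra = find(a);
        int rb = find(b);
        if (ra != rb) {
            parent.put(ra, rb);
        }
    }

    /** Returns the clone classes; each list holds fragments linked by clone pairs. */
    Collection<List<Integer>> classes() {
        Map<Integer, List<Integer>> byRoot = new HashMap<>();
        for (Integer fragment : new ArrayList<>(parent.keySet())) {
            byRoot.computeIfAbsent(find(fragment), k -> new ArrayList<>()).add(fragment);
        }
        return byRoot.values();
    }
}
```

Adding the pairs (1, 2), (2, 3) and (7, 8) yields two classes, {1, 2, 3} and {7, 8}. Three mutually similar fragments form three clone pairs but only one clone class, so the pair count grows much faster than the class count for widely copied code.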

Experiment 1 (ANTLR) (1/3)
ANTLR is implemented in Java, and generates parsers in either Java or C.
Source files. Size: about LOC.
Analysis time: about 2 minutes
Result (with CCShaper relative to without CCShaper):
Number of clone pairs: 1/400
Number of clone classes: 1/7

Experiment 1 (ANTLR) (2/3)
Comparison on scatter plots
(Figure: scatter plots without CCShaper and with CCShaper; the selected code clone is marked (a).)

Experiment 1 (ANTLR) (3/3)
Source code of the selected code clone (a):
    public final void mOPEN_ELEMENT_OPTION(boolean _createToken)
            throws RecognitionException, CharStreamException, TokenStreamException {
        int _ttype;
        Token _token = null;
        int _begin = text.length();
        _ttype = OPEN_ELEMENT_OPTION;
        int _saveIndex;
        match('<');
        if (_createToken && _token == null && _ttype != Token.SKIP) {
            _token = makeToken(_ttype);
            _token.setText(new String(text.getBuffer(), _begin, text.length() - _begin));
        }
        _returnToken = _token;
    }
Only the marked portions differ from the other clones.
This code clone appears in 20 places in ANTLR.
All of the code clones are methods in the same class.
These methods can be merged into one method by adding 2 arguments.
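
A merge of this kind might look like the sketch below. It assumes, since the slide's markings are not preserved here, that the two varying parts are the token type and the matched character. The class `MergedRuleSketch`, the token type values, the second rule `mCLOSE_ELEMENT_OPTION`, and the string return value are hypothetical simplifications, not ANTLR's code.

```java
/** Hypothetical, simplified lexer showing cloned rules merged into one method. */
final class MergedRuleSketch {
    static final int OPEN_ELEMENT_OPTION = 1;  // assumed token type value
    static final int CLOSE_ELEMENT_OPTION = 2; // assumed token type value

    private final String input;
    private int pos = 0;
    private final StringBuilder text = new StringBuilder();

    MergedRuleSketch(String input) {
        this.input = input;
    }

    /** Consumes one expected character or fails, similar in spirit to match('<'). */
    private void match(char expected) {
        if (pos >= input.length() || input.charAt(pos) != expected) {
            throw new IllegalStateException("expected '" + expected + "' at position " + pos);
        }
        text.append(input.charAt(pos++));
    }

    /** Single method replacing the cloned rules; ttype and expected are the two added arguments. */
    String matchSingleChar(boolean createToken, int ttype, char expected) {
        int begin = text.length();
        match(expected);
        return createToken ? ttype + ":" + text.substring(begin) : null;
    }

    // The former clones shrink to one-line delegations.
    String mOPEN_ELEMENT_OPTION(boolean createToken) {
        return matchSingleChar(createToken, OPEN_ELEMENT_OPTION, '<');
    }

    String mCLOSE_ELEMENT_OPTION(boolean createToken) {
        return matchSingleChar(createToken, CLOSE_ELEMENT_OPTION, '>');
    }
}
```

Each former clone keeps its name and signature, so callers are unaffected, while the shared body exists only once.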

Experiment 2 (Ant) (1/4)
Ant is a Java-based build tool.
689 source files. Size: about LOC.
Analysis time: about 5 seconds
Result (with CCShaper relative to without CCShaper):
Number of clone pairs: 1/80
Number of clone classes: 1/13

Experiment 2 (Ant) (2/4)
Comparison on scatter plots
(Figure: scatter plots without CCShaper and with CCShaper; the selected code clone is marked (b).)

Experiment 2 (Ant) (3/4)
Source code of the selected code clone (b):
    public void getAutoresponse(Commandline cmd) {
        if (m_AutoResponse == null) {
            cmd.createArgument().setValue(FLAG_AUTORESPONSE_DEF);
        } else if (m_AutoResponse.equalsIgnoreCase("Y")) {
            cmd.createArgument().setValue(FLAG_AUTORESPONSE_YES);
        } else if (m_AutoResponse.equalsIgnoreCase("N")) {
            cmd.createArgument().setValue(FLAG_AUTORESPONSE_NO);
        } else {
            cmd.createArgument().setValue(FLAG_AUTORESPONSE_DEF);
        } // end of else
    }
These clones are verbatim copies of each other.
These clones appear in seven classes.
These seven classes inherit from the same class.
These methods can be merged into one method by pulling them up to the parent class.

Experiment 2 (Ant) (4/4)
Class diagram (before refactoring): MSVSS is the parent of MSVSSADD, MSVSSCHECKIN, MSVSSCHECKOUT, MSVSSCP, MSVSSCREATE, MSVSSGET and MSVSSLABEL; each of the seven subclasses defines getAutoresponse(Commandline cmd).
Class diagram (after refactoring): the same hierarchy, with getAutoresponse(Commandline cmd) defined once in MSVSS.
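
The refactored hierarchy can be sketched in code as follows. This is a simplified stand-in, not Ant's source: `Commandline` and `Argument` are reduced to the two calls used by the clone, the flag strings are placeholders, and only two of the seven subclasses are shown.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the hierarchy after Pull Up Method. */
final class PullUpSketch {
    /** Minimal stand-in for a command line that collects argument values. */
    static final class Commandline {
        final List<String> values = new ArrayList<>();

        Argument createArgument() {
            return new Argument(this);
        }
    }

    /** Minimal stand-in for a single command line argument. */
    static final class Argument {
        private final Commandline owner;

        Argument(Commandline owner) {
            this.owner = owner;
        }

        void setValue(String value) {
            owner.values.add(value);
        }
    }

    /** Parent class: the formerly duplicated method now lives only here. */
    static class MSVSS {
        static final String FLAG_AUTORESPONSE_DEF = "<default>"; // placeholder value
        static final String FLAG_AUTORESPONSE_YES = "<yes>";     // placeholder value
        static final String FLAG_AUTORESPONSE_NO = "<no>";       // placeholder value

        String m_AutoResponse;

        public void getAutoresponse(Commandline cmd) {
            if (m_AutoResponse == null) {
                cmd.createArgument().setValue(FLAG_AUTORESPONSE_DEF);
            } else if (m_AutoResponse.equalsIgnoreCase("Y")) {
                cmd.createArgument().setValue(FLAG_AUTORESPONSE_YES);
            } else if (m_AutoResponse.equalsIgnoreCase("N")) {
                cmd.createArgument().setValue(FLAG_AUTORESPONSE_NO);
            } else {
                cmd.createArgument().setValue(FLAG_AUTORESPONSE_DEF);
            }
        }
    }

    // Subclasses no longer declare getAutoresponse; they inherit it.
    static class MSVSSADD extends MSVSS { }

    static class MSVSSGET extends MSVSS { }
}
```

Because the seven copies were identical and already shared a parent, the move needs no new parameters; a later fix to the auto-response logic is then made in one place.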

Summaries
We have developed a filtering tool (CCShaper) which extracts code clones that are well suited to refactoring.
We have evaluated the usefulness of CCShaper by applying it to actual Java programs.

Future works
We are going to apply CCShaper to commercial software, extend it to other programming languages, and develop a filtering method that extracts code clones even better suited to refactoring.


The web page of CCFinder/Gemini is available at u.ac.jp/cdtools/index.html.en

The difference between 'diff' and clone detection tools
Diff finds the longest common subsequence. Given a code portion, diff does not report two or more identical code portions (clones).
A clone detection tool finds all identical or similar code portions.

Suffix tree
A suffix tree is a tree that satisfies the following conditions.
1. A leaf node represents the starting position of a substring.
2. A path from the root node to a leaf node represents a substring.
3. The first characters of the labels of all edges leaving one node differ from each other.
→ A common path means a clone.
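
The idea that a shared path is a clone can be illustrated without building the tree itself. The sketch below swaps in a simpler technique: it sorts all suffixes of a token list and compares neighbours, which surfaces the same repeated runs a suffix tree would expose as common paths, though far less efficiently. The class `RepeatedRuns` and its result format are assumptions for illustration, and it reports overlapping and non-maximal runs that a real tool would filter out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: finds repeated token runs by sorting suffixes (not a suffix tree). */
final class RepeatedRuns {
    /** Length of the common prefix of the suffixes starting at i and j. */
    private static int commonPrefix(List<String> t, int i, int j) {
        int n = 0;
        while (i + n < t.size() && j + n < t.size() && t.get(i + n).equals(t.get(j + n))) {
            n++;
        }
        return n;
    }

    /** Lexicographic comparison of two suffixes of the token list. */
    private static int compareSuffixes(List<String> t, int i, int j) {
        int n = commonPrefix(t, i, j);
        boolean endI = i + n == t.size();
        boolean endJ = j + n == t.size();
        if (endI || endJ) {
            return Boolean.compare(endJ, endI); // the shorter suffix sorts first
        }
        return t.get(i + n).compareTo(t.get(j + n));
    }

    /** Returns {startA, startB, length} for repeated runs of at least minLength tokens. */
    static List<int[]> find(List<String> tokens, int minLength) {
        List<Integer> suffixes = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            suffixes.add(i);
        }
        suffixes.sort((a, b) -> compareSuffixes(tokens, a, b));

        List<int[]> runs = new ArrayList<>();
        for (int k = 1; k < suffixes.size(); k++) {
            int a = suffixes.get(k - 1);
            int b = suffixes.get(k);
            int length = commonPrefix(tokens, a, b);
            if (length >= minLength) {
                runs.add(new int[] {Math.min(a, b), Math.max(a, b), length});
            }
        }
        return runs;
    }
}
```

For the tokens `a b c x a b c y` and a minimum length of 3, the only reported run starts at positions 0 and 4 with length 3. The minimum length plays the role of a token threshold such as the 50 tokens used in the experiments.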

Example of transformation rules in Java
All user-defined identifiers are transformed to the same token.
A unique identifier is inserted at each end of the top-level definitions and declarations. This prevents detecting clones that begin in the middle of one class definition and end in the middle of another.
"java.lang.Math.PI" is transformed to "Math.PI". With an import statement, a class can be referred to by either its full package name or a shorter name.
"new int[] {1, 2, 3}" is transformed to "new int[] {$}". This eliminates table initialization code.
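
The first rule can be sketched as below. This is a rough illustration, not CCFinder's lexer: the class `TokenNormalizer`, the `$` placeholder for identifiers, the tiny keyword set, and the regular expression are all assumptions, and every non-keyword identifier is treated as user-defined.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of identifier normalization before match detection. */
final class TokenNormalizer {
    // A small subset of Java keywords that are kept as-is; a real lexer would know them all.
    private static final Set<String> KEYWORDS =
            Set.of("int", "for", "if", "new", "static", "void", "return");

    // Identifier, or number literal, or string literal, or any other single non-space character.
    private static final Pattern TOKEN = Pattern.compile(
            "[A-Za-z_][A-Za-z_0-9]*|\\d+|\"(?:[^\"\\\\]|\\\\.)*\"|\\S");

    /** Splits source text into tokens and replaces each non-keyword identifier with "$". */
    static List<String> normalize(String source) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            String token = m.group();
            char first = token.charAt(0);
            boolean identifier = Character.isLetter(first) || first == '_';
            out.add(identifier && !KEYWORDS.contains(token) ? "$" : token);
        }
        return out;
    }
}
```

After normalization, `if (pat.match(a[i]))` and `if (exp.match(a[i]))` produce the same token list, which is why `foo` and `goo` in the earlier example can be reported as clones despite their different variable names.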

The output of CCFinder
    #version: ccfinder 3.1
    #langspec: JAVA
    #option: -b 30,1
    #option: -k +
    #option: -r abcdfikmnprsv
    #option: -c wfg
    #begin{file description}
    C:\Gemini.java
    C:\GeneralManager.java
    :
    #end{file description}
    #begin{clone}
    ,9 63, ,9 553, ,9 63, ,9 633, ,9 152, ,9 216,51 42
    :
    #end{clone}
File description: object file ID (file 0 in Group 0).
Clone section: location of a clone pair (Lines in file 0.1 and Lines in file 1.10 are identical or similar to each other).
It is difficult to analyze source code using only this text-based information about the locations of clone pairs.

The analysis of comparison among students (non-gapped clones only)
(Figure: scatter plot with regions A and B and the corresponding code.)
A (2 students): Similar code fragments came from the source code of a sample compiler described in the textbook.
B (4 students): Many code fragments were similar even with respect to variable names and comments.