
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
A clone detection approach for a collection of similar large-scale software products
Eunjong Choi†, Norihiro Yoshida‡, Yoshiki Higo†, Katsuro Inoue†
†Osaka University ‡Nara Institute of Science and Technology

Software Development for Mobile Devices (1/2)
- New models are released at regular, rapid intervals.
- Software is adapted to each country's constraints and needs, e.g. Osaifu-Keitai for Japan.

Software Development for Mobile Devices (2/2)
- Software is developed by reusing common pieces and implementing unique pieces for each feature.
[Figure: reused pieces combined with unique features]

Reused Source Code Pieces
- Source code is reused at the code-fragment level (code clones) and at the file level.
- Detecting and managing reused pieces is necessary, e.g. for inconsistency management and plagiarism detection.
A code clone: a code fragment that has lexically, syntactically, or semantically similar code fragments elsewhere in the source code.

Code Clone
Code clones are generated by:
- Code reuse by copy and paste
- Stereotyped functions or tool-generated code
A clone set: a set of code clones that are similar or identical to each other.
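The clone set definition above can be illustrated with a minimal sketch that groups fragments into clone sets from a list of clone pairs. It assumes similarity is treated as transitive, which the slides do not state, and the fragment labels such as "A.java:10-20" are hypothetical.

```python
from collections import defaultdict


def clone_sets_from_pairs(pairs):
    """Group fragments into clone sets, given pairs of fragments judged similar.

    Assumption: similarity is treated as transitive, so fragments connected
    through any chain of pairs end up in the same clone set (union-find).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for fragment in parent:
        groups[find(fragment)].add(fragment)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical fragment labels of the form "file:start-end".
pairs = [
    ("A.java:10-20", "B.java:5-15"),
    ("B.java:5-15", "C.java:30-40"),
    ("D.java:1-8", "E.java:2-9"),
]
clone_sets = clone_sets_from_pairs(pairs)
```

Here the first three fragments fall into one clone set and the last two into another.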

Code Clone Detection
Various detection techniques and tools have been proposed.
- e.g. text-based and line-based (dup) [Baker1995], token-based (CP-Miner) [Li2006]
CCFinder [Kamiya2002]
- A token-based clone detection tool
- Multi-language support (C, C++, COBOL, Java, ...)
- Good speed (5 MLOC in 20 minutes)

[Baker1995] B. S. Baker. On finding duplication and near-duplication in large software systems. In Proc. of WCRE, pages 86, July 1995.
[Li2006] Z. Li, S. Lu, S. Myagmar and Y. Zhou. CP-Miner: Finding Copy-Paste and Related Bugs in Large-Scale Software Code. IEEE Transactions on Software Engineering, 32, 2006.
[Kamiya2002] T. Kamiya, S. Kusumoto and K. Inoue. CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code. IEEE TSE, 28, 2002.

Example of Clone Detection Technique: CCFinder
Source files -> Lexical analysis -> Token sequence -> Transformation -> Transformed token sequence -> Match detection -> Clones on transformed sequence -> Formatting -> Code clones

    static void foo() throws RESyntaxException {
        String a[] = new String [] { "123,400", "abc", "orange 100" };
        org.apache.regexp.RE pat = new org.apache.regexp.RE("[0-9,]+");
        int sum = 0;
        for (int i = 0; i < a.length; ++i)
            if (pat.match(a[i]))
                sum += Sample.parseNumber(pat.getParen(0));
        System.out.println("sum = " + sum);
    }
    static void goo(String [] a) throws RESyntaxException {
        RE exp = new RE("[0-9,]+");
        int sum = 0;
        for (int i = 0; i < a.length; ++i)
            if (exp.match(a[i]))
                sum += parseNumber(exp.getParen(0));
        System.out.println("sum = " + sum);
    }
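The pipeline on this slide (lexical analysis, transformation, match detection) can be sketched roughly as follows. This is not CCFinder's implementation: the keyword list, the `$id`/`$lit` placeholders, and the dynamic-programming matcher are assumptions chosen for brevity, and CCFinder's actual transformation rules and matching algorithm are those described in [Kamiya2002].

```python
import re

# Assumed, deliberately tiny keyword list; a real lexer would know the full language.
KEYWORDS = {"static", "void", "int", "for", "if", "new", "throws", "return"}
TOKEN_RE = re.compile(r'"[^"]*"|[A-Za-z_]\w*|\d+|\+\+|\+=|[^\sA-Za-z_0-9]')


def normalize(source):
    """Lexical analysis plus a simple transformation step.

    Identifiers become "$id" and string/number literals become "$lit",
    so renamed variables still produce the same token sequence.
    """
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok.startswith('"') or tok.isdigit():
            tokens.append("$lit")
        elif re.match(r"[A-Za-z_]", tok) and tok not in KEYWORDS:
            tokens.append("$id")
        else:
            tokens.append(tok)
    return tokens


def longest_common_run(a, b):
    """Match detection: (length, start_a, start_b) of the longest shared contiguous run."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[0]:
                    best = (cur[j], i - cur[j], j - cur[j])
        prev = cur
    return best


foo_body = ("int sum = 0; for (int i = 0; i < a.length; ++i) "
            "if (pat.match(a[i])) sum += Sample.parseNumber(pat.getParen(0));")
goo_body = ("int sum = 0; for (int i = 0; i < a.length; ++i) "
            "if (exp.match(a[i])) sum += parseNumber(exp.getParen(0));")
run = longest_common_run(normalize(foo_body), normalize(goo_body))
```

In this sketch the shared run stops at `Sample.parseNumber` versus `parseNumber`, because the simplified transformation does not strip qualifiers; a richer rule set would be needed to match across that difference.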

Problem of Code Clone Detection Tools
- Existing tools take an enormous amount of time to detect code clones in large-scale software.
- We propose an approach for detecting code clones in a collection of similar large-scale software products.
  - It skips clone detection within each set of files that are identical to each other.
An identical file set: a set of files that are identical to each other.

Overview of Our Approach
Step 1. Calculate MD5 hash
Step 2. Prepare input files for CCFinder
Step 3. Detect code clones using CCFinder
Step 4. Generate all clone sets
[Figure: data flow through the four steps, with the artifacts Source Files, Hashed Files, Identical File Sets, Input Files for CCFinder, Clone Sets, and All Clone Sets]

Step 1. Calculate MD5 Hash
- Create an MD5 hash value for each input file.
  - Computing an MD5 hash does not require any large substitution tables.
[Figure: Source Files -> Calculate MD5 Hash -> hash values]
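A minimal sketch of this step using Python's standard `hashlib` module is shown below. The function name `md5_of_file` and the chunked reading are illustrative choices, not details taken from the slides.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file's bytes.

    The file is read in chunks so that large source files do not have to
    fit in memory at once.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two files receive the same digest exactly when their bytes hash identically, so even a whitespace-only difference yields a different value.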

Step 2. Prepare Input Files for CCFinder (1/2)
- Detect identical file sets by comparing MD5 hash values.
[Figure: files A1, A2, A3 (Identical File Set A) and B1, B2 (Identical File Set B) grouped by MD5 hash, alongside files C1 and D1]

Step 2. Prepare Input Files for CCFinder (2/2)
- Prepare the input files for CCFinder.
[Figure: Identical File Sets A and B and the resulting Input Files for CCFinder]
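Step 2 can be sketched as grouping files by hash and then choosing what to hand to the detector. The sketch assumes that one representative per identical file set is passed on, together with every file that has no duplicate; the slide text does not spell out the selection rule, so treat it as an assumption. The file names mirror the labels in the figure, and the contents are made up.

```python
import hashlib
from collections import defaultdict


def split_by_hash(files):
    """Split files into identical file sets and a detector input list.

    files: dict mapping file name -> content bytes.
    Returns (identical_sets, detector_input). Assumption: the first name
    (in sorted order) of each hash group serves as its representative.
    """
    by_hash = defaultdict(list)
    for name in sorted(files):
        by_hash[hashlib.md5(files[name]).hexdigest()].append(name)
    identical_sets = [names for names in by_hash.values() if len(names) > 1]
    detector_input = [names[0] for names in by_hash.values()]
    return identical_sets, detector_input


# Made-up contents; only equality between files matters here.
files = {
    "A1": b"x", "A2": b"x", "A3": b"x",
    "B1": b"y", "B2": b"y",
    "C1": b"c", "D1": b"d",
}
identical_sets, detector_input = split_by_hash(files)
```

With these inputs the detector would see four files instead of seven, which is where the time saving would come from under this assumption.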

Step 3. Detect Code Clones
- Use CCFinder to detect code clones.
[Figure: code clones detected by CCFinder in the input files, grouped into Clone Set 1 and Clone Set 2, shown next to Identical File Sets A and B]

Step 4. Generate All Clone Sets
- Generate all clone sets from the clone sets detected by CCFinder and the identical file sets.
[Figure: Clone Set 1 and Clone Set 2 extended with the members of Identical File Sets A and B to form All Clone Sets]
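One plausible reading of Step 4 is that each clone reported on a representative file is copied to every file in the same identical file set, since identical files contain the same fragment at the same lines. The sketch below follows that reading; the `(file, start_line, end_line)` tuple layout and the line numbers are hypothetical.

```python
def expand_clone_sets(clone_sets, identical_sets):
    """Extend detected clone sets to all members of each identical file set.

    clone_sets: list of sets of (file, start_line, end_line) tuples reported
                on the files that were actually analyzed.
    identical_sets: list of lists of file names with identical contents.
    """
    siblings = {}
    for group in identical_sets:
        for name in group:
            siblings[name] = group

    expanded = []
    for clone_set in clone_sets:
        full = set()
        for file_name, start, end in clone_set:
            # Files without duplicates map only to themselves.
            for name in siblings.get(file_name, [file_name]):
                full.add((name, start, end))
        expanded.append(full)
    return expanded


detected = [
    {("A1", 10, 20), ("C1", 5, 15)},  # Clone Set 1 (hypothetical line ranges)
    {("B1", 1, 9), ("D1", 3, 11)},    # Clone Set 2
]
all_clone_sets = expand_clone_sets(detected, [["A1", "A2", "A3"], ["B1", "B2"]])
```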

Overview of Case Study (2/2)
Approach
- Compare the detection time of our method with that of using only CCFinder.
- Confirm that the detection result of our method is the same as that of using only CCFinder.
Detection Environment
- 64-bit Windows 7 Professional workstation with 2 processors and 24 gigabytes of main memory.

Results of Case Study (1/2)
Detection Time
- Our approach detects code clones faster than using only CCFinder.

Project name                     Only CCFinder                Our approach
                                 #Clone sets   Time (sec.)    #Clone sets   #Identical file sets   Time (sec.)
Apache Ant                       11,…          …              …,692         3,083                  89
Linux kernel                     24,235        1,119          21,343        1,…                    …
Galaxy Y Pro (GT-B5510 model)    325,274       113,…          …,761         28,181                 11,902

Results of Case Study (2/2)
Accuracy of results: outputs were checked manually.
- 30 arbitrarily selected clone sets detected by our approach from each OSS project
- 30 selected identical file sets from each OSS project

Summary
- We proposed an approach for detecting code clones in a collection of similar large-scale software products.
  - MD5 hashes identify identical file sets.
  - CCFinder detects code clones.
- We applied the approach to three OSS projects and compared its clone detection time with that of using only CCFinder.
  - Our approach takes less time to detect code clones.

Future Work
- Extend the approach to treat files with slight modifications as identical file sets.
  - The current approach treats only files that are identical to each other as an identical file set.
- Apply the approach to software projects of various sizes in different domains.
- Introduce other code clone detection tools into the case study and compare their results.

Thank you for your attention