1 Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
A clone detection approach for a collection of similar large-scale software products
Eunjong Choi†, Norihiro Yoshida‡, Yoshiki Higo†, Katsuro Inoue†
†Osaka University ‡Nara Institute of Science and Technology

2 Software Development for Mobile Devices (1/2)
- A new model is released at regular, rapid intervals
- Software is adapted to the constraints and needs of each country
  - e.g. Osaifu-Keitai for Japan

3 Software Development for Mobile Devices (2/2)
- Software is developed by reusing common pieces and implementing unique pieces for each feature
[Figure: reused pieces + unique features]

4 Reused Source Code Pieces
- Source code is reused at the code-fragment level (code clones) and at the file level
- Detecting and managing reused pieces is necessary
  - e.g. inconsistency management, plagiarism detection
A code clone: a code fragment that has lexically, syntactically, or semantically similar code fragments in the source code

5 Code Clone
Code clones are generated by:
- code reuse by copy & paste
- stereotyped functions or tool-generated code
A clone set: a set of code clones that are similar or identical to each other

6 Code Clone Detection
Various detection techniques and tools have been proposed
- e.g. text-based and line-based (dup) [Baker1995], token-based (CP-Miner) [Li2006]
CCFinder [Kamiya2002]
- A token-based clone detection tool
- Multi-language support (C, C++, COBOL, Java, ...)
- Good speed (5 MLOC in 20 min)
[Baker1995] B. S. Baker. On finding duplication and near-duplication in large software systems. In Proc. of WCRE, pages 86-95, July 1995.
[Li2006] Z. Li, S. Lu, S. Myagmar and Y. Zhou. CP-Miner: Finding Copy-Paste and Related Bugs in Large-Scale Software Code. IEEE Transactions on Software Engineering, 32: pages 176-192, 2006.
[Kamiya2002] T. Kamiya, S. Kusumoto and K. Inoue. CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code. IEEE Transactions on Software Engineering, 28: pages 654-670, 2002.

7 Example of Clone Detection Technique: CCFinder
Source files → Lexical analysis → Token sequence → Transformation → Transformed token sequence → Match detection → Clones on transformed sequence → Formatting → Code clones
```java
// Example input: two methods, assumed here to belong to a class Sample,
// that use the Jakarta Regexp library (org.apache.regexp).
// parseNumber(String) is another method of Sample that the slide does not show.
import org.apache.regexp.RE;
import org.apache.regexp.RESyntaxException;

class Sample {
    static void foo() throws RESyntaxException {
        String a[] = new String [] { "123,400", "abc", "orange 100" };
        org.apache.regexp.RE pat = new org.apache.regexp.RE("[0-9,]+");
        int sum = 0;
        for (int i = 0; i < a.length; ++i)
            if (pat.match(a[i]))
                sum += Sample.parseNumber(pat.getParen(0));
        System.out.println("sum = " + sum);
    }

    static void goo(String [] a) throws RESyntaxException {
        RE exp = new RE("[0-9,]+");
        int sum = 0;
        for (int i = 0; i < a.length; ++i)
            if (exp.match(a[i]))
                sum += parseNumber(exp.getParen(0));
        System.out.println("sum = " + sum);
    }
}
```
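To illustrate why the transformation step matters for foo() and goo(), here is a heavily simplified sketch. It assumes a crude regex lexer and one hypothetical transformation rule: every non-keyword name, including qualified names such as `Sample.parseNumber`, becomes the placeholder `$p`. CCFinder's real lexer and transformation rules are richer, and the plain equality check below only stands in for its suffix-tree match detection. `TokenNormalizer`, `tokenize`, and `transform` are invented names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenNormalizer {
    // Simplified lexer (an assumption, not CCFinder's lexer): string literals,
    // possibly qualified names such as a.b.c, integer literals, a few
    // two-character operators, then any other single non-space character.
    private static final Pattern TOKEN = Pattern.compile(
            "\"(?:\\\\.|[^\"\\\\])*\""
            + "|[A-Za-z_$][\\w$]*(?:\\s*\\.\\s*[A-Za-z_$][\\w$]*)*"
            + "|\\d+"
            + "|\\+\\+|\\+=|==|<=|>=|&&|\\|\\|"
            + "|\\S");

    // Only the keywords this example needs; they must survive the transformation.
    private static final Set<String> KEYWORDS = Set.of(
            "int", "for", "if", "else", "while", "return", "new", "static", "void", "throws");

    // Lexical analysis: source text -> token sequence.
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Transformation: every non-keyword name (qualified or not) becomes "$p".
    static List<String> transform(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            boolean isName = Character.isJavaIdentifierStart(t.charAt(0));
            out.add(isName && !KEYWORDS.contains(t) ? "$p" : t);
        }
        return out;
    }

    public static void main(String[] args) {
        String fooBody = "int sum = 0;\n"
                + "for (int i = 0; i < a.length; ++i)\n"
                + "    if (pat.match(a[i]))\n"
                + "        sum += Sample.parseNumber(pat.getParen(0));\n"
                + "System.out.println(\"sum = \" + sum);";
        String gooBody = "int sum = 0;\n"
                + "for (int i = 0; i < a.length; ++i)\n"
                + "    if (exp.match(a[i]))\n"
                + "        sum += parseNumber(exp.getParen(0));\n"
                + "System.out.println(\"sum = \" + sum);";
        List<String> t1 = tokenize(fooBody), t2 = tokenize(gooBody);
        System.out.println(t1.equals(t2));                        // false
        System.out.println(transform(t1).equals(transform(t2)));  // true
    }
}
```

On the loop bodies of foo() and goo(), the raw token sequences differ (`pat` vs `exp`, `Sample.parseNumber` vs `parseNumber`), while the transformed sequences are equal. That is the kind of match a token-based tool reports as a clone pair.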

8 Problem of Code Clone Detection Tools
- Existing tools take an enormous amount of time to detect code clones in large-scale software
- We suggest an approach for detecting code clones in a collection of similar large-scale software products
  - It avoids detecting code clones among the files of each identical file set
An identical file set: a set of files that are identical to each other

9 Overview of Our Approach
- Step1. Calculate MD5 hashes
- Step2. Prepare input files for CCFinder
- Step3. Detect code clones using CCFinder
- Step4. Generate all clone sets
[Figure: Source Files pass through steps (1) to (4), producing Hashed Files, Input Files for CCFinder, Identical File Sets, Clone Sets, and finally All Clone Sets]

10 Step1. Calculate MD5 Hash
- Compute the MD5 hash value of every source file
  - MD5 hashing does not require any large substitution tables
[Figure: Source Files → Calculate MD5 Hash → one hash value per file]
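Step 1 can be sketched with the standard `java.security.MessageDigest` API. This is a minimal illustration, not the authors' tool: the names `Md5Step`, `md5Hex`, and `hashFiles` are hypothetical, and the sketch assumes each file is small enough to read into memory at once.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class Md5Step {
    // MD5 digest of a byte array, rendered as 32 lowercase hex characters.
    static String md5Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(content);
            StringBuilder hex = new StringBuilder(32);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Step 1 (hypothetical helper): map every source file to the MD5 hash of its raw bytes.
    static Map<Path, String> hashFiles(List<Path> files) {
        Map<Path, String> hashes = new LinkedHashMap<>();
        for (Path file : files) {
            try {
                hashes.put(file, md5Hex(Files.readAllBytes(file)));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return hashes;
    }

    public static void main(String[] args) {
        // RFC 1321 test vector: prints 900150983cd24fb0d6963f7d28e17f72
        System.out.println(md5Hex("abc".getBytes(StandardCharsets.UTF_8)));
        // Hash the files named on the command line.
        List<Path> files = new ArrayList<>();
        for (String arg : args) {
            files.add(Path.of(arg));
        }
        hashFiles(files).forEach((file, hash) -> System.out.println(hash + "  " + file));
    }
}
```

Because raw bytes are hashed, two files that differ only in whitespace or comments get different hash values, which is consistent with the approach grouping only exactly identical files.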

11 Step2. Prepare Input Files for CCFinder (1/2)
- Detect identical file sets: files that have the same MD5 hash value
[Figure: files with the same hash value are grouped, e.g. A1, A2, A3 into Identical File Set A and B1, B2 into Identical File Set B]
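Detecting identical file sets then amounts to grouping files by hash value. A minimal sketch, assuming Step 1 produced a map from file name to hash string. `IdenticalFileSets` is a hypothetical name, and the hash strings in the example are shortened placeholders rather than real MD5 values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class IdenticalFileSets {
    // Groups files by hash value and keeps only groups with at least two files:
    // those groups are the identical file sets. Input maps file name -> MD5 hash.
    static Map<String, List<String>> identicalFileSets(Map<String, String> hashOf) {
        Map<String, List<String>> byHash = new TreeMap<>();
        for (Map.Entry<String, String> e : hashOf.entrySet()) {
            byHash.computeIfAbsent(e.getValue(), h -> new ArrayList<>()).add(e.getKey());
        }
        // A hash value shared by only one file does not form an identical file set.
        byHash.values().removeIf(files -> files.size() < 2);
        // Sort members so the result does not depend on the input map's iteration order.
        byHash.values().forEach(files -> Collections.sort(files));
        return byHash;
    }

    public static void main(String[] args) {
        // Shortened placeholder hash strings, not real MD5 values.
        Map<String, String> hashOf = new HashMap<>();
        hashOf.put("A1", "0cc"); hashOf.put("A2", "0cc"); hashOf.put("A3", "0cc");
        hashOf.put("B1", "be05"); hashOf.put("B2", "be05");
        hashOf.put("C1", "175a9"); hashOf.put("D1", "e3f1");
        // prints {0cc=[A1, A2, A3], be05=[B1, B2]}
        System.out.println(identicalFileSets(hashOf));
    }
}
```

Equal MD5 values make identical content overwhelmingly likely but do not strictly prove it, since MD5 collisions can be constructed deliberately. A cautious implementation could byte-compare the members of each set before trusting it.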

12 Step2. Prepare Input Files for CCFinder (2/2)
- Prepare the input files for CCFinder
[Figure: Identical File Set A (A1, A2, A3), Identical File Set B (B1, B2), and the files C1, D1 leading to Input Files for CCFinder]
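One way to prepare CCFinder's input is to keep a single representative of each identical file set and drop the other members, so that clone detection is not repeated among identical files. The slides do not say how the representative is chosen; this sketch assumes the lexicographically smallest file name, and `PrepareInput` and `inputFiles` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PrepareInput {
    // Keeps one representative per identical file set (assumed: the smallest file name)
    // and every file that belongs to no identical file set; other members are skipped.
    static List<String> inputFiles(List<String> allFiles, List<List<String>> identicalSets) {
        Set<String> skipped = new HashSet<>();
        for (List<String> set : identicalSets) {
            String representative = Collections.min(set);
            for (String file : set) {
                if (!file.equals(representative)) {
                    skipped.add(file);
                }
            }
        }
        List<String> input = new ArrayList<>();
        for (String file : allFiles) {
            if (!skipped.contains(file)) {
                input.add(file);
            }
        }
        return input;
    }

    public static void main(String[] args) {
        List<String> all = List.of("A1", "A2", "A3", "B1", "B2", "C1", "D1");
        List<List<String>> sets = List.of(List.of("A1", "A2", "A3"), List.of("B1", "B2"));
        System.out.println(inputFiles(all, sets));  // prints [A1, B1, C1, D1]
    }
}
```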

13 Step3. Detect Code Clones
- Use CCFinder to detect code clones
[Figure: code clones detected by CCFinder among the input files, shown next to Identical File Set A (A1, A2, A3) and Identical File Set B (B1, B2)]

14 Step3. Detect Code Clones
- Use CCFinder to detect code clones
[Figure: the detected code clones grouped into Clone Set 1 and Clone Set 2]

15 Step4. Generate All Clone Sets
- Generate all clone sets from the clone sets detected by CCFinder and the identical file sets
[Figure: Clone Set 1 and Clone Set 2 combined with Identical File Sets A and B into All Clone Sets]
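Because the members of an identical file set have the same content, a clone fragment found in a representative file also exists, at the same lines, in every other member of its set. A minimal sketch of Step 4 under that assumption. `Fragment` (a file plus a line range) and `CloneSetExpansion.expand` are hypothetical, the line numbers in the example are illustrative, and whether the approach also reports the whole-file clones between identical files is not stated on the slides, so they are omitted here.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CloneSetExpansion {
    // Hypothetical fragment representation: file name plus inclusive line range.
    record Fragment(String file, int startLine, int endLine) {}

    // For each clone set detected on CCFinder's input, add the same line range in every
    // member of the identical file set of the fragment's file (keyed by representative).
    // Files that belong to no identical file set contribute only themselves.
    static List<Set<Fragment>> expand(List<Set<Fragment>> detected,
                                      Map<String, List<String>> identicalSetOf) {
        List<Set<Fragment>> all = new ArrayList<>();
        for (Set<Fragment> cloneSet : detected) {
            Set<Fragment> expanded = new LinkedHashSet<>();
            for (Fragment f : cloneSet) {
                for (String copy : identicalSetOf.getOrDefault(f.file(), List.of(f.file()))) {
                    expanded.add(new Fragment(copy, f.startLine(), f.endLine()));
                }
            }
            all.add(expanded);
        }
        return all;
    }

    public static void main(String[] args) {
        // A clone set as detected by CCFinder: one fragment in representative A1, one in C1.
        List<Set<Fragment>> detected = List.of(Set.of(
                new Fragment("A1", 10, 20), new Fragment("C1", 5, 15)));
        // Identical File Set A, keyed by its representative A1.
        Map<String, List<String>> identicalSetOf = Map.of("A1", List.of("A1", "A2", "A3"));
        System.out.println(expand(detected, identicalSetOf).get(0).size());  // prints 4
    }
}
```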

16 Overview of Case Study (2/2)
Approach
- Compare the detection time of our method with that of using only CCFinder
- Confirm that the detection result of our method is the same as that of using only CCFinder
Detection Environment
- 64-bit Windows 7 Professional workstation equipped with 2 processors and 24 GB of main memory

17 Results of Case Study (1/2)
Detection Time
- Our approach detects code clones faster than using only CCFinder

                               Only CCFinder             Our Approach
Project                        #Clone Sets  Time (sec.)  #Clone Sets  #Identical File Sets  Time (sec.)
Apache Ant                          11,169          241       10,692                 3,083           89
Linux kernel                        24,235        1,119       21,343                 1,967          168
Galaxy Y Pro (GT-B5510 model)      325,274      113,445      148,761                28,181       11,902

18 Results of Case Study (2/2)
Accuracy of Results: outputs were checked manually
- Arbitrarily selected 30 clone sets detected by our approach from each OSS project
- Selected 30 identical file sets from each OSS project

19 Summary
- We suggested an approach for detecting code clones in a collection of similar large-scale software products
  - MD5 hashes to identify identical file sets
  - CCFinder to detect code clones
- We applied our approach to three OSS projects and compared the code clone detection time of using only CCFinder with that of our approach
  - Our approach takes less time to detect code clones

20 Future Work
- Extend the approach to treat slightly modified files as identical file sets
  - Our current approach groups only files that are exactly identical to each other into an identical file set
- Apply the approach to software projects of various sizes in different domains
- Introduce other code clone detection tools and compare their results in the case study

21 Thank you for your attention

