Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Development of.

Slides:



Advertisements
Similar presentations
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University A Preliminary.
Advertisements

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Identifying Source.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Evolutional Analysis.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Extraction of.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Extracting Code.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University A Prototype of.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Automatic Categorization.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Measuring Copying.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Industrial Application.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Where Does This.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University CoxR: Open Source.
Software Engineering Lab, Osaka University Code Clone Analysis and Its Application Katsuro Inoue Osaka University.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University ICSE 2003 Java.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Finding Similar.
Yuki Manabe*, Daniel M. German†,‡ and Katsuro Inoue†
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University What Kinds of.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 1 Refactoring.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University A Criterion for.
Department of Computer Science, Graduate School of Information Science and Technology, Osaka University DCCFinder: A Very- Large Scale Code Clone Analysis.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Investigation.
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University A clone detection approach for a collection of similar.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University What Do Practitioners.
An Adaptive Version-Controlled File System Makoto Matsushita, Tetsuo Yamamoto and Katsuro Inoue Osaka University, JAPAN.
1 PARSEWeb: A Programmer Assistant for Reusing Open Source Code on the Web Suresh Thummalapenta and Tao Xie Department of Computer Science North Carolina.
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University A Method to Detect License Inconsistencies for Large-
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Code-Clone Analysis.
2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji.
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University Detection and evolution analysis of code clones for.
1 Gemini: Maintenance Support Environment Based on Code Clone Analysis *Graduate School of Engineering Science, Osaka Univ. **PRESTO, Japan Science and.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 1 Design and Implementation.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Applying Clone.
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University Inoue Laboratory Eunjong Choi 1 Investigating Clone.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University How to extract.
Debug Concern Navigator Masaru Shiozuka(Kyushu Institute of Technology, Japan) Naoyasu Ubayashi(Kyushu University, Japan) Yasutaka Kamei(Kyushu University,
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University VerXCombo: An.
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University Retrieving Similar Code Fragments based on Identifier.
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University 1 Towards an Assessment of the Quality of Refactoring.
Alattin: Mining Alternative Patterns for Detecting Neglected Conditions Suresh Thummalapenta and Tao Xie Department of Computer Science North Carolina.
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University 1 Towards an Investigation of Opportunities for Refactoring.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University July 21, 2008WODA.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Finding Code Clones.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University IWPSE 2003 Program.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Cage: A Keyword.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 1 Software Tag:
Extracting a Unified Directory Tree to Compare Similar Software Products Yusuke Sakaguchi, Takashi Ishio, Tetsuya Kanda, Katsuro Inoue Department of Computer.
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University An Empirical Study of Out-dated Third-party Code.
Experience of Finding Inconsistently-Changed Bugs in Code Clones of Mobile Software Katsuro Inoue†, Yoshiki Higo†, Norihiro Yoshida†, Eunjong Choi†, Shinji.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 1 Classification.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Extraction of.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 1 Extracting Sequence.
What kind of and how clones are refactored? A case study of three OSS projects WRT2012 June 1, Eunjong Choi†, Norihiro Yoshida‡, Katsuro Inoue†
Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Towards a Collection of Refactoring Patterns Based.
Recommending Adaptive Changes for Framework Evolution Barthélémy Dagenais and Martin P. Robillard ICSE08 Dec 4 th, 2008 Presented by EJ Park.
1 Gemini: Code Clone Analysis Tool †Graduate School of Engineering Science, Osaka Univ., Japan ‡ Graduate School of Information Science and Technology,
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University Detection of License Inconsistencies in Free and.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Software Ingredients:
Estimating Code Size After a Complete Code-Clone Merge Buford Edwards III, Yuhao Wu, Makoto Matsushita, Katsuro Inoue 1 Graduate School of Information.
Experience Report: System Log Analysis for Anomaly Detection
Source File Set Search for Clone-and-Own Reuse Analysis
CBCD: Cloned Buggy Code Detector
Yuta Nakamura1, Eunjong Choi1, Norihiro Yoshida2,
○Yuichi Semura1, Norihiro Yoshida2, Eunjong Choi3, Katsuro Inoue1
Predicting Fault-Prone Modules Based on Metrics Transitions
Yuhao Wu1, Yuki Manabe2, Daniel M. German3, Katsuro Inoue1
Daniel Kim Software Engineering Laboratory Professor Katsuro Inoue
On Refactoring Support Based on Code Clone Dependency Relation
Kazuki Yokoi1 Eunjong Choi2 Norihiro Yoshida3 Katsuro Inoue1
Where Does This Code Come from and Where Does It Go?
Research Activities of Software Engineering Lab in Osaka University
Dotri Quoc†, Kazuo Kobori†, Norihiro Yoshida
Presentation transcript:

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Development of a code clone search tool for open source repositories Pei Xia †, Yuki Manabe †, Norihiro Yoshida ††, Katsuro Inoue † † Graduate School of Information Science and Technology, Osaka University †† Graduate School of Information Science, Nara Institute of Science and Technology 1

Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Jacobsen v. Katzer [1] A lawsuit between two open source software JMRI and KAMIND, in the U.S. federal court, KAMIND reused some source code from JMRI, but deleted the copyright notice. Settlement: –Katzer paid Jacobsen $100,000 for infringement –ceased using the JMRI code –etc. [1] Arne, P.H “Jacobsen v. Katzer - Open Source License Validation: How Far Does It Go?”, The Computer & Internet Lawyer (25:11), pp

Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Open Source Repositories ………….. 3

Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Code Reuse Developers reuse code from existing open source projects [2] [2]C. Ebert (ed.), “Open Source Software in Industry”, IEEE Software, Vol. 25, No. 3, pp , May/June … 4

Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Code Clones Among OSS There are many code clones exist among open source software [3]. code clones [3] S. Livieri, Y. Higo, M. Matsushita, K. Inoue, “Very-Large Scale Code Clone Analysis and Visualization of Open Source Programs Using Distributed CCFinder: D-CCFinder”, Proc. of 29th International Conference on Software Engineering (ICSE 2007), pp , Minneapolis, MN, May

Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Code Clones Among OSS There are many code clones exist among open source software [3]. code clones [3] S. Livieri, Y. Higo, M. Matsushita, K. Inoue, “Very-Large Scale Code Clone Analysis and Visualization of Open Source Programs Using Distributed CCFinder: D-CCFinder”, Proc. of 29th International Conference on Software Engineering (ICSE 2007), pp , Minneapolis, MN, May Code clone: Identical or similar source code fragments in a program or among different programs.

Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Questions When we find a useful source code file, can we reuse it safely? Are our own open source projects illegally reused by other people? 7 It is necessary to detect code clones in open source repositories.

Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Existing technology Code clone detection –CCFinder, DECKARD, etc. Code search engine –Google code search, SPARS, etc. 8 Can only detect clones of the code in local machine Do not provide enough code clone information

Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Solution Using Code clone detection and code search technology to build up a mash-up tool. –Be able to detect code clones in open source repositories –Be able to provide related information about the files which contain code clones. 9

Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Overview of the Proposed method 10

Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Overview of the Proposed method 11 URLwhere the file can be accessed on the Internet File paththe path in the file’s own project LOCline of code Licensethe software license of the file Copyrightthe copyright of the file Last modified timethe latest committed time of the file in its repository Cover ratiothe percentage of the queried code that reused by the file Code clone detailWhat part of the file has been detected as code clone

Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Search Process of Open CCFinder Token1 Token2 Token3 Token4 Token5 Token6 ……. Filename Keyword 6 Keyword 4 Keyword 1 Keyword 3 Keyword 2 Keyword 5 ……. File1 File2 File3 …. Related information Code Clone Detector Code Clone information Input Q Output R ①② ③ ④ ⑤ ⑥ ⑦ ⑧ ① Word Extraction ② Keyword Ranking ③ Generating request ④ searching for Candidate files ⑤ Merging candidates ⑥ crawling information ⑦ clone analysis ⑧ Result forming External code search engines … Open source repositories 12

Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Demo – Tokenize View ① query input area ② keywords list ③ log area ④ Tokenize button ⑤ Search button ⑥ Shuffle button Query input area Keyword list Log 13

Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Demo – Candidates View Log Candidates file list 14

Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Demo – Results View Results view ① results detail ② log area 15 Log Code attributes

Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Demo – Results View Results view ① results detail ② log area ③ code clone detail 16 Code clone detail

Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Case Study 1 Question: When we find a useful source code file, can we reuse it safely? Subjects: base64.java Purpose: To find the files from open source repositories that we should check while reusing this file. 17

Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Case Study 1 – Subjects base64.java –Apache ObJectRelationBridge (Apache OJB) open source project –Apache license 18

Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Case Study 1 – Result Files that contain code clones with base64.java Files that contain code clones with base64.java found by Open CCFinder (57 results) 19 Query

Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Case Study 1 – Result Files that contain code clones with base64.java 20 With these information, developers would know what files they should check while reusing the query code. Files that contain code clones with base64.java found by Open CCFinder (57 results) Query

Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Case Study 2 Question: Are our own open source projects illegally reused by other people? Subjects: SSHTools project Purpose: To find similar files in open source repositories, especially the different licensed similar files. 21

Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Case Study 2 – Subjects SSHTools –A java open source projects –A SSH application providing Java SSH API, terminal and so on. –Under GPL license 22

Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Case Study 2 – Result (1/3) #Similar files* found –339 files have been checked The histogram of #input files in terms of #similar files found by Open CCFinder 23 * Similar files: files that contain code clones with query code

Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Case Study 2 – Result (2/3) #Different licensed similar files found –339 files have been checked –305 files have similar files found from open source repositories. 285 files in SSHTools have similar files with 1 different license 10 files in SSHTools have similar files with 2 different licenses 1 file in SSHTools have similar files with 3 different licenses. 24

Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Case Study 2 – Result (3/3) Result example /sshtools/common/util/BrowserLauncher.java Part of the similar files returned by Open CCFinder with 3 other different licenses file pathProject nameCover ratiolicense Last modified /j2ssh- fork/src/com/sshtools/common/util/Bro wserLauncher.java j2ssh-fork0.91GPL2008/6/17 /de.fzj.unicore.rcp.terminal.ssh.gsissh/.../sshtools/common/util/BrowserLaunc her.java unicore0.89LGPL2010/2/3 /openfire/launcher/BrowserLauncher.ja va openfire-tomcat0.88Apache2010/4/19 /dg/hipster/BrowserLauncher.javahipster0.84BSD2006/10/12 …………… 25

Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Case Study 2 – Result (3/3) Result example /sshtools/common/util/BrowserLauncher.java Part of the similar files returned by Open CCFinder with 3 other different licenses file pathProject nameCover ratiolicense Last modified /j2ssh- fork/src/com/sshtools/common/util/Bro wserLauncher.java j2ssh-fork0.91GPL2008/6/17 /de.fzj.unicore.rcp.terminal.ssh.gsissh/.../sshtools/common/util/BrowserLaunc her.java unicore0.89LGPL2010/2/3 /openfire/launcher/BrowserLauncher.ja va openfire-tomcat0.88Apache2010/4/19 /dg/hipster/BrowserLauncher.javahipster0.84BSD2006/10/12 …………… 26 With these information, code owners can get suspicious candidates about illegally reused source code.

Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Discussion Usability –Open CCFinder is helpful to solve the raised problems –It cannot find all the code clones in the world Time limit, space limit, external search engines, etc. –It could only provides clues to get evidence, but could not make the judgment. Performance –It takes1-2 minutes for analyzing one file Case study 1: About 2 minute for 1 file Case study 2: About 7 hours for 339 files 27

Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Summary & Future Work Summary –We proposed a tool that detect code clones and collect related information from open source repositories –We launched two case studies to show the usability of our tool. Future work –Implement better keyword ranking algorithm –Import other external search engines 28

Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Thank you! 29