Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Where Does This.

Slides:



Advertisements
Similar presentations
Formal Language, chapter 4, slide 1Copyright © 2007 by Adam Webber Chapter Four: DFA Applications.
Advertisements

Search in Source Code Based on Identifying Popular Fragments Eduard Kuric and Mária Bieliková Faculty of Informatics and Information.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Identifying Source.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Evolutional Analysis.
1 Accumulative Versioning File System Moraine and Its Application to Metrics Environment Mame Tetsuo Yamamoto * Makoto Matsushita * Katsuro Inoue *,**
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Extraction of.
Experiments on Query Expansion for Internet Yellow Page Services Using Log Mining Summarized by Dongmin Shin Presented by Dongmin Shin User Log Analysis.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Extracting Code.
Lecture 23: Software Architectures
Informationsteknologi Friday, November 16, 2007Computer Architecture I - Class 121 Today’s class Operating System Machine Level.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University A Prototype of.
1 Friday, July 07, 2006 “Vision without action is a daydream, Action without a vision is a nightmare.” - Japanese Proverb.
1 Lab Session-III CSIT-120 Spring 2001 Revising Previous session Data input and output While loop Exercise Limits and Bounds GOTO SLIDE 13 Lab session.
Stimulating reuse with an automated active code search tool Júlio Lins – André Santos (Advisor) –
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Measuring Copying.
GOOD PRACTICES & STRATEGY. Functional Analysis & Approach Start from outcome onwards Simplicity Validity.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Industrial Application.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University CoxR: Open Source.
1 Lab Session-III CSIT-120 Fall 2000 Revising Previous session Data input and output While loop Exercise Limits and Bounds Session III-B (starts on slide.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Finding Similar.
5 Copyright © 2004, Oracle. All rights reserved. Using Recovery Manager.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 1 Refactoring.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University A Criterion for.
Department of Computer Science, Graduate School of Information Science and Technology, Osaka University DCCFinder: A Very- Large Scale Code Clone Analysis.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Investigation.
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University A clone detection approach for a collection of similar.
Upgrade to Real Time Linux Target: A MATLAB-Based Graphical Control Environment Thesis Defense by Hai Xu CLEMSON U N I V E R S I T Y Department of Electrical.
1 PARSEWeb: A Programmer Assistant for Reusing Open Source Code on the Web Suresh Thummalapenta and Tao Xie Department of Computer Science North Carolina.
Computer Security and Penetration Testing
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University A Multi-Objective.
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University A Method to Detect License Inconsistencies for Large-
Mining and Analysis of Control Structure Variant Clones Guo Qiao.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Code-Clone Analysis.
2002/12/11PROFES20021 On software maintenance process improvement based on code clone analysis Yoshiki Higo* , Yasushi Ueda* , Toshihiro Kamiya** , Shinji.
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University Detection and evolution analysis of code clones for.
CMCD: Count Matrix based Code Clone Detection Yang Yuan and Yao Guo Key Laboratory of High-Confidence Software Technologies (Ministry of Education) Peking.
1 Gemini: Maintenance Support Environment Based on Code Clone Analysis *Graduate School of Engineering Science, Osaka Univ. **PRESTO, Japan Science and.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Applying Clone.
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University Inoue Laboratory Eunjong Choi 1 Investigating Clone.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University How to extract.
2003 Dominic Swayne1 Microsoft Disk Operating System and PC DOS CS-550-1: Operating Systems Fall 2003 Dominic Swayne.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Development of.
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University Retrieving Similar Code Fragments based on Identifier.
Alattin: Mining Alternative Patterns for Detecting Neglected Conditions Suresh Thummalapenta and Tao Xie Department of Computer Science North Carolina.
Documentation Dr. Andrew Wallace PhD BEng(hons) EurIng
CS 460/660 Compiler Construction. Class 01 2 Why Study Compilers? Compilers are important – –Responsible for many aspects of system performance Compilers.
Exploiting Code Search Engines to Improve Programmer Productivity and Quality Suresh Thummalapenta Advisor: Dr. Tao Xie Department of Computer Science.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Finding Code Clones.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University IWPSE 2003 Program.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Cage: A Keyword.
CE Operating Systems Lecture 17 File systems – interface and implementation.
Search Worms, ACM Workshop on Recurring Malcode (WORM) 2006 N Provos, J McClain, K Wang Dhruv Sharma
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 1 Software Tag:
Extracting a Unified Directory Tree to Compare Similar Software Products Yusuke Sakaguchi, Takashi Ishio, Tetsuya Kanda, Katsuro Inoue Department of Computer.
1 Measuring Similarity of Large Software System Based on Source Code Correspondence Tetsuo Yamamoto*, Makoto Matsushita**, Toshihiro Kamiya***, Katsuro.
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University An Empirical Study of Out-dated Third-party Code.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 1 Classification.
What kind of and how clones are refactored? A case study of three OSS projects WRT2012 June 1, Eunjong Choi†, Norihiro Yoshida‡, Katsuro Inoue†
03/02/20061 Evaluating Top-k Queries Over Web-Accessible Databases Amelie Marian Nicolas Bruno Luis Gravano Presented By: Archana and Muhammed.
File Systems.  Issues for OS  Organize files  Directories structure  File types based on different accesses  Sequential, indexed sequential, indexed.
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University Detection of License Inconsistencies in Free and.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Software Ingredients:
Yasuhiro Hayase†, Yu Kashima‡, Yuki Manabe‡, Katsuro Inoue‡
Secure Coding Rules for C++ Copyright © 2016 Curt Hill
Module 11: File Structure
Source File Set Search for Clone-and-Own Reuse Analysis
Chapter 2: The Linux System Part 1
Yuhao Wu1, Yuki Manabe2, Daniel M. German3, Katsuro Inoue1
Daniel Kim Software Engineering Laboratory Professor Katsuro Inoue
Where Does This Code Come from and Where Does It Go?
Presentation transcript:

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Where Does This Code Come from and Where Does It Go? - Integrated Code History Tracker for Open Source Systems - Katsuro Inoue, Yusuke Sasaki, Pei Xia, and Yuki Manabe Osaka University 1

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Reuse of Open Source Code Essential to recent electric products Blu-ray HDD recorder –Panasonic DMR-BW570 Linux and associated applications under GPL and lesser GPL Open SSL UC Berkeley software JPEG group’s software Without reuse of OSS, product cost would increase drastically 2

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Developer’s Concerns 3 Existing Project New Project Developer Reusable? Origin - Who? - When? - License? - Copyright? Evolution - Maintenance? - Popularity? - Newer version? … Code Fragment Concerns To ease concerns, a support system is needed

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Code History Tracking System 4 OSS Repositories Code Fragment in Question ? License: X /Copyright: Y Copy Source Ancestor Modify Copy Time License: X’ Copyright Y’ Descendants Proj. C Proj. A Proj. B Proj. D Proj. E Proj. F Proj. G Proj. H License: X Copyright: Y

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Overview of Research Proposing code history tracking model Prototype of code history tracking system, Ichi Tracker Integrated Code History Case studies of using Ichi Tracker Discussions and related works 5 位置 i chi Location in Japanese

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 6 Code History Tracking Model and Ichi Tracker

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Design Policy of Ichi Tracker OSS repository Target many OSS projects from old to new ones No crawling, no maintenance  Do not have local repository, but use external code search engines Output quality Find not only exactly same code fragments, but also similar ones Lower false positive results No real-time response  Use code clone filtering to improve the output quality 7

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Code History Tracking Model 8 Input Query Q Integrated Code History Tracker Ichi Tracker Output Results R Code Fragment Internet Code Fragment Search Query SQ Search Results SR Code Attributes (Optional) Code Attributes attribute 1:... attribute 2: attribute 1:... attribute 2: attribute 1:... attribute 2: attribute 1:... attribute 2: qcqc Open Source Repositories Code Search Engines SPARS/R Koders Google Code Search Code Clones

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Input Query Code q c public void write(JMEExporter e) throws IOException { OutputCapsule capsule = e.getCapsule(this); capsule.write(imageLocation, "imageLocation", null); if (storeTexture) { capsule.write(image, "image", null); } capsule.write(blendColor, "blendColor", null); capsule.write(borderColor, "borderColor", null); capsule.write(translation, "translation", null); 9

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 1: Word Extraction public void write(JMEExporter e) throws IOException { OutputCapsule capsule = e.getCapsule(this); capsule.write(imageLocation, "imageLocation", null); if (storeTexture) { capsule.write(image, "image", null); } capsule.write(blendColor, "blendColor", null); capsule.write(borderColor, "borderColor", null); capsule.write(translation, "translation", null); 10

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 2: Keyword Selection public void write(JMEExporter e) throws IOException { OutputCapsule capsule = e.getCapsule(this); capsule.write(imageLocation, "imageLocation", null); if (storeTexture) { capsule.write(image, "image", null); } capsule.write(blendColor, "blendColor", null); capsule.write(borderColor, "borderColor", null); capsule.write(translation, "translation", null); 11 - Words with 5 or more characters - No reserved words word freq. capsule6 write6 image2 translation2 image_ocation1 imageLocation1 public1 …

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 3: Query Generation and Result Check 12 Internet Open Source Repositories Code Search Engines SPARS/R Koders Google Code Search word freq. capsule6 write6 image2 translation2 image_ocation1 imageLocation1 public1 … Search Keywords Header List

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 4: Download and Code Clone Filtering 13 Query Code Fragment q c … private void renderFromControl(RenderMa nager rm, ViewPort vp) { Camera cam = vp.getCamera(); OutputCapsule capsule = e.getCapsule(this); capsule.write(imageLocation, "imageLocation", null); … Downloaded Source Code Files … private void renderFromControl(RenderMa nager rm, ViewPort vp) { Camera cam = vp.getCamera(); OutputCapsule capsule = e.getCapsule(this); capsule.write(imageLocation, "imageLocation", null); … Files with Sufficient Shared Code CCFinder public void write(JMEExporter e) throws IOException { OutputCapsule capsule = e.getCapsule(this); capsule.write(imageLocatio n, "imageLocation", null); … Code Clone Filter

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Cover Ratio Similarity metrics between query and result –Ratio of shared code clone size over the query size 1: all query code appears in the result 0: no query code appears in the result Filtering threshold –Used 0.4 in the following case studies 14 Clone Pair Result ri c Query q c x x Cover Ratio : di = |x| / |q c |

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 15 Case Studies

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Objectives of Case Studies Case Studies A, B, and C Does Ichi Tracker work properly as we have expected? Does Ichi Tracker provide useful information for developers? 16

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Case Study A texture.java –1,600 LOC java file to define a graphic texture object in game programs, and used by many 3D games –Developed by a game engine project jMonkeyEngine Good code to reuse?  Code evolution Conditions of all case studies 17 Feb and May 2011 Dual Xeon (X GHz) with 24GB main memory Osaka University internet environment

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Keywords and Number of Results 18 |R| : Number of output results |SR|: Number of headers from external code search engines With 2~5 trials, we had the total result 29/62

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Keywords with File Name 19 With only 2 trials, we had the total result 26/56

20 Evolution Pattern of Texture.java jMonkeyEngine R3800 R3448 R4099

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Case Study B kern_malloc.c –1082 LOC C function, to allocate a specified- size memory block in the kernel address space –Developed for 4.4 BSD and Mach, and taken over by many other projects Good code to reuse?  Code evolution We were able to get the output results 67/75 with 1~4 trials (without file name option) 21

Evolution Pattern of kern_malloc 22 2

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Case Study C SSHTools Suite of Java SSH (SSH API, terminal, …) Ver (June, 23, 2007) 442 files in total –We have selected 339 files larger than 2 KB –Each selected file was used as the input of Ichi Tracker Can we safely reuse these files in the same manner? 23

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Number of Similar Files Found 24 Total 339

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Oldest Files Found for Each Query File 25 License: GPL 2 Copyright: Lee David Painter and Contributors Last Modified Time: 2007/6/23 SSHTools (all 339 files) We have found many cases of different projects, different licenses and different copyrights -> We need to check very carefully when we reuse SSHTools code

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 26 Discussions and Related Works

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Usefulness With simple check of the output of Ichi Tracker, we can get useful information for the history and evolution of code 27 Ichi Tracker Reusable? Origin - Who? - When? - License? - Copyright? Evolution - Maintenance? - Popularity? - Newer version? … Results Concerns EASE

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Approach and Process Choosing good code search engines is a key to get high quality results GCS >> Koders > SPARS/R (GCS, Koders, SPARS/R) >> (Google, Bing) Need good code search engine available! Keyword selection strategy: Incremental strategy: try 1, 2, … keywords until the header list becomes less than 50 – Decrement strategy, random, less frequently-used keywords, comment keywords, short keywords, … Less effective 28

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Other Issues Performance –Case Studies A and B: 1 to 4 min. –Heavily depend on the code search engines and network performance Acceptable as non-interactive support system Quality of search result – Non-removed rate at the code clone filtering is an indicator of effectiveness of keyword search e.g., 0.46 (Case Study A(1) default setting) –The final output contains no false positives results 29

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Related Works Clone Tracker by Duala-Ekoko et.al. [7] –Track clones with clone region descriptors Software Bertillonage by Davies et. al. [6] –Determine origin with anchored signature matching method Code Broker by Ye et. al. [35] – Interactively provide complete code from partial code fragment  Need local repositories PARSEWeb by Thummalapenta et. al. [33] –Use code search engines to generate method invocation sequences  No code clone filtering Black Duck and Palamida  Proprietary systems using local repositories 30

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Conclusions Proposed code history tracking model Developed prototype system Ichi Tracker Showed case studies using Ichi Tracker  Ichi Tracker provides useful information for OSS reuse to ease developer's concerns Choose alternative code search engines Improve user interface to allow more interactive operation Try other keyword selection algorithms, e.g., TF/IDF,... 31

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 32 Thank you! Questions?

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 33

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Process of Ichi Tracker 34 Header List/ Files Keywords

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Keywords and Convergence 35

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Precision and Recall Precision – |R|/ |SR| : 0.46 (case A(1) and A(2)) 0.89 (case B(1)), 0.95 (case B(2)) – Output results : 1.0 (no fault positive) Recall –No information of GCS and Koders –SPARS/R: (case of random 100 files) 36

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 2) Keyword Selection 37

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 5) Search Result Analysis 38

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 6) Code Clone Filtering 39 Use CCFinder with the maximum token length 10

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 7) Result Forming 40