Identifying Source Code Reuse across Repositories using LCS-based Source Code Similarity
Naohiro Kawamitsu, Takashi Ishio, Tetsuya Kanda, Raula Gaikovina Kula, Coen De Roover and Katsuro Inoue
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Background: Software Reuse
Developers often reuse existing source code.
– Clone-and-own approach
– Source code reuse reduces cost and enables quick software development.
Reused code may include vulnerabilities.
– Developers have to keep the reused code up to date.

Motivation
It is important to keep track of the library version that developers copied from.
– To keep the copied files up to date
A study shows that 18.7% of projects had no record of the version of their third-party code.
The diff command is often insufficient.
– Many copies are modified for project-specific enhancements.

Proposed Method
Automatically extract instances of source code reuse.
Input
– Source repository: a library
– Destination repository: an application
Output
– Instances of reuse: the original files and their versions (tags)
Example output (source path | tags | destination path | commit):
– png.h | v1.5.7 | libpng/png.h | 58f9e77
– pngrio.c | v1.0.52, v… | libpng/pngrio.c | 101018d
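The output is essentially a list of records. A minimal sketch could hold one row of the example above in a hypothetical `ReuseInstance` type, which is not part of the original tool. The sketch assumes that the commit column identifies a commit in the destination repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReuseInstance:
    """One identified reuse instance (hypothetical record type)."""

    source_path: str       # file path in the source (library) repository
    tags: tuple[str, ...]  # version tags of the original source revision
    dest_path: str         # file path in the destination (application) repository
    commit: str            # destination-repository commit (assumed meaning)


# First row of the example output above.
instances = [ReuseInstance("png.h", ("v1.5.7",), "libpng/png.h", "58f9e77")]

for inst in instances:
    print(f"{inst.dest_path} at {inst.commit} was copied from "
          f"{inst.source_path} ({', '.join(inst.tags)})")
# libpng/png.h at 58f9e77 was copied from png.h (v1.5.7)
```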

Key Ideas
Two assumptions are used to identify reuse:
– Timestamp: a copy is younger than the original.
– File contents: the most similar file revision is the original.
We use pairwise comparison with an LCS-based similarity.
– LCS stands for Longest Common Subsequence.

Similarity Metric
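The following sketch shows one way to compute an LCS-based similarity. It rests on two assumptions. First, files are compared line by line, with surrounding whitespace and blank lines ignored. Second, the LCS length is normalized as 2·|LCS(a, b)| / (|a| + |b|). Both choices are hypothetical, and the authors' metric may tokenize or normalize differently. The LCS itself is the classic dynamic-programming algorithm.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two sequences.

    Classic O(len(a) * len(b)) dynamic programming, keeping only one row.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1          # extend a common subsequence
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[len(b)]


def similarity(a_text: str, b_text: str) -> float:
    """Hypothetical LCS-based similarity in [0, 1]: 2*|LCS| / (|a| + |b|).

    Lines are stripped and blank lines dropped (an assumed normalization).
    """
    a = [line.strip() for line in a_text.splitlines() if line.strip()]
    b = [line.strip() for line in b_text.splitlines() if line.strip()]
    if not a and not b:
        return 1.0  # two empty files are treated as identical
    return 2 * lcs_length(a, b) / (len(a) + len(b))


original = "int f() {\n    return 1;\n}\n"
patched = "int f() {\n    /* project-specific fix */\n    return 1;\n}\n"
print(round(similarity(original, patched), 3))  # 0.857
```

A single inserted line lowers the score only slightly, so a locally patched copy still scores close to its original.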

Why Isn't Clone Detection Used?
The question is "Which file revision is the most similar?"
Clone detection ignores small differences.
– Most revisions are considered code clones of each other.
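A small illustration with made-up numbers shows the difference. A yes/no clone criterion lets several revisions qualify, while ranking by similarity singles out one. The scores and the clone threshold below are hypothetical, not measured data.

```python
# Hypothetical similarity of one destination file to five revisions of the
# library file it was copied from (illustrative numbers only).
scores = {"F1": 0.62, "F2": 0.91, "F3": 0.97, "F4": 0.95, "F5": 0.88}

# A clone detector answers a yes/no question; with a tolerant criterion,
# most revisions count as clones of the copy.
CLONE_THRESHOLD = 0.8  # assumed value
clones = [rev for rev, s in scores.items() if s >= CLONE_THRESHOLD]

# The question here is different: which revision is the most similar?
best = max(scores, key=scores.get)

print(clones)  # ['F2', 'F3', 'F4', 'F5']
print(best)    # F3
```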

Process
1. Computing pairs of similar file revisions
– To find reuse candidates
2. Filtering candidates by timestamp
– To remove candidates that contradict the timestamp information
3. Identifying the original revision
– To find which version is the origin

1. Computing Pairs of Similar File Revisions
Pairwise comparison of each revision of each file with each revision of all other files
[Figure: Repository A, containing revisions of files F and X, and Repository B, containing revisions of files G and Y]

An Example Result of Step 1
Compute the similarity between all pairs of revisions.
– A pair of file revisions is considered similar if its similarity is higher than the threshold.
[Figure: source file F with revisions F1–F5 and destination file G with revisions G1–G3]
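Step 1 can be sketched as a nested comparison over all revision pairs, so the number of comparisons equals the product of the two revision counts. Several parts of this sketch are assumptions: the function `similar_pairs`, the `(id, content)` revision representation, and the threshold value. The demo also plugs in `difflib.SequenceMatcher.ratio` as a stand-in metric. That metric is not the LCS-based similarity from the slides.

```python
from difflib import SequenceMatcher
from itertools import product
from typing import Callable

Revision = tuple[str, str]  # (revision id, file content); hypothetical representation


def similar_pairs(
    source_revs: list[Revision],
    dest_revs: list[Revision],
    sim: Callable[[str, str], float],
    threshold: float,
) -> dict[tuple[str, str], float]:
    """Step 1 sketch: compare every source revision with every destination
    revision and keep the pairs whose similarity is higher than the threshold."""
    pairs: dict[tuple[str, str], float] = {}
    for (s_id, s_text), (d_id, d_text) in product(source_revs, dest_revs):
        score = sim(s_text, d_text)
        if score > threshold:
            pairs[(s_id, d_id)] = score
    return pairs


def line_ratio(a: str, b: str) -> float:
    # Stand-in metric for the demo only: difflib's ratio over lines.
    return SequenceMatcher(None, a.splitlines(), b.splitlines()).ratio()


F = [("F1", "a\nb"), ("F2", "a\nb\nc")]
G = [("G1", "a\nb\nc"), ("G2", "x\ny")]
print(similar_pairs(F, G, line_ratio, threshold=0.7))
# {('F1', 'G1'): 0.8, ('F2', 'G1'): 1.0}
```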

2. Filtering by Timestamp
1. Extract the pairs of revisions whose similarity is higher than the threshold.
[Figure: source file F (F1–F5) and destination file G (G1–G3); lines mark low-similarity and high-similarity pairs]

2. Filtering by Timestamp
2. Select the oldest revisions of F and G that appear in high-similarity pairs.
[Figure: source revisions F2–F5 and destination revisions G1–G3; lines mark low-similarity and high-similarity pairs]

2. Filtering by Timestamp
3. Compare the timestamps of the selected revisions.
– Assumption: a copy is younger than the original.
[Figure: source revision F2 of file F and destination revision G1 of file G]
G1 is younger than F2, so the pair is identified as reuse.

2. Filtering by Timestamp
If the destination revision is older, the file pair is filtered out.
[Figure: source file X and destination file Y]
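The timestamp filter for a single file pair can be sketched as follows. The function name `passes_timestamp_filter` and its input format are hypothetical. The input is a set of similar revision pairs from step 1 plus a commit time for each revision id. The sketch keeps the pair only if the oldest similar destination revision is strictly younger than the oldest similar source revision. Rejecting equal timestamps is an assumption.

```python
from datetime import datetime


def passes_timestamp_filter(
    pairs: set[tuple[str, str]],
    timestamps: dict[str, datetime],
) -> bool:
    """Step 2 sketch for one (source file, destination file) pair.

    pairs: (source revision id, destination revision id) pairs whose
        similarity was higher than the threshold in step 1.
    timestamps: commit time of every revision id (hypothetical input format).
    Returns True if the file pair is kept as a reuse candidate.
    """
    if not pairs:
        return False
    # Oldest revision of each file that takes part in a similar pair.
    oldest_src = min(timestamps[s] for s, _ in pairs)
    oldest_dst = min(timestamps[d] for _, d in pairs)
    # Assumption from the slides: a copy is younger than the original.
    return oldest_dst > oldest_src


t = {
    "F2": datetime(2012, 1, 10), "F3": datetime(2012, 6, 1),
    "G1": datetime(2012, 3, 5), "G2": datetime(2012, 9, 20),
}
print(passes_timestamp_filter({("F2", "G1"), ("F3", "G2")}, t))  # True

# The X/Y case: the destination revision is older, so the pair is filtered out.
t_xy = {"X1": datetime(2013, 1, 1), "Y1": datetime(2012, 1, 1)}
print(passes_timestamp_filter({("X1", "Y1")}, t_xy))  # False
```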

3. Identifying the Original Revision
For each revision of the destination file, identify its original revision.
Heuristic
– The revision of the source file that is the most similar to the destination revision is the original revision.
[Figure: source file F (F1–F5) and destination file G (G1–G3); for each revision of G, the most similar revision of F is marked]

3. Identifying the Original Revision
Result
– G1's origin = F2
– G2's origin = F4
– G3's origin = F5
[Figure: source file F (F1–F5) and destination file G (G1–G3)]
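The heuristic amounts to an arg-max per destination revision. The sketch below uses a hypothetical function name and hypothetical similarity scores, chosen so that the result matches the slide. The input is assumed to contain only pairs that survived steps 1 and 2. On ties, the pair seen first wins, which is an arbitrary choice.

```python
def identify_origins(scores: dict[tuple[str, str], float]) -> dict[str, str]:
    """Step 3 sketch: for each destination revision, pick the source revision
    with the highest similarity as its original revision.

    scores: similarity of (source revision, destination revision) pairs
    (hypothetical input format).
    """
    best: dict[str, tuple[float, str]] = {}
    for (src, dst), score in scores.items():
        # Strict '>' keeps the first-seen source revision on ties.
        if dst not in best or score > best[dst][0]:
            best[dst] = (score, src)
    return {dst: src for dst, (_, src) in best.items()}


# Made-up scores chosen to reproduce the result on the slide.
scores = {
    ("F2", "G1"): 0.95, ("F3", "G1"): 0.90,
    ("F3", "G2"): 0.88, ("F4", "G2"): 0.97, ("F5", "G2"): 0.91,
    ("F4", "G3"): 0.89, ("F5", "G3"): 0.99,
}
print(identify_origins(scores))  # {'G1': 'F2', 'G2': 'F4', 'G3': 'F5'}
```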

3. Identifying the Original Revision
Original revisions are mapped to version numbers using the tags in the source repository.
– G1's origin's version = 1.1
– G2's origin's version = 1.3
– G3's origin's version = …
[Figure: source file F (F1–F5) with tags and destination file G (G1–G3)]
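Mapping origins to versions is a lookup from each original source revision to the tags on it. In a Git repository, the tags that point at a given commit can be listed with `git tag --points-at <commit>`. The sketch below assumes that this information has already been collected into a dictionary. The function name `versions_of_origins` is hypothetical. The handling of untagged revisions (an empty list here) is also an assumption, because the slide does not specify it.

```python
def versions_of_origins(
    origins: dict[str, str],
    tags: dict[str, list[str]],
) -> dict[str, list[str]]:
    """Map each destination revision to the version tags on its original
    source revision.

    origins: destination revision id -> original source revision id (step 3).
    tags: source revision id -> tag names on that revision (hypothetical input).
    An untagged origin maps to an empty list.
    """
    return {dst: tags.get(src, []) for dst, src in origins.items()}


print(versions_of_origins({"G1": "F2", "G2": "F4"}, {"F2": ["1.1"], "F4": ["1.3"]}))
# {'G1': ['1.1'], 'G2': ['1.3']}
```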

Evaluation
We evaluated the effectiveness of our approach.
– Evaluated with precision and recall
We compared the identified reuse instances with the version numbers recorded by developers.
Source repositories: libpng, libcurl
Destination repositories: cocos2d-iphone, apitrace, guliverkli2, fs2open, v8monkey, Haiku-services-branch, Enemy-Territory, doom3.gpl

Classes of Instances of Source Code Reuse
To compute precision and recall, the reported reuse instances are classified into four groups:
– Consistent
– Inconsistent
– Redundant
– Unrecorded

Consistent, Inconsistent and Unrecorded
[Figure: source file foo.c and destination file foo.c, with versions recorded by developers ("Imported from …", "updated to …") and identified reuse instances labeled consistent, inconsistent, and unrecorded]

Redundant
[Figure: files foo.c and foo2.c in the source and destination, a version recorded by developers as "Imported", and identified reuse instances labeled consistent and redundant]

Results
Precision = …
Estimated recall = …

An Example of an Incorrectly Recorded Version Number
Commit log: "Update to …"
[Figure: file comparison labeled "Identical" and "Not Identical"]

Performance
We employed an optimization to speed up the analysis.
– In the worst case, the method compares all pairs of file revisions.
Execution time per destination repository:
– cocos2d-iphone: 40 min 51 sec
– apitrace: 55 min 6 sec
– guliverkli2: 38 min 13 sec
– fs2open: 23 min 43 sec
– v8monkey: 225 min 33 sec
– Haiku-services-branch: 139 min 45 sec
– Enemy-Territory: 5 min 26 sec
– doom3.gpl: 4 min 35 sec
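The slide does not say which optimization was used. One generic option, shown only as an illustration, is a length-based upper bound. It assumes the hypothetical metric 2·|LCS| / (|a| + |b|). Because |LCS| can never exceed the shorter file's length, some pairs can be rejected without running the LCS at all. The helper names below are hypothetical.

```python
def can_skip(len_a: int, len_b: int, threshold: float) -> bool:
    """Return True if a pair can be skipped without computing its LCS.

    Assumes the hypothetical metric sim = 2*|LCS| / (|a| + |b|). Because
    |LCS| <= min(|a|, |b|), sim can never exceed 2*min / (|a| + |b|); if that
    upper bound is not higher than the threshold, the pair cannot be similar.
    """
    if len_a + len_b == 0:
        return False  # two empty files: leave the decision to the metric
    upper_bound = 2 * min(len_a, len_b) / (len_a + len_b)
    return upper_bound <= threshold


def worst_case_comparisons(source_revs: int, dest_revs: int) -> int:
    """Without pruning, every source revision is compared with every destination revision."""
    return source_revs * dest_revs


print(can_skip(100, 400, 0.5))             # True  (bound 0.4)
print(can_skip(100, 120, 0.5))             # False (bound about 0.91)
print(worst_case_comparisons(1000, 2000))  # 2000000
```

The bound only removes pairs whose sizes differ a lot, so its benefit depends on how varied the file lengths in the two repositories are.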

Conclusion
We proposed a method to extract reuse instances.
– It is based on LCS-based source code similarity.
The results show that our method is sufficiently accurate.
Our method can be used to notify developers to update their copy of a library.