Daniel Kim Software Engineering Laboratory Professor Katsuro Inoue

Slides:



Advertisements
Similar presentations
Author: Carlos Pacheco, Shuvendu K. Lahiri, Michael D. Ernst, Thomas Ball MIT CSAIL.
Advertisements

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Identifying Source.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Evolutional Analysis.
Stan Lovric CSc214 Fall 2002 Victorian Symbolic Logic John Swartz Victorian Symbolic Logic Paper by John Swartz Presentation by Stan Lovric.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Extraction of.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University A Prototype of.
Chapter 1 Assuming the Role of the Systems Analyst
Kurt Menke, GISP GRASS GIS Geographic Resources Analysis Support System.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Measuring Copying.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Industrial Application.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Where Does This.
This presentation will guide you though the initial stages of installation, through to producing your first report Click your mouse to advance the presentation.
October 30, 2008 Extensible Workflow Management for Simmod ESUG32, Frankfurt, Oct 30, 2008 Alexander Scharnweber (DLR) October 30, 2008 Slide 1 > Extensible.
Yuki Manabe*, Daniel M. German†,‡ and Katsuro Inoue†
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University A Criterion for.
Teaching material for a course in Software Project Management & Software Engineering – part II.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University What Do Practitioners.
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University A Method to Detect License Inconsistencies for Large-
Mining and Analysis of Control Structure Variant Clones Guo Qiao.
Chapter 14 Part II: Architectural Adaptation BY: AARON MCKAY.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Applying Clone.
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University Inoue Laboratory Eunjong Choi 1 Investigating Clone.
Effective Testing and Debugging Methods and Its Supporting System with Program Deltas Makoto Matsushita, Masayoshi Teraguchi, and Katsuro Inoue Osaka University.
CS Data Structures I Chapter 2 Principles of Programming & Software Engineering.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Development of.
© 2007 OPNET Technologies, Inc. All rights reserved. OPNET and OPNET product names are trademarks of OPNET Technologies, Inc. An Open Source ARM 4 Implementation.
1 Measuring Similarity of Large Software System Based on Source Code Correspondence Tetsuo Yamamoto*, Makoto Matsushita**, Toshihiro Kamiya***, Katsuro.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 1 Classification.
Introduction to Computer Programming using Fortran 77.
Department of Computer Science, Graduate School of Information Science & Technology, Osaka University Detection of License Inconsistencies in Free and.
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Software Ingredients:
Presented By Sushil K. Chaturvedi Assistant Professor SRCEM,Banmore 1.
Estimating Code Size After a Complete Code-Clone Merge Buford Edwards III, Yuhao Wu, Makoto Matsushita, Katsuro Inoue 1 Graduate School of Information.
Yasuhiro Hayase†, Yu Kashima‡, Yuki Manabe‡, Katsuro Inoue‡
Daoyuan Wu, Ximing Liu*, Jiayun Xu*, David Lo, and Debin Gao
What is F/LOSS? By Scot Henderson.
Collectd 101.
Development Environment
Scripting for QA Engineers
Source File Set Search for Clone-and-Own Reuse Analysis
HDL simulation and Synthesis (Marks16)
Static Detection of Cross-Site Scripting Vulnerabilities
Testing The JCOP Framework
IT6004 – SOFTWARE TESTING.
Software Testing With Testopia
Easy-Bash: Designing a Metasearch Engine for Bash Command Queries
Software for Structural engineering. Past, Present and future.
Lesson 1: Introduction to Trifacta Wrangler
Lesson 1: Introduction to Trifacta Wrangler
Jefferson Ridgeway, 2017 NIFS Summer Intern Mentor: Brian Coltin, IRG
Boris Todorov1, Raula Gaikovina Kula2, Takashi Ishio2, Katsuro Inoue1
Lesson 1 – Chapter 1B Chapter 1B – Terminology
Software Engineering with Reusable Components
Predicting Fault-Prone Modules Based on Metrics Transitions
Identification and Characterization of pre-miRNA Candidates in the C
Quaid-i-Azam University
Coding Concepts (Basics)
Capability Maturity Model
CS310 Software Engineering Lecturer Dr.Doaa Sami
Yuhao Wu1, Yuki Manabe2, Daniel M. German3, Katsuro Inoue1
Estimating the number of components with defects post-release that showed no defects in testing C. Stringfellow A. Andrews C. Wohlin H. Peterson Jeremy.
Recitation on AdFisher
Automation of Control System Configuration TAC 18
Where Does This Code Come from and Where Does It Go?
Capability Maturity Model
Large-scale Analysis of Software Reuse for Code and License Changes
Recommending Adaptive Changes for Framework Evolution
Contributing source code to CSDMS
From Use Cases to Implementation
Why do we refactor? Technical Debt Items Versus Size: A Forensic Investigation of What Matters Hello everyone, I’m Ehsan Zabardast I am a PhD candidate.
Presentation transcript:

Daniel Kim Software Engineering Laboratory Professor Katsuro Inoue Detecting License Inconsistencies Based on Token Size in Open Source Software Daniel Kim Software Engineering Laboratory Professor Katsuro Inoue

Background Free open source software is an integral part of the software engineering ecosystem. Various licenses emerged over the years. Source code licenses can change over time depending on the author and developer’s actions. These changes lead to license inconsistencies and in the case of legal incompatibility, license violations.

Previous Research The first version of CCFX (Code Clone Finder X) was developed to find Type 1 and Type 2 code clones. Ninka was developed to determine licenses found in source code. Yuhao Wu, Katsurou Inoue, Yuki Manabe, Tetsuya Kanda, and Daniel M. German have published a paper detailing their analysis on Debian 7.5 and 10,514 Java projects using these tools.

Goals & Motivation Validation of previous research. To detect code clones based on varying thresholds to observe outcome of number and types of license inconsistencies. Previous research used token size of 50, default for CCFX.

Tools Used CCFX Detect and report code clones Ninka Detect and classify licenses in source code Python Used as main scripting language SQLite3 Store and manage data output from CCFX and Ninka

Relevant Terminology Code snippet A portion of source code Cloneset A set of similar code snippets as determined by CCFX Token Size Number of essential elements that CCFX divides source code into License Inconsistency Difference in licenses found in two versions of the same source code

Results

Results

Results

Results

Analysis & Conclusions The mean number of code snippets per cloneset decreases only very slightly. However, the number of license inconsistencies decreases heavily. The number of clonesets also drops off heavily. The mean length of snippets increases as expected. Conclusion: Smaller token sizes of 30 to 50 may include many false positives of license inconsistencies. Further reinforced by manual inspection of source code.

Types of License Inconsistencies License Addition The source file was without a license and one was added at a later time. License Removal The source file was under a license and was removed at a later time. License Upgrade The source file was under a certain version of a license and was upgraded to a newer version of the license. License Downgrade The source file was under a certain version of a license and was downgraded to an older version of the license. License Change The source file was under a certain license and has been changed to another license.

Results

Results

Analysis & Conclusions Based on the previous conclusions, we can safely exclude the column of token size 30. The proportion of license changes decreases greatly as token size increases. The number of license additions and license removals falls to nearly nothing as token size increases. Conclusion: A large portion of false positives may include license additions, removals, and changes.

Further Research Creating a systematic way to further detect license violations from license inconsistencies Using CCFX and Ninka to analyze other large codebases such as ones written in Perl and other compatible languages Further refining which token size is optimal for code clone detection for license inconsistencies Running these analyses on several consecutive versions of Debian and Ubuntu to see how license violations evolve over time