A Preliminary Study on Impact of Software Licenses on Copy-and-Paste Reuse

Presentation transcript:

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
A Preliminary Study on Impact of Software Licenses on Copy-and-Paste Reuse
Yu Kashima†, Yasuhiro Hayase††, Norihiro Yoshida†††, Yuki Manabe†, Katsuro Inoue†
†: Osaka University ††: Toyo University †††: Nara Institute of Science and Technology

Software Reuse
Purpose of software reuse
– Development of reliable software
– Increasing software productivity
We focus on Copy-and-Paste (CnP)
– A basic method of software reuse

Open Source Software and Licenses
Open Source Software (OSS)
– Derivative works of OSS products may be distributed
– The amount of reusable source code is growing as the number of OSS products grows
OSS licenses
– Many kinds of licenses have been designed to reflect developers' differing intents
– Each OSS license imposes different conditions
– Reuse is also restricted by the licenses

Representative OSS Licenses
3-clause BSD License (BSD3)
– A derivative work must retain copyright notices, the list of conditions, and the disclaimer of warranties
Apache License Version 2 (Apachev2)
– A derivative work must retain copyright, patent, trademark, and attribution notices
GNU General Public License Version 2 (GPLv2)
– A derivative work must be distributed under GPLv2
Notation: LicenseName code ≡ source code distributed under LicenseName
– Ex. BSD3 code ≡ source code distributed under BSD3

CnP between Files under Different Licenses
If a developer reuses source code:
– Both the license of the reused code and the license of the code under development must be satisfied simultaneously
– Distribution of the code under development is prohibited when the two licenses cannot both be satisfied
Figure: CnP between BSD3 code and GPLv2 code; CnP between Apachev2 code and GPLv2 code

Impact of License on CnP
Hypothesis
– Characteristics of source code reuse depend on the license
   – Frequency of CnP
   – Kinds of licenses used by source code developed through CnP
To our knowledge, there are no quantitative studies of CnP reuse from the perspective of software licenses
We investigate actual OSS to test this hypothesis

Experiment
A quantitative experiment was performed on a small data set
Purpose
– Confirming our hypothesis
– Investigating the scalability of our method
Overview
– Investigating the number of CnP instances for each license
– Code clone detection is used to detect CnP
   – A code clone is a code fragment similar to another code fragment
   – Code clones are typically generated by CnP

Method of Experiment
Step 1. License detection
Step 2. Code clone detection
Step 3. Counting code clones
– Code fragments are grouped by their license
Figure: source files of Application X and Application Y, labelled License A, License B, or Unknown
License      #Code Fragments
License A    10
License B    3
……           ……

Step 1. License Detection
Ninka [1] is used to detect the licenses of source files
– It analyzes the license description in each source file
– It detects licenses with high precision
Files whose license Ninka fails to detect are excluded
– Files that contain no license description or an unknown license description
[1] D. M. German, Y. Manabe and K. Inoue: “A sentence-matching method for automatic license identification of source code files”, ASE 2010, pp. 437–446 (2010)
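The exclusion rule above can be sketched as follows. This is a minimal illustration, not the tool's actual interface: the slides do not describe Ninka's output format, so the file-to-license mapping, the `keep_detected` helper, and the `NONE` / `UNKNOWN` marker strings are all assumptions.

```python
# Hypothetical markers for files whose license could not be identified.
# The real detector's output format is not given in the slides.
UNDETECTED = {"NONE", "UNKNOWN"}


def keep_detected(file_licenses):
    """Return only the files whose license was successfully detected."""
    return {
        path: license_name
        for path, license_name in file_licenses.items()
        if license_name not in UNDETECTED
    }


# Made-up example input: two files are dropped, two are kept.
detected = keep_detected({
    "x/A.java": "BSD3",
    "x/B.java": "NONE",
    "y/C.java": "GPLv2+",
    "y/D.java": "UNKNOWN",
})
print(detected)
```

Only files that survive this step would take part in the later clone counting.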

Step 2. Code Clone Detection
CCFinder [2] is used to extract code clones across different applications
– We assume that CnP within an application does not cause license problems
Filtering
– Code clones generated by something other than CnP are excluded
   – Ex. getters/setters, variable declarations
The direction of each CnP is not determined
Figure: Applications X, Y, and Z with code under Licenses A, B, and C; clones are labelled CnP, Getter/Setter, or Variable Declarations
[2] T. Kamiya, S. Kusumoto and K. Inoue: “CCFinder: A multilinguistic token-based code clone detection system for large scale source code”, IEEE Transactions on Software Engineering, 28, pp. 654–670 (2002)
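The two filters described in this step (keep only clones that span different applications, and drop clones that are unlikely to come from CnP) might look like the sketch below. The `Fragment` record and its `trivial` flag are hypothetical; the slides do not say how getter/setter or variable-declaration clones are recognized, so the flag is assumed to be set by some earlier analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    application: str
    path: str
    license: str
    # Hypothetical flag: True for getter/setter or variable-declaration code.
    trivial: bool = False


def filter_clone_pairs(pairs):
    """Keep clone pairs that cross applications and are not trivial code."""
    kept = []
    for a, b in pairs:
        if a.application == b.application:
            continue  # CnP within one application is assumed to be harmless
        if a.trivial or b.trivial:
            continue  # clone probably not produced by CnP
        kept.append((a, b))
    return kept


x1 = Fragment("X", "x/A.java", "License A")
x2 = Fragment("X", "x/B.java", "License A")
y1 = Fragment("Y", "y/C.java", "License B")
z1 = Fragment("Z", "z/D.java", "License C", trivial=True)

print(filter_clone_pairs([(x1, x2), (x1, y1), (y1, z1)]))
```

Pairs are treated as unordered here, which matches the slide's remark that the direction of CnP is not determined.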

Step 3. Counting Code Clones (1/2)
The following steps are repeated for each target license
1. Select a license as the analysis target
2. Extract the clone sets that include code under that license
   – A clone set is a set of code clones similar to each other
3. Count the code fragments in the extracted clone sets, grouped by license
Example: fragments having CnP relations to License A code (Applications X, Y, and Z)
License      #Code Fragments
License A    2
License B    1
License C    2
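The three counting steps can be sketched in a few lines. The data layout is an assumption made for illustration: each clone set is a list of `(application, license)` pairs, and the sample clone sets are invented so that the result matches the License A example table on this slide.

```python
from collections import Counter


def count_by_license(clone_sets, target):
    """Count fragments per license over the clone sets that contain `target`."""
    counts = Counter()
    for clone_set in clone_sets:
        licenses = [license_name for _application, license_name in clone_set]
        if target in licenses:       # step 2: clone set includes the target license
            counts.update(licenses)  # step 3: count fragments grouped by license
    return counts


# Invented clone sets; the third one has no License A fragment and is skipped.
clone_sets = [
    [("X", "License A"), ("Y", "License B"), ("Z", "License C")],
    [("X", "License A"), ("Z", "License C")],
    [("Y", "License B"), ("Z", "License C")],
]

# Step 1: choose License A as the analysis target.
print(count_by_license(clone_sets, "License A"))
```

Running the same function once per license would produce one table per target, as the slide describes.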

Step 3. Counting Code Clones (2/2)
A clone set includes both the original code fragments and the code fragments generated by CnP
→ Counting the code fragments in clone sets approximates counting the number of CnP instances
– This counts CnP both to and from the target license's code fragments
– Although the table includes CnP in the opposite direction, it is sufficient for an overview
Example: fragments having CnP relations to License A code (Applications X, Y, and Z)
License      #Code Fragments
License A    2
License B    1
License C    2

Analyzed Code
Java files (.java) in the Debian GNU/Linux main section
Reasons for selecting this target
– It contains code under various licenses
– It can be analyzed by both Ninka and CCFinder
– It is a feasible scale for this experiment
#Packages    452
#Files       77,452
LOC          8,530,896

License Distribution in Analyzed Code
Figure: number of files (#Files) per license

Result (BSD3)
Code fragments in clone sets that include BSD3 fragments, counted by license (the frequency of each license among code fragments having a CnP relationship to BSD3 fragments)
License                      #Fragments   Percentage
BSD3                         613          92%
GPLv                                      %
Apachev2                     16           2.4%
LesserGPL                                 %
GPLv2,ClassPathException     1            0.15%
LesserGPL                                 %
BSD3 code is mostly reused by BSD3 code

Result (Apachev2)
License            #Fragments   Percentage
Apachev                         %
Apachev                         %
LesserGPL                       %
MPLv                            %
BSD3               29           1.5%
MX4JLicensev                    %
GPLv                            %
LibraryGPL                      %
MPLv                            %
MITX11noNotice     2            0.10%
Public Domain      1            0.050%
Subversion                      %
EPLv                            %
A large percentage of CnP occurs between Apachev2 code fragments
Code under Apachev1.1 has been relicensed to Apachev2

Result (GPLv2+)
License                                 #Fragments   Percentage
GPLv                                                 %
GPLnoVersion,GPLv2+,LinkException       225          41%
BSD3                                    28           5.1%
LibraryGPLv                                          %
Apachev2                                4            0.73%
LesserGPLv                                           %
CnP within GPLv2+ code accounts for the highest percentage
“GPLnoVersion, GPLv2+, LinkException” also accounts for a high percentage
– “GPLnoVersion, GPLv2+, LinkException” code is reused by GPLv2+ code
Figure: CnP between “GPLnoVersion, GPLv2+, LinkException” code and GPLv2+ code

#Files and #Fragments under Each License
            #Fragments   #Files   #Fragments / #Files
BSD
Apachev
GPLv
Frequency of CnP per file: BSD3 > Apachev2 > GPLv2+
– Code under a license is copy-and-pasted frequently if “#Fragments / #Files” for that license is large
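The “#Fragments / #Files” measure is a per-license ratio, and the ranking on this slide is the licenses sorted by that ratio. The sketch below shows the arithmetic only. The license names and numbers are placeholders invented for illustration, since the slide's actual values did not survive in this transcript.

```python
def fragments_per_file(fragments, files):
    """Return #Fragments / #Files for every license with at least one file."""
    return {
        license_name: fragments[license_name] / files[license_name]
        for license_name in fragments
        if files.get(license_name)
    }


def rank_by_cnp_frequency(fragments, files):
    """List licenses from most to least frequently copy-and-pasted per file."""
    ratios = fragments_per_file(fragments, files)
    return sorted(ratios, key=ratios.get, reverse=True)


# Placeholder numbers, not the values from the study.
example_fragments = {"License P": 300, "License Q": 500, "License R": 100}
example_files = {"License P": 1000, "License Q": 5000, "License R": 2000}

print(fragments_per_file(example_fragments, example_files))
print(rank_by_cnp_frequency(example_fragments, example_files))
```

Normalizing by file count keeps a license with many files from looking heavily copied only because it is common.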

Summary of the Results
Common characteristic of all licenses
– Most CnP occurs within code distributed under the same license or under licenses designed by the same organization
   – CnP might happen mostly within an organization
Apachev2 has CnP relations to various licenses
– Apachev2 has the largest number of files
– The conditions of Apachev2 are more relaxed than those of GPLv2+
Frequency of CnP per file
– BSD3 > Apachev2 > GPLv2+

Threats to Validity
The results cannot yet be generalized to OSS as a whole
– The analysis target is small
   → We plan a large-scale analysis
– Only Java files were analyzed
   – Java has a short history, hence Java files are copy-and-pasted less than files in other languages
   → We plan an analysis of C/C++ files
Overlapping code fragments may be counted separately
– The number of overlapping code fragments might be small
Figure: Fragment A overlapping Fragment B
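One possible way to gauge the overlap threat is to merge overlapping fragments before counting and compare the two totals. This is a sketch under assumptions, not something the study did: fragments are modelled as hypothetical `(path, start_line, end_line)` tuples with inclusive line ranges.

```python
def merge_overlapping(fragments):
    """Merge fragments in the same file whose line ranges overlap."""
    merged = []
    for path, start, end in sorted(fragments):
        if merged and merged[-1][0] == path and start <= merged[-1][2]:
            # Overlaps the previous fragment in the same file: extend it.
            _, previous_start, previous_end = merged[-1]
            merged[-1] = (path, previous_start, max(previous_end, end))
        else:
            merged.append((path, start, end))
    return merged


# Made-up fragments: the two in A.java overlap on lines 15 to 20.
sample = [("A.java", 10, 20), ("A.java", 15, 30), ("B.java", 1, 5)]
merged = merge_overlapping(sample)
print(len(sample), "fragments before merging,", len(merged), "after")
```

A small difference between the two counts would support the slide's expectation that overlapping fragments are rare.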

Scalability of Investigating Method
This method can be applied to a large target because each step can be scaled up
– License detection
   – Ninka can analyze files in linear time
– Code clone detection
   – There are more scalable tools than CCFinder, such as CCFinderX and D-CCFinder
– Counting code clones
   – This process did not take a long time

Conclusion
A preliminary study of the impact of licenses on CnP was performed
– Java files in the Debian GNU/Linux main section were analyzed
CnP happens mostly within code distributed under the same license or under licenses designed by the same organization
Frequency of CnP per file
– BSD3 > Apachev2 > GPLv2+
Our method can be applied to a large target

Future Work
Large-scale experiment
Investigating whether code fragments are copy-and-pasted mostly within an organization
Detecting the direction of CnP