Department of Computer Science, Graduate School of Information Science & Technology, Osaka University: A Method to Detect License Inconsistencies for Large-Scale Open Source Projects

Presentation transcript:

Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
A Method to Detect License Inconsistencies for Large-Scale Open Source Projects
Yuhao Wu 1, Yuki Manabe 2, Tetsuya Kanda 1, Daniel M. German 3, Katsuro Inoue 1
1 Osaka University, Japan
2 Kumamoto University, Japan
3 University of Victoria, Canada

Open Source Software License
A legal instrument governing the use or redistribution of software, usually placed in the header of a source file
Generally may not be modified, removed or changed without the copyright owner’s permission
Example headers:
– GPLv3+: "This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version."
– MIT: "… The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software."

Motivation
Files with the same contents under different licenses
– License A vs. License B
– License C vs. no license
[Figure labels: LGPLv2.1, Apachev2.0, EPLv1.0, None]
Definition of License Inconsistency
– Two source files that evolved from the same provenance contain different licenses
License inconsistency indicates potential license violation problems

Problem
No research has addressed these questions:
– RQ1: How many types of license inconsistency are there?
– RQ2: Do they exist in large open source projects?
– RQ3: What is the proportion of each type of license inconsistency?
– RQ4: What caused these license inconsistencies? Are they legal?

Approach Overview
[Figure: a distribution containing Project1, Project2, Project3, …; files such as a.cpp, b.java, …; resulting groups a.cpp_0, a.cpp_1, …]
1. Select files that have the same file name
2. Group semantically identical files using CCFinder [1]
3. Detect the license of each file in each group using Ninka [2]
4. Calculate metrics for the groups that contain license inconsistencies

File name   #Licenses   None   Unknown
a.cpp_0     2           5      0
a.cpp_1     2           2      0
…

[1] T. Kamiya, S. Kusumoto, and K. Inoue, “CCFinder: A multilinguistic token-based code clone detection system for large scale source code,” IEEE Transactions on Software Engineering, vol. 28, no. 7, pp. 654–670, 2002.
[2] D. M. German, Y. Manabe, and K. Inoue, “A sentence-matching method for automatic license identification of source code files,” in Proceedings of the 25th International Conference on Automated Software Engineering (ASE2010), 2010, pp. 437–446.
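
The four steps of the approach overview can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' tooling: `normalize` is a crude stand-in for CCFinder (it strips C-style comments and whitespace instead of doing token-based clone detection), and the license labels are assumed to have been produced already by a detector such as Ninka and passed in as a plain dictionary. The function names and the "NONE"/"UNKNOWN" labels are hypothetical.

```python
import hashlib
import re
from collections import defaultdict
from pathlib import PurePosixPath


def normalize(source):
    """Crude stand-in for clone detection: drop C-style comments and whitespace."""
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)  # block comments
    source = re.sub(r"//[^\n]*", "", source)                    # line comments
    return re.sub(r"\s+", "", source)


def group_files(files):
    """Steps 1-2: group paths by file name, then by identical normalized content.

    `files` maps a path to its source text. Groups are named name_0, name_1, ...
    """
    by_key = defaultdict(list)
    for path, source in files.items():
        digest = hashlib.sha1(normalize(source).encode()).hexdigest()
        by_key[(PurePosixPath(path).name, digest)].append(path)
    groups, counters = {}, defaultdict(int)
    for (name, _digest), paths in sorted(by_key.items()):
        groups[f"{name}_{counters[name]}"] = sorted(paths)
        counters[name] += 1
    return groups


def inconsistent(paths, licenses):
    """Step 3: a group is inconsistent if its files carry different labels."""
    return len({licenses[p] for p in paths}) > 1


def group_metrics(paths, licenses):
    """Step 4: the per-group metrics shown in the table (assumed meaning)."""
    labels = [licenses[p] for p in paths]
    named = {label for label in labels if label not in ("NONE", "UNKNOWN")}
    return {
        "#Licenses": len(named),
        "None": labels.count("NONE"),
        "Unknown": labels.count("UNKNOWN"),
    }
```

Because license headers live in comments, stripping comments before hashing lets two copies of the same code with different headers land in the same group, which is what makes the inconsistency visible.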

Empirical Study
Goal
– To reveal the characteristics of license inconsistency in a large open source software distribution
Target
– Debian 7.5
Categorization
– LAR: License Addition or Removal
– LUD: License Upgrade or Downgrade
– LC: License Change

Characteristics   Number
Projects          17,160
Total files       6,136,637
.c files          472,861
.cpp files        224,267
.java files       365,213

Inconsistency type   Number   Perc.
LC                   5,…      …%
LUD                  2,…      …%
LAR                  1,…      …%
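
To make the three categories concrete, here is a minimal sketch that classifies a pair of license labels. The rules are an assumption read off the category names, not the authors' exact definitions: a pair where one side has no license is LAR, a pair from the same license family with different versions is LUD, and anything else is LC. The label format ("GPLv2+", "Apachev2", "NONE") follows the labels that appear on the slides; the function names are hypothetical.

```python
import re
from itertools import combinations

NO_LICENSE = "NONE"  # assumed label for a file without a license header


def split_label(label):
    """Split a label such as 'GPLv2+' into ('GPL', '2+'); unversioned labels keep ''."""
    match = re.fullmatch(r"(.+?)v(\d.*)", label)
    return (match.group(1), match.group(2)) if match else (label, "")


def pair_type(a, b):
    """Classify the inconsistency between two license labels, or None if equal."""
    if a == b:
        return None
    if NO_LICENSE in (a, b):
        return "LAR"  # license added or removed
    if split_label(a)[0] == split_label(b)[0]:
        return "LUD"  # same family, different version
    return "LC"       # different license families


def group_types(labels):
    """Collect every inconsistency type found among the labels of one group."""
    pairs = combinations(sorted(set(labels)), 2)
    return {t for a, b in pairs if (t := pair_type(a, b))}
```

For example, `group_types(["GPLv2", "GPLv3+", "NONE"])` yields both LAR and LUD, so under this sketch a single group can exhibit more than one type.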

Answers to RQ1-3
RQ1: How many types of license inconsistency are there in the target distribution?
– 3 types: LAR, LUD and LC
RQ2: Do they exist in large open source projects?
– Yes, they exist in Debian 7.5
RQ3: What is the proportion of each type of license inconsistency?
– LAR (28.0%), LUD (43.9%) and LC (98.4%)
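
The three proportions add up to more than 100%. One way this can happen is if each share is computed per group and a group may show several inconsistency types at once. The slide does not say how the figures were computed, so the following is only a sketch of that reading, with made-up input data and a hypothetical function name.

```python
from collections import Counter


def type_shares(groups):
    """Percentage of groups exhibiting each type; `groups` is a list of sets of types."""
    counts = Counter(t for types in groups for t in types)
    return {t: round(100 * n / len(groups), 1) for t, n in counts.items()}


# Illustrative data only, not taken from the study.
example = [{"LC"}, {"LC", "LUD"}, {"LAR", "LC"}, {"LUD"}]
shares = type_shares(example)
```

Here LC appears in three of four groups, LUD in two and LAR in one, so the shares are 75.0, 50.0 and 25.0 and sum to 150.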

Manual Analysis
To determine the reason for and safety of each license inconsistency (RQ4):
1. Find the repository of each related project
2. Check the license evolution of the files
3. Find out when and why the license was modified
4. Determine whether the license modification is legally safe or not

Example of license change
File: TagLibraryInfo.java (Debian 7.5)
[Figure: license evolution of the file across tomcat 5.5.x, tomcat6 and glassfish; licenses shown: Apachev1.1, Apachev2, CDDL, and Multiple (CDDL, GPLv2 or Apachev2)]
– Tools to maintain the licenses
– Discussed with Apache people and changed to combined licenses
– Validating the license on a per-file basis is complicated and expensive

Examples of license inconsistencies
Other safe examples
– The original file is multi-licensed, so a developer may choose any one of the licenses and remove the others.
– The original file is under a permissive license and developers added another compatible license to it.
A suspicious example
– Developers reused a file licensed under BSD3, but changed the license to GPLv2 and also modified the copyright owner, which the original license does not allow.

Answer to RQ4
RQ4: What caused these license inconsistencies? Are there potentially illegal license modifications?
– i) Original author modified/upgraded the license;
– ii) The file was originally multi-licensed and reusers chose either one;
– iii) Reuser added one or more compatible licenses;
– iv) Reuser replaced the original license, and changed the copyright owner.
Among them, the last type of license modification is unsafe.
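
The four causes above can be turned into a rough triage rule for sorting cases before manual analysis. This is a hypothetical sketch, not a method from the slides: it compares only the sets of license labels and the copyright owner strings, it does not check license compatibility, and the returned tags are invented for illustration.

```python
def review_flag(orig_licenses, orig_owner, new_licenses, new_owner):
    """Return a triage tag for one original/reused file pair.

    License arguments are iterables of labels; owners are plain strings.
    """
    orig, new = set(orig_licenses), set(new_licenses)
    if orig == new:
        return "unchanged"
    if len(orig) > 1 and new and new < orig:
        return "chose-from-multi-license"    # resembles cause (ii)
    if orig < new:
        return "added-license"               # resembles cause (iii); compatibility unchecked
    if orig_owner != new_owner:
        return "replaced-license-and-owner"  # resembles cause (iv), the unsafe pattern
    return "replaced-license"                # possibly cause (i), change by the original author
```

With the suspicious example from the slides, `review_flag({"BSD3"}, "Alice", {"GPLv2"}, "Bob")` returns "replaced-license-and-owner" (the owner names are placeholders). Such a tag only prioritizes cases for a human reviewer.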

Conclusion
An efficient method is proposed to detect license inconsistencies in open source projects
An exploratory study was conducted to investigate license violation problems
Challenges of license maintenance are revealed
Future work
– Apply this method to more projects to detect more patterns
– Develop a (semi-)automatic method to identify whether the license changes are legal or not.