Software Ingredients: Detection of Third-party Component Reuse in Java Software Release

Presentation transcript:

Software Ingredients: Detection of Third-party Component Reuse in Java Software Release
Takashi Ishio†, Raula Gaikovina Kula†, Tetsuya Kanda†, Daniel M. German‡, Katsuro Inoue†
† Osaka University, Japan  ‡ University of Victoria, Canada
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
MSR 2016

Motivation: Software Reuse
In Java, many binary components are reused in a product binary.
Example: Google Web Toolkit (an all-in-one package for WebApp development) is made of:
- Apache Ant
- Apache Commons Codec
- Apache Commons Collections
- Apache HttpClient
- Apache Xalan
- HtmlUnit
- …

Is it safe?
Google Web Toolkit: all-in-one package for WebApp development.
Security Advisories:
- # Xalan-Java insufficient secure processing: arbitrary code can be executed if … [Affected versions: before 2.7.2]
Is this advisory relevant to the product?
Product Documentation:
- A partial list of components
- No version numbers

Detection of Software Components
Our tool detects component names and their version numbers in a given jar file.
Google Web Toolkit is made of: Apache Ant, Apache Commons Codec 1.8, Apache Commons Collections, Apache HttpClient, Apache Xalan, HtmlUnit 2.13
Matching advisories:
- Vulnerability Note VU#: Apache Commons Collections library insecurely deserializes data. [Affected versions: 3.2.1, 4.0]
- # Xalan-Java insufficient secure processing: arbitrary code can be executed if … [Affected versions: before 2.7.2]

Actions based on Detection Result
- Upgrade the whole product, if an upgrade is available
- Upgrade the vulnerable components, if upgrades are available
- Use the product in a safe environment, if an upgrade is impossible
- Accept the risk (continue to use the product), if the vulnerability conditions are unsatisfiable

How to Detect Components
Input: a jar file (gwt-dev jar)
Component Database (e.g. Maven.org): multiple ant jar files, multiple collections jar files, …
Question: which of these does the input include?
Output: the jar files that are most likely included in the input file (an ant jar and a collections jar)
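Because a jar file is a ZIP archive, the first step of any such comparison is to enumerate the class entries of the input. The following is a minimal sketch of that step only, not the authors' tool; the helper names `class_entries` and `make_demo_jar` and the `org/example/...` entries are hypothetical and exist purely for illustration.

```python
import io
import zipfile


def class_entries(jar_bytes: bytes) -> set[str]:
    """Return the names of all .class entries in a jar (a jar is a ZIP archive)."""
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return {name for name in jar.namelist() if name.endswith(".class")}


def make_demo_jar(entries: dict[str, bytes]) -> bytes:
    """Build a tiny in-memory jar so the sketch runs without any external file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as jar:
        for name, data in entries.items():
            jar.writestr(name, data)
    return buffer.getvalue()


# Hypothetical contents; the class bodies are placeholders, not real bytecode.
demo_jar = make_demo_jar({
    "META-INF/MANIFEST.MF": b"Manifest-Version: 1.0\n",
    "org/example/A.class": b"placeholder",
    "org/example/util/B.class": b"placeholder",
})
demo_classes = class_entries(demo_jar)
print(sorted(demo_classes))
```

A real pipeline would then turn each class entry into a signature and look it up in the component database, as the following slides describe.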

A previous work: Software Bertillonage
Input [target.jar] (Class: Signature):
  A: 7fabc..., B: ff1dc..., C: 07a21..., E: 920b4..., F: 6b9a3..., G: a18e0...
Database (Class: Signature):
  [X-1.0.jar] A: 7fabc..., B: ff1dc..., C: 07a21...
  [X-1.1.jar] A: 7fabc..., C: 07a21..., D: 35e23...
  [Y-0.1.jar] E: 920b4...
  [Z-0.2.jar] E: 920b4..., F: 6b9a3...
Component   Likely Included?
X-1.0.jar   0.5    ✔
X-1.1.jar   0.286
Y-0.1.jar   0.167  ✔
Z-0.2.jar   0.333  ✔
A user has to manually identify the original components using this information.
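The scores on this slide are consistent with a Jaccard index between the signature set of the target and that of each database component; the transcript does not name the metric, so treat that reading as an assumption. A short sketch that recomputes the four values, using the truncated signatures from the slide as opaque tokens:

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Truncated signatures copied from the slide and used as opaque tokens.
target = {"7fabc...", "ff1dc...", "07a21...", "920b4...", "6b9a3...", "a18e0..."}
database = {
    "X-1.0.jar": {"7fabc...", "ff1dc...", "07a21..."},
    "X-1.1.jar": {"7fabc...", "07a21...", "35e23..."},
    "Y-0.1.jar": {"920b4..."},
    "Z-0.2.jar": {"920b4...", "6b9a3..."},
}

scores = {name: round(jaccard(target, sigs), 3) for name, sigs in database.items()}
# A component is fully contained when every one of its signatures is in the target.
fully_contained = {name: sigs <= target for name, sigs in database.items()}

for name in database:
    print(name, scores[name], "contained" if fully_contained[name] else "")
```

In this example the check marks coincide with the components whose signatures all occur in the target, which is why X-1.1.jar is unmarked despite scoring higher than Y-0.1.jar.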

The key difference: Greedy search
Strategy:
- Select the largest, entirely copied jar file
Input [target.jar] (Class: Signature):
  A: 7fabc..., B: ff1dc..., C: 07a21..., E: 920b4..., F: 6b9a3..., G: a18e0...
Database (Class: Signature):
  [X-1.0.jar] A: 7fabc..., B: ff1dc..., C: 07a21...
  [X-1.1.jar] A: 7fabc..., C: 07a21..., D: 35e23...
  [Y-0.1.jar] E: 920b4...
  [Z-0.2.jar] E: 920b4..., F: 6b9a3...

The key difference: Greedy search (input and database as on the previous slide)
Strategy:
- Select the largest, entirely copied jar file
Greedy search in this example:
1. Select X-1.0 because it provides 3 of 6 classes.

The key difference: Greedy search (input and database as on the previous slide)
Strategy:
- Select the largest, entirely copied jar file
Greedy search in this example:
1. Select X-1.0 because it provides 3 of 6 classes.
2. Select Z-0.2 because it provides 2 of 3 remaining classes.

The key difference: Greedy search (input and database as on the previous slide)
Strategy:
- Select the largest, entirely copied jar file
Greedy search in this example:
1. Select X-1.0 because it provides 3 of 6 classes.
2. Select Z-0.2 because it provides 2 of 3 remaining classes.
3. X-1.1 and Y-0.1 are not selected because they do not cover the remaining class G.

The key difference: Greedy search
Strategy:
- Select the largest, entirely copied jar file
Input [target.jar] (Class: Signature):
  A: 7fabc..., B: ff1dc..., C: 07a21..., E: 920b4..., F: 6b9a3..., G: a18e0...
Database (Class: Signature):
  [X-1.0.jar] A: 7fabc..., B: ff1dc..., C: 07a21...
  [X-1.1.jar] A: 7fabc..., C: 07a21..., D: 35e23...
  [Y-0.1.jar] E: 920b4...
  [Z-0.2.jar] E: 920b4..., F: 6b9a3...
Greedy search in this example:
1. Select X-1.0 because it provides 3 of 6 classes.
2. Select Z-0.2 because it provides 2 of 3 remaining classes.
3. X-1.1 and Y-0.1 are not selected because they do not cover the remaining class G.
Result: [target.jar] is-made-of X-1.0.jar and Z-0.2.jar.
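The three steps above can be sketched as a small greedy loop. This is an illustrative reading of the slide, not the authors' implementation: the function name `greedy_select`, the restriction of candidates to components whose signatures all occur in the target (one interpretation of "entirely copied"), and the tie-breaking by name are assumptions.

```python
def greedy_select(target: set[str], database: dict[str, set[str]]):
    """Repeatedly pick the fully contained component covering the most remaining classes."""
    remaining = set(target)
    # Assumption: "entirely copied" means every signature of the component is in the target.
    candidates = {name: sigs for name, sigs in database.items() if sigs and sigs <= target}
    selected = []
    while remaining and candidates:
        # Ties are broken by name; the slide does not specify a tie-breaking rule.
        best = max(candidates, key=lambda name: (len(candidates[name] & remaining), name))
        covered = candidates[best] & remaining
        if not covered:
            break  # no candidate explains any of the remaining classes
        selected.append(best)
        remaining -= covered
    return selected, remaining


# Truncated signatures copied from the slide and used as opaque tokens.
target = {"7fabc...", "ff1dc...", "07a21...", "920b4...", "6b9a3...", "a18e0..."}
database = {
    "X-1.0.jar": {"7fabc...", "ff1dc...", "07a21..."},
    "X-1.1.jar": {"7fabc...", "07a21...", "35e23..."},
    "Y-0.1.jar": {"920b4..."},
    "Z-0.2.jar": {"920b4...", "6b9a3..."},
}

selected, unexplained = greedy_select(target, database)
print(selected)      # components chosen, in selection order
print(unexplained)   # signatures no component accounts for (class G here)
```

On the slide's data the loop selects X-1.0.jar, then Z-0.2.jar, and stops with only the signature of class G left unexplained, which matches steps 1 to 3.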

Experiment to evaluate accuracy
Comparison with the previous work:
- Component database: Sourcerer Dataset (172,232 jar files), a snapshot of the Maven repository taken in August
- 1,000 artificial products: we randomly selected 10 to 1,000 components and repackaged them into a single jar file
Procedure:
1. Randomly select components (e.g. ant jar, commons-codec-1.8.jar, collections jar).
2. Copy their classes into an artificially mixed jar file.
3. Apply our method and the previous work to the mixed jar file to obtain the reported components.
4. Verify the result by comparing the reported components with the randomly selected components (compute precision and recall).
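The verification step compares two sets of jar names per artificial product. A minimal sketch of that computation follows, assuming the standard set-based definitions of precision and recall; the transcript does not state how the values are aggregated over the 1,000 products, and the component names below are hypothetical.

```python
def precision_recall(reported: set[str], actual: set[str]) -> tuple[float, float]:
    """Precision: share of reported components that are correct.
    Recall: share of actual components that were reported."""
    true_positives = len(reported & actual)
    precision = true_positives / len(reported) if reported else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical product: three components were packaged, four were reported.
actual = {"a-1.0.jar", "b-2.0.jar", "c-3.0.jar"}
reported = {"a-1.0.jar", "b-2.0.jar", "b-2.1.jar", "c-3.0.jar"}

precision, recall = precision_recall(reported, actual)
print(precision, recall)
```

Reporting an extra version of a component (here `b-2.1.jar`) lowers precision without affecting recall, which is the kind of error a selection step is meant to reduce.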

Result: Precision and Recall
            The previous work    Our method
Precision:  0.357             →  …  (Improved!)
Recall:     0.993             →  0.997

Conclusion
Our method detects components in a Java binary file.
- It compares a binary with all the components in a database.
- It introduces a greedy search to select reused components.
  Precision: 0.357 → …  Recall: 0.993 → …
- Our simple implementation (< 2.5 KLOC) is available on GitHub.
Future Work
- Component detection in source code
- Empirical studies on inter-project code reuse

Limitation
The experiment was performed in an ideal situation: the database included all the reused components.
- In reality, it is not so easy to keep all the components.
Our method and the previous work use identifiers (e.g. package names, class names) to compare classes.
- Our method is applicable to release engineering activities and open source projects.
- Our method is inapplicable to obfuscated code. We need a technique to identify similar classes in obfuscated code.

Q. Component dependencies are managed by a tool such as Maven. Is that insufficient?
A. It is insufficient, because some components have an internal copy of their dependent components. For example, GWT re-packages all of its dependent components to simplify dependencies. The dependent components, e.g. Ant and Xalan, do not appear in the pom files of GWT users.

Q. Why don't you use an MD5 file hash to compare classes?
A. A file hash (e.g. SHA-1, MD5) cannot compare classes, because different compilers generate different binary files.
- The JDK version and debug information also affect binary files.
Davies et al. reported that 48% of jar files in Debian GNU/Linux have no class files that were identical to any classes in the Maven Central Repository.
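The point of this answer can be illustrated with a toy comparison. This is only a sketch under stated assumptions: `CompiledClass`, `file_hash`, and `identifier_signature` are hypothetical names, the byte strings stand in for real class files, and the signature shown here (a hash over the class name and sorted method names) is an illustration of an identifier-based signature, not the definition used by the authors or by Software Bertillonage.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class CompiledClass:
    name: str                  # fully qualified class name
    methods: tuple[str, ...]   # method names with descriptors
    raw_bytes: bytes           # stand-in for the .class file contents


def file_hash(cls: CompiledClass) -> str:
    """Hash of the raw file: changes whenever any byte changes."""
    return hashlib.sha1(cls.raw_bytes).hexdigest()


def identifier_signature(cls: CompiledClass) -> str:
    """Hash of identifiers only: independent of compiler output and method order."""
    canonical = cls.name + "\n" + "\n".join(sorted(cls.methods))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


# Two hypothetical builds of the same source class with different compiler settings.
build_a = CompiledClass("org.example.A", ("run()V", "size()I"),
                        b"\xca\xfe\xba\xbe compiled with debug info")
build_b = CompiledClass("org.example.A", ("size()I", "run()V"),
                        b"\xca\xfe\xba\xbe compiled without debug info")

print(file_hash(build_a) == file_hash(build_b))
print(identifier_signature(build_a) == identifier_signature(build_b))
```

The file hashes differ while the identifier-based signatures agree, which is also why, as the Limitation slide notes, this style of comparison stops working once identifiers are obfuscated.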