Finding File Clones in FreeBSD Ports Collection

Yusuke Sasaki, Tetsuo Yamamoto, Yasuhiro Hayase, Katsuro Inoue

File Clones
* File clones: two or more files with the same content
  * Comments and code indentation are ignored
  * They can occur inside a project or between different projects
* Research about file clones is scarce
  * Goal: gain new knowledge about file clones
* Example: Project A and Project B both contain
    int main() { printf("Hello msr!"); return 0; }

FCFinder
* Input: .c and .h files
* Detection: tokenization, MD5 hash calculation, exact matching
* Output: file-clone sets
* Faster than other tools:

  Tool         Workload / time             Speed   Machines
  CCFinder     1.4M files / 960 hours      x1      1 PC
  D-CCFinder   1.4M files / 51 hours       x19     80 PCs
  FCFinder     1.4M files / 17.16 hours    x55     -

Experiment
* Target: only the .c and .h files in the FreeBSD Ports Collection
  * ~1.4M files, ~12 GB
  * Processing time: 17.16 hours
* We measured:
  * File size
  * Number of files in each project
  * Size of each file-clone set
  * Number of file clones in each project
* These values follow the power law

File-clone Set Size
[Figure: distribution of file-clone set sizes; x-axis: file clone set size; power-law fit R² = 0.8508]
* D: 650 sets of 61 file clones (left, used in PHP5) and 500 sets of 59 file clones (right, used in PHP4)
* E: 419 sets of 120 file clones, used in both PHP4 and PHP5

File-clones per Project
[Figure: distribution of file clones per project; axis: number of file clone sets (5 to 10K); power-law fit R² = 0.8263]
* G: PHP5 modules (left), projects related to binutils (center), PHP4 modules (right)

File-clones Between Projects (1/3)
* Nodes represent projects
* An edge between two projects is labeled with the number of file clones they share
* Example: gcc41 and gfortran share 7691 file clones

File-clones Between Projects (2/3)
* Nodes represent projects
* An edge between two projects is labeled with the number of file clones they share

File-clones Between Projects (3/3)
* Nodes represent projects
* An edge between two projects is labeled with the number of file clones they share

Conclusions & Future Work
* Measured several features of the FreeBSD Ports Collection
* Found that the measured features follow the power law
Future Work
* Investigating logical coupling between projects