
Extraction of Product Evolution Tree from Source Code of Product Variants
Tetsuya Kanda, Takashi Ishio, Katsuro Inoue
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Developing a new software product
Clone-and-own approach [1]
– Copying existing code/project
[Figure labels: Copy and modify; branched...]
[1] Rubin et al. “Managing forked product variants” SPLC

As a result
Many products are created and stored in a company.

From existing products to product line
A company already has a large number of products developed without applying software product line engineering (SPLE).
Constructing a software product line from existing products is a major problem.
Compare source code to extract information
– Intersection: common features
– Differences: product-specific features
Analyzing a large number of software products is a difficult task for developers.

Product selection
Choose representative software products as a starting point [2].
Select products according to a principle
– Products in the same branch
– Products among branches
[2] Krueger “Easing the transition to software mass customization” PFE 2001

Relationships among products
Nonaka et al. “A preliminary analysis on corrective maintenance for an embedded software product family” IPSJ SIG Technical Report, 2009.

Relationships among products
Compare products in the same branch to extract bug fixes and additional features.
Nonaka et al. “A preliminary analysis on corrective maintenance for an embedded software product family” IPSJ SIG Technical Report, 2009.

Relationships among products
Compare products between branches to extract core features and product-specific features.
Nonaka et al. “A preliminary analysis on corrective maintenance for an embedded software product family” IPSJ SIG Technical Report, 2009.

The evolution history
The evolution history of software products shows the relationships among the products.
– Helps selection of the products
Is the history always available?

The history is not available
Products are not always version controlled.
– Or they are managed independently, and the relationships between branches are not recorded
In the worst case, developers only have access to the source code of each product.
– No version numbers, no release dates

Proposal: Product Evolution Tree
We extract an approximation of the evolution history of software products.
– Analyze products using only the source code.
[Figure: Source code → Product Evolution Tree]

Key idea
Similar products have similar source files.
Product B is more similar to Product A than Product C is.
[Figure: Product A compared with Product C, and Product A compared with Product B; lines mark similar source file pairs]

Construction of the Product Evolution Tree
1. File similarity calculation
– Detect similar file pairs
2. Product similarity calculation
– Count the number of similar file pairs
3. Construction of the minimum spanning tree
4. Evolution direction calculation

File similarity calculation
Calculate the similarity for all pairs of files across different products.
File A: class A { int a = 0; public int getA(){ return a; } }
File B: class B { int a = 0; public void incA(){ a++; } }
Tokenized:
class A { int a = 0 ; public int getA ( ) { return a ; } }
class B { int a = 0 ; public void incA ( ) { a ++ ; } }
Similarity = LCS / (A specific + LCS + B specific) = 15 / 23 = 0.65…
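The token-level calculation on this slide can be sketched as follows. This is a minimal illustration, not the authors' tool: the regular-expression tokenizer and the function names are assumptions, and the two example classes are written with their closing braces completed so that the token counts match the 15 / 23 figure.

```python
import re

# Hypothetical tokenizer (an assumption, not the tool's lexer):
# identifiers, integers, "++" / "--", or any other single symbol.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\+\+|--|[^\s\w]")


def tokenize(source):
    """Split source text into a flat list of tokens."""
    return TOKEN.findall(source)


def lcs_length(xs, ys):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(ys) + 1)
    for x in xs:
        curr = [0]
        for j, y in enumerate(ys, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def file_similarity(src_a, src_b):
    """LCS / (A-specific + LCS + B-specific), as on the slide."""
    a, b = tokenize(src_a), tokenize(src_b)
    if not a and not b:
        return 1.0  # assumption: two empty files count as identical
    common = lcs_length(a, b)
    return common / (len(a) + len(b) - common)


FILE_A = "class A { int a = 0; public int getA(){ return a; } }"
FILE_B = "class B { int a = 0; public void incA(){ a++; } }"

print(file_similarity(FILE_A, FILE_B))
```

Each file has 19 tokens, of which 15 form the longest common subsequence, so the denominator is 4 + 15 + 4 = 23.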

Product similarity calculation
[Figure: Product A and Product B; lines mark similar source file pairs]
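Counting similar file pairs between two products might look like the sketch below. The threshold value, the dictionary layout (file name to source text), and the `jaccard` stand-in similarity are all assumptions made for illustration; the slides use an LCS-based file similarity and do not state a threshold here.

```python
from itertools import product


def count_similar_file_pairs(files_a, files_b, similarity, *, threshold):
    """Count cross-product file pairs whose similarity reaches the threshold."""
    return sum(
        1
        for src_a, src_b in product(files_a.values(), files_b.values())
        if similarity(src_a, src_b) >= threshold
    )


def jaccard(src_a, src_b):
    """Stand-in similarity over whitespace-separated tokens (illustrative only)."""
    a, b = set(src_a.split()), set(src_b.split())
    return len(a & b) / len(a | b) if a | b else 1.0


# Hypothetical toy products: file name -> source text.
product_a = {
    "main.c": "int main ( ) { return 0 ; }",
    "util.c": "int add ( int x , int y ) { return x + y ; }",
}
product_b = {
    "main.c": "int main ( ) { return 0 ; }",
    "util.c": "int sub ( int x , int y ) { return x - y ; }",
    "new.c": "void log ( ) { }",
}

similar = count_similar_file_pairs(product_a, product_b, jaccard, threshold=0.7)
print(similar)
```

Any similarity function with the same two-argument shape can be passed in, so the LCS-based measure can replace the stand-in without changing the counting logic.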

Construction of the minimum spanning tree
Vertex: software product
Edge: connects products
Minimum spanning tree
– A tree which has the smallest total cost
Prim's algorithm
[Figure: example tree, Total cost: -27]
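A minimal sketch of Prim's algorithm over a product graph follows. It assumes the edge cost is the negated number of similar file pairs, which is consistent with the negative costs shown on the slides but is not stated there explicitly; the product names and pair counts are made up for illustration.

```python
import heapq


def prim_mst(vertices, cost):
    """Return MST edges as (u, v, cost) using Prim's algorithm.

    cost maps frozenset({u, v}) to a number; missing keys mean "no edge".
    """
    vertices = list(vertices)
    if not vertices:
        return []
    in_tree = {vertices[0]}
    heap = []

    def push_edges(u):
        # Queue every edge from u to a vertex that is not yet in the tree.
        for v in vertices:
            key = frozenset((u, v))
            if v not in in_tree and key in cost:
                heapq.heappush(heap, (cost[key], u, v))

    push_edges(vertices[0])
    edges = []
    while heap and len(in_tree) < len(vertices):
        c, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue  # stale entry: v was reached by a cheaper edge
        in_tree.add(v)
        edges.append((u, v, c))
        push_edges(v)
    return edges


# Hypothetical similar-file-pair counts between four products.
similar_pairs = {("A", "B"): 10, ("A", "C"): 4, ("B", "C"): 7, ("C", "D"): 5, ("B", "D"): 1}
# Assumed cost: more similar file pairs -> lower (more negative) cost.
cost = {frozenset(pair): -count for pair, count in similar_pairs.items()}

tree = prim_mst(["A", "B", "C", "D"], cost)
print(tree, sum(c for _, _, c in tree))
```

With these toy counts the tree keeps the edges A-B, B-C and C-D, that is, the most similar product pairs stay connected.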

Evolution direction calculation
Hypothesis: source code is likely added.
– The new version of the software should have additional features.
Count the total number of modified tokens between projects.
[Figure: old → new, with a large amount of added code and a small amount of deleted code]
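One way to apply the hypothesis is sketched below: for each similar file pair, count the tokens specific to each side, and treat the product that would need more additions than deletions to reach the other as the older one. The exact counting rule, the tie handling, and all names here are assumptions; the slides only say that modified tokens are counted between projects.

```python
def lcs_length(xs, ys):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(ys) + 1)
    for x in xs:
        curr = [0]
        for j, y in enumerate(ys, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def evolution_direction(similar_file_pairs):
    """Guess the direction of an edge between products X and Y.

    similar_file_pairs: list of (tokens_from_x, tokens_from_y).
    Returns "x_to_y", "y_to_x", or "undecided" on a tie.
    """
    x_only = y_only = 0
    for tokens_x, tokens_y in similar_file_pairs:
        common = lcs_length(tokens_x, tokens_y)
        x_only += len(tokens_x) - common  # deleted if X is the older product
        y_only += len(tokens_y) - common  # added if X is the older product
    if y_only > x_only:
        return "x_to_y"  # more code added than deleted going X -> Y
    if x_only > y_only:
        return "y_to_x"
    return "undecided"


# Hypothetical token lists for one similar file pair.
pairs = [("int a = 0 ;".split(), "int a = 0 ; int b = 1 ;".split())]
print(evolution_direction(pairs))
```

The "undecided" case is left explicit because the hypothesis gives no answer when added and deleted token counts are equal.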

Case study
6 datasets from OSS (written in C)
– 4 datasets from PostgreSQL: a single project
– 1 dataset from FFmpeg and Libav: Libav is forked from FFmpeg and is developed by a group of FFmpeg developers.
– 1 dataset from 4.4BSD-Lite, FreeBSD, NetBSD, OpenBSD: 4.4BSD-Lite and its derived OSs.

Input and Output
Input: source files
– Each directory contains the source files of one product
Output: Product Evolution Tree

Recall
Dataset   Edges in the actual history   Matched edges without direction   Matched edges with direction
1         [missing]                     [missing]                         11 (91.7%)
2         [missing]                     [missing]                         [missing]
3         [missing]                     [missing]                         30 (81.1%)
4         [missing]                     [missing]                         20 (83.3%)
5         [missing]                     [missing]                         11 (73.3%)
6         [missing]                     [missing]                         9 (52.9%)
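The two recall figures can be computed as in the sketch below, assuming recall is the number of matched edges divided by the number of edges in the actual history (the column headings suggest this, but the slide does not define it). The version names and edges are hypothetical.

```python
def recall(actual_edges, extracted_edges):
    """Return (recall_without_direction, recall_with_direction).

    Edges are (older, newer) tuples. Assumes the actual history is non-empty
    and contains no pair of products in both directions.
    """
    actual = set(actual_edges)
    extracted = set(extracted_edges)
    if not actual:
        raise ValueError("actual history has no edges")
    directed_hits = len(actual & extracted)
    undirected_hits = len(
        {frozenset(e) for e in actual} & {frozenset(e) for e in extracted}
    )
    return undirected_hits / len(actual), directed_hits / len(actual)


# Hypothetical histories: one extracted edge is reversed, one is misplaced.
actual = [("v1", "v2"), ("v2", "v3"), ("v2", "v4")]
extracted = [("v1", "v2"), ("v3", "v2"), ("v3", "v4")]

without_direction, with_direction = recall(actual, extracted)
print(without_direction, with_direction)
```

An edge whose endpoints are right but whose direction is reversed counts toward the first figure only, which is why the two columns can differ.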

Dataset 4 (1/2)
Selected the PostgreSQL 8.X series versions released every September.

Dataset 4 (2/2)
83.3% recall
Using the cost value, we can identify branches.
All edges inside the branches are correct.
– We can identify the initial and latest versions of each branch
[Figure: edges annotated with cost values; one legible label reads Cost: -177]

Dataset 6 (1/3)
4.4BSD-Lite, FreeBSD, NetBSD, OpenBSD
One product branched into three products.

Dataset 6 (2/3)
[Figure: the Product Evolution Tree next to the family tree, which is based on “bsd-family-tree” in the FreeBSD project]
2 of the 4 latest versions in the family tree are detected by the Product Evolution Tree.

Dataset 6 (3/3)
52.9% recall
Misdetection increased for products with a complex history
– Some edges show a reversed direction (green)
– Connections between branches are mismatched (red)

Misdetection Patterns
[Table: counts per dataset; the column alignment did not survive, so the digits are kept as they appear]
Version Skip: 11
Misalignment of Branch: 45412
Misdirection: 1823
Missing merge: 2
Out of Place
Misdirection connects the exact products, but the direction is wrong. This pattern can be recovered with the release date.
Without considering this misdetection pattern, recall is about 80%.

Concluding remarks
Our tool and datasets are available online.
Product Evolution Tree visualizes relationships among software products from their source code.
– Branches and latest versions can be identified.
Future work
– Improve the cost function
– Extend datasets to other programming languages
– Case study with industrial developers