Improving the Unification of Software Clones Using Tree & Graph Matching Algorithms
Giri Panamoottil Krishnan
Supervisor: Dr. Nikolaos Tsantalis
22.04.14



Outline: Motivation, Goal, Approach, Evaluation, Conclusion, Publications, Future work

Motivation: harmful effects of software clones
– They are error-prone due to inconsistent updates.
– They increase maintenance effort and cost.
– They are change-prone.

Motivation: poor performance of current refactoring tools
– Eclipse: 10.6%
– CeDAR: 18.7%

Motivation: limitations of current refactoring tools
Current tools can parameterize only a small set of differences between clones, e.g., identifiers, literals, and simple method calls. Tools should also be able to parameterize non-trivial differences, e.g., an expression replaced by a method call.
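The following is a minimal sketch of what parameterizing a non-trivial difference can look like. It is not the authors' tool output. The class, the fields `width` and `height`, and the methods `reportA`, `reportB`, `area`, and `report` are all hypothetical. The two clones differ only in that one uses an inline expression and the other calls a method. The unified method receives the differing code as a `java.util.function.DoubleSupplier`.

```java
import java.util.function.DoubleSupplier;

// Hypothetical clones that differ in one non-trivial way: Clone A computes a value
// with an inline expression, while Clone B obtains the same value through a method call.
class ParameterizeDifference {
    static double width = 4.0;
    static double height = 2.5;

    static double area() {
        return width * height;
    }

    // Clone A (before refactoring): the difference is an expression.
    static String reportA() {
        double value = width * height;
        return "Area: " + value;
    }

    // Clone B (before refactoring): the difference is a method call.
    static String reportB() {
        double value = area();
        return "Area: " + value;
    }

    // Unified method: the differing code becomes a parameter. A DoubleSupplier defers
    // evaluation, so the expression or call runs at the same point as in the clones.
    static String report(DoubleSupplier valueSource) {
        double value = valueSource.getAsDouble();
        return "Area: " + value;
    }

    public static void main(String[] args) {
        System.out.println(report(() -> width * height));       // replaces Clone A
        System.out.println(report(ParameterizeDifference::area)); // replaces Clone B
    }
}
```

Passing a plain `double` would also work in this tiny case. A supplier is shown because it keeps the evaluation order of the original clones when the differing code has side effects.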

Motivation: limitations of current refactoring tools (cont.)
They may not return the best matching solution.
– They do not explore the entire search space of possible matches: when several matches are possible, they select the “first” match or the match that looks “best” at that point.
– They face scalability issues due to combinatorial explosion.
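The following sketch gives a rough sense of the combinatorial explosion. It counts candidate statement mappings under a loudly simplifying assumption: any statement of one clone may be paired with any statement of the other. Real tools restrict pairings, for example by statement type, so these numbers are only an upper-bound illustration. The class and method names are hypothetical.

```java
import java.math.BigInteger;

// Counts candidate statement mappings, assuming (for illustration only) that any
// statement of one clone may be paired with any statement of the other.
class MatchCount {
    static BigInteger factorial(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) result = result.multiply(BigInteger.valueOf(i));
        return result;
    }

    static BigInteger binomial(int n, int k) {
        return factorial(n).divide(factorial(k).multiply(factorial(n - k)));
    }

    // Number of one-to-one (possibly partial) mappings between n and m statements:
    // for every k, choose k statements on each side and pair them up in k! ways.
    static BigInteger partialMappings(int n, int m) {
        BigInteger total = BigInteger.ZERO;
        for (int k = 0; k <= Math.min(n, m); k++) {
            total = total.add(binomial(n, k).multiply(binomial(m, k)).multiply(factorial(k)));
        }
        return total;
    }

    public static void main(String[] args) {
        for (int n : new int[] {5, 10, 20}) {
            System.out.println(n + " statements: " + factorial(n) + " complete matchings, "
                    + partialMappings(n, n) + " partial mappings");
        }
    }
}
```

Even 5 statements per clone already admit 1546 partial mappings. This is why a tool that avoids exhaustive search falls back to the “first” or locally best match.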

Clone #1:
    if (orientation == VERTICAL) {
        Line2D line = new Line2D.Double();
        double y0 = dataArea.getMinY();
        double y1 = dataArea.getMaxY();
        g2.setPaint(im.getOutlinePaint());
        g2.setStroke(im.getOutlineStroke());
        if (range.contains(start)) {
            line.setLine(start2d, y0, start2d, y1);
            g2.draw(line);
        }
        if (range.contains(end)) {
            line.setLine(end2d, y0, end2d, y1);
            g2.draw(line);
        }
    } else if (orientation == HORIZONTAL) {
        Line2D line = new Line2D.Double();
        double x0 = dataArea.getMinX();
        double x1 = dataArea.getMaxX();
        g2.setPaint(im.getOutlinePaint());
        g2.setStroke(im.getOutlineStroke());
        if (range.contains(start)) {
            line.setLine(x0, start2d, x1, start2d);
            g2.draw(line);
        }
        if (range.contains(end)) {
            line.setLine(x0, end2d, x1, end2d);
            g2.draw(line);
        }
    }

Clone #2:
    if (orientation == VERTICAL) {
        Line2D line = new Line2D.Double();
        double x0 = dataArea.getMinX();
        double x1 = dataArea.getMaxX();
        g2.setPaint(im.getOutlinePaint());
        g2.setStroke(im.getOutlineStroke());
        if (range.contains(start)) {
            line.setLine(x0, start2d, x1, start2d);
            g2.draw(line);
        }
        if (range.contains(end)) {
            line.setLine(x0, end2d, x1, end2d);
            g2.draw(line);
        }
    } else if (orientation == HORIZONTAL) {
        Line2D line = new Line2D.Double();
        double y0 = dataArea.getMinY();
        double y1 = dataArea.getMaxY();
        g2.setPaint(im.getOutlinePaint());
        g2.setStroke(im.getOutlineStroke());
        if (range.contains(start)) {
            line.setLine(start2d, y0, start2d, y1);
            g2.draw(line);
        }
        if (range.contains(end)) {
            line.setLine(end2d, y0, end2d, y1);
            g2.draw(line);
        }
    }

Clone #1 is shown unchanged. Clone #2 is reduced to its control skeleton, and its two branch bodies are taken out so that each one can be matched against either branch of Clone #1:
    if (orientation == VERTICAL) { } else if (orientation == HORIZONTAL) { }
The body taken from the VERTICAL branch of Clone #2 (x0 = dataArea.getMinX(), x1 = dataArea.getMaxX()) is identical to the HORIZONTAL branch of Clone #1. The body taken from its HORIZONTAL branch (y0 = dataArea.getMinY(), y1 = dataArea.getMaxY()) is identical to the VERTICAL branch of Clone #1.

After the bodies are matched crosswise, Clone #2 is shown with its branches reordered, next to Clone #1:

Clone #2 (branches reordered):
    if (orientation == HORIZONTAL) {
        // y0/y1 body: identical to the VERTICAL branch of Clone #1
    } else if (orientation == VERTICAL) {
        // x0/x1 body: identical to the HORIZONTAL branch of Clone #1
    }

Clone #1:
    if (orientation == VERTICAL) {
        // y0/y1 body
    } else if (orientation == HORIZONTAL) {
        // x0/x1 body
    }

With this mapping, the branch bodies match statement for statement. The only remaining differences are the constants in the two conditions (VERTICAL vs. HORIZONTAL).
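For the Clone #1 / Clone #2 example above, the following rough sketch counts differences under two branch pairings. The first pairing is positional (VERTICAL with VERTICAL), as a first-match tool would choose. The second is crossed (each body with its identical counterpart). As a simplification, statements are compared as plain strings, and a differing condition or a non-identical statement counts as one difference each. The actual approach compares AST nodes, and the names here are hypothetical.

```java
import java.util.List;

// Branch bodies from the Clone #1 / Clone #2 example, one statement (or predicate) per string.
class BranchMatching {
    static final List<String> Y_BODY = List.of(
            "Line2D line = new Line2D.Double();",
            "double y0 = dataArea.getMinY();",
            "double y1 = dataArea.getMaxY();",
            "g2.setPaint(im.getOutlinePaint());",
            "g2.setStroke(im.getOutlineStroke());",
            "if (range.contains(start))",
            "line.setLine(start2d, y0, start2d, y1);",
            "g2.draw(line);",
            "if (range.contains(end))",
            "line.setLine(end2d, y0, end2d, y1);",
            "g2.draw(line);");

    static final List<String> X_BODY = List.of(
            "Line2D line = new Line2D.Double();",
            "double x0 = dataArea.getMinX();",
            "double x1 = dataArea.getMaxX();",
            "g2.setPaint(im.getOutlinePaint());",
            "g2.setStroke(im.getOutlineStroke());",
            "if (range.contains(start))",
            "line.setLine(x0, start2d, x1, start2d);",
            "g2.draw(line);",
            "if (range.contains(end))",
            "line.setLine(x0, end2d, x1, end2d);",
            "g2.draw(line);");

    static final class Branch {
        final String condition;
        final List<String> body;
        Branch(String condition, List<String> body) {
            this.condition = condition;
            this.body = body;
        }
    }

    // Clone #1: VERTICAL uses the y-body, HORIZONTAL the x-body; Clone #2 is the other way round.
    static final Branch C1_VERTICAL = new Branch("orientation == VERTICAL", Y_BODY);
    static final Branch C1_HORIZONTAL = new Branch("orientation == HORIZONTAL", X_BODY);
    static final Branch C2_VERTICAL = new Branch("orientation == VERTICAL", X_BODY);
    static final Branch C2_HORIZONTAL = new Branch("orientation == HORIZONTAL", Y_BODY);

    // Differences between two aligned branches: a differing condition counts as one,
    // plus every position where the statements are not textually identical.
    static int cost(Branch a, Branch b) {
        int diffs = a.condition.equals(b.condition) ? 0 : 1;
        diffs += Math.abs(a.body.size() - b.body.size());
        for (int i = 0; i < Math.min(a.body.size(), b.body.size()); i++) {
            if (!a.body.get(i).equals(b.body.get(i))) diffs++;
        }
        return diffs;
    }

    public static void main(String[] args) {
        int positional = cost(C1_VERTICAL, C2_VERTICAL) + cost(C1_HORIZONTAL, C2_HORIZONTAL);
        int crossed = cost(C1_VERTICAL, C2_HORIZONTAL) + cost(C1_HORIZONTAL, C2_VERTICAL);
        System.out.println("positional: " + positional + ", crossed: " + crossed);
    }
}
```

Running `main` prints `positional: 8, crossed: 2`. The crossed pairing leaves only the two condition constants to parameterize.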

Minimizing differences
Minimizing the differences during the matching process is critical for refactoring. Why?
– Fewer differences mean fewer parameters for the extracted method (i.e., a more reusable method).
– Fewer differences also mean a lower probability of precondition violations (i.e., higher refactoring feasibility).
Matching process objectives:
– Maximize the number of matched statements.
– Minimize the number of differences between them.
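The two matching objectives can be read as a lexicographic order: more matched statements first, then fewer differences. The following minimal brute-force sketch applies that order. Its statement model is hypothetical: only statements of the same coarse `kind` may be matched, and unequal texts count as one difference. The actual approach works on PDG nodes and AST differences rather than strings.

```java
import java.util.Arrays;
import java.util.List;

class MappingObjective {
    // Hypothetical statement model: a coarse kind (only same-kind statements may be
    // matched) and the statement text (unequal texts count as one difference).
    static final class Stmt {
        final String kind;
        final String text;
        Stmt(String kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    static final class Result {
        int matched = 0;
        int differences = 0;
        int[] mapping; // mapping[i] = index in the second clone, or -1 if unmapped
    }

    // The two objectives in priority order: more matched statements first,
    // then fewer differences among the matched statements.
    static boolean better(int matched, int diffs, Result best) {
        return matched > best.matched || (matched == best.matched && diffs < best.differences);
    }

    static Result bestMapping(List<Stmt> a, List<Stmt> b) {
        Result best = new Result();
        best.mapping = new int[a.size()];
        Arrays.fill(best.mapping, -1);
        int[] current = new int[a.size()];
        Arrays.fill(current, -1);
        search(a, b, 0, current, new boolean[b.size()], 0, 0, best);
        return best;
    }

    // Exhaustive depth-first search over all one-to-one mappings.
    private static void search(List<Stmt> a, List<Stmt> b, int i, int[] current,
                               boolean[] used, int matched, int diffs, Result best) {
        if (i == a.size()) {
            if (better(matched, diffs, best)) {
                best.matched = matched;
                best.differences = diffs;
                best.mapping = current.clone();
            }
            return;
        }
        current[i] = -1; // option 1: leave a[i] unmapped
        search(a, b, i + 1, current, used, matched, diffs, best);
        for (int j = 0; j < b.size(); j++) { // option 2: map a[i] to a free, same-kind b[j]
            if (used[j] || !a.get(i).kind.equals(b.get(j).kind)) continue;
            used[j] = true;
            current[i] = j;
            int d = a.get(i).text.equals(b.get(j).text) ? 0 : 1;
            search(a, b, i + 1, current, used, matched + 1, diffs + d, best);
            used[j] = false;
        }
        current[i] = -1;
    }
}
```

For two clones that call `g2.draw(line)` and `g2.fill(line)` in opposite orders, both full mappings match two statements. The tie-breaker selects the crossed mapping, which has zero differences.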

Motivation: limitations of current refactoring tools (cont.)
There are no preconditions to determine whether clones can be safely refactored.
– The parameterization of differences might change the behavior of the program.
– Statements in gaps need to be moved before the cloned code, and changing the order of statements might also affect the behavior of the program.

Goal: improving the state of the art in the refactoring of software clones
– Optimal mapping with minimum differences
– Exhaustive search without compromising performance
– Preserving code behavior through extensive rules
– Finding the most appropriate refactoring strategy to eliminate the clones

Approach

Phase 1: control structure matching
Assumption: two pieces of code can be merged only if they have an identical control structure.
– We extract the Control Dependence Trees (CDTs) representing the control structure of the input methods or clones.
– We find all non-overlapping largest common subtrees within the CDTs in a bottom-up manner.
– Each subtree match is treated as a separate refactoring opportunity.
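The following simplified sketch finds non-overlapping largest common subtrees in two CDTs. It is not the authors' algorithm. It assumes that two subtrees are "common" when they have identical shape and identical node labels. The hypothetical labels are control predicate kinds such as `"if"` and `"for"`, and sibling order is ignored. Each subtree's canonical form is computed bottom-up, and identical pairs are then picked largest first, skipping any pair that overlaps an earlier pick.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CdtSubtreeMatching {
    // Hypothetical CDT node: label = kind of control predicate ("method", "if", "for", ...).
    static final class Node {
        final String label;
        final List<Node> children;
        Node(String label, Node... children) {
            this.label = label;
            this.children = Arrays.asList(children);
        }
    }

    // Canonical form computed bottom-up (children before parent). Sorting the children's
    // forms makes the comparison independent of sibling order.
    static String canon(Node n, Map<Node, String> memo) {
        String cached = memo.get(n);
        if (cached != null) return cached;
        List<String> kids = new ArrayList<>();
        for (Node k : n.children) kids.add(canon(k, memo));
        Collections.sort(kids);
        String form = n.label + "(" + String.join(",", kids) + ")";
        memo.put(n, form);
        return form;
    }

    static int size(Node n) {
        int s = 1;
        for (Node k : n.children) s += size(k);
        return s;
    }

    static void collect(Node n, List<Node> out) {
        out.add(n);
        for (Node k : n.children) collect(k, out);
    }

    static void mark(Node n, Set<Node> used) {
        used.add(n);
        for (Node k : n.children) mark(k, used);
    }

    // Returns root pairs of identical subtrees, largest first, without overlaps.
    static List<Node[]> match(Node t1, Node t2) {
        Map<Node, String> memo = new IdentityHashMap<>();
        List<Node> nodes1 = new ArrayList<>();
        List<Node> nodes2 = new ArrayList<>();
        collect(t1, nodes1);
        collect(t2, nodes2);
        List<Node[]> candidates = new ArrayList<>();
        for (Node a : nodes1)
            for (Node b : nodes2)
                if (canon(a, memo).equals(canon(b, memo))) candidates.add(new Node[] {a, b});
        candidates.sort((p, q) -> Integer.compare(size(q[0]), size(p[0])));
        Set<Node> used1 = Collections.newSetFromMap(new IdentityHashMap<>());
        Set<Node> used2 = Collections.newSetFromMap(new IdentityHashMap<>());
        List<Node[]> result = new ArrayList<>();
        for (Node[] pair : candidates) {
            // Larger pairs come first, so an overlap means a root is already covered.
            if (used1.contains(pair[0]) || used2.contains(pair[1])) continue;
            mark(pair[0], used1);
            mark(pair[1], used2);
            result.add(pair);
        }
        return result;
    }
}
```

Each returned pair would correspond to one candidate refactoring opportunity. The real approach also has to respect statement types and other details that this shape-only comparison ignores.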

CDT subtree matching
[Figure: the CDT of Fragment #1 (nodes A-G) matched against the CDT of Fragment #2 (nodes a-g, plus x and y)]

Phase 2: PDG mapping
We extract the PDG (Program Dependence Graph) subgraphs corresponding to the matched CDT subtrees. We want to find the common subgraph that satisfies two conditions:
– It has the maximum number of matched nodes.
– The matched nodes have the minimum number of differences.
This is an optimization problem that can be solved using an adaptation of a Maximum Common Subgraph (MCS) algorithm [McGregor, 1982].

MCS algorithm
– Builds a search tree in depth-first order, where each node represents a state of the search space.
– Explores the entire search space.
– Has factorial worst-case complexity.
– As the number of possible matching node combinations increases, the width of the search tree grows rapidly (combinatorial explosion).
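To make the search tree concrete, here is a plain backtracking search for a maximum common induced subgraph of two small directed, node-labelled graphs. It is written in the spirit of the description above, but it is not McGregor's published algorithm, which also prunes with bounds, and it is not the authors' adaptation. Node labels and edges stand in for PDG statements and dependences. `Graph`, `feasible`, and `statesVisited` are hypothetical names.

```java
import java.util.Arrays;

// Depth-first backtracking over all mapping states (no bound-based pruning).
class McsBacktracking {
    static final class Graph {
        final String[] labels;  // hypothetical node labels, e.g. statement kinds
        final boolean[][] adj;  // adj[u][v]: directed dependence edge u -> v
        Graph(String... labels) {
            this.labels = labels;
            this.adj = new boolean[labels.length][labels.length];
        }
        Graph edge(int u, int v) {
            adj[u][v] = true;
            return this;
        }
        int size() {
            return labels.length;
        }
    }

    private final Graph g1;
    private final Graph g2;
    int bestSize = 0;
    int[] bestMap;
    long statesVisited = 0;

    McsBacktracking(Graph g1, Graph g2) {
        this.g1 = g1;
        this.g2 = g2;
        this.bestMap = new int[g1.size()];
        Arrays.fill(bestMap, -1);
    }

    // (u, v) may extend the mapping if the labels agree and the edges between u and
    // every already-mapped node w match the edges between v and w's image.
    private boolean feasible(int[] map, int u, int v) {
        if (!g1.labels[u].equals(g2.labels[v])) return false;
        for (int w = 0; w < u; w++) {
            int x = map[w];
            if (x < 0) continue;
            if (g1.adj[u][w] != g2.adj[v][x] || g1.adj[w][u] != g2.adj[x][v]) return false;
        }
        return true;
    }

    // Each call is one state (search-tree node): g1's node u is mapped or skipped.
    private void search(int u, int[] map, boolean[] used, int size) {
        statesVisited++;
        if (u == g1.size()) {
            if (size > bestSize) {
                bestSize = size;
                bestMap = map.clone();
            }
            return;
        }
        for (int v = 0; v < g2.size(); v++) {
            if (!used[v] && feasible(map, u, v)) {
                map[u] = v;
                used[v] = true;
                search(u + 1, map, used, size + 1);
                map[u] = -1;
                used[v] = false;
            }
        }
        search(u + 1, map, used, size); // branch where u stays unmapped
    }

    int run() {
        int[] map = new int[g1.size()];
        Arrays.fill(map, -1);
        search(0, map, new boolean[g2.size()], 0);
        return bestSize;
    }
}
```

The `statesVisited` counter grows with the width of the search tree. Printing it for growing graphs is a quick way to observe the combinatorial explosion described on the slide.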

Divide-and-conquer
We break the original matching problem into smaller sub-problems based on the control dependence structure of the clones. Each sub-problem is the mapping of the PDG subgraphs corresponding to the sets of statements nested under two control predicate nodes. Finally, we combine the sub-solutions into a global solution to the original matching problem.
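The following minimal sketch shows the divide-and-conquer idea under stated assumptions. The two CDT subtrees are assumed to have already been matched in Phase 1, so their nested control predicates pair up by position. Each pair's directly nested statements are matched exhaustively as a small sub-problem, and the sub-solutions are added bottom-up. Statements are plain strings, so unequal text counts as one difference. `CdtNode`, `solveLocal`, and `solve` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

class DivideAndConquerMatching {
    // Hypothetical CDT node: the statements nested directly under one control
    // predicate, plus its nested control predicates.
    static final class CdtNode {
        final List<String> statements;
        final List<CdtNode> children;
        CdtNode(List<String> statements, CdtNode... children) {
            this.statements = statements;
            this.children = Arrays.asList(children);
        }
    }

    // Sub-problem: exhaustive one-to-one mapping of two small statement lists.
    // Result = {matched, differences}; more matches win, then fewer differences.
    static int[] solveLocal(List<String> a, List<String> b, int i, boolean[] used) {
        if (i == a.size()) return new int[] {0, 0};
        int[] best = solveLocal(a, b, i + 1, used); // leave a[i] unmapped
        for (int j = 0; j < b.size(); j++) {
            if (used[j]) continue;
            used[j] = true;
            int[] rest = solveLocal(a, b, i + 1, used);
            used[j] = false;
            int[] candidate = {rest[0] + 1, rest[1] + (a.get(i).equals(b.get(j)) ? 0 : 1)};
            if (candidate[0] > best[0] || (candidate[0] == best[0] && candidate[1] < best[1])) {
                best = candidate;
            }
        }
        return best;
    }

    // Bottom-up: solve the nested control predicates first, then the statements directly
    // under this pair, and add the sub-solutions. Assumes x and y have the same shape
    // (matched in Phase 1), so children pair up by position.
    static int[] solve(CdtNode x, CdtNode y) {
        int matched = 0;
        int differences = 0;
        for (int k = 0; k < x.children.size(); k++) {
            int[] sub = solve(x.children.get(k), y.children.get(k));
            matched += sub[0];
            differences += sub[1];
        }
        int[] local = solveLocal(x.statements, y.statements, 0, new boolean[y.statements.size()]);
        return new int[] {matched + local[0], differences + local[1]};
    }
}
```

Each exhaustive search runs only over the statements under one pair of predicates. The total cost is therefore a sum of small searches rather than one search over all statements at once.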

Bottom-up divide-and-conquer
[Figure: level 2 of the CDT subtree of Clone #1 (nodes A-G) and the CDT subtree of Clone #2 (nodes a-g); current sub-problem: D-d]

Bottom-up divide-and-conquer (cont.)
[Figure: level 2 of the CDT subtree of Clone #1 (nodes A, B, C, E, F, G) and the CDT subtree of Clone #2 (nodes a, b, c, e, f, g); current sub-problem: E-e]

Phase 3: precondition checking
Preconditions related to clone differences:
– Parameterization of differences should not break existing data dependences in the PDGs.
– Reordering of unmapped statements should not break existing data dependences in the PDGs.
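The following sketch approximates the reordering precondition with def/use sets. It is a stand-in for the PDG data-dependence edges the approach actually uses. An unmapped statement may be moved above the statements it jumps over only if no flow, anti, or output dependence links them. Aliasing and side effects of method calls are ignored here, and `Stmt` and `canMoveUp` are hypothetical names.

```java
import java.util.List;
import java.util.Set;

class ReorderPrecondition {
    // Hypothetical summary of a statement: the variables it writes and reads.
    static final class Stmt {
        final String text;
        final Set<String> defs;
        final Set<String> uses;
        Stmt(String text, Set<String> defs, Set<String> uses) {
            this.text = text;
            this.defs = defs;
            this.uses = uses;
        }
    }

    static boolean intersects(Set<String> a, Set<String> b) {
        for (String v : a) if (b.contains(v)) return true;
        return false;
    }

    // True if a data dependence links the two statements, so their order matters.
    static boolean dependent(Stmt earlier, Stmt later) {
        return intersects(earlier.defs, later.uses)   // flow: later reads what earlier writes
            || intersects(earlier.uses, later.defs)   // anti: later overwrites what earlier reads
            || intersects(earlier.defs, later.defs);  // output: both write the same variable
    }

    // Can the unmapped statement at index `from` be moved up to index `to` (to < from)
    // without jumping over a statement it depends on?
    static boolean canMoveUp(List<Stmt> block, int from, int to) {
        Stmt moved = block.get(from);
        for (int k = to; k < from; k++) {
            if (dependent(block.get(k), moved)) return false;
        }
        return true;
    }
}
```

For example, `z = b * 2;` can move above `x = a + 1;` and `log(y);`. However, `y = 5;` cannot move above `log(y);`, because the move would break an anti-dependence on `y`.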

Phase 3: precondition checking (cont.)
Preconditions related to method extraction:
– The unified code should return at most one variable.
– Matched branching statements (break, continue) should be accompanied by their corresponding matched loops in the unified code.
– If two clones belong to different classes, these classes should extend a common superclass.
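The following sketch illustrates two of these checks. First, the variables that must be returned are approximated as those assigned inside the unified code and read after it. Second, the common-superclass check walks the class hierarchy with `Class.getSuperclass()` and ignores `Object`. A refactoring tool would resolve the hierarchy from source code rather than from loaded classes, and all helper names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class ExtractionPreconditions {
    // Variables assigned inside the unified code and read after it must be returned.
    static Set<String> returnedVariables(Set<String> definedInClone, Set<String> usedAfterClone) {
        Set<String> result = new TreeSet<>(definedInClone);
        result.retainAll(usedAfterClone);
        return result;
    }

    // A Java method returns at most one value, so more than one such variable is a violation.
    static boolean returnsAtMostOne(Set<String> definedInClone, Set<String> usedAfterClone) {
        return returnedVariables(definedInClone, usedAfterClone).size() <= 1;
    }

    // Closest proper superclass shared by two classes (ignoring Object), i.e. a place
    // the unified method could be pulled up to; returns null if there is none.
    static Class<?> commonSuperclass(Class<?> a, Class<?> b) {
        Set<Class<?>> ancestorsOfA = new HashSet<>();
        for (Class<?> c = a.getSuperclass(); c != null && c != Object.class; c = c.getSuperclass()) {
            ancestorsOfA.add(c);
        }
        for (Class<?> c = b.getSuperclass(); c != null && c != Object.class; c = c.getSuperclass()) {
            if (ancestorsOfA.contains(c)) return c;
        }
        return null;
    }
}
```

For example, `commonSuperclass(ArrayList.class, LinkedList.class)` returns `AbstractList`. `commonSuperclass(String.class, Integer.class)` returns `null`, so clones split across those two classes would fail the last precondition.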

Evaluation
We compared our approach with CeDAR [Tairas & Gray, IST'12], a state-of-the-art tool for the refactoring of Type-II clones. The comparison used 2342 clone groups detected in 7 open-source projects by the Deckard clone detection tool. CeDAR can analyze only clone groups in which all clones belong to the same Java file.

[Chart: clone groups within the same Java file]

[Chart: clone groups within different Java files]

[Chart: clone groups violating only one precondition]

[Chart: performance analysis, node comparisons]

Performance analysis: total time
We measured the CPU time taken to execute the PDG mapping process for each clone group. The mean value was found to be ms.
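The transcript does not describe how the CPU time was measured. The following sketch shows one standard way to measure per-task CPU time in Java using `java.lang.management.ThreadMXBean`. Here, `mappingTask` stands for a hypothetical `Runnable` that runs the PDG mapping for one clone group.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.List;

class CpuTimer {
    // CPU time (nanoseconds) the current thread spends running the task.
    static long cpuTimeNanos(Runnable task) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isCurrentThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("thread CPU time is not supported by this JVM");
        }
        if (!bean.isThreadCpuTimeEnabled()) {
            bean.setThreadCpuTimeEnabled(true);
        }
        long start = bean.getCurrentThreadCpuTime();
        task.run();
        return bean.getCurrentThreadCpuTime() - start;
    }

    // Mean CPU time in milliseconds over a list of tasks (e.g., one per clone group).
    static double meanMillis(List<Runnable> tasks) {
        if (tasks.isEmpty()) return 0.0;
        long total = 0;
        for (Runnable task : tasks) {
            total += cpuTimeNanos(task);
        }
        return total / 1_000_000.0 / tasks.size();
    }
}
```

CPU time excludes time the thread spends waiting. This makes it less sensitive to machine load than wall-clock time, which is a common reason to prefer it for algorithm measurements.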

Conclusion
– For clone groups whose clones are in the same file, the approach refactored 82% more clone groups than CeDAR.
– The approach could refactor 28% of the clone groups whose clones are in different files.
– The experiment revealed that 37% of the clone groups can be refactored, either directly or by decomposing the original clones into sub-clones.

Publications
1. Nikolaos Tsantalis and Giri Panamoottil Krishnan, "Refactoring Clones: A New Perspective", 7th International Workshop on Software Clones (IWSC'2013), San Francisco, California, USA, May 19, 2013.
2. Giri Panamoottil Krishnan and Nikolaos Tsantalis, "Refactoring Clones: An Optimization Problem", 29th IEEE International Conference on Software Maintenance (ICSM'2013), Eindhoven, The Netherlands, September 22-28, 2013.
3. Giri Panamoottil Krishnan and Nikolaos Tsantalis, "Unification and Refactoring of Clones", IEEE CSMR-WCRE 2014 Software Evolution Week (CSMR-WCRE'2014), Antwerp, Belgium, February 3-7, 2014.

What’s next?
– An extensive empirical study on the refactorability of clones detected by different clone detection tools, such as ConQat, NiCad, and CCFinder
– More challenging cases of Type-3 clones with more complex refactoring transformations
– Extending our AST matching mechanism to support the matching of different types of control predicate statements
– Unification of semantically equivalent expressions

Thank you

Summary
– Motivation: harmful effects of software clones; poor performance of current refactoring tools
– Goals and approach: optimal mapping with minimum differences; exhaustive search; preserving code behavior through preconditions
– Evaluation: comparison with the state-of-the-art tool CeDAR on 2342 clone groups from 7 open-source Java projects
– Findings: 82% more clone groups refactored than CeDAR; an additional 28% of the clone groups (clones in different files) refactored; 37% of the clone groups can be refactored
– Future work: an extensive empirical study on the refactorability of clones; challenging cases of Type-3 and Type-4 clones