Unification and Refactoring of Clones


Unification and Refactoring of Clones
Giri Panamoottil Krishnan and Nikolaos Tsantalis
Department of Computer Science & Software Engineering
Clone images created by Rebecca Tiarks et al.

Motivation
Clones may be harmful:
– Clones are associated with error-proneness due to inconsistent updates (Juergens et al., ICSE’09)
– Clones significantly increase maintenance effort and cost (Lozano et al., ICSM’08)
– Clones are change-prone (Mondal et al. 2012)
Some studies have shown that clones are stable.
IEEE CSMR-WCRE 2014 Software Evolution Week

Motivation cont'd
Current refactoring tools perform poorly. A study by Tairas & Gray [IST’12] on Type-II clones detected by Deckard in 9 open-source projects revealed:
– only 10.6% of them could be refactored by Eclipse
– CeDAR [IST’12] was able to refactor 18.7% of them

Limitation #1
Current tools can parameterize only a small subset of differences in clones.
– Mostly differences between variable identifiers, literals, and simple method calls.
Clone #1: Rectangle rectangle = new Rectangle(a, b, c, high - low);
Clone #2: Rectangle rectangle = new Rectangle(a, b, c, getHeight());
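The two Rectangle statements differ in a whole expression (an infix expression versus a method call), not just in an identifier or literal. A minimal sketch of what parameterizing such a difference could look like is given below. The names Rect, ParameterizedClone, createRectangle, cloneOne and cloneTwo are hypothetical and do not come from the slides; the getHeight() call is modelled by a supplier only to keep the example self-contained.

```java
import java.util.function.IntSupplier;

// Hypothetical stand-in for the Rectangle type used on the slide.
final class Rect {
    final int x, y, width, height;

    Rect(int x, int y, int width, int height) {
        this.x = x;
        this.y = y;
        this.width = width;
        this.height = height;
    }
}

final class ParameterizedClone {
    // Unified code: the only differing expression becomes the parameter "height".
    static Rect createRectangle(int a, int b, int c, int height) {
        return new Rect(a, b, c, height);
    }

    // Former Clone #1: passes the infix expression high - low.
    static Rect cloneOne(int a, int b, int c, int high, int low) {
        return createRectangle(a, b, c, high - low);
    }

    // Former Clone #2: passes the result of a getHeight() call,
    // modelled here by a supplier (an assumption of this sketch).
    static Rect cloneTwo(int a, int b, int c, IntSupplier getHeight) {
        return createRectangle(a, b, c, getHeight.getAsInt());
    }
}
```

Both call sites now share createRectangle and differ only in the argument they pass, which is the kind of difference the slide says most tools cannot parameterize.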

Limitation #2
Current approaches may return non-optimal matching solutions.
– They do not explore the entire search space of possible matches.
– In case of multiple possible matches, they select the “first” or “best” match.
– They face scalability issues due to the problem of combinatorial explosion.

Clone #1
if (orientation == VERTICAL) {
    Line2D line = new Line2D.Double();
    double y0 = dataArea.getMinY();
    double y1 = dataArea.getMaxY();
    g2.setPaint(im.getOutlinePaint());
    g2.setStroke(im.getOutlineStroke());
    if (range.contains(start)) {
        line.setLine(start2d, y0, start2d, y1);
        g2.draw(line);
    }
    if (range.contains(end)) {
        line.setLine(end2d, y0, end2d, y1);
        g2.draw(line);
    }
}
else if (orientation == HORIZONTAL) {
    Line2D line = new Line2D.Double();
    double x0 = dataArea.getMinX();
    double x1 = dataArea.getMaxX();
    g2.setPaint(im.getOutlinePaint());
    g2.setStroke(im.getOutlineStroke());
    if (range.contains(start)) {
        line.setLine(x0, start2d, x1, start2d);
        g2.draw(line);
    }
    if (range.contains(end)) {
        line.setLine(x0, end2d, x1, end2d);
        g2.draw(line);
    }
}

Clone #2
if (orientation == VERTICAL) {
    Line2D line = new Line2D.Double();
    double x0 = dataArea.getMinX();
    double x1 = dataArea.getMaxX();
    g2.setPaint(im.getOutlinePaint());
    g2.setStroke(im.getOutlineStroke());
    if (range.contains(start)) {
        line.setLine(x0, start2d, x1, start2d);
        g2.draw(line);
    }
    if (range.contains(end)) {
        line.setLine(x0, end2d, x1, end2d);
        g2.draw(line);
    }
}
else if (orientation == HORIZONTAL) {
    Line2D line = new Line2D.Double();
    double y0 = dataArea.getMinY();
    double y1 = dataArea.getMaxY();
    g2.setPaint(im.getOutlinePaint());
    g2.setStroke(im.getOutlineStroke());
    if (range.contains(start)) {
        line.setLine(start2d, y0, start2d, y1);
        g2.draw(line);
    }
    if (range.contains(end)) {
        line.setLine(end2d, y0, end2d, y1);
        g2.draw(line);
    }
}

NOT APPROVED

[Animation slide: Clone #1 is shown unchanged, while the two branch bodies of Clone #2 (the x0/x1 block and the y0/y1 block) are detached from their conditions, if (orientation == VERTICAL) { } and else if (orientation == HORIZONTAL) { }, and repositioned.]

Clone #1
if (orientation == HORIZONTAL) {
    Line2D line = new Line2D.Double();
    double y0 = dataArea.getMinY();
    double y1 = dataArea.getMaxY();
    g2.setPaint(im.getOutlinePaint());
    g2.setStroke(im.getOutlineStroke());
    if (range.contains(start)) {
        line.setLine(start2d, y0, start2d, y1);
        g2.draw(line);
    }
    if (range.contains(end)) {
        line.setLine(end2d, y0, end2d, y1);
        g2.draw(line);
    }
}
else if (orientation == VERTICAL) {
    Line2D line = new Line2D.Double();
    double x0 = dataArea.getMinX();
    double x1 = dataArea.getMaxX();
    g2.setPaint(im.getOutlinePaint());
    g2.setStroke(im.getOutlineStroke());
    if (range.contains(start)) {
        line.setLine(x0, start2d, x1, start2d);
        g2.draw(line);
    }
    if (range.contains(end)) {
        line.setLine(x0, end2d, x1, end2d);
        g2.draw(line);
    }
}

Clone #2
if (orientation == VERTICAL) {
    Line2D line = new Line2D.Double();
    double y0 = dataArea.getMinY();
    double y1 = dataArea.getMaxY();
    g2.setPaint(im.getOutlinePaint());
    g2.setStroke(im.getOutlineStroke());
    if (range.contains(start)) {
        line.setLine(start2d, y0, start2d, y1);
        g2.draw(line);
    }
    if (range.contains(end)) {
        line.setLine(end2d, y0, end2d, y1);
        g2.draw(line);
    }
}
else if (orientation == HORIZONTAL) {
    Line2D line = new Line2D.Double();
    double x0 = dataArea.getMinX();
    double x1 = dataArea.getMaxX();
    g2.setPaint(im.getOutlinePaint());
    g2.setStroke(im.getOutlineStroke());
    if (range.contains(start)) {
        line.setLine(x0, start2d, x1, start2d);
        g2.draw(line);
    }
    if (range.contains(end)) {
        line.setLine(x0, end2d, x1, end2d);
        g2.draw(line);
    }
}

APPROVED

Minimizing differences
Minimizing the differences during the matching process is critical for refactoring. Why?
– Fewer differences mean fewer parameters for the extracted method (i.e., a more reusable method).
– Fewer differences also mean a lower probability of precondition violations (i.e., higher refactoring feasibility).
Matching process objectives:
– Maximize the number of matched statements
– Minimize the number of differences between them
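The two objectives can be illustrated with a small, simplified count. The sketch below compares two candidate mappings of the orientation branches from the preceding slides: one pairs branches with the same condition, the other pairs branches with the same body. Both match the same number of statements, but the number of differing tokens is very different. The class MappingCost, its methods, and the whitespace-based token comparison are assumptions made for this example, not the measure used by the authors.

```java
import java.util.Arrays;
import java.util.List;

final class MappingCost {
    // Counts positions at which two whitespace-separated token sequences differ.
    static int tokenDifferences(String s1, String s2) {
        String[] t1 = s1.trim().split("\\s+");
        String[] t2 = s2.trim().split("\\s+");
        int common = Math.min(t1.length, t2.length);
        int diff = Math.abs(t1.length - t2.length);
        for (int i = 0; i < common; i++) {
            if (!t1[i].equals(t2[i])) {
                diff++;
            }
        }
        return diff;
    }

    // Sums the differences of statements mapped position by position.
    static int totalDifferences(List<String> left, List<String> right) {
        int total = 0;
        for (int i = 0; i < Math.min(left.size(), right.size()); i++) {
            total += tokenDifferences(left.get(i), right.get(i));
        }
        return total;
    }

    // Abbreviated branch: its condition plus the two declarations that depend on the axis.
    static List<String> branch(String constant, String axis) {
        String upper = axis.toUpperCase();
        return Arrays.asList(
            "orientation == " + constant,
            "double " + axis + "0 = dataArea.getMin" + upper + "()",
            "double " + axis + "1 = dataArea.getMax" + upper + "()");
    }

    public static void main(String[] args) {
        List<String> c1V = branch("VERTICAL", "y"), c1H = branch("HORIZONTAL", "x");
        List<String> c2V = branch("VERTICAL", "x"), c2H = branch("HORIZONTAL", "y");
        // Mapping A pairs equal conditions; mapping B pairs equal bodies.
        System.out.println("same conditions: "
            + (totalDifferences(c1V, c2V) + totalDifferences(c1H, c2H)));
        System.out.println("same bodies:     "
            + (totalDifferences(c1V, c2H) + totalDifferences(c1H, c2V)));
    }
}
```

Under these assumptions, pairing equal conditions yields 8 differing tokens and pairing equal bodies yields 2, so the second mapping would need fewer parameters in the extracted method.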

Limitation #3
There are no preconditions to determine whether clones can be safely refactored.
– The parameterization of differences might change the behavior of the program.
– Statements in gaps need to be moved before the cloned code. Changing the order of statements might also affect the behavior of the program.

Our goal
Improve the state of the art in the refactoring of software clones. Given two code fragments containing clones:
– Find potential control structures that can be refactored.
– Find an optimal mapping between the statements of two clones.
– Make sure that the refactoring of the clones will preserve program behavior.
– Find the most appropriate refactoring strategy to eliminate the clones.

Our approach
[Pipeline diagram: Control Structure Matching produces isomorphic CDT pairs; PDG Mapping produces differences and unmapped statements; these feed Precondition Examination.]

Phase 1: Control Structure Matching
Intuition: two pieces of code can be merged only if they have an identical control structure.
– We extract the Control Dependence Trees (CDTs) representing the control structure of the input methods or clones.
– We find all non-overlapping largest common subtrees within the CDTs.
– Each subtree match will be treated as a separate refactoring opportunity.
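As a rough illustration of subtree matching on control structures, the sketch below finds the largest complete subtree of one tree whose shape also occurs in the other. It is deliberately simplified: it treats children as ordered, matches only complete subtrees (a node with all its descendants), and returns a single match rather than all non-overlapping ones. CdtNode, CdtMatcher and the node kinds are hypothetical names for this example and are not the authors' implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// A control structure node such as "if", "for" or "while" with nested control structures.
final class CdtNode {
    final String kind;
    final List<CdtNode> children;

    CdtNode(String kind, CdtNode... children) {
        this.kind = kind;
        this.children = Arrays.asList(children);
    }

    // Canonical text of the subtree, e.g. "if(for()if())".
    String shape() {
        StringBuilder sb = new StringBuilder(kind).append('(');
        for (CdtNode child : children) {
            sb.append(child.shape());
        }
        return sb.append(')').toString();
    }

    // Number of nodes in the subtree rooted here.
    int size() {
        int n = 1;
        for (CdtNode child : children) {
            n += child.size();
        }
        return n;
    }

    // Collects this node and all descendants in pre-order.
    void collect(List<CdtNode> out) {
        out.add(this);
        for (CdtNode child : children) {
            child.collect(out);
        }
    }
}

final class CdtMatcher {
    // Returns the largest subtree of root1 whose shape also appears in root2, or null.
    static CdtNode largestCommonCompleteSubtree(CdtNode root1, CdtNode root2) {
        List<CdtNode> nodes2 = new ArrayList<>();
        root2.collect(nodes2);
        Set<String> shapes2 = new HashSet<>();
        for (CdtNode n : nodes2) {
            shapes2.add(n.shape());
        }
        List<CdtNode> nodes1 = new ArrayList<>();
        root1.collect(nodes1);
        CdtNode best = null;
        for (CdtNode n : nodes1) {
            if (shapes2.contains(n.shape()) && (best == null || n.size() > best.size())) {
                best = n;
            }
        }
        return best;
    }
}
```

Each match found this way would correspond to one candidate refactoring opportunity in the sense of the slide.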

CDT Subtree Matching
[Figure: the CDT of Fragment #1 (nodes A to G) next to the CDT of Fragment #2 (nodes a to g, plus x and y).]

Phase 2: PDG Mapping
We extract the PDG subgraphs corresponding to the matched CDT subtrees. We want to find the common subgraph that satisfies two conditions:
– It has the maximum number of matched nodes.
– The matched nodes have the minimum number of differences.
This is an optimization problem that can be solved using an adaptation of a Maximum Common Subgraph algorithm [McGregor, 1982].

MCS Algorithm
– Builds a search tree in depth-first order, where each node represents a state of the search space.
– Explores the entire search space.
– It has an exponential worst-case complexity.
– As the number of possible matching node combinations increases, the width of the search tree grows rapidly (combinatorial explosion).
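The depth-first exploration described above can be sketched as a small backtracking search. This is a simplified illustration in the spirit of a maximum common subgraph search, not McGregor's algorithm as published nor the authors' adaptation: nodes are compatible when they have the same kind, edges must agree with all previously mapped pairs, and the best state is the one with the most matched nodes and, on ties, the fewest textual differences. The class McsSketch and all of its fields are hypothetical.

```java
final class McsSketch {
    final String[] kind1, text1, kind2, text2;
    final boolean[][] edge1, edge2; // edge[i][j] is true if node j depends on node i
    int bestMatched = -1;
    int bestDiffs = Integer.MAX_VALUE;
    int[] bestMapping;

    McsSketch(String[] kind1, String[] text1, boolean[][] edge1,
              String[] kind2, String[] text2, boolean[][] edge2) {
        this.kind1 = kind1; this.text1 = text1; this.edge1 = edge1;
        this.kind2 = kind2; this.text2 = text2; this.edge2 = edge2;
    }

    // Returns mapping[i] = index in graph 2 matched to node i of graph 1, or -1.
    int[] solve() {
        int[] mapping = new int[kind1.length];
        java.util.Arrays.fill(mapping, -1);
        search(0, mapping, new boolean[kind2.length], 0, 0);
        return bestMapping;
    }

    private void search(int i, int[] mapping, boolean[] used, int matched, int diffs) {
        // Prune states that can no longer reach the best number of matched nodes.
        if (matched + (kind1.length - i) < bestMatched) {
            return;
        }
        if (i == kind1.length) {
            if (matched > bestMatched || (matched == bestMatched && diffs < bestDiffs)) {
                bestMatched = matched;
                bestDiffs = diffs;
                bestMapping = mapping.clone();
            }
            return;
        }
        for (int j = 0; j < kind2.length; j++) {
            if (used[j] || !kind1[i].equals(kind2[j]) || !edgesAgree(i, j, mapping)) {
                continue;
            }
            mapping[i] = j;
            used[j] = true;
            search(i + 1, mapping, used, matched + 1,
                   diffs + (text1[i].equals(text2[j]) ? 0 : 1));
            mapping[i] = -1;
            used[j] = false;
        }
        search(i + 1, mapping, used, matched, diffs); // also try leaving node i unmapped
    }

    // The pair (i, j) is consistent if its edges to every earlier mapped pair agree.
    private boolean edgesAgree(int i, int j, int[] mapping) {
        for (int p = 0; p < i; p++) {
            int q = mapping[p];
            if (q < 0) {
                continue;
            }
            if (edge1[p][i] != edge2[q][j] || edge1[i][p] != edge2[j][q]) {
                return false;
            }
        }
        return true;
    }
}
```

Because every node may be paired with any compatible node or left unmapped, the number of states grows exponentially with the number of compatible candidates, which is the combinatorial explosion the slide refers to.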

Divide-and-Conquer
– We break the original matching problem into smaller sub-problems based on the control dependence structure of the clones.
– Finally, we combine the sub-solutions to give a global solution to the original matching problem.
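A back-of-the-envelope count shows why splitting helps. Assume, purely for illustration, that each clone has 12 statements and that in the worst case any statement may be paired with any other: there are then 12! complete one-to-one mappings. If the control structure splits the statements into 3 groups of 4 that can only be matched group against group, each sub-problem has 4! complete mappings and the three are solved independently. The numbers and the class SearchSpace are assumptions of this sketch, not figures from the talk.

```java
final class SearchSpace {
    // n! computed with long arithmetic (exact for n up to 20).
    static long factorial(int n) {
        long result = 1;
        for (int k = 2; k <= n; k++) {
            result *= k;
        }
        return result;
    }

    // Complete mappings when all n statements are matched in one problem.
    static long monolithic(int n) {
        return factorial(n);
    }

    // Complete mappings enumerated when the statements are split into equal,
    // independently solved groups.
    static long divided(int groups, int groupSize) {
        return groups * factorial(groupSize);
    }

    public static void main(String[] args) {
        System.out.println("one problem of 12:   " + monolithic(12));
        System.out.println("three problems of 4: " + divided(3, 4));
    }
}
```

Under these assumptions the monolithic search faces 479001600 complete mappings, while the divided search enumerates 72.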

Bottom-up Divide-and-Conquer
[Figure: Level 2 of the CDT subtree of Clone #1 (nodes A to G) and the CDT subtree of Clone #2 (nodes a to g); the node pair D, d is highlighted.]

Bottom-up Divide-and-Conquer
[Figure: Level 2 of the same CDT subtrees with nodes D and d no longer shown; the node pair E, e is highlighted.]

Phase 3: Precondition examination
Preconditions related to clone differences:
– Parameterization of differences should not break existing data dependences in the PDGs.
– Reordering of unmapped statements should not break existing data dependences in the PDGs.
Preconditions related to method extraction:
– The unified code should return at most one variable.
– Matched branching (break, continue) statements should be accompanied by the corresponding matched loops in the unified code.
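The first precondition can be made concrete with a small example. When a differing expression is turned into a parameter, the caller evaluates it before the extracted method runs. If that expression reads a variable that an earlier statement inside the clone modifies, the data dependence is broken and the result changes. The class ParameterizationPrecondition, its methods, and the set-based check are hypothetical simplifications for this sketch; the approach in the talk examines dependences on PDGs.

```java
import java.util.Collections;
import java.util.Set;

final class ParameterizationPrecondition {
    // Original clone: the differing expression "x * 2" depends on "x = x + 1".
    static int cloneOne(int x) {
        x = x + 1;
        return x * 2;
    }

    // Naive unified method: the differing expression arrives as an argument,
    // so it was evaluated by the caller before "x = x + 1" executed.
    static int unified(int x, int differingValue) {
        x = x + 1;
        return differingValue;
    }

    // Call site produced by the naive refactoring of cloneOne.
    static int cloneOneAfterNaiveRefactoring(int x) {
        return unified(x, x * 2);
    }

    // Simplified precondition: parameterization is unsafe if the differing
    // expression uses a variable defined by an earlier statement in the clone.
    static boolean breaksDataDependence(Set<String> usedByDifference,
                                        Set<String> definedEarlierInClone) {
        return !Collections.disjoint(usedByDifference, definedEarlierInClone);
    }

    public static void main(String[] args) {
        System.out.println(cloneOne(5));                      // original behavior
        System.out.println(cloneOneAfterNaiveRefactoring(5)); // behavior after naive refactoring
    }
}
```

For an input of 5 the original clone yields 12 and the naively refactored one yields 10, which is the kind of behavior change this precondition is meant to rule out.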

Evaluation
We compared our approach with a state-of-the-art tool in the refactoring of Type-II clones, CeDAR [Tairas & Gray, IST’12].
– The clone groups were detected in 7 open-source projects by the Deckard clone detection tool.
– CeDAR is able to analyze only clone groups in which all clones belong to the same Java file.

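Clone groups within the same Java file
(Refactorable clone groups per tool, with the percentage of all clone groups in parentheses; cells lost in the transcript are shown as "…".)

Project      Clone groups   Eclipse    CeDAR      JDeodorant   JDeodorant vs. CeDAR
Ant          …              … (…%)     28 (23%)   50 (42%)     +79%
Columba      …              … (…%)     30 (34%)   41 (47%)     +37%
EMF          …              … (…%)     14 (9%)    54 (36%)     +286%
JMeter       …              … (…%)     11 (16%)   20 (29%)     +82%
JEdit        …              … (…%)     20 (13%)   57 (36%)     +185%
JFreeChart   …              … (…%)     62 (21%)   87 (30%)     +40%
JRuby        …              … (…%)     23 (28%)   35 (43%)     +52%
Total        …              … (…%)     … (…%)     … (…%)       +83%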

Clone groups within different Java files
(The numeric cells of this table were lost in the transcript and are shown as "…".)

Project      Clone groups   JDeodorant
Ant          …              … (…%)
Columba      …              … (…%)
EMF          …              … (…%)
JMeter       …              … (…%)
JEdit        …              … (…%)
JFreeChart   …              … (…%)
JRuby        …              … (…%)
Total        …              … (…%)

Conclusions
– Our approach was able to refactor 83% more clone groups than CeDAR.
– Our approach assessed as refactorable 27% of the clone groups in which clones are placed in different files.
– The study revealed that 36% of the clone groups can be refactored directly or in the form of sub-clones.
