Mining and Analysis of Control Structure Variant Clones Guo Qiao.

Outline
Clones and control structure variant clones
Research motivation
Approach for mining control structure variant clones
Evaluation of precision and recall
Case study of control structure variant clones
Refactorability evaluation

Code Duplication (Software Clone)
Clones are common in software systems. The percentage of clones in systems varied from 6.5% to 59.5%, with an average proportion of 14.6%. (Chen et

Why are clones a problem?
Clones are harmful. They are:
Identified as the worst code smell
An indication of poor software maintainability
A cause of degraded system design quality
Clone refactoring can eliminate these bad effects.

Clone Categorization
Type-1: Identical code fragments except for variations in whitespace, layout and comments. (Clear)
Type-2: Syntactically identical fragments except for variations in identifiers, literals, types, whitespace, layout and comments. (Clear)
Type-3: Copied fragments with further modifications such as changed, added or removed statements, in addition to Type-1 variations.
Type-4: Two or more code fragments that perform the same computation but are implemented with different syntax.
Most widely accepted definition is from

Dispute about Type-4 Clones
Type-4 clones can be divided into subcategories.
Type-4 clones are syntactically different semantic clones, whose detection is still undecidable.
Type-4 clones are behaviorally similar code fragments with regard to their input/output.

Control Structure Variant Clone?
Definition: Control structure variant clones (CSVC) are clones that use different control structures to implement the same functionality.

Motivation
From the perspective of clone refactoring, a different strategy is required to refactor control structure variant clones:
Extract the common code fragment
Analyze the code functionality

Goal
Propose an approach to mine control structure variant clones accurately. The mining process should take into account:
1. Control structure matching
2. Functional similarity evaluation

Overall Approach

Phase 1: Control Structure Matching
[Figure: code example and its control dependency tree]

Common Control Structures in Java
Loop variants:
Enhanced for loop
Iterator-based for or while loop
Index-based for or while loop
Do-while loop
Conditional variants:
If-else statement
Conditional expression (ternary operator ?:)
Switch statement
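To make the variants concrete, the following is a minimal sketch, not taken from the slides, that implements the same summation with four loop variants and the same classification with three conditional variants. The class and method names are hypothetical and chosen only for illustration.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration class; not part of the presented approach.
final class ControlStructureVariants {

    // Enhanced for loop (available since Java 5).
    static int sumEnhancedFor(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Iterator-based while loop.
    static int sumIteratorWhile(List<Integer> values) {
        int sum = 0;
        Iterator<Integer> it = values.iterator();
        while (it.hasNext()) {
            sum += it.next();
        }
        return sum;
    }

    // Index-based for loop.
    static int sumIndexFor(List<Integer> values) {
        int sum = 0;
        for (int i = 0; i < values.size(); i++) {
            sum += values.get(i);
        }
        return sum;
    }

    // Do-while loop; the body runs at least once, so an empty list needs a guard.
    static int sumDoWhile(List<Integer> values) {
        int sum = 0;
        if (!values.isEmpty()) {
            int i = 0;
            do {
                sum += values.get(i);
                i++;
            } while (i < values.size());
        }
        return sum;
    }

    // If-else statement.
    static String signIfElse(int x) {
        if (x < 0) {
            return "negative";
        } else {
            return "non-negative";
        }
    }

    // Conditional expression (ternary operator).
    static String signTernary(int x) {
        return x < 0 ? "negative" : "non-negative";
    }

    // Switch statement over the sign of x.
    static String signSwitch(int x) {
        switch (Integer.signum(x)) {
            case -1:
                return "negative";
            default:
                return "non-negative";
        }
    }

    public static void main(String[] args) {
        List<Integer> values = List.of(1, 2, 3);
        System.out.println(sumEnhancedFor(values));
        System.out.println(sumIteratorWhile(values));
        System.out.println(sumIndexFor(values));
        System.out.println(sumDoWhile(values));
        System.out.println(signIfElse(-4) + " " + signTernary(-4) + " " + signSwitch(-4));
    }
}
```

Any two of the sum methods, or any two of the sign methods, would form a control structure variant clone pair in the sense used here: same functionality, different control structure.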

Unified Representation of Loops
Loop variable: start index, end index, step.
We consider two loops L1 and L2 functionally equivalent if their loop variables take the same values, that is, they share the same start index, end index and step.
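One way to picture this unified representation is as a small value object compared by its three components. The sketch below is an assumption made for illustration: the class name `LoopSignature`, the use of strings for symbolic start and end expressions, and the integer step are all hypothetical, and the slides do not show how the actual approach normalizes these expressions.

```java
import java.util.Objects;

// Hypothetical unified loop representation: start index, end index, step.
final class LoopSignature {
    final String start; // symbolic start expression, e.g. "0"
    final String end;   // symbolic end expression, e.g. "size(list)"
    final int step;     // signed change of the loop variable per iteration

    LoopSignature(String start, String end, int step) {
        this.start = start;
        this.end = end;
        this.step = step;
    }

    // Two loops are treated as equivalent when all three components match.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof LoopSignature)) {
            return false;
        }
        LoopSignature that = (LoopSignature) other;
        return step == that.step && start.equals(that.start) && end.equals(that.end);
    }

    @Override
    public int hashCode() {
        return Objects.hash(start, end, step);
    }

    public static void main(String[] args) {
        // for (int i = 0; i < list.size(); i++) { ... }
        LoopSignature indexFor = new LoopSignature("0", "size(list)", 1);
        // for (T item : list) { ... }, assumed to visit the same index range
        LoopSignature enhancedFor = new LoopSignature("0", "size(list)", 1);
        // for (int i = list.size() - 1; i >= 0; i--) { ... }
        LoopSignature reverseFor = new LoopSignature("size(list)-1", "-1", -1);

        System.out.println(indexFor.equals(enhancedFor));
        System.out.println(indexFor.equals(reverseFor));
    }
}
```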

Control Structure Equivalents

Control Structure Matching: Start index

Control Structure Matching: End index

Control Structure Matching: Update step

Conditional Variant Equivalents

Phase 2: Function Similarity Evaluation
Java binding: a unique string representing a variable, object type, or method invocation.
IBinding: IMethodBinding, ITypeBinding, IVariableBinding (excluded)

Binding Information
IMethodBinding represents method signatures.
ITypeBinding represents Java types.

Post-processing of Bindings
1. All Collection subtypes are generalized to java.util.Collection.

Post-processing of Bindings
2. The binding keys of methods that access the next element are ignored.
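The two post-processing rules can be sketched as follows. This is a stand-in under stated assumptions, not the presented implementation: it works on plain qualified names and uses Java reflection instead of the binding objects named on the earlier slide, and the list of "next element" accessor methods is a guess made for illustration. The names `BindingNormalizer`, `generalizeType` and `dropNextElementAccess` are hypothetical.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of the two post-processing rules, using reflection as a stand-in.
final class BindingNormalizer {

    // Assumed set of methods that only access the next element of a traversal.
    private static final Set<String> NEXT_ELEMENT_METHODS = Set.of(
            "java.util.Iterator.hasNext",
            "java.util.Iterator.next",
            "java.util.List.get");

    // Rule 1: map every Collection subtype to java.util.Collection.
    static String generalizeType(String qualifiedName) {
        try {
            // Load without initializing; unknown names are returned unchanged.
            Class<?> type = Class.forName(
                    qualifiedName, false, BindingNormalizer.class.getClassLoader());
            return Collection.class.isAssignableFrom(type)
                    ? "java.util.Collection"
                    : qualifiedName;
        } catch (ClassNotFoundException e) {
            return qualifiedName;
        }
    }

    // Rule 2: drop method keys that only access the next element.
    static Set<String> dropNextElementAccess(Set<String> methodKeys) {
        Set<String> kept = new LinkedHashSet<>();
        for (String key : methodKeys) {
            if (!NEXT_ELEMENT_METHODS.contains(key)) {
                kept.add(key);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(generalizeType("java.util.ArrayList"));
        System.out.println(generalizeType("java.lang.String"));
        System.out.println(dropNextElementAccess(
                Set.of("java.util.Iterator.next", "java.io.PrintStream.println")));
    }
}
```

After both rules are applied, an iterator-based loop and an enhanced for loop over different collection types are left with the same remaining bindings, which is what allows them to be compared by set similarity.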

Quantify Functional Similarity
Jaccard similarity coefficient
Specify the threshold Φ
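The Jaccard similarity coefficient of two sets is the size of their intersection divided by the size of their union. The sketch below applies it to two sets of binding strings and compares the result with a threshold. The class and method names are hypothetical, and two details are assumptions: that a pair counts as similar when the coefficient is greater than or equal to the threshold, and that two empty sets are treated as fully similar.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper computing Jaccard similarity over sets of binding strings.
final class JaccardSimilarity {

    // |A ∩ B| / |A ∪ B|; two empty sets are treated as identical (assumption).
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    // A pair is accepted when its similarity reaches the threshold (assumed ">=").
    static boolean isFunctionallySimilar(Set<String> a, Set<String> b, double threshold) {
        return jaccard(a, b) >= threshold;
    }

    public static void main(String[] args) {
        Set<String> clone1 = Set.of("java.util.Collection", "java.lang.String.trim", "java.io.PrintStream.println");
        Set<String> clone2 = Set.of("java.util.Collection", "java.lang.String.trim", "java.lang.StringBuilder.append");
        System.out.println(jaccard(clone1, clone2));
        System.out.println(isFunctionallySimilar(clone1, clone2, 0.5));
    }
}
```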

Evaluation
Study setup:
Select projects.
Select a clone detection tool.
Investigate the results.

Selection of Projects
Six open-source systems from different domains, varying in size and history.

Selection of Detection Tool
Three criteria for tool selection:
1. Able to detect clones with control structure variations.
2. Available for download.
3. Takes a reasonable time to detect clones.
Five clone detection tools were tried:
CCFinder: not able to find semantic clones
JSCtracker: not able to finish the detection process
NiCad: returns abnormal clone groups
Deckard: not able to finish detection
Sebyte: works well for our experiment

Best Threshold
Trade-off between precision and recall.
Identified 285 true positives (TP) and 475 false positives (FP).

Best Threshold
A threshold value of 0.5 achieved a precision of 0.64 and a recall of 0.91.
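For reference, precision is TP / (TP + FP) and recall is TP / (TP + FN). The sketch below shows these standard definitions together with the F1 score, the harmonic mean of the two. The slides do not state which combined score, if any, was used to pick the threshold, so F1 appears here only as one common choice. The counts in `main` are hypothetical values picked to reproduce the reported ratios; they are not data from the study.

```java
// Hypothetical helper with the standard precision, recall and F1 definitions.
final class DetectionMetrics {

    // Fraction of reported pairs that are true clones.
    static double precision(int truePositives, int falsePositives) {
        return (double) truePositives / (truePositives + falsePositives);
    }

    // Fraction of true clones that were reported.
    static double recall(int truePositives, int falseNegatives) {
        return (double) truePositives / (truePositives + falseNegatives);
    }

    // Harmonic mean of precision and recall.
    static double f1(double precision, double recall) {
        return 2 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        // Hypothetical counts, chosen only so the ratios match the reported 0.64 and 0.91.
        double p = precision(64, 36);
        double r = recall(91, 9);
        System.out.println(p);
        System.out.println(r);
        // F1 for the reported precision and recall; roughly 0.75.
        System.out.println(f1(0.64, 0.91));
    }
}
```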

Execution Time
On average, 8.8 milliseconds for each clone pair.

Case Study
Q1: Which variation occurs most frequently?
Q2: Does the evolution of a programming language affect the introduction of control structure variant clones?

Case Study
The 6 different loop kinds make 15 pairwise combinations; 7 of them have instances.
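The figure of 15 is the number of unordered pairs of 6 items, 6 × 5 / 2. The sketch below enumerates them. The six loop kinds listed are an assumption: they are obtained by expanding the "for or while" entries of the earlier list of loop variants, and the slides do not enumerate them explicitly. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical enumeration of loop-kind pairs; the six kinds are an assumed expansion.
final class LoopPairs {

    static final List<String> LOOP_KINDS = List.of(
            "enhanced for",
            "iterator-based for",
            "iterator-based while",
            "index-based for",
            "index-based while",
            "do-while");

    // All unordered pairs of distinct kinds: n * (n - 1) / 2 entries.
    static List<String> unorderedPairs(List<String> kinds) {
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < kinds.size(); i++) {
            for (int j = i + 1; j < kinds.size(); j++) {
                pairs.add(kinds.get(i) + " vs " + kinds.get(j));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> pairs = unorderedPairs(LOOP_KINDS);
        System.out.println(pairs.size());
        for (String pair : pairs) {
            System.out.println(pair);
        }
    }
}
```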

Case Study
Fact: The largest category is enhanced for loop vs. iterator-based while loop, which has 109 instances.
Answer to Q1: The enhanced for loop and the iterator-based while loop appear most often.

Case Study
Fact: The enhanced for loop is involved in all of the top 3 categories, which together have 209 clone pairs and account for 73%.
Answer to Q2: The enhanced for loop, introduced in Java 5, significantly affects the introduction of control structure variant clones.

Clone Refactoring Evaluation
State-of-the-art refactoring tool: JDeodorant

Variations Hindering Refactoring: Initialization of arrays from collections
[Figure: Clone 1 and Clone 2]

Variations Hindering Refactoring: Temporary variables
[Figure: Clone 1 and Clone 2]

Variations Hindering Refactoring: Exchange of method invocation expressions
[Figure: Clone 1 and Clone 2]

Variations Hindering Refactoring: Alternative branching statements
[Figure: Clone 1 and Clone 2]

Conclusion
Control structure variant clones do exist in systems.
They are introduced because the language evolves, e.g., through the new enhanced for feature.
42% of the clones we found are refactorable.

Future Work
Improve the approach to convert one data structure to another, in order to refactor an additional 19% of the control structure variant clones.
Develop code to unify different control structures and perform the refactoring.

Thanks!
Visit our benchmark of control structure variant clones at