Mining and Analysis of Control Structure Variant Clones Guo Qiao.

1 Mining and Analysis of Control Structure Variant Clones Guo Qiao

2 Outline: clones and control structure variant clones; research motivation; approach for mining control structure variant clones; evaluation of precision and recall; case study of control structure variant clones; refactorability evaluation.

3 Code duplication (Software Clone): Clones are common in software systems. The percentage of cloned code in systems varies from 6.5% to 59.5%, with an average of 14.6% (Chen et al., 2014).

4 Why are clones a problem? Clones are harmful: they have been identified as the worst code smell (Rahman, 2010), they indicate poor software maintainability (Mondal, 2011), and they cause system design quality to degrade. Clone refactoring can eliminate these bad effects.

5 Clone Categorization: The most widely accepted definition is from Roy (2009). Type-1: Identical code fragments except for variations in whitespace, layout and comments. (Clear) Type-2: Syntactically identical fragments except for variations in identifiers, literals, types, whitespace, layout and comments. (Clear) Type-3: Copied fragments with further modifications such as changed, added or removed statements, in addition to Type-1 and Type-2 variations. Type-4: Two or more code fragments that perform the same computation but are implemented by different syntactic variants.

6 Dispute about Type-4 Clones: Type-4 clones can be divided into subcategories. One view: Type-4 clones are syntactically different semantic clones and are still undecidable. Another view: Type-4 clones are behaviorally similar code fragments with respect to their input/output.

7 Control Structure Variant Clone? Definition: Control structure variant clones (CSVC) are clones that use different control structures to implement the same functionality.
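To make the definition concrete, here is a minimal illustrative pair, not taken from the slides: two methods that compute the same result, one with an enhanced for loop and one with an iterator-based while loop. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical example of a control structure variant clone pair.
public class CsvcExample {

    // Clone 1: enhanced for loop.
    static int totalLengthEnhancedFor(List<String> names) {
        int total = 0;
        for (String name : names) {
            total += name.length();
        }
        return total;
    }

    // Clone 2: iterator-based while loop implementing the same functionality.
    static int totalLengthIteratorWhile(List<String> names) {
        int total = 0;
        Iterator<String> it = names.iterator();
        while (it.hasNext()) {
            total += it.next().length();
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("clone", "mining");
        // Both calls sum the same string lengths.
        System.out.println(totalLengthEnhancedFor(names));
        System.out.println(totalLengthIteratorWhile(names));
    }
}
```

The two bodies differ syntactically (the second one has explicit `hasNext()` and `next()` calls), which is why token- or tree-based matching alone tends to miss such pairs.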

8 Motivation: From the perspective of clone refactoring, a different strategy is required to refactor control structure variant clones. Extract common code fragment; analysis of code functionality.

10 Goal: Propose an approach to mine control structure variant clones accurately. The mining process should take into account: 1. Control structure matching. 2. Functional similarity evaluation.

11 Overall Approach

12 Phase 1: Control Structure Matching. Code example; control dependency tree.

13 Common Control Structures in Java. Loop variants: enhanced for loop; iterator-based for or while loop; index-based for or while loop; do-while loop. Conditional variants: if-else statement; conditional expression (ternary operator ?:); switch statement.

14 Unified Representation of Loops. Loop variable: start index, end index, step. We consider two loops L1 and L2 functionally equivalent if their loop variables take the same values.
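One way to picture the unified representation is a small value class holding start, end and step, with a check that two loops visit the same loop-variable values. This is only a sketch under assumptions: the class name `LoopRange`, the exclusive-end convention, and the `inclusive` helper are hypothetical and are not described in the slides.

```java
// Hypothetical unified loop representation: start index, end index (exclusive), step.
public final class LoopRange {
    final long start;
    final long end;   // exclusive bound (assumed convention)
    final long step;  // non-zero; negative for descending loops

    LoopRange(long start, long end, long step) {
        if (step == 0) {
            throw new IllegalArgumentException("step must be non-zero");
        }
        this.start = start;
        this.end = end;
        this.step = step;
    }

    // Normalizes an inclusive bound such as "i <= last" to the exclusive form.
    static LoopRange inclusive(long start, long last, long step) {
        return new LoopRange(start, step > 0 ? last + 1 : last - 1, step);
    }

    // Number of values the loop variable takes.
    long iterations() {
        long span = step > 0 ? end - start : start - end;
        long stride = Math.abs(step);
        return span <= 0 ? 0 : (span + stride - 1) / stride;
    }

    // True if both loops visit exactly the same sequence of loop-variable values.
    boolean sameValuesAs(LoopRange other) {
        long n = iterations();
        if (n != other.iterations()) {
            return false;
        }
        if (n == 0) {
            return true;
        }
        if (start != other.start) {
            return false;
        }
        return n == 1 || step == other.step;
    }
}
```

For example, `for (int i = 0; i < 10; i++)` maps to `new LoopRange(0, 10, 1)` and a while loop guarded by `i <= 9` maps to `LoopRange.inclusive(0, 9, 1)`; under this sketch the two are reported as equivalent.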

15 Control Structure Equivalents

16 Control Structure Matching: start index

17 Control Structure Matching: end index

18 Control Structure Matching: update step

19 Conditional Variant Equivalents
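As an illustration of conditional variants (if-else statement, conditional expression, switch statement), the following hypothetical methods all implement the same decision. The example is an assumption made for clarity and does not reproduce the code shown on the slide.

```java
// Hypothetical conditional variants with identical behavior.
public class ConditionalVariants {

    // Variant 1: if-else statement.
    static String signIfElse(int x) {
        if (x < 0) {
            return "negative";
        } else {
            return "non-negative";
        }
    }

    // Variant 2: conditional expression (ternary operator ?:).
    static String signTernary(int x) {
        return x < 0 ? "negative" : "non-negative";
    }

    // Variant 3: switch statement over the sign of x.
    static String signSwitch(int x) {
        switch (Integer.signum(x)) {
            case -1:
                return "negative";
            default:
                return "non-negative";
        }
    }
}
```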

20 Phase 2: Function Similarity Evaluation. Java binding: a unique string representing a variable, object type, or method invocation. IBinding: IMethodBinding, ITypeBinding, IVariableBinding (excluded).

21 Binding Information: IMethodBinding represents method signatures. ITypeBinding represents Java types.

22 Post-processing of Bindings: 1. All Collection subtypes are generalized to java.util.Collection.

23 Post-processing of Bindings: 2. Ignore the binding keys of the methods that access the next element.
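The two post-processing rules can be sketched roughly as follows. This is a simplification under stated assumptions: it works on plain fully qualified names rather than real binding keys, the class `BindingPostProcessor` is hypothetical, and the set of "next element" methods is a guess, since the slides do not enumerate them.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical post-processing over simplified binding names.
public class BindingPostProcessor {

    // Assumed list of methods that access the next element; not given in the slides.
    private static final Set<String> NEXT_ELEMENT_METHODS = new HashSet<>(Arrays.asList(
            "java.util.Iterator.next",
            "java.util.ListIterator.next",
            "java.util.List.get"));

    // Rule 1: generalize every Collection subtype to java.util.Collection.
    static String generalizeType(String qualifiedName) {
        try {
            Class<?> type = Class.forName(qualifiedName);
            return Collection.class.isAssignableFrom(type) ? "java.util.Collection" : qualifiedName;
        } catch (ClassNotFoundException e) {
            // Unknown types are left unchanged.
            return qualifiedName;
        }
    }

    // Applies rule 1 to type names and rule 2 (drop next-element accessors) to method names.
    static Set<String> postProcess(List<String> typeNames, List<String> methodNames) {
        Set<String> result = new LinkedHashSet<>();
        for (String typeName : typeNames) {
            result.add(generalizeType(typeName));
        }
        for (String methodName : methodNames) {
            if (!NEXT_ELEMENT_METHODS.contains(methodName)) {
                result.add(methodName);
            }
        }
        return result;
    }
}
```

With this normalization, a clone iterating an `ArrayList` through `Iterator.next` and a clone iterating a `LinkedList` through an enhanced for loop end up with closer binding sets than they would otherwise.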

24 Quantify Functional Similarity: Jaccard similarity coefficient; specify the threshold Φ.
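The Jaccard similarity coefficient of two sets is the size of their intersection divided by the size of their union. A minimal sketch of applying it to two sets of binding strings follows; the class and method names are hypothetical, and both the `>=` comparison against the threshold and the value returned for two empty sets are assumptions.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical Jaccard-based similarity check over binding sets.
public class JaccardSimilarity {

    // |A ∩ B| / |A ∪ B|; two empty sets are treated as identical (assumed convention).
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    // A pair is reported when its similarity reaches the threshold (assumed to be inclusive).
    static boolean isFunctionallySimilar(Set<String> a, Set<String> b, double threshold) {
        return jaccard(a, b) >= threshold;
    }
}
```

For instance, the sets {a, b, c} and {b, c, d} share two of four distinct elements, so their coefficient is 0.5.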

25 Evaluation. Study setup: select projects; select a clone detection tool; investigate the results.

26 Selection of Projects: 6 open-source systems from different domains, varying in size and history.

27 Selection of Detection Tool. Three criteria for tool selection: 1. Able to detect clones with control structure variations. 2. Available for download. 3. Takes a reasonable time to detect clones. Tried five different clone detection tools: CCFinder (not able to find semantic clones); JSCtracker (not able to finish the detection process); NiCad (returns abnormal clone groups); Deckard (not able to finish detection); Sebyte (works well for our experiment).

28 Best Threshold: Trade-off between precision and recall. Identified 285 true positives (TP) and 475 false positives (FP).

29 Best Threshold: A threshold value of 0.5 achieved a precision of 0.64 and a recall of 0.91.
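For reference, precision is TP / (TP + FP) and recall is TP / (TP + FN). The sketch below shows how both could be computed for one candidate threshold from manually labeled clone pairs; the class name, the array-based input and the inclusive threshold comparison are assumptions, and any data passed in is the caller's, not the study's.

```java
// Hypothetical helper for evaluating one similarity threshold against labeled pairs.
public class ThresholdEvaluation {

    // Returns {precision, recall} for pairs whose score reaches the threshold.
    static double[] precisionRecall(double[] scores, boolean[] isTrueClone, double threshold) {
        if (scores.length != isTrueClone.length) {
            throw new IllegalArgumentException("scores and labels must have the same length");
        }
        int tp = 0;
        int fp = 0;
        int fn = 0;
        for (int i = 0; i < scores.length; i++) {
            boolean reported = scores[i] >= threshold;
            if (reported && isTrueClone[i]) {
                tp++;
            } else if (reported) {
                fp++;
            } else if (isTrueClone[i]) {
                fn++;
            }
        }
        // Both measures are defined as 0 when their denominator is 0 (assumed convention).
        double precision = tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
        double recall = tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
        return new double[] {precision, recall};
    }
}
```

Repeating the call over a range of thresholds exposes the trade-off: raising the threshold usually removes false positives but also drops some true clones.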

30 Execution Time: 8.8 milliseconds on average for each clone pair.

31 Case Study. Q1: Which variation occurs most frequently? Q2: Does the evolution of a programming language affect the introduction of control structure variant clones?

32 Case Study: 6 different loops make 15 pairwise combinations; 7 of them have instances.

33 Case Study. Fact: The largest category is enhanced for loop vs. iterator-based while loop, which has 109 instances. Answer to Q1: The combination of enhanced for loop and iterator-based while loop appears most often.

34 Case Study. Fact: The enhanced for loop is involved in all of the top 3 categories, which together have 209 clone pairs, accounting for 73% of the clone pairs found. Answer to Q2: The enhanced for loop, introduced in Java 5, significantly affects the introduction of control structure variant clones.

35 Clone Refactoring Evaluation: state-of-the-art refactoring tool, JDeodorant.

36 Variations Hindering Refactoring: initialization of arrays from collections (figure: Clone 1 vs. Clone 2).

37 Variations Hindering Refactoring: temporary variables (figure: Clone 1 vs. Clone 2).

38 Variations Hindering Refactoring: exchange of method invocation expressions (figure: Clone 1 vs. Clone 2).

39 Variations Hindering Refactoring: alternative branching statements (figure: Clone 1 vs. Clone 2).

40 Conclusion: Control structure variant clones do exist in systems. They are introduced because the language evolves, e.g., with the new enhanced for loop feature. 42% of the clones we found are refactorable.

41 Future Work: Improve the approach to convert one data structure to another in order to refactor an additional 19% of the control structure variant clones. Develop code to unify different control structures and perform the refactoring.

42 Thanks! Visit our benchmark of control structure variant clones at
