Published by Suzan Foster
1
An Algorithm for Detecting and Removing Clones in Java Code
Nicolas Juillerat & Béat Hirsbrunner
Pervasive and Artificial Intelligence Research Group, University of Fribourg, Switzerland
2
10/1/2016, ICGT 2006 > SeTra 2006
Overview
1. Introduction
    1.1 What are Clones?
    1.2 Focus of this presentation
2. Algorithm overview
3. Clone detection
4. Clone removal
    4.1 Splitting expression and control statements
    4.2 Multiple outgoing data flows
        Example
        Existing solutions
        Proposed solution
    4.3 Multiple outgoing control flows
        Example
        Existing solutions
        Proposed solution
5. Conclusion
    5.1 Current state
    5.2 Future work
3
1. Introduction
4
1.1 What are Clones?
Clones = code duplications
Causes:
– Long-running projects
– Constantly changing specifications
– Deadline pressure
– Lack of architectural knowledge
Consequences:
– Code is difficult to maintain and debug
– Code has poor structure
– The same bugs are spread across multiple places
5
1.2 Focus of this Presentation
Present an algorithm to
– Detect clones in Java code
– Remove the detected clones
Existing algorithms' targets:
– Recovering some structure in huge legacy code
    Reducing code duplication as much as possible is the main concern
The presented algorithm's targets:
– Improve the structure and clarity of existing code
    Focus on medium and small code snippets
    Clarity of the resulting code is the main concern
    Use as a refactoring tool
– Also handle clone extraction in Java (many tools are restricted to clone detection)
6
1.3 Example
Before:
    save(get(x * 3) - (x + 6) - 4);
    print(get(x * 3) - (x * 5) + 4);
After clone extraction:
    save(expr1(x) - (x + 6) - 4);
    print(expr1(x) - (x * 5) + 4);

    int expr1(int x) {
        return get(x * 3);
    }
7
2. Algorithm Overview
Clone detection
– Parse the source code into an Abstract Syntax Tree (AST)
– Convert the AST into a token list using a post-order walk
– Detect duplications using the LZ77 hashing algorithm
Clone removal (main contribution)
– Apply various constraints on the result
    Splitting
    Resolve multiple outgoing data flows
    Resolve multiple outgoing control flows
    Apply a threshold
– Extract the clones (extract method refactoring)
– Rewrite the source
8
2. Algorithm Overview
The constraints applied in the algorithm ensure that the preconditions of the « extract method » refactoring are met.

Constraints in the algorithm            | Preconditions of extract method refactoring
Resolve multiple outgoing data flows    | At most one return value
Resolve multiple outgoing control flows | At most one exit point
(future work)                           | No local type declaration dependency
Splitting                               | Method cannot cross a block's boundaries
9
3. Clone Detection
Based on AST, post-order traversal and LZ77
Source code -> (parsing) -> Abstract Syntax Tree (AST)
Source code:
    save(get(x * 3) - (x + 6) - 4);
    print(get(x * 3) - (x * 5) + 4);
[Figure: the resulting AST, a block node with one subtree for the save statement and one for the print statement]
10
3. Clone Detection
Based on AST, post-order traversal and LZ77
Abstract Syntax Tree -> (post-order traversal) -> Token list
[Figure: the AST of the two example statements]
Token list:
    [x, 3, *, get, x, 6, +, -, 4, -, save, x, 3, *, get, x, 5, *, -, 4, +, print, block]
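The conversion from AST to token list can be sketched as follows. This is a minimal illustration, not the authors' implementation: `AstNode`, `PostOrder` and the helper `n` are hypothetical names, and a real tool would walk the parser's own AST (the slides mention the Eclipse JDT) instead of this toy tree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical minimal AST node: a label plus ordered children.
final class AstNode {
    final String label;
    final List<AstNode> children;

    AstNode(String label, AstNode... children) {
        this.label = label;
        this.children = Arrays.asList(children);
    }
}

final class PostOrder {
    // Post-order walk: emit all children first, then the node itself.
    static void walk(AstNode node, List<String> out) {
        for (AstNode child : node.children) {
            walk(child, out);
        }
        out.add(node.label);
    }

    static List<String> tokens(AstNode root) {
        List<String> out = new ArrayList<>();
        walk(root, out);
        return out;
    }

    // Shorthand constructor (assumption, only for readability).
    static AstNode n(String label, AstNode... children) {
        return new AstNode(label, children);
    }

    // AST of the two example statements from the slides.
    static AstNode example() {
        AstNode save = n("save",
            n("-",
                n("-", n("get", n("*", n("x"), n("3"))), n("+", n("x"), n("6"))),
                n("4")));
        AstNode print = n("print",
            n("+",
                n("-", n("get", n("*", n("x"), n("3"))), n("*", n("x"), n("5"))),
                n("4")));
        return n("block", save, print);
    }
}
```

Calling `PostOrder.tokens(PostOrder.example())` yields the token list shown on the slide, ending with `print, block`.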
11
3. Clone Detection
Based on AST, post-order traversal and LZ77
Token list -> (LZ77) -> Clone list
Token list:
    [x, 3, *, get, x, 6, +, -, 4, -, save, x, 3, *, get, x, 5, *, -, 4, +, print, block]
Clone list:
    [x, 3, *, get, x, 6, +, -, 4, -, save, x, 3, *, get, x, 5, *, -, 4, +, print, block]
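The core of an LZ77-style search is finding, for a position in the token list, the longest run that already occurred earlier. The sketch below shows only that match search, using brute force instead of the hashing mentioned in the slides; `RepeatFinder` and `longestRepeat` are hypothetical names, and restricting matches to non-overlapping copies is an assumption made here for clarity.

```java
import java.util.List;

final class RepeatFinder {
    // Returns {start of earlier copy, start of later copy, length}
    // for the longest non-overlapping repeated run of tokens.
    static int[] longestRepeat(List<String> tokens) {
        int bestSrc = -1;
        int bestPos = -1;
        int bestLen = 0;
        for (int pos = 1; pos < tokens.size(); pos++) {
            for (int src = 0; src < pos; src++) {
                int len = 0;
                // Extend the match while tokens agree and the earlier copy
                // stays entirely before the later one.
                while (src + len < pos
                        && pos + len < tokens.size()
                        && tokens.get(src + len).equals(tokens.get(pos + len))) {
                    len++;
                }
                if (len > bestLen) {
                    bestLen = len;
                    bestSrc = src;
                    bestPos = pos;
                }
            }
        }
        return new int[] {bestSrc, bestPos, bestLen};
    }
}
```

On the example token list this reports a run of length 5 starting at indices 0 and 11, that is `[x, 3, *, get, x]`, which is the sublist the splitting step then works on.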
12
4. Clone Removal
Application of various constraints on the token sublists corresponding to clones
– Splitting
– Resolving multiple outgoing data flows
– Resolving multiple outgoing control flows
– Thresholding (to skip clones that are too small)
Token list:
    [x, 3, *, get, x, 6, +, -, 4, -, save, x, 3, *, get, x, 5, *, -, 4, +, print, block]
13
4.1 Splitting
[Figure: AST of the save statement]
Detected clone:
    [x, 3, *, get, x]
After splitting:
    [x, 3, *, get], [x]
Source code:
    save(get(x * 3) - (x + 6) - 4);
    print(get(x * 3) - (x * 5) + 4);
After extract method:
    save(expr1(x) - (x + 6) - 4);
    print(expr1(x) - (x * 5) + 4);
14
4.1 Splitting
Two purposes
– Split tokens that are joined only by the post-order tree traversal (previous slide)
– Split tokens where they cross the boundary of a block that is only partially covered
    If a clone covers the beginning of a loop, we split it into
    – A first clone before the loop
    – A second clone at the beginning of the loop body
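The first purpose, separating tokens that are adjacent only because of the post-order walk, can be sketched by cutting a clone into complete subtrees. This is an illustration under assumptions, not the authors' code: `SubtreeSplitter` is a hypothetical name, the arity table only covers the tokens of the running example, and an operator whose operands lie outside the clone is simply dropped.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SubtreeSplitter {
    // Hypothetical arity table for the example; unlisted tokens are leaves.
    static final Map<String, Integer> ARITY = new HashMap<>();
    static {
        ARITY.put("*", 2);
        ARITY.put("+", 2);
        ARITY.put("-", 2);
        ARITY.put("get", 1);
        ARITY.put("save", 1);
        ARITY.put("print", 1);
    }

    // Splits a post-order token sublist into maximal complete subtrees.
    static List<List<String>> split(List<String> clone) {
        List<List<String>> pieces = new ArrayList<>();
        List<Integer> starts = new ArrayList<>(); // start index of each complete subtree
        for (int i = 0; i < clone.size(); i++) {
            int arity = ARITY.getOrDefault(clone.get(i), 0);
            if (arity > starts.size()) {
                // Some operands are outside the clone: emit what we have, skip the operator.
                flush(clone, starts, i, pieces);
                continue;
            }
            int start = i;
            for (int k = 0; k < arity; k++) {
                start = starts.remove(starts.size() - 1); // absorb operand subtrees
            }
            starts.add(start);
        }
        flush(clone, starts, clone.size(), pieces);
        return pieces;
    }

    private static void flush(List<String> clone, List<Integer> starts, int end,
                              List<List<String>> pieces) {
        for (int s = 0; s < starts.size(); s++) {
            int from = starts.get(s);
            int to = (s + 1 < starts.size()) ? starts.get(s + 1) : end;
            pieces.add(new ArrayList<>(clone.subList(from, to)));
        }
        starts.clear();
    }
}
```

For the clone `[x, 3, *, get, x]` this produces `[x, 3, *, get]` and `[x]`, matching the split shown on the previous slide.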
15
4.2 Multiple Outgoing Data Flows
16
4.2.1 Example
The clone (the three assignments):
    min = x - getWindowSize() / 2;
    max = x + getWindowSize() / 2;
    middle = x;
    doStuff(min, middle, max);
In languages that allow passing arguments by reference, this is not a problem.
Example (C#):
    getMinMaxAndMiddle(ref min, ref max, ref middle);
    doStuff(min, middle, max);
And in Java?
17
4.2.2 Existing Solutions
Existing tools for Java can
– Return an array
– Encapsulate each argument in an object
– Encapsulate all arguments in an object
Working solutions, but
– For small clones, the result can be bigger than the original
– The clarity of the code decreases
Not good for a refactoring tool
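To make the drawback concrete, here is a sketch of the "return an array" option applied to the example clone. The class name, the constant returned by `getWindowSize()` and the string-returning `doStuff` are assumptions added only so the snippet runs on its own.

```java
final class ArrayReturnExample {
    // Hypothetical stand-in for the slide's getWindowSize().
    static int getWindowSize() {
        return 10;
    }

    // Extracted clone: the three outgoing values are packed into an array.
    static int[] getMinMaxAndMiddle(int x) {
        int min = x - getWindowSize() / 2;
        int max = x + getWindowSize() / 2;
        int middle = x;
        return new int[] {min, max, middle};
    }

    // Hypothetical stand-in for doStuff; returns its arguments for inspection.
    static String doStuff(int min, int middle, int max) {
        return min + " " + middle + " " + max;
    }

    static String caller(int x) {
        int[] values = getMinMaxAndMiddle(x);
        int min = values[0];    // unpacking code that did not exist before
        int max = values[1];
        int middle = values[2];
        return doStuff(min, middle, max);
    }
}
```

The call site still needs three lines to unpack the array, so for a three-statement clone nothing is saved and the meaning of each index is no longer visible at the call site.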
18
4.2.3 Proposed Solution
Split the clone into multiple, smaller clones
Each clone has at most one outgoing data flow
Before:
    min = x - getWindowSize() / 2;
    max = x + getWindowSize() / 2;
    middle = x;
    doStuff(min, middle, max);
After:
    min = getMin(x);
    max = getMax(x);
    middle = x;
    doStuff(min, middle, max);
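The slide shows only the call site after the split. A possible shape for the extracted methods is sketched below; the bodies of `getMin` and `getMax` are inferred from the original statements, while the class name, the constant window size and the string-returning `doStuff` are assumptions for a self-contained example.

```java
final class SplitCloneExample {
    // Hypothetical stand-in for the slide's getWindowSize().
    static int getWindowSize() {
        return 10;
    }

    // Each extracted method has exactly one outgoing data flow: its return value.
    static int getMin(int x) {
        return x - getWindowSize() / 2;
    }

    static int getMax(int x) {
        return x + getWindowSize() / 2;
    }

    // Hypothetical stand-in for doStuff; returns its arguments for inspection.
    static String doStuff(int min, int middle, int max) {
        return min + " " + middle + " " + max;
    }

    static String caller(int x) {
        int min = getMin(x);
        int max = getMax(x);
        int middle = x; // too small to be worth extracting
        return doStuff(min, middle, max);
    }
}
```

The call site keeps one line per variable, so the result stays close to the original code, which is the stated goal of the approach.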
19
4.3 Multiple Outgoing Control Flows
20
4.3.1 Example
The clone:
    int result = 0;
    while (true) {
        applyTransform(code);
        if (test() || code.hasErrors()) {
            result = -1;
            break;
        }
        verify(code);
    }
Problem: execution can leave the clone on two different « paths »:
– Normally (if the condition is false)
– Through the break statement (if the condition is true)
21
4.3.2 Existing Solutions
Return a variable holding the « leaving path »
In C#:
    int result = 0;
    while (true) {
        applyTransform(code);
        int path;
        extracted(stable, ref result, ref path);
        if (path == BREAK) break;
    }
22
4.3.2 Existing Solutions
Return a variable holding the « leaving path »
– Only works in Java if no other variable has to be returned
– Can get quite heavy if there are both break and return statements
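A Java version of the « leaving path » approach might look like the sketch below. It is an illustration under assumptions: the `Code` class, the bodies of `applyTransform`, `test` and `verify`, and the constants `NORMAL` and `BREAK` are all hypothetical. Because the single return value is used for the path, the assignment `result = -1` is kept at the call site here, which shows the limitation named in the first bullet.

```java
final class LeavingPathExample {
    static final int NORMAL = 0;
    static final int BREAK = 1;

    // Hypothetical stand-in for the slide's `code` object:
    // it reports errors after three transforms.
    static final class Code {
        int transforms = 0;

        boolean hasErrors() {
            return transforms >= 3;
        }
    }

    static void applyTransform(Code code) {
        code.transforms++;
    }

    static boolean test() {
        return false;
    }

    static void verify(Code code) {
        // no-op in this sketch
    }

    // Extracted clone: the return value encodes which path left it.
    static int extracted(Code code) {
        applyTransform(code);
        if (test() || code.hasErrors()) {
            return BREAK;
        }
        verify(code);
        return NORMAL;
    }

    static int run(Code code) {
        int result = 0;
        while (true) {
            int path = extracted(code);
            if (path == BREAK) {
                result = -1; // cannot move into extracted(): its return value is taken
                break;
            }
        }
        return result;
    }
}
```

The caller needs an extra variable and an extra test after every call, and a clone containing both break and return statements would need one constant and one branch per path.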
23
4.3.3 Proposed Solution
Split the clone on break and return statements
– Other splitting may follow because of other constraints
    int result = 0;
    while (true) {
        applyTransform(code);
        if (test() || code.hasErrors()) {
            result = -1;
            break;
        }
        verify(code);
    }
25
5. Conclusion
The proposed algorithm, compared to existing solutions
– Is less effective in reducing duplications
– Is more effective in improving the clarity of the code
– Ensures a result close to the original (transformations are done only through the « extract method » refactoring)

Problem                         | Existing                                            | Proposed
Multiple outgoing data flows    | Use arrays or new objects                           | Split the clone
Multiple outgoing control flows | Use an additional return variable & additional code | Split the clone
Partial block covering          | Code promotion, assignment duplication, etc.        | Split the clone
26
5.1 Current State
C# implementation attempt using the .NET CodeDom
– Abandoned (parsing was lacking, as well as various language constructs)
Java implementation in progress
– Based on the Eclipse JDT
    Already includes parser and rewriter
– Still in an early stage of development
27
5.2 Future Work
Finish the implementation
Validate the results in practice using code metrics
Include other complementary work
– Detection of clones with renamed variables
– Detection of clones with swapped statements
Apply the ideas of the algorithm to other transformations
– Forming a template method
Include user interaction and a user interface
28
Questions?
Thank you for your attention!