Download presentation
Presentation is loading. Please wait.
Published byAnthony Scott Modified over 9 years ago
1
Reverse Engineering © SERG Code Cloning: Detection, Classification, and Refactoring
2
Reverse Engineering © SERG Introduction Maintenance is the most costly phase of the software lifecycle Code Clone: gratuitous copy of source code in a program. Code Clones have the effect of increasing source code size and duplication of errors.
3
Reverse Engineering © SERG Definitions Clone Class/Set: Set of equivalent Clones Precision: Percent of reported clones that are genuine Recall: Percent of genuine clones that are reported
4
Reverse Engineering © SERG Detection String Matching – Represents and evaluates code using string comparisons. Token Parsing – Code transformation into tokens for comparison. Graph Matching – Pattern matching on graph representations of code.
5
Reverse Engineering © SERG Detection Strategies
6
Reverse Engineering © SERG String Matching Techniques Exact String Matching Parameterized Matching Substring Matching
7
Reverse Engineering © SERG Parameterized Matching Employs exact string matching for comparison. 1.Normalization 2.Concatenation 3.Hashing 4.Extract longest matches
8
Reverse Engineering © SERG Matching Algorithm No algorithm can avoid worst case running time of O(n 2 ). Using a suffix tree we can improve running time complexity to O(n+m). Where m is the number of matches. The input size n, is reduced by hashing.
9
Reverse Engineering © SERG Suffix Tree Example Images Copyright (c) 1996-1998, Mark Nelson, All Rights Reserved. Suffix Trie Suffix Tree
10
Reverse Engineering © SERG Substring Matching Substring Matching provides a faster search algorithm. 1.Normalization 2.Substring Generation 3.Matching 4.Consolidation 5.Reporting
11
Reverse Engineering © SERG Caveat Exact string matching does not find clones with trivial alterations that don’t change semantics. Normalization has the risk of false positives. x+y=z; != z+x=y; -> p+p=p for(i=0; i for(p=p; p<p; p++)
12
Reverse Engineering © SERG Token Parsing Techniques Transforms code into tokens by using language specific constructs into a single token string Find similarities within this token string Transform token clones back into code clones for presentation
13
Reverse Engineering © SERG Token Parsing Example int main(){ int i = 0; static int j=5; while(i<20){ i=i+j; } std::cout<<"Hello World"<<i<<std::endl; return 0; } Remove white spaces
14
Reverse Engineering © SERG Token Parsing Example int main(){ int i = 0; static int j=5; while(i<20){ i=i+j; } std::cout<<"Hello World"<<i<<std::endl; return 0; } Shorten Names
15
Reverse Engineering © SERG Token Parsing Example int main (){ int i = 0; int j = 5; while (i < 20){ i = i + j; } cout << "Hello World” << i << endl; return 0; } Tokenize everything, except language constructs
16
Reverse Engineering © SERG Token Parsing Example $p $p(){ $p $p = $p; while($p < $p ){ $p = $p + $p; } $p << $p << $p << $p;; return $p; } Clone relations with all the transformation rules are compared to clone relations with a subset of the transformation rules
17
Reverse Engineering © SERG CCFinder:DEMO
18
Reverse Engineering © SERG Graph Matching Techniques Form machine representation of code Identify clones as identical subgraphs
19
Reverse Engineering © SERG Abstract Syntax Subtree Matching source_file_name = ((SourceFile) attributes[i]).getSourceFileName();
20
Reverse Engineering © SERG Hash subgraphs Identify maximal identical or similar subgraphs Identify sequences of subgraphs Abstract Syntax Subtree Matching
21
Reverse Engineering © SERG Program Dependence Graph Matching
22
Reverse Engineering © SERG Vertices are lines of code Edges are attributed with different types of dependences (control flow, data, etc) NP complete in general, k-cutoff in maximal graph size used to limit runtime –Experiments determine k=20 best O(|V| 2 ) possible graph starting points, reduced via heuristic Program Dependence Graph Matching
23
Reverse Engineering © SERG Two Clones Found by fg-PDG
24
Reverse Engineering © SERG Metrics? Need to evaluate different clone detection techniques Hard to know the real number of clones in a non-trivial application How to compare different types of clones?
25
Reverse Engineering © SERG Basic Metrics LOC: Line number count SLOC: Line number count after the removal of blanks %LOC: Percent of lines with clones in them %FILE: Percent of files with clones in them
26
Reverse Engineering © SERG Interesting Metrics: Radius A BC
27
Reverse Engineering © SERG Interesting Metrics: Radius RAD(B,C)= 1 A BC
28
Reverse Engineering © SERG Interesting Metrics: Radius A BC RAD(A,C)= 2
29
Reverse Engineering © SERG Interesting Metrics: DFL Estimates improvement from clone removal Assume that every clone class can be replaced by a function Assume that every clone is replaceable by a call to that function
30
Reverse Engineering © SERG Comparison of Clone Detectors CCFinder Token (1128) CloneDr AST (84) Cavet Metric (278) Jplag Token (131) Moss Unknown (120) CCFinder1090/381089/27989/871025/101 CloneDr43265/13120/11111/9 Cavel25170120/15109/10 Jplag447327367/50 Moss197626881
31
Reverse Engineering © SERG Comparison of Clone Detectors FrequencyCCFinderCloneDrCavetJPlagMoss 1569664095104 298634108 33321340 4140610 5160500 6190500 720100 In addition Cavet found clones with frequencies: 8,12, and 13
32
Reverse Engineering © SERG Comparison of Clone Detectors CCFinderCloneDrCavetJPlagMoss Recall729191210 Precision72100638273 Different code clone detectors find different clones String based find direct clones Token based find polymorphism issues and may be difficult to fix Graph based find clones that can be automatically refactored
33
Reverse Engineering © SERG Code Clone Refactoring Use standard Refactoring methods –“Extract” - Make a procedure –“Pull Up” - Make an superclass Aspect Oriented Programming –Advanced technique for clones that are too tough for procedural or OO solutions
34
Reverse Engineering © SERG Extract
35
Reverse Engineering © SERG Pull Up
36
Reverse Engineering © SERG Aspect Oriented Programming Observation: Certain types of peripheral code permeate application logic –Logging, security, parameter checking, etc. How to deal with similar but scattered code? –Remove similar code into Aspects –Join Aspects into application logic using “join points”
37
Reverse Engineering © SERG Aspect Oriented Refactoring Candidates
38
Reverse Engineering © SERG Ophir – Aspect Mining Tool
39
Reverse Engineering © SERG Conclusion Alternative techniques detect different clone types. Code clone detection and removal is still in its infancy. Code clone refactoring tools should be incorporated into standard IDEs to achieve widespread adoption.
40
Reverse Engineering © SERG Future Research Detector improvement –Improve precision and refactorability of token based detectors –Performance of graph based detectors –AI-based detectors? Comparison of detection methods
41
Reverse Engineering © SERG Future Research Continue study of code clones –Effects of Agile methodologies –Impact on Web-based code Visualization of clones Automated refactoring systems
42
Reverse Engineering © SERG Questions
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.