Representing Clones in a Localized Manner
Robert Tairas (INRIA & EMN), Ferosh Jacob (University of Alabama), Jeff Gray (University of Alabama)
International Workshop on Software Clones (IWSC), May 23, 2011
© AtlanMod | This material is based upon work supported by the National Science Foundation under Grant No. CCF
Problem: Clone Comprehension
Clones in a group can be scattered
– Within a large source file or across several files
– A programmer must view the clones in each location
(Figure: cloned code)
Representation within Source Editor
From the refactoring perspective:
– Eclipse refactoring requires multiple dialog boxes
– Editing and refactoring tasks are separated
– A solution: visualize refactoring changes directly in the source editor
(Figure: screenshot of Refactor! Pro†)
† Refactor! Pro, 2010
Localized Clone Representation
Represent a clone group in a localized manner directly in the source editor
– Parameterized differences are visualized in the representation
Clone instances in the group:

    if (!delete(file)) {
        String message = "Unable to delete file " + file.getAbsolutePath();
        if (failonerror) {
            throw new BuildException(message);
        } else {
            log(message, quiet ? Project.MSG_VERBOSE : Project.MSG_WARN);
        }
    }

    if (!delete(f)) {
        String message = "Unable to delete file " + f.getAbsolutePath();
        if (failonerror) {
            throw new BuildException(message);
        } else {
            log(message, quiet ? Project.MSG_VERBOSE : Project.MSG_WARN);
        }
    }

    if (!delete(f)) {
        String message = "Unable to delete file " + f.getAbsolutePath();
        if (failonerror) {
            throw new BuildException(message);
        } else {
            log(message, quiet ? Project.MSG_VERBOSE : Project.MSG_WARN);
        }
    }

    if (!delete(dir)) {
        String message = "Unable to delete directory " + dir.getAbsolutePath();
        if (failonerror) {
            throw new BuildException(message);
        } else {
            log(message, quiet ? Project.MSG_VERBOSE : Project.MSG_WARN);
        }
    }
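The merging idea can be illustrated at the token level. The sketch below is a minimal, hypothetical Java example: the class `LocalizedTemplate`, its crude regex tokenizer, and the `[a|b]` notation are assumptions, not the tool's API. It assumes the clones are already aligned token by token, as the instances above are, and renders each position where they differ as a bracketed list of alternatives. The tool described in these slides works on AST nodes rather than raw tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: merge token-aligned clones into one template with [a|b] slots. */
class LocalizedTemplate {
    // String literals, identifiers/numbers, or any single non-space character (not a real Java lexer).
    private static final Pattern TOKEN = Pattern.compile("\"[^\"]*\"|\\w+|\\S");

    static List<String> tokenize(String code) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(code);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    /** Returns the merged template; throws if the clones do not have the same number of tokens. */
    static String render(List<String> clones) {
        List<List<String>> tokenized = new ArrayList<>();
        for (String clone : clones) {
            tokenized.add(tokenize(clone));
        }
        int n = tokenized.get(0).size();
        for (List<String> tokens : tokenized) {
            if (tokens.size() != n) {
                throw new IllegalArgumentException("clones are not token-aligned");
            }
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < n; i++) {
            // Distinct values at this position, in first-seen order.
            List<String> variants = new ArrayList<>();
            for (List<String> tokens : tokenized) {
                if (!variants.contains(tokens.get(i))) {
                    variants.add(tokens.get(i));
                }
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            // One variant: shared text. Several variants: a parameterized slot.
            out.append(variants.size() == 1 ? variants.get(0) : "[" + String.join("|", variants) + "]");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> clones = List.of(
                "if (!delete(file)) { String message = \"Unable to delete file \" + file.getAbsolutePath(); }",
                "if (!delete(f)) { String message = \"Unable to delete file \" + f.getAbsolutePath(); }",
                "if (!delete(dir)) { String message = \"Unable to delete directory \" + dir.getAbsolutePath(); }");
        System.out.println(render(clones));
    }
}
```

Running `main` prints one template in which `file`, `f`, and `dir` collapse into bracketed slots, as do the two message strings. Clones whose token counts differ are rejected, which is where an AST- or suffix-tree-based comparison becomes necessary.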
Displaying Clones in a Localized Manner
(Figure: processing pipeline)
– Original Source Code → Clone Detection (performed by a third-party tool) → Clone Groups
– User Selects a Clone Group → Clone Group → Generate Suffix Tree → Clone Differences → Generate Representation → Localized Representation
– Generate Suffix Tree: determine the differences among the clones; differences are based on first-level statement comparisons
– The localized representation is displayed after a user selects a clone group
– The representation is “re-calculated” when a different clone group is selected
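The selection-driven flow above can be outlined in Java as follows. This is a hypothetical sketch, not the tool's code: `LocalizedView`, `findDifferences`, and `render` are invented names, the detector's output is assumed to be an in-memory list of groups, and the two functions stand in for the suffix-tree comparison and representation steps. Nothing is cached, so every selection recalculates the representation, mirroring the last bullet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch: representation computed on demand for the selected clone group only. */
class LocalizedView<G, D> {
    private final List<G> groups;                 // clone groups reported by a third-party detector
    private final Function<G, D> findDifferences; // stand-in for the suffix-tree comparison step
    private final Function<D, String> render;     // stand-in for generating the editor representation
    private int recomputations = 0;

    LocalizedView(List<G> groups, Function<G, D> findDifferences, Function<D, String> render) {
        this.groups = new ArrayList<>(groups);
        this.findDifferences = findDifferences;
        this.render = render;
    }

    /** Invoked on each user selection; nothing is cached, so every call recalculates. */
    String select(int groupIndex) {
        recomputations++;
        return render.apply(findDifferences.apply(groups.get(groupIndex)));
    }

    int recomputations() {
        return recomputations;
    }

    public static void main(String[] args) {
        LocalizedView<List<String>, Integer> view = new LocalizedView<>(
                List.of(List.of("cloneA", "cloneB"), List.of("cloneC", "cloneD", "cloneE")),
                List::size,
                n -> "group of " + n + " clones");
        System.out.println(view.select(0));
        System.out.println(view.select(1));
    }
}
```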
Detecting Parameterized Elements
– A suffix tree is generated on the AST nodes representing the statements of a group of clones
– Elements in nodes containing allowed differences are mapped together
(Figure: excerpt of a suffix tree over statements Stmt1 and Stmt2 of Clone 1 (terminator $) and Clone 2 (terminator #); parameterized elements mapped: file → dir, file.getAbsolutePath() → dir.getAbsolutePath())
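A naive generalized suffix trie shows the idea of building one index over the statements of several clones. In this hypothetical sketch, each statement is reduced to a string (for example a normalized AST node label), each clone's suffixes end with its own terminator such as `$` or `#`, and every node records which clones pass through it, so a path shared by all clones is a run of statements common to them. It builds O(n²) nodes rather than a compacted suffix tree, and the names `StatementSuffixTrie`, `addClone`, and `longestCommon` are assumptions.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: naive generalized suffix trie over clone statement sequences. */
class StatementSuffixTrie {
    private static final class Node {
        final Map<String, Node> children = new LinkedHashMap<>();
        final BitSet clones = new BitSet(); // clones with a suffix passing through this node
    }

    private final Node root = new Node();

    /** Inserts every suffix of one clone's statements, each ending with that clone's terminator. */
    void addClone(int cloneId, List<String> statements, String terminator) {
        List<String> seq = new ArrayList<>(statements);
        seq.add(terminator);
        for (int start = 0; start < seq.size(); start++) {
            Node node = root;
            for (int i = start; i < seq.size(); i++) {
                node = node.children.computeIfAbsent(seq.get(i), k -> new Node());
                node.clones.set(cloneId);
            }
        }
    }

    /** Longest statement run shared by all clones, assuming clone ids 0..cloneCount-1. */
    List<String> longestCommon(int cloneCount) {
        List<String> best = new ArrayList<>();
        search(root, new ArrayList<>(), cloneCount, best);
        return best;
    }

    private void search(Node node, List<String> path, int cloneCount, List<String> best) {
        if (path.size() > best.size()) {
            best.clear();
            best.addAll(path);
        }
        for (Map.Entry<String, Node> e : node.children.entrySet()) {
            // Descend only where every clone passes; a unique terminator never qualifies.
            if (e.getValue().clones.cardinality() == cloneCount) {
                path.add(e.getKey());
                search(e.getValue(), path, cloneCount, best);
                path.remove(path.size() - 1);
            }
        }
    }

    public static void main(String[] args) {
        StatementSuffixTrie trie = new StatementSuffixTrie();
        trie.addClone(0, List.of("Stmt1", "Stmt2"), "$");
        trie.addClone(1, List.of("Stmt2", "Stmt1", "Stmt2"), "#");
        System.out.println(trie.longestCommon(2)); // [Stmt1, Stmt2]
    }
}
```

Because each terminator belongs to exactly one clone, no shared path can run past the end of a clone, which is the role the `$` and `#` markers play in the figure.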
Statement Similarity Levels
Comparing two statements from two clones:
– Level 1: Corresponding nodes match each other exactly
– Level 2: Corresponding nodes are identical, but can contain allowed parameterized differences, i.e., in MethodInvocation, NumberLiteral, QualifiedName, SimpleName, or StringLiteral nodes
– Level 3: Corresponding nodes are not identical, but both are of types from the Level 2 list, e.g., a MethodInvocation matched with a SimpleName
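The three levels can be made concrete with a small comparison over a simplified AST. The sketch below is hypothetical: its `Node` class (type, label, children) only borrows the node type names of the Eclipse JDT DOM, and it treats any difference inside a node of one of the five listed types as a parameter, which is coarser than a real implementation would likely be. A statement pair takes the loosest level found among its corresponding nodes.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the three statement similarity levels over a simplified AST. */
class SimilarityLevels {
    /** Ordered from tightest to loosest: Level 1, Level 2, Level 3, then no match. */
    enum Level { EXACT, PARAMETERIZED, CROSS_TYPE, NO_MATCH }

    /** Simplified AST node: a type name, a label (identifier, literal, operator, or ""), children. */
    static final class Node {
        final String type;
        final String label;
        final List<Node> children;

        Node(String type, String label, Node... children) {
            this.type = type;
            this.label = label;
            this.children = List.of(children);
        }
    }

    /** Node types whose differences are allowed, as listed for Level 2. */
    static final Set<String> PARAMETERIZABLE = Set.of(
            "MethodInvocation", "NumberLiteral", "QualifiedName", "SimpleName", "StringLiteral");

    static Level compare(Node a, Node b) {
        boolean aParam = PARAMETERIZABLE.contains(a.type);
        boolean bParam = PARAMETERIZABLE.contains(b.type);
        if (!a.type.equals(b.type)) {
            // Level 3 if both differing nodes are parameterizable types, otherwise no match.
            return (aParam && bParam) ? Level.CROSS_TYPE : Level.NO_MATCH;
        }
        if (aParam) {
            // Same parameterizable type: exact if identical, else an allowed difference (Level 2).
            return deepEquals(a, b) ? Level.EXACT : Level.PARAMETERIZED;
        }
        if (!a.label.equals(b.label) || a.children.size() != b.children.size()) {
            return Level.NO_MATCH; // e.g., different operators or statement shapes
        }
        Level loosest = Level.EXACT;
        for (int i = 0; i < a.children.size(); i++) {
            Level child = compare(a.children.get(i), b.children.get(i));
            if (child.compareTo(loosest) > 0) {
                loosest = child;
            }
        }
        return loosest;
    }

    static boolean deepEquals(Node a, Node b) {
        if (!a.type.equals(b.type) || !a.label.equals(b.label) || a.children.size() != b.children.size()) {
            return false;
        }
        for (int i = 0; i < a.children.size(); i++) {
            if (!deepEquals(a.children.get(i), b.children.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Node call = new Node("MethodInvocation", "getValue", new Node("SimpleName", "params"));
        Node a = new Node("Assignment", "=", new Node("SimpleName", "prefix"), call);
        Node b = new Node("Assignment", "=", new Node("SimpleName", "userDefinedLineBreaks"), call);
        Node c = new Node("Assignment", "=", new Node("SimpleName", "prefix"), new Node("SimpleName", "value"));
        System.out.println(compare(a, a)); // EXACT
        System.out.println(compare(a, b)); // PARAMETERIZED
        System.out.println(compare(a, c)); // CROSS_TYPE
    }
}
```

In `main`, the first assignment compared with itself is Level 1, compared with the `userDefinedLineBreaks` assignment is Level 2 (a SimpleName differs), and compared with `prefix = value` is Level 3, because a MethodInvocation is matched with a SimpleName.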
Similarity Levels in Representation
Statement 1 in Clone 1
– Nodes exactly match Statement 1 in Clone 2
(Figure: Clone 2)
Similarity Levels in Representation
Statement 2 in Clone 1
– Nodes do not match Statements 2 and 3 in Clone 2
(Figure: Clone 2)
Similarity Levels in Representation
Statement 3 in Clone 1
– Parameterized nodes, both identical (Level 2) and non-identical (Level 3), matched with Statement 4 in Clone 2
(Figure: Clone 2)
Similarity Levels in Representation
Statement 4 in Clone 1
– Nodes do not match Statement 5 in Clone 2
(Figure: Clone 2)
Example Representations
Sub-groups of clones
– Tighter similarities: Clones 1 and 4 vs. Clones 2 and 3

Clone 1
    for (int i = 0; i < params.length; i++) {
        if (CONTAINS_KEY.equals(params[i].getType())) {
            contains.addElement(params[i].getValue());
        }
    }

Clone 4
    for (int i = 0; i < params.length; i++) {
        if (COMMENTS_KEY.equals(params[i].getType())) {
            comments.addElement(params[i].getValue());
        }
    }

Clone 2
    for (int i = 0; i < params.length; i++) {
        if (PREFIX_KEY.equals(params[i].getName())) {
            prefix = params[i].getValue();
            break;
        }
    }

Clone 3
    for (int i = 0; i < params.length; i++) {
        if (LINE_BREAKS_KEY.equals(params[i].getName())) {
            userDefinedLineBreaks = params[i].getValue();
            break;
        }
    }
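One way to see why these clones fall into two sub-groups is to measure how far apart they are. The sketch below uses token-level Levenshtein distance as a stand-in measure (an assumption; the slides do not say how sub-groups are determined) and pairs each clone with its nearest neighbor. `CloneSubgroups` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: token edit distance as a proxy for how tightly two clones match. */
class CloneSubgroups {
    // Crude tokenizer (an assumption): identifiers/numbers, or any single non-space character.
    private static final Pattern TOKEN = Pattern.compile("\\w+|\\S");

    static List<String> tokenize(String code) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(code);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    /** Levenshtein distance over tokens: insertion, deletion, and substitution each cost 1. */
    static int distance(List<String> a, List<String> b) {
        int[][] d = new int[a.size() + 1][b.size() + 1];
        for (int i = 0; i <= a.size(); i++) {
            d[i][0] = i;
        }
        for (int j = 0; j <= b.size(); j++) {
            d[0][j] = j;
        }
        for (int i = 1; i <= a.size(); i++) {
            for (int j = 1; j <= b.size(); j++) {
                int sub = a.get(i - 1).equals(b.get(j - 1)) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), d[i - 1][j - 1] + sub);
            }
        }
        return d[a.size()][b.size()];
    }

    /** Index of the clone closest to clones.get(k); ties keep the lower index. */
    static int nearest(List<String> clones, int k) {
        List<String> target = tokenize(clones.get(k));
        int best = -1;
        int bestDist = Integer.MAX_VALUE;
        for (int i = 0; i < clones.size(); i++) {
            if (i == k) {
                continue;
            }
            int dist = distance(target, tokenize(clones.get(i)));
            if (dist < bestDist) {
                best = i;
                bestDist = dist;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Clones 1 to 4 from the slide, in numeric order.
        List<String> clones = List.of(
                "for (int i = 0; i < params.length; i++) { if (CONTAINS_KEY.equals(params[i].getType())) { contains.addElement(params[i].getValue()); } }",
                "for (int i = 0; i < params.length; i++) { if (PREFIX_KEY.equals(params[i].getName())) { prefix = params[i].getValue(); break; } }",
                "for (int i = 0; i < params.length; i++) { if (LINE_BREAKS_KEY.equals(params[i].getName())) { userDefinedLineBreaks = params[i].getValue(); break; } }",
                "for (int i = 0; i < params.length; i++) { if (COMMENTS_KEY.equals(params[i].getType())) { comments.addElement(params[i].getValue()); } }");
        for (int k = 0; k < clones.size(); k++) {
            System.out.println("Clone " + (k + 1) + " -> Clone " + (nearest(clones, k) + 1));
        }
    }
}
```

Under this measure, Clones 1 and 4 differ in only two tokens (the key constant and the collection name), as do Clones 2 and 3, while every cross pair differs in at least four, so the nearest-neighbor pairs come out as 1 with 4 and 2 with 3.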
Clone Properties Based on Visualizations
– Quick summary of clone differences in large clones
– Quick summary of neighboring clones
– Identifying an “oddball” clone
Evaluation: Fully Representing Clones
Considers the number of clone groups (#CG) that can be appropriately represented
– Evaluated on multiple open-source Java projects

Project          #CG   Exact (%)   Param (%)   StmtDiff (%)   Mixed (%)
Apache Ant       …     … (14%)     152 (35%)   131 (31%)      85 (20%)
ArgoUML          …     … (9%)      214 (33%)   124 (19%)      251 (39%)
Jakarta-JMeter   …     … (20%)     158 (42%)   71 (19%)       …
JBoss AOP        …     … (32%)     81 (51%)    14 (9%)        13 (8%)
JFreeChart       …     … (18%)     415 (49%)   168 (20%)      113 (13%)
JRuby            …     … (36%)     70 (22%)    63 (20%)       72 (23%)
EMF              …     … (19%)     136 (48%)   52 (18%)       42 (15%)
JEdit            …     … (26%)     120 (35%)   88 (26%)       46 (13%)
Squirrel-SQL     …     … (18%)     164 (38%)   70 (16%)       116 (27%)
Evaluation: Fully Representing Clones
– Exact: clones that match each other exactly
Evaluation: Fully Representing Clones
– Param: groups with parameterized differences
– The largest category in every project except ArgoUML and JRuby
– In four projects, almost half of the groups
Evaluation: Fully Representing Clones
– StmtDiff: groups with statement differences
– Mixed: groups containing both Param and StmtDiff differences
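The four categories can be stated as a small decision rule over per-statement results. In this hypothetical sketch, each statement of a group has already been labeled exact, parameterized, or non-matching (Level 2 and Level 3 matches are both assumed to count as parameterized), and the group's category follows from which labels occur. `percentages` shows how per-project shares like those in the table could be derived from category counts; all names are assumptions.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: classify a clone group into Exact, Param, StmtDiff, or Mixed. */
class GroupCategories {
    // Per-statement outcome; Level 2 and Level 3 matches are both assumed to map to PARAMETERIZED.
    enum StatementResult { EXACT, PARAMETERIZED, NON_MATCHING }

    enum Category { EXACT, PARAM, STMT_DIFF, MIXED }

    static Category classify(List<StatementResult> statements) {
        boolean hasParam = statements.contains(StatementResult.PARAMETERIZED);
        boolean hasStmtDiff = statements.contains(StatementResult.NON_MATCHING);
        if (hasParam && hasStmtDiff) {
            return Category.MIXED;     // both kinds of difference
        }
        if (hasStmtDiff) {
            return Category.STMT_DIFF; // statement differences only
        }
        if (hasParam) {
            return Category.PARAM;     // parameterized differences only
        }
        return Category.EXACT;         // every statement matches exactly
    }

    /** Whole-number percentage of groups per category, each share rounded independently. */
    static Map<Category, Long> percentages(List<List<StatementResult>> groups) {
        Map<Category, Integer> counts = new EnumMap<>(Category.class);
        for (List<StatementResult> group : groups) {
            counts.merge(classify(group), 1, Integer::sum);
        }
        Map<Category, Long> shares = new EnumMap<>(Category.class);
        for (Category c : Category.values()) {
            shares.put(c, Math.round(100.0 * counts.getOrDefault(c, 0) / groups.size()));
        }
        return shares;
    }

    public static void main(String[] args) {
        List<List<StatementResult>> groups = List.of(
                List.of(StatementResult.EXACT, StatementResult.EXACT),
                List.of(StatementResult.EXACT, StatementResult.PARAMETERIZED),
                List.of(StatementResult.PARAMETERIZED, StatementResult.NON_MATCHING));
        // {EXACT=33, PARAM=33, STMT_DIFF=0, MIXED=33}: the rounded shares sum to 99
        System.out.println(percentages(groups));
    }
}
```

Rounding each share independently, as here, can make a row total 99% or 101% even when the underlying counts are complete.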
Summary
Clone group representation localized on just one clone instance directly in the source editor
– Provides a quick summary of the clone properties
– No need to open every occurrence of each clone
– Evaluated artifacts show that > 50% of groups are fully represented
– Remaining groups with unsupported parameterized elements are still identified appropriately
Future work
– Recognize more parameterized elements to reduce the number of non-matched statements
– Determine the extent to which the representation can be used before it becomes cluttered and less useful
Key Need Emerging from this Work
– Many clone detection and analysis tools present results in a proprietary format
– This makes it very challenging to chain tools together and to build new functionality
– There is a strong need for a common standard format for representing the results of clone tools
Questions:
– Why is such a standard not emerging as a widely recognized need in this community?
– What is needed to get tool providers to consider efforts like the Rich Clone Format (RCF)?
Thank You
– Personal:
– Code Clones Literature:
– Code Clones Literature (RSS Feed):