
Scalable Clone Detection and Elimination for Erlang Programs Huiqing Li, Simon Thompson University of Kent Canterbury, UK.


1 Scalable Clone Detection and Elimination for Erlang Programs Huiqing Li, Simon Thompson University of Kent Canterbury, UK

2 Overview Erlang Wrangler Clone detection Clone elimination Case studies Conclusions and future work

3 Erlang Weakly typed functional programming language. Built-in support for concurrency, distribution and fault-tolerance. Some eccentricities: multiple binding occurrences, bound variables in patterns, multiple usages of atoms, side-effects, ...
    % Factorial in Erlang.
    -module(fac).
    -export([fac/1]).
    fac(0) -> 1;
    fac(N) when N > 0 -> N * fac(N-1).

4 Wrangler Basic refactorings: structural, macro, process and test-framework related Clone detection + removal Improve module structure



7 Clone Detection

8 The Wrangler clone detector –Report clone classes whose members are identical or similar –No false positives –High recall rate –Scalable.

9 What is ‘identical’ code? X+4 and Y+5 are both instances of variable + number. Two pieces of code are identical if they are the same once the values of literals and the names of variables are ignored, while the binding structure is respected.
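The notion of ‘identical’ above can be sketched in a few lines of Erlang. This is only an illustration, not Wrangler's implementation: the toy expression type ({var, Name}, {num, Value}, {op, Operator, Left, Right}) and the names normalise/1 and identical/2 are assumptions made for this example.

```erlang
-module(identical_sketch).
-export([normalise/1, identical/2]).

%% Toy expression type (an assumption for this sketch):
%%   {var, Name} | {num, Integer} | {op, Operator, Left, Right}

%% Replace every literal with a placeholder and rename variables by
%% order of first occurrence, so that binding structure is preserved.
normalise(Expr) ->
    {Norm, _Env} = norm(Expr, #{}),
    Norm.

norm({num, _Value}, Env) ->
    {{num, '_'}, Env};
norm({var, Name}, Env) ->
    case maps:find(Name, Env) of
        {ok, Index} ->
            %% Seen before: reuse the same index.
            {{var, Index}, Env};
        error ->
            %% First occurrence: allocate the next index.
            Index = maps:size(Env) + 1,
            {{var, Index}, Env#{Name => Index}}
    end;
norm({op, Op, Left, Right}, Env0) ->
    {NormLeft, Env1} = norm(Left, Env0),
    {NormRight, Env2} = norm(Right, Env1),
    {{op, Op, NormLeft, NormRight}, Env2}.

%% Two expressions are 'identical' if their normal forms coincide.
identical(Expr1, Expr2) ->
    normalise(Expr1) =:= normalise(Expr2).
```

With this sketch X+4 and Y+5 compare as identical, whereas X+X and X+Y do not, because the second pair differs in binding structure.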

10 What is ‘similar’ code? Example: (X+3)+4 and 4+(5-(3*X)) anti-unify to X+Y. The anti-unification gives the (most specific) common generalisation. Similarity = min( ||X+Y|| / ||(X+3)+4||, ||X+Y|| / ||4+(5-(3*X))|| )
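A minimal sketch of anti-unification and the similarity score on the same example follows. It is an illustration under stated assumptions, not the tool's code: expressions use a toy representation ({var, Name}, {num, Value}, {op, Operator, Left, Right}), the size measure ||E|| is assumed to be a plain node count, and the function names are hypothetical.

```erlang
-module(anti_unify_sketch).
-export([anti_unify/2, node_count/1, similarity/2]).

%% Most specific common generalisation of two toy expressions.
anti_unify(T1, T2) ->
    {Gen, _Subst} = au(T1, T2, #{}),
    Gen.

%% Identical subterms are kept as they are.
au(T, T, Subst) ->
    {T, Subst};
%% Same operator on both sides: generalise the arguments pairwise.
au({op, Op, L1, R1}, {op, Op, L2, R2}, Subst0) ->
    {L, Subst1} = au(L1, L2, Subst0),
    {R, Subst2} = au(R1, R2, Subst1),
    {{op, Op, L, R}, Subst2};
%% Mismatch: introduce a fresh variable, reusing it when the same
%% pair of subterms has been generalised before.
au(T1, T2, Subst) ->
    case maps:find({T1, T2}, Subst) of
        {ok, Var} ->
            {Var, Subst};
        error ->
            Var = {var, {gen, maps:size(Subst) + 1}},
            {Var, Subst#{{T1, T2} => Var}}
    end.

%% Assumed size measure: number of nodes in the expression tree.
node_count({var, _}) -> 1;
node_count({num, _}) -> 1;
node_count({op, _, L, R}) -> 1 + node_count(L) + node_count(R).

%% Smallest ratio of generalisation size to original size.
similarity(T1, T2) ->
    Size = node_count(anti_unify(T1, T2)),
    min(Size / node_count(T1), Size / node_count(T2)).
```

For (X+3)+4 and 4+(5-(3*X)) the sketch returns a generalisation of the shape V1+V2. Under the node-count assumption the score is min(3/5, 3/7) = 3/7, which a minimum similarity threshold of 0.8 would reject. The measure Wrangler actually uses may differ.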

11 Clone Detection All clones in a project meeting the threshold parameters. Thresholds: –minimum number of expressions, –minimum number of tokens, –minimum number of duplications, –maximum number of new parameters, and –minimum similarity score.
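The five thresholds can be read as a filter over candidate clone classes. The sketch below assumes that a candidate must satisfy all five thresholds at once and that a candidate is summarised by a map with hypothetical keys (exprs, tokens, dups, new_params, similarity). The example values 1, 40, 2, 4, 0.8 are taken from slide 13, assuming they are listed in the same order as the thresholds above.

```erlang
-module(threshold_sketch).
-export([example_thresholds/0, meets_thresholds/2]).

%% Threshold values from slide 13; the field names are assumptions.
example_thresholds() ->
    #{min_exprs => 1,
      min_tokens => 40,
      min_dups => 2,
      max_new_params => 4,
      min_similarity => 0.8}.

%% True if the candidate clone class satisfies every threshold.
meets_thresholds(Candidate, Thresholds) ->
    #{exprs := Exprs, tokens := Tokens, dups := Dups,
      new_params := NewParams, similarity := Sim} = Candidate,
    #{min_exprs := MinExprs, min_tokens := MinTokens,
      min_dups := MinDups, max_new_params := MaxNewParams,
      min_similarity := MinSim} = Thresholds,
    Exprs >= MinExprs andalso
        Tokens >= MinTokens andalso
        Dups >= MinDups andalso
        NewParams =< MaxNewParams andalso
        Sim >= MinSim.
```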


13 Clone result with threshold values: 1, 40, 2, 4, 0.8:

14 Clone result with threshold values: 3, 20, 2, 2,0.8:

15 Implementation

16 Clone detection in an incremental way. –Initial clone detection. –Incremental clone detection. AST-based two-phase clone detection.

17 The Initial Detection Algorithm
Pipeline: Source Erlang programs → Parse program, annotate and serialise AST → Serialised AAST → Generalise and hash expression → Hashed expression sequences → Clone detection using generalised suffix tree → Initial clone candidates → Examination of clone candidates using anti-unification → Final clones.
–Parse, annotate and serialise: Bypasses the Erlang pre-processor; location information included in AST; static semantic information added to AST; AAST traversed, and expression sequences collected.
–Generalise and hash: Capture structural similarity between expressions while keeping a structural skeleton of the original; replace certain subtrees with a placeholder, but only if sensible to do so. Each expression statement is hashed and mapped to an integer; therefore each expression sequence is mapped to a sequence of integers.
–Examination using anti-unification: Check a candidate clone class for anti-unification, and will return none, one or more clone classes; generation of anti_unifier function; generation of application instances.
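The hashing and candidate-detection steps can be illustrated with a small sketch. It deliberately replaces the generalised suffix tree with a naive grouping of fixed-length windows, so it shows the idea rather than the scalable algorithm. It also assumes the expressions have already been generalised; the module and function names are hypothetical.

```erlang
-module(hash_sketch).
-export([hash_sequence/1, repeated_windows/2]).

%% Map each (already generalised) expression to an integer.
hash_sequence(Exprs) ->
    [erlang:phash2(Expr) || Expr <- Exprs].

%% Naive stand-in for the suffix tree: collect every window of Len
%% consecutive hashes and return the start positions (1-based) of
%% each window that occurs more than once.
repeated_windows(Hashes, Len) when Len > 0 ->
    Grouped = lists:foldl(
                fun({Window, Pos}, Acc) ->
                        maps:update_with(Window,
                                         fun(Ps) -> [Pos | Ps] end,
                                         [Pos], Acc)
                end, #{}, windows(Hashes, Len, 1)),
    [lists:reverse(Ps) || {_Window, Ps} <- maps:to_list(Grouped),
                          length(Ps) > 1].

%% All windows of length Len, each paired with its start position.
windows(Hashes, Len, Pos) when length(Hashes) >= Len ->
    [{lists:sublist(Hashes, Len), Pos} | windows(tl(Hashes), Len, Pos + 1)];
windows(_Hashes, _Len, _Pos) ->
    [].
```

Each group of positions is only an initial clone candidate. In the pipeline described above, candidates are still examined by anti-unification before being reported.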

18 The Initial Detection Algorithm Designed with incremental clone detection in mind. –Use relative locations, every function starts from location {1, 1}; –Intermediate information cached: AAST, Static semantic information, hash information, clone table.

19 The Incremental Detection Algorithm Follow the same steps as the initial detection algorithm, but reuse and incrementally update the information cached from the previous run of the clone detection. Take a function, instead of a file, as a unit to track changes. Track the change of clones, mark each clone class as new, unchanged, change+, changed-, or change+-.
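Tracking changes per function can be sketched as a comparison of cached fingerprints. In the sketch below each function is identified by {Module, Name, Arity} and fingerprinted with erlang:phash2/1 over a term standing in for its annotated AST. The data layout and the function names are assumptions, not Wrangler's internals.

```erlang
-module(incremental_sketch).
-export([fingerprint/1, diff/2]).

%% Funs is a list of {{Module, Name, Arity}, Body}, where Body is any
%% term standing in for the function's annotated AST.
fingerprint(Funs) ->
    maps:from_list([{MFA, erlang:phash2(Body)} || {MFA, Body} <- Funs]).

%% Compare the cached fingerprints (Old) with those of the current
%% run (New); only the functions reported here need reprocessing.
diff(Old, New) ->
    Added = [K || K <- maps:keys(New), not maps:is_key(K, Old)],
    Removed = [K || K <- maps:keys(Old), not maps:is_key(K, New)],
    Changed = [K || {K, Hash} <- maps:to_list(New),
                    maps:is_key(K, Old),
                    maps:get(K, Old) =/= Hash],
    #{added => lists:sort(Added),
      removed => lists:sort(Removed),
      changed => lists:sort(Changed)}.
```

Because every function starts from the relative location {1, 1}, a function that merely moves within its file keeps the same annotated AST and so would keep the same fingerprint in a scheme like this.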


21 Clone Elimination Fully automatic clone elimination not desirable in practice. –Choice of clones to remove. –functionality of the clone needs to be examined. –the anti-unification function of a clone class, and its parameters need to be renamed. –A host module for the anti-unification function needs to be selected.

22 Clone Elimination with Wrangler Copy and paste the anti_unification function to a proper Erlang module. Modify the anti_unification function if necessary. Rename the function. Rename variables. Re-order function parameters. Apply ‘fold expressions against a function definition’ to the new function.
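The effect of these steps can be shown with a small before-and-after example. The module and all function names are invented for illustration, and the sketch does not reproduce Wrangler's generated output.

```erlang
-module(clone_elim_sketch).
-export([total_price_before/1, total_weight_before/1,
         total_price_after/1, total_weight_after/1]).

%% Before: two clones that differ only in the atom used as map key.
total_price_before(Items) ->
    Values = [maps:get(price, Item) || Item <- Items],
    lists:sum(Values).

total_weight_before(Items) ->
    Values = [maps:get(weight, Item) || Item <- Items],
    lists:sum(Values).

%% The generalisation of the two clones, after the manual steps:
%% the function and its new parameter have been given meaningful names.
sum_field(Field, Items) ->
    Values = [maps:get(Field, Item) || Item <- Items],
    lists:sum(Values).

%% After: each clone instance is folded into a call to sum_field/2.
total_price_after(Items) -> sum_field(price, Items).
total_weight_after(Items) -> sum_field(weight, Items).
```

The before and after versions compute the same results. Only the duplication is removed, and the choice of names and parameter order stays with the developer.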

23 Case Study 1

24 Incremental vs. Standalone Clone Detection

25 Case Study 2

26 SIP case study Session Initiation Protocol SIP message processing allows rewriting rules to transform messages. SIP message manipulation (SMM) is tested by smm_SUITE.erl, 2658 LOC.

27 Clone detection


29 Reducing the case study
Step  LOC    Step  LOC    Step  LOC
1     2658   6     2218   11    2131
2     2342   7     2203   12    2097
3     2231   8     2201   13    2042
4     2217   9     2183   …     …
5     2216   10    2149

30 Case Study 3


32 Conclusions Efficient clone detection on medium-sized projects. Possible to improve code using these techniques, but only with expert involvement. A mechanism for clone detection to contribute to the daily reports from incremental nightly builds; case-study for this with LambdaStream.

33 Future Work To extend the tool to detect expression sequences which are similar up to insertion, or deletion of some expressions. To check client code against libraries.

34 Thank you!
