Scalable Clone Detection and Elimination for Erlang Programs Huiqing Li, Simon Thompson University of Kent Canterbury, UK.


Overview
– Erlang
– Wrangler
– Clone detection
– Clone elimination
– Case studies
– Conclusions and future work

Erlang
– Weakly typed functional programming language.
– Built-in support for concurrency, distribution and fault-tolerance.
– Some eccentricities: multiple binding occurrences, bound variables in patterns, multiple usages of atoms, side-effects, ...

    % Factorial in Erlang.
    -module(fac).
    -export([fac/1]).

    fac(0) -> 1;
    fac(N) when N > 0 -> N * fac(N-1).

Wrangler
– Basic refactorings: structural, macro, process and test-framework related
– Clone detection + removal
– Improve module structure

Clone Detection

The Wrangler clone detector
– Report clone classes whose members are identical or similar
– No false positives
– High recall rate
– Scalable

What is ‘identical’ code?
Example: X+4 and Y+5 both have the shape variable + number.
Identical if values of literals and variables are ignored, but respecting binding structure.
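One way to read "identical" is as equality after normalisation. The sketch below is only an illustration under stated assumptions: the toy term shapes `{var, Name}`, `{num, N}` and `{op, Op, L, R}` and the module name `identical_sketch` are hypothetical and are not Wrangler's AST or API. Literals are replaced by a placeholder, and variables are numbered by first occurrence so that the binding structure is kept while the names are ignored.

```erlang
-module(identical_sketch).
-export([normalise/1, identical/2]).

%% Toy expression terms (an assumption for illustration, not Wrangler's AST):
%%   {var, Name} | {num, Integer} | {op, Operator, Left, Right}

%% Replace literals by a placeholder and number variables by order of first
%% occurrence, so binding structure survives while concrete names do not.
normalise(Expr) ->
    {Norm, _Env} = norm(Expr, #{}),
    Norm.

norm({num, _}, Env) ->
    {{num, '_'}, Env};
norm({var, Name}, Env) ->
    case maps:find(Name, Env) of
        {ok, Index} ->
            {{var, Index}, Env};
        error ->
            Index = maps:size(Env) + 1,
            {{var, Index}, maps:put(Name, Index, Env)}
    end;
norm({op, Op, L, R}, Env0) ->
    {NL, Env1} = norm(L, Env0),
    {NR, Env2} = norm(R, Env1),
    {{op, Op, NL, NR}, Env2}.

%% Two expressions count as identical when their normal forms are equal.
identical(E1, E2) ->
    normalise(E1) =:= normalise(E2).
```

Under this reading X+4 and Y+5 are identical, while X+X and X+Y are not, because the second pair differs in which occurrences refer to the same variable.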

What is ‘similar’ code?
Example: (X+3)+4 and 4+(5-(3*X)) have the anti-unifier X+Y.
The anti-unification gives the (most specific) common generalisation.

$$\text{Similarity} = \min\left( \frac{\|X+Y\|}{\|(X+3)+4\|},\ \frac{\|X+Y\|}{\|4+(5-(3*X))\|} \right)$$
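The anti-unifier and the similarity score can be sketched on the same kind of toy terms. Everything named here is an assumption made for illustration: the term shapes, the module `antiunify_sketch`, the `{gen, N}` generalisation variables, and in particular the size measure, which simply counts nodes because the slide does not say how the size of an expression is measured.

```erlang
-module(antiunify_sketch).
-export([anti_unify/2, expr_size/1, similarity/2]).

%% Toy expression terms (hypothetical, not Wrangler's AST):
%%   {var, Name} | {num, Integer} | {op, Operator, Left, Right}

%% Most specific common generalisation of two expressions. Mismatching
%% subterms become generalisation variables {gen, N}; the same pair of
%% subterms is always mapped to the same variable.
anti_unify(E1, E2) ->
    {Gen, _Seen} = au(E1, E2, #{}),
    Gen.

au(E, E, Seen) ->
    {E, Seen};
au({op, Op, L1, R1}, {op, Op, L2, R2}, Seen0) ->
    {L, Seen1} = au(L1, L2, Seen0),
    {R, Seen2} = au(R1, R2, Seen1),
    {{op, Op, L, R}, Seen2};
au(E1, E2, Seen) ->
    case maps:find({E1, E2}, Seen) of
        {ok, Var} ->
            {Var, Seen};
        error ->
            Var = {gen, maps:size(Seen) + 1},
            {Var, maps:put({E1, E2}, Var, Seen)}
    end.

%% Size measure: number of nodes in the term (an assumption).
expr_size({op, _Op, L, R}) -> 1 + expr_size(L) + expr_size(R);
expr_size(_Leaf) -> 1.

%% Smallest ratio of anti-unifier size to the size of either expression.
similarity(E1, E2) ->
    Size = expr_size(anti_unify(E1, E2)),
    min(Size / expr_size(E1), Size / expr_size(E2)).
```

For the slide's example, both arguments of the outer `+` differ, so the anti-unifier is an addition of two generalisation variables, which matches X+Y. The numeric score depends entirely on the chosen size measure.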

Clone Detection
All clones in a project meeting the threshold parameters.
Thresholds:
– minimum number of expressions,
– minimum number of tokens,
– minimum number of duplications,
– maximum number of new parameters, and
– minimum similarity score.
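The five thresholds act as a filter over candidate clone classes. A minimal sketch of such a filter follows; the map keys, the module name `threshold_sketch` and the shape of a clone-class summary are all hypothetical. The example values 1, 40, 2, 4, 0.8 are taken from a later slide, on the assumption that they are listed in the same order as the thresholds.

```erlang
-module(threshold_sketch).
-export([example_thresholds/0, meets/2]).

%% Example values, assuming the order: expressions, tokens, duplications,
%% new parameters, similarity score.
example_thresholds() ->
    #{min_exprs => 1, min_tokens => 40, min_dups => 2,
      max_new_params => 4, min_similarity => 0.8}.

%% A clone class summary (hypothetical shape) passes only if it satisfies
%% every threshold at once.
meets(#{exprs := E, tokens := T, dups := D, new_params := P, similarity := S},
      #{min_exprs := MinE, min_tokens := MinT, min_dups := MinD,
        max_new_params := MaxP, min_similarity := MinS}) ->
    E >= MinE andalso T >= MinT andalso D >= MinD
        andalso P =< MaxP andalso S >= MinS.
```

Note that four of the thresholds are lower bounds, while the number of new parameters is an upper bound: a generalisation that needs too many parameters is unlikely to be a useful abstraction.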

Clone result with threshold values: 1, 40, 2, 4, 0.8:

Clone result with threshold values: 3, 20, 2, 2, 0.8:

Implementation

Clone detection in an incremental way.
– Initial clone detection.
– Incremental clone detection.
AST-based two-phase clone detection.

The Initial Detection Algorithm
Pipeline: source Erlang programs -> serialised AAST -> hashed expression sequences -> initial clone candidates -> final clones.
– Parse program, annotate and serialise AST: bypasses the Erlang pre-processor; location information included in AST; static semantic information added to AST; AAST traversed, and expression sequences collected.
– Generalise and hash expression: capture structural similarity between expressions while keeping a structural skeleton of the original; replace certain subtrees with a placeholder, but only if sensible to do so. Each expression statement is hashed and mapped to an integer; therefore each expression sequence is mapped to a sequence of integers.
– Clone detection using generalised suffix tree: produces the initial clone candidates from the hashed expression sequences.
– Examination of clone candidates using anti-unification: check a candidate clone class for anti-unification, which returns none, one or more clone classes; generation of anti_unifier function; generation of application instances.
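The hashing and candidate-finding steps can be illustrated with a deliberately simpler technique than the generalised suffix tree named on the slide: a fixed-length sliding window over the hashed sequence, grouped by content. This is a stand-in for illustration only. The module name `candidates_sketch` is hypothetical, and `erlang:phash2/1` is used merely as a convenient hash for arbitrary terms.

```erlang
-module(candidates_sketch).
-export([hash_seq/1, candidates/2]).

%% Map each (already generalised) expression to an integer, so that an
%% expression sequence becomes a sequence of integers.
hash_seq(Exprs) ->
    [erlang:phash2(E) || E <- Exprs].

%% All windows of length Len, paired with their 1-based start position.
windows(Hashes, Len) ->
    windows(Hashes, Len, 1, []).

windows(Hashes, Len, Pos, Acc) when Len > 0, length(Hashes) >= Len ->
    {Win, _Rest} = lists:split(Len, Hashes),
    windows(tl(Hashes), Len, Pos + 1, [{Win, Pos} | Acc]);
windows(_Hashes, _Len, _Pos, Acc) ->
    lists:reverse(Acc).

%% Group start positions by window content and keep the windows that occur
%% at least twice. Each result is a list of start positions.
candidates(Hashes, Len) ->
    Groups = lists:foldl(
               fun({Win, Pos}, Acc) ->
                       maps:update_with(Win, fun(Ps) -> [Pos | Ps] end,
                                        [Pos], Acc)
               end, #{}, windows(Hashes, Len)),
    lists:sort([lists:reverse(Ps) || Ps <- maps:values(Groups),
                                     length(Ps) >= 2]).
```

Unlike a suffix tree, this sketch only finds repeats of one fixed length and may report overlapping occurrences. It is meant to show why hashing generalised expressions to integers turns clone candidate search into a repeated-substring problem, and why a later anti-unification pass is still needed to confirm each candidate.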

The Initial Detection Algorithm
Designed with incremental clone detection in mind.
– Use relative locations: every function starts from location {1, 1}.
– Intermediate information cached: AAST, static semantic information, hash information, clone table.

The Incremental Detection Algorithm Follow the same steps as the initial detection algorithm, but reuse and incrementally update the information cached from the previous run of the clone detection. Take a function, instead of a file, as a unit to track changes. Track the change of clones, mark each clone class as new, unchanged, change+, changed-, or change+-.
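Tracking changes per function rather than per file can be sketched as a comparison of cached fingerprints. The representation below is an assumption for illustration: each function is a `{{Name, Arity}, Body}` pair, `Body` is any term standing in for the function's annotated AST, and the module name `incremental_sketch` is hypothetical.

```erlang
-module(incremental_sketch).
-export([fingerprints/1, changed/2]).

%% Funs is a list of {{Name, Arity}, Body} pairs (hypothetical shape).
%% The result maps each function to a hash of its body, which can be
%% cached between runs.
fingerprints(Funs) ->
    maps:from_list([{Key, erlang:phash2(Body)} || {Key, Body} <- Funs]).

%% Functions that are new, or whose fingerprint differs from the cached
%% one. Only these need to be re-analysed in an incremental run.
changed(Old, New) ->
    lists:sort([Key || {Key, Hash} <- maps:to_list(New),
                       maps:get(Key, Old, undefined) =/= Hash]).
```

Because locations are relative to the start of each function, moving an unchanged function within its file would leave its fingerprint intact under this scheme. The sketch does not handle deleted functions, which a real implementation would also need in order to drop stale clone table entries.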

Clone Elimination
Fully automatic clone elimination is not desirable in practice.
– A choice must be made about which clones to remove.
– The functionality of the clone needs to be examined.
– The anti-unification function of a clone class, and its parameters, need to be renamed.
– A host module for the anti-unification function needs to be selected.

Clone Elimination with Wrangler
– Copy and paste the anti_unification function to a proper Erlang module.
– Modify the anti_unification function if necessary.
– Rename the function.
– Rename variables.
– Re-order function parameters.
– Apply ‘fold expressions against a function definition’ to the new function.
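The effect of these steps can be shown on a small hand-written example. The functions below are invented for illustration and do not come from the case studies. The "before" functions are two members of one clone class, `scaled_sum/2` plays the role of the renamed anti-unification function, and the "after" functions are what folding against that definition would leave behind.

```erlang
-module(elimination_sketch).
-export([price_before/1, weight_before/1, price_after/1, weight_after/1]).

%% Before: two members of one clone class, differing only in a literal.
price_before(Items) ->
    Values = [V || {_Name, V} <- Items],
    lists:sum(Values) * 2.

weight_before(Items) ->
    Values = [V || {_Name, V} <- Items],
    lists:sum(Values) * 3.

%% The generalisation after manual renaming: the differing literal has
%% become the parameter Factor.
scaled_sum(Items, Factor) ->
    Values = [V || {_Name, V} <- Items],
    lists:sum(Values) * Factor.

%% After folding: each clone instance is a call to the new function.
price_after(Items) -> scaled_sum(Items, 2).
weight_after(Items) -> scaled_sum(Items, 3).
```

The name `scaled_sum` and the parameter name `Factor` are the kind of choice a tool cannot make well on its own, which is one reason the renaming steps are left to the user.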

Case Study 1

Incremental vs. Standalone Clone Detection

Case Study 2

SIP case study
– SIP: Session Initiation Protocol.
– SIP message processing allows rewriting rules to transform messages.
– SIP message manipulation (SMM) is tested by smm_SUITE.erl, 2658 LOC.

Clone detection

Reducing the case study Step ……

Case Study 3

Conclusions Efficient clone detection on medium-sized projects. Possible to improve code using these techniques, but only with expert involvement. A mechanism for clone detection to contribute to the daily reports from incremental nightly builds; case-study for this with LambdaStream.

Future Work To extend the tool to detect expression sequences which are similar up to insertion, or deletion of some expressions. To check client code against libraries.

Thank you!