CODE. Using Wrangler to refactor Erlang programs and tests Simon Thompson, Huiqing Li Adam Lindberg, Andreas Schumacher University of Kent, Erlang Solutions,

Slides:



Advertisements
Similar presentations
Duplicate code detection using Clone Digger Peter Bulychev Lomonosov Moscow State University CS department.
Advertisements

Symbol Table.
Tool Support for Refactoring Functional Programs Huiqing Li Claus Reinke Simon Thompson Computing Lab, University of Kent
Clean code. Motivation Total cost = the cost of developing + maintenance cost Maintenance cost = cost of understanding + cost of changes + cost of testing.
The Assembly Language Level
What we learn with pleasure we never forget. Alfred Mercier Smitha N Pai.
Reverse Engineering © SERG Code Cloning: Detection, Classification, and Refactoring.
Refactoring Erlang Programs Huiqing Li Simon Thompson University of Kent.
Improving your (test) code with Wrangler Huiqing Li, Simon Thompson University of Kent Andreas Schumacher Ericsson Software Research Adam Lindberg Erlang.
Property-based Testing – ProTest FP7 Strep John Derrick University of Sheffield, and members of the ProTest team.
Xyleme A Dynamic Warehouse for XML Data of the Web.
1 Programming for Engineers in Python Autumn Lecture 5: Object Oriented Programming.
Software Connectors. Attach adapter to A Maintain multiple versions of A or B Make B multilingual Role and Challenge of Software Connectors Change A’s.
Tutorial 6 & 7 Symbol Table
Refactoring Haskell Programs Huiqing Li Computing Lab, University of Kent
WRT 2007 Refactoring Functional Programs Huiqing Li Simon Thompson Computing Lab Chris Brown Claus Reinke University of Kent.
Architectural Design Principles. Outline  Architectural level of design The design of the system in terms of components and connectors and their arrangements.
Generative Programming. Generic vs Generative Generic Programming focuses on representing families of domain concepts Generic Programming focuses on representing.
1.3 Executing Programs. How is Computer Code Transformed into an Executable? Interpreters Compilers Hybrid systems.
CSC 8310 Programming Languages Meeting 2 September 2/3, 2014.
Genetic Programming.
REFACTORING Lecture 4. Definition Refactoring is a process of changing the internal structure of the program, not affecting its external behavior and.
INTRODUCTION TO PROGRAMMING STRUCTURE Chapter 4 1.
Hands-on Refactoring with Wrangler Simon Thompson Huiqing Li, Xingdong Bian University of Kent.
Rendering Contexts and Components What is a uPortal3 context ? –Defines all aspects of a traditional portal instance Design, navigation, profiles Parameter.
DIY Refactorings in Wrangler Huiqing Li Simon Thompson School of Computing University of Kent.
Software Refactoring Part I: Introduction Bartosz Walter Advanced Object-Oriented Design Lecture 5.
Mining and Analysis of Control Structure Variant Clones Guo Qiao.
Reviewing Recent ICSE Proceedings For:.  Defining and Continuous Checking of Structural Program Dependencies  Automatic Inference of Structural Changes.
CSC-682 Cryptography & Computer Security Sound and Precise Analysis of Web Applications for Injection Vulnerabilities Pompi Rotaru Based on an article.
RELATIONAL FAULT TOLERANT INTERFACE TO HETEROGENEOUS DISTRIBUTED DATABASES Prof. Osama Abulnaja Afraa Khalifah
CSE 425: Object-Oriented Programming I Object-Oriented Programming A design method as well as a programming paradigm –For example, CRC cards, noun-verb.
Cross Language Clone Analysis Team 2 October 27, 2010.
ISV Innovation Presented by ISV Innovation Presented by Business Intelligence Fundamentals: Data Cleansing Ola Ekdahl IT Mentors 9/12/08.
Supported by ELTE IKKK, Ericsson Hungary, in cooperation with University of Kent Erlang refactoring with relational database Anikó Víg and Tamás Nagy Supervisors:
Generative Programming. Automated Assembly Lines.
Supported by ELTE IKKK, Ericsson Hungary, in cooperation with University of Kent Erlang refactoring with relational database Anikó Víg and Tamás Nagy Supervisors:
ADTs and C++ Classes Classes and Members Constructors The header file and the implementation file Classes and Parameters Operator Overloading.
Built-in Data Structures in Python An Introduction.
Refactoring Deciding what to make a superclass or interface is difficult. Some of these refactorings are helpful. Some research items include Inheritance.
Refactoring Erlang Programs Huiqing Li Simon Thompson University of Kent Zoltán Horváth Eötvös Loránd Univ.
Efficient RDF Storage and Retrieval in Jena2 Written by: Kevin Wilkinson, Craig Sayers, Harumi Kuno, Dave Reynolds Presented by: Umer Fareed 파리드.
Refactoring. Refactoring Overview  What is refactoring?  What are four good reasons to refactor?  When should you refactor?  What is a bad smell (relative.
CS536 Semantic Analysis Introduction with Emphasis on Name Analysis 1.
Chapter 1 Introduction Major Data Structures in Compiler
Getting the right module structure: using Wrangler to fix your projects Simon Thompson, Huiqing Li School of Computing, University of Kent, UK.
 2003 Prentice Hall, Inc. All rights reserved. 1 IS 0020 Program Design and Software Tools Preprocessor Midterm Review Lecture 7 Feb 17, 2004.
1 Measuring Similarity of Large Software System Based on Source Code Correspondence Tetsuo Yamamoto*, Makoto Matsushita**, Toshihiro Kamiya***, Katsuro.
XP New Perspectives on XML, 2 nd Edition Tutorial 7 1 TUTORIAL 7 CREATING A COMPUTATIONAL STYLESHEET.
Scalable Clone Detection and Elimination for Erlang Programs Huiqing Li, Simon Thompson University of Kent Canterbury, UK.
1 Compiler & its Phases Krishan Kumar Asstt. Prof. (CSE) BPRCE, Gohana.
SEG 4110 – Advanced Software Design and Reengineering Topic T Introduction to Refactoring.
 Software Clones:( Definitions from Wikipedia) ◦ Duplicate code: a sequence of source code that occurs more than once, either within a program or across.
Cross Language Clone Analysis Team 2 February 3, 2011.
Refactoring Agile Development Project. Lecture roadmap Refactoring Some issues to address when coding.
Compiler Construction CPCS302 Dr. Manal Abdulaziz.
1 Asstt. Prof Navjot Kaur Computer Dept PRESENTED BY.
Sets and Maps Chapter 9. Chapter Objectives  To understand the Java Map and Set interfaces and how to use them  To learn about hash coding and its use.
WMO GRIB Edition 3 Enrico Fucile Inter-Program Expert Team on Data Representation Maintenance and Monitoring IPET-DRMM Geneva, 30 May – 3 June 2016.
The Data Large Number of Workbooks Each Workbook has multiple worksheets Transaction worksheets have large (LARGE) number of lines (millions of records.
An Algorithm for Detecting and Removing Clones in Java Code Nicolas Juillerat & Béat Hirsbrunner Pervasive and Artificial Intelligence ( ) Research Group,
Introduction to Parsing (adapted from CS 164 at Berkeley)
Compiler Construction (CS-636)
Semantic Analysis with Emphasis on Name Analysis
CBCD: Cloned Buggy Code Detector
Refactoring SENG 301 © Refactoring: Improving the Design of Existing Code, Martin Fowler, Kent Beck Source:
Lexical and Syntax Analysis
High Coverage Detection of Input-Related Security Faults
Lecture 15 (Notes by P. N. Hilfinger and R. Bodik)
Programming Fundamentals (750113) Ch1. Problem Solving
Presentation transcript:

CODE

Using Wrangler to refactor Erlang programs and tests Simon Thompson, Huiqing Li Adam Lindberg, Andreas Schumacher University of Kent, Erlang Solutions, Ericsson

Overview Refactoring Erlang in Wrangler Clone detection and elimination Implementation Case study: SIP message manipulation ProTest project: property-based testing

Introduction

Refactoring Refactoring means changing the design or structure of a program … without changing its behaviour. RefactorModify

Soft-ware There’s no single correct design … … different options for different situations. Maintain flexibility as the system evolves.

Generalisation -module (test). -export([f/1]). add_one ([H|T]) -> [H+1 | add_one(T)]; add_one ([]) -> []. f(X) -> add_one(X). -module (test). -export([f/1]). add_one (N, [H|T]) -> [H+N | add_one(N,T)]; add_one (N,[]) -> []. f(X) -> add_one(1, X). -module (test). -export([f/1]). add_int (N, [H|T]) -> [H+N | add_int(N,T)]; add_int (N,[]) -> []. f(X) -> add_int(1, X). Generalisation and renaming

Generalisation -export([printList/1]). printList([H|T]) -> io:format("~p\n",[H]), printList(T); printList([]) -> true. printList([1,2,3]) -export([printList/2]). printList(F,[H|T]) -> F(H), printList(F, T); printList(F,[]) -> true. printList( fun(H) -> io:format("~p\n", [H]) end, [1,2,3]).

Refactoring tool support Bureaucratic and diffuse. Tedious and error prone. Semantics: scopes, types, modules, … Undo/redo Enhanced creativity

Refactoring = Transformation + Condition Transformation Ensure change at all those points needed. Ensure change at only those points needed. Condition Is the refactoring applicable? Will it preserve the semantics of the module? the program?

Static vs dynamic Aim to check conditions statically. Static analysis tools possible … but some aspects intractable: e.g. dynamically manufactured atoms. Conservative vs liberal. Compensation?

Wrangler Refactoring tool for Erlang Integrated into Emacs and Eclipse / ErlIDE. Multiple modules Structural, process, macro refactorings Duplicate code detection … … and elimination Testing / refactoring "Similar" code identification Property discovery

Architecture of Wrangler

Integration with ErlIDE Tighter control of what makes up a project. Potential for adoption by newcomers to the Erlang community.

Clone detection

Duplicate code considered harmful It’s a bad smell … increases chance of bug propagation, increases size of the code, increases compile time, and, increases the cost of maintenance. But … it’s not always a problem.

Clone detection The Wrangler clone detector –relatively efficient –no false positives User-guided interactive removal of clones. Integrated into development environments.

X+4Y+5X+4X+4Y+5Y+5 What is ‘identical’ code? variable + number Identical if values of literals and variables ignored, but respecting binding structure.

(X+3)+44+(5-(3*X))(X+3)+44+(5-(3*X)) What is ‘similar’ code? X+YX+Y The anti-unification gives the (most specific) common generalisation.

Detection Expression search All instances of expressions similar to this expression … … and their common generalisation. Default threshold: ≥ 20 tokens. All clones in a project meeting the threshold parameters … … and their common generalisations. Default threshold: ≥ 5 expressions and similarity of ≥ 0.8.

Similarity Threshold: anti-unifier should be big enough relative to the class members: similarity = min i=1..n (size(AU)/size(E i )) where AU = anti-unifier(E 1, …,E n ). Can also threshold length of expression sequence, or number of tokens, or ….

Implementation

Parse the program with modified parser to ensure that location information (line, column) is included. This ensures that can map between different program representations. Bypasses the Erlang pre- processor. Parse the program with modified parser to ensure that location information (line, column) is included. This ensures that can map between different program representations. Bypasses the Erlang pre- processor. Parse program

Resolve the use of identifiers to their binding occurrences. Use location information to identify occurrences. Erlang allows a variable to have multiple binding occurrences, e.g. in different arms of a case expression. Resolve the use of identifiers to their binding occurrences. Use location information to identify occurrences. Erlang allows a variable to have multiple binding occurrences, e.g. in different arms of a case expression. Annotate AST

Capture structural similarity between expressions while keeping a structural skeleton of the original. Replace certain subtrees with a placeholder … … but only if sensible to do this, e.g. expressions including funs but not conditionals, patterns, try … catch …, receive, etc. Capture structural similarity between expressions while keeping a structural skeleton of the original. Replace certain subtrees with a placeholder … … but only if sensible to do this, e.g. expressions including funs but not conditionals, patterns, try … catch …, receive, etc. Generalise AST

Example of generalised code foo(X) -> ? = case ? of ? -> ?; ? -> ? end, ?, ?. foo(X) -> Y = case X of one -> 12; Others -> 196 end, X+Y, g(X,Y).

Pretty print generalised sub- expression sequences and then serialise into a single sequence. A delimiter separates each sub- expression sequence. Pretty print generalised sub- expression sequences and then serialise into a single sequence. A delimiter separates each sub- expression sequence. Serialise the AST foo(X, Y) -> A = case X>Y of true -> Z=1, X + Y + Z; false -> Z = 2, X + Y -2 end, A A = case … A Z=1 X + Y + Z -- Z = 2 X + Y -2

Hash each expression, mapping it to an 128 bit value, using non- clashing hash function. Expressions represented by start / end positions in the source code. Hash values stored in indexed table - indexes smaller than hash values. Hash each expression, mapping it to an 128 bit value, using non- clashing hash function. Expressions represented by start / end positions in the source code. Hash values stored in indexed table - indexes smaller than hash values. Hash expressions

Build suffix tree Build a suffix tree from the expression sequence. Clones are given by paths that branch.

Check a clone class for anti- unification. Will return no classes, one class, or multiple sub-classes each with the corresponding anti-unification function. Results depend on the threshold parameters. Check a clone class for anti- unification. Will return no classes, one class, or multiple sub-classes each with the corresponding anti-unification function. Results depend on the threshold parameters. Check clone classes

Example: clone candidate S1 = "This", S2 = " is a ", S3 = "string", [S1,S2,S3] S1 = "This", S2 = "is another ", S3 = "String", [S3,S2,S1] D1 = [1], D2 = [2], D3 = [3], [D1,D2,D3] D1 = [X+1], D2 = [5], D3 = [6], [D3,D2,D1] ? = ?, [?,?,?]

Example: clone from sub-sequence S1 = "This", S2 = " is a ", S3 = "string", [S1,S2,S3] S1 = "This", S2 = "is another ", S3 = "String", [S3,S2,S1] D1 = [1], D2 = [2], D3 = [3], [D1,D2,D3] D1 = [X+1], D2 = [5], D3 = [6], [D3,D2,D1] new_fun(NewVar_1, NewVar_2, NewVar_3) -> S1 = NewVar_1, S2 = NewVar_2, S3 = NewVar_3, {S1,S2,S3}.

Example: sub-clones S1 = "This", S2 = " is a ", S3 = "string", [S1,S2,S3] S1 = "This", S2 = "is another ", S3 = "String", [S3,S2,S1] D1 = [1], D2 = [2], D3 = [3], [D1,D2,D3] D1 = [X+1], D2 = [5], D3 = [6], [D3,D2,D1] new_fun(NewVar_1, NewVar_2, NewVar_3) -> S1 = NewVar_1, S2 = NewVar_2, S3 = NewVar_3, [S1,S2,S3]. new_fun(NewVar_1, NewVar_2, NewVar_3) -> S1 = NewVar_1, S2 = NewVar_2, S3 = NewVar_3, [S3,S2,S1].

Clone classes are reported in two different orders the size of the clone class, and the size of the members of the clone. Together with each class is the anti-unifier, rendered as an Erlang function definition to cut and paste into the program. Clone classes are reported in two different orders the size of the clone class, and the size of the members of the clone. Together with each class is the anti-unifier, rendered as an Erlang function definition to cut and paste into the program. Clone class output

SIP Case Study

Why test code particularly? Many people touch the code. Write some tests … write more by copy, paste and modify. Similarly with long-standing projects, with a large element of legacy code.

“Who you gonna call?” Can reduce by 20% just by aggressively removing all the clones identified … … what results is of no value at all. Need to call in the domain experts.

SIP case study Session Initiation Protocol SIP message manipulation allows rewriting rules to transform messages. Test by smm_SUITE.erl, 2658 LOC.

Reducing the case study ……

Step 1 The largest clone class has 15 members. The suggested function has no parameters, so the code is literally repeated.

Not step 1 The largest clone has 88 lines, and 2 parameters. But what does it represent? What to call it? Best to work bottom up.

The general pattern Identify a clone. Introduce the corresponding generalisation. Eliminate all the clone instances. So what’s the complication?

Step 3 23 line clone occurs; choose to replace a smaller clone. Rename function and parameters, and reorder them. new_fun() -> {FilterKey1, FilterName1, FilterState, FilterKey2, FilterName2} = create_filter_12(), ?OM_CHECK([#smmFilter{key=FilterKey1, filterName=FilterName1, filterState=FilterState, module=undefined}], ?SGC_BS, ets, lookup, [smmFilter, FilterKey1]), ?OM_CHECK([#smmFilter{key=FilterKey2, filterName=FilterName2, filterState=FilterState, module=undefined}], ?SGC_BS, ets, lookup, [smmFilter, FilterKey2]), ?OM_CHECK([#sbgFilterTable{key=FilterKey1, sbgFilterName=FilterName1, sbgFilterState=FilterState}], ?MP_BS, ets, lookup, [sbgFilterTable, FilterKey1]), ?OM_CHECK([#sbgFilterTable{key=FilterKey2, sbgFilterName=FilterName2, sbgFilterState=FilterState}], ?MP_BS, ets, lookup, [sbgFilterTable, FilterKey2]), {FilterName2, FilterKey2, FilterKey1, FilterName1, FilterState}. new_fun() -> {FilterKey1, FilterName1, FilterState, FilterKey2, FilterName2} = create_filter_12(), ?OM_CHECK([#smmFilter{key=FilterKey1, filterName=FilterName1, filterState=FilterState, module=undefined}], ?SGC_BS, ets, lookup, [smmFilter, FilterKey1]), ?OM_CHECK([#smmFilter{key=FilterKey2, filterName=FilterName2, filterState=FilterState, module=undefined}], ?SGC_BS, ets, lookup, [smmFilter, FilterKey2]), ?OM_CHECK([#sbgFilterTable{key=FilterKey1, sbgFilterName=FilterName1, sbgFilterState=FilterState}], ?MP_BS, ets, lookup, [sbgFilterTable, FilterKey1]), ?OM_CHECK([#sbgFilterTable{key=FilterKey2, sbgFilterName=FilterName2, sbgFilterState=FilterState}], ?MP_BS, ets, lookup, [sbgFilterTable, FilterKey2]), {FilterName2, FilterKey2, FilterKey1, FilterName1, FilterState}. check_filter_exists_in_sbgFilterTable(FilterKey, FilterName, FilterState) -> ?OM_CHECK([#sbgFilterTable{key=FilterKey, sbgFilterName=FilterName, sbgFilterState=FilterState}], ?MP_BS, ets, lookup, [sbgFilterTable, FilterKey]). check_filter_exists_in_sbgFilterTable(FilterKey, FilterName, FilterState) -> ?OM_CHECK([#sbgFilterTable{key=FilterKey, sbgFilterName=FilterName, sbgFilterState=FilterState}], ?MP_BS, ets, lookup, [sbgFilterTable, FilterKey]).

Steps 4, 5 2 variants of check_filter_exists_in_sbgFilterTable … Check for the filter occurring uniquely in the table: call to ets:tab2list instead of ets:lookup. Check a different table, replace sbgFilterTable by smmFilter. Don’t generalise: too many parameters, how to name? check_filter_exists_in_sbgFilterTable(FilterKey, FilterName, FilterState) -> ?OM_CHECK([#sbgFilterTable{key=FilterKey, sbgFilterName=FilterName, sbgFilterState=FilterState}], ?MP_BS, ets, lookup, [sbgFilterTable, FilterKey]). check_filter_exists_in_sbgFilterTable(FilterKey, FilterName, FilterState) -> ?OM_CHECK([#sbgFilterTable{key=FilterKey, sbgFilterName=FilterName, sbgFilterState=FilterState}], ?MP_BS, ets, lookup, [sbgFilterTable, FilterKey]).

Symbolic calls to deprecated code: erlang:module_loaded erlang:module_loaded(M) -> true | false code:is_loaded(M) -> {file, Loaded} | false Define new function code_is_loaded : code_is_loaded(BS, ModuleName, Result) -> ?OM_CHECK(Result, BS, erlang, module_loaded,[ModuleName]). Remove all calls using fold against function refactoring. Different checks: ?OM_CHECK vs ?CH_CHECK code_is_loaded(BS, om, ModuleName, false) -> ?OM_CHECK(false, BS, code, is_loaded, [ModuleName]). code_is_loaded(BS, om, ModuleName, true) -> ?OM_CHECK({file, atom_to_list(ModuleName)}, BS, code, is_loaded, [ModuleName]). But the calls to ?OM_CHECK have disappeared at step 6 … … a case of premature generalisation! Need to inline code_is_loaded/3 to be able to use this … Step 7

Step 10 ‘Widows’ and ‘orphans’ in clone identification. Avoid passing commands as parameters? Also at step 11. new_fun(FilterName, NewVar_1) -> FilterKey = ?SMM_CREATE_FILTER_CHECK(FilterName), %Add rulests to filter RuleSetNameA = "a", RuleSetNameB = "b", RuleSetNameC = "c", RuleSetNameD = "d", lines which handle the rules sets are elided... %Remove rulesets NewVar_1, {RuleSetNameA, RuleSetNameB, RuleSetNameC, RuleSetNameD, FilterKey}. new_fun(FilterName, NewVar_1) -> FilterKey = ?SMM_CREATE_FILTER_CHECK(FilterName), %Add rulests to filter RuleSetNameA = "a", RuleSetNameB = "b", RuleSetNameC = "c", RuleSetNameD = "d", lines which handle the rules sets are elided... %Remove rulesets NewVar_1, {RuleSetNameA, RuleSetNameB, RuleSetNameC, RuleSetNameD, FilterKey}. new_fun(FilterName, FilterKey) -> %Add rulests to filter RuleSetNameA = "a", RuleSetNameB = "b", RuleSetNameC = "c", RuleSetNameD = "d", lines which handle the rules sets are elided... %Remove rulesets {RuleSetNameA, RuleSetNameB, RuleSetNameC, RuleSetNameD}. new_fun(FilterName, FilterKey) -> %Add rulests to filter RuleSetNameA = "a", RuleSetNameB = "b", RuleSetNameC = "c", RuleSetNameD = "d", lines which handle the rules sets are elided... %Remove rulesets {RuleSetNameA, RuleSetNameB, RuleSetNameC, RuleSetNameD}.

Steps 14+ Similar code detection (default params): 16 clones, each duplicated once. 193 lines in total: get 145 line reduction. Reduce similarity to 0.5 rather than the default of 0.8: 47 clones. Other refactorings: data etc.

Going further

Property-based testing Property-based testing will deliver more effective tests, more efficiently. Property discovery Test and property evolution Property monitoring Analysing concurrent systems

Property discovery in Wrangler Find (test) code that is similar … … build a common abstraction … accumulate the instances … and generalise the instances. Example: Test code from Ericsson: different media and codecs. Generalisation to all medium/codec combinations.

Systems test: FSM discovery Use +ve and -ve cases.Use FSM to model expected behaviour. Test random paths through the FSM to test system function. Extract the FSM from sets of existing test cases.

Refactoring and testing Respect test code in EUnit, QuickCheck and Common Test … … and refactor tests along with refactoring the code itself. Refactor tests e.g. Tests into EUnit tests. Group EUnit tests into a single test generator. Move EUnit tests into a separate test module. Normalise EUnit tests. Extract setup and tear- down into EUnit fixtures.

Next steps Refine the notion of similarity … … to take account of insert / delete in command seqs. Scaling up: look for incremental version; check vs. libraries … Refactorings of tests and properties themselves. Extracting FSMs from sets of tests. Support property extraction from 'free' and EUnit tests.

Conclusions Efficient clone detection possible on medium-medium sized projects. This supports improved testing … … but only with expert involvement. There's a useful interaction between refactoring and testing.