Tools for Refactoring Functional Programs Simon Thompson with Huiqing Li Claus Reinke
LIL Design Models Prototypes Design documents Visible artifacts
LIL All in the code … Functional programs embody their design in their code. This is enabled by their high-level nature: constructs, types … data Message = Message Head Body data Head = Head Metadata Title data Metadata = Metadata [Tags] type Title = String … data Message = Message Head Body data Head = Head Metadata Title data Metadata = Metadata [Tags] type Title = String …
LIL Evolution Successful systems are long lived … … and evolve continuously. Supporting evolution of code and design?
LIL Soft-Ware There’s no single correct design … … different options for different situations. Maintain flexibility as the system evolves.
LIL Refactoring Refactoring means changing the design or structure of a program … without changing its behaviour. RefactorModify
LIL Not just programming Paper or presentation moving sections about; amalgamate sections; move inline code to a figure; animation; … Proof add lemma; remove, amalgamate hypotheses, … Program the topic of the lecture
LIL Splitting a function in two
LIL Splitting a function in two
LIL Splitting a function in two
LIL Splitting a function module Split where f :: [String] -> String f ys = foldr (++) [] [ y++"\n" | y <- ys ]
LIL Splitting a function module Split where f :: [String] -> String f ys = foldr (++) [] [ y++"\n" | y <- ys ]
LIL Splitting a function module Split where f :: [String] -> String f ys = join [y ++ "\n" | y <- ys] where join = foldr (++) []
LIL Splitting a function module Split where f :: [String] -> String f ys = join [y ++ "\n" | y <- ys] where join = foldr (++) []
LIL Splitting a function module Split where f :: [String] -> String f ys = join addNL where join zs = foldr (++) [] zs addNL = [y ++ "\n" | y <- ys]
LIL Splitting a function module Split where f :: [String] -> String f ys = join addNL where join zs = foldr (++) [] zs addNL = [y ++ "\n" | y <- ys]
LIL Splitting a function module Split where f :: [String] -> String f ys = join (addNL ys) where join zs = foldr (++) [] zs addNL ys = [y ++ "\n" | y <- ys]
LIL Splitting a function module Split where f :: [String] -> String f ys = join (addNL ys) where join zs = foldr (++) [] zs addNL ys = [y ++ "\n" | y <- ys]
LIL Splitting a function module Split where f :: [String] -> String f ys = join (addNL ys) join zs = foldr (++) [] zs addNL ys = [y ++ "\n" | y <- ys]
LIL Overview Example refactorings: what they involve. Building the HaRe tool. Design rationale. Infrastructure. Haskell and Erlang. The Wrangler tool. Conclusions.
LIL Haskell 98 Standard, lazy, strongly typed, functional programming language. Layout is significant … “offside rule” … and idiosyncratic. doSwap pnt = applyTP (full_buTP (idTP `adhocTP` inMatch `adhocTP` inExp `adhocTP` inDecl)) where inMatch ((HsMatch loc fun pats rhs ds)::HsMatchP) | fun == pnt = case pats of (p1:p2:ps) -> do pats'<-swap p1 p2 pats return (HsMatch loc fun pats' rhs ds) _ -> error "Insufficient arguments to swap." inMatch m = return m inExp (HsApp (Exp (HsApp e e1)) e2))::HsExpP) | expToPNT e == pnt = swap e1 e2 exp inExp e = return e doSwap pnt = applyTP (full_buTP (idTP `adhocTP` inMatch `adhocTP` inExp `adhocTP` inDecl)) where inMatch ((HsMatch loc fun pats rhs ds)::HsMatchP) | fun == pnt = case pats of (p1:p2:ps) -> do pats'<-swap p1 p2 pats return (HsMatch loc fun pats' rhs ds) _ -> error "Insufficient arguments to swap." inMatch m = return m inExp (HsApp (Exp (HsApp e e1)) e2))::HsExpP) | expToPNT e == pnt = swap e1 e2 exp inExp e = return e
LIL Why refactor Haskell? The only design artefact is (in) the code. Semantics of functional languages support large- scale transformations (?) Building real tools to support functional programming … heavy lifting. Platform for research and experimentation.
LIL Lift / demote f x y = … h … where h = … Hide a function which is clearly subsidiary to f ; clear up the namespace. f x y = … (h y) … h y = … Makes h accessible to the other functions in the module and beyond. Free variables: which parameters of f are used in h ? Need h not to be defined at the top level, …, Type of h will generally change ….
LIL Algebraic or abstract type? data Tr a = Leaf a | Node a (Tr a) (Tr a) Tr Leaf Node flatten :: Tr a -> [a] flatten (Leaf x) = [x] flatten (Node s t) = flatten s ++ flatten t
LIL Algebraic or abstract type? data Tr a = Leaf a | Node a (Tr a) (Tr a) isLeaf = … isNode = … … Tr isLeaf isNode leaf left right mkLeaf mkNode flatten :: Tr a -> [a] flatten t | isleaf t = [leaf t] | isNode t = flatten (left t) ++ flatten (right t)
LIL Information required Lexical structure of programs, abstract syntax, binding structure, type system and module system.
LIL Program transformations Program optimisation source-to-source transformations to get more efficient code Program derivation calculating efficient code from obviously correct specifications Refactoring transforming code structure usually bidirectional and conditional. Refactoring = Transformation + Condition
LIL Conditions: renaming f to g “No change to the binding structure” 1. No two definitions of g at the same level. 2. No capture of g. 3. No capture by g.
LIL Capture of renamed identifier h x = … h … f … g … where g y = … f x = … h x = … h … g … g … where g y = … g x = …
LIL Capture by renamed identifier h x = … h … f … g … where f y = … f … g … g x = … h x = … h … g … g … where g y = … g … g … g x = …
LIL Refactoring by hand? By hand = in a text editor Tedious Error-prone Implementing the transformation … … and the conditions. Depends on compiler for type checking, … … plus extensive testing.
LIL Machine support invaluable Reliable Low cost of do / undo, even for large refactorings. Increased effectiveness and creativity.
LIL Demonstration of HaRe, hosted in vim.
LIL
LIL
LIL
LIL The refactorings in HaRe Rename Delete Lift / Demote Introduce definition Remove definition Unfold Generalise Add/remove parameters Move def between modules Delete/add to exports Clean imports Make imports explicit data type to ADT Short-cut, warm fusion All module aware
LIL HaRe design rationale Integrate with existing development tools. Work with the complete language: Haskell 98 Preserve comments and the formatting style. Reuse existing libraries and systems. Extensibility and scriptability.
LIL Information required Lexical structure of programs, abstract syntax, binding structure, type system and module system.
LIL The Implementation of HaRe Information gathering Information gathering Pre-condition checking Pre-condition checking Program transformation Program transformation Program rendering Program rendering Strafunski
LIL Finding free variables ‘by hand’ instance FreeVbls HsExp where freeVbls (HsVar v) = [v] freeVbls (HsApp f e) = freeVbls f ++ freeVbls e freeVbls (HsLambda ps e) = freeVbls e \\ concatMap paramNames ps freeVbls (HsCase exp cases) = freeVbls exp ++ concatMap freeVbls cases freeVbls (HsTuple _ es) = concatMap freeVbls es … Boilerplate code: 1000 noise : 100 significant.
LIL Strafunski Strafunski allows a user to write general (read generic), type safe, tree traversing programs, with ad hoc behaviour at particular points. Top-down / bottom up, type preserving / unifying, fullstopone
LIL Strafunski in use Traverse the tree accumulating free variables from components, except in the case of lambda abstraction, local scopes, … Strafunski allows us to work within Haskell … Other options? Generic Haskell, Template Haskell, AG, Scrap Your Boilerplate, …
LIL Rename an identifier rename:: (Term t)=>PName->HsName->t->Maybe t rename oldName newName = applyTP worker where worker = full_tdTP (idTP ‘ adhocTP ‘ idSite) idSite :: PName -> Maybe PName idSite name orig) | v == oldName = return (PN newName orig) idSite pn = return pn
LIL The coding effort Transformations: straightforward in Strafunski … … the chore is implementing conditions that the transformation preserves meaning. This is where much of our code lies.
LIL Program rendering example -- This is an example module Main where sumSquares x y = sq x + sq y where sq :: Int->Int sq x = x ^ pow pow = 2 :: Int main = sumSquares module Main where sumSquares x y = sq pow x + sq pow y where pow = 2 :: Int sq :: Int->Int->Int sq pow x = x ^ pow main = sumSquares This is an example module Main where sumSquares x y = sq pow x + sq pow y where pow = 2 :: Int sq :: Int->Int->Int sq pow x = x ^ pow main = sumSquares 10 20
LIL Token stream and AST White space + comments only in token stream. Modification of the AST guides the modification of the token stream. After a refactoring, the program source is recovered from the token stream not the AST. Heuristics associate comments with program entities.
LIL Work in progress ‘Fold’ against definitions … find duplicate code. All, some or one? Effect on the interface … f x = … e … e … Symbolic evaluation Data refactorings Interfaces … ‘bad smell’ detection.
LIL API and DSL Refactorings Refactoring utilities Refactoring utilities Strafunski Haskell Combining forms Library functions Grammar as data Strafunski ???
LIL What have we learned? Efficiency and robustness of libraries in question. type checking large systems, linking, editor script languages (vim, emacs). The cost of infrastructure in building practical tools. Reflections on Haskell itself.
LIL Reflections on Haskell Cannot hide items in an export list (cf import). Field names for prelude types? Scoped class instances not supported. ‘Ambiguity’ vs. name clash. ‘Tab’ is a nightmare! Correspondence principle fails …
LIL Correspondence Operations on definitions and operations on expressions can be placed in one to one correspondence (R.D.Tennent, 1980)
LIL Correspondence Definitions where f x y = e f x | g1 = e1 | g2 = e2 Expressions let \x y -> e f x = if g1 then e1 else if g2 … …
LIL Function clauses f x | g1 = e1 f x | g2 = e2 Can ‘fall through’ a function clause … no direct correspondence in the expression language. f x = if g1 then e1 else if g2 … No clauses for anonymous functions … no reason to omit them.
LIL Haskell 98 vs. Erlang: generalities Haskell 98: a lazy, statically typed, purely functional programming language featuring higher-order functions, polymorphism, type classes and monadic effects. Erlang: a strict, dynamically typed functional programming language with support for concurrency, communication, distribution and fault- tolerance.
LIL Haskell 98 vs. Erlang: example % Factorial In Erlang. -module (fact). -export ([fac/1]). fac(0) -> 1; fac(N) when N > 0 -> N * fac(N-1). % Factorial In Erlang. -module (fact). -export ([fac/1]). fac(0) -> 1; fac(N) when N > 0 -> N * fac(N-1). -- Factorial In Haskell. module Fact(fac) where fac :: Int -> Int fac 0 = 1 fac n | n>0 = n * fac(n-1) -- Factorial In Haskell. module Fact(fac) where fac :: Int -> Int fac 0 = 1 fac n | n>0 = n * fac(n-1)
LIL Haskell 98 vs. Erlang: pragmatics Type system makes implementation complex. Layout and comment preservation. Types also affect the refactorings themselves. Clearer semantics for refactorings, but more complex infrastructure. Untyped traversals much simpler. Use the layout given by emacs. Use cases which cannot be understood statically. Dynamic semantics of Erlang makes refactorings harder to pin down.
LIL Challenges of Erlang refactoring Multiple binding occurrences of variables. Indirect function call or function spawn: apply (lists, rev, [[a,b,c]]) Multiple arities … multiple functions: rev/1 Concurrency Refactoring within a design library: OTP. Side-effects.
LIL Generalisation and side-effects -module (test). -export([f/0]). repeat(0) -> ok; repeat(N) -> io:format (“hello\n"), repeat(N-1). f( ) -> repeat(5). -module (test). -export([f/0]). repeat(0) -> ok; repeat(N) -> io:format (“hello\n"), repeat(N-1). f( ) -> repeat(5). -module (test). -export([f/0]). repeat(A, 0) -> ok; repeat(A, N) -> A, repeat(A,N-1). f( ) -> repeat (io:format (“hello\n”), 5). -module (test). -export([f/0]). repeat(A, 0) -> ok; repeat(A, N) -> A, repeat(A,N-1). f( ) -> repeat (io:format (“hello\n”), 5).
LIL Generalisation and side-effects -module (test). -export([f/0]). repeat(0) -> ok; repeat(N) -> io:format (“hello\n"), repeat(N-1). f( ) -> repeat(5). -module (test). -export([f/0]). repeat(0) -> ok; repeat(N) -> io:format (“hello\n"), repeat(N-1). f( ) -> repeat(5). -module (test). -export([f/0]). repeat(A, 0) -> ok; repeat(A, N) -> A(), repeat(A,N-1). f( ) -> repeat ( fun( )-> io:format (“hello\n”), 5). -module (test). -export([f/0]). repeat(A, 0) -> ok; repeat(A, N) -> A(), repeat(A,N-1). f( ) -> repeat ( fun( )-> io:format (“hello\n”), 5).
LIL The Wrangler Scanner/Parser Parse Tree Syntax tools AST annotated with comments Program analysis and transformation by the refactorer Transformed AST Pretty printer Program source Refactorer AST + comments + binding structure
LIL Teaching and learning design Exciting prospect of using a refactoring tool as an integral part of an elementary programming course. Learning a language: learn how you could modify the programs that you have written … … appreciate the design space, and … the features of the language.
LIL Conclusions Refactoring + functional programming: good fit. Real win from available libraries … with work. Substantial effort in infrastructure. De facto vs de jure : GHC vs Haskell 98. Correctness and verification … Language independence …