Playing with Haskell Data Neil Mitchell

Overview
- The “boilerplate” problem
- Haskell’s weakness (really!)
- Traversals and queries
- Generic traversals and queries
- Competitors (SYB and Compos)
- Benchmarks

Data structures
- A tree of typed nodes
- Parent/child relationship is important

A concrete data structure
    data Expr = Val Int
              | Neg Expr
              | Add Expr Expr
              | Sub Expr Expr
Simple arithmetic expressions

Task: Add one to every Val
    inc :: Expr -> Expr
    inc (Val i) = Val (i+1)
    inc (Neg x) = Neg (inc x)
    inc (Add x y) = Add (inc x) (inc y)
    inc (Sub x y) = Sub (inc x) (inc y)
What is the worst thing about this code?

Many things!
1. If we add Mul, we need to change inc
2. The action is one line, obscured
3. Tedious, repetitive, dull
4. May contain subtle bugs, easy to overlook
5. Way too long

The boilerplate problem
A lot of tasks:
1. Navigate a data structure (boilerplate)
2. Do something (action)
Typically boilerplate is:
- Repetitive
- Tied to the data structure
- Much bigger than the action
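To make the split between action and boilerplate concrete, here is a small self-contained sketch of the inc function from the earlier slide, with each equation labelled. The `sample` value and the derived `Show`/`Eq` instances are additions for experimenting and are not on the slides.

```haskell
-- Expr as defined on the slides; Show and Eq are derived only so the
-- result can be printed and compared.
data Expr
  = Val Int
  | Neg Expr
  | Add Expr Expr
  | Sub Expr Expr
  deriving (Show, Eq)

-- Add one to every Val.
inc :: Expr -> Expr
inc (Val i)   = Val (i + 1)          -- the action: one line
inc (Neg x)   = Neg (inc x)          -- boilerplate: navigation only
inc (Add x y) = Add (inc x) (inc y)  -- boilerplate
inc (Sub x y) = Sub (inc x) (inc y)  -- boilerplate

-- Hypothetical sample value, not from the slides.
sample :: Expr
sample = Add (Val 1) (Neg (Sub (Val 2) (Val 3)))
```

Evaluating `inc sample` gives `Add (Val 2) (Neg (Sub (Val 3) (Val 4)))`. Three of the four equations only navigate, and a new constructor would need yet another one.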

Compared to Pseudo-OO (1)
    class Expr
    class Val : Expr {int i}
    class Neg : Expr {Expr a}
    class Add : Expr {Expr a, b}
    class Sub : Expr {Expr a, b}
(1) Java/C++ are way too verbose to fit on slides!

Inc, in Pseudo-OO
    void inc(x){
        if (x is Val) { x.i += 1; }
        if (x is Neg) { inc(x.a); }
        if (x is Add) { inc(x.a); inc(x.b); }
        if (x is Sub) { inc(x.a); inc(x.b); }
    }
Casts, type evaluation etc. omitted

Haskell’s weakness
- OO actually has a lower complexity
- Hidden very effectively by horrible syntax
- In OO objects are deconstructed
- In Haskell data is deconstructed and reconstructed
- OO destroys original, Haskell keeps original

Comparing inc for Add
Haskell:
    inc (Add x y) = Add (inc x) (inc y)
OO:
    if (x is Add) inc(x.a); inc(x.b)
- Both deconstruct Add (follow its fields)
- Only Haskell rebuilds a new Add

Traversals and Queries
What are the common forms of “boilerplate”?
- Traversals
- Queries
Other forms do exist, but are far less common

Traversals
- Move over the entire data structure
- Do “action” to each node
- Return a new data structure
The previous example (inc) was a traversal

Queries
- Extract some information out of the data
- Example: what values are in an expression?

A query
    vals :: Expr -> [Int]
    vals (Val i) = [i]
    vals (Neg x) = vals x
    vals (Add x y) = vals x ++ vals y
    vals (Sub x y) = vals x ++ vals y
Same issues as traversals

Generic operations
Identify primitives:
- Support lots of operations
- Neatly
- Minimal number of primitives
These goals are in opposition!
Here follow my basic operations…

Generic Queries
    allOver :: a -> [a]
[figure: the list of elements returned by allOver]

The vals query
    vals x = [i | Val i <- allOver x]
- Uses Haskell list comprehensions – very handy for queries
- Can anyone see a way to improve on the above?
- Short, sweet, beautiful
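A minimal sketch of how allOver and the vals query fit together, specialised to Expr. The hand-written `children` helper and the choice to list the root first are assumptions made here for illustration; the slides define allOver generically later on.

```haskell
-- Expr as defined on the slides.
data Expr
  = Val Int
  | Neg Expr
  | Add Expr Expr
  | Sub Expr Expr
  deriving (Show, Eq)

-- allOver, specialised to Expr for this sketch: the expression itself
-- followed by every sub-expression, depth first.
allOver :: Expr -> [Expr]
allOver e = e : concatMap allOver (children e)
  where
    -- Hypothetical helper: the immediate sub-expressions of a node.
    children (Val _)   = []
    children (Neg x)   = [x]
    children (Add x y) = [x, y]
    children (Sub x y) = [x, y]

-- The query from the slide: keep only the nodes that match Val i.
vals :: Expr -> [Int]
vals x = [i | Val i <- allOver x]
```

For example, `vals (Add (Val 1) (Neg (Sub (Val 2) (Val 3))))` evaluates to `[1,2,3]`. The pattern `Val i` on the left of `<-` silently skips nodes that do not match, which is what makes list comprehensions a good fit for queries.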

More complex query
Find all negative literals that the user negates:
    [i | Neg (Val i) <- allOver x, i < 0]
Rarely gets more complex than that
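The same query, wrapped in a function so it can be tried out. This is a sketch under the same assumptions as before: allOver is specialised to Expr, and the name `negatedNegatives` is made up for this example.

```haskell
-- Expr as defined on the slides.
data Expr
  = Val Int
  | Neg Expr
  | Add Expr Expr
  | Sub Expr Expr
  deriving (Show, Eq)

-- allOver specialised to Expr (an assumption of this sketch).
allOver :: Expr -> [Expr]
allOver e = e : concatMap allOver (children e)
  where
    children (Val _)   = []
    children (Neg x)   = [x]
    children (Add x y) = [x, y]
    children (Sub x y) = [x, y]

-- Hypothetical name: negative literals that sit directly under a Neg.
negatedNegatives :: Expr -> [Int]
negatedNegatives x = [i | Neg (Val i) <- allOver x, i < 0]
```

With `Add (Neg (Val (-3))) (Sub (Neg (Val 4)) (Val (-5)))` the result is `[-3]`: `Val 4` is negated but not negative, and `Val (-5)` is negative but not negated.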

Generic Traversals
- Have some “mutator”
- Apply to each item
    traversal :: (a -> a) -> a -> a
1. Bottom up
2. Top down – automatic
3. Top down – manual

Bottom-up traversal
    mapUnder :: (a -> a) -> a -> a

The inc traversal
    inc x = mapUnder f x
        where f (Val x) = Val (x+1)
              f x = x
- Say the action (first line of f)
- The boilerplate is all “do nothing”
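A runnable sketch of the bottom-up traversal, assuming mapUnder is specialised to Expr and written by hand. The `descend` helper is a name introduced here; the generic definition in terms of replaceChildren appears later in the talk.

```haskell
-- Expr as defined on the slides.
data Expr
  = Val Int
  | Neg Expr
  | Add Expr Expr
  | Sub Expr Expr
  deriving (Show, Eq)

-- Bottom-up: transform the children first, then apply f to the
-- rebuilt node.
mapUnder :: (Expr -> Expr) -> Expr -> Expr
mapUnder f e = f (descend e)
  where
    -- Hypothetical helper: rebuild the node with transformed children.
    descend (Val i)   = Val i
    descend (Neg x)   = Neg (mapUnder f x)
    descend (Add x y) = Add (mapUnder f x) (mapUnder f y)
    descend (Sub x y) = Sub (mapUnder f x) (mapUnder f y)

-- The inc traversal from the slide.
inc :: Expr -> Expr
inc = mapUnder f
  where
    f (Val i) = Val (i + 1)  -- the action
    f x       = x            -- everything else is left alone
```

`inc (Add (Val 1) (Neg (Val 2)))` evaluates to `Add (Val 2) (Neg (Val 3))`. The per-constructor code now lives once, in mapUnder, instead of in every traversal.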

Top-down traversals
- Bottom up is almost always best
- Sometimes information is pushed down
- Example: remove negation of add
    f (Neg (Add x y)) = Add (Neg x) (Neg y)
- Does not work: x may itself be an Add
    f (Neg (Add x y)) = Add (f (Neg x)) (f (Neg y))

Top-down traversal
    mapOver :: (a -> a) -> a -> a
Produces one element per call
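A sketch of the automatic top-down traversal applied to the negation example, assuming mapOver is specialised to Expr. The name `pushNeg` is made up here. The function is applied to a node first, and the traversal then continues into the children of the result.

```haskell
-- Expr as defined on the slides.
data Expr
  = Val Int
  | Neg Expr
  | Add Expr Expr
  | Sub Expr Expr
  deriving (Show, Eq)

-- Top-down: apply f to the node, then recurse into the children of
-- whatever f returned.
mapOver :: (Expr -> Expr) -> Expr -> Expr
mapOver f e = case f e of
  Val i   -> Val i
  Neg x   -> Neg (mapOver f x)
  Add x y -> Add (mapOver f x) (mapOver f y)
  Sub x y -> Sub (mapOver f x) (mapOver f y)

-- Hypothetical name for the non-recursive rule from the slides.
pushNeg :: Expr -> Expr
pushNeg (Neg (Add x y)) = Add (Neg x) (Neg y)
pushNeg x               = x
```

`mapOver pushNeg (Neg (Add (Add (Val 1) (Val 2)) (Val 3)))` evaluates to `Add (Add (Neg (Val 1)) (Neg (Val 2))) (Neg (Val 3))`. The newly created `Neg (Add ...)` is visited because it is a child of the rewritten node, which a bottom-up pass would miss.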

One element per call?
- Sometimes a traversal does not produce one element
- If zero are made, need to explicitly continue
- If two are made, work is wasted
- Can write an explicit traversal

Top-down manual
    compos :: (a -> a) -> a -> a

Compos
    noneg (Neg (Add x y)) = Add (noneg (Neg x)) (noneg (Neg y))
    noneg x = compos noneg x
- Compos does no recursion, leaves this to the user
- The user explicitly controls the flow
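A sketch of the manual top-down style, assuming compos is specialised to Expr. Here compos applies its argument to the immediate children only, and noneg decides for itself where to recurse.

```haskell
-- Expr as defined on the slides.
data Expr
  = Val Int
  | Neg Expr
  | Add Expr Expr
  | Sub Expr Expr
  deriving (Show, Eq)

-- Apply f to each immediate child; no recursion happens here.
compos :: (Expr -> Expr) -> Expr -> Expr
compos _ (Val i)   = Val i
compos f (Neg x)   = Neg (f x)
compos f (Add x y) = Add (f x) (f y)
compos f (Sub x y) = Sub (f x) (f y)

-- The traversal from the slide: the interesting case recurses by
-- hand, and every other case falls through to compos.
noneg :: Expr -> Expr
noneg (Neg (Add x y)) = Add (noneg (Neg x)) (noneg (Neg y))
noneg x               = compos noneg x
```

`noneg (Neg (Add (Add (Val 1) (Val 2)) (Val 3)))` evaluates to `Add (Add (Neg (Val 1)) (Neg (Val 2))) (Neg (Val 3))`, the same result as the automatic top-down version, but with the recursion written out explicitly.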

Other types of traversal
- Monadic variants of the above
    allOverContext :: a -> [(a, a -> a)]
- Useful for doing something once
    fold :: ([r] -> a) -> (x -> a -> r) -> x -> r
- mapUnder with a different return
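The slides give only the type of allOverContext, so the following is one possible reading rather than the author's definition: each sub-expression is paired with a function that puts a replacement back in its place in the whole tree. The `holes` helper and `replaceFirstVal` are names invented for this sketch.

```haskell
-- Expr as defined on the slides.
data Expr
  = Val Int
  | Neg Expr
  | Add Expr Expr
  | Sub Expr Expr
  deriving (Show, Eq)

-- Every sub-expression, paired with a function that rebuilds the
-- whole tree with that sub-expression replaced.
allOverContext :: Expr -> [(Expr, Expr -> Expr)]
allOverContext e = (e, id) : concat
  [ [ (sub, rebuild . ctx) | (sub, ctx) <- allOverContext child ]
  | (child, rebuild) <- holes e ]
  where
    -- Hypothetical helper: each child with a function that swaps it out.
    holes (Val _)   = []
    holes (Neg x)   = [(x, Neg)]
    holes (Add x y) = [(x, \x' -> Add x' y), (y, \y' -> Add x y')]
    holes (Sub x y) = [(x, \x' -> Sub x' y), (y, \y' -> Sub x y')]

-- "Doing something once": replace only the first Val that is found.
replaceFirstVal :: Int -> Expr -> Expr
replaceFirstVal n e =
  case [ctx (Val n) | (Val _, ctx) <- allOverContext e] of
    (e' : _) -> e'
    []       -> e
```

`replaceFirstVal 0 (Add (Val 1) (Val 2))` evaluates to `Add (Val 0) (Val 2)`; the second Val is untouched.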

The Challenge
- Pick an operation
- Will code it up “live”

Traversals for your data
- Haskell has type classes
    allOver :: Play a => a -> [a]
- Each data structure has its own methods
    allOver Expr /= allOver Program

Minimal interface
- Writing 8+ traversals is annoying
- Can define all traversals in terms of one:
    replaceChildren :: x -> ([x], [x] -> x)
- Get all children
- Change all children
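A sketch of what the single primitive might look like for Expr. The class name `Play` comes from the earlier allOver signature; making replaceChildren its only method, and the instance itself, are assumptions of this example.

```haskell
-- Expr as defined on the slides.
data Expr
  = Val Int
  | Neg Expr
  | Add Expr Expr
  | Sub Expr Expr
  deriving (Show, Eq)

-- One method: the immediate children, plus a function that rebuilds
-- the node from a list of replacement children.
class Play a where
  replaceChildren :: a -> ([a], [a] -> a)

instance Play Expr where
  replaceChildren (Val i)   = ([], const (Val i))
  -- The rebuild functions expect a list of the same length as the
  -- list of children they were returned with.
  replaceChildren (Neg x)   = ([x], \[x'] -> Neg x')
  replaceChildren (Add x y) = ([x, y], \[x', y'] -> Add x' y')
  replaceChildren (Sub x y) = ([x, y], \[x', y'] -> Sub x' y')
```

For `Add (Val 1) (Val 2)` the first component is `[Val 1, Val 2]` ("get all children"), and applying the second component to `[Val 3, Val 4]` gives `Add (Val 3) (Val 4)` ("change all children").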

Properties replaceChildren :: x -> ([x], [x] -> x) (children, generate) = replaceChildren x generate children == generate y length y == length children

Some examples
    mapOver f x = gen (map (mapOver f) child)
        where (child,gen) = replaceChildren (f x)
    mapUnder f x = f (gen child2)
        where (child,gen) = replaceChildren x
              child2 = map (mapUnder f) child
    allOver x = x : concatMap allOver child
        where (child,gen) = replaceChildren x
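Putting the pieces together, the three operations can be written once against a Play class and reused for any instance. This is a sketch: the Expr instance is an assumption, and `rebuildsOriginal` encodes one reading of the replaceChildren properties (rebuilding with the unchanged children gives back the original value), not a definition taken from the slides.

```haskell
-- Expr as defined on the slides.
data Expr
  = Val Int
  | Neg Expr
  | Add Expr Expr
  | Sub Expr Expr
  deriving (Show, Eq)

class Play a where
  replaceChildren :: a -> ([a], [a] -> a)

-- Assumed instance for Expr; not shown on the slides.
instance Play Expr where
  replaceChildren (Val i)   = ([], const (Val i))
  replaceChildren (Neg x)   = ([x], \[x'] -> Neg x')
  replaceChildren (Add x y) = ([x, y], \[x', y'] -> Add x' y')
  replaceChildren (Sub x y) = ([x, y], \[x', y'] -> Sub x' y')

-- Top-down: apply f, then traverse the children of the result.
mapOver :: Play a => (a -> a) -> a -> a
mapOver f x = gen (map (mapOver f) child)
  where (child, gen) = replaceChildren (f x)

-- Bottom-up: traverse the children, rebuild, then apply f.
mapUnder :: Play a => (a -> a) -> a -> a
mapUnder f x = f (gen (map (mapUnder f) child))
  where (child, gen) = replaceChildren x

-- The value itself followed by everything beneath it.
allOver :: Play a => a -> [a]
allOver x = x : concatMap allOver child
  where (child, _) = replaceChildren x

-- Hypothetical check: rebuilding with the original children should
-- return the original value.
rebuildsOriginal :: (Play a, Eq a) => a -> Bool
rebuildsOriginal x = generate children == x
  where (children, generate) = replaceChildren x
```

With these definitions, `[i | Val i <- allOver (Add (Val 1) (Neg (Val 2)))]` evaluates to `[1,2]`, and `all rebuildsOriginal (allOver e)` is a cheap sanity check to run on a new instance.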

Writing replaceChildren
- A little bit of thought
- Reasonably easy
- Using GHC, these instances can be derived automatically

Competitors: SYB + Compos
- Not Haskell 98, GHC only
- Use scary types…
- Compos: provides compos operator and fold
- Scrap Your Boilerplate (SYB): very generic traversals

Compos
- Based on GADTs
- No support for bottom-up traversals
    compos :: (forall a. a -> m a)
           -> (forall a b. m (a -> b) -> m a -> m b)
           -> (forall a. t a -> m (t a))
           -> t c -> m (t c)

Scrap Your Boilerplate (SYB)
- Full generic traversals
- Based on similar idea of children
- But is actual children, of different types!
    gfoldl :: (forall a b. Term a => w (a -> b) -> a -> w b)
           -> (forall g. g -> w g)
           -> a -> w a

SYB vs Play, children
[figure: the children of a value under SYB and under Play]

SYB continued
Traversals are based on types:
    0 `mkQ` f
    f :: Expr -> Int
- mkQ converts a function on Expr to a function on all types
- Then apply mkQ everywhere

Paradise benchmark
SYB:
    salaryBill :: Company -> Float
    salaryBill = everything (+) (0 `mkQ` billS)
    billS :: Salary -> Float
    billS (S f) = f
Compos:
    salaryBill c = case c of
        S s -> s
        _ -> composOpFold 0 (+) salaryBill c
Play:
    salaryBill x = sum [x | S x <- allOverEx x]

Runtime cost - queries

Runtime cost - traversals

In the real world?
- Used in Catch about 100 times
- Used in Yhc.Core library
- Used by other people:
  - Yhc Javascript converter
  - Settings file converter

Conclusions
- Generic operations with simple types
- Only 1 simple primitive
- If you only remember two operations:
  - allOver – queries
  - mapUnder – traversals