Type Checking, Inference, & Elaboration CS153: Compilers Greg Morrisett.

Slides:



Advertisements
Similar presentations
Types and Programming Languages Lecture 13 Simon Gay Department of Computing Science University of Glasgow 2006/07.
Advertisements

Functional Programming Lecture 10 - type checking.
A Third Look At ML 1. Outline More pattern matching Function values and anonymous functions Higher-order functions and currying Predefined higher-order.
Type Inference David Walker COS 320. Criticisms of Typed Languages Types overly constrain functions & data polymorphism makes typed constructs useful.
1 A simple abstract interpreter to compute Sign s.
A little bit on Class-Based OO Languages CS153: Compilers Greg Morrisett.
Closures & Environments CS153: Compilers Greg Morrisett.
1 Programming Languages (CS 550) Mini Language Interpreter Jeremy R. Johnson.
Semantics Static semantics Dynamic semantics attribute grammars
CMSC 330: Organization of Programming Languages Tuples, Types, Conditionals and Recursion or “How many different OCaml topics can we cover in a single.
Letrec fact(n) = if zero?(n) then 1 else *(n, (fact sub1(n))) 4.4 Type Inference Type declarations aren't always necessary. In our toy typed language,
Cs776 (Prasad)L4Poly1 Polymorphic Type System. cs776 (Prasad)L4Poly2 Goals Allow expression of “for all types T” fun I x = x I : ’a -> ’a Allow expression.
Control-Flow Graphs & Dataflow Analysis CS153: Compilers Greg Morrisett.
1 Mooly Sagiv and Greta Yorsh School of Computer Science Tel-Aviv University Modern Compiler Design.
1 How to transform an analyzer into a verifier. 2 OUTLINE OF THE LECTURE a verification technique which combines abstract interpretation and Park’s fixpoint.
Exercise: Balanced Parentheses
Recap 1.Programmer enters expression 2.ML checks if expression is “well-typed” Using a precise set of rules, ML tries to find a unique type for the expression.
Winter Compiler Construction T7 – semantic analysis part II type-checking Mooly Sagiv and Roman Manevich School of Computer Science Tel-Aviv.
8. Introduction to Denotational Semantics. © O. Nierstrasz PS — Denotational Semantics 8.2 Roadmap Overview:  Syntax and Semantics  Semantics of Expressions.
1 Programming Languages (CS 550) Lecture Summary Functional Programming and Operational Semantics for Scheme Jeremy R. Johnson.
Getting started with ML ML is a functional programming language. ML is statically typed: The types of literals, values, expressions and functions in a.
Algebraic Optimization CS153: Compilers Greg Morrisett.
Type checking © Marcelo d’Amorim 2010.
ML: a quasi-functional language with strong typing Conventional syntax: - val x = 5; (*user input *) val x = 5: int (*system response*) - fun len lis =
Compiler Construction
Control Flow Analysis Mooly Sagiv Tel Aviv University Sunday Scrieber 8 Monday Schrieber.
ML: a quasi-functional language with strong typing Conventional syntax: - val x = 5; (*user input *) val x = 5: int (*system response*) - fun len lis =
Catriel Beeri Pls/Winter 2004/5 type reconstruction 1 Type Reconstruction & Parametric Polymorphism  Introduction  Unification and type reconstruction.
Introduction to ML - Part 2 Kenny Zhu. What is next? ML has a rich set of structured values Tuples: (17, true, “stuff”) Records: {name = “george”, age.
Introduction to ML Last time: Basics: integers, Booleans, tuples,... simple functions introduction to data types This time, we continue writing an evaluator.
Functional programming: LISP Originally developed for symbolic computing First interactive, interpreted language Dynamic typing: values have types, variables.
CSE341: Programming Languages Lecture 11 Type Inference Dan Grossman Winter 2013.
CS321 Functional Programming 2 © JAS Type Checking Polymorphism in Haskell is implicit. ie the system can derive the types of all objects. This.
Arvind Computer Science and Artificial Intelligence Laboratory M.I.T. L06-1 September 26, 2006http:// Type Inference September.
Exercise Determine the output of the following program assuming static and dynamic scoping. Explain the difference, if there is any. object MyClass { val.
A Third Look At ML Chapter NineModern Programming Languages, 2nd ed.1.
12/9/20151 Programming Languages and Compilers (CS 421) Elsa L Gunter 2112 SC, UIUC Based in part on slides by Mattox.
Error Example - 65/4; ! Toplevel input: ! 65/4; ! ^^ ! Type clash: expression of type ! int ! cannot have type ! real.
CMSC 330: Organization of Programming Languages Lambda Calculus and Types.
CS412/413 Introduction to Compilers Radu Rugina Lecture 13 : Static Semantics 18 Feb 02.
CMSC 330: Organization of Programming Languages Operational Semantics.
Arrays Using array as an expression, on the right-hand side Assigning to an array.
Arvind Computer Science and Artificial Intelligence Laboratory M.I.T. L05-1 September 21, 2006http:// Types and Simple Type.
Lesson 10 Type Reconstruction
Abstract Syntax cs7100 (Prasad) L7AST.
Type Checking and Type Inference
Programming Languages and Compilers (CS 421)
CSE341: Programming Languages Lecture 11 Type Inference
Programming Languages Dan Grossman 2013
ML: a quasi-functional language with strong typing
Reasoning about code CSE 331 University of Washington.
Lecture 15 (Notes by P. N. Hilfinger and R. Bodik)
The Metacircular Evaluator
Objective caml Daniel Jackson MIT Lab for Computer Science 6898: Advanced Topics in Software Design March 18, 2002.
CSE341: Programming Languages Lecture 11 Type Inference
The Metacircular Evaluator
CSE 341 Section 5 Winter 2018.
CSE341: Programming Languages Lecture 11 Type Inference
SUNY-Buffalo Zhi Yang Courtesy of Greg Morrisett
The Metacircular Evaluator (Continued)
6.001 SICP Variations on a Scheme
CSE-321 Programming Languages Introduction to Functional Programming
6.001 SICP Interpretation Parts of an interpreter
CSE341: Programming Languages Lecture 11 Type Inference
Assignments and Procs w/Params
Recursive Procedures and Scopes
CSE341: Programming Languages Lecture 11 Type Inference
PROGRAMMING IN HASKELL
CSE341: Programming Languages Lecture 11 Type Inference
Presentation transcript:

Type Checking, Inference, & Elaboration CS153: Compilers Greg Morrisett

Statics After parsing, we have an AST. Critical issue: –not all operations are defined on all values. –e.g., (3 / 0), sub("foo",5), 42(x) Options 1.don't worry about it (C, C++, etc.) 2.force all operations to be total (Scheme) e.g., (3 / 0) -> 0, 42(x) -> try to rule out ill-formed expressions (ML)

Type Soundness Construct a model of the source language –i.e., interpreter –This tells us where operations are partial. –And partiality is different for different languages (e.g., "foo" + "bar" may be meaningful in some languages, but not others.) Construct a function TC: AST -> bool –when true, should ensure interpreting the AST does not result in an undefined operation. Prove that TC is correct.

Simple Language: type tipe = Int_t | Arrow_t of tipe*tipe | Pair_t of tipe*tipe type exp = Var of var | Int of int | Plus_i of exp*exp | Lambda of var * tipe * exp | App of exp*exp | Pair of exp * exp | Fst of exp | Snd of exp

Interpreter: let rec interp (env:var->value)(e:exp) = match e with | Var x -> env x | Int I -> Int_v i | Plus_i(e1,e2) -> (match interp env e1, interp env e2 of | Int i, Int j -> Int_v(i+j) | _,_ => error()) | Lambda(x,t,e) => Closure_v{env=env,code=(x,e)} | App(e1,e2) => (match interp env e1, interp env e2 of | Closure_v{env=cenv,code=(x,e)},v -> interp (extend cenv x v) e | _,_ -> error()

Type Checker: let rec tc (env:var->tipe) (e:exp) = match e with | Var x -> env x | Int _ -> Int_t | Plus_i(e1,e2) -> (match tc env e1, tc env e with | Int_t, Int_t -> Int_t | _,_ => error()) | Lambda(x,t,e) -> Fn_t(t,tc (extend env x t) e) | App(e1,e2) -> (match (tc env e1, tc env e2) with | Arrow_t(t1,t2), t -> if (t1 != t) then error() else t2 | _,_ -> error())

Notes: In the interpreter, we only evaluate the body of a function when it's applied. In the type-checker, we always check the body of the function (even if it's never applied.) Because of this, we must assume the input has some type (say t 1 ) and reflect this in the type of the function (t 1 -> t 2 ). Dually, at a call site (e 1 e 2 ), we don't know what closure we're going to get. But we can calculate e 1 's type, check that e 2 is an argument of the right type, and also determine what type e 1 will return.

Growing the language type tipe =... | Bool_t type exp =... | True | False | If of exp*exp*exp let rec interp env e =... | True -> True_v | False -> False_v | If(e1,e2,e3) -> (match interp env e1 with True_v -> interp env e2 | False_v -> interp env e3 | _ => error())

Type-Checking let rec tc (env:var->tipe) (e:exp) = match e with... | True -> Bool_t | False -> Bool_t | If(e1,e2,e3) -> (let (t1,t2,t3) = (tc env e1,tc env e2,tc env e3) in match t1 with | Bool_t -> if (t2 != t3) then error() else t2 | _ => error())

Refining Types We can easily add new types that distinguish different subsets of values. type tipe =... | True_t | False_t | Bool_t | Pos_t | Neg_t | Zero_t | Int_t | Any_t

Modified Type-Checker let rec tc (env:var->tipe) (e:exp) =... | True -> True_t | False -> False_t | If(e1,e2,e3) -> (match tc env e1 with True_t -> tc env e2 | False_t -> tc env e3 | Bool_t -> (let (t2,t3) = (tc env e2, tc env e3) in lub t2 t3) | _ => error())

Least Upper Bound let lub t1 t2 = match t1, t2 with | True_t (Bool_t|False_t) -> Bool_t | False_t,(Bool_t|True_t) -> Bool_t | Zero_t, (Neg_t|Pos_t|Int_t) -> Int_t | Neg_t, (Zero_t|Pos_t|Int_t) -> Int_t | Pos_t, (Zero_t|Pos_t|Int_t) -> Int_t | _,_ -> if (t1 = t2) then t1 else Any_t Bool_t True_t False_t Int_t Zero_t Pos_t Neg_t Any_t

Refining Integers into Zero,Neg,Pos let rec tc (env:var->tipe) (e:exp) =... | Int 0 -> Zero_t | Int i -> if i < 0 then Neg_t else Pos_t | Plus(e1,e2) -> (match tc env e1, tc env e2 with | Zero_t,t2 -> t2 | t1,Zero_t -> t1 | Pos_t,Pos_t -> Pos_t | Neg_t,Neg_t -> Neg_t | (Neg_t|Pos_t|Int_t),Int_t -> Int_t | Int_t,(Neg_t|Pos_t) -> Int_t | _,_ -> error())

Subtyping as Subsets If we think of types as sets of values, then a subtype corresponds to a subset and lub corresponds to union. e.g., Pos_t <= Int_t since every positive integer is also an integer. For conditionals, we want to find the least type of the types the two branches might return. –Adding "Any_t" ensures there is a type. –(Not always a good thing to have…) –Need NonPos_t, NonNeg_t and NonZero_t to get least upper bounds.

Extending Subtyping: What about pairs? –(T 1 * T 2 ) <= (U 1 * U 2 ) when T 1 <= U 1 and T 2 <= U 2 –But only when immutable! –Why? What about functions? –(T 1 -> T 2 ) U 2 ) when U 1 <= T 1 and T 2 <= U 2. –Why?

Problems with Mutability: let f(p :ref(Pos_t)) = let q :ref(Int_t) = p in q := 0; 42 div (!p)

Another Way to See it: Any shared, mutable data structure can be thought of as an immutable record of pairs of methods (with hidden state): –p : ref(Pos_t) => –p : { get: unit -> Pos_t, set: Pos_t -> unit } When is ref(T) <= ref(U)? When: –unit->T U or T <= U and –T->unit unit or U <= T. –Thus, only when T = U !

N-Tuples and Simple Records: (T 1 * … * T n * T n+1 ) <= (U 1 * … * U n ) when T i <= U i. –Why? Non-Permutable Records: { l 1 :T 1,…, l n :T n, l n+1 :T n+1 } <= { l 1 :U 1,…, l n :U n } when T i <= U i. –Assumes {x:int,y:int} != {y:int,x:int} –That is, the position of a label is independent of the rest of the labels in the type. –In SML (or for Java interfaces) this is not the case.

SML-Style Records: Compiler sorts by label. So if you write {y:int,z:int,x:int}, the compiler immediately rewrites it to {x:int,y:int,z:int}. So you need to know all of the labels to determine their positions. Consider: {y:int,z:int,x:int} <= {y:int,z:int} but {y,z,x} == {x,y,z} <= {y,z}

If you want both: If you want permutability & dropping, you need to either copy or use a dictionary: x y z p = {x=42,y=55,z=66}:{x:int,y:int,z:int} q : {y:int,z:int} = p y z

Type Inference let rec tc (env:(var*tipe) list) (e:exp) = match e with | Var x -> lookup env x | Lambda(x,e) -> (let t = guess() in Arrow_t(t,tc (extend env x t) e)) | App(e1,e2) -> (match tc env e1, tc env e2 with | Arrow_t(t1,t2), t -> if t1 != t then error() else t2 | _,_ => error())

Extend Types with Guesses: type tipe = Int_t | Arrow_t of tipe*tipe | Guess of (tipe option ref) fun guess() = Guess(ref None)

Must Handle Guesses | Lambda(x,e) -> let t = guess() in Arrow_t(t,tc (extend env x t) e) | App(e1,e2) -> (match tc env e1, tc env e2 with | Arrow_t(t1,t2), t -> if t1 != t then error() else t2 | Guess (r as ref None), t -> let t2 = guess() in r := Some(Fn_t(t,t2)); t2 | Guess (ref Some (Fn_t(t1,t2))), t -> if t1 != t then error() else t2

Cleaner: let rec tc (env: (var*tipe) list) (e:exp) = match e with | Var x -> lookup env x | Lambda(x,e) -> let t = guess() in Fn_t(t,tc (extend env x t) e) | App(e1,e2) -> let (t1,t2) = (tc env e1, tc env e2) in let t = guess() in if unify t1 (Arrow_t(t2,t)) then t else error()

Where: let rec unify (t1:tipe) (t2:tipe):bool = if (t1 = t2) then true else match t1,t2 with | Guess(ref(Some t1')), _ -> unify t1’ t2 | Guess(r as (ref None)), t2 -> (r := t2; true) | _, Guess(_) -> unify t2 t1 | Int_t, Int_t -> true | Arrow_t(t1a,t1b), Arrow_t(t2a,t2b)) -> unify t1a t2a && unify t1b t2b

Subtlety Consider: fun x => x x We guess g1 for x –We see App(x,x) –recursive calls say we have t1=g1 and t2=g1. –We guess g2 for the result. –And unify(g1,Arrow_t(g1,g2)) –So we set g1 := Some(Arrow_t(g1,g2)) –What happens if we print the type?

Fixes: Do an "occurs" check in unify: let rec unify (t1:tipe) )t2:tipe):bool = if (t1 = t2) then true else case (t1,t2) of (Guess(r as ref None),_) => if occurs r t2 then error() else (r := Some t2; true) | … Alternatively, be careful not to loop anywhere. –In particular, when comparing t1 = t2, we must code up a graph equality, not a tree equality.

Polymorphism: Consider: fun x => x We guess g1 for x –We see x. –So g1 is the result. –We return Arrow_t(g1,g1) –g1 is unconstrained. –We could constraint it to Int_t or Arrow_t(Int_t,Int_t) or any type. –In fact, we could re-use this code at any type.

ML Expressions: type exp = Var of var | Int of int | Lambda of var * exp | App of exp*exp | Let of var * exp * exp

Naïve ML Type Inference: let rec tc (env: (var*tipe) list) (e:exp) = match e with | Var x -> lookup env x | Lambda(x,e) -> let t = guess() in Arrow_t(t,tc (extend env x t) e) end | App(e1,e2) -> let (t1,t2) = (tc env e1, tc env e2) in let t = guess() in if unify t1 (Fn_t(t2,t)) then t else error() | Let(x,e1,e2) => (tc env e1; tc env (substitute(e1,x,e2))

Example: let id = fn x => x in (id 3, id "fred") end ====> ((fun x => x) 3, (fun x => x) "fred")

Better Approach (DM): type tvar = string type tipe = Int_t | Arrow_t of tipe*tipe | Guess of (tipe option ref) | Var_t of tvar type tipe_scheme = Forall of (tvar list * tipe)

ML Type Inference let rec tc (env:(var*tipe_scheme) list) (e:exp) = match e with | Var x -> instantiate(lookup env x) | Int _ -> Int_t | Lambda(x,e) -> let g = guess() in Arrow_t(g,tc (extend env x (Forall([],t)) e) | App(e1,e2) -> let (t1,t2,t) = (tc env e1,tc env e2,guess()) in if unify(t1,Fn_t(t2,t)) then t else error() | Let(x,e1,e2) -> let s = generalize(env,tc env e1) in tc (extend env x s) e2 end

Instantiation let instantiate(s:tipe_scheme):tipe = let val Forall(vs,t) = s val vs_and_ts : (var*tipe) list = map (fn a => (a,guess()) vs in substitute(vs_and_ts,t) end

Generalization: let generalize(e:env,t:tipe):tipe_scheme = let t_gs = guesses_of_tipe t in let env_list_gs = map (fun (x,s) -> guesses_of s) e in let env_gs = foldl union empty env_list_gs let diff = minus t_gs env_gs in let gs_vs = map (fun g -> (g,freshvar())) diff in let tc = subst_guess(gs_vs,t) in Forall(map snd gs_vs, tc) end

Explanation: Each let-bound value is generalized. –e.g., g->g generalizes to Forall a.a -> a. Each use of a let-bound variable is instantiated with fresh guesses: –e.g., if f:Forall a.a->a, then in f e, then the type we assign to f is g->g for some fresh guess g. But we can't generalize guesses that might later become constrained. –Sufficient to filter out guesses that occur elsewhere in the environment. –e.g., if the expression has type g1->g2 and y:g1, then we might later use y in a context that demands it has type int, such as y+1.

Effects: The algorithm given is equivalent to substituting the let-bound expression. But in ML, we evaluate CBV, not CBN! let id = (print "Hello"; fn x => x) in (id 42, id "fred”) != ((print "Hello";fn x=>x) 42, (print "Hello";fn x=>x) "fred")

Problem: let r = ref (fn x=>x) (* r : Forall 'a.ref('a->'a) *) in r := (fn x => x+1); (* r:ref(int->int) *) (!r)("fred") (* r:ref(string->string) *)

"Value Restriction" When is let x=e1 in e2 equivalent to subst(e1,x,e2) ? If e1 has no side effects. –reads/writes/allocation of refs/arrays. –input, output. –non-termination. So only generalize when e1 is a value. –or something easy to prove equivalent to a value.

Real Algorithm: let rec tc (env:var->tipe_scheme) (e:exp) = match e with... | Let(x,e1,e2) -> let s = if may_have_effects e1 then Forall([],tc env e1) else generalize(env,tc env e1) in tc (extend env x s) e2 end

Checking Effects: let rec may_have_effects e = match e with | Int _ -> false | Var _ -> false | Lambda _ -> false | Pair(e1,e2) -> may_have_effects e1||may_have_effects e2 | App _ -> true