Typed Compilation of Recursive Datatypes
Joseph C. Vanderwaart, Derek Dreyer, Leaf Petersen, Karl Crary, Robert Harper, and Perry Cheng
Carnegie Mellon University
TLDI 2003
2 SML Datatypes
Elegant mechanism for defining recursive variant types, such as:
    datatype intlist = Nil | Cons of int * intlist
It is important that constructor applications and pattern matching be implemented efficiently
Subject of this talk:
  – How to implement SML datatypes efficiently in a type-preserving compiler
3 Formal Framework
Harper and Stone’s type-theoretic interpretation of Standard ML:
  – “Elaborates” SML programs into a type theory
Reasons for using HS:
  – Models the first phase of a type-preserving compiler, in particular the TILT compiler (developed at CMU)
  – Can explain datatype semantics in terms of type theory
4 Overview
Three interpretations of datatypes:
  – Harper-Stone interpretation
  – Transparent interpretation
  – Coercion interpretation
Comparison on three axes:
  – Efficiency
  – Fidelity to the Definition of SML
  – Meta-theoretic complexity
The Harper-Stone Interpretation
6 Datatype Semantics
SML datatypes are generative:
  – Identical datatype declarations in separate modules yield distinct (abstract) types
HS elaborates datatypes as modules providing:
  – The datatype itself, defined as a recursive sum type
  – Functions to construct and destruct values of the datatype
HS models generativity by “sealing” the datatype module with an abstract signature
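Generativity can be seen directly in source SML. The following is a minimal sketch; the structure names A and B and the function lengthA are hypothetical and chosen only for illustration.

```sml
(* Two textually identical datatype declarations in separate structures. *)
structure A = struct
  datatype intlist = Nil | Cons of int * intlist
end

structure B = struct
  datatype intlist = Nil | Cons of int * intlist
end

(* A function over A's lists. *)
fun lengthA l =
  case l of
      A.Nil => 0
    | A.Cons (_, rest) => 1 + lengthA rest

val two = lengthA (A.Cons (1, A.Cons (2, A.Nil)))

(* The next line is rejected by the type checker, because
   A.intlist and B.intlist are distinct types:
   val bad = lengthA B.Nil *)
```

Each declaration creates a fresh type, so values built with B's constructors cannot be passed where A.intlist is expected.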
7 ExpDec Example
    datatype exp = VarExp of var | LetExp of dec * exp
    and dec = ValDec of var * exp | SeqDec of dec * dec
    VarExp(v) ≈ “v”
    LetExp(d,e) ≈ “let d in e”
    ValDec(v,e) ≈ “val v = e”
    SeqDec(d1,d2) ≈ “d1; d2”
8 ExpDec Implementation
    structure ExpDec :> EXPDEC = struct
      type exp = μ_1(α, β).(var + β * α, var * α + β * β)
      type dec = μ_2(α, β).(var + β * α, var * α + β * β)
      fun exp_in x = roll_exp (x)
      fun exp_out x = unroll_exp (x)
      fun dec_in x = roll_dec (x)
      fun dec_out x = unroll_dec (x)
    end
9 ExpDec Interface
    signature EXPDEC = sig
      type exp
      type dec
      val exp_in : var + (dec * exp) -> exp
      val exp_out : exp -> var + (dec * exp)
      val dec_in : (var * exp) + (dec * dec) -> dec
      val dec_out : dec -> (var * exp) + (dec * dec)
    end
10 Elaborating Constructor Calls
The client of the datatype does the injection into the sum, then calls the datatype’s “in” function:
    VarExp(v) ⇝ ExpDec.exp_in(inj_1(v))
    LetExp(d,e) ⇝ ExpDec.exp_in(inj_2(d,e))
    ValDec(v,e) ⇝ ExpDec.dec_in(inj_1(v,e))
    SeqDec(d1,d2) ⇝ ExpDec.dec_in(inj_2(d1,d2))
But the function calls to the in functions are too expensive.
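The shape of this elaboration can be approximated in source SML for the simpler intlist datatype. This is only a sketch under stated assumptions: SML has no roll/unroll or anonymous sum types, so a hypothetical sum datatype (Inj1, Inj2) and a single-constructor datatype (Roll) stand in for them, and the names INTLIST, IntList, intlist_in and intlist_out are illustrative rather than the elaborator's actual output.

```sml
(* Stand-in for the binary sum type t1 + t2 of the internal language. *)
datatype ('a, 'b) sum = Inj1 of 'a | Inj2 of 'b

signature INTLIST = sig
  type intlist                                   (* held abstract *)
  val intlist_in  : (unit, int * intlist) sum -> intlist
  val intlist_out : intlist -> (unit, int * intlist) sum
end

(* Opaque ascription (:>) plays the role of sealing. *)
structure IntList :> INTLIST = struct
  (* Roll models the recursive type; applying it models roll. *)
  datatype intlist = Roll of (unit, int * intlist) sum
  fun intlist_in x = Roll x
  fun intlist_out (Roll x) = x
end

(* Client side: inject into the sum, then call the "in" function. *)
val empty = IntList.intlist_in (Inj1 ())                  (* Nil        *)
val one   = IntList.intlist_in (Inj2 (1, empty))          (* Cons(1, _) *)

(* Pattern matching goes through the "out" function. *)
fun sumList l =
  case IntList.intlist_out l of
      Inj1 () => 0
    | Inj2 (n, rest) => n + sumList rest

val total = sumList (IntList.intlist_in (Inj2 (2, one)))
```

Every constructor application in the client is a call through the sealed module, which is the cost this slide points at.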
11 Inlining the Constructor Calls
We would like to inline the rolls to avoid calling the exp_in and dec_in functions:
    VarExp(v) ⇝ roll_ExpDec.exp (inj_1(v))
    LetExp(d,e) ⇝ roll_ExpDec.exp (inj_2(d,e))
    ValDec(v,e) ⇝ roll_ExpDec.dec (inj_1(v,e))
    SeqDec(d1,d2) ⇝ roll_ExpDec.dec (inj_2(d1,d2))
But the definitions of exp and dec are not known outside of ExpDec, so inlining the rolls is ill-typed!
12 Separate Compilation
Not a problem if the client of the datatype is defined in the same compilation unit:
  – Unseal the datatype ⇒ the rolls become well-typed
Is a problem if the client of the datatype is defined in a separately compiled module:
  – Datatype is an abstract import of the client
  – Can’t assume knowledge of the implementation
  – Similar problem for datatypes in functor arguments
A Transparent Interpretation
14 Making Datatypes Transparent
Expose the implementation of a datatype as a recursive sum type in its interface:
    signature EXPDEC = sig
      type exp = μ_1(α, β).(var + β * α, var * α + β * β)
      type dec = μ_2(α, β).(var + β * α, var * α + β * β)
      (* in and out function specs as before *)
    end
Inlining calls to the in and out functions is now well-typed outside of ExpDec
15 Implications of Transparency
Datatypes are no longer generative
  – Identically defined datatypes are “visibly” equal
  – More types are equivalent, so more programs may typecheck
Matching a datatype specification is harder
  – To match a datatype spec, a datatype must now be implemented as a particular recursive sum type
  – Depending on how you define recursive type equivalence, fewer programs may typecheck!
16 Transparent Matching Example
    struct
      datatype exp = VarExp of var | LetExp of dec * exp
      and dec = ValDec of var * exp | SeqDec of dec * dec
    end
    :>
    sig
      type exp
      datatype dec = ValDec of var * exp | SeqDec of dec * dec
    end
Does the structure match the signature?
17 Transparent Matching Example
    struct
      type exp = μ_1(α, β).(var + β * α, var * α + β * β)
      type dec = μ_2(α, β).(var + β * α, var * α + β * β)
    end
    :>
    sig
      type exp
      datatype dec = ValDec of var * exp | SeqDec of dec * dec
    end
Does the structure match the signature?
18 Transparent Matching Example
    struct
      type exp = μ_1(α, β).(var + β * α, var * α + β * β)
      type dec = μ_2(α, β).(var + β * α, var * α + β * β)
    end
    :>
    sig
      type exp
      type dec = μ_1(β).(var * exp + β * β)
    end
Does the structure match the signature?
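19 Transparent Matching Example
    struct
      type exp = μ_1(α, β).(var + β * α, var * α + β * β)
      type dec = μ_2(α, β).(var + β * α, var * α + β * β)
    end
    :>
    sig
      type exp
      type dec = μ_1(β).(var * exp + β * β)
    end
The question becomes: does μ_2(α, β).(var + β * α, var * α + β * β) = μ_1(β).(var * exp + β * β)?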
20 Notation
Use δ to stand for a recursive type, i.e.:
    δ ::= μ_k(α_1, ..., α_n).(τ_1, ..., τ_n)    (k ∈ 1..n)
Expansion of a recursive type: expand(δ)
For example, if
    intlist = μα. 1 + int * α
then
    expand(intlist) = 1 + int * intlist
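Written out, expansion is the usual one-step unrolling of a mutually recursive type. The formulation below is a sketch in assumed notation (the names \delta_j and the substitution brackets are not taken from the slide):

```latex
\[
\delta_j \;=\; \mu_j(\alpha_1,\dots,\alpha_n).(\tau_1,\dots,\tau_n)
\quad (1 \le j \le n),
\qquad
\mathrm{expand}(\delta_k) \;=\; \tau_k[\delta_1/\alpha_1,\dots,\delta_n/\alpha_n]
\]
```

For intlist, n = 1 and \tau_1 = 1 + int * \alpha_1, so substituting intlist for \alpha_1 gives 1 + int * intlist, matching the example on the slide.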
21 Iso-Recursive Types
Iso-recursive equivalence is purely structural:
  – δ ≠ expand(δ), but the two are isomorphic
  – roll_δ : expand(δ) → δ
  – unroll_δ : δ → expand(δ)
Works fine for H-S with abstract datatypes, but…
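22 Transparent Matching Example
    struct
      type exp = μ_1(α, β).(var + β * α, var * α + β * β)
      type dec = μ_2(α, β).(var + β * α, var * α + β * β)
    end
    :>
    sig
      type exp
      type dec = μ_1(β).(var * exp + β * β)
    end
Does the structure match the signature? Not under iso-recursive equivalence (✗)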
23 Equi-Recursive Types
Another form of recursive type equivalence:
  – δ = expand(δ)
  – μα.τ(α) represents the unique solution of α = τ(α)
  – σ = μα.τ(α) iff σ = τ(σ)
Equi-recursive equivalence is sufficient:
  – dec matches its specification
  – Enables the transparent interpretation to accept all valid SML datatype matchings
24 Equi-Recursive Types
Recall from the example:
    dec = μ_2(α, β).(var + β * α, var * α + β * β)
and we need
    dec = μ_1(β).(var * exp + β * β)
It suffices to show that dec satisfies the fixed-point equation:
    dec = var * exp + dec * dec
Which follows from:
    dec = expand(dec)
        = var * μ_1(α, β).(...) + μ_2(α, β).(...) * μ_2(α, β).(...)
        = var * exp + dec * dec
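The same argument with the substitution step made explicit, as a sketch in assumed notation (the body abbreviation \vec{\tau} and the substitution brackets are introduced here for illustration):

```latex
\begin{align*}
\vec{\tau} &= (\mathit{var} + \beta * \alpha,\;\; \mathit{var} * \alpha + \beta * \beta) \\
\mathit{exp} &= \mu_1(\alpha,\beta).\vec{\tau}, \qquad
\mathit{dec} = \mu_2(\alpha,\beta).\vec{\tau} \\
\mathit{dec} &= \mathrm{expand}(\mathit{dec})
  && \text{(equi-recursive equivalence)} \\
 &= (\mathit{var} * \alpha + \beta * \beta)[\mathit{exp}/\alpha,\ \mathit{dec}/\beta]
  && \text{(second component of } \vec{\tau}\text{)} \\
 &= \mathit{var} * \mathit{exp} + \mathit{dec} * \mathit{dec}
\end{align*}
```

So dec solves \beta = var * exp + \beta * \beta. Since \mu_1(\beta).(var * exp + \beta * \beta) denotes the unique solution of that equation, the two types are equal.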
25 A Hybrid Equivalence
Equi-recursive equivalence is overkill:
  – Unnecessary to equate a recursive type with a non-recursive type (its expansion)
Hybrid of iso- and equi-recursive equivalence:
  – Based on the FLINT intermediate language [League and Shao]
  – Restriction of the Amadio-Cardelli algorithm
  – Only equates μ’s with μ’s
The paper gives details of the hybrid algorithm, along with a formal argument that it is sufficient
26 Complications
Strong versions of type equivalence are not well studied outside the simply typed λ-calculus. (TILT ILs have higher-order constructors, singleton kinds…)
Conflicts with SML semantics:
  – Datatypes are no longer generative.
  – Problems involving datatypes in sharing and where type constraints.
  – To implement SML, these issues must be handled another way.
The Coercion Interpretation
28 Those in and out Functions
Recall the definitions given during elaboration:
    fun in(x) = roll_δ(x)
    fun out(x) = unroll_δ(x)
Consider the roll and unroll operations.
  – Commonly implemented as “no-ops”. That is, the values v and roll_δ(v) are represented the same.
So, roll and unroll are just “retyping” operators, or coercions.
  – The untyped machine code for in / out is the same as for the identity function.
29 ExpDec Revisited
    signature EXPDEC = sig
      type exp
      type dec
      val exp_in : var + (dec * exp) ⇒ exp
      val exp_out : exp ⇒ var + (dec * exp)
      val dec_in : (var * exp) + (dec * dec) ⇒ dec
      val dec_out : dec ⇒ (var * exp) + (dec * dec)
    end
With -> types, exp_in and exp_out act as the identity at run time, but:
  – This cannot be recognized from the type
New type constructor: τ_1 ⇒ τ_2 (replaces -> in the signature)
  – Inhabited only by coercive terms
  – Coerciveness of exp_in, exp_out is reflected in the type
  – Applications can be ignored at run time
30 Coercions
New constructs for the internal language:
  – Coercion values fold / unfold replace roll / unroll
  – Special type τ_1 ⇒ τ_2 distinguishes them from functions.
  – Special application syntax: v @ e
Define in/out using coercions:
    val in : expand(δ) ⇒ δ = fold_δ
    val out : δ ⇒ expand(δ) = unfold_δ
Define constructor applications using coercion applications:
    VarExp(x) ⇝ ExpDec.exp_in @ (inj_1(x))
31 Coercion Erasure
Why are coercion applications better than function applications? Because:
  – A closed value of coercion type can only be fold or unfold.
  – No work is required at run time to apply either fold or unfold.
  – To compile v @ e, generate the same code as for e.
Safety argument (in the paper)
  – Formalized via a translation into an untyped target calculus.
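The erasure idea can be illustrated with a toy translation. This is a minimal sketch over a hypothetical miniature term language, not the calculus or translation defined in the paper; all datatype and constructor names below are assumptions made for the example.

```sml
(* Hypothetical typed-side terms: coercions are only fold or unfold. *)
datatype coercion = Fold | Unfold

datatype term =
    Var of string
  | Inj of int * term              (* injection into a sum        *)
  | CoApp of coercion * term       (* coercion application        *)
  | App of term * term             (* ordinary function call      *)

(* Hypothetical untyped target: no coercions at all. *)
datatype uterm =
    UVar of string
  | UInj of int * uterm
  | UApp of uterm * uterm

(* Erasure drops every coercion application and keeps its argument;
   ordinary applications are preserved. *)
fun erase (Var x) = UVar x
  | erase (Inj (i, e)) = UInj (i, erase e)
  | erase (CoApp (_, e)) = erase e
  | erase (App (f, a)) = UApp (erase f, erase a)

(* A constructor application such as VarExp(x), after elaboration. *)
val erased = erase (CoApp (Fold, Inj (1, Var "x")))
```

In this sketch the erased term is just the injection, with no call left behind, whereas an App node would survive erasure as a real function call.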
32 Performance
Run times of benchmarks under the three interpretations.
  – Harper-Stone ≈ 37% slower than the others
  – Coercion interpretation about the same as transparent.
The coercion interpretation is faithful to SML semantics and requires only a simple extension to the type theory.
33 Conclusion
Comparison table. Columns: Efficiency, Conformance to SML Semantics, Meta-theoretic Simplicity. Rows: Harper-Stone, Transparent, Coercion. The only cell mark remaining is a “?” in the Transparent row.
34 Transparent Interpretation
  – Removes all type abstraction; datatype generativity and sharing constraints must be recovered by other means.
  – New point in the design space of recursive type equivalence.
Coercion Interpretation
  – Preserves the abstract semantics of datatypes.
  – Contribution: coercion types may be generally useful.