
1 A Certified Type-Preserving Compiler from Lambda Calculus to Assembly Language
Adam Chlipala, University of California, Berkeley
An experiment with variable binding, denotational semantics, and logical relations in Coq

2 The Big Picture
[Diagram: Source Program → Intermediate Program 1 → ... → Intermediate Program n → Target Program]
● Source language: simply-typed lambda calculus
● Target language: idealized assembly language with abstract, type-directed garbage collector
● Transformations: CPS conversion, closure conversion, explicit heap allocation, register allocation, ...
● Certifying compilation: source and target programs are observationally equivalent.
● Certified compiler: for any valid input, the compiler produces an observationally equivalent output.
● Compiler implemented in Coq; theorem proved in Coq.

3 Type-Preserving Compilation
● Preserve static type information in some prefix of the compilation process.
● Taken all the way, you end up with typed assembly language, proof-carrying code, etc.
● More modestly, implement nearly tag-free garbage collection.
– Replace tag bits, boxing, etc., with static tables mapping registers to types.
– Used in the MLton SML compiler.

4 What's tricky?
● Nested variable scopes
● Relational reasoning
● Proof management and automation
This is what the POPLmark Challenge is all about!

5 Design Decision #1: Dependently-Typed ASTs
[Diagram: Input Program → Compiler → Output Program; label: Typing Derivation]
Type Preservation Theorem. If the input program has type T, then the output program has type C(T).
Semantics Preservation Theorem. If the input program has meaning M, then the output program has meaning C(M).
Use dependent types to make the compiler type-preserving by construction!

6 Design Decision #2: Denotational Semantics
[Diagram: Input Program → Compiler → Output Program]
Semantics Preservation Theorem. If the input program has meaning M, then the output program has meaning C(M).
Operational Semantics Version: If the input program multi-steps to result v, then the output program multi-steps to result v.
Denotational Semantics Version:
1. Compile the input program to CIC.
2. Compile the output program to CIC.
3. The two results must be equal.
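The denotational version can be illustrated on a much smaller language than the one in the talk. The following is a minimal Coq sketch under stated assumptions: the toy language `exp`, its denotation `expDenote`, and the stand-in pass `swap` are hypothetical names invented for illustration and are not part of the original development. The theorem says that the denotations of the input and output, both ordinary Coq terms, are equal.

```coq
Require Import Arith.

(* Toy object language (an assumption for illustration only). *)
Inductive exp : Set :=
| Num : nat -> exp
| Plus : exp -> exp -> exp.

(* "Compile to CIC": interpret each expression as a Coq natural number. *)
Fixpoint expDenote (e : exp) : nat :=
  match e with
  | Num n => n
  | Plus e1 e2 => expDenote e1 + expDenote e2
  end.

(* A stand-in compiler pass: swap the operands of every addition. *)
Fixpoint swap (e : exp) : exp :=
  match e with
  | Num n => Num n
  | Plus e1 e2 => Plus (swap e2) (swap e1)
  end.

(* Semantics preservation: input and output have equal denotations. *)
Theorem swap_correct : forall e, expDenote (swap e) = expDenote e.
Proof.
  induction e as [n | e1 IH1 e2 IH2]; simpl.
  - reflexivity.
  - rewrite IH1, IH2. apply Nat.add_comm.
Qed.
```

No step relation or multi-step closure has to be defined here; the statement is a plain equation between Coq terms.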

7 Secret Weapons
Programming with dependent types is hard!
● Generic programming system: object language description → syntactic support functions + generic proofs of their correctness
● The trickiest bits deal with “administrative” operations that adjust variable bindings... but these are still routine and hardly language-specific!
Writing formal proofs is hard!
● “Put the rooster to work!”
● The combination of dependent types and denotational semantics enables some very effective decision procedures to be coded in Coq's Ltac language.

8 Rest of the Talk...
● Summary of compilation
● Dependently-typed ASTs
● Denotational semantics in Coq
● Writing compiler passes
– ...including generic programming of helper functions
● Proving semantics preservation

9 Source and Target Languages
Source language: simply-typed lambda calculus
  τ ::= N | τ → τ
  e ::= n | x | e e | λx : τ, e
Target language: idealized assembly language
  o ::= r | n | new(R, R) | read(r, n)
  i ::= r := o; i | jump r
  p ::= (I, i)

10 Compiler Stages
Source:
  λx, f x
CPS conversion:
  k_top (λx, λk, f x k)
Closure conversion:
  let F = λe, λx, λk, e.1 x k in k_top (⟨F, [f]⟩)
Explicit heap allocation:
  let F = λe, λx, λk, e.1.1 e.1.2 x k in let r1 = [f] in let r2 = [F, r1] in k_top (r2)
Flattening:
  F: r4 := r1.1; r1 := r4.2; r4 := r4.1; jump r4
  main: r3 := r1.1; r1 := r1.2; r2 := new [f]; r2 := new [F, r2]; jump r3

11 Correctness Proof
● Compiler and proof implemented entirely within Coq 8.0
● Axioms:
– Functional extensionality:
  ∀f, g, (∀x, f(x) = g(x)) ⇒ f = g
– Uniqueness of equality proofs:
  ∀τ, ∀x, y : τ, ∀P1, P2 : x = y, P1 = P2
● The compiler is almost runnable as part of a proof.
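For readers who want to see the two axioms in Coq syntax, here is a small sketch. The names `fun_ext` and `uip` are local, hypothetical names chosen for this example; the original development may state or import the axioms differently. The two lemmas only show how such axioms are typically used.

```coq
(* The two axioms, stated locally under hypothetical names. *)
Axiom fun_ext : forall (A B : Type) (f g : A -> B),
  (forall x, f x = g x) -> f = g.

Axiom uip : forall (A : Type) (x y : A) (P1 P2 : x = y), P1 = P2.

(* Two syntactically different functions that agree pointwise
   become equal once functional extensionality is available. *)
Lemma add_zero_is_id : (fun n : nat => n + 0) = (fun n : nat => n).
Proof.
  apply fun_ext. intro x. cbv beta.
  induction x as [| x IH]; simpl.
  - reflexivity.
  - rewrite IH. reflexivity.
Qed.

(* Any proof of n = n is equal to the canonical proof eq_refl. *)
Lemma refl_unique : forall (n : nat) (P : n = n), P = eq_refl.
Proof.
  intros n P. apply uip.
Qed.
```

Functional extensionality matters for a denotational development because the meanings of lambda terms are Coq functions, and two compiled programs usually denote functions that are only pointwise equal.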

12 Denotational Semantics of the Source Language

13 For Types...
Inductive ty : Set :=
  | Nat : ty
  | Arrow : ty -> ty -> ty.

Fixpoint tyDenote (t : ty) : Set :=
  match t with
  | Nat => nat
  | Arrow t1 t2 => tyDenote t1 -> tyDenote t2
  end.
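A short usage sketch shows what the type denotation computes. The definitions of `ty` and `tyDenote` are repeated from the slide so that the block stands alone; the names `arrow_denote` and `plusDenote` are hypothetical additions.

```coq
Inductive ty : Set :=
| Nat : ty
| Arrow : ty -> ty -> ty.

Fixpoint tyDenote (t : ty) : Set :=
  match t with
  | Nat => nat
  | Arrow t1 t2 => tyDenote t1 -> tyDenote t2
  end.

(* The object type N -> N -> N denotes the Coq type nat -> nat -> nat. *)
Example arrow_denote :
  tyDenote (Arrow Nat (Arrow Nat Nat)) = (nat -> nat -> nat).
Proof. reflexivity. Qed.

(* So an ordinary Coq function inhabits the denotation directly. *)
Definition plusDenote : tyDenote (Arrow Nat (Arrow Nat Nat)) :=
  fun a b => a + b.
```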

14 Representing Terms: Nominal syntax
Inductive term : Set :=
  | Const : nat -> term
  | Var : name -> term
  | Lam : name -> term -> term
  | App : term -> term -> term.

15 Representing Terms: De Bruijn syntax
Inductive term : Set :=
  | Const : nat -> term
  | Var : nat -> term
  | Lam : term -> term
  | App : term -> term -> term.

16 Representing Terms: Dependent de Bruijn syntax
Inductive term : nat -> Set :=
  | Const : forall n, nat -> term n
  | Var : forall n x, x < n -> term n
  | Lam : forall n, term (S n) -> term n
  | App : forall n, term n -> term n -> term n.
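The difference between plain and dependent de Bruijn syntax can be sketched as follows. The plain type is renamed to `raw` (with `R`-prefixed constructors) so that both fit in one block, and `dangling` and `constFn` are hypothetical example terms. The `Var` constructor is written with an `x < n` premise, which is an assumption about the intended definition.

```coq
(* Plain de Bruijn terms: nothing rules out dangling indices. *)
Inductive raw : Set :=
| RConst : nat -> raw
| RVar : nat -> raw
| RLam : raw -> raw
| RApp : raw -> raw -> raw.

(* Accepted, although index 5 refers to no enclosing binder. *)
Definition dangling : raw := RLam (RVar 5).

(* Dependent de Bruijn terms: [term n] may use only indices below n. *)
Inductive term : nat -> Set :=
| Const : forall n, nat -> term n
| Var : forall n x, x < n -> term n
| Lam : forall n, term (S n) -> term n
| App : forall n, term n -> term n -> term n.

(* The closed term "lambda x, lambda y, x".
   [le_n 2 : 2 <= 2] serves as the proof of 1 < 2. *)
Definition constFn : term 0 := Lam 0 (Lam 1 (Var 2 1 (le_n 2))).
```

With the indexed type, the analogue of `dangling` cannot be written as a closed term, because it would need a proof of `5 < 1`.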

17 Representing Terms: Dependent de Bruijn syntax with typing
Inductive term : list ty -> ty -> Set :=
  | Const : forall G, nat -> term G Nat
  | Var : forall G t, var G t -> term G t
  | Lam : forall G dom ran, term (dom :: G) ran -> term G (Arrow dom ran)
  | App : forall G dom ran, term G (Arrow dom ran) -> term G dom -> term G ran.

18 Term Denotations
Fixpoint termDenote (G : list ty) (t : ty) (e : term G t) {struct e}
  : subst tyDenote G -> tyDenote t :=
  match e in (term G t) return (subst tyDenote G -> tyDenote t) with
  | Const _ n => fun _ => n
  | Var _ _ x => fun s => varDenote x s
  | Lam _ _ _ e' => fun s => fun x => termDenote e' (SCons x s)
  | App _ _ _ e1 e2 => fun s => (termDenote e1 s) (termDenote e2 s)
  end.
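The slide's definition relies on `var`, `subst`, and `varDenote`, which the transcript does not show. Below is a self-contained sketch that fills them in under stated assumptions: `var` is a typed de Bruijn index with hypothetical constructors `First` and `Next`, and `subst` is simplified to a nested tuple of denotations (so `subst G` replaces the slide's `subst tyDenote G`, and a pair `(x, s)` replaces `SCons x s`). All arguments are passed explicitly instead of using implicit arguments. It is one plausible reconstruction, not the author's code.

```coq
Require Import List.

Inductive ty : Set :=
| Nat : ty
| Arrow : ty -> ty -> ty.

Fixpoint tyDenote (t : ty) : Set :=
  match t with
  | Nat => nat
  | Arrow t1 t2 => tyDenote t1 -> tyDenote t2
  end.

(* Typed de Bruijn indices (assumed definition). *)
Inductive var : list ty -> ty -> Set :=
| First : forall G t, var (t :: G) t
| Next : forall G t t', var G t -> var (t' :: G) t.

Inductive term : list ty -> ty -> Set :=
| Const : forall G, nat -> term G Nat
| Var : forall G t, var G t -> term G t
| Lam : forall G dom ran, term (dom :: G) ran -> term G (Arrow dom ran)
| App : forall G dom ran,
    term G (Arrow dom ran) -> term G dom -> term G ran.

(* Environments as nested tuples (a simplification of the slide's subst). *)
Fixpoint subst (G : list ty) : Set :=
  match G with
  | nil => unit
  | t :: G' => (tyDenote t * subst G')%type
  end.

Fixpoint varDenote (G : list ty) (t : ty) (x : var G t)
  : subst G -> tyDenote t :=
  match x in var G t return subst G -> tyDenote t with
  | First _ _ => fun s => fst s
  | Next _ _ _ x' => fun s => varDenote _ _ x' (snd s)
  end.

Fixpoint termDenote (G : list ty) (t : ty) (e : term G t)
  : subst G -> tyDenote t :=
  match e in term G t return subst G -> tyDenote t with
  | Const _ n => fun _ => n
  | Var _ _ x => fun s => varDenote _ _ x s
  | Lam _ _ _ e' => fun s => fun x => termDenote _ _ e' (x, s)
  | App _ _ _ e1 e2 => fun s => (termDenote _ _ e1 s) (termDenote _ _ e2 s)
  end.

(* The identity function on N, as a closed, well-typed term. *)
Definition idNat : term nil (Arrow Nat Nat) :=
  Lam nil Nat Nat (Var (Nat :: nil) Nat (First nil Nat)).

(* Its denotation is Coq's identity function; conversion alone proves it. *)
Example idNat_denote :
  termDenote nil (Arrow Nat Nat) idNat tt = (fun x : nat => x).
Proof. reflexivity. Qed.

Example app_denote :
  termDenote nil Nat (App nil Nat Nat idNat (Const nil 7)) tt = 7.
Proof. reflexivity. Qed.
```

Both examples are closed by `reflexivity`, since Coq's conversion evaluates the denotation function on closed terms.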

19 Definition of “Values” for Free
Operational:
● n value
● λx : τ, e value
● Syntactic characterization used throughout definitions and proofs
Denotational:
● Inherit any “canonical forms” properties of the underlying Coq types.
● “A natural number is either zero or a successor of another natural number.”
● Caveat: We don't get the same kind of property for functions!

20 No Substitution Function!
Operational:
● Customized syntactic substitution function written for each object language
● Reduction rules defined using substitution: (λx : τ, e1) e2 → e1[x := e2]
Denotational:
● Coq's operational semantics provides the substitution operation for us!

21 Free Metatheorems
Operational: for each object language, give customized, syntactic proofs of properties like:
● Type safety – preservation
● Type safety – progress
● Confluence
● Strong normalization
● ...
Denotational: meta-theorems proved once and for all about CIC.
[Diagram labels: Object Language, Object Language]
The majority of programming language theory mechanization experiments only look at proving these sorts of theorems!

22 Free Theorems
Proof. By reflexivity of equality.
Coq's proof checker identifies as equivalent terms that reduce to each other! This means that both compilation of terms into CIC and evaluation of the results are “zero cost” operations.

23 But Wait!
Doesn't that only work for languages that are:
● Strongly normalizing
● Purely functional
● Deterministic
● Single-threaded
● ...etc...
(In other words, a lot like Coq)

24 Monads to the Rescue!
● Summary rebuttal: Take a cue from Haskell.
● Use object-language-agnostic “embedded languages” to allow expression of “effectful” computations.
● Keep using Coq's definitional equality to handle reasoning about pure sublanguages, and even some of the mechanics of impure pieces.

25 Non-Strongly-Normalizing Languages
For closed, first-order programs with basic block structure (e.g., structured assembly):
● Basic block denotation function: a total denotation function that executes a basic block, determining the next program counter and memory state, e.g. (pc0, mem0) ↦ (pc1, mem1).
● Potentially-infinite trace: a function runs basic blocks repeatedly to build a lazy list describing an execution trace, e.g. (pc1, mem1), (pc2, mem2), (pc3, mem3), ...
(No “non-computational” definitions required.)

26 Co-inductive Traces
T ::= n | ⊥ | ★, T
● n: termination with a natural number answer
● ⊥: run-time failure
● ★, T: take one more step of computation
By keeping only these summaries of program executions, we enable effective equality reasoning.
Example: Garbage collection safety. Equality of traces is a good way to characterize the appropriate effect on programs from rearranging the heap and root pointers to a new, isomorphic configuration.
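A co-inductive trace type and a driver that runs a total step function repeatedly can be sketched in Coq as follows. Everything here is an illustrative assumption: the constructor names, the abstract `state` and `step` (standing in for machine states and the basic block denotation function), the toy step functions, and the fuel-bounded `observe` used only to inspect a finite prefix of a trace.

```coq
(* Traces: an answer, a run-time failure, or one more step. *)
CoInductive trace : Set :=
| Answer : nat -> trace
| Fail : trace
| Step : trace -> trace.

Section Run.
  Variable state : Set.
  (* A total step function: continue with a new state, stop with an
     answer (inr (Some n)), or fail (inr None). *)
  Variable step : state -> (state + option nat)%type.

  (* Lazily unfold the step function into a possibly infinite trace. *)
  CoFixpoint run (s : state) : trace :=
    match step s with
    | inl s' => Step (run s')
    | inr (Some n) => Answer n
    | inr None => Fail
    end.
End Run.

(* Toy machine 1: count down to zero, then answer 0. *)
Definition countdownStep (n : nat) : (nat + option nat)%type :=
  match n with
  | O => inr (Some 0)
  | S n' => inl n'
  end.

(* Toy machine 2: loop forever. *)
Definition spinStep (n : nat) : (nat + option nat)%type := inl n.

Inductive outcome : Set :=
| Done : nat -> outcome
| Failed : outcome
| OutOfFuel : outcome.

(* Inspect at most [fuel] steps of a trace. *)
Fixpoint observe (fuel : nat) (t : trace) : outcome :=
  match fuel with
  | O => OutOfFuel
  | S fuel' =>
    match t with
    | Answer n => Done n
    | Fail => Failed
    | Step t' => observe fuel' t'
    end
  end.

Example countdown_terminates :
  observe 5 (run nat countdownStep 3) = Done 0.
Proof. reflexivity. Qed.

Example spin_runs_out :
  observe 5 (run nat spinStep 0) = OutOfFuel.
Proof. reflexivity. Qed.
```

The `run` function is accepted even for the looping machine because every recursive call sits under a `Step` constructor, which is what lets a total logic describe a non-terminating program.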

27 Example Compilation Phase: CPS Transform
● Translation works in some context Γ...
● ...but used in context Γ, u : τ1 → τ2
● Type error!
Recall that terms are represented as typing derivations. We need a syntactic helper function equivalent to a weakening lemma.

28 Dependently-Typed Syntactic Helper Functions?
● Could just write this function from scratch for each new language.
– Probably using tactic-based proof search
– The brave (and patient) write the CIC term directly.
● My recipe for writing generic substitution functions involves three auxiliary recursive functions!
● Much nicer to automate these details using generic programming!
– Write each function once, not once per object language.

29 What Do We Need?
1. The helper function itself
   weaken : forall (G : list ty) (t : ty), term G t -> forall (t' : ty), term (t' :: G) t
2. Lemmas about the function
   For any term e, properly-typed substitution σ, and properly-typed value v:
Can prove this generically for any compositional denotation function! For example, for simply-typed lambda calculus, there must exist f_var, f_app, and f_lam such that:

30 Reflection-Based Generic Programming
[Diagram]
● Language Definition (Coq inductive type) → Coq plug-in (outside the logic) → Reflected Language Definition (term of CIC)
● Generic function → Specific function (type-compatible with original language definition)
● Denotation Function → Coq plug-in → Reflected Denotation Function
● Generic proof → Specific proof

31 What to Prove?
Overall correctness theorem: The compilation of a program of type N runs to the same result as the original program does.
What do we prove about individual phases? Prove that input/output pairs are in an appropriate logical relation.
E.g., for the CPS transform:
This function space contains many functions not representable in our object languages!
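The relation displayed on the original slide is not preserved in this transcript. As a hedged illustration of what a type-indexed logical relation between direct-style and CPS denotations can look like, here is one standard formulation in Coq. The names `cpsTy` and `rel`, the choice of `nat` as the answer type, and the exact shape of the arrow case are assumptions made for this sketch and may differ from the relation used in the talk.

```coq
Inductive ty : Set :=
| Nat : ty
| Arrow : ty -> ty -> ty.

Fixpoint tyDenote (t : ty) : Set :=
  match t with
  | Nat => nat
  | Arrow t1 t2 => tyDenote t1 -> tyDenote t2
  end.

(* Denotation of CPS'd types with answer type R (hypothetical name). *)
Fixpoint cpsTy (R : Set) (t : ty) : Set :=
  match t with
  | Nat => nat
  | Arrow t1 t2 => cpsTy R t1 -> (cpsTy R t2 -> R) -> R
  end.

(* Source values related to CPS values, by recursion on the type. *)
Fixpoint rel (t : ty) : tyDenote t -> cpsTy nat t -> Prop :=
  match t return tyDenote t -> cpsTy nat t -> Prop with
  | Nat => fun (n1 n2 : nat) => n1 = n2
  | Arrow t1 t2 => fun f1 f2 =>
      forall (x1 : tyDenote t1) (x2 : cpsTy nat t1), rel t1 x1 x2 ->
        exists r : cpsTy nat t2,
          (forall k : cpsTy nat t2 -> nat, f2 x2 k = k r)
          /\ rel t2 (f1 x1) r
  end.

Example rel_nat : rel Nat 3 3.
Proof. simpl. reflexivity. Qed.

(* The successor function is related to its CPS version. *)
Example rel_succ : rel (Arrow Nat Nat) S (fun x k => k (S x)).
Proof.
  simpl. intros x1 x2 H.
  exists (S x1). split.
  - intro k. rewrite H. reflexivity.
  - reflexivity.
Qed.
```

The arrow case quantifies over arbitrary Coq functions `k`, including ones that no object-language term denotes, which is the point made in the slide's last sentence.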

32 In the Trenches Easy first step: Use introduction rules for forall's and implications at the start of the goal.

33 In the Trenches Now we're blocked at the tricky point for automated provers: proving existential facts and applying universal facts. Key observation: The quantified variables have very specific dependent types. We can use greedy quantifier instantiation!

34 In the Trenches

35 In the Trenches Existential hypotheses are easy to eliminate.

36 In the Trenches We can't make further progress with this hypothesis, since no term of the type given for k exists in the proof state.

37 In the Trenches We can simplify the conclusion by applying rewrite rules (like those we generated automatically) until no more apply.

38 In the Trenches Now the conclusion has a subterm with the right type to instantiate a hypothesis!

39 In the Trenches We can use H1 to rewrite the goal.

40 In the Trenches

41 And That's That!
● This strategy does almost all of the proving for the CPS transformation correctness proof!
– About 20 lines of proof script total.
● Basic approach:
– Figure out the right syntactic rewrite lemmas, prove them, and add them as hints.
– State the induction principle to use.
– Call a generic tactic from a library.

42 A Recipe for Certified Compilers
1. Define object languages with dependently-typed ASTs.
2. Give object languages denotational semantics.
3. Use generic programming to build basic support functions and lemmas.
4. Write compiler phases as dependently-typed Coq functions.
5. Express phase correctness with logical relations.
6. Prove correctness theorems using a generic decision procedure relying heavily on greedy quantifier instantiation.

43 Design Decisions
● Why dependently-typed ASTs?
– Avoid well-formedness side conditions
– Easy to construct denotational semantics defined only over well-typed terms
– Makes greedy quantifier instantiation realistic
● Why denotational semantics?
– Concise to define
– Known to work well with code transformation
– Many reasoning steps come for free via Coq's definitional equality

44 Conclusion
● Yet another bag of suggestions on how to formalize programming languages and their metatheories and tools!
● Would be interesting to see other approaches to formalizing this kind of compilation.
Acknowledgements: Thanks to my advisor George Necula. This work was funded by a US National Defense fellowship and the US National Science Foundation.