Building “Correct” Compilers

Building “Correct” Compilers K. Vikram and S. M. Nazrul A.

Outline. Introduction: setting the high-level context. Motivation. Detours: Automated Theorem Proving; Compiler Optimizations through Dataflow Analysis. Overview of the Cobalt System. Forward optimizations in Cobalt. Proving Cobalt optimizations correct. Profitability heuristics. Pure analyses. Concluding remarks.

Introduction: The Seven Grand Challenges. (1) In Vivo ⇔ In Silico; (2) Science for Global Ubiquitous Computing; (3) Memories for Life; (4) Scalable Ubiquitous Computing Systems; (5) The Architecture of the Brain and Mind; (6) Dependable Systems Evolution; (7) Journeys in Non-classical Computation. Dependable systems evolution seems to be the most pressing of these challenges: the others are almost a luxury, but this one is a necessity.

Introduction: Dependable Systems Evolution. A long-standing problem: loss of financial resources and human lives. Compare with other engineering fields! Non-functional requirements: safety, reliability, availability, security, etc.

Introduction: Why the sudden interest? It was difficult so far, but now there is a greater technology push (model checkers, theorem provers, programming theories and other formal methods) and a greater market pull (increased dependence on computing).

Introduction: A small but significant step. Building correct compilers.

Motivation: Why are correct compilers hard to build? Bugs don’t manifest themselves easily. Where is the bug: in the program or in the compiler? Possible solutions: check semantic equivalence of the two programs (translation validation, etc.); prove compilers sound (manually). Drawbacks: conservative, difficult, and the actual code is not verified.

Motivation: Testing. [Diagram: the compiler turns the source program into a compiled program; the compiled program is run on an input, and its output is DIFFed against the expected output.] To get benefits, must: run over many inputs; compile many test cases. No correctness guarantees: neither for the compiled program nor for the compiler.

Motivation: Verify each compilation. [Diagram: a semantic DIFF compares the source program with the compiled program after every compilation.] Translation validation [Pnueli et al 98, Necula 00]. Credible compilation [Rinard 99]. Drawbacks: the compiler can still have bugs; compile time increases; the “semantic diff” is hard.

Motivation: Proving the whole compiler correct. [Diagram: a correctness checker examines the compiler itself, rather than each source/compiled program pair.] Option 1: prove the compiler correct by hand. Proofs are long, and hard. Compilers are proven correct as written on paper. What about the implementation? The link between the paper proofs and the implemented compiler is left open.

Motivation: gcc-bugs mailing list. Searched for “incorrect” and “wrong” in the gcc-bugs mailing list. Some of the results:
c/9525: incorrect code generation on SSE2 intrinsics
target/7336: [ARM] With -Os option, gcc incorrectly computes the elimination offset
optimization/9325: wrong conversion of constants: (int)(float)(int) (INT_MAX)
optimization/6537: For -O (but not -O2 or -O0) incorrect assembly is generated
optimization/6891: G++ generates incorrect code when -Os is used
optimization/8613: [3.2/3.3/3.4 regression] -O2 optimization generates wrong code
target/9732: PPC32: Wrong code with -O2 -fPIC
c/8224: Incorrect joining of signed and unsigned division
…
And this is only for February 2003! On a mature compiler!

Motivation: Need for Automation. This approach proves the compiler correct automatically: the correctness checker is backed by an automatic theorem prover.

The Challenge: This seems really hard! There is a gap between the complexity of the task of proving a compiler correct and the complexity that an automatic theorem prover can handle.

Detours: Automated Theorem Proving (a brief detour through ATP). Started with AI applications. Reasoning about first-order logic (FOL) with sound and complete proof procedures. 1965: unification and resolution. Combinatorial explosion: SAT is NP-complete, and FOL validity is only semi-decidable. Refinements of resolution, term rewriting, higher-order logics. Interactive theorem proving. Efficient implementation techniques. Tools: Coq, Nuprl, Isabelle, Twelf, PVS, Simplify, etc.

Focus on Optimizations. Optimizations are the most error-prone part of a compiler: this is the only phase that performs transformations that can potentially change semantics. The front end and back end are relatively static.

Optimizations: Common Optimizations.
Constant propagation: replace constant-valued variables with constants.
Common sub-expression elimination: avoid recomputing a value if it has been computed earlier in the program.
Loop invariant removal: move computations into less frequently executed portions of the program.
Strength reduction: replace expensive operations (multiplication) with simpler ones (addition).
Dead code removal: eliminate unreachable code and code that is irrelevant to the output of the program.

Optimizations: Constant Propagation Examples. [Figure: example programs before and after constant propagation.]

Optimizations: Constant Propagation Condition. Suppose x is used at program point p. If, on all possible execution paths from the START of the procedure to p, x has constant value c at p, then replace x by c.

Optimizations: The Analysis Algorithm. (1) Build the control flow graph (CFG) of the program, making the flow of control explicit. (2) Perform symbolic evaluation to determine constants. (3) Replace constant-valued variable uses by their values, and simplify expressions and control flow.

Optimizations: Building the CFG. The CFG is composed of basic blocks: straight-line code without any branches or merges of control flow. Nodes of the CFG: statements (basic blocks), switches, and merges. Edges of the CFG: possible control flow sequences.
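The basic-block definition above can be sketched in a few lines. This is only an illustration under assumptions that are not in the slides: the tuple-based instruction format and the opcode names "label", "goto" and "branch" are hypothetical, and only the nodes (not the edges) of the CFG are computed.

```python
def split_basic_blocks(instrs):
    """Partition a list of toy instructions into basic blocks.

    Assumed (hypothetical) instruction forms:
      ("label", name)         possible merge point, starts a block
      ("goto", name)          unconditional jump, ends a block
      ("branch", cond, name)  conditional jump, ends a block
      anything else           straight-line statement
    """
    blocks, current = [], []
    for ins in instrs:
        if ins[0] == "label" and current:
            blocks.append(current)  # a label (merge) starts a new block
            current = []
        current.append(ins)
        if ins[0] in ("goto", "branch"):
            blocks.append(current)  # a jump ends the current block
            current = []
    if current:
        blocks.append(current)
    return blocks


program = [
    ("assign", "x", 11),
    ("branch", "x == 11", "then"),
    ("assign", "x", "x + 1"),
    ("goto", "end"),
    ("label", "then"),
    ("assign", "y", 0),
    ("label", "end"),
    ("assign", "y", "x"),
]
blocks = split_basic_blocks(program)
```

Each resulting block has a single entry at its first instruction and a single exit at its last one, which is what lets the analysis treat it as one CFG node.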

Optimizations: Symbolic Evaluation. Assign each variable the bottom value initially. Propagate changes in variable values as statements are executed. Based on the idea of abstract interpretation.

Optimizations: Symbolic Evaluation. Flow functions: for a statement x := e, state@out = state@in{eval(e, state@in)/x}, that is, state@in with x rebound to the value of e evaluated in state@in. In general, state@out = ƒ(state@in), where ƒ is the flow function of the statement. Confluence operation: join over all incoming edges.
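A minimal sketch of the flow function for x := e and of the confluence (join) operation, for constant propagation. The representation is an assumption made for illustration: states are dictionaries from variable names to values, BOTTOM and TOP are hypothetical sentinel strings for "no information yet" and "not a constant", and expressions are integers, variable names, or ("+", e1, e2) tuples.

```python
BOTTOM, TOP = "BOTTOM", "TOP"  # hypothetical sentinels for the lattice ends


def join_values(a, b):
    """Join of two abstract values: BOTTOM is the identity, a clash gives TOP."""
    if a == BOTTOM:
        return b
    if b == BOTTOM:
        return a
    return a if a == b else TOP


def join_states(s1, s2):
    """Confluence operation: pointwise join over two incoming states."""
    return {v: join_values(s1.get(v, BOTTOM), s2.get(v, BOTTOM))
            for v in set(s1) | set(s2)}


def eval_expr(e, state):
    """eval(e, state@in) for constants, variables, and ("+", e1, e2)."""
    if isinstance(e, int):
        return e
    if isinstance(e, str):
        return state.get(e, BOTTOM)
    left, right = eval_expr(e[1], state), eval_expr(e[2], state)
    if BOTTOM in (left, right):
        return BOTTOM
    if TOP in (left, right):
        return TOP
    return left + right


def flow_assign(x, e, state_in):
    """state@out = state@in{eval(e, state@in)/x}."""
    state_out = dict(state_in)
    state_out[x] = eval_expr(e, state_in)
    return state_out
```

For instance, flow_assign("y", ("+", "x", 1), {"x": 5}) rebinds y to 6, while joining {"x": 5} with {"x": 6} sends x to TOP.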

Optimizations: The Dataflow Analysis Algorithm. Associate one state vector with each edge of the CFG. Initialize all entries to ⊥ (bottom). Set all entries on the outgoing edge from START to ⊤ (top, i.e. not known to be constant). Evaluate the expression at each node and update the output edge. Continue until a fixed point is reached.
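The steps above can be sketched as a worklist loop. This is an illustration under stated assumptions, not the presenters' implementation: the state is stored per node output rather than per edge (equivalent when every outgoing edge of a node carries the same state), each node holds at most one assignment, START has no predecessors, and BOTTOM/TOP and the expression encoding are hypothetical.

```python
BOTTOM, TOP = "BOTTOM", "TOP"  # hypothetical sentinels for the lattice ends


def join(a, b):
    if a == BOTTOM:
        return b
    if b == BOTTOM:
        return a
    return a if a == b else TOP


def evaluate(e, state):
    """e is an int constant, a variable name, or ("+", e1, e2)."""
    if isinstance(e, int):
        return e
    if isinstance(e, str):
        return state[e]
    left, right = evaluate(e[1], state), evaluate(e[2], state)
    if BOTTOM in (left, right):
        return BOTTOM
    if TOP in (left, right):
        return TOP
    return left + right


def constant_propagation(nodes, edges, variables, start):
    """nodes: name -> (var, expr) or None; edges: list of (src, dst) pairs."""
    preds = {n: [s for s, d in edges if d == n] for n in nodes}
    succs = {n: [d for s, d in edges if s == n] for n in nodes}
    out = {n: {v: BOTTOM for v in variables} for n in nodes}  # all entries bottom
    out[start] = {v: TOP for v in variables}                  # START edge is top
    worklist = list(succs[start])
    while worklist:
        n = worklist.pop()
        state = {v: BOTTOM for v in variables}
        for p in preds[n]:                                    # join incoming states
            state = {v: join(state[v], out[p][v]) for v in variables}
        if nodes[n] is not None:
            var, expr = nodes[n]
            state[var] = evaluate(expr, state)                # flow function
        if state != out[n]:                                   # changed: revisit successors
            out[n] = state
            worklist.extend(succs[n])
    return out                                                # fixed point reached


nodes = {"START": None, "a": ("y", 5), "b1": ("w", 1), "b2": ("w", 2),
         "m": ("x", ("+", "y", 1))}
edges = [("START", "a"), ("a", "b1"), ("a", "b2"), ("b1", "m"), ("b2", "m")]
result = constant_propagation(nodes, edges, ["w", "x", "y"], "START")
```

In this example y is the constant 5 on both branches, so x := y + 1 evaluates to 6 at the merge node, while w receives different constants on the two branches and joins to TOP.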

Optimizations: Example Evaluation. [Figure: the analysis evaluated step by step on an example CFG.]

Optimizations: Termination Condition. If each flow function ƒ is monotonic, i.e. x ≤ y ⇒ ƒ(x) ≤ ƒ(y), and if the lattice is of finite height, then the dataflow algorithm terminates.
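A short sketch of why these two conditions suffice. The symbols h (the height of the state lattice), E (the set of CFG edges) and s_e^{(k)} (the state on edge e after its k-th update) are introduced here for illustration and do not appear in the slides.

```latex
% Starting from bottom, monotone flow functions and joins can only move a state up:
s_e^{(0)} \le s_e^{(1)} \le s_e^{(2)} \le \cdots
% A strictly ascending chain in a lattice of height h has at most h steps, so
\#\{\text{strict increases of } s_e\} \le h
\quad\Longrightarrow\quad
\#\{\text{state changes overall}\} \le h \cdot |E|
```

Since the algorithm only continues while some edge state changes, it must stop after finitely many updates.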

Other Optimizations, classified by direction of flow and by path quantifier:
                 All paths                                     Any path
Forward flow     constant propagation, available expressions   reaching definitions
Backward flow    busy expressions                              live variables

Overview: Making the problem easier. Shrink the task until an automatic theorem prover can handle it, going from proving the whole compiler correct to proving the optimizer correct: (1) Only prove the optimizer correct; trust the front end and code generator. (2) Write optimizations in Cobalt, a domain-specific language. (3) Separate correctness from profitability. (4) Factor out the hard and common parts of the proof, and prove them once by hand.

Overview: The Design. [Diagram: an interpreter runs a Cobalt program, turning an input program into an output program.] Cobalt is a domain-specific language. The input is a program in a C-like language that has the usual features of an imperative language.

Overview: The Compiler. [Diagram: source code such as if (…) { x := …; } else { y := …; } …; enters the front end and leaves the back end as a binary executable.]

Overview: Results. Cobalt language: a realistic C-like IL that operates on a CFG; implemented const prop and folding, branch folding, CSE, PRE, DAE, partial DAE, and simple forms of points-to analyses. Correctness checker for Cobalt opts, using the Simplify theorem prover. Execution engine for Cobalt opts, in the Whirlwind compiler.

Overview: Cobalt → Rhodium → ?

Overview: Caveats. You may not be able to express your opt in Cobalt: there are no interprocedural optimizations for now, and optimizations that build complicated data structures may be difficult to express. A sound Cobalt optimization may be rejected by the correctness checker. The trusted computing base (TCB) includes: the front end and code generator, the execution engine, the correctness checker, and the proofs done by hand once.

Forward Optimizations: Constant Prop (straight-line code). [Diagram: a statement y := 5, then statements that don’t define y, then a statement x := y; the statement x := y is REPLACED by x := 5.]

Forward Optimizations: Adding arbitrary control flow. [Diagram: a CFG in which y := 5 is followed, along the paths leading to x := y, by statements that don’t define y; x := y is REPLACED by x := 5.] If a statement y := 5 is followed by statements that don’t define y until a statement x := y, then transform that statement to x := 5.

Forward Optimizations: Constant prop in English. If a statement y := 5 is followed by statements that don’t define y until a statement x := y, then transform that statement to x := 5.

Forward Optimizations: Constant prop in Cobalt.
English version                          Cobalt version
if statement y := 5                      stmt(Y := C)
is followed by                           followed by
statements that don’t define y           ¬mayDef(Y)
until                                    until
statement x := y                         X := Y
then transform statement to x := 5       ⇒ X := C
The guards stmt(Y := C) and ¬mayDef(Y) are boolean expressions evaluated at nodes in the CFG.
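To make the rule concrete, here is a small sketch that applies it to straight-line code only. It is not the Cobalt execution engine: the (lhs, rhs) statement encoding and the function name apply_const_prop are hypothetical, and mayDef(Y) is simplified to "the statement assigns to Y", which assumes a language without pointers.

```python
def apply_const_prop(stmts):
    """Apply "stmt(Y := C) followed by not mayDef(Y) until X := Y => X := C"
    to a straight-line list of (lhs, rhs) assignments.

    rhs is an int constant or a variable name (hypothetical toy encoding).
    """
    known = {}   # variable -> constant C, while we are inside its region
    result = []
    for lhs, rhs in stmts:
        if isinstance(rhs, str) and rhs in known:
            rhs = known[rhs]        # X := Y  =>  X := C
        known.pop(lhs, None)        # this statement may define lhs: region ends
        if isinstance(rhs, int):
            known[lhs] = rhs        # stmt(lhs := C) opens a new region
        result.append((lhs, rhs))
    return result


before = [("y", 5), ("z", 1), ("x", "y"), ("y", "w"), ("v", "y")]
after = apply_const_prop(before)
```

Here x := y becomes x := 5 because nothing between y := 5 and x := y defines y, whereas v := y is left alone because y := w intervenes.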

Proving Optimizations Correct: Proving correctness automatically. [Diagram: the paths from y := 5 to x := y form the witnessing region, throughout which the invariant y == 5 holds; x := y is replaced by x := 5.]

Proving Optimizations Correct: Constant prop revisited. Rule: stmt(Y := C) followed by ¬mayDef(Y) until X := Y ⇒ X := C with witness Y == C. Ask a theorem prover to show: (1) a statement satisfying stmt(Y := C) establishes Y == C; (2) a statement satisfying ¬mayDef(Y) maintains Y == C; (3) the statements X := Y and X := C have the same semantics in a program state satisfying Y == C.

Proving Optimizations Correct: Generalize to any forward optimization. Rule: ψ1 followed by ψ2 until s ⇒ s’ with witness P. Ask a theorem prover to show: (1) a statement satisfying ψ1 establishes P; (2) a statement satisfying ψ2 maintains P; (3) the statements s and s’ have the same semantics in a program state satisfying P. We showed by hand once that these conditions imply correctness.
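The three obligations can be illustrated for the constant propagation rule by brute force. This sketch swaps the theorem prover for exhaustive enumeration over a tiny, hypothetical state space (three variables over a five-value domain), so it is a sanity check of the idea rather than a proof; every name in it is an assumption made for illustration.

```python
from itertools import product

VARS = ("x", "y", "z")
DOMAIN = range(-2, 3)  # small hypothetical domain, enumeration instead of proof


def all_states():
    for values in product(DOMAIN, repeat=len(VARS)):
        yield dict(zip(VARS, values))


def assign(var, expr):
    """Statement var := expr, where expr is a function of the state."""
    def run(state):
        new = dict(state)
        new[var] = expr(state)
        return new
    run.defines = var
    return run


def check_const_prop(c):
    witness = lambda s: s["y"] == c
    # (1) A statement satisfying stmt(Y := C) establishes the witness.
    enabling = assign("y", lambda s: c)
    ok1 = all(witness(enabling(s)) for s in all_states())
    # (2) Statements that do not define Y maintain the witness.
    innocuous = [assign("x", lambda s: s["y"] + 1), assign("z", lambda s: s["x"])]
    ok2 = all(witness(st(s))
              for st in innocuous if st.defines != "y"
              for s in all_states() if witness(s))
    # (3) X := Y and X := C agree in every state satisfying the witness.
    original, rewritten = assign("x", lambda s: s["y"]), assign("x", lambda s: c)
    ok3 = all(original(s) == rewritten(s) for s in all_states() if witness(s))
    return ok1, ok2, ok3


obligations = check_const_prop(1)
```

All three checks are local to a single statement, which is what keeps each query small enough for an automatic prover in the real system.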

Profitability Heuristics. Optimization correct ⇒ safe to perform any subset of the matching transformations. So far, all transformations were also profitable. In some cases, many transformations are legal, but only a few are profitable.

Profitability Heuristics: The two pieces of an optimization. Transformation pattern: defines which transformations are legal, written as ψ1 followed by ψ2 until s ⇒ s’ with witness P filtered through choose. Profitability heuristic (choose): describes which of the legal transformations to actually perform; it does not affect soundness, and it can be written in a language of the user’s choice. This way of factoring an optimization is crucial to our ability to prove optimizations sound automatically.
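One way to see why the heuristic cannot affect soundness is to have the engine intersect whatever the heuristic returns with the set of legal transformations. The sketch below assumes that design; run_optimization, only_at_end and the (location, statement) encoding of a transformation are hypothetical names, not Cobalt's actual interface.

```python
def run_optimization(legal, choose):
    """legal: transformations matched by the (proven sound) pattern.
    choose: arbitrary user-supplied profitability heuristic."""
    selected = set(choose(legal))
    return selected & set(legal)  # anything outside the legal set is ignored


# Hypothetical legal placements of x := a + b, as (location, statement) pairs.
legal = {
    ("else:1", "x := a + b"),
    ("else:2", "x := a + b"),
    ("else:end", "x := a + b"),
}


def only_at_end(candidates):
    """Toy heuristic; it also (wrongly) proposes a transformation that was never legal."""
    picked = [t for t in candidates if t[0] == "else:end"]
    return picked + [("then:0", "x := 0")]


performed = run_optimization(legal, only_at_end)
```

Because any subset of the legal transformations is safe to perform, the heuristic needs no proof of its own, even when it is buggy.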

Profitability Heuristics: Profitability heuristic example: PRE. PRE can be expressed as code duplication followed by CSE. Start from: a := ...; b := ...; if (...) { x := a + b; } else { ... } x := a + b; Step 1, code duplication: insert x := a + b; at the end of the else branch. Step 2, CSE: the final x := a + b; becomes x := x;. Step 3, self-assignment removal: delete x := x;.

Profitability Heuristics: Profitability heuristic example: PRE. [Diagram: the legal placements of x := a + b, and the one profitable placement, marked on the program a := ...; b := ...; if (...) { x := a + b; } else { ... }]

Pure Analyses: The Cobalt Language. Operates on a control flow graph. An optimization consists of: a rewrite rule; a guard to ensure appropriate conditions; a predicate condition; and filtering through the choose function. Pure analyses, such as pointer analysis, can also be written, to verify properties such as the absence of null pointer dereferences.

Pure Analyses: The Cobalt Language. Pure analyses are also possible: to verify properties, and for use by other transformations.

Pure Analyses: Constant prop revisited (again). stmt(Y := C) followed by ¬mayDef(Y) until X := Y ⇒ X := C with witness Y == C.

Pure Analyses: mayDef in Cobalt. The rule stmt(Y := C) followed by ¬mayDef(Y) until X := Y ⇒ X := C with witness Y == C depends on how mayDef is defined. [Figure: successive definitions of mayDef.] The first definition is very conservative! Can we do better? mayPntTo is a pure analysis: it computes dataflow info, but performs no transformations.

Pure Analyses: mayPntTo in Cobalt. stmt(decl X) followed by ¬stmt(... := &X) defines addrNotTaken(X) with witness “no location in the store points to X”. mayPntTo(X, Y) ≜ ¬addrNotTaken(Y).
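A minimal sketch of this pure analysis on straight-line code. The statement encoding ("decl", "addr", "assign" tuples) and the function names are hypothetical; the point is only that the analysis computes a fact at each program point and rewrites nothing.

```python
def addr_not_taken_facts(stmts):
    """Return, after each statement, the set of variables X with addrNotTaken(X).

    Assumed (hypothetical) statement forms:
      ("decl", x)         decl x        -> establishes addrNotTaken(x)
      ("addr", lhs, x)    lhs := &x     -> ends the region for x
      ("assign", lhs, e)  anything else -> maintains the current facts
    """
    facts, current = [], set()
    for st in stmts:
        if st[0] == "decl":
            current = current | {st[1]}
        elif st[0] == "addr":
            current = current - {st[2]}
        facts.append(frozenset(current))
    return facts


def may_point_to(x, y, fact):
    """mayPntTo(X, Y) holds unless addrNotTaken(Y) is known at this point."""
    return y not in fact


stmts = [("decl", "a"), ("decl", "p"), ("assign", "a", 5), ("addr", "p", "a")]
facts = addr_not_taken_facts(stmts)
```

Before p := &a nothing can point to a, so a store through p cannot define a; after it, the analysis conservatively answers that anything may point to a.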

Concluding Remarks: Expressiveness of Cobalt. Constant propagation and folding; copy propagation; common subexpression elimination; branch folding; partial redundancy elimination; loop invariant code motion; partial dead assignment elimination.

Concluding Remarks: Future work. Improving expressiveness: interprocedural optimizations; one-to-many and many-to-many transformations. Inferring the witness. Generating a specialized compiler binary from the Cobalt sources.

Concluding Remarks: Summary and Conclusion. Optimizations written in a domain-specific language can be proven correct automatically. The correctness checker found several subtle bugs in Cobalt optimizations. A good step towards proving compilers correct automatically.