Automatically Proving the Correctness of Compiler Optimizations Sorin Lerner Todd Millstein Craig Chambers University of Washington.

Presentation transcript:

Automatically Proving the Correctness of Compiler Optimizations Sorin Lerner Todd Millstein Craig Chambers University of Washington

Goal: correct compilers. The compiler is usually part of the trusted computing base. “But I use gcc, and it works great!”

gcc-bugs mailing list. Searched for “incorrect” and “wrong” in the gcc-bugs mailing list. Some of the results: c/9525: incorrect code generation on SSE2 intrinsics; target/7336: [ARM] With -Os option, gcc incorrectly computes the elimination offset; optimization/9325: wrong conversion of constants: (int)(float)(int) (INT_MAX); optimization/6537: For -O (but not -O2 or -O0) incorrect assembly is generated; optimization/6891: G++ generates incorrect code when -Os is used; optimization/8613: [3.2/3.3/3.4 regression] -O2 optimization generates wrong code; target/9732: PPC32: Wrong code with -O2 -fPIC; c/8224: Incorrect joining of signed and unsigned division; … And this is only for February 2003! On a mature compiler!

Testing. [Diagram: the compiler turns Source into Compiled Prog; the compiled program is run on an input and its output is compared by DIFF against the expected output.] No correctness guarantees: neither for the compiled prog nor for the compiler. To get benefits, must: run over many inputs; compile many test cases.

Verify each compilation. [Diagram: the compiler turns Source into Compiled Prog; a Semantic DIFF compares the two.] Translation validation [Pnueli et al 98, Necula 00]. Credible compilation [Rinard 99]. Compiler can still have bugs. Compile time increases. “Semantic Diff” is hard.

Proving the whole compiler correct. [Diagram: compiler, Source, Compiled Prog, Correctness checker.]

Proving the whole compiler correct. Option 1: Prove compiler correct by hand. Proofs are long… And hard. Compilers are proven correct as written on paper. What about the implementation? [Diagram: the paper proof beside the compiler and the correctness checker, with the connection between them labelled “Link?”]

Our approach: prove compiler correct automatically. [Diagram: the compiler is fed to a Correctness checker built on an Automatic Theorem Prover.]

This seems really hard! [Diagram: the Automatic Theorem Prover set against the task of proving the compiler correct, contrasting the complexity that an automatic theorem prover can handle with the complexity of proving a compiler correct.]

Making the problem easier. [Diagram: Automatic Theorem Prover; task of proving compiler correct.]

Making the problem easier. Only prove optimizer correct. Trust front-end and code-generator. [Diagram: Automatic Theorem Prover; task of proving optimizer correct.]

Making the problem easier. Write optimizations in Cobalt, a domain-specific language. [Diagram: Automatic Theorem Prover; task of proving optimizer correct.]

Making the problem easier. Write optimizations in Cobalt, a domain-specific language. Separate correctness from profitability. [Diagram: Automatic Theorem Prover; task of proving optimizer correct.]

Making the problem easier. Write optimizations in Cobalt, a domain-specific language. Separate correctness from profitability. Factor out the hard and common parts of the proof, and prove them once by hand. [Diagram: Automatic Theorem Prover; task of proving optimizer correct.]

Results. Cobalt language: realistic C-like IL; implemented const prop and folding, branch folding, CSE, PRE, DAE, partial DAE, and simple forms of points-to analyses. Correctness checker for Cobalt opts: using the Simplify theorem prover. Execution engine for Cobalt opts: in the Whirlwind compiler.

Caveats. May not be able to express your opt in Cobalt: no interprocedural optimizations for now; optimizations that build complicated data structures may be difficult to express. A sound Cobalt optimization may be rejected by the correctness checker. Trusted computing base (TCB) includes: front-end and code-generator, execution engine, correctness checker, proofs done by hand once.

Outline. Overview. Forward optimizations (see paper for backwards): example: constant propagation; strategy for proving forward optimizations sound. Profitability heuristics. Pure analyses.

Constant Prop (straight-line code). Pattern: statement y := 5, then statements that don’t define y, then statement x := y. REPLACE x := y with x := 5.

Adding arbitrary control flow. If statement y := 5 is followed by statements that don’t define y until statement x := y, then transform statement to x := 5.

Constant prop in English. If statement y := 5 is followed by statements that don’t define y until statement x := y, then transform statement to x := 5.

Constant prop in Cobalt. English version: if statement y := 5 is followed by statements that don’t define y until statement x := y, then transform statement to x := 5. Cobalt version: stmt(Y := C) followed by ¬mayDef(Y) until X := Y ⇒ X := C. The guards stmt(Y := C) and ¬mayDef(Y) are boolean expressions evaluated at nodes in the CFG.
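As a rough illustration of what running such a pattern over a CFG could look like, the sketch below treats "Y == C" facts as a forward dataflow problem and rewrites X := Y to X := C when the fact holds on every path to the node. It is not the Whirlwind execution engine: the tuple encoding of statements, the names `transfer` and `constant_prop`, and the assumption that a statement may define only the variable it assigns (no pointers) are all made up for this example.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical statement encoding, not Cobalt's actual IL:
#   ("const", x, c)  stands for  x := c
#   ("copy",  x, y)  stands for  x := y
#   ("other", x)     stands for any other statement that may define x
Stmt = Tuple
Fact = Tuple[str, int]  # (Y, C) means "Y == C holds here"


def transfer(stmt: Stmt, facts: Set[Fact]) -> Set[Fact]:
    """Kill facts about the defined variable, then add Y == C for Y := C."""
    defined = stmt[1]
    out = {(y, c) for (y, c) in facts if y != defined}
    if stmt[0] == "const":
        out.add((stmt[1], stmt[2]))
    return out


def constant_prop(stmts: Dict[int, Stmt],
                  succs: Dict[int, List[int]],
                  entry: int) -> Dict[int, Stmt]:
    """Rewrite X := Y to X := C where Y == C holds on every path to the node."""
    preds: Dict[int, List[int]] = {n: [] for n in stmts}
    for n, targets in succs.items():
        for t in targets:
            preds[t].append(n)

    # Start every node's output at "all facts" and shrink to a fixpoint.
    universe = {(s[1], s[2]) for s in stmts.values() if s[0] == "const"}
    in_facts: Dict[int, Set[Fact]] = {n: set() for n in stmts}
    out_facts: Dict[int, Set[Fact]] = {n: set(universe) for n in stmts}

    changed = True
    while changed:
        changed = False
        for n in stmts:
            if n == entry or not preds[n]:
                new_in: Set[Fact] = set()  # nothing is known at program entry
            else:
                new_in = set.intersection(*(out_facts[p] for p in preds[n]))
            new_out = transfer(stmts[n], new_in)
            if new_in != in_facts[n] or new_out != out_facts[n]:
                in_facts[n], out_facts[n] = new_in, new_out
                changed = True

    result = dict(stmts)
    for n, stmt in stmts.items():
        if stmt[0] == "copy":
            consts = [c for (y, c) in in_facts[n] if y == stmt[2]]
            if len(consts) == 1:
                result[n] = ("const", stmt[1], consts[0])
    return result
```

On a diamond-shaped CFG where y := 5 reaches x := y along both branches, the copy is rewritten; if either branch contains a statement that may define y, it is left alone.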

Outline. Overview. Forward optimizations (see paper for backwards): example: constant propagation; strategy for proving forward optimizations sound. Profitability heuristics. Pure analyses.

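Proving correctness automatically. [Diagram: the original program (y := 5 … x := y) beside the transformed program (y := 5 … x := 5); the witnessing region between y := 5 and the rewritten statement carries the invariant y == 5.]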

Constant prop revisited. stmt(Y := C) followed by ¬mayDef(Y) until X := Y ⇒ X := C with witness Y == C. Ask a theorem prover to show: 1. A statement satisfying stmt(Y := C) establishes Y == C. 2. A statement satisfying ¬mayDef(Y) maintains Y == C. 3. The statements X := Y and X := C have the same semantics in a program state satisfying Y == C.
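One way to read the three obligations more formally is sketched below. The notation is an assumption made for this note, not the slides' own: σ and σ′ are program states, σ →[st] σ′ means that executing statement st in σ yields σ′, and σ(Y) is the value of Y in σ. The block assumes the amsmath package.

```latex
\begin{align*}
\text{(1)}\quad & \forall \sigma, \sigma'.\;
  \sigma \xrightarrow{\,Y := C\,} \sigma' \implies \sigma'(Y) = C \\
\text{(2)}\quad & \forall \sigma, \sigma', \mathit{st}.\;
  \neg\mathit{mayDef}(Y) \text{ holds at } \mathit{st}
  \;\wedge\; \sigma(Y) = C
  \;\wedge\; \sigma \xrightarrow{\,\mathit{st}\,} \sigma'
  \implies \sigma'(Y) = C \\
\text{(3)}\quad & \forall \sigma, \sigma'.\;
  \sigma(Y) = C \implies
  \bigl(\sigma \xrightarrow{\,X := Y\,} \sigma'
  \iff \sigma \xrightarrow{\,X := C\,} \sigma'\bigr)
\end{align*}
```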

Generalize to any forward optimization. ψ1 followed by ψ2 until s ⇒ s’ with witness P. Ask a theorem prover to show: 1. A statement satisfying ψ1 establishes P. 2. A statement satisfying ψ2 maintains P. 3. The statements s and s’ have the same semantics in a program state satisfying P. We showed by hand once that these conditions imply correctness.
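To make the shape of the three obligations concrete, here is a small sketch that checks them by brute-force enumeration over a toy language. This swaps exhaustive enumeration in for the theorem prover, so it only illustrates what is being asked, not how the correctness checker discharges it. The two-variable state space, the statement encoding, and the name `check_obligations` are all assumptions of the example, and the guards here look only at the statement rather than at a CFG node.

```python
from itertools import product

# Hypothetical toy language: two variables, three values, two statement forms.
VARS = ("x", "y")
VALS = (0, 1, 2)


def all_states():
    """Enumerate every assignment of values to the variables."""
    for vals in product(VALS, repeat=len(VARS)):
        yield dict(zip(VARS, vals))


def all_stmts():
    """Enumerate every statement of the toy language."""
    for v in VARS:
        for c in VALS:
            yield ("const", v, c)   # v := c
        for w in VARS:
            yield ("copy", v, w)    # v := w


def execute(stmt, state):
    """Run one statement and return the resulting state."""
    kind, target, source = stmt
    new_state = dict(state)
    new_state[target] = source if kind == "const" else state[source]
    return new_state


def check_obligations(psi1, psi2, witness, s, s_new):
    """Return (establishes, maintains, same_semantics) for a forward pattern."""
    establishes = all(
        witness(execute(stmt, state))
        for stmt in all_stmts() if psi1(stmt)
        for state in all_states()
    )
    maintains = all(
        witness(execute(stmt, state))
        for stmt in all_stmts() if psi2(stmt)
        for state in all_states() if witness(state)
    )
    same_semantics = all(
        execute(s, state) == execute(s_new, state)
        for state in all_states() if witness(state)
    )
    return establishes, maintains, same_semantics


# Constant propagation instantiated with Y = y, C = 1, X = x.
sound = check_obligations(
    psi1=lambda stmt: stmt == ("const", "y", 1),
    psi2=lambda stmt: stmt[1] != "y",
    witness=lambda state: state["y"] == 1,
    s=("copy", "x", "y"),
    s_new=("const", "x", 1),
)

# A buggy variant whose "until" guard forgets to rule out definitions of y.
buggy = check_obligations(
    psi1=lambda stmt: stmt == ("const", "y", 1),
    psi2=lambda stmt: True,
    witness=lambda state: state["y"] == 1,
    s=("copy", "x", "y"),
    s_new=("const", "x", 1),
)
```

In this toy setting the correct pattern passes all three checks, while the buggy variant fails the second one, since a statement such as y := 0 does not maintain y == 1.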

Outline. Overview. Forward optimizations (see paper for backwards). Profitability heuristics. Pure analyses.

Profitability heuristics. Optimization correct ⇒ safe to perform any subset of the matching transformations. So far, all transformations were also profitable. In some cases, many transformations are legal, but only a few are profitable.

The two pieces of an optimization. ψ1 followed by ψ2 until s ⇒ s’ with witness P filtered through choose. Transformation pattern: defines which transformations are legal. Profitability heuristic: describes which of the legal transformations to actually perform; does not affect soundness; can be written in a language of the user’s choice. This way of factoring an optimization is crucial to our ability to prove optimizations sound automatically.
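The reason the heuristic cannot affect soundness can be shown in a few lines: whatever the heuristic returns is intersected with the legal set before anything is performed. The sketch below is only an illustration of that factoring; `select_transformations`, the (node id, statement) encoding, and the sample heuristic are hypothetical names chosen for this note.

```python
from typing import Callable, Hashable, List


def select_transformations(
    legal: List[Hashable],
    choose: Callable[[List[Hashable]], List[Hashable]],
) -> List[Hashable]:
    """Keep only those choices that the transformation pattern marked legal."""
    chosen = set(choose(list(legal)))
    return [t for t in legal if t in chosen]


# Hypothetical legal placements, as (node id, statement text) pairs.
legal = [(3, "x := a + b"), (5, "x := a + b"), (7, "x := a + b")]


def pick_else_branch(candidates):
    # An arbitrary heuristic; it may even return something that is not legal.
    return [(5, "x := a + b"), (99, "x := 0")]


performed = select_transformations(legal, pick_else_branch)
```

Because the result is always a subset of the legal transformations, the heuristic can be arbitrary code and still never cause an illegal rewrite to be performed.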

Profitability heuristic example: PRE PRE as code duplication followed by CSE

Profitability heuristic example: PRE. PRE as code duplication followed by CSE. Code: a := ...; b := ...; if (...) { a := ...; x := a + b; } else { ... } x := a + b; First step: code duplication.

Profitability heuristic example: PRE. PRE as code duplication followed by CSE. Code after code duplication: a := ...; b := ...; if (...) { a := ...; x := a + b; } else { ... x := a + b; } x := a + b; CSE then turns the final x := a + b into x := x, and self-assignment removal deletes it.

Profitability heuristic example: PRE. Code: a := ...; b := ...; if (...) { a := ...; x := a + b; } else { ... } x := a + b; [Diagram: the legal placements of x := a + b are marked, with the profitable placement highlighted.]

Outline. Overview. Forward optimizations (see paper for backwards). Profitability heuristics. Pure analyses.

Constant prop revisited (again). stmt(Y := C) followed by ¬mayDef(Y) until X := Y ⇒ X := C with witness Y == C.

mayDef in Cobalt. stmt(Y := C) followed by ¬mayDef(Y) until X := Y ⇒ X := C with witness Y == C.

mayDef in Cobalt. Very conservative! Can we do better? stmt(Y := C) followed by ¬mayDef(Y) until X := Y ⇒ X := C with witness Y == C.

mayDef in Cobalt. mayPntTo is a pure analysis. It computes dataflow info, but performs no transformations. stmt(Y := C) followed by ¬mayDef(Y) until X := Y ⇒ X := C with witness Y == C.

mayPntTo in Cobalt. mayPntTo(X,Y) ≜ ¬addrNotTaken(Y). stmt(decl X) followed by ¬stmt(... := &X) defines addrNotTaken(X) with witness “no location in the store points to X”.
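A minimal sketch of this pure analysis, restricted to straight-line code for brevity, is given below. It computes which variables satisfy addrNotTaken before each statement and derives mayPntTo from it, without transforming anything. The statement tuples and the function names `addr_not_taken` and `may_point_to` are assumptions for illustration; a real engine would run this over the CFG.

```python
from typing import List, Set, Tuple

# Hypothetical statement encoding for this sketch:
#   ("decl", x)       stands for  decl x
#   ("addr", p, x)    stands for  p := &x
#   anything else is treated as a statement that takes no address
Stmt = Tuple


def addr_not_taken(stmts: List[Stmt]) -> List[Set[str]]:
    """For each statement index, the variables declared earlier whose
    address has not been taken by any statement in between."""
    facts: List[Set[str]] = []
    current: Set[str] = set()
    for stmt in stmts:
        facts.append(set(current))      # facts holding just before stmt
        if stmt[0] == "decl":
            current.add(stmt[1])        # decl X establishes addrNotTaken(X)
        elif stmt[0] == "addr":
            current.discard(stmt[2])    # ... := &X ends the region for X
    return facts


def may_point_to(facts_here: Set[str], x: str, y: str) -> bool:
    """mayPntTo(X, Y) is defined as the negation of addrNotTaken(Y)."""
    return y not in facts_here


program = [
    ("decl", "x"),
    ("decl", "y"),
    ("decl", "p"),
    ("addr", "p", "x"),   # p := &x
    ("store", "p", 3),    # *p := 3
]
facts = addr_not_taken(program)
```

At the final store, the analysis reports that p may point to x (its address was taken) but not to y, so a client such as mayDef could conclude that *p := 3 does not define y.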

Future work. Improving expressiveness: interprocedural optimizations; one-to-many and many-to-many transformations. Inferring the witness. Generate specialized compiler binary from the Cobalt sources.

Summary and Conclusion. Optimizations written in a domain-specific language can be proven correct automatically. Our correctness checker found several subtle bugs in Cobalt optimizations. A good step towards proving compilers correct automatically.