Automatically Checking the Correctness of Program Analyses and Transformations
Compilers have many bugs
Searched for “incorrect” and “wrong” in the gcc-bugs mailing list. Some of the results:
  [Bug middle-end/19650] New: miscompilation of correct code
  [Bug c++/19731] arguments incorrectly named in static member specialization
  [Bug rtl-optimization/13300] Variable incorrectly identified as a biv
  [Bug rtl-optimization/16052] strength reduction produces wrong code
  [Bug tree-optimization/19633] local address incorrectly thought to escape
  [Bug target/19683] New: MIPS wrong-code for 64-bit multiply
  [Bug c++/19605] Wrong member offset in inherited classes
  [Bug java/19295] [4.0 regression] Incorrect bytecode produced for bitwise AND
  …
Total of 545 matches… And this is only for January 2005! On a mature compiler!
Compiler bugs cause problems [Figure: source program (if (…) { x := …; } else { y := …; } …;) → Compiler → Exec] They lead to buggy executables. They rule out having strong guarantees about executables.
The focus: compiler optimizations. A key part of any optimizing compiler (original program → optimization → optimized program). Hard to get optimizations right: lots of infrastructure-dependent details; there are many corner cases in each optimization; there are many optimizations and they interact in unexpected ways; it is hard to test all these corner cases and all these interactions.
Goals. Make it easier to write compiler optimizations: a student in an undergrad compiler course should be able to write optimizations. Provide strong guarantees about the correctness of optimizations: automatically (no user intervention at all) and statically (before the opts are even run once). Be expressive enough for realistic optimizations.
The Rhodium work A domain-specific language for writing optimizations: Rhodium A correctness checker for Rhodium optimizations An execution engine for Rhodium optimizations Implemented and checked the correctness of a variety of realistic optimizations
Broader implications Many other kinds of program manipulators: code refactoring tools, static checkers My work is about program analyses and transformations, the core of any program manipulator Enables safe extensible program manipulators Allow end programmers to easily and safely extend program manipulators Improve programmer productivity
Outline Introduction Overview of the Rhodium system Writing Rhodium optimizations Checking Rhodium optimizations Evaluation
Rhodium system overview. Written by me: the Rhodium execution engine and the checker. Written by the programmer: Rhodium optimizations (Rdm Opt).
Rhodium system overview [Figure: each Rhodium optimization (Rdm Opt) goes through the checker; the Rhodium optimizations then run on the Rhodium execution engine inside the compiler, which turns the source program (if (…) { x := …; } else { y := …; } …;) into an executable (Exec).]
The technical problem. Tension between expressiveness and automated correctness checking. Challenge: develop techniques that go a long way in terms of expressiveness and that still allow correctness to be checked automatically.
Contribution: three techniques [Figure: the checker turns a Rhodium optimization (Rdm Opt) into a verification task, which is handed to an automatic theorem prover.] Verification task: show that for any original program, behavior of original program = behavior of optimized program.
Contribution: three techniques. Rhodium is declarative: declare intent using rules; the execution engine takes care of the rest. First, Rhodium is a declarative language. There are no loops, no branches, no program counter. Programmers declare their intent using rules, and then the execution engine takes care of the rest. These rules are similar to what are called flow functions, which are the way that compiler writers typically think about optimizations. As a result, these rules provide a familiar way of expressing optimizations while shielding the programmer from infrastructure-dependent details. Because these rules talk about only one statement at a time, they are much easier to reason about automatically, and so the verification task becomes simpler.
Contribution: three techniques. Rhodium is declarative. Factor out heuristics: legal transformations vs. profitable transformations. [Figure: the optimization is split into the part that must be reasoned about (shown in bold) and the heuristics not affecting correctness.] Next, I noticed that a large part of these optimizations are profitability heuristics that figure out whether or not an optimization will improve performance. These heuristics do not affect correctness. For example, in inlining, there is a lot of computation that goes on in order to determine if a function call should be inlined. But all this computation does not in any way affect whether or not inlining the function is correct. Rhodium forces the programmer to factor out profitability heuristics from the rest of the optimization. First, the programmer declares what transformations are legal, and this is the bolded part here. This part is simple even for complicated optimizations. Then, using an arbitrary language, the programmer can implement profitability heuristics that pick which of these legal transformations to actually perform. Because these heuristics do not affect correctness, the checker only needs to reason about the small bolded part here, and so the verification task becomes simpler.
Contribution: three techniques. Rhodium is declarative. Factor out heuristics. Split verification task: opt-dependent vs. opt-independent; only the opt-dependent part goes to the automatic theorem prover. Result: expressive language, automated correctness checking.
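MustPointTo analysis. Example: a = &b; c = a; d = *c. After a = &b, a points to b. After c = a, both a and c point to b. So d = *c can be rewritten to d = b.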
MustPointTo info in Rhodium. After a = &b: mustPointTo(a, b). After c = a: mustPointTo(a, b), mustPointTo(c, b). These facts are available on the edge into d = *c.
MustPointTo info in Rhodium
define fact mustPointTo(X:Var,Y:Var) with meaning ⟦ X == &Y ⟧
Fact correct on edge if: whenever program execution reaches edge, meaning of fact evaluates to true in the program state
Example: after a = &b: mustPointTo(a, b); after c = a: mustPointTo(a, b), mustPointTo(c, b); then d = *c
Propagating facts
if currStmt == [X = &Y] then mustPointTo(X,Y)@out
Example: a = &b matches with X = a, Y = b, so mustPointTo(a, b) holds on its outgoing edge.
Propagating facts
if mustPointTo(X,Y)@in ∧ currStmt == [Z = X] then mustPointTo(Z,Y)@out
Example: with mustPointTo(a, b) on the incoming edge, c = a matches with X = a, Y = b, Z = c, so mustPointTo(c, b) holds on its outgoing edge.
Transformations
if mustPointTo(X,Y)@in ∧ currStmt == [Z = *X] then transform to [Z = Y]
Example: with mustPointTo(c, b) on the incoming edge, d = *c is transformed to d = b.
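The three rules above can be tried out on the running example with a minimal Python sketch. This is not Rhodium syntax or the Rhodium engine: the tuple encoding of statements, the function names, and the choice to drop facts about a variable when it is assigned (standing in for a preservation rule the slides do not show) are all assumptions made for illustration.

```python
# Hypothetical statement encoding (not Rhodium's):
#   ("addr", x, y)  means  x = &y
#   ("copy", z, x)  means  z = x
#   ("load", z, x)  means  z = *x
# A fact mustPointTo(x, y) is represented as the pair (x, y).

def flow(stmt, facts_in):
    """Propagation rules: compute the facts on the outgoing edge."""
    kind, lhs, rhs = stmt
    # Assumption of this sketch: an assignment to lhs drops old facts about lhs.
    out = {(x, y) for (x, y) in facts_in if x != lhs}
    if kind == "addr":
        # if currStmt == [X = &Y] then mustPointTo(X,Y)@out
        out.add((lhs, rhs))
    elif kind == "copy":
        # if mustPointTo(X,Y)@in and currStmt == [Z = X] then mustPointTo(Z,Y)@out
        out |= {(lhs, y) for (x, y) in facts_in if x == rhs}
    return out

def transform(stmt, facts_in):
    """Transformation rule: rewrite [Z = *X] to [Z = Y] when mustPointTo(X,Y)@in."""
    kind, lhs, rhs = stmt
    if kind == "load":
        targets = sorted(y for (x, y) in facts_in if x == rhs)
        if targets:
            return ("copy", lhs, targets[0])
    return stmt

def optimize(program):
    """Walk a straight-line program, transforming each statement in turn."""
    facts = set()
    result = []
    for stmt in program:
        result.append(transform(stmt, facts))
        facts = flow(stmt, facts)
    return result

program = [("addr", "a", "b"), ("copy", "c", "a"), ("load", "d", "c")]
optimized = optimize(program)
print(optimized)  # the last statement becomes ("copy", "d", "b"), i.e. d = b
```

The sketch only handles straight-line code; branches and loops need the iterative treatment described under the semantics of a Rhodium opt.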
Profitability heuristics. Legal transformations (identified by the Rhodium rules) → profitability heuristics → subset of legal transformations (actually performed).
Profitability heuristic example 1 Inlining Many heuristics to determine when to inline a function compute function sizes, estimate code-size increase, estimate performance benefit maybe even use AI techniques to make the decision However, these heuristics do not affect the correctness of inlining They are just used to choose which of the correct set of transformations to perform
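The split between legality and profitability can be sketched in a few lines of Python. Everything here is hypothetical: the `InlineCandidate` record, the thresholds, and the call-site names are made up for illustration and are not part of Rhodium. The point is only that the heuristic filters a set that was already established to be legal, so whatever it computes cannot affect correctness.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InlineCandidate:
    """A call site where inlining is legal (hypothetical record)."""
    call_site: str
    callee_size: int   # e.g. number of instructions in the callee
    call_count: int    # e.g. estimated execution count of the call

def choose_profitable(legal, max_size=20, min_calls=100):
    """Profitability heuristic: arbitrary logic, but it can only select from `legal`."""
    return [c for c in legal
            if c.callee_size <= max_size or c.call_count >= min_calls]

# Legal transformations, as they might come out of the checked rules.
legal = [
    InlineCandidate("main:12", callee_size=8, call_count=3),
    InlineCandidate("loop:40", callee_size=150, call_count=5000),
    InlineCandidate("init:7", callee_size=90, call_count=1),
]

performed = choose_profitable(legal)
print([c.call_site for c in performed])  # small callee and hot call are picked
```

Replacing `choose_profitable` with any other selection function changes performance, not correctness, as long as it returns a subset of `legal`.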
Profitability heuristic example 2 Partial redundancy elimination (PRE) a := ...; b := ...; if (...) { x := a + b; } else { ... } x := a + b;
Profitability heuristic example 2
PRE as code duplication followed by CSE
Start: a := ...; b := ...; if (...) { x := a + b; } else { ... } x := a + b;
Code duplication: the trailing x := a + b is copied into both branches: if (...) { x := a + b; x := a + b; } else { ...; x := a + b; }
CSE: the copy in the then-branch becomes x := x
Self-assignment removal: x := x is deleted, leaving if (...) { x := a + b; } else { ...; x := a + b; }
Profitability heuristic example 2 Legal placements of x := a + b Profitable placement a := ...; b := ...; if (...) { x := a + b; } else { ... }
Semantics of a Rhodium opt. Run propagation rules in a loop until there are no more changes (optimistic iterative analysis). Then run transformation rules. Then run profitability heuristics. For better precision, combine propagation rules and transformation rules using our previous composition framework [POPL 02].
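The three phases can be sketched in Python on a small control-flow graph with a loop. This is an illustration under stated assumptions, not the Rhodium execution engine: the statement encoding, the use of `None` as the optimistic starting value, intersection as the merge operator, and all function names are choices made for this sketch, and the composition framework mentioned above is not modeled.

```python
TOP = None  # optimistic starting value: "every fact holds"

def flow(stmt, facts_in):
    """Propagation rules for mustPointTo; facts are pairs (x, y) meaning x == &y."""
    if facts_in is TOP:
        return TOP
    kind, lhs, rhs = stmt
    out = {(x, y) for (x, y) in facts_in if x != lhs}  # assignment drops old facts
    if kind == "addr":                                  # x = &y
        out.add((lhs, rhs))
    elif kind == "copy":                                # z = x
        out |= {(lhs, y) for (x, y) in facts_in if x == rhs}
    return out

def meet(values):
    """Merge facts from several predecessors: keep what holds on all of them."""
    known = [v for v in values if v is not TOP]
    return set.intersection(*known) if known else TOP

def analyze(stmts, preds, entry):
    """Phase 1: run the propagation rules until nothing changes."""
    ins = {n: TOP for n in stmts}
    outs = {n: TOP for n in stmts}
    changed = True
    while changed:
        changed = False
        for n in sorted(stmts):
            new_in = set() if n == entry else meet([outs[p] for p in preds[n]])
            new_out = flow(stmts[n], new_in)
            if new_in != ins[n] or new_out != outs[n]:
                ins[n], outs[n] = new_in, new_out
                changed = True
    return ins

def legal_transformations(stmts, ins):
    """Phase 2: rewrite [Z = *X] to [Z = Y] wherever mustPointTo(X,Y) holds on entry."""
    legal = {}
    for n, (kind, lhs, rhs) in stmts.items():
        if kind == "load" and ins[n] is not TOP:
            targets = sorted(y for (x, y) in ins[n] if x == rhs)
            if targets:
                legal[n] = ("copy", lhs, targets[0])
    return legal

def run(stmts, preds, entry, heuristic=lambda node, new_stmt: True):
    """Phase 3: the heuristic picks which legal transformations to perform."""
    legal = legal_transformations(stmts, analyze(stmts, preds, entry))
    return {n: legal[n] if n in legal and heuristic(n, legal[n]) else s
            for n, s in stmts.items()}

# 0: a = &b;  1: c = a;  2: d = *c (loops back to 1);  3: e = *a
stmts = {0: ("addr", "a", "b"), 1: ("copy", "c", "a"),
         2: ("load", "d", "c"), 3: ("load", "e", "a")}
preds = {0: [], 1: [0, 2], 2: [1], 3: [2]}

all_applied = run(stmts, preds, entry=0)
only_loop = run(stmts, preds, entry=0, heuristic=lambda node, new_stmt: node == 2)
print(all_applied)
print(only_loop)
```

In this example the optimistic start matters: node 1 has the loop back edge as a predecessor, and starting that edge from the empty set instead of `TOP` would discard mustPointTo(a, b) at the merge and leave both loads untransformed.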
More facts
define fact mustNotPointTo(X:Var,Y:Var) with meaning ⟦ X ≠ &Y ⟧
define fact doesNotPointIntoHeap(X:Var) with meaning ⟦ ∃ Y:Var . X == &Y ⟧
define fact hasConstantValue(X:Var,C:Const) with meaning ⟦ X == C ⟧
More rules
if currStmt == [X = *A] ∧ mustNotPointToHeap(A)@in ∧ ∀ B:Var . mayPointTo(A,B)@in ⇒ mustNotPointTo(B,Y) then mustNotPointTo(X,Y)@out
if currStmt == [Y = I + BE] ∧ varEqualArray(X,A,J)@in ∧ equalsPlus(J,I,BE)@in ∧ ¬mayDef(X) ∧ ¬mayDefArray(A) ∧ unchanged(BE) then varEqualArray(X,A,Y)@out
More in Rhodium. More powerful pointer analyses: heap summaries. Analyses across procedures: interprocedural analyses. Analyses that don’t care about the order of statements: flow-insensitive analyses.
Rhodium correctness checker [Figure: within the overall system (source program → compiler with Rhodium execution engine → Exec), the focus is now on the checker, which takes a Rhodium optimization (Rdm Opt) and uses an automatic theorem prover.]
Rhodium correctness checker. A Rhodium optimization consists of facts and rules (define fact …; if … then …; if … then transform …) plus profitability heuristics. Only the facts and rules go to the checker, which uses an automatic theorem prover.
Rhodium correctness checker. Opt-dependent: for each Rhodium optimization (define fact …; if … then …; if … then transform …), the checker's VCGen generates local VCs, which are sent to the automatic theorem prover. Opt-independent: a lemma, with its proof, stating that for any Rhodium opt, if the local VCs are true, then the opt is correct.
Local verification conditions
define fact mustPointTo(X,Y) with meaning ⟦ X == &Y ⟧
Local VCs (generated and proven automatically):
Propagation rule: if mustPointTo(X,Y)@in ∧ currStmt == [Z = X] then mustPointTo(Z,Y)@out
  Assume: all incoming facts are correct
  Show: propagated fact is correct
Transformation rule: if mustPointTo(X,Y)@in ∧ currStmt == [Z = *X] then transform to [Z = Y]
  Assume: all incoming facts are correct
  Show: original stmt and transformed stmt have same behavior
Local correctness of prop. rules
define fact mustPointTo(X,Y) with meaning ⟦ X == &Y ⟧
Rule: if mustPointTo(X,Y)@in ∧ currStmt == [Z = X] then mustPointTo(Z,Y)@out
Local VC (generated and proven automatically):
  Assume (all incoming facts are correct): ⟦ X == &Y ⟧(in) ∧ out = step(in, [Z = X])
  Show (propagated fact is correct): ⟦ Z == &Y ⟧(out)
[Figure: in state in, X points to Y; after Z := X, does Z point to Y in state out?]
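To make the shape of this local VC concrete, the Python sketch below checks it by brute force over a tiny finite model. It is a stand-in for intuition only: the real checker sends the VC to an automatic theorem prover and covers all program states, whereas this sketch enumerates three variables and four possible values. The state encoding and every name in it are assumptions of the sketch.

```python
from itertools import product

VARS = ["a", "b", "c"]
# A value is either the address of a variable or the integer 0 (tiny model).
VALUES = [("addr", v) for v in VARS] + [0]

def all_states():
    """Enumerate every assignment of values to variables."""
    for combo in product(VALUES, repeat=len(VARS)):
        yield dict(zip(VARS, combo))

def must_point_to(state, x, y):
    """Meaning of mustPointTo(X, Y): X == &Y in the given state."""
    return state[x] == ("addr", y)

def step_copy(state, z, x):
    """out = step(in, [Z = X])"""
    out = dict(state)
    out[z] = state[x]
    return out

def local_vc_holds(conclusion):
    """Assume mustPointTo(X,Y) on the incoming edge and out = step(in, [Z = X]);
    show the fact named by `conclusion(X, Y, Z)` holds in `out`."""
    for x, y, z in product(VARS, repeat=3):
        for state_in in all_states():
            if not must_point_to(state_in, x, y):
                continue  # the assumption does not hold; nothing to show
            state_out = step_copy(state_in, z, x)
            p, q = conclusion(x, y, z)
            if not must_point_to(state_out, p, q):
                return False
    return True

# The rule from the slide propagates mustPointTo(Z, Y).
rule_ok = local_vc_holds(lambda x, y, z: (z, y))
# A deliberately wrong variant that propagates mustPointTo(Y, Z) instead.
rule_broken = local_vc_holds(lambda x, y, z: (y, z))
print(rule_ok, rule_broken)
```

The check looks at one statement and its two adjacent states only, which is what makes it local: no reasoning about whole programs or about the fixpoint iteration is needed here.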
Dimensions of evaluation Ease of use Correctness guarantees Usefulness of the checker Expressiveness
Ease of use. Joao Dias: third-year graduate student in compilers at Harvard; less than 45 mins to write CSE and copy prop. Erika Rice, summer 2004: only knowledge of compilers was one undergrad class; started writing Rhodium optimizations in a few days. Simple interface to the compiler’s structures: pattern matching; “flow functions” familiar to compiler 101 students.
Correctness guarantees. Once checked, optimizations are guaranteed to be correct. Caveat: trusted computing base: execution engine; checker implementation; proofs done by hand once by me. Adding a new optimization does not increase the size of the trusted computing base.
Usefulness of the checker
Found subtle bugs in my initial implementation of various optimizations.
define fact equals(X:Var, E:Expr) with meaning ⟦ X == E ⟧
Initial rule: if currStmt == [X = E] then equals(X,E)@out
  Bug: for x = x + 1, the rule propagates equals(x, x + 1), which does not hold after the statement.
First fix: if currStmt == [X = E] ∧ “X does not appear in E” then equals(X,E)@out
  Bug: for x = *y + 1, the rule propagates equals(x, *y + 1), which does not hold after the statement if y points to x.
Second fix: if currStmt == [X = E] ∧ “E does not use X” then equals(X,E)@out
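Both bugs can be reproduced with a small brute-force check in Python. This is a sketch for intuition, not the Rhodium checker: the expression encoding, the tiny state space, and in particular the `may_use` function are assumptions. `may_use` approximates “E does not use X” by treating every dereference as a possible read of X, which is more conservative than a rule with pointer information would need to be.

```python
from itertools import product

def evaluate(expr, state):
    """Evaluate an expression; return None if it is ill-typed in this state."""
    kind = expr[0]
    if kind == "const":
        return expr[1]
    if kind == "var":
        return state[expr[1]]
    if kind == "deref":
        ptr = state[expr[1]]
        return state[ptr[1]] if isinstance(ptr, tuple) else None
    if kind == "plus":
        left, right = evaluate(expr[1], state), evaluate(expr[2], state)
        if isinstance(left, int) and isinstance(right, int):
            return left + right
        return None
    raise ValueError(kind)

def appears_in(var, expr):
    """Syntactic check: is `var` written anywhere in `expr`?"""
    if expr[0] in ("var", "deref"):
        return expr[1] == var
    if expr[0] == "plus":
        return appears_in(var, expr[1]) or appears_in(var, expr[2])
    return False

def may_use(var, expr):
    """Conservative check (assumption of this sketch): any deref may read `var`."""
    if expr[0] == "deref":
        return True
    if expr[0] == "var":
        return expr[1] == var
    if expr[0] == "plus":
        return may_use(var, expr[1]) or may_use(var, expr[2])
    return False

# States: x and z hold small ints, y points to x or to z.
STATES = [{"x": x, "z": z, "y": y}
          for x, z, y in product((0, 1), (0, 1), (("addr", "x"), ("addr", "z")))]

# Statements [X = E] to test the rule against.
STMTS = [
    ("x", ("plus", ("var", "x"), ("const", 1))),    # x = x + 1
    ("x", ("plus", ("deref", "y"), ("const", 1))),  # x = *y + 1
    ("x", ("plus", ("var", "z"), ("const", 1))),    # x = z + 1
]

def rule_is_sound(side_condition):
    """Check: whenever the rule fires on [X = E], X == E holds afterwards."""
    for x, e in STMTS:
        if not side_condition(x, e):
            continue  # rule does not fire
        for state in STATES:
            value = evaluate(e, state)
            if value is None:
                continue
            out = dict(state)
            out[x] = value
            if out[x] != evaluate(e, out):  # meaning of equals(X, E) in out
                return False
    return True

initial_rule = rule_is_sound(lambda x, e: True)
first_fix = rule_is_sound(lambda x, e: not appears_in(x, e))
second_fix = rule_is_sound(lambda x, e: not may_use(x, e))
print(initial_rule, first_fix, second_fix)
```

The first fix still fails on x = *y + 1 in the states where y points to x, which is the aliasing case the purely syntactic side condition misses.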
Rhodium expressiveness. Traditional optimizations: const prop and folding, branch folding, dead assignment elim, common sub-expression elim, partial redundancy elim, partial dead assignment elim, arithmetic invariant detection, and integer range analysis. Pointer analyses: must-point-to analysis, Andersen's may-point-to analysis with heap summaries. Loop opts: loop-induction-variable strength reduction, code hoisting, code sinking. Array opts: constant propagation through array elements, redundant array load elimination.
Expressiveness limitations. May not be able to express your optimization in Rhodium: opts that build complicated data structures; opts that perform complicated many-to-many transformations (e.g., loop fusion, loop unrolling). A correct Rhodium optimization may be rejected by the correctness checker: limitations of the theorem prover; limitations of first-order logic.
Summary Rhodium system makes it easier to write optimizations provides correctness guarantees is expressive enough for realistic optimizations Rhodium system provides a foundation for safe extensible program manipulators