Mobility, Security, and Proof-Carrying Code Peter Lee Carnegie Mellon University Lecture 4 July 13, 2001 LF, Oracle Strings, and Proof Tools Lipari School on Foundations of Wide Area Network Programming

“Applets, Not Craplets” A Demo

Cedilla Systems Architecture. (Diagram: the code producer runs Special J, written in OCaml, to compile a Java binary into native code plus a proof; the host runs Ginseng, ~52KB and written in C, to check them.)

Cedilla Systems Architecture, continued. (Diagram: Special J produces annotated native code and a proof from the Java binary; on the host, VCGen generates a VC from the annotated code, and the proof checker verifies the proof against that VC using the axioms.)

Cedilla Systems Architecture, full picture. (Diagram: the certifying compiler translates the Java binary into annotated native code; a VCGen on the producer side computes a VC, from which the proof generator produces the proof; on the host, VCGen recomputes the VC and the proof checker verifies the proof against it, using the axioms.)

Java Virtual Machine. (Diagram: class files pass through the Java verifier into the JVM; native code enters through JNI; proof-carrying native code passes through the checker.)

Show either the Mandelbrot or NBody3D demo.

Crypto Test Suite Results [Cedilla Systems] (times in seconds). On average, 158% faster than Java and 72.8% faster than Java with a JIT.

Java Grande Suite v2.0 [Cedilla Systems] (times in seconds).

Java Grande Bench Suite [Cedilla Systems] (throughput in operations per second).

Ginseng

VCGen: ~15KB, roughly similar to a KVM verifier (but with floating-point).
Checker: ~4KB, generic.
Safety policy: ~19KB, declarative and machine-generated.
Dynamic loading and cross-platform support: ~22KB, some optional.

Practical Considerations

Trusted Computing Base The trusted computing base is the software infrastructure that is responsible for ensuring that only safe execution is possible. Obviously, any bugs in the TCB can lead to unsafe execution. Thus, we want the TCB to be simple, as well as fast and small.

VCGen’s Complexity Fortunately, we shall see that proofchecking can be quite simple, small, and fast. VCGen, at core, is also simple and fast. But in practice it gets to be quite complicated.

VCGen's Complexity. Some complications: if dealing with machine code, then VCGen must parse machine code. Maintaining the assumptions and the current context in a memory-efficient manner is not easy. Note that Sun's KVM does verification in a single pass and in only 8KB of RAM!

VC Explosion. The example is a control-flow graph that branches on a == b (true branch: a := x; false branch: c := x), then branches on a == c (true: a := y; false: c := y), and finally calls f(a,c). The resulting VC is:

a=b => (x=c => safe f(y,c) ∧ x<>c => safe f(x,y))
∧ a<>b => (a=x => safe f(y,x) ∧ a<>x => safe f(a,y))

Exponential growth in the size of the VC is possible.
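To make the blow-up concrete, here is a minimal OCaml sketch (an illustration added to this transcript, not VCGen itself); the type and function names, such as wp and diamond, are invented for the example. Each If duplicates its postcondition, so a chain of branches produces a VC whose size roughly doubles with every branch.

(* Toy VC generator: each If duplicates the postcondition, so the VC
   grows exponentially with the number of branches unless invariants
   are inserted at join points. *)

type exp =
  | Var of string
  | Int of int

type formula =
  | Pred of string * exp list      (* e.g. safe_f(a, c) *)
  | Eq of exp * exp
  | Not of formula
  | Imp of formula * formula
  | And of formula * formula

type stmt =
  | Assign of string * exp         (* x := e *)
  | If of formula * stmt * stmt
  | Seq of stmt * stmt

(* Substitute e for variable x. *)
let subst_exp x e = function Var y when y = x -> e | other -> other
let rec subst x e = function
  | Pred (p, args) -> Pred (p, List.map (subst_exp x e) args)
  | Eq (a, b) -> Eq (subst_exp x e a, subst_exp x e b)
  | Not f -> Not (subst x e f)
  | Imp (a, b) -> Imp (subst x e a, subst x e b)
  | And (a, b) -> And (subst x e a, subst x e b)

(* Weakest precondition: the If case copies q into both branches. *)
let rec wp s q =
  match s with
  | Assign (x, e) -> subst x e q
  | Seq (s1, s2) -> wp s1 (wp s2 q)
  | If (c, s1, s2) -> And (Imp (c, wp s1 q), Imp (Not c, wp s2 q))

let rec size = function
  | Pred _ | Eq _ -> 1
  | Not f -> 1 + size f
  | Imp (a, b) | And (a, b) -> 1 + size a + size b

(* A chain of n two-way branches, like the diamonds on the slide. *)
let diamond i =
  If (Eq (Var ("b" ^ string_of_int i), Int 0),
      Assign ("x", Int 0),
      Assign ("x", Int 1))

let rec chain n =
  if n = 0 then Assign ("x", Int 0) else Seq (diamond n, chain (n - 1))

let () =
  List.iter
    (fun n ->
       let vc = wp (chain n) (Pred ("safe", [Var "x"])) in
       Printf.printf "branches = %d   VC size = %d\n" n (size vc))
    [1; 2; 4; 8]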

VC Explosion. Same control-flow graph, but now with a join-point invariant INV: P(a,b,c,x) placed where the two branches of the first test merge. The VC becomes:

(a=b => P(x,b,c,x) ∧ a<>b => P(a,b,x,x))
∧ (∀a',c'. P(a',b,c',x) => (a'=c' => safe f(y,c') ∧ a'<>c' => safe f(a',y)))

Growth can usually be controlled by careful placement of just the right "join-point" invariants.

Stack Slots. Each procedure will want to use the stack for local storage. This raises a serious problem, because VCGen loses a lot of information (such as the value stored) when data is written to memory. We avoid this problem by assuming that procedures use up to 256 words of stack as registers.

Exercise 8. Just as with loop invariants, our actual join-point invariants include a specification of the registers that might have been modified since the dominating block. Why might this be a useful thing to do? Why might it be a bad thing to do?

Callee-save Registers. Standard calling conventions dictate that the contents of some registers be preserved across calls. These callee-save registers are specified along with the pre/post-conditions for each procedure. The preservation of their values must be verified at every return instruction.
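As a small illustration of that last check (added here, not from the lecture), the following OCaml sketch symbolically executes a toy instruction set and verifies at each ret that every callee-save register still holds the value it had on entry. The instruction set, register names, and the check function are all invented for the example.

(* Symbolic check of the callee-save convention: each callee-save
   register starts out holding an opaque "entry" value, and at ret we
   require that it still holds exactly that value. *)

module S = Map.Make (String)

type sym = Entry of string | Val of int   (* symbolic register contents *)

let callee_save = ["ebx"; "esi"; "edi"; "ebp"]

type instr =
  | Mov_imm of string * int      (* reg := constant *)
  | Mov_reg of string * string   (* dst := src      *)
  | Ret

let initial_state =
  List.fold_left (fun st r -> S.add r (Entry r) st) S.empty callee_save

let step st = function
  | Mov_imm (r, n) -> S.add r (Val n) st
  | Mov_reg (d, s) -> S.add d (S.find s st) st
  | Ret -> st

(* Accept the program only if every callee-save register is preserved
   at the (first) return instruction. *)
let check prog =
  let rec go st = function
    | [] -> true
    | Ret :: _ -> List.for_all (fun r -> S.find r st = Entry r) callee_save
    | i :: rest -> go (step st i) rest
  in
  go initial_state prog

let () =
  (* Clobbering %ebx without restoring it is rejected ... *)
  assert (not (check [Mov_imm ("ebx", 0); Ret]));
  (* ... but saving it elsewhere and restoring it before ret is fine. *)
  assert (check [Mov_reg ("eax", "ebx"); Mov_imm ("ebx", 0);
                 Mov_reg ("ebx", "eax"); Ret])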

Introduction to Efficient Representation and Validation of Proofs

High-Level Architecture. (Diagram: the agent sends code together with an explanation; on the host, a verification-condition generator and a checker, parameterized by the safety policy, validate it.)

Goals We would like a representation for proofs that is compact, fast to check, requires very little memory to check, and is “canonical” (in the sense of accommodating many different logics without requiring a total reimplementation of the checker).

Three Approaches 1. Direct representation of a logic. 2. Use of a Logical Framework. 3. Oracle strings. We will reject (1). Today we introduce (2) and (3).

Logical Framework For representation of proofs we use the Edinburgh Logical Framework (LF).

Reynolds’ Example Skip?

Formal Proofs. Write "x is a proof of P" as x:P. Examples of predicates P: (for all A, B) A and B => B and A; (for all x, y, z) x < y and y < z => x < z. What do the proofs look like?

Inference Rules. We can write proofs by stitching together inference rules. An example inference rule: if we have a proof x of P and a proof y of Q, then x and y together constitute a proof of P ∧ Q. Or, more compactly: if x:P, y:Q then (x,y):P*Q. Here (x,y) is a proof written in our compact notation.

More Inference Rules. Another inference rule: assume we have a proof x of P. If we can then obtain a proof b of Q, then we have a proof of P ⇒ Q. If [x:P] b:Q then fn (x:P) => b : P ⇒ Q. More rules: if x:P*Q then fst(x):P; if y:P*Q then snd(y):Q.

Types and Proofs. So, for example: fn (x:P*Q) => (snd(x), fst(x)) : P*Q ⇒ Q*P. We can develop a full metalanguage based on this principle of "proofs as programs". Typechecking gives us proofchecking! Codified in the LF language.
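A tiny OCaml rendering of the "proofs as programs" reading (an illustration added to this transcript, not from the slides): read the pair type 'p * 'q as P ∧ Q and the function type 'p -> 'q as P ⇒ Q, so that the OCaml typechecker plays the role of the proofchecker.

(* Conjunction as a pair type, implication as a function type.
   A well-typed term is a proof of the proposition its type encodes. *)

(* fst and snd are the two "and-elimination" rules from the slide. *)
let and_elim_left  (x : 'p * 'q) : 'p = fst x
let and_elim_right (x : 'p * 'q) : 'q = snd x

(* fn (x:P*Q) => (snd(x), fst(x)) : P*Q => Q*P, exactly as above. *)
let and_comm (x : 'p * 'q) : 'q * 'p = (snd x, fst x)

(* If this file typechecks, the statements above are "proved":
   typechecking is proofchecking. *)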

LF i Skip?

LF Example This classic example illustrates how LF is used to represent the terms and rules of a logical system.

LF Example in Elf Syntax. The same example, but using Pfenning's Elf syntax:

exp : type
pred : type
pf : pred -> type

true : pred
/\ : pred -> pred -> pred
=> : pred -> pred -> pred
all : (exp -> pred) -> pred

truei : pf true
andi : {P:pred} {R:pred} pf P -> pf R -> pf (/\ P R)
andel : {P:pred} {R:pred} pf (/\ P R) -> pf P
impi : {P:pred} {R:pred} (pf P -> pf R) -> pf (=> P R)
alli : {P:exp -> pred} ({X:exp} pf (P X)) -> pf (all P)
alle : {P:exp -> pred} {E:exp} pf (all P) -> pf (P E)
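For comparison, here is a rough OCaml transcription of the propositional part of the same signature as ordinary data types, together with a small checker that computes the predicate a proof term establishes. This is only an added illustration of "typechecking is proofchecking"; the names (pred, proof, infer) are invented, and this is not how the actual checker is implemented.

(* The logic above as OCaml data, with a checker that recovers the
   predicate proved by a proof term.  Illustrative only. *)

type pred =
  | True
  | And of pred * pred
  | Imp of pred * pred

type proof =
  | TrueI                            (* truei : pf true *)
  | AndI of proof * proof            (* andi  : pf P -> pf R -> pf (/\ P R) *)
  | AndEL of proof                   (* andel : pf (/\ P R) -> pf P *)
  | Hyp of pred                      (* a hypothesis introduced by impi *)
  | ImpI of pred * (proof -> proof)  (* impi  : (pf P -> pf R) -> pf (=> P R) *)

exception Ill_formed

(* infer p returns the predicate that p proves, or raises Ill_formed. *)
let rec infer : proof -> pred = function
  | TrueI -> True
  | AndI (p, q) -> And (infer p, infer q)
  | AndEL p -> (match infer p with And (a, _) -> a | _ -> raise Ill_formed)
  | Hyp a -> a
  | ImpI (a, body) -> Imp (a, infer (body (Hyp a)))

(* and-elimination on a proof of true /\ true *)
let () = assert (infer (AndEL (AndI (TrueI, TrueI))) = True)

(* The proof of P => P /\ P discussed on the next slides, for P = true: *)
let p = True
let proof_of_p_implies_p_and_p = ImpI (p, fun x -> AndI (x, x))
let () = assert (infer proof_of_p_implies_p_and_p = Imp (p, And (p, p)))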

LF as a Proof Representation LF is canonical, in that a single typechecker for LF can serve as a proofchecker for many different logics specified in LF. [See Avron, et al. ‘92] But the efficiency of the representation is poor.

Size of LF Representation. Proofs in LF are extremely large, due to large amounts of repetition. Consider the representation of P ⇒ P ∧ P for some predicate P: (=> P (/\ P P)). The proof of this predicate has the following LF representation: (impi P (/\ P P) ([X:pf P] andi P P X X)).

Checking LF The nice thing is that typechecking is enough for proofchecking. [The theorem is in the LF paper.] But the proofs are extremely large. (impi P (/\ P P ) ([X:pf P ] andi P P X X)) : pf (=> P (/\ P P ))

Implicit LF A dramatic improvement can be achieved by using a variant of LF, called Implicit LF, or LF i. In LF i, parts of the proof can be replaced by placeholders. (impi * * ([X:*] andi * * X X)) : pf (=> P (/\ P P ))

Soundness of LF i. The soundness of the LF i type system is given by a theorem that states: if, in a context Γ, a term M has type A in LF i (and Γ and A are placeholder-free), then there is a term M' such that M' has type A in LF.

Typechecking LF i The typechecking algorithm for LF i is given in [Necula & Lee, LICS98]. A key aspect of the algorithm is that it avoids repeated typechecking of reconstructed terms. Hence, the placeholders save not only space, but also time.

Effectiveness of LF i In experiments with PCC, LF i leads to substantial reductions in proof size and checking time. Improvements increase nonlinearly with proof size.

The Need for Improvement. Despite the great improvement of LF i, in our experiments we observe that LF i proofs are still 10% to 200% of the size of the code.

How Big is a Proof? A basic question is: how much essential information is in a proof? In this proof, there are only 2 uses of inference rules, and in each case the rule used was the only one that could have been used. (impi * * ([X:*] andi * * X X)) : pf (=> P (/\ P P))

Improving the Representation We will now improve on the compactness of proof representation by making use of the observation that large parts of proofs are deterministically generated from the inference rules.

Additional References. For LF: Harper, Honsell, & Plotkin. A framework for defining logics. Journal of the ACM, 40(1), January 1993. Avron, Honsell, Mason, & Pollack. Using typed lambda calculus to implement formal systems on a machine. Journal of Automated Reasoning, 9(3), 1992.

Additional References. For Elf: Pfenning. Logic programming in the LF logical framework. In Logical Frameworks, Huet & Plotkin (Eds.), Cambridge Univ. Press, 1991. Pfenning. Elf: A meta-language for deductive systems (system description). 12th International Conference on Automated Deduction, LNAI 814, 1994.

Oracle-Based Checking

Necula's Example. Syntax of Girard's System F:

ty : type
int : ty
arr : ty -> ty -> ty
all : (ty -> ty) -> ty

exp : type
z : exp
s : exp -> exp
lam : (exp -> exp) -> exp
app : exp -> exp -> exp

of : exp -> ty -> type

Necula's Example. Typing Rules for System F:

tz : of z int
ts : {E:exp} of E int -> of (s E) int
tlam : {E:exp->exp} {T1:ty} {T2:ty} ({X:exp} of X T1 -> of (E X) T2) -> of (lam E) (arr T1 T2)
tapp : {E1:exp} {E2:exp} {T:ty} {T2:ty} of E1 (arr T2 T) -> of E2 T2 -> of (app E1 E2) T
tgen : {E:exp} {T:ty->ty} ({T1:ty} of E (T T1)) -> of E (all T)
tins : {E:exp} {T:ty->ty} {T1:ty} of E (all T) -> of E (T T1)

LF Representation. Consider the lambda expression (λf. (f (λx.x)) (f 0)) (λy.y). It is represented in LF as follows: app (lam [F:exp] app (app F (lam [X:exp] X)) (app F 0)) (lam [Y:exp] Y)

Necula’s Example Now suppose that this term is an applet, with the safety policy that all applets must be well-typed in System F. One way to make a PCC is to attach a typing derivation to the term.

Typing Derivation in LF

(tapp (lam [F:exp] (app (app F (lam [X:exp] X)) (app F 0)))
      (lam ([X:exp] X))
      (all ([T:ty] arr T T))
      int
      (tlam (all ([T:ty] arr T T)) int
            ([F:exp] (app (app F (lam [X:exp] X)) (app F 0)))
            ([F:exp][FT:of F (all ([T:ty] arr T T))]
               (tapp (app F (lam [X:exp] X)) (app F 0) int int
                     (tapp F (lam [X:exp] X) (arr int int) (arr int int)
                           (tins F ([T:ty] arr T T) (arr int int) FT)
                           (tlam int int ([X:exp] X) ([X:exp][XT:of X int] XT)))
                     (tapp F 0 int int
                           (tins F ([T:ty] arr T T) int FT)
                           t0))))
      (tgen (lam [Y:exp] Y) ([T:ty] arr T T)
            ([T:ty] (tlam T T ([Y:exp] Y) ([Y:exp] [YT:of Y T] YT)))))

Typing Derivation in LF i

(tapp * * (all ([T:*] arr T T)) int
      (tlam * * *
            ([F:*][FT:of F (all ([T:ty] arr T T))]
               (tapp * * int
                     (tapp * * (arr int int) (arr int int)
                           (tins * * * FT)
                           (tlam * * * ([X:*][XT:*] XT)))
                     (tapp * * int int
                           (tins * * * FT)
                           t0))))
      (tgen * * ([T:*] (tlam * * * ([Y:*] [YT:*] YT)))))

I think. I did this by hand!

LF Representation Using 16 bits per token, the LF representation of the typing derivation requires over 2,200 bits. The LF i representation requires about 700 bits. (The term itself requires only about 360 bits.) Skip ahead

A Bit More about LF i. To convert an LF term into an LF i term, a representation algorithm is used. [Necula & Lee, LICS'98] Intuition: when typechecking a term c M1 M2 … Mn : A (in a context Γ), we know, if A has no placeholders, that some of the M1 … Mn may appear in A.

A Bit More about LF i, cont'd. For example, when the tapp rule is applied at top level, its first two arguments are already present in the term itself and thus can be elided.

tapp : {E1:exp} {E2:exp} {T:ty} {T2:ty} of E1 (arr T2 T) -> of E2 T2 -> of (app E1 E2) T

app (lam [F:exp] app (app F (lam [X:exp] X)) (app F 0)) (lam [Y:exp] Y)

A Bit More about LF i, cont’d A similar trick works at lower levels by relying on the fact that typing constraints are solved in a certain order (e.g., right-to-left). See the paper for complete details.

Can We Do Better?

tz : of z int
ts : {E:exp} of E int -> of (s E) int
tlam : {E:exp->exp} {T1:ty} {T2:ty} ({X:exp} of X T1 -> of (E X) T2) -> of (lam E) (arr T1 T2)
tapp : {E1:exp} {E2:exp} {T:ty} {T2:ty} of E1 (arr T2 T) -> of E2 T2 -> of (app E1 E2) T
tgen : {E:exp} {T:ty->ty} ({T1:ty} of E (T T1)) -> of E (all T)
tins : {E:exp} {T:ty->ty} {T1:ty} of E (all T) -> of E (T T1)

Determinism. Looking carefully at the typing rules, we observe: for any typing goal where the term is known but the type is not, there are 3 possibilities: tgen, tins, or other. If the type structure is also known, there are only 2 choices: tapp or other.

How Much Essential Information?

(tapp (lam [F:exp] (app (app F (lam [X:exp] X)) (app F 0)))
      (lam ([X:exp] X))
      (all ([T:ty] arr T T))
      int
      (tlam (all ([T:ty] arr T T)) int
            ([F:exp] (app (app F (lam [X:exp] X)) (app F 0)))
            ([F:exp][FT:of F (all ([T:ty] arr T T))]
               (tapp (app F (lam [X:exp] X)) (app F 0) int int
                     (tapp F (lam [X:exp] X) (arr int int) (arr int int)
                           (tins F ([T:ty] arr T T) (arr int int) FT)
                           (tlam int int ([X:exp] X) ([X:exp][XT:of X int] XT)))
                     (tapp F 0 int int
                           (tins F ([T:ty] arr T T) int FT)
                           t0))))
      (tgen (lam [Y:exp] Y) ([T:ty] arr T T)
            ([T:ty] (tlam T T ([Y:exp] Y) ([Y:exp] [YT:of Y T] YT)))))

How Much Essential Information? There are 15 applications of rules in this derivation. So, conservatively: ⌈log2 3⌉ × 15 = 30 bits. In other words, 30 bits should be enough to encode the choices made by a type inference engine for this term.

Oracle-based Checking. Idea: implement the proofchecker as a nondeterministic logic interpreter whose program consists of the derivation rules and whose initial goal is the judgment to be verified. We will avoid backtracking by relying on the oracle string. Skip ahead

Why Higher-Order? The syntax of VCs for the Java type-safety policy is as follows:

E ::= x | c E1 … En
F ::= true | F1 ∧ F2 | ∀x.F | E | E ⊃ F

The LF encodings are simple Horn clauses (requiring only first-order unification). Higher-order features are needed only for implication and universal quantification.

Why Higher-Order? Perhaps first-order Horn logic (or perhaps first-order hereditary Harrop formulas) is enough. Indeed, first-order expressions and formulas seem to be enough for the VCs in type-safety policies. However, higher-order and modal logics would require higher-order features.

A Simplification: A Fragment of LF

Level-0 types: A ::= a | A1 → A2
Level-1 types (β-normal form): B ::= a M1 … Mn | B1 → B2 | Πx:A.B
Level-0 kinds: K ::= Type | A → K
Level-0 terms (β-normal form): M ::= λx:A.M | c M1 … Mn | x M1 … Mn

LF Fragment This fragment simplifies matters considerably, without restricting the application to PCC. Level-0 types to encode syntax. Level-1 types to encode derivations. No level-1 terms since we never reconstruct a derivation, only verify that one exists!

LF Fragment, cont'd

Level-0 types: ty : type and exp : type.
Level-1 type family: of : exp -> ty -> type.

Disallowing level-2 and higher type families seems not to have any practical impact.
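One way to make the fragment concrete is as OCaml data types, sketched below; the constructor names are invented for this illustration and are not the checker's actual representation.

(* The restricted LF fragment: level-0 types, beta-normal level-0
   terms, level-1 types, and level-0 kinds. *)

type atype =                         (* A ::= a | A1 -> A2 *)
  | AConst of string                 (* e.g. "exp", "ty" *)
  | AArrow of atype * atype

type term =                          (* M ::= \x:A.M | c M1 ... Mn | x M1 ... Mn *)
  | Lam of string * atype * term
  | CApp of string * term list       (* constant applied to arguments *)
  | VApp of string * term list       (* variable applied to arguments *)

type btype =                         (* B ::= a M1 ... Mn | B1 -> B2 | Pi x:A. B *)
  | BAtom of string * term list      (* e.g. "of" applied to [e; t] *)
  | BArrow of btype * btype
  | BPi of string * atype * btype

type kind =                          (* K ::= Type | A -> K *)
  | KType
  | KArrow of atype * kind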

Logic Interpreter

Goals: G ::= B | M = M' | ∀x:B.G | ∃x:A.G | T | G1 ∧ G2

For Necula's example, the interpreter will be started with the goal ∃t:ty. of E t

Naïve Interpreter

solve(B1 → B2) = ∀x:B1. solve(B2)
solve(Πx:A.B) = ∀x:A. solve(B)
solve(a M1 … Mn) = subgoals(B, a M1 … Mn), where B is the type of a level-1 constant or a level-1 quantified variable (in scope), as selected by the oracle

subgoals(B1 → B2, B) = solve(B1) ∧ subgoals(B2, B)
subgoals(Πx:A.B', B) = ∃x:A. subgoals(B', B)
subgoals(a M1' … Mn', a M1 … Mn) = (M1 = M1') ∧ … ∧ (Mn = Mn')
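The oracle mechanism can be illustrated with a drastically simplified OCaml sketch (invented for this transcript): atoms are plain strings, clauses are ground, and there is no unification, so only the bit-reading and clause-selection part of the interpreter above is modelled. The real checker works over the LF fragment and drives higher-order unification from the same oracle string.

(* Oracle-based checking, radically simplified: the oracle is a bit
   string that resolves every nondeterministic clause choice, so the
   checker needs no search and no backtracking. *)

type clause = { premises : string list; concl : string }

type oracle = { mutable bits : bool list }

let next_bit o =
  match o.bits with
  | b :: rest -> o.bits <- rest; b
  | [] -> failwith "oracle exhausted"

(* Read ceil(log2 n) bits from the oracle to pick one of n candidates. *)
let choose o candidates =
  let n = List.length candidates in
  let rec bits_needed k = if k <= 1 then 0 else 1 + bits_needed ((k + 1) / 2) in
  let rec read k acc =
    if k = 0 then acc
    else read (k - 1) ((2 * acc) + (if next_bit o then 1 else 0))
  in
  List.nth candidates (read (bits_needed n) 0)

(* solve: ask the oracle which clause to use for the goal, then check
   its premises; when only one clause applies, no bits are consumed. *)
let rec solve program o goal =
  match List.filter (fun c -> c.concl = goal) program with
  | [] -> false
  | candidates ->
      let c = choose o candidates in
      List.for_all (solve program o) c.premises

let () =
  let program =
    [ { premises = ["b"; "c"]; concl = "a" };
      { premises = []; concl = "b" };    (* two ways to prove "b"... *)
      { premises = []; concl = "b" };
      { premises = []; concl = "c" } ]
  in
  (* ...so exactly one oracle bit is needed to check the goal "a". *)
  assert (solve program { bits = [false] } "a")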

Necula's Example. Consider solve(of E t). This consults the oracle. Since there are 3 level-1 constants that could be used at this point, 2 bits are fetched from the oracle string (to select tapp).

Higher-Order Unification The unification goals that remain after solve are higher-order and thus only semi-decidable. A nondeterministic unification procedure (also driven by the oracle string) is used. Some standard LP optimizations are also used.

Certifying Theorem Proving

Time does not allow a description here. See: Necula and Lee. Proof generation in the Touchstone theorem prover. CADE'00. Of particular interest: proof-generating congruence-closure and simplex algorithms.

Certifying Compilation Skip ahead

The Basic Trick. Recall the bcopy program:

public class Bcopy {
  public static void bcopy(int[] src, int[] dst) {
    int l = src.length;
    int i = 0;
    for(i = 0; i < l; i++) {
      dst[i] = src[i];
    }
  }
}

Unoptimized Loop Body

L11:  movl 4(%ebx), %eax
      cmpl %eax, %edx
      jae  L24
L17:  cmpl $0, 12(%ebp)
      movl 8(%ebx, %edx, 4), %esi
      je   L21
L20:  movl 12(%ebp), %edi
      movl 4(%edi), %eax
      cmpl %eax, %edx
      jae  L24
L23:  movl %esi, 8(%edi, %edx, 4)
      movl %edi, 12(%ebp)
      incl %edx
L9:   ANN_INV(ANN_DOM_LOOP, %LF_(/\ (of rm mem) (of loc1 (jarray jint)))%_LF, RB(EBP,EBX,ECX,ESP,FTOP,LOC4,LOC3))
      cmpl %ecx, %edx
      jl   L11

The first jae L24 is the bounds check on src; the second is the bounds check on dst. Note: L24 raises the ArrayIndex exception.

Unoptimized Code is Easy In the absence of optimizations, proving the safety of array accesses is relatively easy. Indeed, in this case it is reasonable for VCGen to verify the safety of the array accesses. As the optimizer becomes more successful, verification gets harder.

Role of Loop Invariants. It is for this reason that the optimizer's knowledge must be conveyed to the theorem prover. Essentially, any facts about program values that were used to perform check-elimination and code-motion optimizations must be declared in an invariant.

Optimized Loop Body

L7:   ANN_LOOP(INV = { (csubneq ebx 0), (csubneq eax 0), (csubb edx ecx), (of rm mem) }, MODREG = (EDI,EDX,EFLAGS,FFLAGS,RM))
      cmpl %esi, %edx
      jae  L13
      movl 8(%ebx, %edx, 4), %edi
      movl %edi, 8(%eax, %edx, 4)
      incl %edx
      cmpl %ecx, %edx

Essential facts about live variables, used by the compiler to eliminate bounds-checks in the loop body.

Certifying Compiling and Proving Intuitively, we will arrange for the Prover to be at least as powerful as the Compiler’s optimizer. Hence, we will expect the Prover to be able to “reverse engineer” the reasoning process that led to the given machine code. An informal concept, needing a formal understanding!

What is Safety, Anyway? If the compiler fails to optimize away a bounds-check, it will insert code to perform the check. This means that programs may still abort at run-time, albeit with a well-defined exception. Is this safe behavior?

Resource Constraints. Bounds on certain resources can be enforced via counting. In a reference interpreter: maintain a global counter; increment the count for each instruction executed; verify for each instruction that the limit is not exceeded; and use the compiler to optimize away the counting operations.
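A toy OCaml sketch of the counting idea (added for illustration; the instruction set and the run function are invented, and a real reference interpreter would model the actual machine):

(* A reference interpreter that charges one unit per instruction and
   aborts as soon as the budget is exceeded. *)

type instr =
  | Inc of int    (* add a constant to the accumulator *)
  | Jmp of int    (* jump to an instruction index      *)
  | Halt

exception Limit_exceeded

let run (prog : instr array) (limit : int) : int =
  let count = ref 0 and acc = ref 0 and pc = ref 0 in
  let tick () =
    incr count;
    if !count > limit then raise Limit_exceeded   (* the per-instruction check *)
  in
  let halted () = !pc < 0 || !pc >= Array.length prog in
  while not (halted ()) do
    tick ();
    match prog.(!pc) with
    | Inc n -> acc := !acc + n; incr pc
    | Jmp t -> pc := t
    | Halt -> pc := Array.length prog
  done;
  !acc

let () =
  (* A terminating program runs normally within its budget... *)
  assert (run [| Inc 1; Inc 2; Halt |] 10 = 3);
  (* ...while an infinite loop is cut off once the budget is exhausted. *)
  match run [| Inc 1; Jmp 0 |] 1000 with
  | _ -> ()
  | exception Limit_exceeded -> print_endline "resource bound enforced"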

Compiler as a Theorem-Proving Front-End. The compiler is essentially a user interface to a theorem prover. The possibilities for interactive use of compilers have been described in the literature but not developed very far. PCC may extend the possibilities.

Developing PCC Tools

Compiler Development The PCC infrastructure catches many (probably most) compiler bugs early. Our standard regression test does not execute the object code! Principle: Most compiler bugs show up as safety violations.

Example Bug

…
L42:  movl  4(%eax), %edx
      testl %edx, %edx
      jle   L47
L46:  … set up for loop …
L44:  … enter main loop code …
      …
      jl    L44
      jmp   L32
L47:  fldz
      fldz
L32:  … return sequence …
      ret

Example Bug

…
L42:  movl  4(%eax), %edx
      testl %edx, %edx
      jle   L47
L46:  … set up for loop …
L44:  … enter main loop code …
      …
      jl    L44
      jmp   L32
L47:  fldz
L32:  … return sequence …
      ret

Error in rarely executed compensation code is caught by the Proof Generator.

Another Example Bug. Suppose bcopy's inner loop is changed:

L7:   ANN_LOOP( … )
      cmpl %esi, %edx
      jae  L13
      movl 8(%ebx, %edx, 4), %edi
      movl %edi, 8(%eax, %edx, 4)
      incl %edx
      cmpl %ecx, %edx
      jl   L7
      ret

Another Example Bug. Suppose bcopy's inner loop is changed:

L7:   ANN_LOOP( … )
      cmpl %esi, %edx
      jae  L13
      movl 8(%ebx, %edx, 4), %edi
      movl %edi, 8(%eax, %edx, 4)
      addl $2, %edx
      cmpl %ecx, %edx
      jl   L7
      ret

Again, PCC spots the danger.

Yet Another

class Floatexc extends Exception {
  public static int f(int x) throws Floatexc { return x; }
  public static int g(int x) { return x; }
  public static float handleit(int x, int y) {
    float fl = 0;
    try {
      x = f(x);
      fl = 1;
      y = f(y);
    } catch (Floatexc b) {
      fl += fl;
    }
    return fl;
  }
}

Yet Another

      … Install handler …
      pushl $_6except8Floatexc_C
      call  __Jv_InitClass
      addl  $4, %esp
      … Enter try block …
L17:  movl  $0, -4(%ebp)
      pushl 8(%ebp)
      call  _6except8Floatexc_MfI
      addl  $4, %esp
      movl  %eax, %ecx
      …
      … A handler …
L22:  flds  -4(%ebp)
      fadds -4(%ebp)
      jmp   L18
      …

To the end

Why PCC May be a Reasonable Idea

Ten Good Things About PCC 1. Someone else does all the really hard work. 2. The host system changes very little....

Logic as a lingua franca. (Diagram: a certifying prover produces code and a proof, which a proof engine checks before the code runs on the CPU.)

Logic as a lingua franca. (Diagram: the certifying prover produces code and a proof of the VC; on the host, a proof checker, parameterized by the safety policy, verifies it.) Language/compiler/machine dependences are isolated from the proof checker and expressed as predicates and derivations in a formal logic.

Logic as a lingua franca. (Diagram: the same picture, with JVM bytecode such as iadd and iaload as the code.) Code can be in any language once a safety policy is supplied.

Logic as a lingua franca. (Diagram: the same picture, with x86 code that contains explicit dynamic checks:

      addl %eax, %ebx
      testl %ecx, %ecx
      jz NULLPTR
      movl 4(%ecx), %edx
      cmpl %edx, %ebx
      jae ARRAYBNDS
      movl 8(%ecx,%ebx,4), %edx
      ...)

Adequacy of dynamic checks and "wrappers" can be verified.

Logic as a lingua franca. (Diagram: the same picture, with optimized x86 code: add %eax,%ebx; movl 8(%ecx,%ebx,4) ...) Safety of optimized code can be verified.

Ten Good Things About PCC 3. You choose the language. 4. Optimized (“unsafe”) code is OK. 5. Verifies that your optimizer and dynamic checks are OK. …

The Role of Programming Languages. Civilized programming languages can provide "safety for free". Well-formed/well-typed ⇒ safe. Idea: arrange for the compiler to "explain" why the target code it generates preserves the safety properties of the source program.

Certifying Compilers [Necula & Lee, PLDI’98] Intuition: Compiler “knows” why each translation step is semantics-preserving. So, have it generate a proof that safety is preserved. “Small theorems about big programs.” Don’t try to verify the whole compiler, but only each output it generates.

Automation via Certifying Compilation. (Diagram: the certifying compiler takes source code and produces object code plus a proof; on the host, the proof checker, with the safety policy and the VC, guards the CPU.) It looks and smells like a compiler:

% spjc foo.java bar.class baz.c -ljdk1.2.2

Ten Good Things About PCC 6. Can sometimes be easy-to-use. 7. You can still be a “hero theorem hacker” if you want....

Ten Good Things About PCC 8. Proofs are a “semantic checksum”. 9. Possibility for richer safety policies. 10. Co-exists peacefully with crypto.

Acknowledgments George Necula. Robert Harper and Frank Pfenning. Mark Plesko, Fred Blau, and John Gregorski.

Microsoft's ActiveF Technology: What do you want to prove today?