Certified Code Peter Lee Carnegie Mellon University Informatics Jamboree May 2002
Arianne 5 June 4, 1996
“Better, Faster, Cheaper” 1999 Mars polar lander Mars climate observer
“After a crew member mistakenly entered a zero into the data field of an application, the computer system proceeded to divide another quantity by that zero. The operation caused a buffer overflow, in which data leaked from a temporary storage space in memory, and the error eventually brought down the ship's propulsion system. The result: the USS Yorktown was dead in the water for more than two hours.”
According to CERT, buffer overflow attacks are the #1 exploit for network security attacks. http://www.cert.org/summaries/
429M mobile phones sold in 2001, vs 96M PCs 95% phones will be dynamically programmable by ‘04. 64Mbits of RAM in 2002. Battery life a primary factor. Efficiency and bandwidth are (still) precious.
Observations Simple problems “in the details.” Reuse is critical but perilous Executable content Performance matters a lot
Safety engineering Small theorems about large programs Precise interfaces and checking of interface compliance Good performance
The Safe Code Problem Please install and execute this.
Certified code A “certified” maze A B
Logical frameworks The Edinburgh Logical Framework is a language for specifying logics. Quick tutorial
Proof-carrying code Code LF proof Proof rules Verification condition generator LF typechecker LF proof Agent Proof rules Host
Proof-carrying code OK, but let me quickly look over the instructions first. Please install and execute this. Code producer Host
Proof-carrying code Code producer Host
Proof-carrying code This store instruction is dangerous! Code producer Host
Proof-carrying code Can you prove that it is always safe? Code producer Host
Proof-carrying code Yes! Here’s the proof I got from my certifying Java compiler! Can you prove that it is always safe? Code producer Host
Proof-carrying code Your proof checks out. Code producer Host
Proof-carrying code I believe you because I believe in logic! Code producer Host
Semantics Define the states of the target machine S = (, , pc) and a transition function Step(S). Define also the safe machine states via the safety policy SP(S). program program counter register state
Semantics, cont’d Then we have the following predicate for safe execution: Safe(S) = n:Nat. SP(Stepn(S)) and proof-carrying code: PCC = (S0:State, P:Safe(S0))
Verification conditions The purpose of the verification conditions is to provide predicates that imply Safe(Si) for each machine state Si. Typically we must start with some initial pre/post conditions.
Implementation of PCC Reasonable in size (0-10%). Code Certifying Prover Proof Checker Proof Simple, small (<52KB), and fast. No longer need to trust this component. CPU
The role of programming languages Civilized programming languages can provide “safety for free”. Well-formed/well-typed safe. Idea: Arrange for the compiler to “explain” why the target code it generates preserves the safety properties of the source program.
Certifying compilation Prover Certifying Compiler Source code Proof Object Looks and smells like a compiler. % spjc foo.java bar.class baz.c -ljdk1.2.2 Proof Checker CPU Example
Crypto test suite results sec
Java Grande Suite v2.0 sec
Necula and Lee [’96] A complicated C program! Code LF proof Verification condition generator A complicated C program! LF typechecker LF proof Agent Proof rules Host
Typed Assembly Language [Morrisett, et al., ‘98] Use modern type theory to develop a static type system for machine code. Prove decidability of typechecking. Prove soundness of type system. Developing such a type system is very hard, but done only once.
TAL fact: ALL rho.{r1:int, sp:{r1:int, sp:rho}::rho} jgz r1, positive mov r1,1 ret positive: push r1 ; sp : int::{t1:int,sp:rho}::rho sub r1,r1,1 call fact[int::{r1:int,sp:rho}::rho] imul r1,r1,r2 pop r2 ; sp : {r1:int,sp:rho}:: ret
KVM Example [Frank Yellin, Sun] 0. aload_0 1. astore_1 2. goto 10 Long Number | <> 5. aload_1 6. invokeStatic nextValue(Number) 9. astore_1 10. aload_1 11. invokeVirtual intValue() 14. ffne 5 17. return static void test(Long x) { Number y = x; while (y.IntValue() != 0) { y = nextValue(y); } return y; Join-point typing annotations
Eliminating VCGen We can eliminate VCGen by using the logic to encode a global invariant on states, Inv(S). Then, the proof must show: Inv(S0) S:State. Inv(S) ! Inv(Step(S)) S:State. Inv(S) ! SP(S)
Foundational PCC Appel and Felty [’00] develop a semantic model of types, starting from the foundations of mathematical logic. This model is used to construct the global invariant. Hamid, Shao, et al. define the global invariant to be a syntactic well- formedness condition on machine states.
Temporal-logic PCC Bernard and Lee [’02] define the global invariant via a temporal- logic specification. A trusted generic program then interprets these specifications to extract verification conditions.
Conclusions Code safety is a problem of increasing importance. Certified code is a possible low- cost way to apply type theory and program verification to this problem.
Formal proofs Write “x is a proof of P” as x:P. Example of a predicate: We can write proofs by stitching together the application of rules of inference.
Example inference rule If we have a proof x of P and a proof y of Q, then x and y together constitute a proof of P Q. Or, in ASCII: Given x:P, y:Q then (x,y):P*Q.
More inference rules Assume we have a proof x of P. If we can then obtain a proof b of Q, then we have a proof of P Q. Given [x:P] b:Q then fn (x:P) => b : P Q. More rules: Given x:P*Q then fst(x):P Given y:P*Q then snd(y):Q
Types and proofs So, for example: fn (x:P*Q) => (snd(x), fst(x)) : P*Q Q*P This is an ML program! LF provides additional expressive power and an adequacy theorem. return
Example: Source code public class Bcopy { public static void bcopy(int[] src, int[] dst) { int l = src.length; int i = 0; for(i=0; i<l; i++) { dst[i] = src[i]; }
Example: Target code L7: ANN_LOCALS(_bcopy__6arrays5BcopyAIAI, 3) ANN_LOOP(INV = { (csubneq ebx 0), (csubneq eax 0), (csubb edx ecx), (of rm mem)}, MODREG = (EDI,EDX,EFLAGS,FFLAGS,RM)) cmpl %esi, %edx jae L13 movl 8(%ebx, %edx, 4), %edi movl %edi, 8(%eax, %edx, 4) incl %edx cmpl %ecx, %edx jl L7 ret L13: call __Jv_ThrowBadArrayIndex ANN_UNREACHABLE nop L6: call __Jv_ThrowNullPointer ANN_LOCALS(_bcopy__6arrays5BcopyAIAI, 3) .text .align 4 .globl _bcopy__6arrays5BcopyAIAI _bcopy__6arrays5BcopyAIAI: cmpl $0, 4(%esp) je L6 movl 4(%esp), %ebx movl 4(%ebx), %ecx testl %ecx, %ecx jg L22 ret L22: xorl %edx, %edx cmpl $0, 8(%esp) movl 8(%esp), %eax movl 4(%eax), %esi
Example: Proof excerpt (LF representation) ANN_PROOF(_6arrays6Bcopy1_MbcopyAIAI, %LF_(andi (impi [H_1 : pf (of _p22 (jarray jint))] (andi (impi [H_2 : pf (of _p23 (jarray jint))] (andi (impi [H_3 : pf (of _p21 mem)] (andi (impi [H_4 : pf (ceq (sub _p23 0))] truei) (andi (impi [H_5 : pf (cneq (sub _p23 0))] (andi (rd4 (arrLen H_2 (nullcsubne H_5)) szint) (andi (nullcsubne H_5) (andi H_3 (andi H_1 (andi (impi [H_10 : pf (nonnull _p23)] (andi (impi [H_11 : pf (of _p64 mem)] (andi (impi [H_12 : pf (of _p65 (jarray jint))] (andi (impi [H_13 : pf (cnlt (sub _p49 (sel4 _p21 (add _p23 4))))] (andi H_11 truei)) (andi (impi [H_15 : pf (clt (sub _p49 (sel4 _p21 (add _p23 4))))] (andi (rd4 (arrLen H_2 H_10) szint) (andi (impi [H_17 : pf (cnb (sub _p49 (sel4 _p64 (add _p23 4))))] (andi (impi [H_18 : pf (cb (sub _p49 (sel4 _p64 (add _p23 4))))] (andi (rd4 (arrElem H_2 H_11 H_10 szint (ultcsubb H_18)) szint) (andi (impi [H_20 : pf (ceq (sub _p65 0))] (andi (impi [H_21 : pf (cneq (sub _p65 0))] (andi (rd4 (arrLen H_12 (nullcsubne H_21)) szint) (andi (impi [H_23 : pf (cnb (sub _p49 (sel4 _p64 (add _p65 4))))] (andi (impi [H_24 : pf (cb (sub _p49 (sel4 _p64 (add _p65 4))))] (andi (wr4 (arrElem H_12 H_11 (nullcsubne H_21) szint (ultcsubb H_24)) szint (jintany (sel4 _p64 (add _p23 (add (mul _p49 4) 8))))) (andi H_10 (andi (ofamem 1) (andi H_12 truei))))) truei)))) truei))) truei)))))) truei)%_LF)
Abadi’s favorite slide rlrrllrrllrlrlrllrlrrllrrll…
Example: Complete proof (Oracle representation) Lprf_6arrays6Bcopy1_MbcopyAIAI: ANN_ARCHW1DECL 0x76, 0xab, 0xb5, 0xd8, 0xeb, 0x10 Example