Low-Level Program Verification
Components of a Certifying Framework
Specifications: program safety/security/correctness + machine model
Certified code: machine code + proof
Proof checker: automated; checks the certified code against the specifications
  Yes: the machine code is passed on to the CPU
  No: the code is not run
Need not trust the correctness of proofs
Low-Level Machine Code Verification
Machine code is the executable form of programs
Why verify machine code?
  Bugs in compilers may produce buggy machine code, even if the source code is correct
  OS kernels contain manually written assembly code
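The Machine
(program) \(P ::= (C, S, I)\)
(state) \(S ::= (H, R)\)
(data heap) H
(register file) R: r1, r2, r3, …, rn
(code heap) \(C ::= \{f \leadsto I\}^*\), with blocks f1: I1, f2: I2, f3: I3, …
(instr. seq.) I: addu …, lw …, sw …, …, j f
[Figure: the pc points into the code heap C; the state consists of the data heap H and the register file R]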
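The machine model can be sketched in a few lines of Python. This is only an illustration: the tuple encoding of instructions, the `State` class, the `step` function and the 32-bit wrap-around for `addu` are assumptions made for this sketch, not definitions taken from the slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    heap: dict  # H: address -> word
    regs: dict  # R: register name -> word

def step(code, state, iseq):
    """One step of P = (C, S, I); returns the next (S, I). Hypothetical encoding."""
    if not iseq:
        raise RuntimeError("stuck: empty instruction sequence")
    instr, rest = iseq[0], iseq[1:]
    op = instr[0]
    if op == "j":                      # j f: continue with the block stored at label f
        return state, code[instr[1]]
    heap, regs = dict(state.heap), dict(state.regs)
    if op == "addu":                   # addu rd, rs, rt
        _, rd, rs, rt = instr
        regs[rd] = (regs[rs] + regs[rt]) % 2**32
    elif op == "lw":                   # lw rt, off(rs); KeyError models a stuck load
        _, rt, off, rs = instr
        regs[rt] = heap[regs[rs] + off]
    elif op == "sw":                   # sw rt, off(rs)
        _, rt, off, rs = instr
        heap[regs[rs] + off] = regs[rt]
    else:
        raise ValueError(f"unknown instruction {op!r}")
    return State(heap, regs), rest

# Code heap C: labels mapped to instruction sequences.
code = {
    "f1": (("lw", "r2", 0, "r1"),
           ("addu", "r2", "r2", "r2"),
           ("sw", "r2", 0, "r1"),
           ("j", "f2")),
    "f2": (),                          # empty block: the run below stops here
}
S = State(heap={100: 21}, regs={"r1": 100, "r2": 0})
I = code["f1"]
while I:
    S, I = step(code, S, I)
```

After the loop, the word at address 100 has been doubled in place and control has reached the (empty) block f2.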
Operational semantics
The CAP Logic
Certified Assembly Programming [Yu et al. ESOP 2003]
Judgments
State assertions - Examples
\(a \triangleq \lambda S.\ S.H(100) > 0 \land S.R(r1) = 17\)
\(a' \triangleq \lambda S.\ \mathit{odd}(S.R(r1))\)
\(\forall S.\ a\ S \rightarrow a'\ S\)
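A state assertion is just a predicate on states, so the two examples above can be written as Python functions. The dictionary layout of a state and the helper `implies` are assumptions of this sketch; testing an implication on a finite sample of states is a sanity check, not a proof.

```python
def a(S):
    # a: H(100) > 0 and R(r1) = 17
    return S["H"][100] > 0 and S["R"]["r1"] == 17

def a_prime(S):
    # a': R(r1) is odd
    return S["R"]["r1"] % 2 == 1

def implies(p, q, states):
    """Check 'for all S, p S -> q S' over a finite sample of states."""
    return all(q(S) for S in states if p(S))

# A small, hypothetical sample of states.
states = [{"H": {100: h}, "R": {"r1": r}}
          for h in (-1, 0, 5)
          for r in (16, 17, 18, 19)]

a_implies_a_prime = implies(a, a_prime, states)   # 17 is odd, so this holds
a_prime_implies_a = implies(a_prime, a, states)   # fails, e.g. for r1 = 19
```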
Inference Rules Well-formed program: Well-formed code heap:
Inference Rules (2)
Inference Rules (3)
\(\rightarrow\) means logical implication
Verification of malloc/free
Verification of malloc/free (2)
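Soundness
Lemma (Preservation). If \(\Psi \vdash \{a\}\, P\) and \(P \longmapsto P'\), then there exists an assertion \(a'\) such that \(\Psi \vdash \{a'\}\, P'\).
Lemma (Progress). If \(\Psi \vdash \{a\}\, P\), then there exists a program \(P'\) such that \(P \longmapsto P'\).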
Soundness (2)
Theorem (Soundness). If \(\Psi \vdash \{a\}\, P\), then for every natural number \(n\) there exists a program \(P'\) such that \(P \longmapsto^{n} P'\), and
  if … then …
  if … then …
  if … and … then …
Program Specifications
(spec) \(\Psi ::= \{f \leadsto a\}^*\)
[Figure: the machine diagram (code heap C, state S = (H, R), pc), annotated with assertions a, a1, a2, a3; the code blocks are f1: I1, f2: I2, f3: I3]
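Invariant-Based Verification
\(P_0 \xrightarrow{c_1} P_1 \xrightarrow{c_2} P_2 \xrightarrow{c_3} \cdots \xrightarrow{c_n} P_n\)
Initial condition: \(\mathit{Inv}(P_0)\)
Progress: if \(\mathit{Inv}(P)\), then \(\exists P'.\ P \xrightarrow{c} P'\).
Preservation: if \(\mathit{Inv}(P)\) and \(P \xrightarrow{c} P'\), then \(\mathit{Inv}(P')\).
Invariants are hard to find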
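The three conditions can be checked mechanically on a small finite system. The sketch below uses a made-up toy "program" (a counter that steps by 2 modulo 10 and gets stuck on odd values); `check_invariant` and `step` are hypothetical names introduced only for this illustration.

```python
def step(P):
    """Toy transition: even counters advance by 2 (mod 10); odd counters are stuck."""
    return (P + 2) % 10 if P % 2 == 0 else None

def check_invariant(inv, step, initial, universe):
    """Check initial condition, progress and preservation over a finite universe."""
    if not inv(initial):
        return False                  # initial condition fails
    for P in universe:
        if not inv(P):
            continue
        nxt = step(P)
        if nxt is None:
            return False              # progress fails: Inv(P) holds but P is stuck
        if not inv(nxt):
            return False              # preservation fails
    return True

universe = range(10)
good = check_invariant(lambda P: P % 2 == 0, step, 0, universe)
too_weak = check_invariant(lambda P: True, step, 0, universe)       # admits stuck states
too_strong = check_invariant(lambda P: P == 0, step, 0, universe)   # not preserved
```

The last two calls show why invariants are hard to find: one that is too weak breaks progress, one that is too strong breaks preservation.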
How to verify function call?

  void f(){
    h();
    return;
  }

  void h(){
    return;
  }

  f:  ...
      sw  $ra, -4($fp)
      jal h              ;; $ra contains ct
  ct: lw  $ra, -4($fp)
      ...
      jr  $ra

  h:  ...
      jr  $ra

Does f use the right return addr.?
[Figure: the stack (fp, saved ra: ct or ??) and the pc]
Specifications: Challenges
Hoare triple {p} f {q}? p and q are in different basic blocks!
Does f use the "right" return addr.?

  f:  ...
      sw  $ra, -4($fp)
      jal h
  ct: lw  $ra, -4($fp)
      jr  $ra

  Annotations in the figure: {(p0, g0)}, {(p1, g1)}, {$ra = n …}

SCAP specifications: (p, g)
  \(p : \mathit{State} \rightarrow \mathit{Prop}\)
  \(g : \mathit{State} \rightarrow \mathit{State} \rightarrow \mathit{Prop}\)
  e.g. \(g_0\ S\ S' \triangleq S'.\$ra = S.\$ra \land \ldots\)
What is state? State contains memory, register file
The exit point does not have to be the return point: raise exceptions, weak continuations, context switching
Program Spec. and Code Pointers
Program Specification \(\Psi ::= \{f_1 \leadsto (p_1, g_1), \ldots, f_n \leadsto (p_n, g_n)\}\)
"Safe" to return (jr $ra):
  \(\$ra \in dom(\Psi)\)
  \(\Psi(\$ra) = (p, g)\)
  p holds at the time of return
Only code labels specified in \(\Psi\) are good code pointers
To make sure it is safe to jump to some code, the code label must be defined in \(\Psi\) with some spec (p, g)
[Figure: call chain with jal f, jal h and three jr $ra returns, annotated with p1 … p4 and g0 … g4]
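SCAP: Stack Invariant
Always safe to return?
First jr $ra: \(g_0\ S_0\ S_1 \rightarrow S_1.\$ra \in dom(\Psi) \land \Psi(S_1.\$ra) = (p_1, g_1) \land p_1\ S_1\)
Second jr $ra: \(g_0\ S_0\ S_1 \rightarrow g_1\ S_1\ S_2 \rightarrow S_2.\$ra \in dom(\Psi) \land \Psi(S_2.\$ra) = (p_2, g_2) \land p_2\ S_2\)
Third jr $ra: \(g_0\ S_0\ S_1 \rightarrow g_1\ S_1\ S_2 \rightarrow g_2\ S_2\ S_3 \rightarrow S_3.\$ra \in dom(\Psi) \land \Psi(S_3.\$ra) = (p_3, g_3) \land p_3\ S_3\)
…
[Figure: logical control stack]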
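SCAP: Stack Invariant
\(WFST(n, g_0, S_0, \Psi) \triangleq \forall S_1.\ g_0\ S_0\ S_1 \rightarrow \exists p_1, g_1.\ \Psi(S_1.\$ra) = (p_1, g_1) \land p_1\ S_1 \land WFST(n-1, g_1, S_1, \Psi)\)
\(WFST(0, g_0, S_0, \Psi) \triangleq \neg\exists S_1.\ g_0\ S_0\ S_1\)
Invariant: \(p\ S \land \exists n.\ WFST(n, g, S, \Psi)\)
[Figure: logical control stack with frames (p0, g0, S0), (p1, g1, S1), (p2, g2, S2), (p3, g3, S3), each entered by jr $ra]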
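The recursive shape of WFST can be tried out on a finite toy model. The sketch below is an assumption-laden illustration: states are small dictionaries with a return-address field `ra` and a value `v`, quantifiers range over an explicit finite `universe`, and the base case is read as "no state S1 satisfies g0 S0 S1". The names `wfst`, `g_h`, `p_ct` and `g_ct` are hypothetical.

```python
def wfst(n, g, S, psi, universe):
    """WFST(n, g, S, psi) with quantifiers restricted to a finite universe."""
    if n == 0:
        # Base case (as read here): g cannot be fulfilled from S.
        return not any(g(S, S1) for S1 in universe)
    for S1 in universe:
        if g(S, S1):
            ra = S1["ra"]
            if ra not in psi:
                return False          # return address is not a specified code label
            p1, g1 = psi[ra]
            if not (p1(S1) and wfst(n - 1, g1, S1, psi, universe)):
                return False
    return True

universe = [{"ra": ra, "v": v} for ra in ("ct", "halt") for v in range(3)]

# Guarantee of a callee h: it returns with v incremented and ra unchanged.
g_h = lambda S, S1: S1["ra"] == S["ra"] and S1["v"] == S["v"] + 1
# Spec of the return point ct: it expects v >= 1 and never returns itself.
p_ct = lambda S: S["v"] >= 1
g_ct = lambda S, S1: False
psi = {"ct": (p_ct, g_ct)}

S0 = {"ra": "ct", "v": 0}
ok_depth_1 = wfst(1, g_h, S0, psi, universe)   # one frame (ct) below h
ok_depth_0 = wfst(0, g_h, S0, psi, universe)   # h can return, so depth 0 is wrong
bad_ra = wfst(1, g_h, {"ra": "halt", "v": 0}, psi, universe)  # "halt" not in psi
```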
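SCAP: Invariant Preservation
\(\mathit{Inv}(S)\): \(p\ S \land \exists n.\ WFST(n, g, S, \Psi)\)
If \(p\ S \land \exists n.\ WFST(n, g, S, \Psi)\) and \(S \xrightarrow{c} S'\), then \(\exists p', g'.\ p'\ S' \land \exists n.\ WFST(n, g', S', \Psi)\)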
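SCAP: call
Before jal f: \(p\ S \land WFST(n, g, S, \Psi)\)
After jal f: \(p_0\ S_0 \land WFST(n+1, g_0, S_0, \Psi)\)
Proof obligations
  \(p\ S \rightarrow p_0\ S_0\)
  \(g_0\ S_0\ S_1 \rightarrow S_0.\$ra = S_1.\$ra\)
  \(p\ S \rightarrow g_0\ S_0\ S_1 \rightarrow p_1\ S_1\)
  \(p\ S \rightarrow g_0\ S_0\ S_1 \rightarrow g_1\ S_1\ S_2 \rightarrow g\ S\ S_2\)
[Figure: logical control stack of depth n; the callee's jr $ra leads to S1 with (p1, g1), whose own jr $ra leads to S2]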
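SCAP: the call rule
\[
\frac{\begin{array}{l}
(p_0, g_0) = \Psi(f) \qquad (p_1, g_1) = \Psi(f_{ret}) \\
\forall H, R.\ p\ (H, R) \rightarrow p_0\ (H, R\{\$ra \leadsto f_{ret}\}) \\
\forall H, R, S_1.\ p\ (H, R) \rightarrow g_0\ (H, R\{\$ra \leadsto f_{ret}\})\ S_1 \rightarrow p_1\ S_1 \land (\forall S_2.\ g_1\ S_1\ S_2 \rightarrow g\ (H, R)\ S_2) \\
\forall S_0, S_1.\ g_0\ S_0\ S_1 \rightarrow S_0.\$ra = S_1.\$ra
\end{array}}
{\Psi \vdash \{(p, g)\}\ \mathtt{jal}\ f,\ f_{ret}}
\]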
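SCAP: ret
Before jr $ra: \(p\ S \land WFST(n, g, S, \Psi)\)
After jr $ra: the logical control stack has depth n-1
Proof obligation: \(p\ S \rightarrow g\ S\ S_1\)
[Figure: logical control stack shrinking from n to n-1]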
SCAP: return rule
\[
\frac{\forall S.\ p\ S \rightarrow g\ S\ S}{\Psi \vdash \{(p, g)\}\ \mathtt{jr}\ \$ra}
\]
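Read operationally, the premise of the return rule says that at jr $ra nothing is left to do: the remaining guarantee g must already be satisfied by the identity transition. A minimal finite check, with a hypothetical state layout and a hypothetical helper `check_return_rule`:

```python
def check_return_rule(p, g, universe):
    """Premise of the return rule over a finite universe: p S -> g S S."""
    return all(g(S, S) for S in universe if p(S))

universe = [{"ra": "ct", "v": v} for v in range(4)]
p_any = lambda S: True

# Guarantee "nothing changes between here and the return": fine at jr $ra.
g_done = lambda S, S2: S2 == S
# Guarantee "v is still to be incremented": the work has not been done yet.
g_todo = lambda S, S2: S2["ra"] == S["ra"] and S2["v"] == S["v"] + 1

can_return = check_return_rule(p_any, g_done, universe)
cannot_return = check_return_rule(p_any, g_todo, universe)
```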
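SCAP: direct jump (or tail call)
Before j f: \(p\ S \land WFST(n, g, S, \Psi)\)
After j f: \(p_0\ S_0 \land WFST(n, g_0, S_0, \Psi)\)
Proof obligations
  \(p\ S \rightarrow p_0\ S_0\)
  \(p\ S \rightarrow g_0\ S_0\ S_1 \rightarrow g\ S\ S_1\)
[Figure: the logical control stack keeps depth n; both g and g0 end at the same jr $ra, reaching S1]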
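SCAP: sequential
\[
\frac{\Psi \vdash \{(p', g')\}\ I \qquad \forall S.\ p\ S \rightarrow p'\ (AuxStep(c, S)) \qquad \forall S, S'.\ p\ S \rightarrow g'\ (AuxStep(c, S))\ S' \rightarrow g\ S\ S'}{\Psi \vdash \{(p, g)\}\ c;\, I}
\]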
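The two side conditions of the sequential rule can be checked on a toy instance: one increment instruction followed by a return. In this sketch `aux_step` stands in for AuxStep, whose definition is not given on the slide; the instruction encoding `("addiu", reg, k)`, the state layout and the finite `universe` are all assumptions made for illustration.

```python
def aux_step(c, S):
    """Hypothetical AuxStep for one instruction form: ("addiu", reg, k)."""
    op, reg, k = c
    if op != "addiu":
        raise ValueError(f"unsupported instruction {op!r}")
    S2 = dict(S)
    S2[reg] = S[reg] + k
    return S2

def check_seq_rule(c, p, g, p2, g2, universe):
    """Side conditions of the sequential rule for c; I, where I has spec (p2, g2)."""
    for S in universe:
        if not p(S):
            continue
        mid = aux_step(c, S)
        if not p2(mid):
            return False              # p S -> p'(AuxStep(c, S)) fails
        for S2 in universe:
            if g2(mid, S2) and not g(S, S2):
                return False          # g'(AuxStep(c, S)) S' -> g S S' fails
    return True

universe = [{"ra": "ct", "v": v} for v in range(5)]
c = ("addiu", "v", 1)

p = lambda S: S["v"] < 4                                  # keep the result inside the universe
g = lambda S, S2: S2["ra"] == S["ra"] and S2["v"] == S["v"] + 1
p_ret = lambda S: True                                    # spec of the remaining "jr $ra"
g_ret = lambda S, S2: S2 == S

increment_ok = check_seq_rule(c, p, g, p_ret, g_ret, universe)
g_wrong = lambda S, S2: S2["v"] == S["v"] + 2
increment_wrong = check_seq_rule(c, p, g_wrong, p_ret, g_ret, universe)
```

The spec (p_ret, g_ret) of the remaining jr $ra is the one that satisfies the return rule, so the whole block "addiu; jr $ra" meets the guarantee "v is incremented by one".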
Other control flows
Stack unwinding
Stack cutting
setjmp/longjmp in C
Call with Multiple Return Addr.
[Figure: multi-return call (Multi-ret), labeled p, g, g1 and jr ra]
Call with Multiple Ret. or Tail Call
Generalization: Stack unwinding/cutting
[Figure: Multi-ret + Tail-call, each panel labeled p, g, p1, g1 and jr ra]
Change of Invariant
setjmp/longjmp

  jmp_buf env = …;

  int rev(int x){
    if (setjmp(env) == 0){
      cmp0(x, env);
      return 0;
    }else{
      return 1;
    }
  }

  void cmp0(int x, jmp_buf env){
    cmp1(x, env);
  }

  void cmp1(int x, jmp_buf env){
    if (x == 0)
      longjmp(env, 1);
    else
      return;
  }

env cannot outlive the stack frame of rev!
[Figure: the stack (sp, frame f0, env) and the pc at successive points of the execution]
Read the paper at: http://flint.cs.yale.edu/flint/publications/sbca