Overview of the Back-end for CComp
Zhaopeng Li
Software Security Lab
June 8, 2009
Outline
– Design Points
– Assembly Language: “x86”
– Low-level Intermediate Language
– Future Work
Design Points
Assembly Language
– Target: SCAP with an x86 abstract machine;
– The program logic may change in a future version;
– Or another machine may be targeted.
Low-level Intermediate Language
– Hides some machine-specific details;
– Note that this level can serve just as a helper for generating code and proofs.
Assembly Language : “x86”
Some Topics about “x86”
Data Representation
– 32-bit vs. “fake” 32-bit: we don't care how the data is stored as bits.
  Integer: 4 bytes
  Pointer: 4 bytes
– Data Alignment
Callee-saved Registers
– EBX, ESI, EDI, EBP
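As a rough illustration of these choices, the Python sketch below models the “fake” 32-bit representation: integers and pointers are 4-byte words and the bit-level encoding is ignored. The alignment rule (every scalar is 4-byte aligned) and the names WORD, ALIGN, SIZEOF, align_up, and layout are assumptions made for the example, not part of the slide.

```python
# Hypothetical model of the "fake" 32-bit data representation:
# sizes are fixed and the bit-level encoding is deliberately ignored.
WORD = 4            # bytes for an integer or a pointer
ALIGN = 4           # assumed alignment for every scalar (illustrative)
CALLEE_SAVED = ("ebx", "esi", "edi", "ebp")

SIZEOF = {"int": WORD, "pointer": WORD}


def align_up(offset: int, align: int = ALIGN) -> int:
    """Round offset up to the next multiple of align."""
    return (offset + align - 1) // align * align


def layout(fields):
    """Give each (name, type) field an aligned byte offset.

    Returns (offsets, total_size); total_size is padded to a multiple of ALIGN.
    """
    offsets, cur = {}, 0
    for name, ty in fields:
        cur = align_up(cur)
        offsets[name] = cur
        cur += SIZEOF[ty]
    return offsets, align_up(cur)


if __name__ == "__main__":
    print(layout([("x", "int"), ("next", "pointer")]))  # ({'x': 0, 'next': 4}, 8)
```

Since every type here is one aligned word, padding never appears between fields; align_up only matters if other sizes are added later.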
Some Topics about “x86” (cont.)
Calling convention:
1. Parameters are passed on the stack, pushed from right to left; alternatively, the first three are passed in registers EAX, ECX, and EDX and the rest on the stack;
2. Registers EAX, ECX, and EDX may be used freely by the callee; any other register the callee modifies must be saved on the stack and popped before the function returns;
3. The return value is stored in register EAX;
4. The caller cleans up the stack (the parameters).
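To make the stack-based variant concrete, here is a small Python sketch that simulates the caller's pushes and computes where each argument sits in the callee. It assumes 4-byte arguments and covers only the all-on-stack variant (the register-passing variant is not modelled); the function names are hypothetical.

```python
# Sketch of the stack-based convention: arguments pushed right to left,
# 4 bytes each, then `call` pushes the return address; the caller pops args.
WORD = 4


def simulate_call(args, esp):
    """Push args right to left, then the return address; return (esp, memory)."""
    mem = {}
    for a in reversed(args):          # rightmost argument is pushed first
        esp -= WORD
        mem[esp] = a
    esp -= WORD
    mem[esp] = "return address"       # pushed by `call`
    return esp, mem


def arg_offset_at_entry(i: int) -> int:
    """Offset of argument i (0-based) from esp on entry: [esp] is the return address."""
    return WORD + WORD * i


def arg_offset_from_ebp(i: int) -> int:
    """Offset of argument i from ebp after `push ebp; mov esp, ebp` ([ebp] = old ebp)."""
    return 2 * WORD + WORD * i


def caller_cleanup(n_args: int) -> int:
    """Bytes the caller adds back to esp after the callee returns."""
    return WORD * n_args
```

Because the arguments are pushed right to left, the first argument ends up at the lowest address, directly above the return address.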
Some Topics about “x86” (cont.)
Prolog (typical):
    _function:
        push ebp        ; store the old base pointer
        mov esp, ebp    ; make the base pointer point to the current stack location
        sub x, esp      ; x is the size of the local variables, in bytes
Epilog (typical):
        mov ebp, esp    ; reset the stack to "clean" away the local variables
        pop ebp         ; restore the original base pointer
        ret             ; return from the function
The prolog has the same effect as enter x, 0, and the epilog as leave; ret.
[Figure: stack frames at function entry (esp points at the old eip, with the parameters above it), after stack-frame setup (ebp points at the old ebp, with the old eip and parameters above it and the local variables between ebp and esp), and after the return (esp points at the parameters).]
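The following Python sketch replays the typical prolog and epilog on a toy memory (a dict from addresses to 4-byte values) to show how esp and ebp move. The Frame class and its method names are illustrative assumptions; operand order follows the slide, so mov esp, ebp copies esp into ebp.

```python
WORD = 4


class Frame:
    """Toy stack: mem maps addresses to 4-byte values; esp/ebp are integers."""

    def __init__(self, esp, ebp, ret_addr):
        self.mem = {}
        self.esp, self.ebp = esp, ebp
        self.push(ret_addr)          # done by the caller's call instruction

    def push(self, value):
        self.esp -= WORD
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem[self.esp]
        self.esp += WORD
        return value

    def prolog(self, x):
        """push ebp; mov esp, ebp; sub x, esp  (same effect as enter x, 0)."""
        self.push(self.ebp)
        self.ebp = self.esp
        self.esp -= x

    def epilog(self):
        """mov ebp, esp; pop ebp (= leave), then ret; returns the popped eip."""
        self.esp = self.ebp
        self.ebp = self.pop()
        return self.pop()
```

With esp = 1000 before the call and x = 12, the prolog leaves ebp = 992 and esp = 980; after the epilog, ebp is 2000 again and esp is 1000, which is 4 above its value at function entry.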
Assembly Abstract Machine “m86”
Code Heap (C)
– Code storage;
– Unchanged during execution.
Machine State
– Memory (M)
– Register File (R)
– Instruction Pointer (eip); the current instruction is c = C(eip)
Or simply use an instruction sequence (I) instead.
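A possible encoding of the m86 machine in Python, assuming integer code addresses and instructions stored as tuples. The read-only mapping (types.MappingProxyType) stands in for “unchanged during execution”; the State and CodeHeap classes and the as_sequence helper are hypothetical names for illustration.

```python
from dataclasses import dataclass, field
from types import MappingProxyType


@dataclass
class State:
    """Machine state: memory M, register file R, and the instruction pointer."""
    M: dict = field(default_factory=dict)   # address -> value
    R: dict = field(default_factory=dict)   # register name -> value
    eip: int = 0


class CodeHeap:
    """Code heap C: address -> instruction, read-only once built."""

    def __init__(self, code):
        self.C = MappingProxyType(dict(code))   # any write raises TypeError

    def fetch(self, s: State):
        """Current instruction c = C(eip)."""
        return self.C[s.eip]


def as_sequence(heap: CodeHeap, start: int):
    """Alternative view: the instruction sequence I from start,
    assuming consecutive integer addresses."""
    seq, a = [], start
    while a in heap.C:
        seq.append(heap.C[a])
        a += 1
    return seq
```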
Assembly Language: “x86” (AT&T syntax: source operand first, destination last)
Reg.    r  ::= eax | ebx | ecx | edx | esi | edi | esp | ebp
FReg.   fr ::= sf | zf
Int.    b  ::= n (integer)
Instr.  i  ::= add r1, r2 | addi n, r | sub r1, r2 | subi n, r
             | mul r1, r2 | muli n, r | mov r1, r2 | movi n, r
             | movs r1, n(r2) | movl n(r1), r2
             | push r | pop r | cmp r1, r2 | cmpi n, r
             | je r, b | jne r, b | jg r, b | jge r, b | jmp b
             | call b | ret | enter n, 0 | leave | malloc r | free r
Here movs r1, n(r2) stores r1 to memory at address R(r2) + n, and movl n(r1), r2 loads from address R(r1) + n into r2.
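As a sketch of an operational semantics for part of this grammar, the Python interpreter below executes instructions encoded as tuples in AT&T operand order. It assumes consecutive integer code addresses and 4-byte words, and it omits cmp, cmpi, the conditional jumps, the flag registers, mul, malloc, and free; it is an illustration, not the m86 semantics used in the proofs. The names step and run are hypothetical.

```python
# Small-step semantics for a subset of the grammar. Instructions are tuples;
# e.g. ("movs", r1, n, r2) is movs r1, n(r2) and ("enter", n, 0) is enter n, 0.
WORD = 4


def step(C, M, R):
    """Execute c = C(eip) in place, updating memory M and registers R."""
    op, *a = C[R["eip"]]
    nxt = R["eip"] + 1                     # assumption: consecutive addresses
    if op == "mov":
        R[a[1]] = R[a[0]]
    elif op == "movi":
        R[a[1]] = a[0]
    elif op in ("add", "sub"):             # r2 := r2 +/- r1
        R[a[1]] += R[a[0]] if op == "add" else -R[a[0]]
    elif op in ("addi", "subi"):           # r := r +/- n
        R[a[1]] += a[0] if op == "addi" else -a[0]
    elif op == "movs":                     # store: M(R(r2) + n) := R(r1)
        M[R[a[2]] + a[1]] = R[a[0]]
    elif op == "movl":                     # load: R(r2) := M(R(r1) + n)
        R[a[2]] = M[R[a[1]] + a[0]]
    elif op == "push":
        R["esp"] -= WORD
        M[R["esp"]] = R[a[0]]
    elif op == "pop":
        R[a[0]] = M[R["esp"]]
        R["esp"] += WORD
    elif op == "enter":                    # push ebp; mov esp, ebp; sub n, esp
        R["esp"] -= WORD
        M[R["esp"]] = R["ebp"]
        R["ebp"] = R["esp"]
        R["esp"] -= a[0]
    elif op == "leave":                    # mov ebp, esp; pop ebp
        R["esp"] = R["ebp"]
        R["ebp"] = M[R["esp"]]
        R["esp"] += WORD
    elif op == "call":                     # push return address; jump to b
        R["esp"] -= WORD
        M[R["esp"]] = nxt
        nxt = a[0]
    elif op == "ret":                      # pop return address into eip
        nxt = M[R["esp"]]
        R["esp"] += WORD
    elif op == "jmp":
        nxt = a[0]
    else:
        raise NotImplementedError(op)
    R["eip"] = nxt


def run(C, M, R, stop):
    """Step until eip reaches the address stop."""
    while R["eip"] != stop:
        step(C, M, R)


if __name__ == "__main__":
    # Caller (0-1) calls a function at 10 that returns its argument in eax plus 1.
    C = {0: ("movi", 7, "eax"), 1: ("call", 10),
         10: ("enter", 8, 0), 11: ("movs", "eax", -4, "ebp"),
         12: ("movl", -4, "ebp", "ecx"), 13: ("addi", 1, "ecx"),
         14: ("mov", "ecx", "eax"), 15: ("leave",), 16: ("ret",)}
    M, R = {}, {"eax": 0, "ecx": 0, "esp": 1000, "ebp": 2000, "eip": 0}
    run(C, M, R, stop=2)
    print(R["eax"], R["esp"], R["ebp"])  # 8 1000 2000
```

The demo function only uses caller-saved ECX as a temporary, so it respects the callee-saved convention without a save/restore sequence.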
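Program Logic Based on SCAP
Specification (p, g)
– p : State -> Prop, the precondition at a program point;
– g : State -> State -> Prop, the guarantee relating the current state to the state at the function's return.
Inference Rules
– Well-formed program
– Well-formed basic block
– Well-formed instruction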
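The well-formed-instruction rule can be illustrated by testing, rather than proving, its premises on sample states. The sketch below assumes SCAP's rule for a sequential instruction c has the shape “p(S) implies p1(Next(S)), and g1(Next(S), S2) implies g(S, S2) for every S2”; it uses push ebp with the guarantees g (for f) and g0 from the “Figure Out G” example. Python predicates over finite samples only approximate the Props of the logic, and all function names are hypothetical.

```python
# Finite testing (not proof) of the sequential-instruction premises:
#   p(S) -> p1(next(S))   and   g1(next(S), S2) -> g(S, S2)
# A state is a (registers, memory) pair of dicts.

def push_ebp(state):
    """next(S) for push ebp: esp := esp - 4, then M(esp) := ebp."""
    regs, mem = dict(state[0]), dict(state[1])
    regs["esp"] -= 4
    mem[regs["esp"]] = regs["ebp"]
    return regs, mem


def wf_instr(p, g, nxt, p1, g1, samples, finals):
    """Check both premises on every sample start state and final state."""
    for s in samples:
        if not p(s):
            continue
        s1 = nxt(s)
        if not p1(s1):
            return False
        if any(g1(s1, s2) and not g(s, s2) for s2 in finals):
            return False
    return True


def true_p(s):
    return True


def g_f(s, s2):
    """g of f at entry: R'(ebp) = R(ebp) and R'(esp) = R(esp) + 4."""
    return s2[0]["ebp"] == s[0]["ebp"] and s2[0]["esp"] == s[0]["esp"] + 4


def g_0(s, s2):
    """g0 after push ebp: R'(ebp) = R0(ebp) and R'(esp) = R0(esp) + 8."""
    return s2[0]["ebp"] == s[0]["ebp"] and s2[0]["esp"] == s[0]["esp"] + 8
```

Sampling can expose a wrong intermediate guarantee (for example one that keeps the offset at + 4 after push ebp), but it cannot establish the rule for all states.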
Main Objectives
Code Generation
– Minimize the proof size, e.g., temporary results should be kept in registers rather than on the stack.
Assertions
– Building (p, g) for each basic block;
– Generating (p, g) for each program point.
Proofs
– Generating proofs for functions and basic blocks;
– (Reusing the proofs of the verification conditions (VCs) at the source level.)
Assertion Relationship
Intermediate Language
– Basic block 1: f : {p} // {q}
– Basic block 2: L1 : {p1}
x86 Assembly Language
– Basic block 1: f : {(p', g)}
– Basic block 2: L1 : {(p'1, g1)}
where
\( p' = \mathit{trans}(p) \wedge \mathit{param}_p \wedge \mathit{stack\text{-}reg}_p \)
\( g = \mathit{trans}(q) \wedge \mathit{callee\text{-}saved\text{-}reg}_g \wedge \mathit{stack}_g \)
\( p'_1 = \mathit{trans}(p_1) \wedge \mathit{param}_{p_1} \wedge \mathit{stack\text{-}reg}_{p_1} \)
\( g_1 = \,? \)
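A hedged Python sketch of how the x86 specification could be assembled from the intermediate-language one as conjunctions. The concrete meanings given to trans, param, stack-reg, callee-saved-reg, and stack are hypothetical stand-ins (a single IL variable x held in the first stack argument, the result returned in EAX, following the calling convention above); only the shape of the conjunctions comes from the slide.

```python
# p' = trans(p) /\ param_p /\ stack-reg_p ;  g = trans(q) /\ callee-saved-reg_g /\ stack_g
# Every concrete definition below is a hypothetical stand-in.
CALLEE_SAVED = ("ebx", "esi", "edi", "ebp")


def conj(*preds):
    """Pointwise conjunction (short-circuits left to right); works for p and g."""
    return lambda *states: all(pr(*states) for pr in preds)


def arg0(s):
    return s["M"][s["R"]["esp"] + 4]        # first argument, above the return address


def trans_pre(p):                            # hypothetical: IL env {x} read from the stack
    return lambda s: p({"x": arg0(s)})


def trans_post(q):                           # hypothetical: IL result read from eax
    return lambda s, s2: q({"x": arg0(s)}, s2["R"]["eax"])


def param(s):                                # the argument slot exists in memory
    return s["R"]["esp"] + 4 in s["M"]


def stack_reg(s):                            # esp is word-aligned
    return s["R"]["esp"] % 4 == 0


def callee_saved_reg(s, s2):
    return all(s["R"][r] == s2["R"][r] for r in CALLEE_SAVED)


def stack(s, s2):                            # ret pops the return address
    return s2["R"]["esp"] == s["R"]["esp"] + 4


def translate_spec(p, q):
    """param is checked before trans(p), so arg0 is only read when it exists."""
    return (conj(param, stack_reg, trans_pre(p)),
            conj(trans_post(q), callee_saved_reg, stack))
```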
Figure Out G
Example: the code of f, where basic block 2 starts at L1 : {g1} and the function ends with leave; ret.
    push ebp
    mov esp, ebp
    sub $12, esp
    ...
    leave
    ret
The guarantee of f relates the entry state R to the state R' after ret:
\( g:\; R'(\mathit{ebp}) = R(\mathit{ebp}) \wedge R'(\mathit{esp}) = R(\mathit{esp}) + 4 \)
Step 1, push ebp (from state R to state R0), by the operational semantics:
\( R_0(\mathit{ebp}) = R(\mathit{ebp}) \wedge R_0(\mathit{esp}) = R(\mathit{esp}) - 4 \)
Conjoining with g:
\( R'(\mathit{ebp}) = R(\mathit{ebp}) \wedge R_0(\mathit{ebp}) = R(\mathit{ebp}) \wedge R'(\mathit{esp}) = R(\mathit{esp}) + 4 \wedge R_0(\mathit{esp}) = R(\mathit{esp}) - 4 \)
Eliminating R gives the guarantee after push ebp:
\( g_0:\; R'(\mathit{ebp}) = R_0(\mathit{ebp}) \wedge R'(\mathit{esp}) = R_0(\mathit{esp}) + 8 \)
The method:
1. Get the state relation from the rule of the operational semantics;
2. Use the g of the previous program point;
3. Do substitution and arithmetic.
Figure Out G (cont.)
Step 2, mov esp, ebp (from state R0 to state R1), starting from
\( g_0:\; R'(\mathit{ebp}) = R_0(\mathit{ebp}) \wedge R'(\mathit{esp}) = R_0(\mathit{esp}) + 8 \)
By the operational semantics:
\( R_1(\mathit{ebp}) = R_0(\mathit{esp}) \wedge R_1(\mathit{esp}) = R_0(\mathit{esp}) \)
Conjoining with g0:
\( R'(\mathit{ebp}) = R_0(\mathit{ebp}) \wedge R_1(\mathit{ebp}) = R_0(\mathit{esp}) \wedge R'(\mathit{esp}) = R_0(\mathit{esp}) + 8 \wedge R_1(\mathit{esp}) = R_0(\mathit{esp}) \)
The old value \( R_0(\mathit{ebp}) \) is no longer in a register, but push ebp stored it at address \( R_0(\mathit{esp}) \) and mov does not touch memory, so \( M_1(R_1(\mathit{ebp})) = M_1(R_0(\mathit{esp})) = R_0(\mathit{ebp}) \). Substituting gives
\( g_1:\; R'(\mathit{ebp}) = M_1(R_1(\mathit{ebp})) \wedge R'(\mathit{esp}) = R_1(\mathit{esp}) + 8 \)
Figure Out G (cont.)
Step 3, sub $12, esp (from state R1 to state R2), starting from
\( g_1:\; R'(\mathit{ebp}) = M_1(R_1(\mathit{ebp})) \wedge R'(\mathit{esp}) = R_1(\mathit{esp}) + 8 \)
By the operational semantics:
\( R_2(\mathit{ebp}) = R_1(\mathit{ebp}) \wedge R_2(\mathit{esp}) = R_1(\mathit{esp}) - 12 \)
Conjoining with g1:
\( R'(\mathit{ebp}) = M_1(R_1(\mathit{ebp})) \wedge R_2(\mathit{ebp}) = R_1(\mathit{ebp}) \wedge R'(\mathit{esp}) = R_1(\mathit{esp}) + 8 \wedge R_2(\mathit{esp}) = R_1(\mathit{esp}) - 12 \)
Since sub does not touch memory (\( M_2 = M_1 \)), substituting \( R_1(\mathit{esp}) = R_2(\mathit{esp}) + 12 \) gives
\( g_2:\; R'(\mathit{ebp}) = M_2(R_2(\mathit{ebp})) \wedge R'(\mathit{esp}) = R_2(\mathit{esp}) + 20 \)
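Step 3 of the method (substitution and arithmetic) can be mechanised for the esp part of g. In the Python sketch below, each intermediate guarantee is represented by the constant c_k in R'(esp) = R_k(esp) + c_k; an instruction that adds d to esp turns c_k into c_k − d. It handles only instructions that change esp by a constant (push, pop, and addi/subi on esp), ignores ebp and memory, and encodes sub $12, esp as ("subi", 12, "esp"); the function names are hypothetical.

```python
WORD = 4
# Instructions whose last operand is written (so "esp" there changes esp).
WRITES_LAST = {"add", "addi", "sub", "subi", "mul", "muli",
               "mov", "movi", "movl", "pop"}


def esp_delta(instr):
    """Constant d with R_{k+1}(esp) = R_k(esp) + d; raises if esp is not
    changed by a constant."""
    op, *args = instr
    if op == "push":
        return -WORD
    if op == "pop" and args[0] != "esp":
        return WORD
    if op in ("addi", "subi") and args[-1] == "esp":
        return args[0] if op == "addi" else -args[0]
    if op in ("enter", "leave", "call", "ret") or (
            op in WRITES_LAST and args[-1] == "esp"):
        raise NotImplementedError(f"esp not changed by a constant: {instr}")
    return 0                                   # esp untouched


def derive_offsets(c_entry, block):
    """[c_entry, c_0, c_1, ...] with R'(esp) = R_k(esp) + c_k after instruction k.

    Substituting R_k(esp) = R_{k+1}(esp) - d into R'(esp) = R_k(esp) + c_k
    gives c_{k+1} = c_k - d.
    """
    offsets = [c_entry]
    for instr in block:
        offsets.append(offsets[-1] - esp_delta(instr))
    return offsets
```

For the example block this yields the constants 4, 8, 8, 20 of g, g0, g1, g2; the ebp part still needs the memory reasoning shown in step 2.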
Low-level Intermediate Language
Potential Benefits
– Hide some machine-specific details;
– Some optimizations could be done here (optional);
– Make the implementation simple and reusable:
  – (* Note that this level is just a helper for generating code and proofs. *)
  – When targeting a different assembly logic, only the translation from this level needs new code.
The Language
Loc.    l    ::= r | s
Int.    o, b ::= n (integer)
Slot.   s    ::= local(o) | incoming(o) | outgoing(o)
Reg.    r    ::= r1 | r2 | r3 | …    // infinitely many pseudo-registers
Instr.  i    ::= bop(bop, l1, l2, l) | uop(uop, l1, l) | load(r, o, l) | store(l, r, o)
               | getstack(s, r) | setstack(r, s) | call(id, l) | return r
               | malloc(r) | free(r) | goto b | label(b) | cond(l1, cmp, l2, b_true)
BinOp.  bop  ::= add | sub | mul | …
UnOp.   uop  ::= minus | …
Comp.   cmp  ::= gt | ge | eq | ne | lt | le
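One way the stack slots could be lowered to the x86 instructions of the assembly language is sketched below in Python. It assumes the frame layout from the prolog slide (incoming arguments from ebp+8 upward, locals below ebp, outgoing arguments from esp upward), 4-byte slots, and that pseudo-registers have already been assigned machine registers; the slot classes and the slot_address and lower_* functions are hypothetical, not part of the IL definition.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Local:
    ofs: int


@dataclass(frozen=True)
class Incoming:
    ofs: int


@dataclass(frozen=True)
class Outgoing:
    ofs: int


Slot = Union[Local, Incoming, Outgoing]


def slot_address(s: Slot):
    """(base register, displacement) of a slot in the assumed x86 frame."""
    if isinstance(s, Incoming):
        return "ebp", 8 + s.ofs          # above old ebp and return address
    if isinstance(s, Local):
        return "ebp", -(4 + s.ofs)       # locals grow downward below ebp
    if isinstance(s, Outgoing):
        return "esp", s.ofs              # arguments for the next call
    raise TypeError(f"not a slot: {s!r}")


def lower_getstack(s: Slot, r: str):
    """getstack(s, r) becomes movl n(base), r."""
    base, n = slot_address(s)
    return ("movl", n, base, r)


def lower_setstack(r: str, s: Slot):
    """setstack(r, s) becomes movs r, n(base)."""
    base, n = slot_address(s)
    return ("movs", r, n, base)
```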
Code Generation (optional)
Do some optimizations that do not affect the proof, such as:
– Branch tunneling
– Dead code elimination
Future optimizations
– Other low-level optimizations may be done here
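Branch tunneling, the first optimization listed, can be sketched over IL code in Python. The sketch assumes instructions are tuples such as ("label", b), ("goto", b), and ("cond", l1, cmp, l2, b_true); a jump to a label whose next instruction is goto b2 is retargeted to b2, following chains and stopping on cycles. Each rewritten jump still reaches the same code, which is why this kind of optimization plausibly leaves the proof obligations unchanged. The function name tunnel is hypothetical.

```python
def tunnel(code):
    """Branch tunneling: if label b is immediately followed by goto b2,
    retarget every goto/cond aimed at b to b2 (following chains)."""
    # Map each label to the label it immediately jumps to, if any.
    hop = {}
    for i, ins in enumerate(code):
        if ins[0] == "label" and i + 1 < len(code) and code[i + 1][0] == "goto":
            hop[ins[1]] = code[i + 1][1]

    def final(b):
        seen = set()
        while b in hop and b not in seen:    # seen guards against goto cycles
            seen.add(b)
            b = hop[b]
        return b

    out = []
    for ins in code:
        if ins[0] == "goto":
            out.append(("goto", final(ins[1])))
        elif ins[0] == "cond":               # ("cond", l1, cmp, l2, b_true)
            out.append(ins[:4] + (final(ins[4]),))
        else:
            out.append(ins)
    return out
```

After tunneling, the now-unreferenced label/goto pairs become candidates for removal by a later cleanup pass.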