Modular Machine Code Verification Zhaozhong Ni Advisor: Zhong Shao Committee: Zhong Shao, Paul Hudak, Carsten Schürmann, David Walker Department of Computer Science, Yale University Nov. 29, 2006 PhD Thesis Defense
2 19 Lines of Code on Every PC
swapcontext:
  ; store old context
  mov eax, [esp+4]
  mov [eax+0], OK
  mov [eax+4], ebx
  mov [eax+8], ecx
  mov [eax+12], edx
  mov [eax+16], esi
  mov [eax+20], edi
  mov [eax+24], ebp
  mov [eax+28], esp
  ; load new context
  mov eax, [esp+8]
  mov esp, [eax+28]
  mov ebp, [eax+24]
  mov edi, [eax+20]
  mov esi, [eax+16]
  mov edx, [eax+12]
  mov ecx, [eax+8]
  mov ebx, [eax+4]
  mov eax, [eax+0]
  ret
3 19 Lines of Code in Every ms swapcontext: Runs thousands of times per second Used by assembly, C, MSIL, JVML, etc. Basis of multi-tasking, OS, and software Safety and correctness taken for granted
4 19 Lines of Code Looks Simple
[Figure: registers eax, ebx, ecx, edx, esi, edi, ebp, esp and return pointers retp / retp’ around "call swapcontext"; the old context is shown with values a1 … a8, the new context with values b1 … b8, and OK appears as the stored eax value.]
5 19 Lines of Code Proven Hard swapcontext: Simple code, complex reasoning! stack / heap / memory mutation procedure call / first-class code pointer protection / polymorphism Lack specification and verification that are formal (machine checkable in sound logic) general (allows all possible usage of context) realistic (usable from assembly and C level)
6 Outline Introduction The XCAP Framework Mini Thread Library Connect XCAP to TAL Conclusion
7 Software Reliability Bugs are costly Especially important for mission-critical software consumer electronics software internet software
8 Test-Patch Approach Works most of the time Gives no guarantee Could make things worse
[Flowchart nodes: test, pre-release? (yes / no), debug, create patch]
9 Language-based Approach Uses types and other formal specifications Excludes all bugs in certain categories illegal command, overflow, dangling pointer, etc. Successful and popular ML, Java, C#, etc. Reached virtual machine code level JVML, MSIL, TIL, TAL, etc. Meta-theorems can make guarantees
10 Traditional Assumptions Types are for application software you cannot write an OS without (void *) Types are for high-level languages not much to talk about B CD 15 Types are only for “no blue screen” how about “variable x is a prime number” Type safety is bad for performance turn off array-bound checking before release
11 Program Specification
bool prime (int n) {
  assert (n > 0);
  for (int i = 2; i < n; i++)
    // n mod 2,…,i-1 ≠ 0
    if (n % i == 0) return false;
  // n mod 2,…,n-1 ≠ 0
  return true;
}
syntactic types / machine-logical specifications / meta-logical specifications
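The two comments in prime() are loop invariants that a logic-based specification would state and prove. As a rough illustration only, the sketch below turns them into runtime checks; `no_divisor_below` and `prime_checked` are hypothetical names introduced here, not part of the talk.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true when no d in [2, bound) divides n.
   This is the property "n mod 2,...,bound-1 != 0" from the slide. */
bool no_divisor_below(int n, int bound) {
    for (int d = 2; d < bound; d++) {
        if (n % d == 0) return false;
    }
    return true;
}

/* The slide's prime(), with each comment turned into a runtime check. */
bool prime_checked(int n) {
    assert(n > 0);                        /* precondition */
    for (int i = 2; i < n; i++) {
        assert(no_divisor_below(n, i));   /* n mod 2,...,i-1 != 0 */
        if (n % i == 0) return false;
    }
    assert(no_divisor_below(n, n));       /* n mod 2,...,n-1 != 0 */
    return true;
}
```

Runtime checks only test the inputs that are actually run; a machine-checked proof of the same invariants would cover every n > 0. Like the slide's version, this function returns true for n = 1 because the loop body never executes.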
12 Machine Code Verification Motivations everything goes down to binary high-level safety efforts lost in compilation critical code directly written in low level Challenges Expressiveness Modularity Goals both user and system level code modular specification + certification
13 Proof-Carrying Code Proposed 10 years ago [Necula & Lee] machine code, machine checkable proof
[Diagram labels: Code, Proof, Checker, Meta theory, Specification]
14 Foundational PCC Proposed by [Appel] mathematical logic checker, mathematical logic theory
[Diagram labels: Code, Proof, Checker, Meta theory, Specification]
15 Approaches to PCC
Type-based PCC: TAL [Morrisett98], Touchstone PCC [Colby00], Syntactic FPCC [Hamid02], FTAL [Crary03], LTAL [Chen03], …
  Modular; Generate proof easily; Type safety
Logic-based PCC: Original PCC [Necula98], Semantic FPCC [Appel01], CAP [Yu03], Open Verifier [Chang05], CCAP/CMAP [Yu04, Feng05], …
  Expressive; Advanced properties; Good interoperability
16 PCC After 10 Years In principle, can verify any machine code! In reality, many programs are not verified. For some code, we do not know HOW!
[Diagram labels: Code, Proof, Checker, Meta theory, Specification]
17 User-level Code: List Append Adapted from [Reynolds02] ……
18 User-level Code: List Append Adapted from [Reynolds02] ……
19 User-level Code: List Append Adapted from [Reynolds02]
Feature                                                         Type-based   Logic-based
Inductive definitions (correctness of list append)                  -            +
Strong update (separation logic: allocation, de-allocation, mutation)  -         +
Embedded code pointers (continuation)                               +            -
Impredicative polymorphisms (closure)                               +            -
20 ECP Problem w. Hoare Logic Embedded code pointers (ECP) Examples: computed GOTOs, higher-order functions, indirect jumps, continuations, return addresses “… are difficult to describe in … Hoare logic” [Reynolds02] Previous approaches Ignore ECP [Necula98, Yu04] Limit ECP specifications to types [Hamid04] Sacrifice modularity [Yu03] Use complex indexed semantic models [Appel01]
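To make the term concrete, here is a small C sketch of an embedded code pointer: a function pointer stored inside a data structure and invoked through an indirect call. The names `closure`, `add_env`, `mul_env`, and `apply` are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical closure: a code pointer embedded in a data structure,
   together with the environment value that the code expects. */
struct closure {
    int (*code)(const struct closure *self, int arg); /* embedded code pointer */
    int env;                                          /* captured value */
};

int add_env(const struct closure *self, int arg) { return self->env + arg; }
int mul_env(const struct closure *self, int arg) { return self->env * arg; }

/* Indirect call: the callee is known only at run time. To reason about
   this call, a logic has to say what the stored pointer may assume. */
int apply(const struct closure *k, int arg) {
    assert(k != NULL && k->code != NULL);
    return k->code(k, arg);
}
```

The body of `apply` cannot be verified by looking up a fixed callee. Its specification has to mention the precondition of whatever code `k->code` points to, which is the difficulty the slide attributes to Hoare logic.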
21 Outline Introduction The XCAP Framework Mini Thread Library Connect XCAP to TAL Conclusion
22 The XCAP Framework [POPL’06] A logic-based PCC framework modular verification of machine code supports ECP without compromise Support both system and user code Consists of target machine (not fixed) assertion language (consistency) inference rules (soundness)
23 Target Machine
24 Dynamic Semantics
25 Hoare logic in CPS Use general predicate logic for assertions example: Mechanized in a proof assistant (Coq) Extensions made: CCAP, CMAP, etc. Certified Assembly Programming [Yu03, Hamid04, Yu04, Feng05]
26 How CAP Certify Instructions
27 How CAP Certify Programs …
28 The ECP Problem cptr(f, a) = ?
29 Internalize Hoare-derivation for ECP Previous Approach Circularity! Stratification [OHearn97, Naumann01] Works for simple case Hard for assembly Hard for polymorphism Step-Indexing [Appel01, Appel02, Schneck03] Works for polymorphism Heavyweight Not standard Hoare logic
30 CAP’s Approach Specify ECP by checking against code spec Verify all code specs are indeed valid Modularity problem
31 The XCAP Approach Specify ECP independent of code spec Check ECP against global code spec Verify global code spec is indeed valid
32 Extended Propositions
33 XCAP Rules
34 How XCAP Works with ECP (SEQ) (ECP) (JMP) (JD)
35 Verification of append()
36 Impredicative Polymorphisms Important for ECP Naïve interpretation function fails
37 New Interpretation Soundness of interpretation Interpretation Consistency
38 Recursive Specification Simple recursive data structures linked list, queue, stack, tree, etc. supported via inductive definition of Prop Complex recursive structures with ECP object (self refers to the entire object) threading invariant (each thread assumes others) Recursive specification
39 Memory Mutation Strong update special conjunction (p * q) in separation logic directly definable in Prop and PropX explicit alias control, popular in system level Weak update (general reference) mutable reference (int ref) in ML managed data pointers (int __gc*) in .NET rely on GC to recycle memory popular in user level
40 Weak Update Reference cell Interpretation Record macro
41 Implementation in Coq
PropX can share similar tactics with Prop
Component                                   Lines
Target machine                                341
PropX, interpretation, and consistency       1733
XCAP with soundness                           444
CAP with soundness                            402
CAP to XCAP translation with proof            543
Separation logic and lemmas                   300
append() example                             1718
42 Outline Introduction The XCAP Framework Mini Thread Library Connect XCAP to TAL Conclusion
43 Why Thread Library? Concurrent verification primitives’ correctness is assumed primitives are not really “primitive”! poor portability due to lack of formal spec Core of OS kernel assignment 1 of OS course written in C and Assembly requires both safety and efficiency
45 A Mini Thread Library Modeled after Pth Non-preemptive user level threads Written in (subset of) x86 assembly
46 Threading Model
47 Modules and Interfaces
48 Verify That 19 Lines of Code Step 1: specify machine context Step 2: specify function call/return Step 3: specify swapcontext() Step 4: prove it!
49 Machine Context
[Figure: layout of mctx with public and private parts; labels retv, bx, cx, dx, si, di, bp, sp, cs, ret]
typedef struct mctx_st *mctx_t;
struct mctx_st { int eax; int ebx; int ecx; int edx; int esi; int edi; int ebp; int esp; };
50 Function Call / Return local storage return address argument 1 argument 2 … argument n caller frames excess space esp
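The swapcontext code reads its two arguments from [esp+4] and [esp+8], which matches a layout where the return address sits at [esp] on entry and the arguments follow it. The sketch below spells out that address arithmetic; the 4-byte slot size and the helper names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_BYTES 4u  /* assumed: 32-bit x86, one word per stack slot */

/* At function entry, [esp] holds the return address. */
uint32_t return_address_slot(uint32_t esp) {
    return esp;
}

/* Argument i (1-based) sits i slots above the return address. */
uint32_t argument_slot(uint32_t esp, uint32_t i) {
    assert(i >= 1u);
    return esp + SLOT_BYTES * i;
}
```

With esp = 200 at entry, argument 1 is at address 204 and argument 2 at 208, corresponding to [esp+4] and [esp+8].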
51 swapcontext()
void swapcontext (mctx_t old, mctx_t new);
  mov eax, [esp+4]
  mov [eax+0], OK
  mov [eax+4], ebx
  mov [eax+8], ecx
  mov [eax+12], edx
  mov [eax+16], esi
  mov [eax+20], edi
  mov [eax+24], ebp
  mov [eax+28], esp
  mov eax, [esp+8]
  mov esp, [eax+28]
  mov ebp, [eax+24]
  mov edi, [eax+20]
  mov esi, [eax+16]
  mov edx, [eax+12]
  mov ecx, [eax+8]
  mov ebx, [eax+4]
  mov eax, [eax+0]
  ret
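To see what the 19 instructions do to machine state, the following C sketch interprets them over a toy machine with eight registers and a small word-aligned memory. Everything here is an assumption made for illustration: the `machine` type, the memory size, the value chosen for `OK`, and the name `sim_swapcontext`. It is not the formal machine model used in the thesis.

```c
#include <assert.h>
#include <stdint.h>

enum { EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP, NREGS };

#define MEM_WORDS 64u
#define OK 1u  /* assumed value; the slides only name the constant */

struct machine {
    uint32_t reg[NREGS];
    uint32_t mem[MEM_WORDS];  /* indexed by byte address / 4 */
};

uint32_t load(const struct machine *m, uint32_t addr) {
    assert(addr % 4u == 0u && addr / 4u < MEM_WORDS);
    return m->mem[addr / 4u];
}

void store(struct machine *m, uint32_t addr, uint32_t value) {
    assert(addr % 4u == 0u && addr / 4u < MEM_WORDS);
    m->mem[addr / 4u] = value;
}

/* Executes the swapcontext body and returns the address that ret jumps to. */
uint32_t sim_swapcontext(struct machine *m) {
    uint32_t *r = m->reg;
    r[EAX] = load(m, r[ESP] + 4u);       /* mov eax, [esp+4]   (old) */
    store(m, r[EAX] + 0u,  OK);          /* mov [eax+0], OK          */
    store(m, r[EAX] + 4u,  r[EBX]);      /* mov [eax+4], ebx         */
    store(m, r[EAX] + 8u,  r[ECX]);      /* mov [eax+8], ecx         */
    store(m, r[EAX] + 12u, r[EDX]);      /* mov [eax+12], edx        */
    store(m, r[EAX] + 16u, r[ESI]);      /* mov [eax+16], esi        */
    store(m, r[EAX] + 20u, r[EDI]);      /* mov [eax+20], edi        */
    store(m, r[EAX] + 24u, r[EBP]);      /* mov [eax+24], ebp        */
    store(m, r[EAX] + 28u, r[ESP]);      /* mov [eax+28], esp        */
    r[EAX] = load(m, r[ESP] + 8u);       /* mov eax, [esp+8]   (new) */
    r[ESP] = load(m, r[EAX] + 28u);      /* mov esp, [eax+28]        */
    r[EBP] = load(m, r[EAX] + 24u);      /* mov ebp, [eax+24]        */
    r[EDI] = load(m, r[EAX] + 20u);      /* mov edi, [eax+20]        */
    r[ESI] = load(m, r[EAX] + 16u);      /* mov esi, [eax+16]        */
    r[EDX] = load(m, r[EAX] + 12u);      /* mov edx, [eax+12]        */
    r[ECX] = load(m, r[EAX] + 8u);       /* mov ecx, [eax+8]         */
    r[EBX] = load(m, r[EAX] + 4u);       /* mov ebx, [eax+4]         */
    r[EAX] = load(m, r[EAX] + 0u);       /* mov eax, [eax+0]         */
    uint32_t target = load(m, r[ESP]);   /* ret: pop return address  */
    r[ESP] += 4u;
    return target;
}
```

Note that the final ret pops its return address from the new stack, because esp has already been replaced. The routine therefore returns into the new context rather than to its caller, which is one reason the code is harder to specify than it looks.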
52 Other Context Routines void loadcontext (mctx_t mctx); void makecontext (mctx_t mctx, char *sp, void *lnk, void *func, void *arg);
53 Thread Control Block
typedef struct mth_st *mth_t;
struct mth_st { mth_t next; mth_state_t state; struct mctx_st mctx; };
[Figure: queue q of thread control blocks, each holding next, state, and machine context, linked through next and terminated by NULL]
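A ready queue over these thread control blocks can be sketched as a singly linked FIFO. The state names in `mth_state_t`, the `mth_queue` type, and the functions `rq_push` and `rq_pop` are hypothetical; the slides give only the struct fields.

```c
#include <assert.h>
#include <stddef.h>

struct mctx_st { int eax, ebx, ecx, edx, esi, edi, ebp, esp; };

/* Assumed set of states; the talk does not list them. */
typedef enum { MTH_READY, MTH_RUNNING, MTH_DEAD } mth_state_t;

typedef struct mth_st *mth_t;
struct mth_st {
    mth_t next;
    mth_state_t state;
    struct mctx_st mctx;
};

/* Hypothetical FIFO of ready threads, linked through the next field. */
struct mth_queue {
    mth_t head;
    mth_t tail;
};

/* Appends t at the tail and marks it ready. */
void rq_push(struct mth_queue *q, mth_t t) {
    assert(q != NULL && t != NULL);
    t->next = NULL;
    t->state = MTH_READY;
    if (q->tail != NULL) {
        q->tail->next = t;
    } else {
        q->head = t;
    }
    q->tail = t;
}

/* Removes and returns the head, or NULL when the queue is empty. */
mth_t rq_pop(struct mth_queue *q) {
    assert(q != NULL);
    mth_t t = q->head;
    if (t == NULL) return NULL;
    q->head = t->next;
    if (q->head == NULL) q->tail = NULL;
    t->next = NULL;
    return t;
}
```

Each push and pop mutates next pointers in place, which is the kind of strong update that the separation-logic part of the framework is meant to describe.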
54 Threading Invariant scheduler context mctx_sched sched st mth_cur cur ready threads mth_rq …
55 Threading Routines void mth_yield (void); mth_t mth_spawn (int stacksize, void *(*func)(void *), void *arg); void mth_scheduler (void);
56 Implementation 40,000 lines of Coq code Where does the complexity come from? lemma library: large and reusable x86 machine: finite integer embedding: de Bruijn indices engineering: limited proof re-use target code: this is the kernel of software!
57 Outline Introduction The XCAP Framework Mini Thread Library Connect XCAP to TAL Conclusion
58 Typed Assembly Language TAL [Morrisett et al] Top-level typing judgment Target of type-preserving compilation For user and simple system level code
59 TAL to XCAP Translation (1) Translation of value types
60 TAL to XCAP Translation (2) Translation of preconditions Translation of code heap types Translation of data heap types
61 Typing Preservation
62 Application Scenario device driver OS kernel firmware user application library TAL XCAP
63 Outline Introduction The XCAP Framework Mini Thread Library Connect XCAP to TAL Conclusion
64 Summarizing XCAP Support user-level machine code demonstrated by type-preserving translation Support system-level machine code demonstrated by mini thread library Support modular machine code verification as modular as types, as expressive as logic
65 Other Work A syntactic approach to FPCC [LICS’02] Simple type safety, no need of indexed model Stack-based control abstractions [PLDI’06] utilizes the fixed ECP pattern to simplify things An open framework for FPCC [TLDI’07] allows different verification styles in a system
66 Some Future Directions Add logic power to higher level languages C and C#, certifying compilation Certify safe “unsafe” code garbage collector, preemptive thread library, device driver, etc. Consider other properties correctness, liveness, security, etc. Build tools for productivity concrete syntax and parser, large lemma libraries, etc.
67 Thank You!