Using Coq to generate and reason about x86 systems code
Andrew Kennedy & Nick Benton (MSR Cambridge), Jonas Jensen (ITU Copenhagen)

The big picture
• Compositional specification and verification of high-level behavioural properties of low-level systems code
• Previous work of Benton et al. employed idealized machine code
  • Simple design
  • Infinite memory; pointers are natural numbers
• It’s time to get real(ish): hence, x86

Overview of talk
• Modelling x86: bits, bytes, instructions, execution
• Generating x86: assembling & compiling
• Reasoning about x86: logic & proofs
• Discussion

Our approach
• Clean slate: trusted base is just hardware and its model in Coq. †
  • No dependencies on legacy code, languages, compilers, or software architectures
  • Verify everything – including (at some point) loader-verifier
• Do everything in Coq, making effective use of computation, notation, type classes, tactics, etc.
  • No dependencies on external tools
  • Coq as “world’s best macro assembler”
† And a small boot loader

Modelling x86

Modelling x86
• x86 has a bad reputation
  • At first glance at the manuals, wholly justified!
• By picking a subset, we avoid most of the messiness
• If we can do x86, we can do anything!

Bits, bytes and words
• We want to compute correctly and efficiently inside Coq
  • Proper modelling of n-bit words, arithmetic with carry, sign, overflow, rotates, shifts, padding, the lot, all O(n)
  • Generic over word-length, so index type by n : nat
• We also want to reason soundly inside Coq
  • Associativity, commutativity, order properties, etc.
• Compute here: n-tuples of bools
• Reason here: 'Z_(2^n) from ssreflect library, reuse lemmas

Example: definition of addition
• Effective use of dependent types
• Definition is very algorithmic: so we can compute!
• Performance inside Coq? On this machine, about 2000 additions a second
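The algorithmic flavour of such a definition can be illustrated outside Coq. What follows is a minimal Python sketch, not the Coq definition from the slide: words are tuples of booleans stored least significant bit first (an assumption of this sketch), and the names `from_nat`, `to_nat`, `full_adder`, `adc_bits` and `add_bits` are invented here for illustration.

```python
from typing import Tuple

Bits = Tuple[bool, ...]  # least significant bit first (assumption of this sketch)

def from_nat(n: int, value: int) -> Bits:
    """Encode value modulo 2**n as an n-tuple of booleans."""
    return tuple(bool((value >> i) & 1) for i in range(n))

def to_nat(bits: Bits) -> int:
    """Decode a tuple of booleans back to a natural number."""
    return sum(1 << i for i, b in enumerate(bits) if b)

def full_adder(carry: bool, a: bool, b: bool) -> Tuple[bool, bool]:
    """Return (carry_out, sum_bit) for a single bit position."""
    total = int(carry) + int(a) + int(b)
    return total >= 2, bool(total & 1)

def adc_bits(carry: bool, xs: Bits, ys: Bits) -> Tuple[bool, Bits]:
    """Ripple-carry addition of two equal-length words: one pass, O(n)."""
    if len(xs) != len(ys):
        raise ValueError("words must have the same length")
    out = []
    for a, b in zip(xs, ys):
        carry, s = full_adder(carry, a, b)
        out.append(s)
    return carry, tuple(out)

def add_bits(xs: Bits, ys: Bits) -> Bits:
    """Addition modulo 2**n: the final carry is dropped."""
    return adc_bits(False, xs, ys)[1]
```

In Coq the word length would be carried in the type, which rules out the length mismatch that this sketch can only check at run time.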

Example: proofs about addition
1. Deal with n=0 case
2. Apply injectivity of toZp to work in 'Z_(2^n): forall x y, toZp x = toZp y -> x = y
3. Rewrite using homomorphism lemmas e.g. toZp (addB p1 p2) = (toZp p1 + toZp p2)%R
4. Apply ssreflect “ring” lemma for 'Z_(2^n)
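As a worked instance of this proof pattern, assume the property being proved is commutativity of addB (the transcript does not say which property the slide shows) and that n > 0. After injectivity of toZp reduces the goal to an equation in 'Z_(2^n), the remaining reasoning is:

```latex
\begin{align*}
\mathrm{toZp}(\mathrm{addB}\ x\ y)
  &= \mathrm{toZp}\ x + \mathrm{toZp}\ y
     && \text{(homomorphism lemma)} \\
  &= \mathrm{toZp}\ y + \mathrm{toZp}\ x
     && \text{(commutativity in } \mathbb{Z}_{2^n}\text{)} \\
  &= \mathrm{toZp}(\mathrm{addB}\ y\ x)
     && \text{(homomorphism lemma, right to left)}
\end{align*}
```

Injectivity then gives addB x y = addB y x. The middle step is the part a ring lemma discharges; none of it mentions the bit-level definition.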

Machine state

x86 instructions
• x86 is notoriously large and baroque (instruction set manual alone is 1640 pages long)
• Subset only: no legacy 16-bit mode, flat memory model (no segment nonsense), no floating point, no SIMD instructions, no protected-mode instructions, no 64-bit mode (yet)
• Actually: not too bad, possible to factor so that Coq datatype is “total” (no junk)

Addressing modes
e.g. ADD EBX, EDI + [EDX*4] + 12
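For an operand of this shape the address is base + index × scale + displacement, truncated to 32 bits. A small Python sketch of that calculation follows; the function name `effective_address` and the dictionary representation of registers are assumptions made for illustration, not part of the Coq model.

```python
from typing import Dict, Optional

MASK32 = 0xFFFFFFFF

def effective_address(regs: Dict[str, int],
                      base: Optional[str],
                      index: Optional[str],
                      scale: int,
                      disp: int) -> int:
    """Compute base + index*scale + disp modulo 2**32.

    base and index are register names or None; scale must be 1, 2, 4 or 8,
    the only scale factors x86 can encode.
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    addr = disp
    if base is not None:
        addr += regs[base]
    if index is not None:
        addr += regs[index] * scale
    return addr & MASK32

# The operand from the slide: base EDI, index EDX scaled by 4, displacement 12.
regs = {"EDI": 0x1000, "EDX": 3}
example = effective_address(regs, "EDI", "EDX", 4, 12)
```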

Instruction format
• Manuals don’t reveal much “structure” – such as it is – in instruction format
• But it can be discerned – and utilised for concise decoding functions

Instruction decoding
• Uses monadic syntax; the reader reads from memory and advances the pointer
• Note: there may be many instruction formats for the same instruction
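The reader idea can be sketched in Python with a cursor object in place of a monad. This is an illustration only: the `Reader` class and `decode` function are hypothetical names, memory is a dictionary from addresses to bytes, and just three encodings are handled (NOP is 0x90, RET is 0xC3, and MOV r32, imm32 is 0xB8 plus the register number followed by a little-endian DWORD).

```python
from typing import Dict, Tuple

# Register numbering used by the B8+r encoding.
REG32 = ("EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI")

class Reader:
    """Reads from memory at a cursor and advances it after every read."""

    def __init__(self, mem: Dict[int, int], pos: int):
        self.mem = mem
        self.pos = pos

    def byte(self) -> int:
        b = self.mem[self.pos]  # KeyError stands in for reading unmapped memory
        self.pos += 1
        return b

    def dword(self) -> int:
        # Four consecutive bytes, least significant first.
        value = 0
        for i in range(4):
            value |= self.byte() << (8 * i)
        return value

def decode(r: Reader) -> Tuple:
    """Decode one instruction from a tiny subset, leaving r just past it."""
    op = r.byte()
    if op == 0x90:
        return ("NOP",)
    if op == 0xC3:
        return ("RET",)
    if 0xB8 <= op <= 0xBF:
        return ("MOV", REG32[op - 0xB8], r.dword())
    raise ValueError(f"opcode {op:#04x} is outside this sketch's subset")
```

Because each read advances the cursor, decoding a sequence of instructions is just repeated calls to `decode` on the same reader.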

Instruction execution
• Currently, a partial function from State to State.
• Implemented in monadic style, using “primitive” operations of r/w register, r/w flag, r/w memory, etc.
• Factored to re-use common patterns e.g. evalMemSpec, evalSrc
• Example fragment: call and return
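The call and return fragment itself did not survive in this transcript. As a rough stand-in, here is a Python sketch of execution as a partial function, with `None` playing the role of "undefined". The state representation and the names `read_dword`, `write_dword`, `exec_call` and `exec_ret` are assumptions of this sketch, and EIP is assumed to already point past the CALL instruction when `exec_call` runs.

```python
from typing import Dict, Optional, Tuple

MASK32 = 0xFFFFFFFF
State = Tuple[Dict[str, int], Dict[int, int]]  # (registers, byte-addressed memory)

def read_dword(mem: Dict[int, int], addr: int) -> Optional[int]:
    """Read a little-endian DWORD, or None if any byte is unmapped."""
    bs = [mem.get((addr + i) & MASK32) for i in range(4)]
    if any(b is None for b in bs):
        return None
    return sum(b << (8 * i) for i, b in enumerate(bs))

def write_dword(mem: Dict[int, int], addr: int, value: int) -> Optional[Dict[int, int]]:
    """Return an updated copy of memory, or None if any byte is unmapped."""
    addrs = [(addr + i) & MASK32 for i in range(4)]
    if any(a not in mem for a in addrs):
        return None
    new = dict(mem)
    for i, a in enumerate(addrs):
        new[a] = (value >> (8 * i)) & 0xFF
    return new

def exec_call(state: State, target: int) -> Optional[State]:
    """CALL: push the return address (current EIP), then jump to target."""
    regs, mem = state
    esp = (regs["ESP"] - 4) & MASK32
    mem2 = write_dword(mem, esp, regs["EIP"])
    if mem2 is None:
        return None
    return ({**regs, "ESP": esp, "EIP": target & MASK32}, mem2)

def exec_ret(state: State) -> Optional[State]:
    """RET: pop the return address into EIP."""
    regs, mem = state
    ret = read_dword(mem, regs["ESP"])
    if ret is None:
        return None
    return ({**regs, "ESP": (regs["ESP"] + 4) & MASK32, "EIP": ret}, mem)
```

The two memory helpers are the kind of shared "primitive" operation the slide refers to; both instructions are built from them plus register updates.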

Non-determinism & under-specification

Representing non-determinism and under-specification
• For sequential x86, for the subset we care about, almost completely deterministic
• Flags are the main issue.
  • Introduce “undefined” state for flags
  • Instructions that depend on a flag whose value is undefined (e.g. branch-on-carry) then have unspecified behaviour
  • An alternative would be to set flags non-deterministically (cf. RockSalt)
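One way to picture the "undefined" flag state is as a third value alongside true and false. The Python sketch below uses `None` for it; the names `UNDEF`, `mark_undefined` and `exec_jc` are hypothetical, and a result of `None` from `exec_jc` stands for unspecified behaviour.

```python
from typing import Dict, Iterable, Optional

UNDEF = None  # the model does not say what value the flag holds

Flags = Dict[str, Optional[bool]]

def mark_undefined(flags: Flags, names: Iterable[str]) -> Flags:
    """Model an instruction that leaves the named flags undefined."""
    return {**flags, **{name: UNDEF for name in names}}

def exec_jc(flags: Flags, eip: int, target: int) -> Optional[int]:
    """Branch-on-carry: next EIP, or None when CF is undefined.

    eip is assumed to already point at the fall-through instruction.
    """
    cf = flags["CF"]
    if cf is UNDEF:
        return None
    return target if cf else eip
```

Under the non-deterministic alternative mentioned on the slide, `exec_jc` would instead return both possible successors when the flag is undefined.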

Generating x86: Assembling and Compiling

Instruction encoding
• Directly represent encoding by list of bytes
• Note: encoding is position-dependent
• In future we might mirror decoding using a monadic style

Jumps and labels
• Targets of jumps and branches are just absolute addresses in the Instr type. To write assembler code we want labels – for this we use a kind of HOAS type:

Syntax matters
• Cute use of notation in Coq: can write assembler code more-or-less using syntax of real assemblers!
• But also make use of Coq definitions, and “macros”
[Code example annotations: while macro; label; label binding]

Assembling
• Given an assembler program and an address to locate it, we can produce a sequence of bytes in the usual “two-pass” way:
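The assembler listing is missing from this transcript. The Python sketch below shows the general two-pass idea with labels bound by functions, in the spirit of the HOAS representation. It is not the Coq assembler, and everything in it is an assumption made for illustration: the classes `Lab`, `Nop`, `Jmp`, `Mark` and `Local`, the macro `forever`, and the restriction to two instructions (NOP as 0x90 and JMP rel32 as 0xE9, whose offset is relative to the end of the jump).

```python
from dataclasses import dataclass
from typing import Callable, Optional

MASK32 = 0xFFFFFFFF

class Lab:
    """A label; pass 1 fills in its address unless one is given up front."""
    def __init__(self, addr: Optional[int] = None):
        self.addr = addr

@dataclass
class Nop:
    pass

@dataclass
class Jmp:
    lab: Lab

@dataclass
class Mark:  # places a label at the current address
    lab: Lab

@dataclass
class Local:  # binds a fresh label: the body is a function of that label
    body: Callable[[Lab], list]

def forever(body: list) -> list:
    """A macro with an internal label that cannot clash with any other."""
    return [Local(lambda top: [Mark(top)] + body + [Jmp(top)])]

SIZE = {Nop: 1, Jmp: 5}

def flatten(prog: list) -> list:
    """Expand every Local once, giving each binding its own fresh label."""
    out = []
    for item in prog:
        if isinstance(item, Local):
            out.extend(flatten(item.body(Lab())))
        else:
            out.append(item)
    return out

def assemble(prog: list, origin: int) -> bytes:
    flat = flatten(prog)
    addr = origin
    for item in flat:  # pass 1: compute the address of every placed label
        if isinstance(item, Mark):
            item.lab.addr = addr
        else:
            addr += SIZE[type(item)]
    out = bytearray()
    for item in flat:  # pass 2: emit bytes, now that all labels are known
        here = origin + len(out)
        if isinstance(item, Nop):
            out += b"\x90"
        elif isinstance(item, Jmp):
            if item.lab.addr is None:
                raise ValueError("jump to a label that is never placed")
            rel = (item.lab.addr - (here + 5)) & MASK32
            out += b"\xE9" + rel.to_bytes(4, "little")
    return bytes(out)
```

Assembling `[Jmp(Lab(0x2000))]` at two different origins yields different bytes, which is the position-dependence of encoding noted earlier; `forever([Nop()])` yields the same bytes at any origin because its jump target moves with it.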

Round-trip theorem
• Statement of correctness uses overloaded “points-to” predicate, to be described later
• Memory between offset and endpos contains bytes
• Memory between offset and endpos decodes to prog

Little languages
• Instead of trusting – or modelling – existing languages such as C, we plan to develop little languages inside Coq.
• We have experimented with a tiny imperative language and its “compiler”, proved correct in Coq

Code demo!

Reasoning about x86: Logic and Proof

Big picture
• Assertion logic: predicate on partial states, usual connectives + separating conjunction
• Specification logic over this, incorporates step-indexing and framing, with corresponding later and frame connectives
• Safety specification used to give rules for instructions, in CPS style, packaged as Hoare-style triples for non-jumpy instructions
• Treatment of labels makes for elegant definition and rules for macros (e.g. while, if)

Partial states
• Partiality denotes partial description, as usual for separation logic
• Not to be confused with use of partiality for flags (undefined state) and memory (un-mapped or inaccessible)

Assertion logic
• Assertions (= SPred) are predicates on partial states
• We define a separation logic of assertions, with usual connectives. Example rules:
• Points-to predicate for memory is overloaded for different “decoders” of memory
• Core definition: memory from p to q “decodes” to value x
• x could be a BYTE, a DWORD, a seq BYTE or even an Instr
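To make "predicates on partial states" concrete, here is a small Python sketch in which a partial state is a dictionary describing only some memory cells, and separating conjunction asks for a split into two disjoint pieces. It simplifies the slide's definitions: registers and flags are omitted, the end address q is left implicit, and the names `emp`, `byte_is`, `dword_is` and `sep` are invented for this example.

```python
from itertools import combinations
from typing import Callable, Dict

PState = Dict[int, int]            # partial memory: address -> byte
SPred = Callable[[PState], bool]   # an assertion is a predicate on partial states

def emp(s: PState) -> bool:
    """Holds only of the empty partial state."""
    return s == {}

def byte_is(p: int, v: int) -> SPred:
    """Points-to for a BYTE: the state describes exactly the cell p."""
    return lambda s: s == {p: v}

def dword_is(p: int, v: int) -> SPred:
    """Points-to for a DWORD: exactly four cells, little-endian."""
    expected = {p + i: (v >> (8 * i)) & 0xFF for i in range(4)}
    return lambda s: s == expected

def sep(P: SPred, Q: SPred) -> SPred:
    """Separating conjunction: some split of s into disjoint parts satisfies P and Q.

    Every subset of the cells is tried, which is exponential and only
    suitable for tiny examples.
    """
    def holds(s: PState) -> bool:
        keys = list(s)
        for r in range(len(keys) + 1):
            for chosen in combinations(keys, r):
                s1 = {k: s[k] for k in chosen}
                s2 = {k: v for k, v in s.items() if k not in chosen}
                if P(s1) and Q(s2):
                    return True
        return False
    return holds
```

Note that `sep(byte_is(10, 1), byte_is(10, 1))` holds of no state at all, since one cell cannot be handed to both halves of a split.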

Safety
• Machine code does not “finish” and so standard Hoare triple does not suit; also, code is mixed up with store. So we define safe k P to mean “runs without faulting for k steps from any state satisfying P.”
• Example: tight loop
• Example: jmp
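The step-indexed reading of safe can be sketched on a toy machine. This Python example is a heavy simplification and not the Coq definition: code is a separate map from addresses to abstract instructions rather than bytes in the store, P is given as a finite list of starting program counters, and the names `step` and `safe` are chosen here for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple

Code = Dict[int, Tuple]  # address -> abstract instruction, e.g. ("JMP", 0)

def step(code: Code, pc: int) -> Optional[int]:
    """One machine step; None means the machine faults."""
    instr = code.get(pc)
    if instr is None:
        return None  # no instruction at this address
    if instr[0] == "JMP":
        return instr[1]
    if instr[0] == "NOP":
        return pc + 1
    return None

def safe(k: int, code: Code, starts: Iterable[int]) -> bool:
    """Runs without faulting for k steps from every given starting point."""
    for pc in starts:
        cur: Optional[int] = pc
        for _ in range(k):
            cur = step(code, cur)
            if cur is None:
                return False
    return True

tight_loop: Code = {0: ("JMP", 0)}           # jumps to itself forever
falls_off: Code = {0: ("NOP",), 1: ("NOP",)}  # runs off the end of the code
```

The tight loop is safe for every k, whereas the second program is safe for 2 steps but not for 3; being safe for k steps always implies being safe for fewer.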

Specification logic

Connectives for spec logic
• The frame connective gives us a “frame rule” for specs, and distributes over other connectives

Basic blocks
• Given our definitions of safety and points-to for instructions, we can mimic Hoare-style triples for basic blocks:
• We can then derive familiar rules such as framing:
• This is useful when proving straight-line machine code

Rules for instructions (I)
• No control flow
• Use Hoare-like triple

Rules for instructions (II)
• Control flow
• Explicit CPS-like use of safe
• Two possible continuations

Reasoning with labels
• We overload “points-to” on assembler programs, so (roughly)

Macros
• Our representation of scoped labels makes it easy to define macros that make use of labels internally – and derive rules for them.

Putting it together: A spec for a memory allocator

Trivial implementation of allocator

Proof support
• Very painful to work with assertions and specs using only primitive rules
• We have built Coq tactic support for
  • Basic simplification of formulae (AC of *, etc.)
  • Pulling out existential quantifiers automatically
• Greatly simplifies proving!

Proof demo!

Status
• We can generate and prove correct tiny programs written in “Coq” assembler and a small while-language
• Binary generated by Coq can be run on “raw metal” (booted off a CD!)
• Next steps
  • Model of I/O e.g. screen/keyboard; currently our “observable” is just “faulting”
  • High-level model of processes
  • Build and verify OS components such as scheduler, allocator, loader
  • Eventual aim: process isolation theorem
