Published by Janice Davidson. Modified over 9 years ago.
1
Using Coq to generate and reason about x86 systems code
Andrew Kennedy & Nick Benton (MSR Cambridge), Jonas Jensen (ITU Copenhagen)
2
The big picture
- Compositional specification and verification of high-level behavioural properties of low-level systems code
- Previous work of Benton et al. employed idealized machine code
- Simple design
- Infinite memory; pointers are natural numbers
- It’s time to get real(ish): hence, x86
3
Overview of talk
- Modelling x86: bits, bytes, instructions, execution
- Generating x86: assembling & compiling
- Reasoning about x86: logic & proofs
- Discussion
4
Our approach
- Clean slate: trusted base is just hardware and its model in Coq. †
- No dependencies on legacy code, languages, compilers, or software architectures
- Verify everything, including (at some point) the loader-verifier
- Do everything in Coq, making effective use of computation, notation, type classes, tactics, etc.
- No dependencies on external tools
- Coq as “world’s best macro assembler”
† And a small boot loader
5
Modelling x86
6
Modelling x86
- x86 has a bad reputation
- On first glance at manuals, wholly justified!
- By picking a subset, we avoid most of the messiness
- If we can do x86, we can do anything!
7
Bits, bytes and words
- We want to compute correctly and efficiently inside Coq
- Proper modelling of n-bit words, arithmetic with carry, sign, overflow, rotates, shifts, padding, the lot, all O(n)
- Generic over word-length, so index type by n : nat
- We also want to reason soundly inside Coq
- Associativity, commutativity, order properties, etc.
- Compute here: n-tuples of bools
- Reason here: 'Z_(2^n) from ssreflect library, reuse lemmas
8
Example: definition of addition
- Effective use of dependent types
- Definition is very algorithmic: so we can compute!
- Performance inside Coq? On this machine, about 2000 additions a second
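The Coq definition shown on this slide is not recoverable from the transcript. As a rough illustration of what an "algorithmic" addition over n-tuples of bools looks like, here is a minimal Python sketch of ripple-carry addition. The names `adc_b`, `add_b`, `from_nat` and `to_nat`, and the least-significant-bit-first ordering, are assumptions of this sketch rather than the authors' definitions.

```python
from typing import Tuple

# An n-bit word as a tuple of bools, least-significant bit first
# (the bit order is an assumption of this sketch).
Bits = Tuple[bool, ...]


def from_nat(n: int, x: int) -> Bits:
    """Return the n-bit word representing x mod 2^n."""
    return tuple(bool((x >> i) & 1) for i in range(n))


def to_nat(bits: Bits) -> int:
    """Return the natural number denoted by a word."""
    return sum(1 << i for i, b in enumerate(bits) if b)


def full_adder(carry: bool, a: bool, b: bool) -> Tuple[bool, bool]:
    """Add three bits; return (carry out, sum bit)."""
    total = int(carry) + int(a) + int(b)
    return total >= 2, bool(total & 1)


def adc_b(carry: bool, p1: Bits, p2: Bits) -> Tuple[bool, Bits]:
    """Add with carry in O(n) steps; return (carry out, n-bit sum)."""
    assert len(p1) == len(p2), "both words must have the same width"
    out = []
    for a, b in zip(p1, p2):
        carry, s = full_adder(carry, a, b)
        out.append(s)
    return carry, tuple(out)


def add_b(p1: Bits, p2: Bits) -> Bits:
    """Addition modulo 2^n: add with no carry in, drop the carry out."""
    return adc_b(False, p1, p2)[1]
```

For 8-bit words, `adc_b(False, from_nat(8, 200), from_nat(8, 100))` yields a carry out together with the word for 44, since 300 = 256 + 44. In Coq the width n would be part of the type; here it is only checked at run time.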
9
Example: proofs about addition
1. Deal with n=0 case
2. Apply injectivity of toZp to work in 'Z_(2^n): forall x y, toZp x = toZp y -> x = y
3. Rewrite using homomorphism lemmas, e.g. toZp (addB p1 p2) = (toZp p1 + toZp p2)%R
4. Apply ssreflect “ring” lemma for 'Z_(2^n)
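The proof strategy relies on two facts: `toZp` is injective, and it maps `addB` to addition in 'Z_(2^n). The Python sketch below checks both by brute-force enumeration for small n. This is testing, not proof, and it is not how the Coq development works. `add_b` and `to_zp` are stand-ins written for this sketch.

```python
from itertools import product


def add_b(p1, p2):
    """Ripple-carry addition on equal-length bit tuples (LSB first), carry dropped."""
    carry, out = 0, []
    for a, b in zip(p1, p2):
        total = carry + int(a) + int(b)
        out.append(bool(total & 1))
        carry = total >> 1
    return tuple(out)


def to_zp(bits):
    """Interpret an n-bit tuple as an element of Z/(2^n), represented as an int."""
    return sum(1 << i for i, b in enumerate(bits) if b)


def check(n):
    """Exhaustively check the lemmas used in the proof for all n-bit words."""
    words = list(product([False, True], repeat=n))
    modulus = 2 ** n
    # Injectivity: distinct words map to distinct residues.
    injective = len({to_zp(w) for w in words}) == len(words)
    # Homomorphism: to_zp (add_b p1 p2) = to_zp p1 + to_zp p2 in Z/(2^n).
    homomorphic = all(
        to_zp(add_b(p1, p2)) == (to_zp(p1) + to_zp(p2)) % modulus
        for p1 in words
        for p2 in words
    )
    # Commutativity of add_b follows from the two facts above plus
    # commutativity in Z/(2^n); here it is confirmed by enumeration.
    commutative = all(
        add_b(p1, p2) == add_b(p2, p1) for p1 in words for p2 in words
    )
    return injective and homomorphic and commutative
```

Calling `check(n)` for n from 0 to 5 exercises every pair of words, including the degenerate n=0 case that the proof treats separately.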
10
Machine state
11
x86 instructions
- x86 is notoriously large and baroque (instruction set manual alone is 1640 pages long)
- Subset only: no legacy 16-bit mode, flat memory model (no segment nonsense), no floating point, no SIMD instructions, no protected-mode instructions, no 64-bit mode (yet)
- Actually not too bad: it is possible to factor things so that the Coq datatype is “total” (no junk)
12
Addressing modes e.g. ADD EBX, EDI + [EDX*4] + 12
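A 32-bit x86 memory operand combines an optional base register, an optional index register scaled by 1, 2, 4 or 8, and a displacement. The following Python sketch computes the effective address for such an operand. The `MemSpec` record and its field names are hypothetical and chosen for illustration; they are not the Coq datatype from the slide.

```python
from dataclasses import dataclass
from typing import Dict, Optional

MASK32 = 0xFFFFFFFF


@dataclass(frozen=True)
class MemSpec:
    """Hypothetical memory operand: base + index*scale + displacement."""
    base: Optional[str]
    index: Optional[str]
    scale: int
    disp: int


def effective_address(spec: MemSpec, regs: Dict[str, int]) -> int:
    """Compute the 32-bit effective address, wrapping modulo 2^32."""
    if spec.scale not in (1, 2, 4, 8):
        raise ValueError("x86 only allows scale factors 1, 2, 4 and 8")
    addr = spec.disp
    if spec.base is not None:
        addr += regs[spec.base]
    if spec.index is not None:
        addr += regs[spec.index] * spec.scale
    return addr & MASK32
```

With `EDI = 0x1000` and `EDX = 3`, the operand built from base EDI, index EDX, scale 4 and displacement 12 resolves to `0x1000 + 3*4 + 12 = 0x1018`.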
13
Instruction format
- Manuals don’t reveal much “structure” (such as it is) in the instruction format
- But it can be discerned, and utilised for concise decoding functions
14
Instruction decoding
- Uses monadic syntax; the reader reads from memory and advances the pointer
- Note: there may be many instruction formats for the same instruction
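The reader idea can be sketched in Python by threading the position explicitly: each reader takes a position and returns a value plus the advanced position, which is what the monadic syntax hides. The decoder below covers only four opcodes, and its tuple representation of instructions is an assumption of this sketch. It also shows two formats for the same instruction: `JMP` with an 8-bit displacement (`0xEB`) and with a 32-bit displacement (`0xE9`).

```python
MASK32 = 0xFFFFFFFF


def read_byte(mem, pos):
    """Read one byte and advance the pointer."""
    return mem[pos], pos + 1


def read_dword(mem, pos):
    """Read a little-endian 32-bit value and advance the pointer by four."""
    value = 0
    for i in range(4):
        b, pos = read_byte(mem, pos)
        value |= b << (8 * i)
    return value, pos


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def decode(mem, pos):
    """Decode one instruction at pos; return (instruction, next position)."""
    opcode, pos = read_byte(mem, pos)
    if opcode == 0x90:
        return ("NOP",), pos
    if opcode == 0xC3:
        return ("RET",), pos
    if opcode == 0xEB:  # JMP rel8: target is relative to the next instruction
        rel, pos = read_byte(mem, pos)
        return ("JMP", (pos + sign_extend(rel, 8)) & MASK32), pos
    if opcode == 0xE9:  # JMP rel32: same instruction, longer format
        rel, pos = read_dword(mem, pos)
        return ("JMP", (pos + rel) & MASK32), pos
    raise ValueError(f"opcode {opcode:#04x} is outside this sketch's subset")
```

Decoding the bytes `EB FE` at address 0 and `E9 FB FF FF FF` at address 2 both give a `JMP` whose absolute target is the instruction's own address.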
15
Instruction execution
- Currently, a partial function from State to State
- Implemented in monadic style, using “primitive” operations of r/w register, r/w flag, r/w memory, etc.
- Factored to re-use common patterns, e.g. evalMemSpec, evalSrc
- Example fragment: call and return
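The call and return fragment itself is missing from the transcript. Below is a hedged Python sketch of the same idea: each step is a partial function on states, returning `None` when a memory access touches an unmapped address. The dictionary-based state layout and the helper names are assumptions of this sketch, not the authors' model.

```python
MASK32 = 0xFFFFFFFF


def read_dword(mem, addr):
    """Read a little-endian DWORD, or None if any byte is unmapped."""
    cells = [mem.get((addr + i) & MASK32) for i in range(4)]
    if any(c is None for c in cells):
        return None
    return int.from_bytes(bytes(cells), "little")


def write_dword(mem, addr, value):
    """Return a copy of mem with a DWORD written, or None if unmapped."""
    addrs = [(addr + i) & MASK32 for i in range(4)]
    if any(a not in mem for a in addrs):
        return None
    updated = dict(mem)
    for i, a in enumerate(addrs):
        updated[a] = (value >> (8 * i)) & 0xFF
    return updated


def step_call(state, target):
    """CALL: push the return address (EIP already points past the CALL), then jump."""
    esp = (state["ESP"] - 4) & MASK32
    mem = write_dword(state["mem"], esp, state["EIP"])
    if mem is None:
        return None  # fault: stack slot is not mapped
    return {"EIP": target, "ESP": esp, "mem": mem}


def step_ret(state):
    """RET: pop the return address into EIP."""
    ret = read_dword(state["mem"], state["ESP"])
    if ret is None:
        return None  # fault: nothing mapped at the stack pointer
    return {"EIP": ret, "ESP": (state["ESP"] + 4) & MASK32, "mem": state["mem"]}
```

A `step_call` followed by `step_ret` restores both `EIP` and `ESP`, provided the stack slot is mapped. Otherwise the step is undefined, which is the partiality the slide refers to.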
16
Non-determinism & under-specification
18
Representing non-determinism and under-specification
- Sequential x86, for the subset we care about, is almost completely deterministic
- Flags are the main issue
- Introduce an “undefined” state for flags
- An instruction that depends on a flag whose value is undefined (e.g. branch-on-carry) then has unspecified behaviour
- An alternative would be to set flags non-deterministically (cf. RockSalt)
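One way to picture the “undefined” flag state is a three-valued flag: true, false, or undefined. In the Python sketch below, `None` plays the role of undefined, and a branch on an undefined carry flag has no specified result. The helper names are hypothetical.

```python
from typing import Dict, Optional

# A flag is True, False, or None (undefined); this encoding is an
# assumption of the sketch.
Flags = Dict[str, Optional[bool]]


def set_flag(flags: Flags, name: str, value: bool) -> Flags:
    """Return flags with one flag given a definite value."""
    return {**flags, name: value}


def forget_flag(flags: Flags, name: str) -> Flags:
    """Return flags with one flag marked undefined."""
    return {**flags, name: None}


def branch_on_carry(flags: Flags, eip: int, target: int) -> Optional[int]:
    """Next EIP for a branch-on-carry, or None when behaviour is unspecified."""
    carry = flags["CF"]
    if carry is None:
        return None  # the branch depends on an undefined flag
    return target if carry else eip
```

The non-deterministic alternative mentioned above would instead return both possible successors when the flag is undefined.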
19
Generating x86: Assembling and Compiling
20
Instruction encoding
- Directly represent encoding by list of bytes
- Note: encoding is position-dependent
- In future we might mirror decoding using a monadic style
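Position dependence arises because instructions carry absolute targets while x86 near jumps and calls are encoded with displacements relative to the next instruction. A small Python sketch, using the same tuple representation of instructions assumed for illustration elsewhere and covering only four instructions:

```python
def encode_dword(value):
    """Little-endian bytes of a 32-bit value (two's complement for negatives)."""
    return list((value & 0xFFFFFFFF).to_bytes(4, "little"))


def encode(pos, instr):
    """Encode instr as a list of bytes, assuming it is placed at address pos."""
    op = instr[0]
    if op == "NOP":
        return [0x90]
    if op == "RET":
        return [0xC3]
    if op in ("JMP", "CALL"):
        opcode = 0xE9 if op == "JMP" else 0xE8
        target = instr[1]
        # rel32 is measured from the end of this 5-byte instruction.
        return [opcode] + encode_dword(target - (pos + 5))
    raise ValueError(f"{op} is outside this sketch's subset")
```

The same instruction `("JMP", 0x100)` encodes to `E9 FB FF FF FF` when placed at `0x100` but to `E9 FB FE FF FF` when placed at `0x200`.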
21
Jumps and labels
- Targets of jumps and branches are just absolute addresses in the Instr type.
- To write assembler code we want labels; for this we use a kind of HOAS (higher-order abstract syntax) type.
22
Syntax matters
- Cute use of notation in Coq: can write assembler code more-or-less using syntax of real assemblers!
- But also make use of Coq definitions, and “macros”
[Code example annotations: while macro, label, label binding]
23
Assembling
- Given an assembler program and an address to locate it, we can produce a sequence of bytes in the usual “two-pass” way.
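The assembler definition is not in the transcript, so the following Python sketch only illustrates the general technique, not the authors' algorithm. Labels are bound HOAS-style by a function from an address to a program (`Local`), and `Label` marks where a bound label sits. Pass one instantiates each label with a unique negative placeholder and records where it is placed. Pass two re-runs the program with the real addresses and emits bytes. All constructor names and the two-instruction encoder are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Instr:
    op: str
    target: int = 0  # absolute address, used by JMP only


@dataclass
class Seq:
    first: "Program"
    second: "Program"


@dataclass
class Label:
    addr: int  # "the current position is addr"


@dataclass
class Local:
    body: Callable[[int], "Program"]  # binds a fresh label (HOAS style)


Program = Union[Instr, Seq, Label, Local]


def encode(pos, instr):
    """Fixed-size encodings, so sizes never depend on label values."""
    if instr.op == "NOP":
        return [0x90]
    if instr.op == "JMP":
        rel = (instr.target - (pos + 5)) & 0xFFFFFFFF
        return [0xE9] + list(rel.to_bytes(4, "little"))
    raise ValueError(f"{instr.op} is outside this sketch's subset")


def run_pass(prog, pos, values, found, out):
    """Walk prog from pos, appending bytes to out; return the end position.

    values is None in pass one, and the list of label addresses in pass two.
    found collects one slot per Local binder, in traversal order.
    """
    if isinstance(prog, Instr):
        code = encode(pos, prog)
        out.extend(code)
        return pos + len(code)
    if isinstance(prog, Seq):
        mid = run_pass(prog.first, pos, values, found, out)
        return run_pass(prog.second, mid, values, found, out)
    if isinstance(prog, Local):
        k = len(found)
        found.append(None)
        value = -(k + 1) if values is None else values[k]
        return run_pass(prog.body(value), pos, values, found, out)
    if isinstance(prog, Label):
        if values is None:
            if prog.addr < 0:  # placeholder for binder number -addr - 1
                found[-prog.addr - 1] = pos
        elif prog.addr != pos:
            raise ValueError("label address changed between passes")
        return pos
    raise TypeError(f"not a program: {prog!r}")


def assemble(base, prog):
    """Two-pass assembly of prog located at address base."""
    found = []
    run_pass(prog, base, None, found, [])  # pass one: discover labels
    if None in found:
        raise ValueError("a label was bound but never placed")
    out = []
    run_pass(prog, base, found, [], out)  # pass two: emit bytes
    return out


def forever(body):
    """A tiny macro: loop over body using an internal, scoped label."""
    return Local(lambda top: Seq(Label(top), Seq(body, Instr("JMP", top))))
```

For example, `assemble(0x100, forever(Instr("NOP")))` places the label at `0x100` and emits a `NOP` followed by a `JMP` back to it. Because `forever` binds its label inside `Local`, two uses of the macro cannot clash.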
24
Round-trip theorem
- Statement of correctness uses overloaded “points-to” predicate, to be described later
[Theorem annotations: memory between offset and endpos contains bytes; memory between offset and endpos decodes to prog]
25
Little languages
- Instead of trusting (or modelling) existing languages such as C, we plan to develop little languages inside Coq.
- We have experimented with a tiny imperative language and its “compiler”, proved correct in Coq
26
Code demo!
27
Reasoning about x86: Logic and Proof
28
Big picture
- Assertion logic: predicate on partial states, usual connectives + separating conjunction
- Specification logic over this, incorporates step-indexing and framing, with corresponding later and frame connectives
- Safety specification used to give rules for instructions, in CPS style, packaged as Hoare-style triples for non-jumpy instructions
- Treatment of labels makes for elegant definition and rules for macros (e.g. while, if)
29
Partial states
- Partiality denotes partial description, as usual for separation logic
- Not to be confused with use of partiality for flags (undefined state) and memory (un-mapped or inaccessible)
30
Assertion logic
- Assertions (= SPred) are predicates on partial states
- We define a separation logic of assertions, with usual connectives
[Slide figure: example rules]
- Points-to predicate for memory is overloaded for different “decoders” of memory
- Core definition: memory from p to q “decodes” to value x
- x could be a BYTE, a DWORD, a seq BYTE or even an Instr
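To make “predicates on partial states” concrete, the Python sketch below models a partial state as a dictionary from addresses to bytes and an assertion as a function from such a dictionary to a bool. It covers memory only, whereas the partial states on the slide presumably also cover registers and flags. Separating conjunction is checked by enumerating every split of the state into two disjoint parts. The names `points_to`, `dword_at`, `emp` and `sep` are chosen for this sketch.

```python
from itertools import combinations


def points_to(p, value_bytes):
    """Assertion: the state is exactly the bytes value_bytes starting at p."""
    expected = {p + i: b for i, b in enumerate(value_bytes)}
    return lambda s: s == expected


def dword_at(p, value):
    """Points-to at a different "decoder": a little-endian DWORD at p."""
    return points_to(p, (value & 0xFFFFFFFF).to_bytes(4, "little"))


def emp(s):
    """Assertion: the partial state describes nothing."""
    return s == {}


def sep(P, Q):
    """Separating conjunction: s splits into disjoint parts satisfying P and Q."""
    def holds(s):
        keys = list(s)
        for r in range(len(keys) + 1):
            for chosen in combinations(keys, r):
                left = {k: s[k] for k in chosen}
                right = {k: v for k, v in s.items() if k not in left}
                if P(left) and Q(right):
                    return True
        return False
    return holds
```

Because the two parts must be disjoint, `sep(points_to(a, [1]), points_to(a, [1]))` never holds, which is the usual separation-logic reading of ownership. The enumeration is exponential and only suitable for tiny states.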
31
Safety
- Machine code does not “finish” and so standard Hoare triple does not suit; also, code is mixed up with store.
- So we define safe k P to mean “runs without faulting for k steps from any state satisfying P.”
- Example: tight loop
- Example: jmp
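The reading of safe k P can be illustrated on a toy machine in Python. A state is an instruction pointer plus a code map, and a step faults when there is no instruction to fetch. Since a program cannot quantify over all states, `safe` here checks only a finite list of candidate states, which is an assumption of this sketch and much weaker than the Coq definition.

```python
def step(state):
    """One step of a toy machine; None means the machine faults."""
    eip, code = state
    instr = code.get(eip)
    if instr is None:
        return None  # nothing mapped at EIP
    if instr[0] == "JMP":
        return (instr[1], code)
    if instr[0] == "NOP":
        return (eip + 1, code)
    return None  # instruction outside the toy subset


def runs_safely(k, state):
    """True if the machine takes k steps from state without faulting."""
    for _ in range(k):
        state = step(state)
        if state is None:
            return False
    return True


def safe(k, P, candidates):
    """safe k P, restricted to a finite list of candidate states."""
    return all(runs_safely(k, s) for s in candidates if P(s))
```

A tight loop, `{0x100: ("JMP", 0x100)}` started at `0x100`, is safe for every k, whereas two `NOP`s followed by unmapped memory are safe for two steps but not three.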
32
Specification logic
33
Connectives for spec logic
- The frame connective gives us a “frame rule” for specs, and distributes over other connectives
34
Basic blocks
- Given our definitions of safety and points-to for instructions, we can mimic Hoare-style triples for basic blocks.
- We can then derive familiar rules such as framing.
- This is useful when proving straight-line machine code
35
Rules for instructions (I)
- No control flow
- Use Hoare-like triple
36
Rules for instructions (II)
- Control flow
- Explicit CPS-like use of safe
- Two possible continuations
37
Reasoning with labels
- We overload “points-to” on assembler programs, so (roughly)
38
Macros
- Our representation of scoped labels makes it easy to define macros that make use of labels internally, and to derive rules for them.
39
Putting it together: A spec for a memory allocator
40
Trivial implementation of allocator
41
Proof support
- Very painful to work with assertions and specs using only primitive rules
- We have built Coq tactic support for:
- Basic simplification of formulae (AC of *, etc.)
- Pulling out existential quantifiers automatically
- Greatly simplifies proving!
42
Proof demo!
43
Status
- We can generate and prove correct tiny programs written in “Coq” assembler and a small while-language
- Binary generated by Coq can be run on “raw metal” (booted off a CD!)
Next steps
- Model of I/O, e.g. screen/keyboard; currently our “observable” is just “faulting”
- High-level model of processes
- Build and verify OS components such as scheduler, allocator, loader
- Eventual aim: process isolation theorem