Code-Carrying Theory Aytekin Vargun Rensselaer Polytechnic Institute
Outline Introduction Proof-Carrying Code (PCC) Code-Carrying Theory (CCT) Generic Proofs Organizing Theorems and Proofs Conclusions and Future Work
Potential Problems to be Solved Memory Safety illegal operations or illegal access to memory Security unauthorized access to data or system resources Functional Correctness whether the code does correctly what it is formally required to do
Two Solutions Proof-Carrying Code (PCC) Code-Carrying Theory (CCT)
Proof-Carrying Code (PCC) Developed by Necula and Lee [1996] at CMU. Basic Idea: Use machine-checkable proofs as certificates. Proof construction is harder than proof checking Code producer provides the proof Code consumer checks it
Code-Carrying Theory (CCT) The consumer gives the specification of the function The producer starts with axioms that define functions The form of axioms is such that it is easy to extract executable code from them. Prove that the defined functions obey certain requirements. Termination Consistency Correctness
Code-Carrying Theory (CCT) The producer transmits Axioms, Theorems, and Proofs No explicit code transmission The consumer checks proofs to see if the theorems are proved If proof checking succeeds, the consumer applies the code extractor to the axioms and obtains the executable code
PCC/CCT Differences PCC starts from code and assertions; CCT starts from assertions only and later extracts code from them. PCC concentrates on safety properties, which are relatively easy to prove fully automatically; we have concentrated on functional correctness properties, which are more difficult. We concentrate more on the proof issues with these more challenging types of properties, and less on programming language issues or other issues that PCC deals with more directly.
Code-Carrying Theory (CCT) Proving Termination: Use TCGEN to produce the termination condition (TC) and termination axioms; prove TC. Proving Consistency: Use CCGEN to produce the consistency condition (CC); prove it. Proving Correctness: Prove the correctness conditions (CTC) given by the consumer.
[Figure: CCT protocol between Code Producer and Code Consumer. Producer: from the general and application-specific requirements, generate the termination condition (TC) and consistency condition (CC); if both are proved, assert the function-defining axioms (FDA); prove the correctness condition (CTC); send the axioms (FDA) and the proofs. Consumer: generate TC and CC from the received axioms; check the proofs of TC and CC with the proof checker; if both proofs check, assert FDA; check the proof of CTC; if CTC is proved, extract CODE and run it on the CPU.]
[Figures: three variations of the same protocol diagram in which a hacker intercepts the transmitted axioms (FDA) and proofs. In two of the diagrams the consumer derives a different TC and a different CC from the received axioms. In every case the consumer's proof checker checks the proofs of TC, CC, and CTC before FDA is asserted and before any CODE reaches the CPU.]
Issues Encoding axioms and proofs Proof Checking Implementation of CCGEN TCGEN CODEGEN
ATHENA Implemented by K. Arkoudas A language for both: Ordinary Computation Logical Deduction
ATHENA Ordinary Computation Language Provides higher-order functions Has primitive functions for Unification Matching Substitution
ATHENA Logical Language Special Deductive Forms dcheck, dseq, assume, … Primitive Deduction Methods mp, both, left-and, … Declarations structure, declare, … Directives load-file, clear-assumption-base, … Calls to external automatic resolution theorem provers like SPASS and Vampire
ATHENA Advantages Better Proof Readability Machine checkable proofs Makes it possible to formulate and write proofs as methods Good for writing generic proofs write the proof once and instantiate it to prove specific cases But:
ATHENA No built-in rewriting methods We added the following methods to be able to use equational rewriting: (setup c t) : initializes c with t (reduce c u E) : attempts to transform the term t in c to be identical with the given term u by using theorem E as a left-to-right rewriting rule (expand c u E) : attempts to transform the term t in c to be identical with the given term u by using theorem E as a right-to-left rewriting rule (combine left right) : deduces (= t u) if left contains (= t t’), right contains (= u u’), and if t’ and u’ are identical terms.
CCT - Tools Small trusted computing base TCGEN + CCGEN + CODEGEN ≈1000 lines Tested with hundreds of axioms/theorems and more than lines of proofs
Termination of a function Termination is undecidable But it can be solved in special cases Does a measure of arguments decrease in the ordering with each recursive call of the function? This requires an ordering relation to be defined every time
TCGEN Termination of a function Our approach is similar but does not use an ordering relation We construct the proof of termination as a proof by induction that mirrors the recursion structure in the axioms We generate a termination axiom for each axiom Construct a termination condition Prove the termination condition using the termination axioms
Function-defining Axioms:
    (forall ?x (= (power ?x zero) one))
    (forall ?x ?n (= (power ?x (succ ?n)) (Times ?x (power ?x ?n))))
Steps:
    Rename power to power_t
    Check the right-hand sides: if the rhs is a constant, eliminate it; if there are nested function applications in the rhs, conjoin them
    Construct an implication from the new lhs and rhs: ``if rhs lhs’’
    Eliminate the applications of known total functions
    Assert these and prove the termination condition
Termination Axioms (one is a constant):
    (forall ?x (power_t ?x zero))
    (forall ?x ?n (if (and (power_t ?x ?n) (Times_t ?x (power ?x ?n))) (power_t ?x (succ ?n))))
Termination Axioms (Times_t is total):
    (forall ?x (power_t ?x zero))
    (forall ?x ?n (if (power_t ?x ?n) (power_t ?x (succ ?n))))
Termination Condition:
    (forall ?x ?n (power_t ?x ?n))
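The TCGEN steps for the power example can be sketched in Python. This is an illustrative approximation, not the actual TCGEN: axioms are given as (lhs, rhs) pairs of nested tuples, the universal quantifiers are left implicit, and the function names `applications` and `termination_axiom` are hypothetical.

```python
def applications(term):
    """Collect every function application inside term, innermost first."""
    if not isinstance(term, tuple):
        return []  # variables and constants contribute nothing
    found = []
    for arg in term[1:]:
        found.extend(applications(arg))
    found.append(term)
    return found


def termination_axiom(lhs, rhs, total=frozenset()):
    """Build the termination axiom for one equation (= lhs rhs).

    The defined function f on the lhs is renamed to f_t, each application
    g(...) in the rhs becomes a premise g_t(...), and premises for functions
    already known to be total are dropped.
    """
    conclusion = (lhs[0] + "_t",) + lhs[1:]
    premises = [(app[0] + "_t",) + app[1:]
                for app in applications(rhs) if app[0] not in total]
    if not premises:
        return conclusion  # constant rhs: nothing to assume
    if len(premises) == 1:
        return ("if", premises[0], conclusion)
    return ("if", ("and", *premises), conclusion)


base_lhs, base_rhs = ("power", "?x", "zero"), "one"
step_lhs = ("power", "?x", ("succ", "?n"))
step_rhs = ("Times", "?x", ("power", "?x", "?n"))

base_axiom = termination_axiom(base_lhs, base_rhs)
raw_step_axiom = termination_axiom(step_lhs, step_rhs)
step_axiom = termination_axiom(step_lhs, step_rhs, total={"Times"})
```

Here `raw_step_axiom` corresponds to the version with the Times_t premise, and `step_axiom` to the version after Times is treated as total.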
CCGEN Consistency of axioms Input is function-defining axioms Output is a predicate (the consistency condition) It states that it is possible to define a function that satisfies the axioms: For every tuple of values of the function domain, there exists a range value y
Function-defining Axioms:
    (forall ?y (= (f ?y zero) one))
    (forall ?x ?y (if (not (= ?y zero)) (= (f ?x ?y) two)))
Steps:
    Rename ?y to ?w
    Add or update conditions
    Replace (f ?x ?w) with ?y, conjoin the propositions, and add ``exists ?y’’
After renaming and adding conditions:
    (forall ?x ?w (if (= ?w zero) (= (f ?x ?w) one)))
    (forall ?x ?w (if (not (= ?w zero)) (= (f ?x ?w) two)))
Consistency Condition is:
    (forall ?x ?w (exists ?y (and (if (= ?w zero) (= ?y one)) (if (not (= ?w zero)) (= ?y two)))))
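To see what the consistency condition asks for, the following Python sketch evaluates it by brute force over a small finite domain. This is testing, not proof, and it is not how CCGEN or the proof checker work; the clause encoding (a guard on ?w paired with a required value for ?y) and the name `consistent` are assumptions made for illustration.

```python
def consistent(clauses, domain, range_values):
    """Check: for every w there exists a y satisfying all guarded equations.

    Each clause is (guard, value) and stands for (if (guard w) (= y value)).
    """
    for w in domain:
        witness_exists = any(
            all((not guard(w)) or y == value for guard, value in clauses)
            for y in range_values
        )
        if not witness_exists:
            return False
    return True


values = ["zero", "one", "two"]

# The two clauses of f from the slide: guards are disjoint, so a y always exists.
f_clauses = [
    (lambda w: w == "zero", "one"),
    (lambda w: w != "zero", "two"),
]

# A hypothetical inconsistent variant: the first clause lost its guard,
# so for w != zero both y = one and y = two would be required.
bad_clauses = [
    (lambda w: True, "one"),
    (lambda w: w != "zero", "two"),
]

f_ok = consistent(f_clauses, values, values)
bad_ok = consistent(bad_clauses, values, values)
```

The inconsistent variant shows why the condition matters: asserting such axioms would let the consumer derive one = two.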
Proving Correctness
Application-specific Requirements (from the consumer)
    (define (= Nil) zero))
    (define (forall ?L ?x (= (Cons ?x ?L)) (Plus ?x ?L)))))
Correctness Condition:
    (define sum-list-correctness (forall ?L (= (sum-list ?L) ?L))))
Function-defining Axioms (Producer)
    (define sum-list-empty (= (sum-list Nil) zero))
    (define sum-list-nonempty (forall ?L ?x (= (sum-list (Cons ?x ?L)) (sum-list-compute ?L ?x))))
    (define sum-list-compute-empty (forall ?x (= (sum-list-compute Nil ?x) ?x)))
    (define sum-list-compute-nonempty (forall ?L ?x ?y (= (sum-list-compute (Cons ?y ?L) ?x) (sum-list-compute ?L (Plus ?x ?y)))))
Correctness Proof (Producer)
    (by-induction sum-list-correctness
      (Nil (dseq
        (!setup left (sum-list Nil))
        (!setup right Nil))
        (!reduce left zero sum-list-empty)
        (!reduce right zero
        (!combine left right)))
      ((Cons x L) (dseq
        (!setup left (sum-list (Cons x L)))
        (!setup right (Cons x L)))
        (!reduce left (sum-list-compute L x) sum-list-nonempty)
        (!reduce right (sum-list-compute L x) sum-list-compute-relation)
        (!combine left right))))))
Note: Executable but inefficient code can be extracted from these axioms. Define an efficient function.
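The relationship between the consumer's requirements and the producer's accumulator-style axioms can be mirrored in Python. This is a sketch only: the name `spec_sum` for the consumer-side function is hypothetical, natural numbers are modeled as Python integers, and comparing outputs on sample lists is a test, whereas the slides establish the equality for all lists by induction.

```python
def spec_sum(xs):
    """Consumer-side specification (hypothetical name): direct recursion."""
    if not xs:
        return 0                      # empty list sums to zero
    return xs[0] + spec_sum(xs[1:])   # Plus applied to head and sum of tail


def sum_list_compute(xs, acc):
    """Mirrors sum-list-compute-empty and sum-list-compute-nonempty."""
    if not xs:
        return acc
    return sum_list_compute(xs[1:], acc + xs[0])


def sum_list(xs):
    """Mirrors sum-list-empty and sum-list-nonempty."""
    if not xs:
        return 0
    return sum_list_compute(xs[1:], xs[0])


samples = [[], [7], [1, 2, 3], [5, 0, 5, 10]]
agree = all(sum_list(s) == spec_sum(s) for s in samples)
```

The accumulator version makes its recursive call in tail position, which is the property the extracted code relies on for constant stack use in a language with last call optimization (Python itself does not perform that optimization).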
Application-specific Requirements (from the consumer)
    (define reverse-range-Correctness
      (forall ?i ?j
        (if (valid (range ?i ?j))
            (forall ?M
              (= (access-range (reverse-range ?M (range ?i ?j)) (range ?i ?j))
                 (reverse (access-range ?M (range ?i ?j))))))))
Function-defining Axioms (Producer)
    (define reverse-empty-range-axiom
      (forall ?i ?M
        (= (reverse-range ?M (range ?i ?i)) ?M)))
    (define reverse-nonempty-range-axiom1
      (forall ?i ?j ?M
        (if (and (not (= ?i ?j)) (= (++ ?i) ?j))
            (= (reverse-range ?M (range ?i ?j)) ?M))))
    (define reverse-nonempty-range-axiom2
      (forall ?i ?j ?M
        (if (and (valid (range ?i ?j))
                 (and (not (= ?i ?j)) (not (= (++ ?i) ?j))))
            (= (reverse-range ?M (range ?i ?j))
               (reverse-range (swap ?M (* ?i) (* (-- ?j))) (range (++ ?i) (-- ?j)))))))
Note: Specification is not executable. The correctness condition itself is a specification.
Note: Proof is by range induction
    Basis cases: Empty range: (range i i); Range of one element: (range i (++ i))
    Induction Step: Assume for (range (++ i) (-- j)); show that it is true for (range i j)
CODEGEN Code Extraction Quantified Equations and Conditional Equations These are clauses of a recursive function definition CODEGEN has to be able to combine these into a recursive function Target language is currently Oz Oz has pattern matching Possible to extract efficient code: Oz has ``last call optimization’’. Executes tail-recursive functions in constant stack size
CODEGEN Code Extraction Can extract both: Memory-observing (examines data structures but doesn’t make any changes) access, access-range, sum-list, find, find-if, power Memory-updating functions (makes in-place changes) assign, assign-range, swap, reverse-range, rotate, copy Does optimizations when necessary
Function-defining Axioms (Producer)
    (define sum-list-empty (= (sum-list Nil) zero))
    (define sum-list-nonempty (forall ?L ?x (= (sum-list (Cons ?x ?L)) (sum-list-compute ?L ?x))))
    (define sum-list-compute-empty (forall ?x (= (sum-list-compute Nil ?x) ?x)))
    (define sum-list-compute-nonempty (forall ?L ?x ?y (= (sum-list-compute (Cons ?y ?L) ?x) (sum-list-compute ?L (Plus ?x ?y)))))
Code Extraction (Consumer)
    fun {SumList L}
       case L of nil then 0
       [] X|L then {SumListCompute L X}
       end
    end
    fun {SumListCompute L X}
       case L of nil then X
       [] Y|L then {SumListCompute L (X + Y)}
       end
    end
Note: There are two variables but: ``case [L X]’’ has been optimized to ``case L’’ by CODEGEN
Function-defining Axioms (Producer)
    (define reverse-empty-range-axiom
      (forall ?i ?M
        (= (reverse-range ?M (range ?i ?i)) ?M)))
    (define reverse-nonempty-range-axiom1
      (forall ?i ?j ?M
        (if (and (not (= ?i ?j)) (= (++ ?i) ?j))
            (= (reverse-range ?M (range ?i ?j)) ?M))))
    (define reverse-nonempty-range-axiom2
      (forall ?i ?j ?M
        (if (and (valid (range ?i ?j))
                 (and (not (= ?i ?j)) (not (= (++ ?i) ?j))))
            (= (reverse-range ?M (range ?i ?j))
               (reverse-range (swap ?M (* ?i) (* (-- ?j))) (range (++ ?i) (-- ?j)))))))
Code needs to be optimized
    fun {ReverseRange M R}
       case R of range(I I) then M
       [] range(I J) then
          if {And {Not (I == J)} {Not ({`++` I} == J)}} then
             {ReverseRange {Swap M {`*` I} {`*` {`--` J}}} range({`++` I} {`--` J})}
          elseif {And {Not (I == J)} ({`++` I} == J)} then M
          end
       end
    end
Note: CODEGEN optimizes it
Code needs to be optimized
    fun {ReverseRange M R}
       case R of range(I I) then M
       [] range(I J) then
          if {And {Not (I == J)} {Not ({`++` I} == J)}} then
             {ReverseRange {Swap M {`*` I} {`*` {`--` J}}} range({`++` I} {`--` J})}
          elseif {And {Not (I == J)} ({`++` I} == J)} then M
          end
       end
    end
Optimized Code
    fun {ReverseRange M R}
       case R of range(I I) then M
       [] range(I J) then
          if {Not (I == J)} then
             if ({`++` I} == J) then M
             else {ReverseRange {Swap M {`*` I} {`*` {`--` J}}} range({`++` I} {`--` J})}
             end
          end
       end
    end
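The structure of the optimized reverse-range can be mirrored in Python and checked against the consumer's correctness condition on sample inputs. This is a sketch under assumptions: memory M is modeled as a Python list, iterators as integer indices, ++ and -- as adding and subtracting one, and (range i j) as the half-open interval from i up to but not including j, with i <= j for a valid range.

```python
def reverse_range(memory, i, j):
    """In-place reversal of memory[i:j], following the three axioms."""
    if i == j:            # empty range: nothing to do
        return memory
    if i + 1 == j:        # range of one element: nothing to do
        return memory
    # swap the first element with the last, then recurse on the inner range
    memory[i], memory[j - 1] = memory[j - 1], memory[i]
    return reverse_range(memory, i + 1, j - 1)


def satisfies_correctness(memory, i, j):
    """Correctness condition: the range afterwards equals the reverse of the range before."""
    before = memory[i:j]
    after = reverse_range(list(memory), i, j)[i:j]
    return after == before[::-1]


even_case = reverse_range([0, 1, 2, 3, 4, 5], 1, 5)
odd_case = reverse_range([0, 1, 2, 3, 4, 5], 1, 4)
all_ok = all(
    satisfies_correctness([10, 20, 30, 40, 50], i, j)
    for i in range(6) for j in range(i, 6)
)
```

The two base cases correspond to the two basis cases of the range induction, and the recursive call corresponds to the induction step on the inner range.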
CODEGEN Code Extraction We have been working on simple functions. But: In analogy to STL, it is useful to have a library of simple functions from which more complex functions can be composed, especially if the functions are generic It is possible for CODEGEN to extract complex functions composed of such simple functions
Generic Proof Writing Proofs are very large Generic proofs might be a solution No need to develop and transmit similar proofs to the consumer It is harder to write generic proofs, but once the consumer has the generic proofs, he can instantiate them in many different ways Athena is a higher-order language: We can express generic functions and proofs
Generic Proof Writing Generic property definitions and proofs are constructed in the form of programs that are parameterized with operator mappings Generic theorem: a generic property that contains a single property, for which there is an associated generic proof Provide functions which perform operator mappings Instantiate the generic proof with a particular operator mapping later
Generic Property Definitions in CCT
Name and parameter list:
    (define name ops)
Local Declarations:
    (let ((Plus (ops 'Plus)) (Zero (ops 'Zero)))
Generic Axiom or Theorems:
    (match name (= Nil) Zero)) (forall ?L ?y (= (Cons ?y ?L)) (Plus ?y ?L))))))))
Name and parameter list:
    (define (sum-list-compute-relation name ops)
Axiom or Theorems:
    (match name ('sum-list-compute-relation (forall ?L ?x (= (Cons ?x ?L)) (sum-list-compute ?L ?x))))))
A Generic Proof method in CCT
Name and parameter list:
    (define (sum-list-correctness-proof name ops)
Local Declarations:
    (dlet ((Zero (ops 'Zero))
           (left (cell true))
           (right (cell true))
           (prop (method (name) (!property name ops Sum-list-theory)))
           (theorem (sum-list-correctness name ops)))
Generic Proof:
    (by-induction theorem
      (Nil (dseq
        (!setup left (sum-list Nil))
        (!setup right Nil))
        (!reduce left Zero (!prop 'sum-list-empty))
        (!reduce right Zero (!prop
        (!combine left right)))
      ((Cons x L) (dseq
        (!setup left (sum-list (Cons x L)))
        (!setup right (Cons x L)))
        (!reduce left (sum-list-compute L x) (!prop 'sum-list-nonempty))
        (!reduce right (sum-list-compute L x) (!prop 'sum-list-compute-relation))
        (!combine left right))))))
Instantiation of a Generic Axiom
    (define name ops)
    (let ((Plus (ops 'Plus)) (Zero (ops 'Zero)))
    (match name (= Nil) Zero)) (forall ?L ?y (= (Cons ?y ?L)) (Plus ?y ?L))))))))
Operator Mappings:
    (define (Monoid-ops op) (match op ('Plus Plus) ('Zero zero)))
    Instantiated Axioms: (= Nil) zero) (forall ?L ?y (= (Cons ?y ?L)) (Plus ?y ?L))))
    (define (Times-ops op) (match op ('Plus Times) ('Zero one)))
    Instantiated Axioms: (= Nil) one) (forall ?L ?y (= (Cons ?y ?L)) (Times ?y ?L))))
    (define (Monoid-ops op) (match op ('Plus Append) ('Zero Nil)))
    Instantiated Axioms: (= Nil) Nil) (forall ?L ?y (= (Cons ?y ?L)) (Append ?y ?L))))
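The idea of instantiating one generic definition with several operator mappings can be illustrated in Python. This is a loose analogue rather than the Athena mechanism: an operator mapping is modeled as a dictionary with the keys 'Plus' and 'Zero', and the names `generic_fold`, `plus_ops`, `times_ops`, and `append_ops` are hypothetical.

```python
def generic_fold(ops):
    """Return the list function defined by the two generic axioms under ops."""
    plus, zero = ops["Plus"], ops["Zero"]

    def fold(xs):
        if not xs:
            return zero                    # axiom for Nil
        return plus(xs[0], fold(xs[1:]))   # axiom for (Cons ?y ?L)

    return fold


# Three operator mappings, analogous to the ones on the slide.
plus_ops = {"Plus": lambda a, b: a + b, "Zero": 0}
times_ops = {"Plus": lambda a, b: a * b, "Zero": 1}
append_ops = {"Plus": lambda a, b: a + b, "Zero": []}   # elements are lists

sum_all = generic_fold(plus_ops)
product_all = generic_fold(times_ops)
concat_all = generic_fold(append_ops)
```

The generic definition is written once, and each mapping yields a different concrete function, which parallels writing one generic proof and instantiating it for each mapping.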
Conclusions CCT provides strong assurance for correctness Only very small examples so far, but a basis for tackling larger examples Readable proofs Generic proof writing Tools for organizing theorems and proofs
Future Work Test CCT with more examples, including the ones that are larger and more complex Complete the extension of CODEGEN to check preconditions where necessary Use CCT to prove safety properties A Really Longer Term Goal: Verifying Compiler – Tony Hoare’s grand challenge problem
Organizing Theorems and Proofs We have a few hundred axioms, theorems, and proofs Prove some lemmas and use them in the proofs of other theorems Main idea: Group the related properties under the same theories Searches for a stored theorem are faster
Organizing Theorems and Proofs We define a structured theory as an abstract data type with the following functions:
    theory: creates a structured theory from a generic property function containing axioms
    evolve: extends an existing structured theory with a new generic theorem and its proof
    refine: creates a new structured theory as a composition of one or more existing structured theories and a generic property function
    property: retrieves an instance of a generic property function, and its corresponding proof
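One way to picture the four operations of a structured theory is the Python sketch below. It is a guess at the shape of such an abstract data type, not the Athena implementation: generic properties are modeled as functions from an operator mapping to a proposition, proofs as opaque functions of the mapping, and every name here (`StructuredTheory`, `get_property`, the sample property names) is hypothetical.

```python
class StructuredTheory:
    def __init__(self, properties, proofs=None, parents=()):
        self.properties = dict(properties)   # name -> function(ops) giving a proposition
        self.proofs = dict(proofs or {})     # theorem name -> function(ops) giving a proof
        self.parents = tuple(parents)        # theories this one is composed from


def theory(axioms):
    """Create a structured theory from generic axioms."""
    return StructuredTheory(axioms)


def evolve(th, name, theorem, proof):
    """Extend a theory with a new generic theorem and its proof."""
    return StructuredTheory({**th.properties, name: theorem},
                            {**th.proofs, name: proof},
                            th.parents)


def refine(parents, properties):
    """Compose existing theories with an additional set of generic properties."""
    return StructuredTheory(properties, {}, parents)


def get_property(th, name, ops):
    """Retrieve an instance of a property and its proof (None for axioms)."""
    if name in th.properties:
        proof = th.proofs.get(name)
        return th.properties[name](ops), (proof(ops) if proof else None)
    for parent in th.parents:
        found = get_property(parent, name, ops)
        if found is not None:
            return found
    return None


ops = {"Plus": "Plus", "Zero": "zero"}
base = theory({"right-identity": lambda o: ("=", (o["Plus"], "?x", o["Zero"]), "?x")})
lists = refine([base], {"sum-list-empty": lambda o: ("=", ("sum-list", "Nil"), o["Zero"])})
lists = evolve(lists, "sum-list-correctness",
               lambda o: ("forall", "?L", "..."),      # placeholder proposition
               lambda o: "proof instantiated with " + o["Plus"])

inherited = get_property(lists, "right-identity", ops)
proved = get_property(lists, "sum-list-correctness", ops)
```

Grouping properties this way means a lookup only walks the theory and its ancestors instead of scanning every stored theorem.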
[Figure: theory hierarchy]
    Iterator Theory: ++, --, *, I-, I+, I-I
    Range Theory: valid, range
    Memory Theory: Access, Assign, Swap
    Memory Range Theory: Access-range, Assign-range
    Naturals: zero, succ
    Lists: Nil, Cons
Legend: ++ preincrement; -- predecrement; I- iterator subtraction; I+ iterator addition; I-I iterator difference