State your reasons, or how to keep proofs while optimizing code. Ando Saabas, Institute of Cybernetics. EXCS kick-off meeting, 18 September 2008.
Outline: Background (extensible systems); Proof carrying code (general overview); Proof compilation.
Background: Mobile code and extensible systems are popular (and increasingly so). Code and host pairs: device driver and operating system; applet and web browser; loaded procedures and database server.
What should the extensions guarantee? Security is a concern. Safety properties: accesses only its own memory; doesn’t leak sensitive data; uses resources properly (doesn’t eat up all the memory, holds a limited number of locks). Functional properties: provides the functionality it promises.
Runtime monitoring: A monitor detects attempts to violate the safety policy and stops the execution. Relatively simple; effective for many properties. Inflexible (no guarantees on functional properties). Computationally expensive.
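The monitoring idea can be sketched in Python. This is a minimal illustration only, assuming a hypothetical policy that caps the number of locks held at once; the names LockMonitor, PolicyViolation and max_locks are invented for the example and do not come from the talk.

```python
class PolicyViolation(Exception):
    """Raised when the monitored extension breaks the safety policy."""


class LockMonitor:
    """Hypothetical monitor: the extension may hold at most max_locks locks."""

    def __init__(self, max_locks):
        self.max_locks = max_locks
        self.held = set()

    def acquire(self, name):
        # The policy is checked before the action takes effect.
        if name not in self.held and len(self.held) >= self.max_locks:
            raise PolicyViolation(f"more than {self.max_locks} locks requested")
        self.held.add(name)

    def release(self, name):
        self.held.discard(name)


def run_extension(monitor):
    """Stand-in for untrusted code: every lock operation goes through the monitor."""
    monitor.acquire("a")
    monitor.acquire("b")
    monitor.acquire("c")  # a third lock; violates a limit of two
    return "finished"


def run_monitored(max_locks):
    """Run the extension and stop it at the first policy violation."""
    monitor = LockMonitor(max_locks)
    try:
        return run_extension(monitor)
    except PolicyViolation:
        return "stopped"
```

Every lock operation pays for a check while the program runs, and the monitor says nothing about whether the extension computes what it promises.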
Digital signatures: A signature, verified through PKI, states “Company X produced this software”. Simple, well-established techniques. No direct connection with program semantics.
Proof Carrying Code: Proof-carrying code is based on the idea that the code producer should provide some evidence that the program she distributes is safe and/or functionally correct. The program is shipped with a certificate attesting that it has the desired properties. Before running a program, the code user checks this certificate and runs the code only if the check succeeds.
Proof carrying code [Diagram labels: Proof, Proof Checker]
PCC: An analogy. The user doesn’t try to solve a problem, only checks a solution.
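The gap between solving and checking can be illustrated with a small Python sketch. Integer factoring is used here purely as an assumed stand-in problem; the function names are hypothetical and not part of the talk.

```python
def find_factor(n):
    """Solve: search for the smallest nontrivial factor of n by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return None  # no nontrivial factor exists


def check_factor(n, d):
    """Check: a single division validates a claimed nontrivial factor."""
    return 1 < d < n and n % d == 0
```

The producer plays the role of find_factor and ships its answer; the user only runs check_factor, which costs one division however long the search took.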
PCC vs Digital Signatures: A digital signature identifies the origin of the program; a PCC certificate identifies the meaning of the program. A digital signature is a syntactic checksum; a PCC certificate is a semantic checksum.
Where would proofs come from? For basic safety properties, they can be inferred automatically. For more complex safety and/or functional correctness properties, the code producer would use a verification environment to prove the source program correct. But programs are distributed in compiled form.
PCC framework [Diagram labels. Code producer: Specification, Source program, Program verification environment, Program proof, Compiler, Proof compiler (marked “?”), Binary, Proof. Code user: Binary, Proof, Proof Checker.]
Proof compilation: For non-optimizing compilers it is easy: proof compilation is (almost) the identity. Not so if optimizations take place.
Dead code elimination
  Original program: while i < n { s = s + c; p = p * c; i++; }
  Optimized program: while i < n { s = s + c; skip; i++; }
  Precondition: {s = 0 ∧ p = 1 ∧ i = 0}, transformed to {∃p: s = 0 ∧ p = 1 ∧ i = 0}
  Invariant: {s = c * i ∧ p = c^i ∧ i ≤ n}, transformed to {∃p: s = c * i ∧ p = c^i ∧ i ≤ n}
  Postcondition: {s = c * n ∧ p = c^n}, transformed to {∃p: s = c * n ∧ p = c^n}
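The effect on the assertions can be tried out by turning them into run-time checks. The sketch below is an illustration under stated assumptions, not the author's tooling: it assumes the assertions read s = c * i ∧ p = c^i ∧ i ≤ n with n ≥ 0, and the function names are hypothetical.

```python
def original_loop(c, n):
    """Original program with its assertions checked at run time (assumes n >= 0)."""
    s, p, i = 0, 1, 0  # precondition: s = 0, p = 1, i = 0
    while i < n:
        assert s == c * i and p == c ** i and i <= n  # invariant
        s = s + c
        p = p * c
        i += 1
    assert s == c * n and p == c ** n  # postcondition
    return s


def optimized_loop(c, n):
    """Program after dead code elimination: p = p * c has become skip."""
    s, i = 0, 0  # p is no longer part of the program state
    while i < n:
        # Transformed invariant: exists p. s = c*i and p = c^i and i <= n.
        # The witness p = c ** i always exists, so only the p-free part is checked.
        assert s == c * i and i <= n
        s = s + c
        # skip (was: p = p * c)
        i += 1
    # Transformed postcondition: exists p. s = c*n and p = c^n.
    assert s == c * n
    return s
```

Quantifying p existentially keeps every assertion true of the optimized program even though p is never updated there.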
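Constant propagation
  Original program: c = 5; while i < n { s = s + c; p = p * c; i++; }
  Optimized program: c = 5; while i < n { s = s + 5; p = p * 5; i++; }
  Precondition: {s = 0 ∧ p = 1 ∧ i = 0}
  Invariant (original proof): {s = c * i ∧ p = c^i ∧ i ≤ n}
  Invariant with 5 substituted for c: {s = 5 * i ∧ p = 5^i ∧ i ≤ n}
  Invariant (optimized proof): {s = c * i ∧ p = c^i ∧ i ≤ n ∧ c = 5}
  Invariant together with the loop guard: {s = c * i ∧ p = c^i ∧ i < n}
  Postcondition: {s = c * n ∧ p = c^n}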
Proof compilation: For non-optimizing compilers it is easy: proof compilation is (almost) the identity. Not so if optimizations take place. There are many different optimizations, and each has its own particular effect on the proof. A systematic approach is needed for dealing with this.
State your reasons: There is always a reason why certain parts of code can be modified during optimization. These reasons should be stated (recorded) in the assertions. But how do we know exactly where and what is to be recorded?
  Original program: c = 5; while i < n { s = s + c; p = p * c; i++; }
  Optimized program: c = 5; while i < n { s = s + 5; p = p * 5; i++; }
  Reason: we know c is always 5 in the loop.
  Invariant: {s = c * i ∧ p = c^i ∧ i ≤ n}, strengthened to {s = c * i ∧ p = c^i ∧ i ≤ n ∧ c = 5}
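Why the reason has to be recorded can be seen by testing whether an invariant is preserved by the optimized loop body. The sketch below is an illustration only, assuming the invariant reads s = c * i ∧ p = c^i ∧ i ≤ n; all function names are hypothetical.

```python
def body_optimized(s, p, i):
    """One iteration of the optimized loop body: c has been replaced by 5."""
    return s + 5, p * 5, i + 1


def weak_inv(c, n, s, p, i):
    """Invariant of the original proof."""
    return s == c * i and p == c ** i and i <= n


def strong_inv(c, n, s, p, i):
    """Invariant with the reason for the optimization recorded: c = 5."""
    return weak_inv(c, n, s, p, i) and c == 5


def preserved(inv, c, n, i):
    """Check {inv and i < n} body {inv}, starting from the state s = c*i, p = c^i."""
    s, p = c * i, c ** i
    if not (inv(c, n, s, p, i) and i < n):
        return True  # the precondition fails, so the triple holds vacuously
    return inv(c, n, *body_optimized(s, p, i))
```

For example, preserved(weak_inv, 7, 3, 0) is False, because nothing in the weak invariant rules out c = 7; with strong_inv the same check succeeds for every c, since states with c ≠ 5 no longer satisfy the precondition.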
Enter type systems: Optimizations are mostly based on dataflow analyses. Dataflow analyses can be described as type systems. Type systems can have an optimization component. Type annotations can show us what needs to be stated where when transforming proofs.
PCC framework [Diagram labels. Code producer: Specification, Source program, Program verification environment, Program proof, Compiler, Proof compiler (marked “?”), Source, Proof. Code user: Proof, Proof Checker.]
PCC framework [Diagram labels: Source program, Analyzer, Type derivation, Program optimizer, Proof optimizer, Program proof, Proof]
Type system for dead code elimination
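As a rough illustration of the idea, a liveness analysis can be read as assigning each program point a "type", namely the set of variables live there, with the optimization driven by those types. The sketch below is a generic backward liveness pass over straight-line assignments, not the type system presented in the talk; the statement encoding and the name eliminate_dead are assumptions made for the example.

```python
def eliminate_dead(stmts, live_out):
    """Backward liveness pass over straight-line assignments.

    stmts: list of (target, used_vars) pairs, one per assignment.
    live_out: variables live after the last statement.
    Returns the optimized statements (dead assignments become "skip")
    and the live set ("type") holding before each statement.
    """
    live = set(live_out)
    optimized, types = [], []
    for target, uses in reversed(stmts):
        if target in live:
            # Live assignment: the target is defined here, its operands become live.
            live = (live - {target}) | set(uses)
            optimized.append((target, uses))
        else:
            # Dead assignment: nothing downstream reads the target.
            optimized.append("skip")
        types.append(set(live))
    optimized.reverse()
    types.reverse()
    return optimized, types


# Loop body from the dead code elimination example; p is assumed not to be
# needed after the loop, so it is absent from the live-out set.
body = [("s", ("s", "c")), ("p", ("p", "c")), ("i", ("i",))]
result, types = eliminate_dead(body, {"s", "i", "c", "n"})
```

The live set computed at the head of the body equals the live-out set, so the same annotation is stable around the loop; variables missing from a live set are the ones a transformed assertion would quantify away.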
How applicable is the approach? The approach works on all classical program optimizations. Scales to complicated optimizations that change code structure, such as partial redundancy elimination. Can be used for optimizations that require bidirectional analyses (many non-trivial bytecode transformations). Can be applied to both high-level and CFG-based program and analysis descriptions. Implementation for Java bytecode analyses: dead store elimination, load-pop pair elimination, store-load pair elimination, etc.
Conclusions: It is important to get the basic notions and tools right: type systems are exactly the right tool for describing what optimizations are doing. They lend themselves to formal reasoning about optimizations (proving soundness, certain optimality results, etc.). Soundness of the optimization makes it possible to transform a program’s proof along with the program, guided by its analysis type derivation: record and exploit what you know, and it will come out right.
Acknowledgements: I would like to thank the Estonian Doctoral School in ICT, the EITSA Tiger University Plus programme, and the Estonian Association of Information Technology and Telecommunications (ITL) for their financial support.