A Trustworthy Proof Checker

Presentation transcript:

A Trustworthy Proof Checker
Andrew W. Appel, Neophytos G. Michael, Roberto Virga (Princeton University)
Aaron Stump (Stanford University)
FCS & VERIFY, July 2002
A trustworthy proof checker for proofs of properties of machine-code programs.
2/5/2019

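Trusted Computing Base
[Figure: two layered diagrams compared side by side.]
Operating System: applications (gcc, emacs, netscape, rogomatic, make) sit on top of the Kernel.
Theorem: a^n + b^n ≠ c^n, with its Proof sitting on top of the Axioms.
The bottom layer of each diagram (Kernel, Axioms) is the Trusted Base.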

The problem: Mobile Code Security
[Figure: a Code Producer compiles a Source Program and sends Code to a Code Consumer, who must decide (?) whether to Execute it.]
Example code:
    load r3, 4(r2)
    add r2,r4,r1
    store 1, 0(r7)
    store r1, 4(r7)
    add r7,0,r3
    add r7,8,r7
    beq r3, .-20
Resources at stake on the consumer side: private files, network access, launch control, etc.

Existing Practice: Hardware VM protection
[Figure: the Code Producer compiles a Source Program to Machine Code; the Code Consumer executes the Machine Code under an Operating System whose virtual memory guards the protected resources.]
Example code:
    load r3, 4(r2)
    add r2,r4,r1
    store 1, 0(r7)
    store r1, 4(r7)
    add r7,0,r3
    add r7,8,r7
    beq r3, .-20
Disadvantages:
  Large trusted code base of the O.S.
  Clumsy, slow interfaces between trusted & untrusted code

Existing Practice: Bytecode Verification
[Figure: the Code Producer compiles a Java Program to ByteCode; on the Code Consumer side the Bytecode Verifier says OK, the Just-in-time Compiler produces Native code, and the native code is Executed. The consumer-side components form the Trusted Computing Base.]
Example native code:
    load r3, 4(r2)
    add r2,r4,r1
    store 1, 0(r7)
    store r1, 4(r7)
    add r7,0,r3
    add r7,8,r7
    beq r3, .-20
Advantage: clean, fast, O-O interface between trusted & untrusted code
Disadvantage: huge trusted computing base: JIT

Foundational Proof-Carrying Code
[Figure: the Code Producer compiles a Source Program to Native Code; the Compiler passes Hints to a Prover, which uses Machine Spec + Policy to build a Safety Proof (a nested proof term). The Code Consumer's Checker validates the proof against its own Machine Spec + Policy and, on OK, the Native Code is Executed. The Trusted Computing Base is the Checker plus Machine Spec + Policy.]
Example native code:
    load r3, 4(r2)
    add r2,r4,r1
    store 1, 0(r7)
    store r1, 4(r7)
    add r7,0,r3
    add r7,8,r7
    beq r3, .-20

Trusted Computing Base
The minimal set of code that must be trusted.
Our goal: make the TCB as small as possible.
The TCB consists of two pieces:
  The safety policy: a predicate in Higher-Order Logic that characterizes whether a program is safe to execute.
  The proof checker: a small C program that checks safety proofs.

Trusted Computing Base (cont.)
Safety Policy
  Choose a logical framework (a programming language for logic): we choose LF.
  Choose an object logic (axioms, inference rules): we choose Higher-Order Logic.
  Represent our theorem in the object logic: we will explain...
Proof Checking
  Build a proof checker for the logical framework: we use Twelf to prove theorems, but for checking we want something smaller and simpler . . .

LF, Twelf, and Higher Order Logic
What is LF? (Harper et al. 1993)
  A Logical Framework for defining and presenting logics.
  Based on a general treatment of syntax, rules, and proofs by means of a typed first-order λ-calculus.
  Its type system has three levels of terms:
    Objects
    Types, which classify objects
    Kinds, which classify families of types
  Equality is taken as βη-conversion.
  The judgments-as-types principle.
We use the Twelf implementation of LF (Pfenning et al. 99).
We implement a standard HOL with arithmetic.

Programming in Twelf
Define formula constructors (an LF signature):
    num : type.
    form : type.
    imp : form -> form -> form.
    . . .
Define proof constructors (axioms):
    pf : form -> type.
    imp_i : (pf A -> pf B) -> pf (A imp B).
    imp_e : pf (A imp B) -> pf A -> pf B.

Theorems, proof checking in HOL
Proof of logical transitivity:
    imp_trans: pf (A imp B) -> pf (B imp C) -> pf (A imp C) =
      [p1 : pf (A imp B)] [p2 : pf (B imp C)]
      imp_i [p3 : pf A] imp_e p2 (imp_e p1 p3).
This shows the general form of a Twelf definition: name : τ = exp.
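The judgments-as-types idea behind this slide can be illustrated outside Twelf. The following is a minimal sketch in Python, not the authors' checker: formulas and proof terms are encoded as tuples, and the function check computes the formula that a proof term proves using only the rules imp_i and imp_e. The tuple encoding and the names imp, check, and hyp are assumptions made for illustration.

```python
# Formulas: an atom is a string; an implication is ("imp", A, B).
def imp(a, b):
    return ("imp", a, b)

# Proof terms (hypothetical encoding, not Twelf syntax):
#   ("hyp", name)             use a hypothesis that is in scope
#   ("imp_i", name, A, body)  assume A as `name`; if body proves B, this proves A imp B
#   ("imp_e", p, q)           if p proves A imp B and q proves A, this proves B
def check(proof, env=None):
    """Return the formula proved by `proof`, or raise ValueError."""
    env = env or {}
    tag = proof[0]
    if tag == "hyp":
        if proof[1] not in env:
            raise ValueError("unbound hypothesis " + proof[1])
        return env[proof[1]]
    if tag == "imp_i":
        _, name, a, body = proof
        return imp(a, check(body, {**env, name: a}))
    if tag == "imp_e":
        f = check(proof[1], env)
        a = check(proof[2], env)
        if not (isinstance(f, tuple) and f[0] == "imp" and f[1] == a):
            raise ValueError("imp_e: argument does not match the antecedent")
        return f[2]
    raise ValueError("unknown rule " + str(tag))

# The body of imp_trans from the slide, with p1 and p2 supplied as hypotheses.
hyps = {"p1": imp("A", "B"), "p2": imp("B", "C")}
body = ("imp_i", "p3", "A",
        ("imp_e", ("hyp", "p2"), ("imp_e", ("hyp", "p1"), ("hyp", "p3"))))
print(check(body, hyps))
```

Checking a proof here is just computing a type and comparing it with the claimed theorem, which is why a checker for fully explicit terms can stay small.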

The safety policy
“This program accesses memory only in range 0-1000.”
“This program never executes an illegal instruction.”
Step I: define access predicates
    readable(x) = 0 ≤ x ≤ 1000
    writable(x) = 0 ≤ x ≤ 1000
Step II: define legal instructions . . .

Machine states, step relation
Machine State = Register bank + memory
(r,m) ↦ (r’,m’): the step relation is a map between machine states
[Figure: register bank r (registers 1, 2, 3, ..., psr, pc) and memory m before the step; r’ and m’ after it, with one value changing from 7 to 8.]

Machine instruction = step relation
add r1 := r2+r3 is the relation (r,m) ↦ (r’,m’) such that:
    m’ = m,
    r’(1) = r(2) + r(3),
    r’(pc) = 1 + r(pc),
    ∀i. i ≠ 1 ∧ i ≠ pc → r’(i) = r(i)
[Figure: register bank before the step with r(1)=7, r(2)=2, r(3)=6; after the step r’(1)=8, r’(2)=2, r’(3)=6.]
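Read as a predicate on a pair of states, the add instruction can be sketched in Python as follows. This is an illustration only: the dictionary representation of the register bank, the name add_step, and the concrete psr and pc values are assumptions, while the register values 7, 2, 6 and the result 8 follow the slide's figure.

```python
def add_step(r, m, r2, m2, d=1, s1=2, s2=3):
    """True iff (r, m) |-> (r2, m2) is a step of 'add r_d := r_s1 + r_s2'."""
    return (r2.keys() == r.keys()              # same register bank shape
            and m2 == m                        # memory unchanged
            and r2[d] == r[s1] + r[s2]         # destination gets the sum
            and r2["pc"] == 1 + r["pc"]        # pc advances by one
            and all(r2[x] == r[x]              # every other register is preserved
                    for x in r if x != d and x != "pc"))

# Register bank before and after; psr and pc values are made up for the example.
before = {1: 7, 2: 2, 3: 6, "psr": 0, "pc": 100}
after = {1: 8, 2: 2, 3: 6, "psr": 0, "pc": 101}
print(add_step(before, {}, after, {}))
```

The function does not compute the next state; it only recognizes a legal transition, which matches the relational reading of the specification.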

Instruction decoding; memory policy
(r,m) ↦ (r’,m’) ≡
    ( ∃w,i,j,k.
        m(r(pc)) = w
      ∧ w = 3·2^12 + i·2^8 + j·2^4 + k
      ∧ m’ = m
      ∧ readable(r(j) + k)
      ∧ r’(i) = m(r(j) + k)
      ∧ r’(pc) = 1 + r(pc)
      ∧ ∀x. x ≠ i ∧ x ≠ pc → r’(x) = r(x) )        load ri := m(rj+k)
    ∨ ( . . . ) ∨ ( . . . ) ∨ . . .
Instruction word layout: fields op, d, s1, s2; for this load, w = (3, i, j, k).
[Figure: register bank r and memory m, with pc = 7 pointing at the word w in memory.]
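The load clause combines decoding with the memory policy, and a small Python sketch makes the two parts visible. It assumes four 4-bit fields (suggested by the powers 2^12, 2^8, 2^4 on the slide, but not stated there), registers numbered 0 to 15 plus pc, and memory as a dictionary; encode_load, decode, and load_step are hypothetical names.

```python
def readable(x):
    return 0 <= x <= 1000          # access predicate from the safety policy

def encode_load(i, j, k):
    return 3 * 2**12 + i * 2**8 + j * 2**4 + k

def decode(w):
    """Split a word into the fields (op, i, j, k), assuming 4 bits each."""
    return (w >> 12) & 0xF, (w >> 8) & 0xF, (w >> 4) & 0xF, w & 0xF

def load_step(r, m, r2, m2):
    """True iff (r, m) |-> (r2, m2) is a legal 'load r_i := m(r_j + k)' step."""
    w = m.get(r["pc"])
    if w is None:
        return False
    op, i, j, k = decode(w)
    if op != 3 or w != encode_load(i, j, k):
        return False
    addr = r[j] + k
    return (r2.keys() == r.keys()
            and m2 == m
            and readable(addr)      # the policy is part of the instruction's meaning
            and addr in m           # assumption: memory is a partial map here
            and r2[i] == m[addr]
            and r2["pc"] == 1 + r["pc"]
            and all(r2[x] == r[x] for x in r if x != i and x != "pc"))

r = {**{n: 0 for n in range(16)}, "pc": 0}
r[2] = 10
m = {0: encode_load(1, 2, 4), 14: 99}
r_next = dict(r)
r_next[1] = 99
r_next["pc"] = 1
print(decode(m[0]), load_step(r, m, r_next, m))
```

Because readable appears inside the clause, a load from a forbidden address is simply not a step of the machine, so an unsafe program gets stuck rather than misbehaving.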

Making the specification concise & trustworthy
Described in [Michael & Appel 2000]:
  Separate syntax from semantics.
  Factor the semantics.
  Use the “New Jersey Machine-Code Toolkit” (NJMCT) to describe syntax.
  Automatically translate NJMCT descriptions into concise and readable higher-order logic.

Specifying safe execution
The ↦ relation includes only the legal instructions.
Safety means, “no matter how many instructions you execute, the next instruction is legal.”
The program is meant to be loaded at some start address:
    loaded(m,start,prog) = ∀i ∈ dom(prog). m(start+i) = prog(i)
Example: loaded(m,100, (9017;4214;8099;4010;6231;1008))
[Figure: memory holding 9017 4214 8099 4010 6231 1008 starting at address 100.]
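The loaded predicate translates almost directly into executable form. The sketch below assumes memory is a Python dictionary from addresses to words and that a program is a tuple of words; both choices are illustrative and not part of the specification.

```python
def loaded(m, start, prog):
    """True iff memory m holds prog at consecutive addresses from start."""
    return all(m.get(start + i) == w for i, w in enumerate(prog))

prog = (9017, 4214, 8099, 4010, 6231, 1008)
m = {100 + i: w for i, w in enumerate(prog)}
print(loaded(m, 100, prog), loaded(m, 101, prog))
```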

Safety theorem
safe(prog) = ∀r,m,start.
    loaded(m,start,prog) ∧ r(pc)=start →
    ∀r’,m’. r,m ↦* r’,m’ → ∃r’’,m’’. r’,m’ ↦ r’’,m’’
[Figure: register bank r with pc = start, and memory m holding 9017 4214 8099 4010 6231 1008 from address start; the remaining memory contents are unknown (?). The definition of safe is part of the Trusted Computing Base.]
Theorem to be proved: safe(9017;4214;8099;4010;6231;1008)
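The theorem quantifies over all register banks, memories, and start addresses, so it can only be established by proof. For intuition, the Python sketch below tests a much weaker, bounded property on a made-up toy machine: from one initial state, every one of the first n states has a legal successor. The instruction set (word 0 is a no-op, a word w ≥ 1000 jumps to address w − 1000, everything else is illegal) and the names step and safe_upto are assumptions, not the SPARC specification.

```python
def step(r, m):
    """Toy step function: the next state, or None when no legal step exists."""
    w = m.get(r["pc"])
    if w == 0:                              # no-op: fall through
        return {**r, "pc": r["pc"] + 1}, m
    if w is not None and w >= 1000:         # jump to address w - 1000
        return {**r, "pc": w - 1000}, m
    return None                             # illegal word or unmapped pc

def safe_upto(prog, start, n):
    """True iff the machine takes n legal steps after loading prog at start."""
    m = {start + i: w for i, w in enumerate(prog)}
    r = {"pc": start}
    for _ in range(n):
        nxt = step(r, m)
        if nxt is None:
            return False                    # stuck: the next instruction is not legal
        r, m = nxt
    return True

print(safe_upto((0, 0, 1100), 100, 50))     # loops forever, never gets stuck
print(safe_upto((0, 0), 100, 3))            # runs off the end of the program
```

A proof of safe(prog) covers every n and every initial state at once, which is what the checker verifies instead of running the program.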

Size of Safety Specification (Sparc)
[Table of specification sizes; its contents are not present in this transcript.]

Representation Issues in the Specification
Eliminating redundancy in LF terms
Dealing with arithmetic
Representation of axioms and trusted definitions:
  Encoding Higher-Order Logic in LF
  Polymorphic programming in Twelf
  Explicit versus implicit programming in Twelf (avoiding term reconstruction)

Eliminating Redundancy
LF signatures contain lots of redundant information:
    imp_i : {A: form}{B: form} (pf A -> pf B) -> pf (A imp B).
Twelf’s answer: parameters can be “declared” implicit:
    imp_i : (pf A -> pf B) -> pf (A imp B).
Implicit parameters in the TCB mean type reconstruction in the checker:
  The algorithm is large and complex.
  It relies on higher-order unification, which is undecidable (some valid proofs may fail).

Eliminating Redundancy (cont.)
On the TCB side: we write axioms & trusted definitions in fully explicit style.
On the proving side: implicit versus explicit LF term sizes.
Other approaches to this problem: Necula’s LFi, oracle-based checking.
We represent proofs as DAGs with structure sharing of common sub-expressions:
  Proof-size blowup is avoided.
  The checker does not need to parse proofs.
  The constant factor is not so good, though.
A tradeoff: TCB size versus proof size.
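Structure sharing is what keeps fully explicit terms from blowing up. A common way to obtain it is hash-consing, sketched below in Python; this is an illustrative technique, and the slides do not say how the authors build their DAGs. The class Dag and the function tree_size are hypothetical names.

```python
class Dag:
    """Hash-consed term store: structurally equal nodes get the same id."""
    def __init__(self):
        self.nodes = []      # node id -> (op, arg1, arg2)
        self.index = {}      # (op, arg1, arg2) -> node id

    def mk(self, op, arg1=None, arg2=None):
        key = (op, arg1, arg2)
        if key not in self.index:
            self.index[key] = len(self.nodes)
            self.nodes.append(key)
        return self.index[key]

def tree_size(dag, n):
    """Number of nodes the term would have if written out without sharing."""
    _, a, b = dag.nodes[n]
    return 1 + sum(tree_size(dag, c) for c in (a, b) if c is not None)

dag = Dag()
t = dag.mk("A")
for _ in range(10):
    t = dag.mk("imp", t, t)      # each level reuses the previous term twice

print(len(dag.nodes), tree_size(dag, t))
```

Ten levels of doubling need 11 DAG nodes but 2047 tree nodes, which is the kind of blowup sharing avoids.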

Term Reconstruction in the Prover
Twelf’s term reconstruction algorithm (a.k.a. “type inference”) is extremely useful in writing proofs.
Outside the TCB, write “compatibility lemmas” to interface with proofs that are written in implicit style.

The Proof Checker
A small C program (~ 803 lines, 1/3 of the TCB).
Type checks explicit LF proofs, and loads and executes only safe programs.
Makes no use of libraries except read and _exit.

Why do we need a parser?
Not for proofs: they are transmitted to the checker in DAG form.
For axioms! Humans can’t read axioms and trusted definitions in DAG form, and therefore can’t trust them (see Pollack ‘98, “How to believe a machine-checked proof”).

DAG representation of proofs & types
Each DAG node is 5 words.
The entire DAG is transmitted as a single block.
Node layout, one word per field:
    op      opcode
    arg1    left child
    arg2    right child
    type    computed type
    match   weak head normal form
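A fixed five-word node makes the wire format trivial to produce and consume. The Python sketch below packs a list of nodes into one contiguous block and reads it back; the 32-bit little-endian word size, the use of node indices as child references, and zero-filled type and match fields are all assumptions, since the slide gives only the field names.

```python
import struct

# Five unsigned 32-bit words per node: op, arg1, arg2, type, match.
NODE = struct.Struct("<5I")

def pack_dag(nodes):
    """Serialize a list of 5-tuples into a single block of bytes."""
    return b"".join(NODE.pack(*n) for n in nodes)

def unpack_dag(block):
    """Recover the list of 5-tuples; no parsing beyond fixed-size slicing."""
    return [NODE.unpack_from(block, off)
            for off in range(0, len(block), NODE.size)]

nodes = [
    (1, 0, 0, 0, 0),   # hypothetical leaf node
    (2, 0, 0, 0, 0),   # hypothetical leaf node
    (3, 0, 1, 0, 0),   # hypothetical application of node 0 to node 1
]
block = pack_dag(nodes)
print(len(block), unpack_dag(block) == nodes)
```

Since the receiver only slices fixed-size records, the checker needs no proof parser, which fits the slides' point that the parser exists for axioms rather than proofs.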

Proof-checking measurements
In the paper, we report a time of 74 seconds to check a benchmark proof (~ 6,000 lines).
We have improved this to 0.48 seconds:
  The checker marks closed terms.
  It avoids traversing closed terms during substitutions.
  This adds 20 lines to the proof checker.
Node layout with the added closed-term flag: op, cl, arg1, arg2, type, match.
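The closed-term optimization can be sketched in a few lines of Python. This is an illustration of the idea, not the checker's C code: terms use de Bruijn indices, each node records at construction time whether it is closed, and substitution returns closed subterms untouched. The class Term, the function subst, and the visit counter are hypothetical.

```python
class Term:
    """Lambda term with de Bruijn indices and a cached 'closed' mark."""
    def __init__(self, op, *args):
        self.op, self.args = op, args
        if op == "var":                       # args = (index,)
            free = args[0] + 1
        elif op == "lam":                     # args = (body,)
            free = max(0, args[0].free - 1)
        elif op == "app":                     # args = (function, argument)
            free = max(a.free for a in args)
        else:                                 # "const": args = (name,)
            free = 0
        self.free = free                      # number of enclosing binders needed
        self.closed = free == 0

def subst(t, k, s, stats):
    """Replace variable k in t by the closed term s, skipping closed subterms."""
    if t.closed:
        return t                              # nothing to substitute inside
    stats["visited"] += 1
    if t.op == "var":
        return s if t.args[0] == k else t
    if t.op == "lam":
        return Term("lam", subst(t.args[0], k + 1, s, stats))
    return Term("app", *(subst(a, k, s, stats) for a in t.args))

big = Term("const", "c")
for _ in range(1000):
    big = Term("app", big, big)               # large closed term, heavily shared
s = Term("const", "d")
stats = {"visited": 0}
result = subst(Term("app", Term("var", 0), big), 0, s, stats)
print(stats["visited"], result.args[1] is big)
```

Only the two open nodes are visited; the large closed subterm is returned as is, which is the effect the slide attributes to the closed-term mark.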

Smallest possible TCB
[Figure: comparison of TCB sizes across systems. Labels: Open-source JVM, non-optimizing JIT; Highly optimizing Java Compiler; optimizing compiler PCC system; Foundational PCC (Our System).]

Future Work
Machine descriptions for other CPUs (Mips, Sparc so far).
The TCB is really small, but proof sizes are large. Work on finding the right tradeoff between TCB size and proof size:
  Compress the DAG in some way.
  Use another compressed form of the LF syntactic notation.
  Add a simple Prolog interpreter to the TCB that “rediscovers” the proof based on the sequence of TAL instructions given to the checker. The TCB is then no longer minimal, but proof sizes are greatly reduced.