Bernd Fischer RW714: SAT/SMT-Based Bounded Model Checking of Software.


SAT/SMT-based BMC tools for C
CBMC (C Bounded Model Checker)
  – SAT-based (MiniSat) “workhorse”
  – also SystemC frontend
ESBMC (Embedded Systems Bounded Model Checker)
  – SMT-based (Z3, Boolector)
  – branched off CBMC, also (rudimentary) C++ frontend
LLBMC (Low-level Bounded Model Checker)
  – SMT-based (Boolector, Z3)
  – uses LLVM intermediate language
⇒ share common high-level architecture

SAT/SMT-based BMC tools for C
Typical features:
full language support
  – bit-precise operations, structs, arrays, ...
  – heap-allocated memory
  – concurrency
built-in safety checks
  – overflow, div-by-zero, array out-of-bounds indexing, ...
  – memory safety: NULL pointer dereference, memory leaks, ...
  – deadlocks, race conditions
user-specified assertions and error labels
non-deterministic modelling
  – nondeterministic assignments
  – assume-statements

SAT/SMT-based BMC tools for C
High-level architecture (pipeline diagram):
C Program → Parser → Static Analysis → CNF-gen → Solver
intermediate artefacts: intermediate program, equations (path and safety conditions), CNF (bit blasting)
solver result: UNSAT ⇒ SAFE; SAT ⇒ CEX-gen ⇒ UNSAFE + CEX

SAT/SMT-based BMC tools for C
General approach:
1. Simplify control flow
2. Unwind all of the loops
3. Convert into static single assignment (SSA) form
4. Convert into equations and simplify
5. (Bit-blast)
6. Solve with a SAT/SMT solver
7. Convert SAT assignment into a counterexample

Control flow simplifications
remove all side effects
  – e.g., j = ++i; becomes i = i+1; j = i;
simplify all control flow structures into core forms
  – e.g., replace for, do while by while
  – e.g., replace case by if
make control flow explicit
  – e.g., replace continue, break by goto
  – e.g., replace if, while by goto
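As a rough illustration of these simplifications, the following C sketch puts a loop with continue and break next to a hand-written version in which only conditional and unconditional gotos remain. The function names and the loop itself are invented for this example; the GOTO program a tool actually produces will differ in detail.

```c
/* Original form: for-loop with continue and break. */
int sum_odd_original(int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        if (i % 2 == 0)
            continue;               /* skip even numbers */
        if (sum > 100)
            break;                  /* stop once the sum exceeds 100 */
        sum += i;
    }
    return sum;
}

/* Hand-simplified form: every branch is a guarded goto. */
int sum_odd_goto(int n)
{
    int sum = 0;
    int i = 0;
loop_head:
    if (!(i < n)) goto loop_exit;   /* loop condition */
    if (i % 2 == 0) goto loop_step; /* was: continue */
    if (sum > 100) goto loop_exit;  /* was: break */
    sum += i;
loop_step:
    i = i + 1;                      /* was: i++ in the for header */
    goto loop_head;
loop_exit:
    return sum;
}
```

Both functions compute the same value for every n, which is the property a simplification pass has to preserve.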

Control flow simplifications
Demo: esbmc --show-goto-functions example-1.c
Demo: esbmc example-1.c
Demo: esbmc example-3.c
Demo: esbmc example-5.c

Source program:
  int main() {
    int i,j;
    for(i=0; i<6; i++) {
      j=i;
    }
    assert(j==i);
    return j;
  }

GOTO program:
  main (c::main):
    int i;
    int j;
    i = 0;
  1: IF !(i < 6) THEN GOTO 2
    j = i;
    i = i + 1;
    GOTO 1
  2: ASSERT j == i
    RETURN: j
    END_FUNCTION

Loop unwinding
all loops are “unwound”, i.e., replaced by several guarded copies of the loop body
  – same for backward gotos and recursive functions
  – can use different unwinding bounds for different loops
⇒ each statement is executed at most once
to check whether unwinding is sufficient, special “unwinding assertion” claims are added
⇒ if a program satisfies all of its claims and all unwinding assertions then it is correct!

Loop unwinding
  void f(...) {
    ...
    while(cond) {
      Body;
    }
    Remainder;
  }

Loop unwinding
unwind one iteration:
  void f(...) {
    ...
    if(cond) {
      Body;
      while(cond) {
        Body;
      }
    }
    Remainder;
  }

Loop unwinding
unwind one iteration:
  void f(...) {
    ...
    if(cond) {
      Body;
      if(cond) {
        Body;
        while(cond) {
          Body;
        }
      }
    }
    Remainder;
  }

Loop unwinding
unwind one iteration…
  void f(...) {
    ...
    if(cond) {
      Body;
      if(cond) {
        Body;
        if(cond) {
          Body;
          while(cond) {
            Body;
          }
        }
      }
    }
    Remainder;
  }

Loop unwinding
  void f(...) {
    ...
    if(cond) {
      Body;
      if(cond) {
        Body;
        if(cond) {
          Body;
          assert(!cond);   ← unwinding assertion
        }
      }
    }
    Remainder;
  }
unwinding assertion
  – inserted after last unwound iteration
  – violated if program runs longer than bound permits
⇒ if not violated: (real) correctness result!
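A concrete instance of this scheme might look as follows. The sketch unwinds a small loop with bound 3; instead of a real assert it records the unwinding assertion in a flag so that the outcome can be inspected. The function names and the halving loop are assumptions made for illustration only.

```c
#include <stdbool.h>

/* Original loop: counts how often x can be halved before it reaches 0. */
int halvings_loop(unsigned x)
{
    int steps = 0;
    while (x > 0) {
        x /= 2;
        steps++;
    }
    return steps;
}

/* The same loop unwound with bound 3.
 * *unwind_ok stands in for the unwinding assertion assert(!cond):
 * it is false exactly when a fourth iteration would have been needed. */
int halvings_unwound3(unsigned x, bool *unwind_ok)
{
    int steps = 0;
    *unwind_ok = true;
    if (x > 0) {
        x /= 2; steps++;
        if (x > 0) {
            x /= 2; steps++;
            if (x > 0) {
                x /= 2; steps++;
                *unwind_ok = !(x > 0);   /* unwinding assertion */
            }
        }
    }
    return steps;
}
```

For x = 5 the flag stays true and the unwound result equals the loop's result; for x = 8 the flag becomes false, which signals that bound 3 is too small for that input.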

Loop unwinding
  void f(...) {
    ...
    for(i=0; i<N; i++) {
      ...
      b[i]=a[i];
      ...
    };
    ...
    for(i=0; i<N; i++) {
      ...
      assert(b[i]-a[i]>0);
      ...
    };
    ...
    Remainder;
  }
unwinding assertion
  – inserted after last unwound iteration
  – violated if program runs longer than bound permits
⇒ if not violated: (real) correctness result!
⇒ what about multiple loops?
  – use --partial-loops to suppress insertion ⇒ unsound

Safety conditions
Built-in safety checks are converted into explicit assertions:
e.g., array safety (for an array a of length N):
  a[i]=...;  ⇒  assert(0 <= i && i < N); a[i]=...;
⇒ sometimes easier at intermediate representation or formula level
e.g., word-aligned pointer access, overflow, ...
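The array-safety instrumentation can be sketched in plain C. Valid indices of an array of length N run from 0 to N-1, so the generated condition has to exclude i == N. The length N = 8 and both function names are hypothetical, and the sketch returns a status flag where a BMC tool would report a violated assertion.

```c
#include <stdbool.h>

#define N 8   /* hypothetical array length for this sketch */

/* The safety condition that is asserted before a[i] = ...; */
bool index_in_bounds(long i)
{
    return 0 <= i && i < N;
}

/* Instrumented store: writes only if the safety condition holds
 * and reports whether it held. */
bool checked_store(int a[N], long i, int value)
{
    if (!index_in_bounds(i))
        return false;   /* a BMC tool would flag a property violation here */
    a[i] = value;
    return true;
}
```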


Transforming straight-line programs into equations
simple if each variable is assigned only once:
  program:      x = a; y = x + 1; z = y - 1;
  constraints:  x = a && y = x + 1 && z = y - 1
still simple if variables are assigned multiple times: introduce fresh copy for each occurrence (static single assignment (SSA) form)
  program:              x = a; x = x + 1; x = x - 1;
  program in SSA-form:  x0 = a; x1 = x0 + 1; x2 = x1 - 1;
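The renaming can be checked on the slide's own example. In the sketch below, the first function reassigns x three times, the second uses one fresh copy per assignment, and the third states the resulting constraints as a predicate over candidate values. All function names are made up for illustration.

```c
/* Original program: x is assigned three times. */
int straight_line_original(int a)
{
    int x;
    x = a;
    x = x + 1;
    x = x - 1;
    return x;
}

/* SSA form: one fresh, single-assignment copy per assignment. */
int straight_line_ssa(int a)
{
    const int x0 = a;
    const int x1 = x0 + 1;
    const int x2 = x1 - 1;
    return x2;
}

/* Constraint view: x0 = a && x1 = x0 + 1 && x2 = x1 - 1,
 * evaluated for one candidate assignment of values. */
int satisfies_constraints(int a, int x0, int x1, int x2)
{
    return x0 == a && x1 == x0 + 1 && x2 == x1 - 1;
}
```

A solver searches for values that make such a predicate true; here the values are simply supplied by the caller.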

Transforming loop-free programs into equations
But what about control flow branches (if-statements)?
  program:   if(v) x = y; else x = z; w = x;
  SSA form:  if(v0) x0 = y0; else x1 = z0; w1 = x??;   ← which x?

Transforming loop-free programs into equations
But what about control flow branches (if-statements)?
for each control flow join point, add a new variable with guarded assignment as definition
  – also called ϕ-function
  program:   if(v) x = y; else x = z; w = x;
  SSA form:  if(v0) x0 = y0; else x1 = z0; x2 = v0 ? x0 : x1; w1 = x2;   ← introduce & use new variable
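The guarded assignment at the join point can be written directly in C with the conditional operator. The sketch mirrors the slide's example; the function names are assumptions, and both branch values are computed unconditionally, which is harmless here because the branches have no side effects.

```c
/* Original program with a branch. */
int branch_original(int v, int y, int z)
{
    int x, w;
    if (v) x = y; else x = z;
    w = x;
    return w;
}

/* SSA form: the join point gets a fresh x2 defined by a guarded assignment. */
int branch_ssa(int v0, int y0, int z0)
{
    const int x0 = y0;             /* value from the then-branch */
    const int x1 = z0;             /* value from the else-branch */
    const int x2 = v0 ? x0 : x1;   /* phi-function at the join point */
    const int w1 = x2;
    return w1;
}
```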

Bit-blasting
Conversion of equations into SAT problem:
simple assignments: |[ x = y ]| ≙ ⋀_i (x_i ⇔ y_i)   (i ranges over the effective bitwidth)
⇒ static analysis must approximate effective bitwidth well
ϕ-functions: |[ x = v ? y : z ]| ≙ (v ⇒ |[ x = y ]|) ⋀ (¬v ⇒ |[ x = z ]|)
Boolean operations: |[ x = y | z ]| ≙ ⋀_i (x_i ⇔ (y_i ⋁ z_i))
Exercise: relational operations

Bit-blasting arithmetic operations
Build circuits that implement the operations!
1-bit addition:
Full adder as CNF:

Bit-blasting arithmetic operations
Build circuits that implement the operations!
⇒ adds w variables, 6*w clauses
⇒ multiplication / division much more complicated
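The circuit view of addition can be sketched as a ripple-carry adder built from 1-bit full adders. The code below evaluates the gate equations (sum = a XOR b XOR carry-in, carry-out = majority of the three) bit by bit; it does not generate CNF, and the function names are assumptions. A bit-blaster would emit clauses for the same gates instead of evaluating them.

```c
#include <stdint.h>

/* One full adder on single bits (each argument is 0 or 1).
 * Returns the sum bit and stores the carry-out in *cout. */
static unsigned full_adder(unsigned a, unsigned b, unsigned cin, unsigned *cout)
{
    *cout = (a & b) | (a & cin) | (b & cin);   /* majority of a, b, cin */
    return a ^ b ^ cin;
}

/* w-bit ripple-carry addition (1 <= w <= 32), one full adder per bit.
 * The final carry is dropped, so the result is (x + y) modulo 2^w. */
uint32_t ripple_add(uint32_t x, uint32_t y, unsigned w)
{
    uint32_t result = 0;
    unsigned carry = 0;
    for (unsigned i = 0; i < w; i++) {
        unsigned a = (x >> i) & 1u;
        unsigned b = (y >> i) & 1u;
        unsigned s = full_adder(a, b, carry, &carry);
        result |= (uint32_t)s << i;
    }
    return result;
}
```

The carry chain is also where machine-level wrap-around comes from: with w = 8, adding 200 and 100 yields 44.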

Handling arrays
Arrays can be replaced by individual variables, with a “demux” at each access:
  int a[10];
  ...
  x = a[i];
becomes
  int a0, a1, a2, ... a9;
  ...
  x = (i==0 ? a0 : (i==1 ? a1 : (i==2 ? a2 : ...)));
⇒ surprisingly effective (for array sizes N<1000) because value of i can often be determined statically
  – due to constant propagation
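For a small array the demux can be written out in full. The sketch below reads from a four-element array that has been split into four scalars; the function name and the size 4 are chosen for illustration, and indices outside 0..3 are assumed to be excluded by a separate bounds assertion.

```c
/* x = a[i] for an int a[4] that has been replaced by scalars a0..a3. */
int demux_read4(int a0, int a1, int a2, int a3, int i)
{
    return i == 0 ? a0
         : i == 1 ? a1
         : i == 2 ? a2
         :          a3;   /* remaining case i == 3 */
}
```

When i is a known constant, the whole chain folds to a single variable, which is why constant propagation makes this encoding cheap in practice.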

Handling arrays with theories
Arrays can be seen as ADT with two operations:
  read:  Array × Index → Element            (“select”)
  write: Array × Index × Element → Array    (“update”)
Axioms describe intended semantics:
⇒ requires support by SMT-solver
  program:   ... a[i]=a[i]+1; ...
  encoding:  ... a1=write(a0,i,read(a0,i)+1); ...
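The functional reading of arrays can be imitated in C by passing a struct by value, so that a write returns a new array and leaves the old one untouched. The sketch assumes the usual read-over-write semantics of the array theory: reading the index just written yields the new element, and reading any other index yields the old array's value. The names read_at and write_at, the type array_t and the length LEN are hypothetical.

```c
#define LEN 4   /* hypothetical fixed length for this sketch */

typedef struct { int elem[LEN]; } array_t;

/* read: Array x Index -> Element */
int read_at(array_t a, int i)
{
    return a.elem[i];
}

/* write: Array x Index x Element -> Array
 * a is a by-value copy, so the caller's array is not modified. */
array_t write_at(array_t a, int i, int e)
{
    a.elem[i] = e;
    return a;
}
```

With these two functions, a[i] = a[i] + 1 becomes a1 = write_at(a0, i, read_at(a0, i) + 1), and a0 remains available as the value of the array before the assignment.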

Handling arrays with λ-terms
How to handle memset and memcpy?
  void *memset(void *dst, int c, size_t n);
  void *memcpy(void *dst, const void *src, size_t n);
encoding as individual writes:
  ... memcpy(a,b,4); ...
  ... a1=write(a0,0,read(b,0)); a2=write(a1,1,read(b,1)); a3=write(a2,2,read(b,2)); a4=write(a3,3,read(b,3)); ...
not scalable for large constants
need to encode as loop for non-constant block sizes
  – same problems for normal array-copy operations

Handling arrays with λ-terms
How to handle memset and memcpy?
  void *memset(void *dst, int c, size_t n);
  void *memcpy(void *dst, const void *src, size_t n);
encoding as a single λ-term:
  ... memcpy(a,b,4); ...
  ... a1 = λi. ((0<=i && i<4) ? read(b,i) : read(a0,i)); ...
similar for memset and array-copy loops
additional axiom describes intended semantics
⇒ requires integration into SMT-solver
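C has no λ-terms, but the idea can be approximated by a function that answers reads of the array after the copy without ever building that array. The sketch below works at element granularity, as the slide's example does, rather than on bytes; the function name is an assumption. Indices inside the copied block are answered from the source b, all others from the old destination a0.

```c
#include <stddef.h>

/* Value of a1[i], where a1 is the array a0 after copying the first n
 * elements of b into it. Because i is unsigned, 0 <= i holds trivially. */
int read_after_copy(const int *a0, const int *b, size_t n, size_t i)
{
    return (i < n) ? b[i] : a0[i];
}
```

The cost of a read is independent of n, which is the advantage over a chain of n individual writes.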

Handling arrays with λ-terms

SAT vs. SMT
BMC tools use both propositional satisfiability (SAT) and satisfiability modulo theories (SMT) solvers:
SAT solvers require encoding everything in CNF
  – limited support for high-level operations
  – easier to reflect machine-level semantics
  – can be extremely efficient (SMT falls back to SAT)
SMT solvers support built-in theories
  – equality, free function symbols, arithmetic, arrays, ...
  – sometimes even quantifiers
  – very flexible, extensible, front-end easier
  – requires extra effort to enforce precise semantics
  – can be slower

Memory handling
Simple idea:
1. heap is an array of bytes
2. pointer is index into the heap array
  – pointer arithmetic becomes almost trivial
3. malloc returns non-deterministic integer
  – needs extra bit-array to model free lists
  – need to model alignment restrictions
⇒ most precise model, but difficult to get efficiency
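The first two points can be sketched in a few lines of C: the heap is a byte array, a pointer is just an index, and a 32-bit access is assembled from four byte accesses. The heap size, the little-endian byte order and all names are assumptions of this sketch, and allocation is left out entirely.

```c
#include <stddef.h>
#include <stdint.h>

#define HEAP_SIZE 64            /* hypothetical heap size */

static uint8_t heap[HEAP_SIZE]; /* 1. the heap is an array of bytes */
typedef size_t heap_ptr;        /* 2. a pointer is an index into it  */

/* Store a 32-bit value at p, least significant byte first (assumed order). */
void store_u32(heap_ptr p, uint32_t v)
{
    for (size_t k = 0; k < 4; k++)
        heap[p + k] = (uint8_t)(v >> (8 * k));
}

/* Load a 32-bit value from p using the same byte order. */
uint32_t load_u32(heap_ptr p)
{
    uint32_t v = 0;
    for (size_t k = 0; k < 4; k++)
        v |= (uint32_t)heap[p + k] << (8 * k);
    return v;
}
```

Pointer arithmetic is plain index arithmetic in this model: the word after p is at p + 4.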

Memory handling
Alternative idea:
1. each allocated object is modeled as individual “blob”
2. pointer dereferencing gives underlying object (or byte array)
  – pointer arithmetic across objects becomes almost impossible
  – requires points-to analysis
3. malloc returns fresh blob
  – use extra bit to model allocation of blobs
  – need to model alignment restrictions
⇒ most efficient model, but difficult to get precision
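One way to picture the blob model is a pointer that consists of an object identifier and an offset, with one allocation bit per object. The sketch below is a simplification under stated assumptions: all objects have the same fixed size, the object table is tiny, and every name is invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_OBJECTS 4   /* hypothetical number of blobs */
#define OBJECT_SIZE 8   /* hypothetical size of every blob, in ints */

typedef struct {
    bool allocated;          /* extra bit modelling allocation */
    int  data[OBJECT_SIZE];
} object_t;

typedef struct { int obj; size_t off; } blob_ptr;   /* (object, offset) */

static object_t objects[MAX_OBJECTS];

/* malloc returns a fresh blob; obj == -1 means that none is left. */
blob_ptr model_malloc(void)
{
    for (int k = 0; k < MAX_OBJECTS; k++) {
        if (!objects[k].allocated) {
            objects[k].allocated = true;
            return (blob_ptr){ k, 0 };
        }
    }
    return (blob_ptr){ -1, 0 };
}

/* free clears the allocation bit of the object the pointer refers to. */
void model_free(blob_ptr p)
{
    if (p.obj >= 0 && p.obj < MAX_OBJECTS)
        objects[p.obj].allocated = false;
}

/* Pointer arithmetic changes only the offset, never the object. */
blob_ptr ptr_add(blob_ptr p, size_t n)
{
    return (blob_ptr){ p.obj, p.off + n };
}

/* A dereference is safe if the object is allocated and the offset is in bounds. */
bool deref_ok(blob_ptr p)
{
    return p.obj >= 0 && p.obj < MAX_OBJECTS
        && objects[p.obj].allocated && p.off < OBJECT_SIZE;
}
```

Because ptr_add never changes the object component, a pointer cannot wander from one blob into another, which makes bounds and use-after-free checks simple but rules out arithmetic across objects.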

Modeling with non-determinism
Extend C with three modeling features:
assert(e): aborts execution when e is false, no-op otherwise
  void assert (_Bool b) { if (!b) exit(1); }
nondet_int(): returns non-deterministic int-value
  int nondet_int () { int x; return x; }
assume(e): “ignores” execution when e is false, no-op otherwise
  void assume (_Bool e) { while (!e) ; }

Modeling with non-determinism
General approach:
use C program to set up structure and deterministic computations
use non-determinism to set up search space
use assumptions to constrain search space
use failing assertion to start search
  int main() {
    int x=nondet_int(),y=nondet_int(),z=nondet_int();
    __ESBMC_assume(x > 0 && y > 0 && z > 0);
    __ESBMC_assume(x < && y < && z < 16384);
    assert(x*x + y*y != z*z);
    return 0;
  }
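The search that the model checker performs on this program can be imitated in plain C by enumerating the values that nondet_int() could return. In the sketch below, the loop ranges take the role of the assumptions and the negated assertion is the condition being searched for. The function name and the bound parameter are assumptions of this sketch, and an exhaustive triple loop is of course only practical for small bounds, unlike a solver-based search.

```c
#include <stdbool.h>

/* Look for values 0 < x, y, z < bound that violate the assertion
 * x*x + y*y != z*z. Returns true and stores the counterexample if
 * one exists. Intended for small bounds, where the squares cannot
 * overflow an int. */
bool find_counterexample(int bound, int *x, int *y, int *z)
{
    for (int a = 1; a < bound; a++)
        for (int b = 1; b < bound; b++)
            for (int c = 1; c < bound; c++)
                if (!(a * a + b * b != c * c)) {   /* assertion fails */
                    *x = a; *y = b; *z = c;
                    return true;
                }
    return false;
}
```

A returned counterexample is a Pythagorean triple, so the failing assertion turns the verification run into a search for a solution.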

Modeling with non-determinism
Exercise: implement a fuzzy shortest path search
use a (non-deterministic) matrix to represent driving times between two locations
use assumptions on entries to constrain the times
  – 0 ≙ no direct connection
  – allow “takes longer than”, “between 3 and 5 hours”, ...
use vector (with non-det size) to represent path
  – use assumptions to ensure that path can be taken
  – iterate over path to compute time
  – iterate over times and path lengths