Published by Hector Richardson. Modified over 9 years ago.
1
Modeling Data in Formal Verification: Bits, Bit Vectors, or Words. Karam AbdElkader. Based on: presentations from Randal E. Bryant (Carnegie Mellon University) and on Decision Procedures: An Algorithmic Point of View by D. Kroening (Oxford University) and O. Strichman (Technion).
2
– 2 – Agenda. Overview and examples. Introduction to bit-vector logic: syntax, semantics. Decision procedures for bit-vector logic: flattening bit-vector logic, incremental flattening, bit-vector arithmetic with abstraction.
3
– 3 – Overview. Issue: how should data be modeled in formal analysis (verification, test generation, security analysis, …)? Approaches. Bits: every bit is represented individually; this is the basis for most CAD and model checking. Words: view each word as an arbitrary value, e.g., unbounded integers; used in historic program verification work. Bit vectors: finite-precision words; these capture the true semantics of hardware and software and offer more opportunities for abstraction than bits.
4
– 4 – Bit-Level Modeling. Represent every bit of state individually. Behavior is expressed as Boolean next-state functions over the current state. Historic method for most CAD, testing, and verification tools, e.g., model checkers. [Figure: control logic and data path, with combinational logic blocks Com. Log. 1 and Com. Log. 2]
5
– 5 – Bit-Level Modeling in Practice. Strengths: allows precise modeling of the system; well-developed technology (BDDs and SAT for Boolean reasoning). Limitations: every state bit introduces two Boolean variables (current state and next state); overly detailed modeling of system functions (we don't want to capture the full details of an FPU). Making it work: use extensive abstraction to reduce the bit count; it is hard to abstract functionality.
6
– 6 – Word-Level Abstraction #1: Bits → Integers. View data as symbolic words: arbitrary integers, with no assumptions about size or encoding. Classic model for reasoning about software. Words can be stored in memories and registers. [Figure: bits x0, x1, x2, …, xn-1 abstracted into a single word x]
7
– 7 – Abstracting Data Bits. What do we do about logic functions? [Figure: control logic and data path before and after abstracting the data bits; the combinational logic blocks are marked with question marks]
8
– 8 – Word-Level Abstraction #2: Uninterpreted Functions. For any block that transforms or evaluates data, replace it with a generic, unspecified function. The only assumed property is functional consistency: a = x ∧ b = y ⇒ f(a, b) = f(x, y). [Figure: ALU replaced by function f]
9
– 9 – Abstracting Functions. For any block that transforms data: replace it by an uninterpreted function, ignoring its detailed functionality. This is a conservative approximation of the actual system. [Figure: control logic and data path, with the combinational logic blocks replaced by uninterpreted functions F1 and F2]
10
– 10 – Word-Level Modeling: History Historic Used by theorem provers More Recently Burch & Dill, CAV ’94 Verify that pipelined processor has same behavior as unpipelined reference model Use word-level abstractions of data paths and memories Use decision procedure to determine equivalence Bryant, Lahiri, Seshia, CAV ’02 UCLID verifier Tool for describing & verifying systems at word level
11
– 11 – Pipeline Verification Example Pipelined Processor Reference Model
12
– 12 – Abstracted Pipeline Verification Pipelined Processor Reference Model
13
– 13 – Experience with Word-Level Modeling. Powerful abstraction tool: allows focus on control of a large-scale system; can model systems with very large memories. Hard to generate abstract model: hand-generated (how to validate?); automatic abstraction has had limited success (Andraus & Sakallah, DAC 2004). Realistic features break abstraction, e.g., setting the ALU function to A+0 to pass an operand to the output. Desire: should be able to mix detailed bit-level representation with abstracted word-level representation.
14
– 14 – Bit Vectors: Motivating Example #1 Do these functions produce identical results? Strategy Represent and reason about bit-level program behavior Specific to machine word size, integer representations, and operations int abs(int x) { int mask = x>>31; return (x ^ mask) + ~mask + 1; } int test_abs(int x) { return (x < 0) ? -x : x; }
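The question on this slide can be explored by brute force before reaching for a decision procedure. The sketch below is an illustration only: it models both functions on `uint32_t` so that wraparound is well defined in C (the helper names `abs_bits`, `abs_ref`, and `abs_versions_agree` are hypothetical, and the arithmetic right shift `x>>31` is modeled explicitly as an all-ones or all-zeros mask). It checks boundary values and a strided sample, not all 2^32 inputs, so it is a test and not a proof.

```c
#include <stdint.h>

/* Bit-trick version, modeled on uint32_t so that overflow wraps mod 2^32. */
static uint32_t abs_bits(uint32_t x) {
    uint32_t mask = (x >> 31) ? 0xFFFFFFFFu : 0u;  /* models arithmetic x>>31 */
    return (x ^ mask) + ~mask + 1u;
}

/* Reference version: negate when the sign bit is set. */
static uint32_t abs_ref(uint32_t x) {
    return (x >> 31) ? (0u - x) : x;
}

/* Returns 1 if both versions agree on boundary values and a strided sample. */
static int abs_versions_agree(void) {
    const uint32_t edge[] = {0u, 1u, 0x7FFFFFFFu, 0x80000000u, 0xFFFFFFFFu};
    for (unsigned i = 0; i < 5; i++)
        if (abs_bits(edge[i]) != abs_ref(edge[i])) return 0;
    for (uint32_t x = 0; x < 0xFFFF0000u; x += 65521u)
        if (abs_bits(x) != abs_ref(x)) return 0;
    return 1;
}
```

Note that both versions map the bit pattern 0x80000000 (the most negative 32-bit value) to itself, which is exactly the kind of finite-width effect that word-level integer reasoning would miss.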
15
– 15 – Motivating Example #2. Is there an input string that causes value 234 to be written to address a4a3a2a1? void fun() { char fmt[16]; fgets(fmt, 16, stdin); fmt[15] = '\0'; printf(fmt); } Answer: yes, "a1a2a3a4%230g%n". Depends on details of compilation. But no exploit for buffer size less than 8. [Ganapathy, Seshia, Jha, Reps, Bryant, ICSE ’05]
16
– 16 – Motivating Example #3. Is there a way to expand the program sketch to make it match the spec? Spec: bit[W] popSpec(bit[W] x) { int cnt = 0; for (int i=0; i<W; i++) { if (x[i]) cnt++; } return cnt; } Sketch: bit[W] popSketch(bit[W] x) { loop (??) { x = (x&??) + ((x>>??)&??); } return x; } Answer for W=16: x = (x&0x5555) + ((x>>1)&0x5555); x = (x&0x3333) + ((x>>2)&0x3333); x = (x&0x0077) + ((x>>8)&0x0077); x = (x&0x000f) + ((x>>4)&0x000f); [Solar-Lezama, et al., ASPLOS ’06]
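The W=16 answer uses less familiar masks than the textbook population count (0x0077 with a shift of 8, then 0x000f with a shift of 4), so it is worth checking. The following sketch, with hypothetical function names, transcribes the spec and the four synthesized statements into plain C and compares them on every 16-bit input. It only validates the given answer; it does not perform sketch synthesis.

```c
#include <stdint.h>

/* Specification: count the set bits one at a time. */
static unsigned pop_spec(uint16_t x) {
    unsigned cnt = 0;
    for (int i = 0; i < 16; i++)
        if ((x >> i) & 1u) cnt++;
    return cnt;
}

/* The four statements given as the answer for W = 16. */
static unsigned pop_sketch(uint16_t v) {
    unsigned x = v;
    x = (x & 0x5555u) + ((x >> 1) & 0x5555u);  /* 2-bit partial counts      */
    x = (x & 0x3333u) + ((x >> 2) & 0x3333u);  /* 4-bit counts, each <= 4   */
    x = (x & 0x0077u) + ((x >> 8) & 0x0077u);  /* fold upper byte onto lower */
    x = (x & 0x000fu) + ((x >> 4) & 0x000fu);  /* add the two nibbles       */
    return x;
}

/* Returns 1 if the sketch matches the spec on all 65536 inputs. */
static int sketch_matches_spec(void) {
    for (unsigned v = 0; v <= 0xFFFFu; v++)
        if (pop_spec((uint16_t)v) != pop_sketch((uint16_t)v)) return 0;
    return 1;
}
```

The mask 0x0077 is sufficient in the third step because each nibble holds a count of at most 4 after the second step, which fits in three bits.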
17
– 17 – Motivating Example #4 Is pipelined microprocessor identical to sequential reference model? Strategy Represent machine instructions, data, and state as bit vectors Compatible with hardware description language representation Verifier finds abstractions automatically Pipelined Microprocessor Sequential Reference Model
19
– 19 – Decision Procedures for System-Level Software. What kind of logic do we need for system-level software? We need bit-vector logic, with bit-wise operators and arithmetic overflow. We want to scale to large programs, so we must verify large formulas. Examples of program analysis tools that generate bit-vector formulas: CBMC; SATABS; SATURN (Stanford, Alex Aiken); EXE (Stanford, Dawson Engler, David Dill); variants of those developed at IBM and Microsoft.
20
– 20 – Bit-Vector Logic: Syntax
21
– 21 – Bit-Vector Logic: Syntax
formula : formula ∨ formula | ¬formula | atom
atom : term rel term | Boolean-Identifier | term[ constant ]
rel : = | <
term : term op term | identifier | ∼term | constant | atom ? term : term | term[ constant : constant ] | ext( term )
op : + | − | · | / | << | >> | & | | | ⊕ | ◦
∼x: bit-wise negation of x. ext(x): sign- or zero-extension of x. x << d: left shift with distance d. x ◦ y: concatenation of x and y.
22
– 22 – Semantics Danger! (x − y > 0) if and only if (x > y) Valid over R/N, but not over the bit-vectors. (Many compilers have this sort of bug)
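A small experiment makes the danger concrete. The sketch below assumes 8-bit words (the width is an assumption chosen to keep the search exhaustive, and the function names are hypothetical). It counts how many unsigned pairs make the two sides of the equivalence disagree, and checks one two's complement counterexample.

```c
#include <stdint.h>

/* Counts 8-bit unsigned pairs where (x - y > 0) and (x > y) disagree. */
static unsigned count_disagreements_u8(void) {
    unsigned n = 0;
    for (unsigned x = 0; x < 256; x++)
        for (unsigned y = 0; y < 256; y++) {
            int diff_positive = (uint8_t)(x - y) > 0;  /* wraps mod 256 */
            int greater = x > y;
            if (diff_positive != greater) n++;
        }
    return n;
}

/* Two's complement counterexample: x = -128, y = 1 in 8 bits. */
static int signed_case_disagrees(void) {
    int x = -128, y = 1;
    /* -129 wraps to the bit pattern 0x7F, i.e. +127 as a signed byte. */
    int8_t diff = (int8_t)(uint8_t)(x - y);
    return (diff > 0) != (x > y);
}
```

In the unsigned case the two sides disagree on every pair with x < y, because x − y wraps around to a positive value.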
24
– 24 – Width and Encoding. The meaning depends on the width and encoding of the variables. Typical encodings: binary encoding, two's complement. But maybe also fixed-point, floating-point, ...
25
– 25 – Examples
26
– 26 – Width and Encoding Notation to clarify width and encoding:
28
– 28 – Bit-vectors Made Formal. Definition (Bit-Vector): a bit-vector b is a vector of Boolean values with a given length l: b : {0,...,l − 1} → {0,1}. The value of bit number i of x is x(i). We also write x_i for x(i).
30
– 30 – Lambda-Notation for Bit-Vectors. λ expressions are functions without a name. Examples: the vector of length l that consists of zeros; a function that inverts (flips all bits in) a bit-vector; a bit-wise OR. ⇒ We now have semantics for the bit-wise operators.
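One common way to write these three examples in λ-notation is sketched below. It assumes that $x_i$ denotes bit $i$ of the bit-vector $x$ and that all vectors have length $l$; the exact typography of the original slide may differ.

```latex
\begin{align*}
\text{zeros:}\quad       & \lambda i \in \{0,\dots,l-1\}.\; 0 \\
\text{inversion:}\quad   & \lambda x.\; \bigl(\lambda i \in \{0,\dots,l-1\}.\; \lnot x_i\bigr) \\
\text{bit-wise OR:}\quad & \lambda (x,y).\; \bigl(\lambda i \in \{0,\dots,l-1\}.\; x_i \lor y_i\bigr)
\end{align*}
```

Each bit-wise operator is thus defined position by position, which is why flattening these operators to propositional logic is straightforward.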
33
– 33 – Semantics for Arithmetic Expressions. What is the output of the following program? unsigned char number = 200; number = number + 100; printf("Sum: %d\n", number); On most architectures, this is 44! 11001000 = 200, + 01100100 = 100, = 00101100 = 44. ⇒ Bit-vector arithmetic uses modular arithmetic!
35
– 35 – Semantics for Arithmetic Expressions. Semantics for addition, subtraction: We can even mix the encodings:
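A usual way to state the modular semantics for the unsigned case is sketched below. The notation is an assumption: $\langle a \rangle_U$ stands for the natural number encoded by the bit-vector $a$ in binary encoding, $a_{[l]}$ marks a vector of width $l$, and $+_U$, $-_U$ are the unsigned operators.

```latex
\begin{align*}
a_{[l]} +_U b_{[l]} = c_{[l]} &\iff \langle a \rangle_U + \langle b \rangle_U \equiv \langle c \rangle_U \pmod{2^l} \\
a_{[l]} -_U b_{[l]} = c_{[l]} &\iff \langle a \rangle_U - \langle b \rangle_U \equiv \langle c \rangle_U \pmod{2^l}
\end{align*}
```

For the 8-bit program above, $200 + 100 = 300 \equiv 44 \pmod{2^8}$, which matches the printed result.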
36
– 36 – Semantics for Relational Operators Semantics for <, ≤, ≥, and so on: Mixed encodings: Note that most compilers don’t support comparisons with mixed encodings.
38
– 38 – Complexity Satisfiability is undecidable for an unbounded width, even without arithmetic. It is NP-complete otherwise.
39
– 39 – Decision Procedures. Core technology for formal reasoning. Boolean SAT: pure Boolean formulas. SAT Modulo Theories (SMT): support additional logic fragments. Example theories: linear arithmetic over reals or integers; functions with equality; bit vectors; combinations of theories. [Figure: a formula goes into the decision procedure, which returns either a satisfying solution or "unsatisfiable" (plus a proof)]
40
– 40 – SAT has made progress…
41
– 41 – BV Decision Procedures: Some History. B.C. (Before Chaff): string operations (concatenate, field extraction); linear arithmetic with bounds checking; modular arithmetic. Limitations: cannot handle the full range of bit-vector operations.
42
– 42 – BV Decision Procedures: Using SAT SAT-Based “Bit Blasting” Generate Boolean circuit based on bit-level behavior of operations Convert to Conjunctive Normal Form (CNF) and check with best available SAT checker Handles arbitrary operations Effective in Many Applications CBMC [Clarke, Kroening, Lerda, TACAS ’04] Microsoft Cogent + SLAM [Cook, Kroening, Sharygina, CAV ’05] CVC-Lite [Dill, Barrett, Ganesh], Yices [deMoura, et al]
44
– 44 – A Simple Decision Procedure. Transform bit-vector logic to propositional logic. This is the most commonly used decision procedure, also called 'bit-blasting'. Bit-vector flattening: 1. Convert the propositional part as before. 2. Add a Boolean variable for each bit of each sub-expression (term). 3. Add a constraint for each sub-expression. Each bit i of each term t thus gets its own new Boolean variable.
47
– 47 – Bit-vector Flattening. What constraints do we generate for a given term? This is easy for the bit-wise operators. Example for a bit-wise operator (read x = y over bits as x ⟺ y). We can transform this into CNF using Tseitin's method.
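As a small illustration of flattening a bit-wise operator, the sketch below uses bit-wise OR (the choice of operator and all function names are assumptions). For each bit position it evaluates the three clauses that the usual Tseitin encoding of c ⟺ (a ∨ b) produces, and then checks on all 4-bit values that the clauses hold exactly when t = a | b.

```c
#include <stdint.h>

/* CNF for one bit of t = a | b, where c is the new variable for that bit:
 *   (!a | c) & (!b | c) & (a | b | !c)    encodes    c <-> (a | b).      */
static int or_bit_clauses_hold(int a, int b, int c) {
    return (!a || c) && (!b || c) && (a || b || !c);
}

/* Flattened constraint for an l-bit OR: the clauses must hold for every bit. */
static int or_constraint_holds(uint8_t a, uint8_t b, uint8_t t, unsigned l) {
    for (unsigned i = 0; i < l; i++)
        if (!or_bit_clauses_hold((a >> i) & 1, (b >> i) & 1, (t >> i) & 1))
            return 0;
    return 1;
}

/* Returns 1 if, for all 4-bit a, b, t, the clauses hold iff t == (a | b). */
static int or_flattening_is_exact(void) {
    for (unsigned a = 0; a < 16; a++)
        for (unsigned b = 0; b < 16; b++)
            for (unsigned t = 0; t < 16; t++)
                if (or_constraint_holds((uint8_t)a, (uint8_t)b, (uint8_t)t, 4)
                        != (t == (a | b)))
                    return 0;
    return 1;
}
```

An l-bit bit-wise operator therefore costs l new variables and a constant number of clauses per bit, which is why these operators are the easy case.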
48
– 48 – Bit-vector Flattening
50
– 50 – Flattening Bit-Vector Arithmetic. How to flatten a + b? → We can build a circuit that adds them! The full adder in CNF:
52
– 52 – Flattening Bit-Vector Arithmetic. Ok, this is good for one bit! How about more? 8-bit ripple carry adder (RCA), also called carry chain adder. Adds l variables. Adds 6 · l clauses.
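The circuit view can be checked directly. The sketch below, with hypothetical function names, builds an 8-bit ripple carry adder from one-bit full adders using only Boolean operations and compares it against modular addition on every input pair. It evaluates the circuit on concrete values and does not generate the CNF itself.

```c
#include <stdint.h>

/* One-bit full adder: sum and carry-out from a, b, and carry-in. */
static void full_adder(int a, int b, int cin, int *sum, int *cout) {
    *sum  = a ^ b ^ cin;
    *cout = (a & b) | ((a ^ b) & cin);
}

/* 8-bit ripple carry adder: the carry of bit i feeds bit i+1;
 * the final carry-out is dropped, which gives addition modulo 2^8. */
static uint8_t rca8(uint8_t a, uint8_t b) {
    uint8_t result = 0;
    int carry = 0;
    for (int i = 0; i < 8; i++) {
        int s;
        full_adder((a >> i) & 1, (b >> i) & 1, carry, &s, &carry);
        result |= (uint8_t)(s << i);
    }
    return result;
}

/* Returns 1 if the circuit agrees with (a + b) mod 256 on all inputs. */
static int rca8_matches_modular_add(void) {
    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 0; b < 256; b++)
            if (rca8((uint8_t)a, (uint8_t)b) != (uint8_t)(a + b)) return 0;
    return 1;
}
```

For example, `rca8(200, 100)` yields 44, the same value as the `unsigned char` program shown earlier.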
53
– 53 – Bit-vector Flattening
55
– 55 – Multipliers Multipliers result in very hard formulas Example: CNF: About 11000 variables, unsolvable (Hard) for current SAT solvers Similar problems with division, modulo Q: Why is this hard? Q: How do we fix this?
56
– 56 – Multipliers
61
– 61 – Incremental Flattening. ϕsk: Boolean part (skeleton) of ϕ. F: set of terms that are in the encoding. I: set of terms that are inconsistent with the current assignment. Flow: (1) ϕf := ϕsk, F := ∅. (2) Is ϕf SAT? If no, answer UNSAT. If yes, compute I. (3) If I = ∅, answer SAT. If I ≠ ∅, pick F′ ⊆ (I \ F), set F := F ∪ F′ and ϕf := ϕf ∧ Constraint(F), then go back to (2).
62
– 62 – Incremental Flattening. Idea: add 'easy' parts of the formula first. Only add hard parts when needed. ϕf only gets stronger, so use an incremental SAT solver.
63
– 63 – Incremental Flattening
65
– 65 – Incomplete Assignments. Hey: initially, we only have the skeleton! How do we know what terms are inconsistent with the current assignment if the variables aren't even in ϕf? Solution: guess some values for the missing variables. If you guess right, it's good.
66
– 66 – Bit-Vector Challenge. Is there a better way than bit blasting? Requirements: provide the same functionality as bit blasting; find abstractions based on word-level structure; improve on the performance of bit blasting. Observation: must have bit blasting at the core, since it is the only approach that covers full functionality. Want to exploit special cases: formula satisfied by small values; simple algebraic properties imply unsatisfiability; small unsatisfiable core; solvable by modular arithmetic; …
67
– 67 – Iterative Approximation. UCLID: Bryant, Kroening, Ouaknine, Seshia, Strichman, Brady, TACAS ’07. Idea: use bit blasting as the core technique; apply it to simplified versions of the formula; use successive approximations until the formula is solved or shown unsatisfiable.
68
– 68 – Iterative Approach Background: Approximating Formula. Original formula: φ. Overapproximation φ+: more solutions; if φ+ is unsatisfiable, then so is φ. Underapproximation φ−: fewer solutions; a satisfying solution of φ− also satisfies φ. Example approximation techniques. Underapproximating: restrict word-level variables to smaller ranges of values. Overapproximating: replace a subformula with a Boolean variable.
69
– 69 – Starting Iterations. Initial underapproximation φ1−: (greatly) restrict the ranges of word-level variables. Intuition: a satisfiable formula often has a small-domain solution.
70
– 70 – First Half of Iteration. SAT result for φ1−. Satisfiable: then we have found a solution for φ (if SAT, then done). Unsatisfiable: use the UNSAT proof to generate overapproximation φ1+ (described later).
71
– 71 – Second Half of Iteration. SAT result for φ1+. Unsatisfiable: then we have shown φ unsatisfiable (if UNSAT, then done). Satisfiable: the solution indicates variable ranges that must be expanded; use it to generate the refined underapproximation φ2−.
72
– 72 – Example. φ := (x = y + 2) ∧ (x² > y²). φ1− := (x[1] = y[1] + 2) ∧ (x[1]² > y[1]²): UNSAT, look at the proof. φ1+ := (x = y + 2): SAT with x = 2, y = 0. φ2− := (x[2] = y[2] + 2) ∧ (x[2]² > y[2]²): SAT, done.
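The underapproximation half of this example can be mimicked by brute force. The sketch below is a simplification under stated assumptions: words are 8-bit unsigned (the slide does not fix a width), the range restriction keeps only the k low bits of x and y, and an exhaustive search stands in for the SAT solver. It widens k until the restricted formula becomes satisfiable, and it skips the overapproximation step that the real procedure derives from the UNSAT proof. All function names are hypothetical.

```c
#include <stdint.h>

/* phi(x, y) := (x = y + 2) && (x*x > y*y), over 8-bit unsigned words. */
static int phi(uint8_t x, uint8_t y) {
    return x == (uint8_t)(y + 2) && (uint8_t)(x * x) > (uint8_t)(y * y);
}

/* Underapproximation: x and y restricted to k low bits. Returns 1 if SAT. */
static int solve_under(unsigned k, uint8_t *x_out, uint8_t *y_out) {
    unsigned limit = 1u << k;
    for (unsigned y = 0; y < limit; y++)
        for (unsigned x = 0; x < limit; x++)
            if (phi((uint8_t)x, (uint8_t)y)) {
                *x_out = (uint8_t)x;
                *y_out = (uint8_t)y;
                return 1;
            }
    return 0;
}

/* Widen the ranges one bit at a time; returns the first satisfiable width,
 * or 0 if phi has no solution at all. */
static unsigned smallest_sat_width(uint8_t *x_out, uint8_t *y_out) {
    for (unsigned k = 1; k <= 8; k++)
        if (solve_under(k, x_out, y_out)) return k;
    return 0;
}
```

With one-bit ranges no value of x can equal y + 2, so the first underapproximation is unsatisfiable; with two bits the search finds x = 2, y = 0.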
73
– 73 – Iterative Behavior. Underapproximations: successively more precise abstractions of φ; allow wider variable ranges. Overapproximations: no predictable relation; the UNSAT proof is not unique. [Figure: sequence φ1−, φ1+, φ2−, φ2+, …, φk−, φk+]
74
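– 74 – Overall Effect. Soundness: only terminate with a solution on an underapproximation; only terminate as UNSAT on an overapproximation. Completeness: successive underapproximations approach φ; finite variable ranges guarantee termination; in the worst case, the final underapproximation φk− is φ itself. [Figure: sequence φ1−, φ1+, φ2−, φ2+, …, φk−, φk+, ending in SAT from an underapproximation or UNSAT from an overapproximation]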
75
– 75 – Generating Overapproximation. Given: underapproximation φ1−; the bit-blasted translation of φ1− into a Boolean formula; a proof that the Boolean formula is unsatisfiable. Generate: overapproximation φ1+. If φ1+ is satisfiable, it must lead to a refined underapproximation φ2−.
76
– 76 – Bit-Vector Formula Structure. DAG representation to allow shared subformulas. [Figure: formula DAG over word-level variables w, x, y, z with subformulas such as x % 26 = v, w & 0xFFFF = x, and x = y]
77
– 77 – Structure of Underapproximation. Linear-complexity translation to CNF. Each word-level variable is encoded as a set of Boolean variables. Additional Boolean variables represent subformula values. [Figure: formula DAG for φ− conjoined (∧) with range constraints on w, x, y, z]
78
– 78 – Encoding Range Constraints. Explicit: view as additional predicates in the formula. Implicit: reduce the number of variables in the encoding, which yields smaller SAT encodings. Constraint and encoding: 0 ≤ w < 8 is encoded as 0 0 ··· 0 w2 w1 w0; −4 ≤ x < 4 is encoded as xs xs ··· xs x1 x0.
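The implicit encoding can be illustrated on 8-bit words (the width and the function names are assumptions made for this sketch). An unsigned range of 8 values needs only three free bits with the rest fixed to 0, and a signed range of 8 values needs a sign bit that is replicated into all upper positions plus two free low bits.

```c
#include <stdint.h>

/* 0 <= w < 8: upper bits fixed to 0, only w2 w1 w0 are free variables. */
static unsigned encode_w(unsigned w2, unsigned w1, unsigned w0) {
    return (w2 << 2) | (w1 << 1) | w0;
}

/* -4 <= x < 4: one sign variable xs copied into bits 7..2, x1 x0 free.
 * Returns the two's complement value of the resulting 8-bit pattern. */
static int encode_x(unsigned xs, unsigned x1, unsigned x0) {
    unsigned bits = (xs ? 0xFCu : 0u) | (x1 << 1) | x0;   /* 8-bit pattern */
    return bits >= 128u ? (int)bits - 256 : (int)bits;
}

/* Returns 1 if the three variables of x cover exactly the range [-4, 4). */
static int x_range_is_exact(void) {
    int seen[8] = {0};
    for (unsigned v = 0; v < 8; v++) {
        int x = encode_x((v >> 2) & 1u, (v >> 1) & 1u, v & 1u);
        if (x < -4 || x >= 4) return 0;
        seen[x + 4] = 1;
    }
    for (int i = 0; i < 8; i++)
        if (!seen[i]) return 0;
    return 1;
}
```

In both cases the SAT encoding uses three Boolean variables per word instead of eight, and no extra clauses are needed to enforce the range.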
79
– 79 – UNSAT Proof. Subset of clauses that is unsatisfiable. Clause variables define a portion of the DAG: the subgraph that cannot be satisfied with the given range constraints. [Figure: formula DAG with the unsatisfiable subgraph highlighted, conjoined with range constraints on w, x, y, z; result UNSAT]
80
– 80 – Extracting Circuit from UNSAT Proof. Subgraph that cannot be satisfied with the given range constraints, even when the rest of the graph is replaced with unconstrained variables. [Figure: extracted subgraph with new unconstrained Boolean variables b1 and b2, conjoined with range constraints on w, x, y, z; result UNSAT]
81
– 81 – Generated Overapproximation. Remove the range constraints on word-level variables. This creates the overapproximation φ1+, which ignores correlations between values of subformulas. [Figure: extracted subgraph with unconstrained Boolean variables b1 and b2, without range constraints]
82
– 82 – Generated Over Approximation Algorithm
83
– 83 – Refinement Property. Claim: φ1+ has no solutions that satisfy φ1−'s range constraints, because φ1+ contains the portion of φ1− that was shown to be unsatisfiable under the range constraints. [Figure: φ1+ conjoined with range constraints on w, x, y, z; result UNSAT]
84
– 84 – Refinement Property (Cont.). Consequence: solving φ1+ will expand the range of some variables, leading to a more exact underapproximation φ2−.
85
– 85 – Effect of Iteration. Each complete iteration expands the ranges of some word-level variables and creates a refined underapproximation. [Figure: φ1−, then (UNSAT proof: generate overapproximation) φ1+, then (SAT: use solution to generate refined underapproximation) φ2−]
86
– 86 – Approximation Methods So Far Range constraints Underapproximate by constraining values of word-level variables Subformula elimination Overapproximate by assuming subformula value arbitrary General Requirements Systematic under- and over-approximations Way to connect from one to another Goal: Devise Additional Approximation Strategies
87
– 87 – Function Approximation Example. Multiplication x * y, by cases:

             x = 0    x = 1    x = else
  y = 0        0        0        0
  y = 1        0        1        x
  y = else     0        y        §

Two options for the general case §. Prohibit via additional range constraints: gives an underapproximation; restricts values of (possibly intermediate) terms. Abstract as f(x, y): overapproximate as an uninterpreted function f; value constrained only by functional consistency.
89
– 89 – Results: UCLID BV vs. Bit-blasting UCLID always better than bit blasting Generally better than other available procedures SAT time is the dominating factor [results on 2.8 GHz Xeon, 2 GB RAM]
90
– 90 – Challenges with Iterative Approximation Formulating Overall Strategy Which abstractions to apply, when and where How quickly to relax constraints in iterations Which variables to expand and by how much? Too conservative: Each call to SAT solver incurs cost Too lenient: Devolves to complete bit blasting. Predicting SAT Solver Performance Hard to predict time required by call to SAT solver Will particular abstraction simplify or complicate SAT? Combination Especially Difficult Multiple iterations with unpredictable inner loop
91
– 91 – Summary: Modeling Levels. Bits: limited ability to scale; hard to apply functional abstractions. Words: allows abstracting data while precisely representing control; overlooks finite word-size effects. Bit vectors: realistic semantic model for hardware and software; captures all details of actual operation; detects errors related to overflow and other artifacts of finite representation; can apply abstractions found at word level.
92
– 92 – Areas of Agreement. SAT-based framework is the only logical choice: SAT solvers are good and getting better. Want to automatically exploit abstractions: function structure; arithmetic properties, e.g., associativity, commutativity; arithmetic reductions, e.g., LU decomposition. Base level should be SAT: semantically complete approach.
93
– 93 – Thank you.