1
Assume/Guarantee Reasoning using Abstract Interpretation
Nurit Dor, Tom Reps, Greta Yorsh, Mooly Sagiv
2
Limitations of Whole-Program Analysis
Complexity of chaotic iterations
Not all the source code is available
– Large libraries
– Software components
No interaction with the client
– Program design
3
A Motivating Example
List rev(List x) {
  if (x == NULL) return NULL;
  return append(rev(x->next), x);
}
List append(List x, List y) {
  List e;
  if (x == NULL) return y;
  e = malloc(…);
  e->data = x->data;
  e->next = append(x->next, y);
  return e;
}
Contracts ($$ denotes the return value, || list concatenation):
List rev(List x)
  requires acyclic(x)
  ensures $$ = reverse(x)
List append(List x, List y)
  requires acyclic(x) ∧ acyclic(y)
  ensures $$ = x || y
Contracts can also be used for runtime testing.
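To make the runtime-testing remark concrete, here is a C sketch that checks rev's contract with assert: the requires clause before the call and the ensures clause after it. It is an illustration under assumptions, not the authors' code. The Node layout, the helper names acyclic, length, reverse_of and rev_checked, and the use of Floyd's cycle detection for acyclic(x) are all hypothetical. The slide's recursive rev is also replaced by a non-destructive iterative reversal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical node layout: the slide only names the type List. */
typedef struct Node { int data; struct Node *next; } Node, *List;

/* acyclic(x): Floyd's tortoise-and-hare cycle detection. */
static bool acyclic(List x) {
    List slow = x, fast = x;
    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast) return false;   /* the hare caught the tortoise: cycle */
    }
    return true;
}

/* Number of cells in an acyclic list. */
static size_t length(List x) {
    size_t n = 0;
    for (; x != NULL; x = x->next) n++;
    return n;
}

/* reverse_of(r, x): r holds the data of x in reverse order (both lists acyclic). */
static bool reverse_of(List r, List x) {
    size_t n = length(x);
    if (length(r) != n) return false;
    for (size_t i = 0; i < n; i++, r = r->next) {
        List p = x;                                   /* walk to x[n-1-i] */
        for (size_t j = 0; j + 1 < n - i; j++) p = p->next;
        if (r->data != p->data) return false;
    }
    return true;
}

/* Non-destructive iterative reversal (stands in for the slide's recursive rev). */
static List rev(List x) {
    List r = NULL;
    for (; x != NULL; x = x->next) {
        List e = malloc(sizeof *e);
        if (e == NULL) abort();
        e->data = x->data;
        e->next = r;
        r = e;
    }
    return r;
}

/* rev with its contract checked at run time. */
static List rev_checked(List x) {
    assert(acyclic(x));                           /* requires acyclic(x)      */
    List r = rev(x);
    assert(acyclic(r) && reverse_of(r, x));       /* ensures $$ = reverse(x) */
    return r;
}
```

With assertions enabled, rev_checked aborts when it receives a cyclic list or produces a result that is not the reverse of its argument. Compiling with NDEBUG removes both checks, which is the usual cost/benefit switch for contract checks of this kind.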
4
Challenges in A/G Reasoning
Specifying procedure contracts
Performing abstract interpretation using contracts
5
Specifying Contracts
Executable specifications
– assert
– Can use loops
– Expressive
– Natural
– But what about side effects?
Declarative specifications
– Types
– First-order logic
– Z
Hybrid
– Larch
– Java Modeling Language
6
Procedure Contracts and Modularity
The postcondition does not reveal the whole story: rev's contract does not say whether the call rev(x) leaves the cells reachable from z, and hence acyclic(z), intact.
void foo(List x, List z) {
  List y, t;
  y = rev(x);
  t = rev(z);
}
List rev(List x)
  requires acyclic(x)
  ensures $$ = reverse(x)
void foo(List x, List z)
  requires acyclic(x) ∧ acyclic(z)
  ensures true
7
Procedure Contracts and Modularity
Specify the parts of the state that may be modified
But potential side effects are difficult to define
Can use abstract interpretation
void foo(List x, List z) {
  List y, t;
  y = rev(x);
  t = rev(z);
}
List rev(List x)
  requires acyclic(x)
  ensures $$ = reverse(x)
void foo(List x, List z)
  requires acyclic(x) ∧ acyclic(z)
  ensures true
8
Issues in Specifying Contracts
Expressiveness
Conciseness
Naturalness
Reuse
Cost of dynamic checking (model checking)
Decidability
Cost of abstract interpretation
9
Plan
CSSV: a tool for verifying the absence of buffer overruns (N. Dor)
An algorithm for performing abstract interpretation in the most precise way using specifications
10
CSSV: Towards a Realistic Tool for Statically Detecting All Buffer Overflows in C
Nurit Dor, Michael Rodeh, Mooly Sagiv
DAEDALUS project
11
Vulnerabilities of C programs
/* from web2c [strpascal.c] */
void foo(char *s) {
  while (*s != ' ')
    s++;
  *s = 0;
}
Null dereference
Dereference to unallocated storage
Out-of-bound pointer arithmetic
Out-of-bound update
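The loop in foo stops only at a blank, so a string without one drives s past the end of its buffer. Below is a minimal bounded variant. It assumes the caller can supply the buffer's allocation size; the name foo_bounded and the asize parameter are hypothetical, since the original signature carries no size.

```c
#include <stddef.h>

/* Hypothetical bounded variant of foo: asize is the number of bytes allocated at s.
   Replaces the first blank with a terminator and returns 0, or returns -1 when no
   blank occurs within the allocation (where the original loop would overrun). */
int foo_bounded(char *s, size_t asize) {
    for (size_t i = 0; i < asize; i++) {
        if (s[i] == ' ') {
            s[i] = '\0';
            return 0;
        }
    }
    return -1;
}
```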
12
Is it common?
General belief: yes!
FUZZ study
– Tests reliability with random input
– Tens of applications on 9 different UNIX systems
– 18%–23% hang or crash
CERT advisories
– Up to 50% of attacks are due to buffer overflows
COMMON AND DANGEROUS
13
CSSV’s Goals
Efficient conservative static checking algorithm
– Verify the absence of buffer overflows, not just find bugs
– All C constructs: pointer arithmetic, casting, dynamic memory, …
– Real programs
– Minimal false alarms
14
Verifying Absence of Buffer Overflow is Non-Trivial
void safe_cat(char *dst, int size, char *src) {
  if (size > strlen(src) + strlen(dst)) {
    /* {string(src) ∧ string(dst) ∧ alloc(dst + len(dst)) > len(src)} */
    dst = dst + strlen(dst);
    /* {string(src) ∧ alloc(dst) > len(src)} */
    strcpy(dst, src);
  }
}
Proof obligation: string(src) ∧ string(dst) ∧ (size > len(src) + len(dst)) ⇒ alloc(dst + len(dst)) > len(src)
It relies on alloc(dst) == size, since alloc(dst + len(dst)) = alloc(dst) − len(dst).
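The same reasoning can be turned into a run-time check. The C sketch below is illustrative only (safe_cat_checked is a hypothetical name). It assumes, as safe_cat's contract does, that size == alloc(dst), and it asserts strcpy's precondition just before the copy. It also adds an explicit size >= 0 test: on common platforms the slide's comparison converts a negative int size to a huge size_t, so that guard would otherwise pass.

```c
#include <assert.h>
#include <string.h>

/* Assumes size == alloc(dst), the number of bytes writable starting at dst,
   and that dst and src are null-terminated strings. */
void safe_cat_checked(char *dst, int size, const char *src) {
    /* size >= 0 first: comparing a negative int with a size_t converts it to unsigned */
    if (size >= 0 && (size_t)size > strlen(src) + strlen(dst)) {
        size_t dlen = strlen(dst);
        size_t room = (size_t)size - dlen;    /* alloc(dst + len(dst)) */
        dst = dst + dlen;
        assert(room > strlen(src));           /* strcpy requires alloc(dst) > len(src) */
        strcpy(dst, src);
    }
}
```

When the guard fails (for example 10 > 5 + 5 is false), the call is a no-op, matching the slide's code.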
15
Can this be done for real programs?
Complex linear relationships → Use polyhedra [CH78]
Pointer arithmetic → Pointer analysis
Loops → Widening
Procedures → Procedure contracts
Very few false alarms!
16
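Linear Relation Analysis
Cousot and Halbwachs, 1978
Statically analyze relations among program variables of the form $a_1 \cdot var_1 + a_2 \cdot var_2 + \dots + a_n \cdot var_n \le b$
Polyhedron: $y \ge 1,\; x + y \ge 3,\; -x + y \le 1$
Generators: vertices V = { (1,2), (2,1) }, rays R = { (1,0), (1,1) }
[Figure: the polyhedron drawn on x and y axes ranging over 0–3]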
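The constraint view and the generator view of this polyhedron can be cross-checked numerically. The C sketch below writes the three constraints as $a_1 x + a_2 y \le b$ and tests points for membership. It treats a direction r as a ray exactly when $a \cdot r \le 0$ holds for every constraint. The helper names are hypothetical, and this is only a sanity check, not a polyhedra library such as the one CSSV uses.

```c
#include <stdbool.h>

/* A constraint a1*x + a2*y <= b. */
typedef struct { int a1, a2, b; } Constraint;

/* The slide's polyhedron: y >= 1, x + y >= 3, -x + y <= 1, rewritten with <=. */
static const Constraint P[] = {
    {  0, -1, -1 },   /* -y <= -1      i.e. y >= 1     */
    { -1, -1, -3 },   /* -x - y <= -3  i.e. x + y >= 3 */
    { -1,  1,  1 },   /* -x + y <= 1                   */
};
enum { NP = sizeof P / sizeof P[0] };

/* Point membership: every constraint holds at (x, y). */
static bool contains(int x, int y) {
    for (int i = 0; i < NP; i++)
        if (P[i].a1 * x + P[i].a2 * y > P[i].b) return false;
    return true;
}

/* (dx, dy) is a ray of the (non-empty) polyhedron iff a . r <= 0 for every constraint. */
static bool is_ray(int dx, int dy) {
    for (int i = 0; i < NP; i++)
        if (P[i].a1 * dx + P[i].a2 * dy > 0) return false;
    return true;
}
```

Both vertices in V satisfy contains, and both directions in R satisfy is_ray. Hence every point v + t·r with t ≥ 0 lies in the polyhedron; for example (3, 3) = (1, 1) + 2·(1, 1) is such a point.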
17
C String Static Verifier
Detects string violations
– Buffer overflow (update beyond bounds)
– Unsafe pointer arithmetic
– References beyond null termination
– Unsafe library calls
Handles full C
– Multi-level pointers, pointer arithmetic, structures, casting, …
Applied to real programs
– Public-domain software
– C code from Airbus
18
Plan
Semantics for C programs
Contract language
Static analysis algorithm
Implementation
19
Standard C Semantics
void safe_cat(char *dst, int size, char *src) {
  if (size > strlen(src) + strlen(dst)) {
    dst = dst + strlen(dst);
    strcpy(dst, src);
  }
}
[Figure: a concrete memory state of safe_cat: the stack cells of dst, size (holding 125), and src, each at its own address, and the character buffers that dst and src point to]
20
Instrumented C Semantics
[Figure: the same memory state, with every allocated block additionally annotated with its base address and allocation size (asize)]
21
Instrumented C Semantics
[Figure: the same instrumented state, with every pointer additionally annotated with its offset from the base of the block it points into; e.g., the pointer 0x6000009 into the block with base 0x6000000 has offset 9]
22
The instrumented semantics checks the validity of C expressions
ANSI C cleanness of dst = dst + i
Safety condition: offset(dst) + i ≤ asize(base(dst))
[Figure: dst points into the block base(dst) of size asize(base(dst)) at offset(dst); i is added to it]
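One way to picture the instrumented semantics is as a fat pointer that carries base(p), asize(base(p)) and offset(p). The C sketch below follows that picture under assumptions: the IPtr type and the functions ip_add and ip_deref_ok are hypothetical, not CSSV's internal representation. Arithmetic is allowed only while 0 ≤ offset(p) + i ≤ asize(base(p)), which permits the one-past-the-end pointer that ANSI C allows. Dereferencing additionally requires offset(p) < asize(base(p)).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instrumented pointer. */
typedef struct {
    char  *base;    /* base(p)        */
    size_t asize;   /* asize(base(p)) */
    size_t offset;  /* offset(p)      */
} IPtr;

/* Checked p = p + i: succeeds iff 0 <= offset(p) + i <= asize(base(p)). */
static bool ip_add(IPtr *p, long i) {
    long target = (long)p->offset + i;
    if (target < 0 || (size_t)target > p->asize) return false;   /* would leave the block */
    p->offset = (size_t)target;
    return true;
}

/* Checked *p: requires offset(p) < asize(base(p)). */
static bool ip_deref_ok(const IPtr *p) {
    return p->offset < p->asize;
}
```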
23
Contracts
Defined in the instrumented semantics
Specify the string behavior of procedures (as C expressions)
– Precondition
– Postcondition, which may use values at procedure entry
– Side effects, which can be approximated from points-to information
No need to specify pointer information
– Not aiming for modular pointer analysis
24
Contracts’ Advantages
Modular analysis
– Use contracts at call statements
– Not all the code needs to be available
– Enables more expensive analyses
User control of the verification
– Detect errors at the point of the logical error
– Improve the precision of the analysis
Check additional properties
– Beyond ANSI C
25
Example
char* strcpy(char* dst, char* src)
  requires ( string(src) ∧ alloc(dst) > len(src) )
  mod dst
  ensures ( len(dst) == [len(src)]pre ∧ return == [dst]pre )
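Checked at run time, the [e]pre notation just means "evaluate e on procedure entry and remember it". The wrapper below is a sketch of that idea, and its name strcpy_checked is hypothetical. Because C cannot observe alloc(dst), the sketch takes it as an explicit dst_alloc argument, which is an assumption. string(src) is likewise assumed rather than tested, since looking for the terminator could itself read out of bounds.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical run-time check of strcpy's contract; dst_alloc plays the role of alloc(dst). */
static char *strcpy_checked(char *dst, size_t dst_alloc, const char *src) {
    size_t pre_len_src = strlen(src);   /* [len(src)]pre, recorded on entry */
    char  *pre_dst     = dst;           /* [dst]pre                         */
    assert(dst_alloc > pre_len_src);    /* requires alloc(dst) > len(src)   */
    char *ret = strcpy(dst, src);
    /* ensures len(dst) == [len(src)]pre && return == [dst]pre */
    assert(strlen(dst) == pre_len_src && ret == pre_dst);
    return ret;
}
```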
26
safe_cat’s contract
void safe_cat(char* dst, int size, char* src)
  requires ( string(src) ∧ string(dst) ∧ alloc(dst) == size )
  mod dst
  ensures ( len(dst) <= [len(src)]pre + [len(dst)]pre ∧ len(dst) >= [len(dst)]pre )
27
Contracts and Soundness
All errors are detected
– Violation of a statement’s precondition: checked at the statement (e.g., …a[i]…)
– Violation of a procedure’s precondition: checked at the call
– Violation of a procedure’s postcondition: checked at the return
Violation messages depend on the contracts
But this may lead to more false alarms (e.g., with trivial contracts)
28
CSSV Static Analysis
1. Inline contracts: expose the behavior of called procedures
2. Pointer analysis (global): find relationships between base addresses
3. Integer analysis: compute offset information
29
Step 1: Inliner
void safe_cat(char *dst, int size, char *src) { … strcpy(dst, src); … }
void safe_cat(char *dst, int size, char *src)
  requires ( string(src) ∧ string(dst) ∧ alloc(dst) == size )
  mod dst
  ensures ( len(dst) == [len(src)]pre + [len(dst)]pre )
char* strcpy(char *dst, char *src)
  requires ( string(src) ∧ alloc(dst) > len(src) )
  mod dst
  ensures ( len(dst) == [len(src)]pre ∧ return == [dst]pre )
30
Step 1: Inliner
void safe_cat(char *dst, int size, char *src) { … strcpy(dst, src); … }
void safe_cat(char *dst, int size, char *src)
  requires ( string(src) ∧ string(dst) ∧ alloc(dst) == size )
  mod dst
  ensures ( len(dst) == [len(src)]pre + [len(dst)]pre )
char* strcpy(char *dst, char *src)
  requires ( string(src) ∧ alloc(dst) > len(src) )
  mod dst
  ensures ( len(dst) == [len(src)]pre ∧ return == [dst]pre )
The inliner turns safe_cat’s precondition and strcpy’s postcondition into assume statements, and strcpy’s precondition and safe_cat’s postcondition into assert statements.
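After inlining, the call to strcpy is no longer needed. All that remains is an integer program over constraint variables. The C sketch below is written in that spirit, but its format and its names (Block, safe_cat_ip) are hypothetical and it is not C2IP's actual output. Each block keeps only len, the index of its first '\0' from the base, and asize. safe_cat's precondition is checked with assert here, whereas a static analyzer would assume it. The strcpy call becomes an assert of its precondition followed by the effect of its postcondition on the modified location dst. For simplicity, src is assumed to point at the base of its block.

```c
#include <assert.h>

/* Constraint variables of one string block: n.len and n.asize. */
typedef struct { int len; int asize; } Block;

static void safe_cat_ip(Block *n_dst, int dst_offset, int size, const Block *n_src) {
    /* pre_safe_cat: alloc(dst) == size, and string(dst) means dst_offset <= n_dst->len */
    assert(n_dst->asize - dst_offset == size && dst_offset <= n_dst->len);
    int len_dst = n_dst->len - dst_offset;            /* len(dst) */
    int len_src = n_src->len;                         /* len(src), src.offset == 0 */
    if (size > len_src + len_dst) {
        dst_offset += len_dst;                        /* dst = dst + strlen(dst) */
        assert(n_dst->asize - dst_offset > len_src);  /* assert pre_strcpy: alloc(dst) > len(src) */
        n_dst->len = dst_offset + len_src;            /* assume post_strcpy: len(dst) == [len(src)]pre */
    }
}
```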
32
Step 2: Compute Pointer Information
Required for reasoning about pointers
Every base address is abstracted by an abstract location
Relationships between base addresses are computed (points-to)
Global analysis
– Scalable
– Imprecise: flow-insensitive, (almost) context-insensitive
33
Global Points-To
main() {
  char s[10], t[20], r;
  char *p1, *p2;
  …
  p1 = r + i;
  safe_cat(s, 10, p1);
  p2 = r + j;
  safe_cat(t, 10, p2);
  …
}
safe_cat(char *dst, int size, char *src) { … strcpy(dst, src); … }
[Figure: global points-to graph over s, t, r, p1, p2, dst, and src]
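To see why a global, context-insensitive analysis makes dst point to both s and t, consider the C sketch below. It implements a simple unification-based (Steensgaard-style) points-to analysis with union-find. This is a textbook technique swapped in for illustration, not GOLF's actual algorithm. All names are hypothetical, and r is treated as a buffer, so p1 = r + i is modeled as p1 = &r.

```c
#include <assert.h>
#include <stdbool.h>

enum { MAXLOC = 32 };
static int parent[MAXLOC];   /* union-find forest over abstract locations */
static int pts[MAXLOC];      /* pts[root] = location the class points to, or -1 */
static int next_loc;         /* next fresh abstract location */

static void pt_init(int nvars) {
    for (int i = 0; i < MAXLOC; i++) { parent[i] = i; pts[i] = -1; }
    next_loc = nvars;        /* ids 0 .. nvars-1 are program variables */
}

static int find(int v) {
    while (parent[v] != v) {
        parent[v] = parent[parent[v]];   /* path halving */
        v = parent[v];
    }
    return v;
}

/* Merge two classes and, recursively, the classes they point to. */
static void unify(int a, int b) {
    a = find(a);
    b = find(b);
    if (a == b) return;
    parent[b] = a;
    if (pts[a] < 0) pts[a] = pts[b];
    else if (pts[b] >= 0) unify(pts[a], pts[b]);
}

/* The class v points to, creating a fresh location if there is none yet. */
static int target(int v) {
    int r = find(v);
    if (pts[r] < 0) {
        assert(next_loc < MAXLOC);
        pts[r] = next_loc++;
    }
    return find(pts[r]);
}

static void addr_of(int p, int x) { unify(target(p), x); }          /* p = &x (or x + i) */
static void copy_ptr(int p, int q) { unify(target(p), target(q)); } /* p = q */

static bool points_to(int p, int x) {
    int r = find(p);
    return pts[r] >= 0 && find(pts[r]) == find(x);
}

/* Variables of the slide's example; each call binds the formals dst and src. */
enum { LOC_S, LOC_T, LOC_R, LOC_P1, LOC_P2, LOC_DST, LOC_SRC, NVARS };
```

Replaying the statements of main, including the two parameter bindings of safe_cat, makes points_to(LOC_DST, LOC_S) and points_to(LOC_DST, LOC_T) both true and merges s and t into one class. That is the loss of precision that the procedural points-to step aims to limit.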
34
Procedural Points-to (PPT)
“Project” pointer information onto the visible variables of the procedure
Introduce abstract locations for formal parameters
Allow destructive updates through formal parameters (in well-behaved programs)
Can decrease precision in some procedures
35
PPT
safe_cat(char *dst, int size, char *src) { … strcpy(dst, src); … }
[Figure: procedural points-to graph of safe_cat: dst points to the abstract location Param #1 and src to Param #2]
36
Step 3: Static Analysis
Prove linear inequalities on string indices
Abstract string properties using constraint variables
Use abstract interpretation to conservatively interpret program statements
Verify safety preconditions
37
Back to Semantics
[Figure: the instrumented memory state of safe_cat, with base, asize, and offset annotations]
38
Abstract Representation
[Figure: the concrete memory state of safe_cat next to its abstraction, in which dst, src, and size are abstract locations, the buffers become abstract locations n1 and n2, and edges record base-address relationships]
39
Constraint Variables
For every abstract location a: a.offset
Example: src.offset = 9
40
Constraint Variables
For every integer abstract location a: a.val
Example: size.val = 125
41
Constraint Variables
For every abstract location a: a.is_nullt, a.len, a.asize
[Figure: abstract location n1, with n1.len and n1.asize measured from offset 0]
42
Abstract Representation
[Figure: abstract locations src, dst, size, n1, n2]
dst.offset < n1.len
size.val + dst.offset = n1.asize
n1.is_nullt = true
n2.is_nullt = true
43
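What does it represent?
[Figure: a null-terminated buffer n1 indexed from 0; dst points into it at dst.offset, before the terminating null at n1.len; size.val is the distance from dst.offset to n1.asize]
n1.is_nullt = true
dst.offset < n1.len
size.val + dst.offset = n1.asize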
44
Abstract Interpretation
Before: dst.offset < n1.len, size.val = n1.asize − dst.offset
Statement: dst = dst + strlen(dst);
After: dst.offset = n1.len, size.val = n1.asize − dst.offset + n1.len
45
Verify Safety Condition
Concrete semantics: dst = dst + i requires offset(dst) + i ≤ asize(base(dst))
Abstract semantics: dst.offset + i.val ≤ n1.asize
[Figure: dst at dst.offset inside abstract location n1 of size n1.asize, with i added]
46
The Assume-Operation
Use two copies of the constraint variables
Set modified values to ⊤
Meet with the postcondition
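The three steps can be sketched over a simple numeric domain. The C code below uses intervals in place of the polyhedra CSSV relies on, purely to keep it short. The constraint-variable names, the State layout and the PostFn callback are hypothetical. Each variable has a current copy and a pre copy. Variables in the mod set are reset to ⊤, and every variable is then met with the bound the postcondition gives. Because the postcondition reads the pre copy, strcpy's len(dst) == [len(src)]pre is expressible.

```c
#include <limits.h>
#include <stdbool.h>

typedef struct { long lo, hi; } Itv;              /* [lo, hi]; empty when lo > hi */
#define TOP ((Itv){ LONG_MIN, LONG_MAX })

static Itv meet(Itv a, Itv b) {
    Itv r = { a.lo > b.lo ? a.lo : b.lo, a.hi < b.hi ? a.hi : b.hi };
    return r;
}

/* Hypothetical constraint variables around a call to strcpy. */
enum { DST_LEN, SRC_LEN, DST_ASIZE, NV };

/* Two copies of every constraint variable: current values and values at call entry. */
typedef struct { Itv cur[NV]; Itv pre[NV]; } State;

/* A postcondition maps the entry values to a bound on variable v. */
typedef Itv (*PostFn)(const Itv pre[NV], int v);

static State assume_post(State a, const bool mod[NV], PostFn post) {
    for (int v = 0; v < NV; v++) a.pre[v] = a.cur[v];      /* keep the [x]pre copies */
    for (int v = 0; v < NV; v++) {
        if (mod[v]) a.cur[v] = TOP;                          /* set modified values to top */
        a.cur[v] = meet(a.cur[v], post(a.pre, v));          /* meet with the postcondition */
    }
    return a;
}

/* strcpy: ensures len(dst) == [len(src)]pre; other variables are unconstrained. */
static Itv strcpy_post(const Itv pre[NV], int v) {
    return v == DST_LEN ? pre[SRC_LEN] : TOP;
}
```

If a meet comes out empty (lo > hi), the postcondition contradicts the current state, so that program point is unreachable in this sketch.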
47
CSSV Implementation
Inputs: C files, contracts (Pre, Mod, Post), and a procedure name
Pointer Analysis → the procedure’s pointer information
Inliner (C files, contracts, pointer information) → C’ files
C2IP → integer procedure
Integer Analysis → potential error messages
48
Used Software
ASToolKit [Microsoft]
Core C [TAU – Greta Yorsh]
GOLF [Microsoft – Manuvir Das]
New Polka [Inria – Bertrand Jeannet]
49
Applications
Verified a string library from Airbus with 6 false alarms
– Could be avoided by analyzing correlated conditions
Found 8 real errors in another string-intensive application, with 2 false alarms
– In one case, safety depends on correctness
– Could be avoided by defensive programming
1–206 CPU seconds per procedure
– No optimizations
Very few false alarms
50
Related Work
Non-conservative
– Wagner et al. [NDSS’00]
– LCLint’s extension [USENIX’01]
– Eau Claire [IEEE Oakland’02]
Conservative
– Polyspace verifier
– Dor, Rodeh and Sagiv [SAS’01]
51
Further Work
Derive contracts
Improve efficiency
Interprocedural
52
CSSV: Summary
Semantics
– Safety checking
– Full C
– Enables abstractions
Contract language
– String behavior
– Omits pointer aliasing
Procedural points-to
– Scalable
– Improves precision
Static analysis
– Tracks important string properties
– Utilizes integer analysis
53
Foundation of A/G Abstract Interpretation
Greta Yorsh
www.cs.tau.ac.il/~gretay
54
Assume-Guarantee Reasoning using AI
T bar();
void foo() {
  T p;
  ...
  p = bar();
  ...
}
Contracts: {pre_bar, post_bar} for bar, {pre_foo, post_foo} for foo
Checking foo:
  assume[pre_foo];
  assert[pre_bar];
  ----------- (call to bar)
  assume[post_bar];
  assert[post_foo];
assert[φ](a): Is γ(a) ⊆ ⟦φ⟧?
assume[φ](a) = α(γ(a) ∩ ⟦φ⟧)
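The two operations can be tried out in a toy setting where γ(a) is small enough to enumerate. In the C sketch below, which rests entirely on illustrative assumptions, a concrete state is one integer x in [0, N), an abstract value is an interval, and a formula φ is given as a predicate. Exhaustive enumeration takes the place of the decision procedure. assert_phi checks γ(a) ⊆ ⟦φ⟧, and assume_phi returns α(γ(a) ∩ ⟦φ⟧), the smallest interval covering the surviving states.

```c
#include <stdbool.h>

enum { N = 16 };
typedef struct { int lo, hi; } Itv;        /* gamma = {x | lo <= x <= hi}; bottom when lo > hi */
typedef bool (*Formula)(int x);            /* phi, given by its models */

static bool in_gamma(Itv a, int x) { return a.lo <= x && x <= a.hi; }

/* assert[phi](a): is every state described by a a model of phi? */
static bool assert_phi(Formula phi, Itv a) {
    for (int x = 0; x < N; x++)
        if (in_gamma(a, x) && !phi(x)) return false;   /* counterexample found */
    return true;
}

/* assume[phi](a) = alpha(gamma(a) /\ phi): smallest interval covering the surviving states. */
static Itv assume_phi(Formula phi, Itv a) {
    Itv r = { N, -1 };                                  /* bottom */
    for (int x = 0; x < N; x++)
        if (in_gamma(a, x) && phi(x)) {
            if (x < r.lo) r.lo = x;
            if (x > r.hi) r.hi = x;
        }
    return r;
}

/* Example formula: x mod 4 == 1 and x < 10, i.e. x in {1, 5, 9}. */
static bool phi_example(int x) { return x % 4 == 1 && x < 10; }
```

Starting from ⊤ = [0, 15], assume_phi(phi_example, ⊤) yields [1, 9]. That is the best interval, even though it also covers states that violate φ.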
55
Goals
Generic algorithms for assert and assume
– Effective
– Efficient
Allow natural specifications
Rather precise verification
56
Motivation
New approach to using symbolic techniques in abstract interpretation
– for shape analysis
– for other analyses
What does it mean to harness a decision procedure for use in static analysis?
– What are the requirements?
– What does it buy us?
57
What are the requirements?
[Figure: an abstract value a, the set γ(a) of concrete stores it represents, and a formula γ̂(a)]
Symbolic concretization: S ∈ γ(a) ⇔ S ⊨ γ̂(a)
Is γ(a) empty? ⇔ Is γ̂(a) unsatisfiable?
58
[Figure: abstract value a = [x ↦ 0, y ↦ ⊤, z ↦ 0]; concrete stores [x ↦ 0, y ↦ 0, z ↦ 0], [x ↦ 0, y ↦ 1, z ↦ 0], [x ↦ 0, y ↦ 2, z ↦ 0], …]
Formula: γ̂(a) = (x = 0) ∧ (z = 0)
S ⊨ γ̂(a) ⇔ S ∈ γ(a)
59
[Figure: formulas, concrete values, and abstract values for a list pointed to by x, abstracted into nodes u1 and u]
γ̂(a) = ∃v1, v2: node_u1(v1) ∧ node_u(v2) ∧ v1 ≠ v2 ∧ ∀v: node_u1(v) ∨ node_u(v) ∧ …
60
What does it buy us?
Guarantees the most precise result w.r.t. the abstraction
– best transformer
– other abstract operations
Modular reasoning
– assume-guarantee reasoning
– scalability
61
The assume[φ](a) Operation
assume[φ](a) = α̂(γ̂(a) ∧ φ)
[Figure: abstract value a; its concretization X = γ(a); the formula γ̂(a) ∧ φ; and the abstract result assume[φ](a)]
62
The Abstraction Operation α̂(φ)
[Figure: a formula φ, its concrete models, and the abstract values a1 and a2]
63
Assume-Guarantee Reasoning using AI
T bar();
void foo() {
  T p;
  ...
  p = bar();
  ...
}
Contracts: {pre_bar, post_bar} for bar, {pre_foo, post_foo} for foo
Checking foo:
  assume[pre_foo];
  assert[pre_bar];
  ----------- (call to bar)
  assume[post_bar];
  assert[post_foo];
assert[φ](a): Is γ̂(a) ⇒ φ valid?
assume[φ](a) = α̂(γ̂(a) ∧ φ)
64
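Computing α̂(φ)
[Figure: formulas, concrete, and abstract domains; in the abstract lattice, the current answer ans is shown together with ⊤ and an abstract value a1]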
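One way to compute α̂(φ) when the only available tool is a satisfiability oracle is a bottom-up loop. Start with ans = ⊥. While φ ∧ ¬γ̂(ans) has a model S, join the abstraction β(S) of that single store into ans. The C sketch below shows this loop in the same toy interval setting used for illustration elsewhere, with integers in [0, N) and intervals; that setting is an assumption. get_model stands in for the decision procedure by brute-force search, and the slide's figure may depict a different traversal of the lattice.

```c
#include <stdbool.h>

enum { N = 16 };
typedef struct { int lo, hi; } Itv;        /* bottom when lo > hi */
typedef bool (*Formula)(int x);

static bool in_gamma(Itv a, int x) { return a.lo <= x && x <= a.hi; }

/* Stand-in for a decision procedure: a model of phi /\ not gamma-hat(ans), or -1 if unsat. */
static int get_model(Formula phi, Itv ans) {
    for (int x = 0; x < N; x++)
        if (phi(x) && !in_gamma(ans, x)) return x;
    return -1;
}

/* ans JOIN beta(s): smallest interval containing ans and the single store s. */
static Itv join_beta(Itv ans, int s) {
    if (ans.lo > ans.hi) return (Itv){ s, s };      /* bottom JOIN beta(s) */
    if (s < ans.lo) ans.lo = s;
    if (s > ans.hi) ans.hi = s;
    return ans;
}

/* alpha-hat(phi): grow ans from bottom until every model of phi is covered. */
static Itv alpha_hat(Formula phi) {
    Itv ans = { 1, 0 };                              /* bottom */
    int s;
    while ((s = get_model(phi, ans)) >= 0)
        ans = join_beta(ans, s);
    return ans;
}

static bool phi_example(int x) { return x % 4 == 1 && x < 10; }   /* x in {1, 5, 9} */
```

Each iteration covers at least one new model, so the loop terminates whenever the abstract domain has no infinite ascending chains. For phi_example the answer grows [1, 1], then [1, 5], then [1, 9].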
65
3-Valued Logical Structures
Relations have meaning over {0, 1, ½} (Kleene)
– 1: True
– 0: False
– ½: Unknown
A join semi-lattice: 0 ⊔ 1 = ½
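Kleene's connectives are easy to state once the truth values are ordered 0 < ½ < 1: conjunction is the minimum, disjunction is the maximum, and negation flips the order. The join used for abstraction works on the information order instead, where equal values stay and any disagreement becomes ½. The C sketch below uses hypothetical names and encodes ½ as the middle enumerator.

```c
/* Truth values ordered 0 < 1/2 < 1. */
typedef enum { K_FALSE = 0, K_HALF = 1, K_TRUE = 2 } Kleene;

static Kleene k_and(Kleene a, Kleene b) { return a < b ? a : b; }          /* minimum */
static Kleene k_or(Kleene a, Kleene b)  { return a > b ? a : b; }          /* maximum */
static Kleene k_not(Kleene a)           { return (Kleene)(K_TRUE - a); }   /* 1/2 stays 1/2 */

/* Information-order join: 0 JOIN 1 = 1/2, and x JOIN x = x. */
static Kleene k_join(Kleene a, Kleene b) { return a == b ? a : K_HALF; }
```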
66
Canonical Abstraction
[Figure: a concrete list pointed to by x with nodes u1, u2, u3, u4 (labeled with c and r_x), and its canonical abstraction with nodes u1 and u2]
γ̂(a) ≜ ∃v1, v2: node_u1(v1) ∧ node_u2(v2)
  ∧ ∀w: node_u1(w) ∨ node_u2(w)
  ∧ ∀w1, w2: node_u1(w1) ∧ node_u1(w2) ⇒ (w1 = w2) ∧ ¬n(w1, w2)
  ∧ ∀v: r_x(v) ⇔ ∃v1: x(v1) ∧ n*(v1, v)
  ∧ ∀v: c(v) ⇔ ∃v1: n(v, v1) ∧ n*(v1, v)
  ∧ ∀v1, v2: x(v1) ∧ x(v2) ⇒ v1 = v2
  ∧ ∀v, v1, v2: n(v, v1) ∧ n(v, v2) ⇒ v1 = v2
(a formula in FO^TC)
67
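y == x->n
φ ≜ ∀v1: y(v1) ↔ ∃v2: x(v2) ∧ n(v2, v1)
[Figure: formulas, concrete, and abstract domains; from ⊤, the answer ans is refined from the structure in which x points to u1 followed by u2 into structures in which y points to a node u_y that follows u1]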
68
Example – Materialization (y == x->n)
[Figure: abstract structures over nodes u1, u2, u_y pointed to by x and y]
Materialization: u2 is split into u_y and u2, with y(u_y) = 1 and y(u2) = 0
Candidate structures with y(u2) = 0 and with y(u2) = 1 are each checked: Is γ̂(a) satisfiable?
69
Example – Refinement (y == x->n)
[Figure: candidate structures over u1, u_y, u2 with n(u_y, u2) = 0, n(u_y, u2) = 1, and n(u_y, u2) = ½]
Is γ̂(a) satisfiable? (for each candidate a)
∀ concrete stores ∃ two pairs of nodes: n(a1, a2) = 1 and n(b1, b2) = 0
∀ concrete stores ∀ pairs of nodes: n(a1, a2) = 1 or n(a1, a2) = 0
70
Abstract Operations
α̂(φ): the best abstract value that represents φ
What does it buy us?
assume[φ](a) = α̂(γ̂(a) ∧ φ)
– assume-guarantee reasoning
– pre- and postconditions specified by logical formulas
BT(t, a) = α̂(extend(γ̂(a)) ∧ t)
– best abstract transformer
– parametric abstractions
meet(a1, a2) = α̂(γ̂(a1) ∧ γ̂(a2))
71
SPASS Experience
Handles arbitrary FO formulas
Can diverge
– use a timeout
Converges in our examples
– Captures older shape analysis algorithms
How to handle FO^TC?
– Overapproximations lead to too many structures
72
Decidable Transitive-Closure Logic
Neil Immerman (UMass), Alexander Rabinovich (TAU)
∃∀(TC, f) is a subset of FO^TC
– ∃∀ (exists-forall) form
– arbitrary unary relations
– a single function f
Decidable for satisfiability
– NEXPTIME-complete
Any “reasonable” extension is undecidable
Rather limited
73
Simulation Technique (CAV’04)
Neil Immerman (UMass), Alexander Rabinovich (TAU)
Simulate realistic data structures using decidable logic over tractable structures
– Singly linked lists: shared/cyclic/nested
– Doubly linked lists
– Trees
Preserved under mutations
Abstract interpretation, Hoare-style verification
74
Further Work
Implementation
Decidable logic for shape analysis
Assume-guarantee reasoning for “real” programs
– case study: Java Collections (B. Livshits, Noam)
– estimate side effects (A. Skidanov)
– specification language
– write procedure specifications
Extend to other domains
– infinite height
Tune the abstraction based on the specification
75
Summary
The A/G approach can scale program analysis/verification
But it requires some effort:
– Language designers
– Programmers
– Abstract interpretation
– Efficient runtime testing