Download presentation
Presentation is loading. Please wait.
Published byHilda Evans Modified over 9 years ago
1
Page 1 5/2/2007 Kestrel Technology LLC A Tutorial on Abstract Interpretation as the Theoretical Foundation of CodeHawk Arnaud Venet Kestrel Technology arnaud@kestreltechnology.com Please see Notes View for annotations.
2
Page 2 5/2/2007 Kestrel Technology LLC Static Analysis Code Static Analysis Tool Parameters List of defects Confirmed property violations Indeterminates (possible violations)
3
Page 3 5/2/2007 Kestrel Technology LLC Can we find all defects? Classic undecidability results in Computer Science apply (halting problem) We require soundness (no defects are missed) –A conservative approach is acceptable Abstract Interpretation is the enabling theory in CodeHawk –Sound –Tunably precise –Scalable –“Generatively general”
4
Page 4 5/2/2007 Kestrel Technology LLC Intuition SW Defects All lines in the code Abstract Interpretation
5
Page 5 5/2/2007 Kestrel Technology LLC Abstract Interpretation Computes an envelope of all data in the program Mathematical assurance Static analyzers based on Abstract interpretation are difficult to engineer KT’s expertise: building scalable and effective abstract interpreters
6
Page 6 5/2/2007 Kestrel Technology LLC Properties vs. defects An application might be free of programming language defects but not carry the desired property –resource issues (memory, execution time) –separation –range of output data Abstract Interpretation can address application-specific of properties as well as language defects
7
Page 7 5/2/2007 Kestrel Technology LLC Objectives Go over a detailed example –Understand how the technology works Achievements and challenges in the engineering of abstract interpreters –What it means to build an analyzer based on Abstract Interpretation
8
Page 8 5/2/2007 Kestrel Technology LLC Detailed example
9
Page 9 5/2/2007 Kestrel Technology LLC Buffer overflow for(i = 0; i < 10; i++) { if(message[i].kind == SHORT_CMD) allocate_space (channel, 1000); else allocate_space (channel, 2000); } Can we exceed the channel’s buffer capacity?
10
Page 10 5/2/2007 Kestrel Technology LLC Control Flow Graph i = 0 i < 10 ? i++ message[i].kind == SHORT_CMD ? allocate_space (channel, 2000) allocate_space (channel, 1000) stop
11
Page 11 5/2/2007 Kestrel Technology LLC i = 0 i < 10 ? i++ message[i].kind == SHORT_CMD ? allocate_space (channel, 2000) allocate_space (channel, 1000) stop i = 0 i < 10 ? i++ message[i].kind == SHORT_CMD ? allocate_space (channel, 2000) allocate_space (channel, 1000) stop Analytic model of the code i < 10 ? i = 0 i++ M = M + 2000 M = M + 1000 stop M = 0
12
Page 12 5/2/2007 Kestrel Technology LLC Analysis process intuition We mimic the execution of the program We collect all possible data values for all variables for all possible traces
13
Page 13 5/2/2007 Kestrel Technology LLC Analyzing the model i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M
14
Page 14 5/2/2007 Kestrel Technology LLC Initially i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M
15
Page 15 5/2/2007 Kestrel Technology LLC Loop initialization i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M
16
Page 16 5/2/2007 Kestrel Technology LLC Loop entry i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M
17
Page 17 5/2/2007 Kestrel Technology LLC Analyzing a branching (1) i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M
18
Page 18 5/2/2007 Kestrel Technology LLC Analyzing a branching (2) i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M i M ?
19
Page 19 5/2/2007 Kestrel Technology LLC Accumulating all possible values i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M
20
Page 20 5/2/2007 Kestrel Technology LLC Abstraction of point clouds We want the analysis to terminate in reasonable time We need a tractable representation of point clouds in arbitrary dimensions Abstract Interpretation offers a broad choice of such representations Example: convex polyhedra –Compute the convex hull of a point cloud
21
Page 21 5/2/2007 Kestrel Technology LLC Analyzing a branching i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M i M
22
Page 22 5/2/2007 Kestrel Technology LLC Convex hull i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M
23
Page 23 5/2/2007 Kestrel Technology LLC Iterating the loop analysis i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M i M
24
Page 24 5/2/2007 Kestrel Technology LLC Building the loop invariant i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M
25
Page 25 5/2/2007 Kestrel Technology LLC Analyzing a branching i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M
26
Page 26 5/2/2007 Kestrel Technology LLC Analyzing a branching i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M i M
27
Page 27 5/2/2007 Kestrel Technology LLC Convex hull i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M
28
Page 28 5/2/2007 Kestrel Technology LLC Building the loop invariant i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M i M
29
Page 29 5/2/2007 Kestrel Technology LLC Keeping iterating… i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M
30
Page 30 5/2/2007 Kestrel Technology LLC Passing to the limit We want this iterative process to end at some point We need to converge when analyzing loops After some iteration steps, we use a widening operation at loop entry to enforce convergence
31
Page 31 5/2/2007 Kestrel Technology LLC Widening Let a 1, a 2, …a n, … be a sequence of polyhedra, then the sequence –w 1 = a 1 –w n+1 = w n a n+1 is ultimately stationary The widening is a join operation i.e., a a b & b a b
32
Page 32 5/2/2007 Kestrel Technology LLC Widening for intervals [a, b] [c, d] = [if c < a then - else a, if b < d then + else b] Example: [10, 20] [11, 30] = [10, + ]
33
Page 33 5/2/2007 Kestrel Technology LLC Widening for polyhedra We eliminate the faces of the computed convex envelope that are not stable Convergence is reached in at most N steps where N is the number of faces of the polyhedron at loop entry
34
Page 34 5/2/2007 Kestrel Technology LLC Widening i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M i M
35
Page 35 5/2/2007 Kestrel Technology LLC After widening i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M
36
Page 36 5/2/2007 Kestrel Technology LLC Detecting convergence Abstract iteration sequence –F 1 = P (initial polyhedron) –F n+1 = F n if S(F n ) F n F n S(F n ) otherwise where S is the semantic transformer associated to the loop body Theorem: if there exists N such that F N+1 F N, then F n = F N for n > N.
37
Page 37 5/2/2007 Kestrel Technology LLC Convergence i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M The computation has converged
38
Page 38 5/2/2007 Kestrel Technology LLC We are not done yet… The analyzer has just proven that 1000 * i ≤ M ≤ 2000 * i But we have lost all information about the termination condition 0 ≤ i ≤ 10 Since we have obtained an envelope of all possible values of the variables, if we run the computation again we still get such an envelope The point is that this new envelope can be smaller This refinement step is called narrowing
39
Page 39 5/2/2007 Kestrel Technology LLC Refinement i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ M i i M 9
40
Page 40 5/2/2007 Kestrel Technology LLC Analyzing a branching i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i i M 9 i i M 9
41
Page 41 5/2/2007 Kestrel Technology LLC Convex hull i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M 9
42
Page 42 5/2/2007 Kestrel Technology LLC Back to loop entry i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M i M 10 1
43
Page 43 5/2/2007 Kestrel Technology LLC Narrowing i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ i M 10 M i
44
Page 44 5/2/2007 Kestrel Technology LLC Refined loop invariant i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ M i 10
45
Page 45 5/2/2007 Kestrel Technology LLC Invariant at loop exit i = 0 i < 10 ? M = M + 2000 M = M + 1000 M = 0 stop i++ M i 10 20,000 10,000 i 10
46
Page 46 5/2/2007 Kestrel Technology LLC [ ] Interpretation of the results 5,000 15,000 25,000 10,000 20,000 Space allocated in the buffer Buffer size: Certain buffer overflow Possible buffer overflow No buffer overflow
47
Page 47 5/2/2007 Kestrel Technology LLC Achievements and challenges
48
Page 48 5/2/2007 Kestrel Technology LLC Assured static analyzers for NASA C Global Surveyor: verified array bound compliance for NASA mission-critical software –Mars Exploration Rovers: 550K LOC –Deep Space 1: 280K LOC –Mars Path Finder: 140K LOC Pointer analysis: –International Space Station payload software (major bug found)
49
Page 49 5/2/2007 Kestrel Technology LLC What we observe No scalable, precise, and general-purpose abstract interpreter PolySpace: –Handles all kinds of runtime errors –Decent precision (<20% indeterminates) –Doesn’t scale (topped out at 40K LOC) Customization is the key –Specialized for a property or a class of applications –Manually crafted by experts
50
Page 50 5/2/2007 Kestrel Technology LLC CodeHawk™ Abstract Interpretation development platform/ static analyzer generator Automated generation of customized static analyzers –Leverage from pre-built analyzers –Directly tunable by the end-user
51
Page 51 5/2/2007 Kestrel Technology LLC Where CodeHawk stands Application: architecture environment input range usage protocol … Abstract Interpretation: application independent abstract model algorithms difficult to implement needs a lot of expertise CodeHawk
52
Page 52 5/2/2007 Kestrel Technology LLC Recent Applications Malware detection –Customized analyzers for specific kinds of malware –Naturally resistant to complex obfuscating transformations –Evaluated on DoD-supplied test case Library/Component analysis –Proof of absence of buffer overflow in OpenSSH’s dynamic buffer library Shared variables –Protection policies for shared variables –Evaluated on a code from a large DoD prime contractor
53
Page 53 5/2/2007 Kestrel Technology LLC Conclusions Promising and proven technology –key distinction for assurance: no false negatives –can verify application properties as well as detect defects –can be tailored for various domains (e.g., malware) Not a silver bullet..rather a bullet generator –each modeled domain offers leverage –required expertise for broad use must still be reduced
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.