1
Program Analysis via Graph Reachability
Thomas Reps, University of Wisconsin
Joint work with S. Horwitz, M. Sagiv, G. Rosay, D. Melski, M. Benedikt, and P. Godefroid
2
Timeline, 1987-1998: Slicing & Applications; CFL Reachability; Dataflow Analysis; Demand Algorithms; Structure-Transmitted Dependences; Set Constraints
Years marked: 1987, 1993, 1994, 1995, 1996, 1997, 1998
3
More Recently
Flow-insensitive points-to analysis
An undecidability result
    context-sensitive structure-transmitted data-dependence analysis
Model checking of recursive hierarchical finite-state machines
    “infinite”-state systems
    CFL-reachability/circularity queries
    linear-, quadratic-, and cubic-time algorithms
4
Outline
Interprocedural slicing
Interprocedural dataflow analysis
Demand-driven analysis
Linear cubic undecidable
Model-checking of recursive HFSMs
5
Backward slice with respect to statement “printf("%d\n", i)”
int main() {
    int sum = 0;
    int i = 1;
    while (i < 11) {
        sum = sum + i;
        i = i + 1;
    }
    printf("%d\n", sum);
    printf("%d\n", i);
}
7
Slice Extraction
Backward slice with respect to statement “printf("%d\n", i)”
int main() {
    int i = 1;
    while (i < 11) {
        i = i + 1;
    }
    printf("%d\n", i);
}
8
Forward slice with respect to statement “sum = 0”
int main() {
    int sum = 0;
    int i = 1;
    while (i < 11) {
        sum = sum + i;
        i = i + 1;
    }
    printf("%d\n", sum);
    printf("%d\n", i);
}
10
What Are Slices Useful For?
Understanding Programs
    What is affected by what?
Restructuring Programs
    Isolation of separate “computational threads”
Program Specialization and Reuse
    Slices = specialized programs
    Only reuse needed slices
Program Differencing
    Compare slices to identify changes
Testing
    What new test cases would improve coverage?
    What regression tests must be rerun after a change?
11
Control Flow Graph
int main() {
    int sum = 0;
    int i = 1;
    while (i < 11) {
        sum = sum + i;
        i = i + 1;
    }
    printf("%d\n", sum);
    printf("%d\n", i);
}
CFG nodes: Enter, sum = 0, i = 1, while(i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i)
The T edge of while(i < 11) leads into the loop body; the F edge leads to printf(sum).
12
Flow Dependence Graph
Flow dependence p → q: the value of a variable assigned at p may be used at q.
Nodes (same program as in the control flow graph): Enter, sum = 0, i = 1, while(i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i)
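The definition of flow dependence can be made concrete with a small reaching-definitions computation. The following is a minimal sketch, not the construction used by the authors: the CFG encoding, the `DEFS`/`USES` tables, and the function name `flow_dependences` are all assumptions introduced for illustration.

```python
# Sketch: derive flow-dependence edges for the sum program from its CFG
# by iterating reaching definitions to a fixpoint (illustrative encoding).
SUCC = {
    "Enter": ["sum = 0"],
    "sum = 0": ["i = 1"],
    "i = 1": ["while(i < 11)"],
    "while(i < 11)": ["sum = sum + i", "printf(sum)"],  # T branch, F branch
    "sum = sum + i": ["i = i + 1"],
    "i = i + 1": ["while(i < 11)"],
    "printf(sum)": ["printf(i)"],
    "printf(i)": [],
}
DEFS = {"sum = 0": "sum", "i = 1": "i", "sum = sum + i": "sum", "i = i + 1": "i"}
USES = {
    "while(i < 11)": {"i"},
    "sum = sum + i": {"sum", "i"},
    "i = i + 1": {"i"},
    "printf(sum)": {"sum"},
    "printf(i)": {"i"},
}

def flow_dependences(succ, defs, uses):
    nodes = list(succ)
    preds = {n: [p for p in nodes if n in succ[p]] for n in nodes}
    out = {n: set() for n in nodes}  # (defining node, variable) pairs leaving n
    inn = {n: set() for n in nodes}  # pairs reaching the entry of n
    changed = True
    while changed:
        changed = False
        for n in nodes:
            inn[n] = set().union(*(out[p] for p in preds[n]))
            # A definition at n kills earlier definitions of the same variable.
            new_out = {(d, v) for (d, v) in inn[n] if v != defs.get(n)}
            if n in defs:
                new_out.add((n, defs[n]))
            if new_out != out[n]:
                out[n] = new_out
                changed = True
    # p -> q whenever a definition made at p reaches q and q uses that variable.
    return {(d, n) for n in nodes for (d, v) in inn[n] if v in uses.get(n, ())}

deps = flow_dependences(SUCC, DEFS, USES)
```

With this encoding, `("i = 1", "printf(i)")` is a flow dependence while `("sum = 0", "printf(i)")` is not, which is why the sum computation can later be sliced away.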
13
Control Dependence Graph
Control dependence p →T q: q is reached from p if condition p is true (T), not otherwise. Similar for false (F): p →F q.
Nodes (same program as in the control flow graph): Enter, sum = 0, i = 1, while(i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i)
Edges labeled T run from Enter to each top-level statement and from while(i < 11) to the two statements in the loop body.
14
Program Dependence Graph (PDG)
The PDG combines the control dependence edges and the flow dependence edges over the same nodes.
Nodes: Enter, sum = 0, i = 1, while(i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i)
15
Program Dependence Graph (PDG)
Opposite order, same PDG: declaring “int i = 1;” before “int sum = 0;” leaves the PDG unchanged.
Nodes: Enter, sum = 0, i = 1, while(i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i)
16
Backward Slice
int main() {
    int sum = 0;
    int i = 1;
    while (i < 11) {
        sum = sum + i;
        i = i + 1;
    }
    printf("%d\n", sum);
    printf("%d\n", i);
}
PDG nodes: Enter, sum = 0, i = 1, while(i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i)
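On a PDG, a backward slice is ordinary backward reachability from the slicing criterion, and a forward slice is forward reachability. The sketch below hand-encodes the PDG of the sum program; the edge lists are an assumption worked out from the definitions of control and flow dependence rather than read off the slide figure, and `pdg_slice` is a hypothetical helper name.

```python
from collections import deque

# Assumed PDG of the sum program: (p, q) means q depends on p.
CONTROL = [
    ("Enter", "sum = 0"), ("Enter", "i = 1"), ("Enter", "while(i < 11)"),
    ("Enter", "printf(sum)"), ("Enter", "printf(i)"),
    ("while(i < 11)", "sum = sum + i"), ("while(i < 11)", "i = i + 1"),
]
FLOW = [
    ("sum = 0", "sum = sum + i"), ("sum = 0", "printf(sum)"),
    ("sum = sum + i", "sum = sum + i"), ("sum = sum + i", "printf(sum)"),
    ("i = 1", "while(i < 11)"), ("i = 1", "sum = sum + i"),
    ("i = 1", "i = i + 1"), ("i = 1", "printf(i)"),
    ("i = i + 1", "while(i < 11)"), ("i = i + 1", "sum = sum + i"),
    ("i = i + 1", "i = i + 1"), ("i = i + 1", "printf(i)"),
]

def pdg_slice(edges, criterion, backward=True):
    """Collect every node that reaches (or is reached from) the criterion."""
    adjacency = {}
    for p, q in edges:
        src, dst = (q, p) if backward else (p, q)
        adjacency.setdefault(src, []).append(dst)
    seen = {criterion}
    work = deque([criterion])
    while work:
        node = work.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return seen

backward_slice = pdg_slice(CONTROL + FLOW, "printf(i)")
forward_slice = pdg_slice(CONTROL + FLOW, "sum = 0", backward=False)
```

Under these assumptions the backward slice contains exactly the statements kept by the slice-extraction slide, and the forward slice from “sum = 0” contains only the statements that compute or print sum.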
20
Slice Extraction
int main() {
    int i = 1;
    while (i < 11) {
        i = i + 1;
    }
    printf("%d\n", i);
}
PDG nodes in the slice: Enter, i = 1, while(i < 11), i = i + 1, printf(i)
21
CodeSurfer
25
Interprocedural Slice
int main() {
    int sum = 0;
    int i = 1;
    while (i < 11) {
        sum = add(sum, i);
        i = add(i, 1);
    }
    printf("%d\n", sum);
    printf("%d\n", i);
}
int add(int x, int y) {
    return x + y;
}
Backward slice with respect to statement “printf("%d\n", i)”
27
Interprocedural Slice
Same program and slicing criterion as on the previous Interprocedural Slice slide.
Superfluous components are included by Weiser’s slicing algorithm [TSE 84].
They are left out by the algorithm of Horwitz, Reps, & Binkley [PLDI 88; TOPLAS 90].
28
System Dependence Graph (SDG)
Enter main Call p Call p Enter p
29
SDG for the Sum Program
Enter main: sum = 0, i = 1, while(i < 11), printf(sum), printf(i)
Call add (first call site): x_in = sum, y_in = i, sum = x_out
Call add (second call site): x_in = i, y_in = 1, i = x_out
Enter add: x = x_in, y = y_in, x = x + y, x_out = x
30
Interprocedural Backward Slice
Enter main Call p Call p Enter p
35
Interprocedural Backward Slice (6)
Enter main Call p Call p ) ( [ ] Enter p
36
Matched-Parenthesis Path
) ( ) [
39
Slice Extraction Enter main Call p Enter p
40
Slice of the Sum Program
Enter main: i = 1, while(i < 11), printf(i)
Call add: x_in = i, y_in = 1, i = x_out
Enter add: x = x_in, y = y_in, x = x + y, x_out = x
41
CFL-Reachability [Yannakakis 90]
G: Graph (edges labeled with symbols)
L: A context-free language
L-path from s to t iff the edge labels along some path from s to t spell a word in L
Running time: O(N³)
42
Ordinary Graph Reachability
matched → ε | e | [ matched ] | ( matched ) | matched matched
Figure: two panels, “Ordinary Graph Reachability” and “CFL-Reachability”, each showing paths from s to t; in the CFL-reachability panel the edges carry the labels e, (, ), [, and ].
43
CFL-Reachability via Dynamic Programming
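Grammar: A → B C
Graph: a B-edge followed by a C-edge induces an A-edge from the source of the B-edge to the target of the C-edge.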
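The dynamic-programming step above (add an A-edge wherever a B-edge is followed by a C-edge) can be turned into a worklist algorithm. What follows is a minimal sketch under the assumption that the grammar has already been normalized to rules of the forms A → ε, A → B, and A → B C; the function name `cfl_reach`, the rule encoding, and the terminal names `o` and `c` are illustrative choices, not anything taken from the slides.

```python
from collections import deque

def cfl_reach(nodes, edges, eps_rules, unary_rules, binary_rules):
    """Return all facts (u, X, v): some path from u to v derives from X.

    edges: iterable of (u, label, v); eps_rules: nonterminals A with A -> eps;
    unary_rules: pairs (A, B) for A -> B; binary_rules: triples (A, B, C).
    """
    facts = set()
    work = deque()

    def add(u, label, v):
        if (u, label, v) not in facts:
            facts.add((u, label, v))
            work.append((u, label, v))

    for u, label, v in edges:
        add(u, label, v)
    for a in eps_rules:
        for n in nodes:
            add(n, a, n)

    while work:
        u, b, v = work.popleft()
        for a, rhs in unary_rules:
            if rhs == b:
                add(u, a, v)
        for a, left, right in binary_rules:
            if left == b:  # popped edge is the B part; look for C-edges leaving v
                for (v2, c, w) in list(facts):
                    if v2 == v and c == right:
                        add(u, a, w)
            if right == b:  # popped edge is the C part; look for B-edges entering u
                for (t, c, u2) in list(facts):
                    if u2 == u and c == left:
                        add(t, a, v)
    return facts

# Balanced parentheses, normalized: M -> eps | M M | o X, X -> M c
# ("o" stands for an open parenthesis, "c" for a close parenthesis).
chain = [(0, "o", 1), (1, "o", 2), (2, "c", 3), (3, "c", 4)]
result = cfl_reach(range(5), chain, {"M"}, [],
                   [("M", "M", "M"), ("M", "o", "X"), ("X", "M", "c")])
```

This version scans the whole fact set on every step to stay short; an implementation aiming at the cubic bound would index facts by source node, target node, and label.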
44
Interprocedural Slicing via CFL-Reachability
Graph: System dependence graph
L: L(matched)
Node m is in the slice w.r.t. n iff there is an L(matched)-path from m to n
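To see why the matched-parenthesis restriction matters, the sketch below compares plain reachability with L(matched)-reachability on a cut-down SDG of the sum/add program. The edge set is an assumption (control edges and the while node are omitted for brevity), call sites are numbered 1 and 2, and `matched_reach` is a naive fixpoint written for clarity rather than speed.

```python
from itertools import product

# Simplified SDG (assumed). PLAIN holds intraprocedural dependence edges,
# OPENS holds parameter-in edges "(_i", CLOSES holds parameter-out edges ")_i".
PLAIN = {
    ("sum = 0", "x_in = sum"), ("sum = x_out", "x_in = sum"),
    ("i = 1", "y_in = i"), ("i = x_out", "y_in = i"),
    ("i = 1", "x_in = i"), ("i = x_out", "x_in = i"),
    ("sum = 0", "printf(sum)"), ("sum = x_out", "printf(sum)"),
    ("i = 1", "printf(i)"), ("i = x_out", "printf(i)"),
    ("x = x_in", "x = x + y"), ("y = y_in", "x = x + y"),
    ("x = x + y", "x_out = x"),
}
OPENS = {("x_in = sum", "x = x_in", 1), ("y_in = i", "y = y_in", 1),
         ("x_in = i", "x = x_in", 2), ("y_in = 1", "y = y_in", 2)}
CLOSES = {("x_out = x", "sum = x_out", 1), ("x_out = x", "i = x_out", 2)}
NODES = ({n for e in PLAIN for n in e}
         | {n for (u, v, _) in OPENS | CLOSES for n in (u, v)})

def matched_reach(nodes, plain, opens, closes):
    """All (u, v) joined by a path whose call/return labels are balanced."""
    reach = {(n, n) for n in nodes} | set(plain)
    changed = True
    while changed:
        changed = False
        new = set()
        for (u, v), (v2, w) in product(reach, reach):  # matched -> matched matched
            if v == v2:
                new.add((u, w))
        for (u, a, i), (b, v, j) in product(opens, closes):  # (_i matched )_i
            if i == j and (a, b) in reach:
                new.add((u, v))
        if not new <= reach:
            reach |= new
            changed = True
    return reach

def slice_of(reach, criterion):
    return {m for (m, n) in reach if n == criterion}

# Context-insensitive baseline: treat every edge as unlabeled.
all_edges = PLAIN | {(u, v) for (u, v, _) in OPENS | CLOSES}
plain_slice = slice_of(matched_reach(NODES, all_edges, set(), set()), "printf(i)")
matched_slice = slice_of(matched_reach(NODES, PLAIN, OPENS, CLOSES), "printf(i)")
```

In this encoding “sum = 0” appears in `plain_slice` (by entering add at call site 1 and returning to call site 2) but not in `matched_slice`, mirroring the superfluous components noted for context-insensitive slicing.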
45
Asymptotic Running Time [Reps, Horwitz, Sagiv, & Rosay 94]
CFL-reachability
    System dependence graph: N nodes, E edges
    Running time: O(N³)
System dependence graph
    Special structure
    Running time: O(E + CallSites × MaxParams³)
46
Outline
Interprocedural slicing
Interprocedural dataflow analysis
Demand-driven analysis
Linear cubic undecidable
Model-checking of recursive HFSMs
47
Precise Intraprocedural Analysis
start n
48
( ) ] ( start p(a,b) start main if . . . x = 3 b = a p(x,y) p(a,b)
return from p return from p printf(y) printf(b) exit main exit p
49
Precise Interprocedural Analysis
ret start n ( ) [Sharir & Pnueli 81]
50
Representing Dataflow Functions
b c Identity Function a b c Constant Function
51
Representing Dataflow Functions
b c “Gen/Kill” Function a b c Non-“Gen/Kill” Function
52
x y a b start p(a,b) start main if . . . x = 3 b = a p(x,y) p(a,b) return from p return from p printf(y) printf(b) exit main exit p
53
Composing Dataflow Functions
b c a b c a a b c
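The graph pictures of dataflow functions can be read as binary relations over the facts plus a special element Λ, and function composition then becomes relational composition. The sketch below illustrates this for distributive functions on subsets of {a, b, c}; the two example functions, the helper names, and the use of the string "Λ" are assumptions made for the illustration.

```python
from itertools import combinations

LAMBDA = "Λ"
DOMAIN = ["a", "b", "c"]

def represent(f, domain):
    """Encode a distributive set function as edges over domain ∪ {Λ}."""
    base = f(frozenset())
    rel = {(LAMBDA, LAMBDA)} | {(LAMBDA, y) for y in base}
    for x in domain:
        rel |= {(x, y) for y in f(frozenset({x})) if y not in base}
    return rel

def compose(first, second):
    """Relational composition: apply `first`, then `second`."""
    return {(x, z) for (x, y) in first for (y2, z) in second if y == y2}

def apply(rel, facts):
    sources = set(facts) | {LAMBDA}
    return {y for (x, y) in rel if x in sources} - {LAMBDA}

def gen_kill(s):
    # Kills a and generates b.
    return (frozenset(s) - {"a"}) | {"b"}

def non_gen_kill(s):
    # b holds afterwards exactly when a held before; a and c pass through.
    s = frozenset(s)
    return (s - {"b"}) | ({"b"} if "a" in s else frozenset())

ALL_SUBSETS = [frozenset(c) for r in range(len(DOMAIN) + 1)
               for c in combinations(DOMAIN, r)]
composed = compose(represent(gen_kill, DOMAIN), represent(non_gen_kill, DOMAIN))
agrees = all(apply(composed, s) == set(non_gen_kill(gen_kill(s)))
             for s in ALL_SUBSETS)
```

Because each representation has at most (D + 1)² edges, composing two of them never needs to enumerate subsets; the `ALL_SUBSETS` check is there only to confirm the encoding on this small domain.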
54
( ) ( ] YES! NO! x y start p(a,b) a b start main if . . . x = 3
Might y be uninitialized here? Might b be uninitialized here? b = a p(x,y) p(a,b) return from p return from p printf(y) printf(b) exit main exit p
55
Off Limits!
matched → matched matched
        | (_i matched )_i    for i ∈ CallSites
        | edge
        | ε
Figure labels: stack, (, ), Off Limits!
56
Off Limits!
unbalLeft → matched unbalLeft
          | (_i unbalLeft    for i ∈ CallSites
          | ε
Figure labels: stack, (, ), Off Limits!
57
Interprocedural Dataflow Analysis via CFL-Reachability
Graph: Exploded control-flow graph L: L(unbalLeft) Fact d holds at n iff there is an L(unbalLeft)-path from
58
Asymptotic Running Time [Reps, Horwitz, & Sagiv 95]
CFL-reachability
    Exploded control-flow graph: ND nodes
    Running time: O(N³D³)
Exploded control-flow graph
    Special structure
    Running time: O(ED³)
    Typically: E ≈ N, hence O(ED³) ≈ O(ND³)
    “Gen/kill” problems: O(ED)
59
Outline
Interprocedural slicing
Interprocedural dataflow analysis
Demand-driven analysis
Linear cubic undecidable
Model-checking of recursive HFSMs
60
Exhaustive Versus Demand Analysis
Exhaustive analysis: All facts at all points
Optimization: Concentrate on inner loops
Program-understanding tools: Only some facts are of interest
Demand analysis:
    Does a given fact hold at a given point?
    Which facts hold at a given point?
    At which points does a given fact hold?
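The contrast between the two styles can be sketched on an exploded graph: exhaustive analysis searches forward from the entry once, while a demand query searches backward from a single ⟨point, fact⟩ node. The example below is an intraprocedural simplification with a made-up three-statement program (entry, x = 3, z = y) and a possibly-uninitialized-variables encoding; the interprocedural version would additionally restrict paths to L(unbalLeft). All names are hypothetical.

```python
from collections import deque

LAM = "Λ"
# Exploded graph (assumed). A node (p, v) means "v may be uninitialized at p".
EDGES = [
    # entry: every variable starts out uninitialized
    (("enter", LAM), ("start", LAM)), (("enter", LAM), ("start", "x")),
    (("enter", LAM), ("start", "y")), (("enter", LAM), ("start", "z")),
    # x = 3 kills x and passes y and z through
    (("start", LAM), ("n1", LAM)),
    (("start", "y"), ("n1", "y")), (("start", "z"), ("n1", "z")),
    # z = y makes z uninitialized exactly when y is
    (("n1", LAM), ("n2", LAM)), (("n1", "x"), ("n2", "x")),
    (("n1", "y"), ("n2", "y")), (("n1", "y"), ("n2", "z")),
]

def reachable(start, edges, backward=False):
    adjacency = {}
    for src, dst in edges:
        a, b = (dst, src) if backward else (src, dst)
        adjacency.setdefault(a, []).append(b)
    seen = {start}
    work = deque([start])
    while work:
        node = work.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return seen

def exhaustive():
    """All facts at all points: one forward search from the entry node."""
    return {(p, d) for (p, d) in reachable(("enter", LAM), EDGES) if d != LAM}

def demand(point, fact):
    """Does `fact` hold at `point`? One backward search from that node."""
    return ("enter", LAM) in reachable((point, fact), EDGES, backward=True)

all_facts = exhaustive()
```

Here `demand("n2", "z")` answers yes and `demand("n2", "x")` answers no, and each answer agrees with membership in `all_facts`; the demand query simply touches less of the graph.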
61
All “appropriate” demands
x y a b YES! ( ) start p(a,b) “Semi-exhaustive”: All “appropriate” demands start main Might b be uninitialized here? Might y be uninitialized here? if . . . x = 3 b = a p(x,y) p(a,b) return from p return from p printf(y) printf(b) NO! exit main exit p
62
Experimental Results [Horwitz, Reps, & Sagiv 1995]
53 C programs (200-6,700 lines)
For a single fact of interest:
    Demand algorithm always better than exhaustive algorithm
All “appropriate” demands beats exhaustive when percentage of “yes” answers is high
    Live variables
    Truly live variables
    Constant predicates
    . . .
63
Outline
Interprocedural slicing
Interprocedural dataflow analysis
Demand-driven analysis
Linear cubic undecidable
Model-checking of recursive HFSMs
64
Structure-Transmitted Dependences [Reps 1995]
McCarthy’s equations:
    car(cons(x,y)) = x
    cdr(cons(x,y)) = y
Example:
    w = cons(x,y);
    v = car(w);
Figure nodes: v, w, y, x
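McCarthy’s equations suggest labeling the edge from x into cons with a selector and the edge out of car with its inverse, so that v depends on x exactly when the selectors along the path cancel. The sketch below checks one path’s label sequence with a stack. The label spellings `hd`, `tl`, `hd^-1`, `tl^-1` and the function name are assumptions; in particular the tail selector is introduced here only for the illustration.

```python
# Inverse selectors and the constructor-side selector each one cancels.
INVERSE = {"hd^-1": "hd", "tl^-1": "tl"}

def selectors_cancel(labels):
    """True if every selector on the path is undone by its matching inverse."""
    stack = []
    for label in labels:
        if label in INVERSE:
            if not stack or stack.pop() != INVERSE[label]:
                return False
        else:
            stack.append(label)
    return not stack

# w = cons(x, y); v = car(w)
x_to_v = ["hd", "hd^-1"]  # x enters as the head, car takes the head
y_to_v = ["tl", "hd^-1"]  # y enters as the tail, car takes the head
```

`selectors_cancel(x_to_v)` holds and `selectors_cancel(y_to_v)` does not, matching car(cons(x,y)) = x: v depends on x but not on y.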
65
Dependences + Matched Paths?
Enter main x y hd hd-1 ( ) w=cons(x,y) Call p Call p w w Enter p w v = car(w)
66
Undecidable! [Reps, TOPLAS 00]
Edge labels: hd, hd⁻¹, (, )
Interleaved Parentheses!
67
Outline
Interprocedural slicing
Interprocedural dataflow analysis
Demand-driven analysis
Linear cubic undecidable
Model-checking of recursive HFSMs
68
Model-Checking of Recursive HFSMs [Benedikt, Godefroid, & Reps (in prep.)]
Non-recursive HFSMs [Alur & Yannakakis 98]
    Ordinary FSMs
    T-reachability/circularity queries
Recursive HFSMs
    Matched-parenthesis T-reachability/circularity
Key observation: Linear-time algorithms for matched-parenthesis T-reachability/cyclicity
    Single-entry/multi-exit [or multi-entry/single-exit]
    Deterministic, multi-entry/multi-exit
69
T-Cyclicity in Hierarchical Kripke Structures
SN/SX              SN/MX              MN/SX    MN/MX
non-rec: O(|k|)    non-rec: O(|k|)    ?        ?
rec: O(|k|³)       rec: ?

SN/SX     SN/MX     MN/SX     MN/MX
O(|k|)    O(|k|)    O(|k|)    O(|k|³)
                              O(|k|²) [lin rec]
                              O(|k|) [det]
70
Recursive HFSMs: Data Complexity
        SN/SX              SN/MX              MN/SX    MN/MX
LTL     non-rec: O(|k|)    non-rec: O(|k|)    ?        ?
        rec: P-time        rec: ?
CTL     O(|k|)             bad                ?        bad
CTL*    O(|k|²) [L2]       bad                ?        bad
71
Recursive HFSMs: Data Complexity
        SN/SX     SN/MX     MN/SX     MN/MX
LTL     O(|k|)    O(|k|)    O(|k|)    O(|k|³)
                                      O(|k|²) [lin rec]
                                      O(|k|) [det]
CTL     O(|k|)    bad       O(|k|)    bad
CTL*    O(|k|)    bad       O(|k|)    bad
Not Dual Problems!
72
CFL-Reachability: Scope of Applicability
Static analysis
    Slicing, DFA, structure-transmitted dep., points-to analysis
Formal-language theory
    CF-, 2DPDA-, 2NPDA-recognition
    Attribute-grammar analysis
Verification
    Model-checking recursive HFSMs
    “Ping-pong” protocols [Dolev, Even, & Karp 83]
73
CFL-Reachability: Benefits
Algorithms
    Demand & exhaustive
Complexity
    Linear-, quadratic-, cubic-time algorithms
    PTIME-completeness
    Variants that are undecidable
Complementary to
    Equations
    Set constraints
    Types
    . . .
74
But . . .
Model checking
    Huge graphs (10¹⁰⁰ reachable states)
    Reachability/circularity queries
    Represent implicitly (OBDDs)
Dataflow analysis
    Large graphs, e.g., Stmts × Vars (≈ 10¹¹)
    CFL-reachability queries [Reps, Horwitz, & Sagiv 95]
    OBDDs blew up [Siff & Reps 95 (unpub.)]
    . . . yes, we tried the usual tricks . . .
75
Most Significant Contributions: 1987-2000
Asymptotically fastest algorithms
    Interprocedural slicing
    Interprocedural dataflow analysis
Demand algorithms
    All “appropriate” demands may beat exhaustive
Tool for slicing and browsing ANSI C
    Slices programs as large as 60,000 lines
    University research distribution
    Commercial product: CodeSurfer (GrammaTech, Inc.)
CFL-reachability as unifying conceptual model
    [Kou 77], [Holley & Rosen 81], [Cooper & Kennedy 88], [Callahan 88], [Horwitz, Reps, & Binkley 88], . . .
    Identifies fundamental bottlenecks (e.g., cubic-time “barrier”)