Program Analysis via Graph Reachability

Slides:



Advertisements
Similar presentations
Overview Structural Testing Introduction – General Concepts
Advertisements

Data-Flow Analysis II CS 671 March 13, CS 671 – Spring Data-Flow Analysis Gather conservative, approximate information about what a program.
School of EECS, Peking University “Advanced Compiler Techniques” (Fall 2011) SSA Guo, Yao.
Chapter 9 Code optimization Section 0 overview 1.Position of code optimizer 2.Purpose of code optimizer to get better efficiency –Run faster –Take less.
1 CS 201 Compiler Construction Lecture 3 Data Flow Analysis.
Compilation 2011 Static Analysis Johnni Winther Michael I. Schwartzbach Aarhus University.
Course Outline Traditional Static Program Analysis –Theory Compiler Optimizations; Control Flow Graphs Data-flow Analysis – today’s class –Classic analyses.
Control-Flow Graphs & Dataflow Analysis CS153: Compilers Greg Morrisett.
Data-Flow Analysis Framework Domain – What kind of solution is the analysis looking for? Ex. Variables have not yet been defined – Algorithm assigns a.
Some Properties of SSA Mooly Sagiv. Outline Why is it called Static Single Assignment form What does it buy us? How much does it cost us? Open questions.
Bebop: A Symbolic Model Checker for Boolean Programs Thomas Ball Sriram K. Rajamani
 Program Slicing Long Li. Program Slicing ? It is an important way to help developers and maintainers to understand and analyze the structure.
A Fixpoint Calculus for Local and Global Program Flows Swarat Chaudhuri, U.Penn (with Rajeev Alur and P. Madhusudan)
Program Analysis via Graph Reachability
1 Program Slicing Purvi Patel. 2 Contents Introduction What is program slicing? Principle of dependences Variants of program slicing Slicing classifications.
Introduction to Program Slicing Presenter: M. Amin Alipour Software Design Laboratory
Program Analysis via Graph Reachability Thomas Reps University of Wisconsin PLDI 00 Tutorial, Vancouver, B.C., June 18, 2000
Interprocedural Slicing using Dependence Graphs Susan Horwitz, Thomas Reps, and David Binkley University of Wisconsin-Madison.
1 Lecture 06 – Inter-procedural Analysis Eran Yahav.
Recap from last time We were trying to do Common Subexpression Elimination Compute expressions that are available at each program point.
Interprocedural Analysis Noam Rinetzky Mooly Sagiv.
12/06/20051 Software Configuration Management Configuration management (CM) is the process that controls the changes made to a system and manages the different.
Interprocedural Analysis Noam Rinetzky Mooly Sagiv.
1 Program Analysis Mooly Sagiv Tel Aviv University Textbook: Principles of Program Analysis.
School of EECS, Peking University “Advanced Compiler Techniques” (Fall 2011) Loops Guo, Yao.
Global optimization. Data flow analysis To generate better code, need to examine definitions and uses of variables beyond basic blocks. With use- definition.
Program Analysis Mooly Sagiv Tel Aviv University Sunday Scrieber 8 Monday Schrieber.
Overview of program analysis Mooly Sagiv html://
Improving the Precision of Abstract Simulation using Demand-driven Analysis Olatunji Ruwase Suzanne Rivoire CS June 12, 2002.
Survey of program slicing techniques
From last class. The above is Click’s solution (PLDI 95)
1 ECE 453 – CS 447 – SE 465 Software Testing & Quality Assurance Instructor Kostas Kontogiannis.
Precise Interprocedural Dataflow Analysis via Graph Reachibility
Software (Program) Analysis. Automated Static Analysis Static analyzers are software tools for source text processing They parse the program text and.
Formal Methods Program Slicing & Dataflow Analysis February 2015.
Regression Testing. 2  So far  Unit testing  System testing  Test coverage  All of these are about the first round of testing  Testing is performed.
Bug Localization with Machine Learning Techniques Wujie Zheng
1 Graph Coverage (4). Reading Assignment P. Ammann and J. Offutt “Introduction to Software Testing” ◦ Section
1 Program Slicing Amir Saeidi PhD Student UTRECHT UNIVERSITY.
Program Analysis and Verification
1 The System Dependence Graph and its use in Program Slicing.
Introduction to Software Testing (2nd edition) Chapter 7.4 Graph Coverage for Design Elements Paul Ammann & Jeff Offutt
Pointer Analysis Survey. Rupesh Nasre. Aug 24, 2007.
Interprocedural Path Profiling David Melski Thomas Reps University of Wisconsin.
Interprocedural Analysis Noam Rinetzky Mooly Sagiv.
Reachability Analysis for Callbacks 北京大学 唐浩
1 Software Testing & Quality Assurance Lecture 13 Created by: Paulo Alencar Modified by: Frank Xu.
Computational Divided Differencing Thomas Reps University of Wisconsin Joint work with Louis B. Rall.
Phoenix Based Dynamic Slicing Debugging Tool Eric Cheng Lin Xu Matt Gruskin Ravi Ramaseshan Microsoft Phoenix Intern Team (Summer '06)
Data Flow Analysis Suman Jana
Program Analysis and Verification
Dataflow analysis.
Harry Xu University of California, Irvine & Microsoft Research
Program Analysis via Graph Reachability
Interprocedural Analysis Chapter 19
SwE 455 Program Slicing.
White-Box Testing.
Amir Kamil and Katherine Yelick
Software Construction and Evolution - CSSE 375 Program Understanding
How is a PDG Created? Control Flow Graph (CFG) PDG is union of:
Topic 10: Dataflow Analysis
Program Slicing Baishakhi Ray University of Virginia
University Of Virginia
White-Box Testing.
1. Reaching Definitions Definition d of variable v: a statement d that assigns a value to v. Use of variable v: reference to value of v in an expression.
Program Analysis via Graph Reachability
Data Flow Analysis Compiler Design
Slicing Java Programs that Throw and Catch Exceptions
Program Analysis via Graph Reachability
Amir Kamil and Katherine Yelick
Presentation transcript:

Program Analysis via Graph Reachability Thomas Reps University of Wisconsin http://www.cs.wisc.edu/~reps/ See http://www.cs.wisc.edu/wpis/papers/tr1386.ps

PLDI 00 Registration Form Tutorial (morning): …………… $ ____ Tutorial (afternoon): ………….. $ ____ Tutorial (evening): ……………. $ – 0 –

1987 Slicing & Applications CFL Reachability 1993 Dataflow Analysis Demand Algorithms 1994 Structure- Transmitted Dependences 1995 Set Constraints 1996 1997 1998

Applications Program optimization Software engineering Program understanding Reengineering Static bug-finding Security (information flow)

Collaborators Susan Horwitz Mooly Sagiv Genevieve Rosay David Melski David Binkley Michael Benedikt Patrice Godefroid

Themes Harnessing CFL-reachability Exhaustive alg.  Demand alg.

Backward slice with respect to “printf(“%d\n”,i)” int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); Backward slice with respect to “printf(“%d\n”,i)”

Backward slice with respect to “printf(“%d\n”,i)” int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); Backward slice with respect to “printf(“%d\n”,i)”

Backward slice with respect to “printf(“%d\n”,i)” Slice Extraction int main() { int i = 1; while (i < 11) { i = i + 1; } printf(“%d\n”,i); Backward slice with respect to “printf(“%d\n”,i)”

Forward slice with respect to “sum = 0” int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); Forward slice with respect to “sum = 0”

Forward slice with respect to “sum = 0” int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); Forward slice with respect to “sum = 0”

What Are Slices Useful For? Understanding Programs What is affected by what? Restructuring Programs Isolation of separate “computational threads” Program Specialization and Reuse Slices = specialized programs Only reuse needed slices Program Differencing Compare slices to identify changes Testing What new test cases would improve coverage? What regression tests must be rerun after a change?

Line-Character-Count Program void line_char_count(FILE *f) { int lines = 0; int chars; BOOL eof_flag = FALSE; int n; extern void scan_line(FILE *f, BOOL *bptr, int *iptr); scan_line(f, &eof_flag, &n); chars = n; while(eof_flag == FALSE){ lines = lines + 1; chars = chars + n; } printf(“lines = %d\n”, lines); printf(“chars = %d\n”, chars);

Character-Count Program void char_count(FILE *f) { int lines = 0; int chars; BOOL eof_flag = FALSE; int n; extern void scan_line(FILE *f, BOOL *bptr, int *iptr); scan_line(f, &eof_flag, &n); chars = n; while(eof_flag == FALSE){ lines = lines + 1; chars = chars + n; } printf(“lines = %d\n”, lines); printf(“chars = %d\n”, chars);

Line-Character-Count Program void line_char_count(FILE *f) { int lines = 0; int chars; BOOL eof_flag = FALSE; int n; extern void scan_line(FILE *f, BOOL *bptr, int *iptr); scan_line(f, &eof_flag, &n); chars = n; while(eof_flag == FALSE){ lines = lines + 1; chars = chars + n; } printf(“lines = %d\n”, lines); printf(“chars = %d\n”, chars);

Line-Count Program void line_count(FILE *f) { int lines = 0; int chars; BOOL eof_flag = FALSE; int n; extern void scan_line2(FILE *f, BOOL *bptr, int *iptr); scan_line2(f, &eof_flag, &n); chars = n; while(eof_flag == FALSE){ lines = lines + 1; chars = chars + n; } printf(“lines = %d\n”, lines); printf(“chars = %d\n”, chars);

Specialization Via Slicing wc -lc wc -c wc -l

Control Flow Graph int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); Enter F sum = 0 i = 1 while(i < 11) printf(sum) printf(i) T sum = sum + i i = i + i

Control Dependence Graph int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); Control dependence q is reached from p if condition p is true (T), not otherwise. p q T Similar for false (F). p q F Enter T T T T T T sum = 0 i = 1 while(i < 11) printf(sum) printf(i) T T sum = sum + i i = i + i

Flow Dependence Graph Flow dependence p q Value of variable int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); Flow dependence p q Value of variable assigned at p may be used at q. Enter sum = 0 i = 1 while(i < 11) printf(sum) printf(i) sum = sum + i i = i + i

Program Dependence Graph (PDG) int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); Control dependence Flow dependence Enter T T T T T T sum = 0 i = 1 while(i < 11) printf(sum) printf(i) T T sum = sum + i i = i + i

Program Dependence Graph (PDG) int main() { int i = 1; int sum = 0; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); Opposite Order Same PDG Enter T T T T T T sum = 0 i = 1 while(i < 11) printf(sum) printf(i) T T sum = sum + i i = i + i

Backward Slice int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); Enter T T T T T T sum = 0 i = 1 while(i < 11) printf(sum) printf(i) T T sum = sum + i i = i + i

Backward Slice (2) int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); Enter T T T T T T sum = 0 i = 1 while(i < 11) printf(sum) printf(i) T T sum = sum + i i = i + i

Backward Slice (3) int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); Enter T T T T T T sum = 0 i = 1 while(i < 11) printf(sum) printf(i) T T sum = sum + i i = i + i

Backward Slice (4) int main() { int sum = 0; int i = 1; while (i < 11) { sum = sum + i; i = i + 1; } printf(“%d\n”,sum); printf(“%d\n”,i); Enter T T T T T T sum = 0 i = 1 while(i < 11) printf(sum) printf(i) T T sum = sum + i i = i + i

Slice Extraction int main() { int i = 1; while (i < 11) { i = i + 1; } printf(“%d\n”,i); Enter T T T T i = 1 while(i < 11) printf(i) T i = i + i

CodeSurfer

Browsing a Dependence Graph Pretend this is your favorite browser What does clicking on a link do? You get a new page Or you move to an internal tag

Interprocedural Slice int main() { int sum = 0; int i = 1; while (i < 11) { sum = add(sum,i); i = add(i,1); } printf(“%d\n”,sum); printf(“%d\n”,i); int add(int x, int y) { return x + y; } Backward slice with respect to “printf(“%d\n”,i)”

Interprocedural Slice int main() { int sum = 0; int i = 1; while (i < 11) { sum = add(sum,i); i = add(i,1); } printf(“%d\n”,sum); printf(“%d\n”,i); int add(int x, int y) { return x + y; } Backward slice with respect to “printf(“%d\n”,i)”

Interprocedural Slice int main() { int sum = 0; int i = 1; while (i < 11) { sum = add(sum,i); i = add(i,1); } printf(“%d\n”,sum); printf(“%d\n”,i); int add(int x, int y) { return x + y; } Superfluous components included by Weiser’s slicing algorithm [TSE 84] Left out by algorithm of Horwitz, Reps, & Binkley [PLDI 88; TOPLAS 90]

System Dependence Graph (SDG) Enter main Call p Call p Enter p

SDG for the Sum Program xin = sum yin = i sum = xout xin = i yin= 1 Enter main sum = 0 i = 1 while(i < 11) printf(sum) printf(i) Call add Call add xin = sum yin = i sum = xout xin = i yin= 1 i = xout Enter add x = xin y = yin x = x + y xout = x

Interprocedural Backward Slice Enter main Call p Call p Enter p

Interprocedural Backward Slice (2) Enter main Call p Call p Enter p

Interprocedural Backward Slice (3) Enter main Call p Call p Enter p

Interprocedural Backward Slice (4) Enter main Call p Call p Enter p

Interprocedural Backward Slice (5) Enter main Call p Call p Enter p

Interprocedural Backward Slice (6) Enter main Call p Call p ) ( [ ] Enter p

Matched-Parenthesis Path ) ( ) [

Interprocedural Backward Slice (6) Enter main Call p Call p Enter p

Interprocedural Backward Slice (7) Enter main Call p Call p Enter p

Slice Extraction Enter main Call p Enter p

Slice of the Sum Program Enter main i = 1 while(i < 11) printf(i) Call add xin = i yin= 1 i = xout Enter add x = xin y = yin x = x + y xout = x

CFL-Reachability [Yannakakis 90] G: Graph (N nodes, E edges) L: A context-free language L-path from s to t iff Running time: O(N 3)

Interprocedural Slicing via CFL-Reachability Graph: System dependence graph L: L(matched) [roughly] Node m is in the slice w.r.t. n iff there is an L(matched)-path from m to n

Asymptotic Running Time [Reps, Horwitz, Sagiv, & Rosay 94] CFL-reachability System dependence graph: N nodes, E edges Running time: O(N 3) System dependence graph Special structure Running time: O(E + CallSites % MaxParams3)

Ordinary Graph Reachability ( e [ ] ) matched | e | [ matched ] | ( matched ) | matched matched CFL-Reachability ( t ) e [ ] e e [ e ] [ ] e e s t Ordinary Graph Reachability s t s t s

Regular-Language Reachability [Yannakakis 90] G: Graph (N nodes, E edges) L: A regular language L-path from s to t iff Running time: O(N+E) Ordinary reachability (= transitive closure) Label each edge with e L is e* vs. O(N3)

CFL-Reachability via Dynamic Programming Graph Grammar A  B C B C A

Degenerate Case: CFL-Recognition exp  id | exp + exp | exp * exp | ( exp )  “(a + b) * c”  L(exp) ? ) ( a c b + * s t

Degenerate Case: CFL-Recognition exp  id | exp + exp | exp * exp | ( exp ) “a + b) * c +”  L(exp) ? * a + ) b c s t

Program Chopping Given source S and target T, what program points transmit effects from S to T? S T Intersect forward slice from S with backward slice from T, right?

Non-Transitivity and Slicing int main() { int sum = 0; int i = 1; while (i < 11) { sum = add(sum,i); i = add(i,1); } printf(“%d\n”,sum); printf(“%d\n”,i); int add(int x, int y) { return x + y; } Forward slice with respect to “sum = 0”

Non-Transitivity and Slicing int main() { int sum = 0; int i = 1; while (i < 11) { sum = add(sum,i); i = add(i,1); } printf(“%d\n”,sum); printf(“%d\n”,i); int add(int x, int y) { return x + y; } Forward slice with respect to “sum = 0”

Non-Transitivity and Slicing int main() { int sum = 0; int i = 1; while (i < 11) { sum = add(sum,i); i = add(i,1); } printf(“%d\n”,sum); printf(“%d\n”,i); int add(int x, int y) { return x + y; } Backward slice with respect to “printf(“%d\n”,i)”

Non-Transitivity and Slicing int main() { int sum = 0; int i = 1; while (i < 11) { sum = add(sum,i); i = add(i,1); } printf(“%d\n”,sum); printf(“%d\n”,i); int add(int x, int y) { return x + y; } Backward slice with respect to “printf(“%d\n”,i)”

Non-Transitivity and Slicing int main() { int sum = 0; int i = 1; while (i < 11) { sum = add(sum,i); i = add(i,1); } printf(“%d\n”,sum); printf(“%d\n”,i); int add(int x, int y) { return x + y; } Forward slice with respect to “sum = 0”  Backward slice with respect to “printf(“%d\n”,i)”

Non-Transitivity and Slicing int main() { int sum = 0; int i = 1; while (i < 11) { sum = add(sum,i); i = add(i,1); } printf(“%d\n”,sum); printf(“%d\n”,i); int add(int x, int y) { return x + y; }  Chop with respect to “sum = 0” and “printf(“%d\n”,i)”

Non-Transitivity and Slicing Enter main sum = 0 i = 1 while(i < 11) printf(sum) printf(i) Call add Call add xin = sum yin = i sum = xout xin = i yin= 1 i = xout ( ] Enter add x = xin y = yin x = x + y xout = x

“Precise interprocedural chopping” Program Chopping Given source S and target T, what program points transmit effects from S to T? S T “Precise interprocedural chopping” [Reps & Rosay FSE 95]

Dataflow Analysis Goal: For each point in the program, determine a superset of the “facts” that could possibly hold during execution Examples Constant propagation Reaching definitions Live variables Possibly uninitialized variables

Possibly Uninitialized Variables {} Start {w,x,y} x = 3 {w,y} if . . . {w,y} y = x {w,y} y = w {w} w = 8 {w,y} {} printf(y) {w,y}

Precise Intraprocedural Analysis start n pfp = fk  fk-1  …  f2  f1 MOP[n] =  pfp(C) pPathsTo[n]

( ) ] ( start p(a,b) start main if . . . x = 3 b = a p(x,y) p(a,b) return from p return from p printf(y) printf(b) exit main exit p

Precise Interprocedural Analysis ret start n ( ) MOMP[n] =  pfp(C) pMatchedPathsTo[n] [Sharir & Pnueli 81]

Representing Dataflow Functions b c Identity Function a b c Constant Function

Representing Dataflow Functions b c “Gen/Kill” Function a b c Non-“Gen/Kill” Function

x y a b start p(a,b) start main if . . . x = 3 b = a p(x,y) p(a,b) return from p return from p printf(y) printf(b) exit main exit p

Composing Dataflow Functions b c a b c a f2  f1({a,c}) = a b c

( ) ( ] YES! NO! x y start p(a,b) a b start main if . . . x = 3 Might y be uninitialized here? Might b be uninitialized here? b = a p(x,y) p(a,b) return from p return from p printf(y) printf(b) exit main exit p

Off Limits! matched  matched matched | (i matched )i 1  i  CallSites | edge |  stack ) ( stack Off Limits!

Off Limits! unbalLeft  matched unbalLeft | (i unbalLeft 1  i  CallSites |  stack ) ( stack Off Limits! (

Interprocedural Dataflow Analysis via CFL-Reachability Graph: Exploded control-flow graph L: L(unbalLeft) Fact d holds at n iff there is an L(unbalLeft)-path from

Asymptotic Running Time [Reps, Horwitz, & Sagiv 95] CFL-reachability Exploded control-flow graph: ND nodes Running time: O(N3D3) Exploded control-flow graph Special structure Running time: O(ED3) Typically: E l N, hence O(ED3) l O(ND3) “Gen/kill” problems: O(ED)

Why Bother? “We’re only interested in million-line programs” Know thy enemy! “Any” algorithm must do these operations Avoid pitfalls (e.g., claiming O(N2) algorithm) The essence of “context sensitivity” Special cases “Gen/kill” problems: O(ED) Compression techniques Basic blocks SSA form, sparse evaluation graphs Demand algorithms

Unifying Conceptual Model for Dataflow-Analysis Literature Linear-time gen-kill [Hecht 76], [Kou 77] Path-constrained DFA [Holley & Rosen 81] Linear-time GMOD [Cooper & Kennedy 88] Flow-sensitive MOD [Callahan 88] Linear-time interprocedural gen-kill [Knoop & Steffen 93] Linear-time bidirectional gen-kill [Dhamdhere 94] Relationship to interprocedural DFA [Sharir & Pneuli 81], [Knoop & Steffen 92]

Themes Harnessing CFL-reachability Exhaustive alg.  Demand alg.

Exhaustive Versus Demand Analysis Exhaustive analysis: All facts at all points Optimization: Concentrate on inner loops Program-understanding tools: Only some facts are of interest

Exhaustive Versus Demand Analysis Does a given fact hold at a given point? Which facts hold at a given point? At which points does a given fact hold? Demand analysis via CFL-reachability single-source/single-target CFL-reachability single-source/multi-target CFL-reachability multi-source/single-target CFL-reachability

All “appropriate” demands x y a b YES! ( ) start p(a,b) “Semi-exhaustive”: All “appropriate” demands start main Might b be uninitialized here? Might y be uninitialized here? if . . . x = 3 b = a p(x,y) p(a,b) return from p return from p printf(y) printf(b) NO! exit main exit p

Experimental Results [Horwitz , Reps, & Sagiv 1995] 53 C programs (200-6,700 lines) For a single fact of interest: demand always better than exhaustive All “appropriate” demands beats exhaustive when percentage of “yes” answers is high Live variables Truly live variables Constant predicates . . .

A Related Result [Sagiv, Reps, & Horwitz 1996] [Uses a generalized analysis technique] 38 C programs (300-6,000 lines) copy-constant propagation linear-constant propagation All “appropriate” demands always beats exhaustive factor of 1.14 to about 6

Exhaustive Versus Demand Analysis Demand algorithms for Interprocedural dataflow analysis Set constraints Points-to analysis

Most Significant Contributions: 1987-2000 Asymptotically fastest algorithms Interprocedural slicing Interprocedural dataflow analysis Demand algorithms Interprocedural dataflow analysis [CC94,FSE95] All “appropriate” demands beats exhaustive Tool for slicing and browsing ANSI C Slices programs as large as 75,000 lines University research distribution Commercial product: CodeSurfer (GrammaTech, Inc.)

References Papers by Reps and collaborators: CFL-reachability http://www.cs.wisc.edu/~reps/ CFL-reachability Yannakakis, M., Graph-theoretic methods in database theory, PODS 90. Reps, T., Program analysis via graph reachability, Inf. and Softw. Tech. 98.

References Slicing, chopping, etc. Dataflow analysis Horwitz, Reps, & Binkley, TOPLAS 90 Reps, Horwitz, Sagiv, & Rosay, FSE 94 Reps & Rosay, FSE 95 Dataflow analysis Reps, Horwitz, & Sagiv, POPL 95 Horwitz, Reps, & Sagiv, FSE 95, TR-1283