Program Analysis via Graph Reachability


Program Analysis via Graph Reachability Thomas Reps University of Wisconsin Joint work with S. Horwitz, M. Sagiv, G. Rosay, D. Melski, M. Benedikt, and P. Godefroid

Timeline (1987-1998): Slicing & Applications; CFL-Reachability; Dataflow Analysis; Demand Algorithms; Structure-Transmitted Dependences; Set Constraints. Year markers on the slide: 1987, 1993, 1994, 1995, 1996, 1997, 1998.

More Recently
- Flow-insensitive points-to analysis
- An undecidability result: context-sensitive structure-transmitted data-dependence analysis
- Model checking of recursive hierarchical finite-state machines
  - “infinite”-state systems
  - CFL-reachability/circularity queries
  - linear-, quadratic-, and cubic-time algorithms

Outline
- Interprocedural slicing
- Interprocedural dataflow analysis
- Demand-driven analysis
- Linear . . . cubic . . . undecidable
- Model-checking of recursive HFSMs

Backward slice with respect to statement “printf("%d\n",i)”
    #include <stdio.h>
    int main() {
        int sum = 0;
        int i = 1;
        while (i < 11) {
            sum = sum + i;
            i = i + 1;
        }
        printf("%d\n", sum);
        printf("%d\n", i);
    }


Slice Extraction: backward slice with respect to statement “printf("%d\n",i)”
    #include <stdio.h>
    int main() {
        int i = 1;
        while (i < 11) {
            i = i + 1;
        }
        printf("%d\n", i);
    }

Forward slice with respect to statement “sum = 0”
    #include <stdio.h>
    int main() {
        int sum = 0;
        int i = 1;
        while (i < 11) {
            sum = sum + i;
            i = i + 1;
        }
        printf("%d\n", sum);
        printf("%d\n", i);
    }


What Are Slices Useful For?
- Understanding programs: what is affected by what?
- Restructuring programs: isolation of separate “computational threads”
- Program specialization and reuse: slices = specialized programs; only reuse needed slices
- Program differencing: compare slices to identify changes
- Testing: what new test cases would improve coverage? What regression tests must be rerun after a change?

Control Flow Graph (sum program)
Nodes: Enter, sum = 0, i = 1, while(i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i)
The while node has a T (true) edge into the loop body and an F (false) edge to printf(sum).

Flow Dependence Graph (sum program)
Flow dependence p → q: the value of a variable assigned at p may be used at q.
Nodes: Enter, sum = 0, i = 1, while(i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i)
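A minimal sketch, in Python, of how the flow-dependence edges of the sum program could be derived: a reaching-definitions fixpoint over the control-flow graph, then pairing each use with the definitions that reach it. The node names, the `CFG`, `DEFS`, and `USES` tables, and the function `flow_dependences` are illustrative assumptions, not taken from the slides.

```python
from collections import defaultdict

# Control-flow graph of the sum program: node -> successors (hand-written).
CFG = {
    "Enter": ["sum = 0"],
    "sum = 0": ["i = 1"],
    "i = 1": ["while (i < 11)"],
    "while (i < 11)": ["sum = sum + i", "printf(sum)"],
    "sum = sum + i": ["i = i + 1"],
    "i = i + 1": ["while (i < 11)"],
    "printf(sum)": ["printf(i)"],
    "printf(i)": [],
}
# Variable defined at each assignment, and variables used at each node.
DEFS = {"sum = 0": "sum", "i = 1": "i", "sum = sum + i": "sum", "i = i + 1": "i"}
USES = {
    "while (i < 11)": {"i"},
    "sum = sum + i": {"sum", "i"},
    "i = i + 1": {"i"},
    "printf(sum)": {"sum"},
    "printf(i)": {"i"},
}


def flow_dependences(cfg, defs, uses):
    """Return the set of edges (p, q): a value assigned at p may be used at q."""
    preds = defaultdict(list)
    for node, succs in cfg.items():
        for succ in succs:
            preds[succ].append(node)

    def reach_in(node, reach_out):
        # Definitions reaching the entry of `node`: union over predecessors.
        return set().union(*(reach_out[p] for p in preds[node]))

    # reach_out[n]: (variable, defining node) pairs live at the exit of n.
    reach_out = {node: set() for node in cfg}
    changed = True
    while changed:
        changed = False
        for node in cfg:
            incoming = reach_in(node, reach_out)
            # A definition at `node` kills other definitions of the same variable.
            out = {(v, d) for (v, d) in incoming if v != defs.get(node)}
            if node in defs:
                out.add((defs[node], node))
            if out != reach_out[node]:
                reach_out[node] = out
                changed = True

    edges = set()
    for node in cfg:
        for var, definer in reach_in(node, reach_out):
            if var in uses.get(node, set()):
                edges.add((definer, node))
    return edges


FLOW_EDGES = flow_dependences(CFG, DEFS, USES)
```

In this sketch `sum = 0` reaches `printf(sum)` along the path that skips the loop, so that edge appears, while no definition of `sum` ever produces an edge into `printf(i)`.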

Control Dependence Graph (sum program)
Control dependence p → q labeled T: q is reached from p if condition p is true (T), not otherwise. Similarly, an edge labeled F is used for false.
Nodes: Enter, sum = 0, i = 1, while(i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i); every control-dependence edge in this example is labeled T.

Program Dependence Graph (PDG) (sum program)
The PDG contains both kinds of edges over the same nodes: control dependence and flow dependence.
Nodes: Enter, sum = 0, i = 1, while(i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i)

Program Dependence Graph (PDG): opposite order, same PDG
    #include <stdio.h>
    int main() {
        int i = 1;
        int sum = 0;
        while (i < 11) {
            sum = sum + i;
            i = i + 1;
        }
        printf("%d\n", sum);
        printf("%d\n", i);
    }
Swapping the two initializations leaves the PDG unchanged.

Backward Slice (steps 1-4)
[Figure, four animation steps over the PDG of the sum program. Nodes: Enter, sum = 0, i = 1, while(i < 11), sum = sum + i, i = i + 1, printf(sum), printf(i)]

Slice Extraction
    #include <stdio.h>
    int main() {
        int i = 1;
        while (i < 11) {
            i = i + 1;
        }
        printf("%d\n", i);
    }
Remaining PDG nodes: Enter, i = 1, while(i < 11), i = i + 1, printf(i)
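The backward slice on these slides amounts to reverse reachability over the PDG. A minimal Python sketch, assuming the control- and flow-dependence edges of the sum program are written out by hand (the edge lists and function names below are illustrative, not taken from the slides):

```python
from collections import deque

# Control-dependence edges of the sum program (hand-written assumption).
CONTROL = [
    ("Enter", "sum = 0"), ("Enter", "i = 1"), ("Enter", "while (i < 11)"),
    ("Enter", "printf(sum)"), ("Enter", "printf(i)"),
    ("while (i < 11)", "sum = sum + i"), ("while (i < 11)", "i = i + 1"),
]
# Flow-dependence edges: value assigned at the source may be used at the target.
FLOW = [
    ("sum = 0", "sum = sum + i"), ("sum = 0", "printf(sum)"),
    ("sum = sum + i", "sum = sum + i"), ("sum = sum + i", "printf(sum)"),
    ("i = 1", "while (i < 11)"), ("i = 1", "sum = sum + i"),
    ("i = 1", "i = i + 1"), ("i = 1", "printf(i)"),
    ("i = i + 1", "while (i < 11)"), ("i = i + 1", "sum = sum + i"),
    ("i = i + 1", "i = i + 1"), ("i = i + 1", "printf(i)"),
]
PDG = CONTROL + FLOW


def reachable(edges, start):
    """Breadth-first search: all nodes reachable from `start`, including it."""
    succ = {}
    for src, dst in edges:
        succ.setdefault(src, []).append(dst)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in succ.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def backward_slice(edges, criterion):
    """Nodes that may affect `criterion`: reachability over reversed edges."""
    return reachable([(dst, src) for src, dst in edges], criterion)


def forward_slice(edges, criterion):
    """Nodes that `criterion` may affect: reachability over the edges as given."""
    return reachable(edges, criterion)
```

Under these assumptions, `backward_slice(PDG, "printf(i)")` yields Enter, `i = 1`, the while node, `i = i + 1`, and `printf(i)`, which are the nodes kept by slice extraction; `forward_slice(PDG, "sum = 0")` yields `sum = 0`, `sum = sum + i`, and `printf(sum)`.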

CodeSurfer

Interprocedural Slice: backward slice with respect to statement “printf("%d\n",i)”
    #include <stdio.h>
    int add(int x, int y);
    int main() {
        int sum = 0;
        int i = 1;
        while (i < 11) {
            sum = add(sum, i);
            i = add(i, 1);
        }
        printf("%d\n", sum);
        printf("%d\n", i);
    }
    int add(int x, int y) {
        return x + y;
    }


Interprocedural Slice (same program, with main calling add)
Superfluous components are included by Weiser’s slicing algorithm [TSE 84].
They are left out by the algorithm of Horwitz, Reps, & Binkley [PLDI 88; TOPLAS 90].

System Dependence Graph (SDG)
[Figure: SDG with nodes Enter main, two Call p sites, and Enter p]

SDG for the Sum Program
main: Enter main, sum = 0, i = 1, while(i < 11), printf(sum), printf(i), and two Call add nodes
First call site (sum = add(sum,i)): xin = sum, yin = i, sum = xout
Second call site (i = add(i,1)): xin = i, yin = 1, i = xout
add: Enter add, x = xin, y = yin, x = x + y, xout = x

Interprocedural Backward Slice (steps 1-7)
[Figure, animation over an SDG with nodes Enter main, two Call p sites, and Enter p. In step 6 the edges into and out of p are labeled with parentheses: ( and ) for one call site, [ and ] for the other.]

Matched-Parenthesis Path
[Figure: a path through the SDG with the edge labels ) ( ) [ marked on it]

Slice Extraction
[Figure: remaining SDG nodes Enter main, Call p, Enter p]

Slice of the Sum Program
main: Enter main, i = 1, while(i < 11), printf(i), Call add with xin = i, yin = 1, i = xout
add: Enter add, x = xin, y = yin, x = x + y, xout = x
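One way to see why only the second call site survives is the two-phase traversal associated with Horwitz, Reps, & Binkley. The Python sketch below assumes the SDG of the sum program is written out by hand, including the summary edges (which a real implementation would compute); the node names, edge kinds, and function names are assumptions of this sketch.

```python
def edges_of(kind, pairs):
    """Tag each (src, dst) pair with an edge kind."""
    return [(src, dst, kind) for src, dst in pairs]


SDG = (
    edges_of("control", [
        ("Enter main", "sum = 0"), ("Enter main", "i = 1"),
        ("Enter main", "while (i < 11)"),
        ("Enter main", "printf(sum)"), ("Enter main", "printf(i)"),
        ("while (i < 11)", "Call add #1"), ("while (i < 11)", "Call add #2"),
        ("Call add #1", "xin = sum"), ("Call add #1", "yin = i"),
        ("Call add #1", "sum = xout"),
        ("Call add #2", "xin = i"), ("Call add #2", "yin = 1"),
        ("Call add #2", "i = xout"),
        ("Enter add", "x = xin"), ("Enter add", "y = yin"),
        ("Enter add", "x = x + y"), ("Enter add", "xout = x"),
    ])
    + edges_of("flow", [
        ("sum = 0", "xin = sum"), ("sum = 0", "printf(sum)"),
        ("sum = xout", "xin = sum"), ("sum = xout", "printf(sum)"),
        ("i = 1", "while (i < 11)"), ("i = 1", "yin = i"),
        ("i = 1", "xin = i"), ("i = 1", "printf(i)"),
        ("i = xout", "while (i < 11)"), ("i = xout", "yin = i"),
        ("i = xout", "xin = i"), ("i = xout", "printf(i)"),
        ("x = xin", "x = x + y"), ("y = yin", "x = x + y"),
        ("x = x + y", "xout = x"),
    ])
    + edges_of("call", [("Call add #1", "Enter add"), ("Call add #2", "Enter add")])
    + edges_of("param-in", [("xin = sum", "x = xin"), ("yin = i", "y = yin"),
                            ("xin = i", "x = xin"), ("yin = 1", "y = yin")])
    + edges_of("param-out", [("xout = x", "sum = xout"), ("xout = x", "i = xout")])
    # Summary edges: actual-in -> actual-out dependences through the callee.
    + edges_of("summary", [("xin = sum", "sum = xout"), ("yin = i", "sum = xout"),
                           ("xin = i", "i = xout"), ("yin = 1", "i = xout")])
)


def reach_backward(edges, start, skip_kinds):
    """Backward reachability from `start`, ignoring edges whose kind is skipped."""
    preds = {}
    for src, dst, kind in edges:
        if kind not in skip_kinds:
            preds.setdefault(dst, []).append(src)
    seen = set(start)
    work = list(start)
    while work:
        node = work.pop()
        for pred in preds.get(node, []):
            if pred not in seen:
                seen.add(pred)
                work.append(pred)
    return seen


def two_phase_backward_slice(edges, criterion):
    # Phase 1: stay in the criterion's procedure and its callers; summary
    # edges step across calls, so param-out edges are not followed.
    phase1 = reach_backward(edges, {criterion}, {"param-out"})
    # Phase 2: descend into callees, but never climb back up to a caller.
    return reach_backward(edges, phase1, {"call", "param-in"})


SLICE = two_phase_backward_slice(SDG, "printf(i)")
NAIVE = reach_backward(SDG, {"printf(i)"}, set())
```

With these edges, `SLICE` contains the thirteen nodes listed on the slide, whereas the naive closure `NAIVE` also pulls in `sum = 0` and the first call site by entering `add` from one call and leaving it through the other.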

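CFL-Reachability [Yannakakis 90]
G: graph whose edges are labeled
L: a context-free language over the edge labels
L-path from s to t iff the labels along some path from s to t spell a word $\alpha \in L$
Running time: $O(N^3)$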

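Ordinary Graph Reachability vs. CFL-Reachability
$\mathit{matched} \to \mathit{matched}\ \mathit{matched} \mid (\ \mathit{matched}\ ) \mid [\ \mathit{matched}\ ] \mid e \mid \varepsilon$
[Figure: graphs from s to t; for ordinary graph reachability the edges are unlabeled, for CFL-reachability they are labeled e, (, ), [, ]]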

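CFL-Reachability via Dynamic Programming
Grammar: $A \to B\ C$
Graph: wherever a B-edge is followed by a C-edge, add an A-edge from the source of the B-edge to the target of the C-edge.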
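The edge-adding rule on this slide can be turned into a simple worklist procedure. The Python sketch below assumes the grammar has been normalized so that every right-hand side has at most two symbols; it scans all known edges on each step, so it illustrates the idea rather than the cubic-time data structures. The function name, the helper nonterminals `P1` and `P2`, and the example graph are assumptions of this sketch.

```python
def cfl_reachability(nodes, edges, eps, unary, binary):
    """Close a labeled graph under a normalized context-free grammar.

    edges:  iterable of (u, label, v)
    eps:    nonterminals A with a production A -> epsilon
    unary:  pairs (A, B) for productions A -> B
    binary: triples (A, B, C) for productions A -> B C
    Returns the set of all derived edges (u, symbol, v).
    """
    facts = set()
    work = []

    def add(u, symbol, v):
        if (u, symbol, v) not in facts:
            facts.add((u, symbol, v))
            work.append((u, symbol, v))

    for u, label, v in edges:
        add(u, label, v)
    for a in eps:
        for n in nodes:
            add(n, a, n)  # the empty path from n to n

    while work:
        u, sym, v = work.pop()
        for a, b in unary:
            if b == sym:
                add(u, a, v)
        for a, b, c in binary:
            if b == sym:  # popped edge is the B part; look for C-edges out of v
                for x, label, w in list(facts):
                    if x == v and label == c:
                        add(u, a, w)
            if c == sym:  # popped edge is the C part; look for B-edges into u
                for x, label, w in list(facts):
                    if w == u and label == b:
                        add(x, a, v)
    return facts


# matched -> matched matched | ( matched ) | [ matched ] | e | epsilon,
# normalized with helpers P1 -> M ")" and P2 -> M "]" (M stands for matched).
EPS = {"M"}
UNARY = [("M", "e")]
BINARY = [("M", "M", "M"),
          ("M", "(", "P1"), ("P1", "M", ")"),
          ("M", "[", "P2"), ("P2", "M", "]")]

# Two call sites sharing one procedure body (entry --e--> exit).
NODES = ["a1", "a2", "entry", "exit", "r1", "r2"]
EDGES = [("a1", "(", "entry"), ("exit", ")", "r1"),
         ("a2", "[", "entry"), ("exit", "]", "r2"),
         ("entry", "e", "exit")]

FACTS = cfl_reachability(NODES, EDGES, EPS, UNARY, BINARY)
```

In the example, `("a1", "M", "r1")` and `("a2", "M", "r2")` are derived, but `("a1", "M", "r2")` is not, although `r2` is reachable from `a1` in the ordinary sense.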

Interprocedural Slicing via CFL-Reachability
Graph: system dependence graph
L: L(matched)
Node m is in the slice w.r.t. n iff there is an L(matched)-path from m to n

Asymptotic Running Time [Reps, Horwitz, Sagiv, & Rosay 94]
CFL-reachability
  System dependence graph: N nodes, E edges
  Running time: $O(N^3)$
System dependence graph
  Special structure
  Running time: $O(E + \mathit{CallSites} \times \mathit{MaxParams}^3)$

Outline
- Interprocedural slicing
- Interprocedural dataflow analysis
- Demand-driven analysis
- Linear . . . cubic . . . undecidable
- Model-checking of recursive HFSMs

Precise Intraprocedural Analysis
[Figure: paths from start to a node n]

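[Figure: control-flow graphs of the example program. main: start main, x = 3, p(x,y), return from p, printf(y), exit main. p(a,b): start p(a,b), if . . ., b = a, p(a,b), return from p, printf(b), exit p. Call and return edges are labeled with the parentheses ( ) and [ ].]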

Precise Interprocedural Analysis [Sharir & Pnueli 81]
[Figure: a path from start to n that passes through a call edge ( and the matching return edge ) at ret]

Representing Dataflow Functions
[Figure: graphs over the facts a, b, c representing the identity function and a constant function]

Representing Dataflow Functions
[Figure: graphs over the facts a, b, c representing a “gen/kill” function and a non-“gen/kill” function]

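[Figure: exploded graph of the example program, with facts x, y in main and a, b in p. main: start main, x = 3, p(x,y), return from p, printf(y), exit main. p(a,b): start p(a,b), if . . ., b = a, p(a,b), return from p, printf(b), exit p.]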

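Composing Dataflow Functions
[Figure: two graphs over the facts a, b, c placed end to end; following paths through both gives the graph of the composed function]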
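A sketch of the graph representation, assuming dataflow functions that distribute over set union on a finite set of facts. Each function becomes a relation over the facts plus one extra node, written `"0"` here, and composing functions becomes composing relations. The two example functions `f` and `g` and all names below are illustrative assumptions; the exact functions drawn on the slides are not recoverable from the transcript.

```python
from itertools import combinations

ZERO = "0"  # extra node: its outgoing edges mark facts generated from nothing


def represent(func, domain):
    """Edges (x, y) of the graph representing a distributive function."""
    base = func(frozenset())
    rel = {(ZERO, ZERO)}
    rel |= {(ZERO, y) for y in base}
    for x in domain:
        rel |= {(x, y) for y in func(frozenset({x})) if y not in base}
    return rel


def apply_rel(rel, facts):
    """Interpret a relation as a function on sets of facts."""
    sources = set(facts) | {ZERO}
    return {y for (x, y) in rel if x in sources and y != ZERO}


def compose(first, second):
    """Relation for applying `first`, then `second` (path concatenation)."""
    return {(x, z) for (x, y) in first for (y2, z) in second if y == y2}


def all_subsets(domain):
    items = sorted(domain)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield set(combo)


D = {"a", "b", "c"}


def f(facts):
    """A gen/kill function: kill b, generate c."""
    return (set(facts) - {"b"}) | {"c"}


def g(facts):
    """Not gen/kill: b holds afterwards exactly when a held before."""
    facts = set(facts)
    return facts | {"b"} if "a" in facts else facts - {"b"}


F_THEN_G = compose(represent(f, D), represent(g, D))
```

For every subset `X` of `D`, `apply_rel(F_THEN_G, X)` agrees with `g(f(X))`, which is the property that lets reachability in the exploded graph stand in for function composition.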

[Figure: exploded graph of the example program, with facts x, y in main and a, b in p]
Might y be uninitialized here? YES! (reached along a path with matching parentheses, ( . . . ))
Might b be uninitialized here? NO! (reached only along a path with mismatched parentheses, ( . . . ])

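Off Limits!
$\mathit{matched} \to \mathit{matched}\ \mathit{matched} \mid (_i\ \mathit{matched}\ )_i \mid \mathit{edge} \mid \varepsilon \qquad (1 \le i \le \mathit{CallSites})$
[Figure: call stack; a path whose return edge ) does not match the pending call edge ( is off limits]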

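Off Limits!
$\mathit{unbalLeft} \to \mathit{matched}\ \mathit{unbalLeft} \mid (_i\ \mathit{unbalLeft} \mid \varepsilon \qquad (1 \le i \le \mathit{CallSites})$
[Figure: call stack; calls ( that have not yet returned may remain on the stack, but mismatched returns are still off limits]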
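Membership of a single path in L(matched) or L(unbalLeft) can be checked with an explicit stack, which is a convenient way to test intuitions about the two grammars. In this Python sketch a path is a list whose items are either the string `"edge"` or a pair `("call", i)` / `("ret", i)` for call site `i`; this encoding and the function name `classify` are assumptions of the sketch.

```python
def classify(path):
    """Return (in_matched, in_unbal_left) for a sequence of edge labels."""
    stack = []
    for label in path:
        if label == "edge":
            continue  # intraprocedural edges never affect the stack
        kind, site = label
        if kind == "call":
            stack.append(site)
        elif kind == "ret":
            # A return must match the most recent pending call.
            if not stack or stack.pop() != site:
                return (False, False)
        else:
            raise ValueError(f"unknown label: {label!r}")
    # matched: every call has returned; unbalLeft: pending calls are allowed.
    return (not stack, True)
```

Every matched path is also an unbalLeft path, so the first component being true implies the second. A path such as `[("call", 1), "edge"]` is in L(unbalLeft) only, and `[("call", 1), ("ret", 2)]` is in neither language.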

Interprocedural Dataflow Analysis via CFL-Reachability
Graph: exploded control-flow graph
L: L(unbalLeft)
Fact d holds at n iff there is an L(unbalLeft)-path from $\langle \mathit{start}_{\mathit{main}}, 0 \rangle$ to $\langle n, d \rangle$

Asymptotic Running Time [Reps, Horwitz, & Sagiv 95]
CFL-reachability
  Exploded control-flow graph: ND nodes
  Running time: $O(N^3D^3)$
Exploded control-flow graph
  Special structure
  Running time: $O(ED^3)$
  Typically: $E \approx N$, hence $O(ED^3) \approx O(ND^3)$
  “Gen/kill” problems: $O(ED)$

Outline
- Interprocedural slicing
- Interprocedural dataflow analysis
- Demand-driven analysis
- Linear . . . cubic . . . undecidable
- Model-checking of recursive HFSMs

Exhaustive Versus Demand Analysis
- Exhaustive analysis: all facts at all points
- Optimization: concentrate on inner loops
- Program-understanding tools: only some facts are of interest
- Demand analysis:
  - Does a given fact hold at a given point?
  - Which facts hold at a given point?
  - At which points does a given fact hold?

“Semi-exhaustive”: All “appropriate” demands
[Figure: exploded graph of the example program, with facts x, y in main and a, b in p]
Might y be uninitialized here? YES!
Might b be uninitialized here? NO!

Experimental Results [Horwitz, Reps, & Sagiv 1995]
- 53 C programs (200-6,700 lines)
- For a single fact of interest: demand algorithm always better than exhaustive algorithm
- All “appropriate” demands beats exhaustive when percentage of “yes” answers is high
  - Live variables
  - Truly live variables
  - Constant predicates
  - . . .

Outline
- Interprocedural slicing
- Interprocedural dataflow analysis
- Demand-driven analysis
- Linear . . . cubic . . . undecidable
- Model-checking of recursive HFSMs

Structure-Transmitted Dependences [Reps 1995]
McCarthy’s equations: car(cons(x,y)) = x, cdr(cons(x,y)) = y
w = cons(x,y); v = car(w);
[Figure: dependence graph over the nodes x, y, w, v]
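McCarthy’s equations suggest treating construction and selection like a second kind of parenthesis: `cons` pushes a selector and `car` or `cdr` pops it again. The Python sketch below searches over (node, selector stack) states and bounds the stack depth so that the search terminates; the edge encoding, the `max_depth` bound, and the function name are assumptions of this sketch, and it covers only the intraprocedural case.

```python
def depends_on(edges, source, target, max_depth=8):
    """True if the value of `source` can flow unchanged into `target`.

    edges: list of (src, dst, label), where label is None for a plain
    dependence, ("push", sel) for construction (the value is stored under
    selector `sel`), or ("pop", sel) for selection of field `sel`.
    """
    start = (source, ())
    seen = {start}
    work = [start]
    while work:
        node, stack = work.pop()
        if node == target and not stack:
            return True  # every construction was undone by a matching selection
        for src, dst, label in edges:
            if src != node:
                continue
            if label is None:
                nxt = stack
            elif label[0] == "push":
                if len(stack) >= max_depth:
                    continue  # bound chosen for this sketch
                nxt = stack + (label[1],)
            else:  # ("pop", sel): only legal if sel is on top of the stack
                if not stack or stack[-1] != label[1]:
                    continue
                nxt = stack[:-1]
            state = (dst, nxt)
            if state not in seen:
                seen.add(state)
                work.append(state)
    return False


# w = cons(x, y); v = car(w);
EDGES = [
    ("x", "w", ("push", "hd")),  # x becomes the head of w
    ("y", "w", ("push", "tl")),  # y becomes the tail of w
    ("w", "v", ("pop", "hd")),   # car selects the head
]
```

Here `depends_on(EDGES, "x", "v")` holds because `hd` is pushed and then popped, while `depends_on(EDGES, "y", "v")` does not, matching car(cons(x,y)) = x.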

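Dependences + Matched Paths?
[Figure: SDG with nodes Enter main, w = cons(x,y), two Call p sites passing w, Enter p with parameter w, and v = car(w); edges are labeled both with the selectors $hd$ and $hd^{-1}$ and with the call/return parentheses ( and )]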

Undecidable! [Reps, TOPLAS 00]
Interleaved parentheses: $hd$ / $hd^{-1}$ together with ( / )

Outline
- Interprocedural slicing
- Interprocedural dataflow analysis
- Demand-driven analysis
- Linear . . . cubic . . . undecidable
- Model-checking of recursive HFSMs

Model-Checking of Recursive HFSMs [Benedikt, Godefroid, & Reps (in prep.)]
- Non-recursive HFSMs [Alur & Yannakakis 98]
- Ordinary FSMs: T-reachability/circularity queries
- Recursive HFSMs: matched-parenthesis T-reachability/circularity
- Key observation: linear-time algorithms for matched-parenthesis T-reachability/cyclicity
  - Single-entry/multi-exit [or multi-entry/single-exit]
  - Deterministic, multi-entry/multi-exit

T-Cyclicity in Hierarchical Kripke Structures
First table:
  SN/SX: non-rec: $O(|k|)$; rec: $O(|k|^3)$
  SN/MX: non-rec: $O(|k|)$; rec: ?
  MN/SX: ?
  MN/MX: ?
Second table:
  SN/SX: $O(|k|)$
  SN/MX: $O(|k|)$
  MN/SX: $O(|k|)$
  MN/MX: $O(|k|^3)$; $O(|k|^2)$ [lin rec]; $O(|k|)$ [det]

Recursive HFSMs: Data Complexity
        SN/SX                               SN/MX                       MN/SX   MN/MX
LTL     non-rec: $O(|k|)$; rec: P-time      non-rec: $O(|k|)$; rec: ?   ?       ?
CTL     $O(|k|)$                            bad                         ?       bad
CTL*    $O(|k|^2)$ [L2]                     bad                         ?       bad

Recursive HFSMs: Data Complexity
        SN/SX      SN/MX      MN/SX      MN/MX
LTL     $O(|k|)$   $O(|k|)$   $O(|k|)$   $O(|k|^3)$; $O(|k|^2)$ [lin rec]; $O(|k|)$ [det]
CTL     $O(|k|)$   bad        $O(|k|)$   bad
CTL*    $O(|k|)$   bad        $O(|k|)$   bad
Not Dual Problems!

CFL-Reachability: Scope of Applicability
- Static analysis: slicing, DFA, structure-transmitted dep., points-to analysis
- Formal-language theory: CF-, 2DPDA-, 2NPDA-recognition; attribute-grammar analysis
- Verification: model-checking recursive HFSMs; “ping-pong” protocols [Dolev, Even, & Karp 83]

CFL-Reachability: Benefits
- Algorithms: demand & exhaustive
- Complexity
  - Linear-, quadratic-, cubic-time algorithms
  - PTIME-completeness
  - Variants that are undecidable
- Complementary to: equations, set constraints, types, . . .

But . . .
Model checking
- Huge graphs ($10^{100}$ reachable states)
- Reachability/circularity queries
- Represent implicitly (OBDDs)
Dataflow analysis
- Large graphs, e.g., Stmts × Vars ($10^{11}$)
- CFL-reachability queries [Reps, Horwitz, Sagiv 95]
- OBDDs blew up [Siff & Reps 95 (unpub.)] . . . yes, we tried the usual tricks . . .

Most Significant Contributions: 1987-2000
- Asymptotically fastest algorithms
  - Interprocedural slicing
  - Interprocedural dataflow analysis
- Demand algorithms
  - All “appropriate” demands may beat exhaustive
- Tool for slicing and browsing ANSI C
  - Slices programs as large as 60,000 lines
  - University research distribution
  - Commercial product: CodeSurfer (GrammaTech, Inc.)
- CFL-reachability as unifying conceptual model
  - [Kou 77], [Holley & Rosen 81], [Cooper & Kennedy 88], [Callahan 88], [Horwitz, Reps, & Binkley 88], . . .
  - Identifies fundamental bottlenecks (e.g., cubic-time “barrier”)