Program Analysis and Verification, Spring 2016
Lecture 18: Interprocedural Analysis
Roman Manevich, Ben-Gurion University
Syllabus
Program Verification: Operational semantics; Hoare Logic; Applying Hoare Logic; Weakest Precondition Calculus; Proving Termination; Data structures; Automated Verification
Program Analysis Basics: From Hoare Logic to Static Analysis; Control Flow Graphs; Equation Systems; Collecting Semantics; Using Soot
Abstract Interpretation fundamentals: Lattices; Fixed-Points; Chaotic Iteration; Galois Connections; Domain constructors; Widening/Narrowing
Analysis Techniques: Numerical Domains; Pointer analysis; Shape Analysis; Interprocedural Analysis
Previously: shape analysis, TVLA
Today: handling procedures
Analyzing Procedures
Interprocedural Analysis
Until now we assumed there are no procedures.
Intraprocedural analysis: analyzes one procedure at a time. A call x = foo(…) is interpreted by forgetting everything about x. This is a very conservative analysis, and it does not handle global variables or the heap, which the callee may also modify.
Interprocedural analysis: uses the calling relationships among procedures, which enables more precise analysis information.
Interprocedural analysis
[Figure: foo() calls bar()]
The effect of calling a procedure is the effect of executing its body: a big step.
Interprocedural analysis
[Figure: foo() calls bar()]
Goal: compute the abstract effect of calling a procedure.
Interprocedural analysis challenges
The stack can grow without bound.
Calls and returns must be matched.
Reduction to intraprocedural analysis
Procedure inlining
Naive solution: call-as-goto
Solution attempt #1: inline callees into callers
We end up with one big procedure: the CFGs of individual procedures are duplicated many times.
Good: reduces to intraprocedural analysis, so existing analyses can be used as is. Precise: distinguishes different calls to the same function.
Bad: exponential blow-up of code size. Not efficient: procedure bodies are re-analyzed many times. Doesn’t work with recursion.
Inlining example void main() { int x; x = p(7); x = p(9); } int p(int a) { return a + 1; }
Inlining example
void main() { int x; int ret_p; { int a = 7; ret_p = a + 1; } x = ret_p; /* {x = 8} */ { int a = 9; ret_p = a + 1; } x = ret_p; /* {x = 10} */ }
int p(int a) { return a + 1; }
Inlining: exponential blowup main() { f1(); } f1() { f2(); … fn() { ... }
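The slide elides the procedure bodies. As a purely hypothetical illustration, suppose main calls f1 a fixed number of times (the fan-out) and each fi calls f(i+1) that same number of times. The C sketch below counts how many copies of each body full inlining would create under that assumption. `copies_of` and `inlined_bodies` are names invented for this example.

```c
#include <assert.h>

/* Hypothetical call chain: main calls f_1 `fanout` times, and each f_i
   calls f_{i+1} `fanout` times. After full inlining into main, the body
   of f_i appears fanout^i times. */
static unsigned long long copies_of(int i, unsigned fanout) {
    unsigned long long r = 1;
    for (int j = 0; j < i; j++)
        r *= fanout;
    return r;
}

/* Total number of inlined procedure bodies for the chain f_1 ... f_n. */
static unsigned long long inlined_bodies(int n, unsigned fanout) {
    unsigned long long total = 0;
    for (int i = 1; i <= n; i++)
        total += copies_of(i, fanout);
    return total;
}
```

With n = 10 and a fan-out of 2, this gives 2 + 4 + … + 1024 = 2046 inlined bodies, whereas a fan-out of 1 gives only 10. The blow-up comes from call sites multiplying along the chain, not from the number of procedures.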
Solution attempt #2: build a “supergraph” (an inter-procedural CFG)
Replace each call from P to Q with:
an edge from the point before the call (call point) to Q’s entry point, and
an edge from Q’s exit point to the point after the call (return point).
Add assignments of actual arguments to formal arguments, and an assignment of the return value.
Good: efficient, since the graph of each function is included exactly once in the supergraph. Works for recursive functions (although local variables need additional treatment).
Bad: imprecise, “context-insensitive”. The “unrealizable paths problem”: dataflow facts can propagate along infeasible control paths.
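To make the unrealizable-paths problem concrete, here is a small C sketch for the two-call program from the inlining example. It builds the supergraph as an adjacency matrix and runs plain graph reachability, which ignores call/return matching. The node names (`CALL1`, `RET2`, and so on) are hypothetical labels chosen for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Supergraph nodes for main() { x = p(7); x = p(9); } and int p(int a). */
enum { M_ENTRY, CALL1, RET1, CALL2, RET2, M_EXIT, P_ENTRY, P_EXIT, NUM_NODES };

static bool edge[NUM_NODES][NUM_NODES];

static void build_supergraph(void) {
    memset(edge, 0, sizeof edge);
    /* Intraprocedural edges of main; each call is split into call and return points. */
    edge[M_ENTRY][CALL1] = true;
    edge[RET1][CALL2] = true;
    edge[RET2][M_EXIT] = true;
    /* Body of p, a single step: $$ = a + 1. */
    edge[P_ENTRY][P_EXIT] = true;
    /* Call edges: call point -> entry of p (would also carry a = actual). */
    edge[CALL1][P_ENTRY] = true;
    edge[CALL2][P_ENTRY] = true;
    /* Return edges: exit of p -> return point (would also carry x = $$). */
    edge[P_EXIT][RET1] = true;
    edge[P_EXIT][RET2] = true;
}

/* Plain depth-first reachability: ignores call/return matching. */
static bool reaches(int from, int to) {
    bool seen[NUM_NODES] = { false };
    int stack[NUM_NODES], top = 0;  /* each node is pushed at most once */
    stack[top++] = from;
    seen[from] = true;
    while (top > 0) {
        int n = stack[--top];
        if (n == to) return true;
        for (int m = 0; m < NUM_NODES; m++)
            if (edge[n][m] && !seen[m]) {
                seen[m] = true;
                stack[top++] = m;
            }
    }
    return false;
}
```

After `build_supergraph()`, `reaches(CALL2, RET1)` holds. The facts computed for `p(9)` can therefore flow back to the return point of the first call. This infeasible path is exactly what makes the context-insensitive analysis lose precision.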
Method calls in detail
x=1; y=2; z=0;
L1: z = foo(x, y, z);
assert x==1 && y==2 && z==3
foo(a, b, c) { int i = a+b; return i; }
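Method calls in detail
[Figure: supergraph for the call at L1. The call node (x=1; y=2; z=0; L1: z = foo(x, y, z)) leads, with call context L1, to foo’s entry node. foo(a, b, c) executes int i = a+b; foo_ret = i; return; and its exit node leads, with call context L1, to the return node (assert x==1 && y==2 && z==3).]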
Simple example: CP
int p(int a) { return a + 1; }
void main() { int x ; x = p(7); x = p(9) ; }
Simple example: CP
int p(int a) { [a ↦ 7] return a + 1; }
void main() { int x ; x = p(7); x = p(9) ; }
Simple example: CP
Special returned-value variable: $$
int p(int a) { return a + 1; [a ↦ 7, $$ ↦ 8] }
void main() { int x ; x = p(7); x = p(9) ; }
Simple example: CP
int p(int a) { [a ↦ 7] return a + 1; [a ↦ 7, $$ ↦ 8] }
void main() { int x ; x = p(7); [x ↦ 8] x = p(9) ; }
Simple example: CP
int p(int a) { [a ↦ ⊤] return a + 1; [a ↦ 7, $$ ↦ 8] }
void main() { int x ; x = p(7); [x ↦ 8] x = p(9) ; }
Simple example: CP
int p(int a) { [a ↦ ⊤] return a + 1; [a ↦ ⊤, $$ ↦ ⊤] }
void main() { int x ; x = p(7); [x ↦ 8] x = p(9); }
Simple example: CP
int p(int a) { [a ↦ ⊤] return a + 1; [a ↦ ⊤, $$ ↦ ⊤] }
void main() { int x ; x = p(7) ; [x ↦ ⊤] x = p(9) ; }
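The loss of precision can be reproduced in a few lines of C. The sketch assumes a standard constant-propagation lattice (`BOT`, a constant, `TOP`) and treats the calls as gotos. Both call sites feed a single entry fact for `a`, and the single exit fact of `p` flows back to both return sites. The function names are hypothetical.

```c
#include <assert.h>

/* Constant-propagation lattice: BOT < every constant < TOP. */
typedef enum { BOT, CONST, TOP } Kind;
typedef struct { Kind kind; int c; } Val;

static Val cst(int c) { Val v = { CONST, c }; return v; }

static Val join(Val x, Val y) {
    if (x.kind == BOT) return y;
    if (y.kind == BOT) return x;
    if (x.kind == CONST && y.kind == CONST && x.c == y.c) return x;
    Val top = { TOP, 0 };
    return top;
}

/* Abstract transformer of `$$ = a + 1` (BOT and TOP are preserved). */
static Val plus_one(Val a) { return a.kind == CONST ? cst(a.c + 1) : a; }

/* Call-as-goto: visits the calls p(7) and p(9) in order, merging their
   actuals into p's single entry fact. Because p has a single exit, the
   same $$ value flows back to both return sites; it is returned here. */
static Val call_as_goto_x(int calls_visited) {
    const int actuals[2] = { 7, 9 };
    Val a_entry = { BOT, 0 };
    Val x = { BOT, 0 };
    for (int i = 0; i < calls_visited && i < 2; i++) {
        a_entry = join(a_entry, cst(actuals[i]));
        x = plus_one(a_entry);
    }
    return x;
}
```

After visiting only the first call, the result is 8, as on the intermediate slide. Once `p(9)` is also visited, `a` becomes ⊤ at the entry of `p`, and so does `x` at both return sites.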
A naive interprocedural solution: treat procedure calls as gotos
Pros: simple; usually fast
Cons: abstracts away call/return correlations (context-insensitive); obtains only a conservative solution
Why was the naive solution less precise?
Analysis by reduction:
Call-as-goto: void main() { int x ; x = p(7) ; [x ↦ ⊤] x = p(9) ; }  int p(int a) { [a ↦ ⊤] return a + 1; [a ↦ ⊤, $$ ↦ ⊤] }
Procedure inlining: void main() { int a, x, ret; [a ↦ ⊥, x ↦ ⊥, ret ↦ ⊥] a = 7; ret = a+1; x = ret; [a ↦ 7, x ↦ 8, ret ↦ 8] a = 9; ret = a+1; x = ret; [a ↦ 9, x ↦ 10, ret ↦ 10] }
Stack regime
P() { … R(); }  Q() { … R(); }  R(){ … }
[Figure: P and Q both call R; each return from R goes back to the caller that invoked it.]
Guiding light
Main idea: exploit the stack regime for precision and efficiency.
Simplifying assumptions
Parameters are passed by value
No procedure nesting
No concurrency
Recursion is supported
Unrealizable paths
[Figure: zoo() and foo() each call bar(); an unrealizable path enters bar() from one call and returns to the other caller.]
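IVP: Interprocedural Valid Paths
[Figure: a path f1, f2, …, fk through the supergraph, where the call edge from callq to enterq is labeled “(” and the matching return edge from exitq to ret is labeled “)”.]
IVP: all paths with matching calls and returns, and their prefixes
Valid paths
[Figure: zoo() and foo() each call bar(); the two call edges are labeled (1 and (2, and their matching return edges )1 and )2.]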
Interprocedural valid paths
IVP: the set of paths that start at the program entry and only consider matching calls and returns (aka valid paths).
IVP can be defined via a context-free grammar:
matched ::= matched (i matched )i | ε
valid ::= valid (i matched | matched
By contrast, ordinary paths (ignoring call/return matching) can be defined by a regular expression.
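The grammar corresponds to a simple stack check. The following C sketch assumes an integer encoding of edge labels: +i for (i, -i for )i, and 0 for an intraprocedural edge. It also assumes a nesting bound `MAX_DEPTH`, which is a limit of the sketch only.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEPTH 64  /* sketch limit on call nesting along one path */

/* Returns true iff the label sequence is derivable from `valid`: every )i
   closes the innermost open (i, and open calls may remain (prefixes). */
static bool is_valid_path(const int *labels, int len) {
    int open[MAX_DEPTH];
    int depth = 0;
    for (int k = 0; k < len; k++) {
        int l = labels[k];
        if (l > 0) {
            assert(depth < MAX_DEPTH);
            open[depth++] = l;               /* (i : enter a callee */
        } else if (l < 0) {
            if (depth == 0 || open[depth - 1] != -l)
                return false;                /* )i without a matching (i */
            depth--;                         /* )i : return to the caller */
        }
        /* l == 0: intraprocedural edge, no effect on matching */
    }
    return true;  /* unmatched open calls are allowed */
}
```

The sequence (1 )2, which enters bar() from one call site and returns to the other, is rejected. Such a sequence is exactly an unrealizable path in the supergraph.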
Sharir and Pnueli ‘82
Call string approach: blend interprocedural flow with intraprocedural flow; tag every dataflow fact with its call history.
Functional approach: determine the effect of a procedure (e.g., an in/out map); construct abstract transformers for the entire procedure, on-the-fly. The functional approach is very similar to Cousot & Cousot.
The call string approach
Abstract domain used for intraprocedural analysis: \(A = (D_A, \sqsubseteq_A, \sqcup_A, \sqcap_A, \bot_A, \top_A)\)
\(Ctx\) = calling contexts in the program = \(\{c_1, c_2, \ldots\}\): the program locations of method call statements
\(CtxStr = Ctx^*\): intuitively, represents all possible call stacks; a flat abstract domain (different call strings are incomparable)
Abstract transformers: \([\![C: \text{call } f()]\!]^\# = ?\) and \([\![\text{return from } C: \text{call } f()]\!]^\# = ?\)
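Here is one common way to fill in the two question marks, sketched in C under stated assumptions. Call sites are encoded as single characters, so "13" stands for c1 c3. Only the call-string component is shown; a full transformer would also apply the argument and return-value assignments to the \(D_A\) component. `MAX_CS` is a buffer bound of this sketch, not a k-limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CS 32  /* buffer bound of the sketch, not a k-limit */

/* [[c: call f()]]#: the callee's fact is tagged with cs . c */
static void on_call(char out[MAX_CS], const char *cs, char c) {
    size_t n = strlen(cs);
    assert(n + 1 < MAX_CS);
    memcpy(out, cs, n);
    out[n] = c;
    out[n + 1] = '\0';
}

/* [[return from c: call f()]]#: a fact at the callee's exit tagged cs flows
   to the return point of c only if cs ends with c; the caller's call string
   is cs without its last element. Returns false when the fact must not
   flow (its contribution is bottom). */
static bool on_return(char out[MAX_CS], const char *cs, char c) {
    size_t n = strlen(cs);
    if (n == 0 || cs[n - 1] != c) return false;
    memcpy(out, cs, n - 1);
    out[n - 1] = '\0';
    return true;
}
```

Consider a fact at a callee’s exit tagged c1 c3. It flows back only to the return point of c3, with caller call string c1, and is blocked at every other return site. This is how call strings avoid unrealizable paths.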
Call strings abstract domain
Abstract domain used for intraprocedural analysis: \(A = (D_A, \sqsubseteq_A, \sqcup_A, \sqcap_A, \bot_A, \top_A)\)
\(Ctx\) = calling contexts in the program = \(\{c_1, c_2, \ldots\}\): the program locations of method call statements
\(CtxStr = Ctx^*\): intuitively, represents all possible call stacks
\(IPA = (CtxStr \to D_A, \sqsubseteq, \sqcup, \sqcap, \bot, \top)\)
\((c_1c_2c_3, d_1) \sqcup (c_1c_2c_3, d_2) = (c_1c_2c_3, d_1 \sqcup_A d_2)\)
Does it obey the ascending chain condition? Use chaotic iterations over the supergraph.
Extending analysis with call strings
Abstract domain used for intraprocedural analysis: \(A = (D_A, \sqsubseteq_A, \sqcup_A, \sqcap_A, \bot_A, \top_A)\)
Construct an abstract domain associating a single abstract state with each call string: \(IPA = (CtxStr \to D_A, \sqsubseteq, \sqcup, \sqcap, \bot, \top)\)
\((c_1c_2c_3, d_1) \sqcup (c_1c_2c_3, d_2) = (c_1c_2c_3, d_1 \sqcup_A d_2)\)
\((c_1c_2c_3, d_1) \sqcup (c_1c_2c_4, d_2) = \{(c_1c_2c_3, d_1), (c_1c_2c_4, d_2)\}\)
Use chaotic iterations over the supergraph. Does it obey the ascending chain condition?
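A minimal C sketch of the join in IPA follows. It assumes constant propagation as the intraprocedural domain \(D_A\) and encodes call strings as digit strings ("123" for \(c_1c_2c_3\)). An IPA element is stored as a finite list of (call string, value) pairs, with absent call strings implicitly ⊥. `IpaState`, `ipa_join_pair`, and `ipa_get` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* D_A: constant-propagation lattice (BOT < constants < TOP). */
typedef enum { BOT, CONST, TOP } Kind;
typedef struct { Kind kind; int c; } Val;

static Val cst(int c) { Val v = { CONST, c }; return v; }

static Val join_a(Val x, Val y) {
    if (x.kind == BOT) return y;
    if (y.kind == BOT) return x;
    if (x.kind == CONST && y.kind == CONST && x.c == y.c) return x;
    Val top = { TOP, 0 };
    return top;
}

/* IPA element: a map CtxStr -> D_A stored as (call string, value) pairs. */
#define MAX_ENTRIES 16
typedef struct { char cs[16]; Val d; } Entry;
typedef struct { Entry e[MAX_ENTRIES]; int n; } IpaState;

/* s := s join {(cs, d)}: an equal call string joins in D_A;
   a new call string is kept as a separate, incomparable entry. */
static void ipa_join_pair(IpaState *s, const char *cs, Val d) {
    for (int i = 0; i < s->n; i++) {
        if (strcmp(s->e[i].cs, cs) == 0) {
            s->e[i].d = join_a(s->e[i].d, d);
            return;
        }
    }
    assert(s->n < MAX_ENTRIES && strlen(cs) < sizeof s->e[0].cs);
    strcpy(s->e[s->n].cs, cs);
    s->e[s->n].d = d;
    s->n++;
}

/* Value associated with cs; call strings not in the map denote BOT. */
static Val ipa_get(const IpaState *s, const char *cs) {
    for (int i = 0; i < s->n; i++)
        if (strcmp(s->e[i].cs, cs) == 0)
            return s->e[i].d;
    Val bot = { BOT, 0 };
    return bot;
}
```

Joining (c1c2c3, 7) with (c1c2c3, 9) yields a single entry mapped to ⊤, while (c1c2c4, 9) is kept as its own entry. Nothing in this representation bounds the number of distinct call strings.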
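Supergraph
[Figure: supergraph of procedures P, Q, and R with entry nodes sP, sQ, sR, exit nodes eP, eQ, eR, and intermediate nodes n1 to n7. P and Q each contain a call node (“Call R”), connected to R’s entry by a call-to-entry edge fc2e, and a return node, reached from R’s exit by an exit-to-return edge fx2r. Intraprocedural edges are labeled with transformers f1, f2, f3, f5, f6. The slide asks “What happens here?” at two points in the figure.]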
Simple call-string example int p(int a) { return a + 1; } void main() { int x ; c1: x = p(7); c2: x = p(9) ; }
Simple call-string example
int p(int a) { c1: [a ↦ 7] return a + 1; }
void main() { int x ; c1: x = p(7); c2: x = p(9) ; }
Simple call-string example
int p(int a) { c1: [a ↦ 7] return a + 1; c1: [a ↦ 7, $$ ↦ 8] }
void main() { int x ; c1: x = p(7); c2: x = p(9) ; }
Simple call-string example
int p(int a) { c1: [a ↦ 7] return a + 1; c1: [a ↦ 7, $$ ↦ 8] }
void main() { int x ; c1: x = p(7); ε: [x ↦ 8] c2: x = p(9) ; }
Simple call-string example
int p(int a) { c1: [a ↦ 7] c2: [a ↦ 9] return a + 1; c1: [a ↦ 7, $$ ↦ 8] }
void main() { int x ; c1: x = p(7); ε: [x ↦ 8] c2: x = p(9) ; }
Simple call-string example
int p(int a) { c1: [a ↦ 7] c2: [a ↦ 9] return a + 1; c1: [a ↦ 7, $$ ↦ 8] c2: [a ↦ 9, $$ ↦ 10] }
void main() { int x ; c1: x = p(7); ε: [x ↦ 8] c2: x = p(9) ; }
Simple call-string example
int p(int a) { c1: [a ↦ 7] c2: [a ↦ 9] return a + 1; c1: [a ↦ 7, $$ ↦ 8] c2: [a ↦ 9, $$ ↦ 10] }
void main() { int x ; c1: x = p(7); ε: [x ↦ 8] c2: x = p(9) ; ε: [x ↦ 10] }
Enforcing termination
Allowing arbitrary (unbounded) call strings violates the ACC: the analysis will not terminate for recursive procedures.
It is too heavy even without recursion, because the number of possible call strings blows up exponentially.
What can we do?
Abstracting call strings
\(CtxStr = Ctx^*\) can be abstracted into a finite domain in different ways:
Most popular (and what we will talk about in this class): the k-limited suffix abstraction, which keeps only the last k contexts: \([k](c_1 \ldots c_n\, c_{n+1} \ldots c_{n+k}) = c_{n+1} \ldots c_{n+k}\)
Set abstraction: store all contexts in a set, e.g. \(set(c_1, c_2, c_1, c_2, c_3) = \{c_1, c_2, c_3\}\); this forgets the order of calls
Combinations
More generally: abstract using finite automata/regular expressions
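Both abstractions are short functions over call strings. The C sketch below assumes call sites c1 to c9 are encoded as the digits '1' to '9'. `suffix_k` and `set_abs` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* [k](cs): keep only the last k call sites of cs.
   out needs room for min(strlen(cs), k) + 1 chars. */
static void suffix_k(char *out, const char *cs, size_t k) {
    size_t n = strlen(cs);
    strcpy(out, n > k ? cs + (n - k) : cs);
}

/* set(cs): forget order and multiplicity; bit i is set iff c_i occurs. */
static unsigned set_abs(const char *cs) {
    unsigned s = 0;
    for (; *cs != '\0'; cs++)
        s |= 1u << (*cs - '0');
    return s;
}
```

Both map the infinite set \(Ctx^*\) into a finite set. There are at most \(\sum_{i=0}^{k} |Ctx|^i\) suffixes, or \(2^{|Ctx|}\) sets, which is what restores the ascending chain condition.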
Another example (|cs|=2)
void main() { int x ; c1: x = p(7); ε: [x ↦ 16] c2: x = p(9) ; ε: [x ↦ 20] }
int p(int a) { c1: [a ↦ 7] c2: [a ↦ 9] return c3: p1(a + 1); c1: [a ↦ 7, $$ ↦ 16] c2: [a ↦ 9, $$ ↦ 20] }
int p1(int b) { c1.c3: [b ↦ 8] c2.c3: [b ↦ 10] return 2 * b; c1.c3: [b ↦ 8, $$ ↦ 16] c2.c3: [b ↦ 10, $$ ↦ 20] }
Another example (|cs|=1)
void main() { int x ; c1: x = p(7); ε: [x ↦ ⊤] c2: x = p(9) ; }
int p(int a) { c1: [a ↦ 7] c2: [a ↦ 9] return c3: p1(a + 1); c1: [a ↦ 7, $$ ↦ ⊤] c2: [a ↦ 9, $$ ↦ ⊤] }
int p1(int b) { (c1|c2)c3: [b ↦ ⊤] return 2 * b; (c1|c2)c3: [b ↦ ⊤, $$ ↦ ⊤] }
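Both examples can be replayed with a small C sketch. It assumes a constant-propagation lattice, call sites encoded as digits ("13" for c1.c3), and k-limiting applied at every call. Because this program has no recursion, a single pass over the call sites in call order reaches the fixpoint. A general implementation would run chaotic iteration over the supergraph instead. All names are hypothetical.

```c
#include <assert.h>
#include <string.h>

typedef enum { BOT, CONST, TOP } Kind;   /* BOT < constants < TOP */
typedef struct { Kind kind; int c; } Val;

static Val cst(int c) { Val v = { CONST, c }; return v; }

static Val join(Val x, Val y) {
    if (x.kind == BOT) return y;
    if (y.kind == BOT) return x;
    if (x.kind == CONST && y.kind == CONST && x.c == y.c) return x;
    Val top = { TOP, 0 };
    return top;
}

/* Map from k-limited call strings to values; absent entries are BOT. */
typedef struct { char cs[8]; Val v; } Entry;
typedef struct { Entry e[8]; int n; } Table;

static Val *slot(Table *t, const char *cs) {   /* find, or add as BOT */
    for (int i = 0; i < t->n; i++)
        if (strcmp(t->e[i].cs, cs) == 0) return &t->e[i].v;
    assert(t->n < 8);
    strcpy(t->e[t->n].cs, cs);
    t->e[t->n].v.kind = BOT;
    return &t->e[t->n++].v;
}

/* Call transformer: append call site c and keep only the last k sites. */
static void push_k(char out[8], const char *cs, char c, int k) {
    char tmp[8];
    size_t n = strlen(cs);
    memcpy(tmp, cs, n);
    tmp[n] = c;
    tmp[n + 1] = '\0';
    strcpy(out, n + 1 > (size_t)k ? tmp + (n + 1 - (size_t)k) : tmp);
}

/* main() { c1: x = p(7); c2: x = p(9); }
   int p(int a) { return c3: p1(a + 1); }   int p1(int b) { return 2 * b; }
   Stores the abstract value of x after c1 and after c2 in x_after. */
static void analyze(int k, Val x_after[2]) {
    Table p_entry = { .n = 0 }, p1_entry = { .n = 0 };
    const char sites[2] = { '1', '2' };
    const int actuals[2] = { 7, 9 };
    char cs_p[8], cs_p1[8];

    for (int i = 0; i < 2; i++) {          /* call edges main -> p: a = actual */
        push_k(cs_p, "", sites[i], k);
        Val *a = slot(&p_entry, cs_p);
        *a = join(*a, cst(actuals[i]));
    }
    for (int i = 0; i < p_entry.n; i++) {  /* call edges p -> p1 at c3: b = a + 1 */
        Val a = p_entry.e[i].v;
        Val arg = a.kind == CONST ? cst(a.c + 1) : a;
        push_k(cs_p1, p_entry.e[i].cs, '3', k);
        Val *b = slot(&p1_entry, cs_p1);
        *b = join(*b, arg);
    }
    for (int i = 0; i < 2; i++) {          /* returns: p1 gives 2 * b, p passes it on */
        push_k(cs_p, "", sites[i], k);
        push_k(cs_p1, cs_p, '3', k);
        Val b = *slot(&p1_entry, cs_p1);
        x_after[i] = b.kind == CONST ? cst(2 * b.c) : b;
    }
}
```

With k = 2, the sketch reports 16 and 20, as on the |cs|=2 slide. With k = 1, both calls of p1 share the context c3, so b, and hence x, become ⊤. The case k = 0 is the context-insensitive analysis.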
Handling recursion (|cs|=2)
void main() { c1: p(7); ε: [x ↦ ⊤] }
int p(int a) { c1: [a ↦ 7] if (…) { a = a - 1; c1: [a ↦ 6] c2: p(a); a = a + 1; } x = -2*a + 5; }
Handling recursion (|cs|=2)
void main() { c1: p(7); ε: [x ↦ ⊤] }
int p(int a) { c1: [a ↦ 7] c1.c2: [a ↦ 6] if (…) { c1: [a ↦ 7] c1.c2: [a ↦ 6] a = a - 1; c1: [a ↦ 6] c1.c2: [a ↦ 5] c2: p(a); a = a + 1; } x = -2*a + 5; }
Handling recursion (|cs|=2)
void main() { c1: p(7); ε: [x ↦ ⊤] }
int p(int a) { c1: [a ↦ 7] c1.c2: [a ↦ 6] c2.c2: [a ↦ 5] if (…) { c1: [a ↦ 7] c1.c2: [a ↦ 6] c2.c2: [a ↦ 5] a = a - 1; c1: [a ↦ 6] c1.c2: [a ↦ 5] c2.c2: [a ↦ 4] c2: p(a); a = a + 1; } x = -2*a + 5; }
Handling recursion (|cs|=2)
void main() { c1: p(7); ε: [x ↦ ⊤] }
int p(int a) { c1: [a ↦ 7] c1.c2: [a ↦ 6] c2.c2: [a ↦ 5] c2.c2: [a ↦ 4] if (…) { c1: [a ↦ 7] c1.c2: [a ↦ 6] c2.c2: [a ↦ 5] a = a - 1; c1: [a ↦ 6] c1.c2: [a ↦ 5] c2.c2: [a ↦ 4] c2: p(a); a = a + 1; } x = -2*a + 5; }
Handling recursion (|cs|=2)
void main() { c1: p(7); ε: [x ↦ ⊤] }
int p(int a) { c1: [a ↦ 7] c1.c2: [a ↦ 6] c2.c2: [a ↦ ⊤] if (…) { c1: [a ↦ 7] c1.c2: [a ↦ 6] c2.c2: [a ↦ 5] a = a - 1; c1: [a ↦ 6] c1.c2: [a ↦ 5] c2.c2: [a ↦ 4] c2: p(a); a = a + 1; } x = -2*a + 5; }
Handling recursion (|cs|=2)
void main() { c1: p(7); ε: [x ↦ ⊤] }
int p(int a) { c1: [a ↦ 7] c1.c2: [a ↦ 6] c2.c2: [a ↦ ⊤] if (…) { c1: [a ↦ 7] c1.c2: [a ↦ 6] c2.c2: [a ↦ ⊤] a = a - 1; c1: [a ↦ 6] c1.c2: [a ↦ 5] c2.c2: [a ↦ ⊤] c2: p(a); a = a + 1; } x = -2*a + 5; }
Handling recursion (|cs|=2)
void main() { c1: p(7); ε: [x ↦ ⊤] }
int p(int a) { c1: [a ↦ 7] c1.c2: [a ↦ 6] c2.c2: [a ↦ ⊤] if (…) { c1: [a ↦ 7] c1.c2: [a ↦ 6] c2.c2: [a ↦ ⊤] a = a - 1; c1: [a ↦ 6] c1.c2: [a ↦ 5] c2.c2: [a ↦ ⊤] c2: p(a); a = a + 1; } x = -2*a + 5; c1: [a ↦ 7, x ↦ -9] c1.c2: [a ↦ 6, x ↦ -7] c2.c2: [a ↦ ⊤, x ↦ ⊤] }
Handling recursion (|cs|=2)
void main() { c1: p(7); ε: [x ↦ ⊤] }
int p(int a) { c1: [a ↦ 7] c1.c2: [a ↦ 6] c2.c2: [a ↦ ⊤] if (…) { c1: [a ↦ 7] c1.c2: [a ↦ 6] c2.c2: [a ↦ ⊤] a = a - 1; c1: [a ↦ 6] c1.c2: [a ↦ 5] c2.c2: [a ↦ ⊤] c2: p(a); a = a + 1; c1: [a ↦ 7] c1.c2: [a ↦ 6] c2.c2: [a ↦ ⊤] } x = -2*a + 5; c1: [a ↦ 7, x ↦ -9] c1.c2: [a ↦ 6, x ↦ -7] c2.c2: [a ↦ ⊤, x ↦ ⊤] }
What should the value of a be here?
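The entry facts of the recursive example can be recomputed with a short C fixpoint loop. This sketch makes several assumptions: constant propagation, k = 2 suffix limiting, and call sites encoded as digits ("12" for c1.c2). Only the entry of p is tracked; the return flow, which the slide’s question is about, is deliberately left out. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef enum { BOT, CONST, TOP } Kind;   /* BOT < constants < TOP */
typedef struct { Kind kind; int c; } Val;

static Val cst(int c) { Val v = { CONST, c }; return v; }

static Val join(Val x, Val y) {
    if (x.kind == BOT) return y;
    if (y.kind == BOT) return x;
    if (x.kind == CONST && y.kind == CONST && x.c == y.c) return x;
    Val top = { TOP, 0 };
    return top;
}

static bool same(Val x, Val y) {
    return x.kind == y.kind && (x.kind != CONST || x.c == y.c);
}

/* Entry facts of p, one per 2-limited call string. */
typedef struct { char cs[8]; Val v; } Entry;
typedef struct { Entry e[8]; int n; } Table;

static Val *slot(Table *t, const char *cs) {       /* find, or add as BOT */
    for (int i = 0; i < t->n; i++)
        if (strcmp(t->e[i].cs, cs) == 0) return &t->e[i].v;
    assert(t->n < 8);
    strcpy(t->e[t->n].cs, cs);
    t->e[t->n].v.kind = BOT;
    return &t->e[t->n++].v;
}

static Val get(const Table *t, const char *cs) {   /* BOT if absent */
    for (int i = 0; i < t->n; i++)
        if (strcmp(t->e[i].cs, cs) == 0) return t->e[i].v;
    Val bot = { BOT, 0 };
    return bot;
}

/* Call transformer: append call site c and keep only the last k sites. */
static void push_k(char out[8], const char *cs, char c, int k) {
    char tmp[8];
    size_t n = strlen(cs);
    memcpy(tmp, cs, n);
    tmp[n] = c;
    tmp[n + 1] = '\0';
    strcpy(out, n + 1 > (size_t)k ? tmp + (n + 1 - (size_t)k) : tmp);
}

/* main: c1: p(7).  p: if (...) { a = a - 1; c2: p(a); ... }
   Joins entry facts until none changes. */
static Table recursion_entries(void) {
    Table entry = { .n = 0 };
    Val *first = slot(&entry, "1");
    *first = join(*first, cst(7));
    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < entry.n; i++) {
            Val a = entry.e[i].v;
            Val arg = a.kind == CONST ? cst(a.c - 1) : a;  /* a = a - 1 */
            char callee[8];
            push_k(callee, entry.e[i].cs, '2', 2);          /* c2: p(a) */
            Val *s = slot(&entry, callee);
            Val joined = join(*s, arg);
            if (!same(joined, *s)) {
                *s = joined;
                changed = true;
            }
        }
    }
    return entry;
}
```

The loop stabilizes with c1 ↦ 7, c1.c2 ↦ 6, and c2.c2 ↦ ⊤. The recursive call from context c2.c2 maps back to c2.c2 itself, so the values 5 and 4 are joined.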
Summary: call string
Associate an abstract state with each call string
Apply chaotic iterations to the supergraph
To guarantee termination, apply an abstraction to call strings (e.g., suffix 2): it represents tails of calls, and its cost is exponential in the call-string length
A kind of abstract inlining
Simple
Often loses precision under recursion, although it can still be precise in some cases
Functional approach
The meaning of a procedure is a mapping from states to states.
The abstract meaning of a procedure is a function from abstract states to abstract states: a procedure summary.
The summary is usually implemented as a table and constructed on-the-fly.
Interprocedural shape analysis for cutpoint-free programs
Noam Rinetzky (Tel Aviv University), Mooly Sagiv (Tel Aviv University), Eran Yahav (Technion)
How to handle procedures?
Pure functions: a procedure is an input/output relation with no side-effects.
[Table: the input/output relation of inc, mapping each input p to ret = p + 1]
main() { int w=0,x=0,y=0,z=0; w = inc(y); x = inc(z); assert: w+x is even }
int inc(int p) { return 2 + p - 1; }
How to handle procedures?
Pure functions: a procedure is an input/output relation with no side-effects.
Tabulation over parity: [Table: p ↦ ret for inc: Even ↦ Odd, Odd ↦ Even]
main() { int w=0,x=0,y=0,z=0; w = inc(y); x = inc(z); assert: w+x is even }
int inc(int p) { return 2 + p - 1; }
[Annotation: parities at the assertion: w ↦ O, x ↦ O, y ↦ E, z ↦ E]
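Here is a hedged C sketch of tabulation with a summary table built on-the-fly. It assumes a two-valued parity domain, with ⊥ and ⊤ omitted for brevity. The summary for an abstract input is computed the first time inc is called with it and reused afterwards. `call_inc` and `body_analyses` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Two-valued parity domain (bottom and top omitted for brevity). */
typedef enum { EVEN = 0, ODD = 1 } Parity;

/* Parity of x + y; x - y has the same parity, so it reuses this. */
static Parity add_parity(Parity x, Parity y) { return x == y ? EVEN : ODD; }

/* Abstract interpretation of inc's body: return 2 + p - 1. */
static Parity analyze_inc_body(Parity p) {
    return add_parity(add_parity(EVEN, p), ODD);
}

/* Summary table of inc, filled on-the-fly: one entry per abstract input. */
static bool computed[2];
static Parity summary[2];
static int body_analyses = 0;  /* number of times the body was analyzed */

static Parity call_inc(Parity p) {
    if (!computed[p]) {
        summary[p] = analyze_inc_body(p);
        computed[p] = true;
        body_analyses++;
    }
    return summary[p];
}
```

Both calls in main pass an even argument. The body of inc is therefore analyzed once, and the second call reuses the tabulated entry Even ↦ Odd. Since w and x are both odd, w+x is even.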
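What about global variables?
Procedures have side-effects.
Easy fix: include the globals in the input/output relation, (p, g) ↦ (ret, g′).
[Table: tabulation of inc with columns p, g, ret, g′; e.g., p = Even with g = E/O gives ret = Odd and g′ = Even]
int g = 0;
main() { int w=0,x=0,y=0,z=0; w = inc(y); x = inc(z); assert: w+x+g is even }
int inc(int p) { g = p; return 2 + p - 1; }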
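Extending the same idea to the global g, a summary maps (p, g) to (ret, g′). The C sketch below assumes the two-valued parity domain and a hypothetical `InOut` record for the summary’s output. It then replays main abstractly using only the summary.

```c
#include <assert.h>

/* Two-valued parity domain (bottom and top omitted for brevity). */
typedef enum { EVEN, ODD } Parity;

/* Parity of x + y; x - y has the same parity. */
static Parity add_parity(Parity x, Parity y) { return x == y ? EVEN : ODD; }

/* Output of the summary of inc: return value and new value of g. */
typedef struct { Parity ret; Parity g_out; } InOut;

/* Summary of int inc(int p) { g = p; return 2 + p - 1; } as (p, g) -> (ret, g'). */
static InOut summarize_inc(Parity p, Parity g) {
    InOut r;
    (void)g;                                       /* g is overwritten before it is read */
    r.g_out = p;                                   /* g = p */
    r.ret = add_parity(add_parity(EVEN, p), ODD);  /* 2 + p - 1 */
    return r;
}

/* Abstract run of main; returns the parity of w + x + g at the assertion. */
static Parity main_assertion_parity(void) {
    Parity g = EVEN, y = EVEN, z = EVEN;  /* g = 0, y = 0, z = 0 */
    InOut r1 = summarize_inc(y, g);
    Parity w = r1.ret;
    g = r1.g_out;
    InOut r2 = summarize_inc(z, g);
    Parity x = r2.ret;
    g = r2.g_out;
    return add_parity(add_parity(w, x), g);
}
```

The summary ignores the incoming g because inc overwrites it before reading it. That is why the table entry for p = Even holds for either parity of g.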
But what about pointers and the heap?
Aliasing; destructive update; the heap is a global resource; anonymous objects
[Figure: x.n.n ~ y; after append(y,z), which performs y.n=z, x.n.n.n ~ z]
How to tabulate append?
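How to tabulate procedures?
A procedure is an input/output relation over its local (reachable) heap; objects that are not reachable from the parameters are not affected by the procedure.
main() { append(y,z); }
append(List p, List q) { … }
[Figure: the heaps of main (variables x, t, y, z) and of append (parameters p, q) before and after the call; only the local heap reachable from p and q is passed to append.]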
How to handle sharing?
External sharing may break the functional view.
main() { append(y,z); }
append(List p, List q) { … }
[Figure: heaps before and after append(y,z) in which objects of append’s local heap are also reachable from main’s variables x and t.]
What’s the difference?
[Figure: 1st example and 2nd example: two heaps with variables x, t, y, z on which append(y,z) is called, shown before and after the call.]
Cutpoints
An object is a cutpoint for an invocation if it is:
reachable from the actual parameters,
not pointed to by an actual parameter, and
reachable without going through an object pointed to by an actual parameter.
[Figure: two invocations of append(y,z) on heaps with variables x, t, y, z.]
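The definition translates into two reachability passes. This is a hedged C sketch under several assumptions: heaps consist of singly linked cells with one `n` field, only the caller’s variables are considered (other pending frames are ignored), and the types `Heap` and `Var` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_OBJS 16
#define NIL (-1)

/* Heap of singly linked cells: next[o] is o's n-field (NIL = null). */
typedef struct {
    int num_objs;
    int next[MAX_OBJS];
} Heap;

/* A caller variable: its target object (or NIL) and whether it is passed
   as an actual parameter of the invocation. */
typedef struct {
    int target;
    bool is_actual;
} Var;

/* Sets cut[o] for every cutpoint o of the invocation, following the
   definition above; returns the number of cutpoints. */
static int find_cutpoints(const Heap *h, const Var *vars, int num_vars,
                          bool cut[MAX_OBJS]) {
    bool in_local[MAX_OBJS] = { false };   /* reachable from actuals */
    bool by_actual[MAX_OBJS] = { false };  /* pointed to by an actual */
    bool outside[MAX_OBJS] = { false };    /* reachable avoiding by_actual objects */

    for (int v = 0; v < num_vars; v++)
        if (vars[v].is_actual && vars[v].target != NIL)
            by_actual[vars[v].target] = true;

    for (int v = 0; v < num_vars; v++) {
        if (vars[v].target == NIL) continue;
        if (vars[v].is_actual) {
            for (int o = vars[v].target; o != NIL && !in_local[o]; o = h->next[o])
                in_local[o] = true;
        } else {
            /* stop at objects pointed to by an actual parameter */
            for (int o = vars[v].target; o != NIL && !by_actual[o] && !outside[o];
                 o = h->next[o])
                outside[o] = true;
        }
    }

    int count = 0;
    for (int o = 0; o < h->num_objs; o++) {
        cut[o] = in_local[o] && !by_actual[o] && outside[o];
        if (cut[o]) count++;
    }
    return count;
}
```

For example, suppose y points to a two-cell list, z to a separate cell, and x to the second cell of y’s list. Then append(y,z) has exactly one cutpoint: the cell pointed to by x. If x pointed to the first cell instead, which y also points to, the invocation would be cutpoint-free.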
Cutpoint freedom
Cutpoint-free invocation: has no cutpoints
Cutpoint-free execution: every invocation is cutpoint-free
Cutpoint-free program: every execution is cutpoint-free
[Figure: two invocations of append(y,z) on heaps with variables x, t, y, z.]
Memory states
A memory state encodes a local heap:
the local variables of the current procedure invocation, and
the relevant part of the heap, where relevant means reachable.
[Figure: the memory states of main (variables x, t, y, z) and of append (parameters p, q).]
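Interprocedural shape analysis
[Figure: tabulation at call f(x): the caller’s heap (variables x, y) is passed to f through parameter p, and f’s tabulated exits are used at the return.]
Interprocedural shape analysis
[Figure: analyzing f: the input local heap of f (parameter p) is analyzed, and its exits are tabulated and used at call f(x), where the caller has variables x and y.]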
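Interprocedural shape analysis
Procedure input/output relation
[Figure: a table pairing input shape graphs with output shape graphs for parameters p and q, with next-field edges n and reachability predicates rp, rq.]
Interprocedural shape analysis
Reusable procedure summaries; heap modularity
[Figure: a single summary of append over parameters p, q (with rp, rq) is reused at append(y,z) in heaps with variables x, y, z (rx, ry, rz) and at append(h,i) in a heap with variables g, h, i, k (rg, rh, ri, rk).]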
Interprocedural analysis summary
Call string approach: simple; efficient for coarse abstractions of call stacks; not modular
Functional approach: more complex chaotic iteration algorithm (not shown in detail here); often uses tabulation; modular; better handles recursion
We’ve covered all the course material. Good luck with the home exam!