Spring 2016 Program Analysis and Verification

Spring 2016 Program Analysis and Verification
Lecture 18: Interprocedural Analysis Roman Manevich Ben-Gurion University

Syllabus Program Verification Program Analysis Basics
Operational semantics Hoare Logic Applying Hoare Logic Weakest Precondition Calculus Proving Termination Data structures Automated Verification Program Analysis Basics From Hoare Logic to Static Analysis Control Flow Graphs Equation Systems Collecting Semantics Using Soot Abstract Interpretation fundamentals Lattices Fixed-Points Chaotic Iteration Galois Connections Domain constructors Widening/ Narrowing Analysis Techniques Numerical Domains Pointer analysis Shape Analysis Interprocedural Analysis

Previously Shape analysis TVLA

Today Handling procedures

Analyzing Procedures

Interprocedural Analysis
Until now assumed no procedures Intraprocedural analysis: analyzing one procedure at a time x = foo(…) interpreted by forgetting everything about x Very conservative analysis Does not handle global variables or heap Interprocedural analysis: uses calling relationships among procedures Enables more precise analysis information

Interprocedural analysis
foo() bar() Call bar() The effect of calling a procedure is the effect of executing its body – a big step

Interprocedural analysis
foo() bar() Call bar() Goal: compute the abstract effect of calling a procedure

Interprocedural analysis challenges
Stack can grow without a bound Matching of call/return

Reduction to intraprocedural analysis
Procedure inlining Naive solution: call-as-goto

Solution attempt #1 Inline callees into callers Good Bad
End up with one big procedure CFGs of individual procedures = duplicated many times Good Reduce to intraprocedural analysis: can use existing analyses as is Precise: distinguishes different calls to the same function Bad Exponential blow-up of code size Not efficient: re-analyze procedure bodies many times Doesn’t work with recursion

Inlining example void main() { int x; x = p(7); x = p(9); }
int p(int a) { return a + 1; }

Inlining example void main() { int x; int ret_p; { int a = 7; ret_p = a + 1; } x = ret_p; { int a = 9; ret_p = a + 1; } x = ret_p; } int p(int a) { return a + 1; } {x =8} {x =10}

Inlining: exponential blowup
main() { f1(); } f1() { f2(); … fn() { ... }

Solution attempt #2 Build a “supergraph” = inter-procedural CFG
Replace each call from P to Q with An edge from point before the call (call point) to Q’s entry point An edge from Q’s exit point to the point after the call (return pt) Add assignments of actual arguments to formal arguments, and assignment of return value Good: efficient Graph of each function included exactly once in the supergraph Works for recursive functions (although local variables need additional treatment) Bad: imprecise, “context-insensitive” The “unrealizable paths problem”: dataflow facts can propagate along infeasible control paths

Method calls in detail x=1; y=2; z=0; L1: z = foo(x, y, z)
assert x==1 && y==2 && z==3 foo(a, b, c) { int i = a+b; return i; }

Method calls in detail entry node call node Call context
x=1; y=2; z=0; L1: z = foo(x, y, z) L1 foo(a, b, c) int i = a+b; foo_ret = i; Call context assert x==1 && y==2 && z==3 L1 return; exit node return node

Simple example: CP int p(int a) { void main() { return a + 1; int x ;
} void main() { int x ; x = p(7); x = p(9) ; }

Simple example: CP int p(int a) { void main() { [a 7] int x ;
return a + 1; } void main() { int x ; x = p(7); x = p(9) ; }

Simple example: CP Special returned-value variable int p(int a) {
return a + 1; [a 7, $$ 8] } void main() { int x ; x = p(7); x = p(9) ; }

Simple example: CP int p(int a) { void main() { [a 7] int x ;
return a + 1; [a 7, $$ 8] } void main() { int x ; x = p(7); [x 8] x = p(9) ; }

Simple example: CP int p(int a) { void main() { [a ] int x ;
return a + 1; [a 7, $$ 8] } void main() { int x ; x = p(7); [x 8] x = p(9) ; }

return a + 1; [a , $$ ] } void main() { int x ; x = p(7); [x 8] x = p(9); }

return a + 1; [a , $$ ] } void main() { int x ; x = p(7) ; [x ] x = p(9) ; }

A naive interprocedural solution
Treat procedure calls as gotos Pros: Simple Usually fast Cons: Abstracts away call/return correlations: context-insensitive Obtain a conservative solution

why was the naive solution less precise?
Analysis by reduction Procedure inlining Call-as-goto void main() { int x ; x = p(7) ; [x ] x = p(9) ; } int p(int a) { [a ] return a + 1; [a , $$ ] } void main() { int a, x, ret; a = 7; ret = a+1; x = ret; a = 9; ret = a+1; x = ret; } [a ⊥, x ⊥, ret ⊥] [a 7, x 8, ret 8] [a 9, x 10, ret 10] why was the naive solution less precise?

Stack regime R P P() { … R(); } Q() { … R(); } R(){ … } call call
return return R P

Guiding light Exploit stack regime Precision Efficiency
Main idea / guiding light

Simplifying assumptions
Parameter passed by value No procedure nesting No concurrency Recursion is supported

Unrealizable paths zoo() foo() bar() Call bar() Call bar()

IVP: Interprocedural Valid Paths
f1 callq ret f2 fk-1 fk f3 ( ) fk-2 enterq exitq f4 fk-3 f5 IVP: all paths with matching calls and returns And prefixes

Valid paths zoo() foo() bar() (1 (2 Call bar() Call bar() )2 )1

Interprocedural valid paths
IVP set of paths Start at program entry Only considers matching calls and returns aka, valid Can be defined via context-free grammar matched ::= matched (i matched )i | ε valid ::= valid (i matched | matched paths can be defined by a regular expression

Sharir and Pnueli ‘82 Call String approach Functional approach
Blend interprocedural flow with intra procedural flow Tag every dataflow fact with call history Functional approach Determine the effect of a procedure E.g., in/out map Construct abstract transformers for entire procedure On-the-fly Functional approach very similar to Cousot & Cousot

The call string approach
Abstract domain used for intraprocedural analysis A = (DA, A, A, A, A, A) Ctx = Calling contexts in program = {c1, c2, …} Program locations of method call statements CtxStr = Ctx* Intuitively – represent all possible call stacks A flat abstract domain (different call strings incomparable) Abstract transformers C: call f()# = ? return from C: call f()# = ?

Call strings abstract domain
Abstract domain used for intraprocedural analysis A = (DA, A, A, A, A, A) Ctx = Calling contexts in program = {c1, c2, …} Program locations of method call statements CtxStr = Ctx* Intuitively – represent all possible call stacks IPA = (CtxStrDA, , , , , ) (c1c2c3, d1)  (c1c2c3, d2) = ?

Abstract domain used for intraprocedural analysis A = (DA, A, A, A, A, A) Ctx = Calling contexts in program = {c1, c2, …} Program locations of method call statements CtxStr = Ctx* Intuitively – represent all possible call stacks IPA = (CtxStrDA, , , , , ) (c1c2c3, d1)  (c1c2c3, d2) = (c1c2c3, d1A d2)

Abstract domain used for intraprocedural analysis A = (DA, A, A, A, A, A) Ctx = Calling contexts in program = {c1, c2, …} Program locations of method call statements CtxStr = Ctx* Intuitively – represent all possible call stacks IPA = (CtxStrDA, , , , , ) (c1c2c3, d1)  (c1c2c3, d2) = (c1c2c3, d1A d2) Obeys ascending chain condition? Use Chaotic iterations over the supergraph

Extending analysis with call strings
Abstract domain used for intraprocedural analysis A = (DA, A, A, A, A, A) Construct abstract domain associating a single abstract state with each call string IPA = (CtxStrDA, , , , , ) (c1c2c3, d1)  (c1c2c3, d2) = ?

Abstract domain used for intraprocedural analysis A = (DA, A, A, A, A, A) Construct abstract domain associating a single abstract state with each call string IPA = (CtxStrDA, , , , , ) (c1c2c3, d1)  (c1c2c3, d2) = (c1c2c3, d1A d2)

Abstract domain used for intraprocedural analysis A = (DA, A, A, A, A, A) Construct abstract domain associating a single abstract state with each call string IPA = (CtxStrDA, , , , , ) (c1c2c3, d1)  (c1c2c3, d2) = (c1c2c3, d1A d2) (c1c2c3, d1)  (c1c2c4, d2) = ?

Abstract domain used for intraprocedural analysis A = (DA, A, A, A, A, A) Construct abstract domain associating a single abstract state with each call string IPA = (CtxStrDA, , , , , ) (c1c2c3, d1)  (c1c2c3, d2) = (c1c2c3, d1A d2) (c1c2c3, d1)  (c1c2c4, d2) = {(c1c2c3, d1), (c1c2c4, d2)}

Abstract domain used for intraprocedural analysis A = (DA, A, A, A, A, A) Construct abstract domain associating a single abstract state with each call string IPA = (CtxStrDA, , , , , ) Use Chaotic iterations over the supergraph Obeys ascending chain condition?

Supergraph Q P R fc2e f1 fc2e f1 f1 f1 f2 f3 f1 fx2r f5 f3 fx2r f6
What happens here? Q P R Entry node sP sQ sR Call node fc2e f1 Call node fc2e f1 f1 f1 n4 n6 n1 n2 Call R Call R f2 f3 f1 n5 n3 n7 Return node Return node fx2r f5 f3 fx2r f6 eP eR eQ What happens here? Exit node

Simple call-string example
int p(int a) { return a + 1; } void main() { int x ; c1: x = p(7); c2: x = p(9) ; }

int p(int a) { c1: [a7] return a + 1; } void main() { int x ; c1: x = p(7); c2: x = p(9) ; }

int p(int a) { c1: [a7] return a + 1; c1: [a7, $$8] } void main() { int x ; c1: x = p(7); c2: x = p(9) ; }

int p(int a) { c1: [a7] return a + 1; c1: [a7, $$8] } void main() { int x ; c1: x = p(7); : [x8] c2: x = p(9) ; }

int p(int a) { c1: [a7] c2: [a9] return a + 1; c1: [a7, $$8] } void main() { int x ; c1: x = p(7); : [x8] c2: x = p(9) ; }

int p(int a) { c1: [a7] c2: [a9] return a + 1; c1: [a7, $$8] c2: [a9, $$10] } void main() { int x ; c1: x = p(7); : [x8] c2: x = p(9) ; }

int p(int a) { c1: [a7] c2: [a9] return a + 1; c1: [a7, $$8] c2: [a9, $$10] } void main() { int x ; c1: x = p(7); : [x8] c2: x = p(9) ; : [x10] }

Enforcing termination
Adding arbitrary call strings violates ACC Analysis will not terminate for recursive procedures Too heavy even without recursion – exponential blow-up in number of possible call strings What can we do?

Abstracting call strings
CtxStr = Ctx* Can abstract CtxStr into a finite domain in different ways Most popular: k-limited suffix abstraction (keep only last k contexts) [k](c1…cn cn+1…cn+k) = cn+1…cn+k What we will talk about in this class Set abstraction: store all contexts in a set set(c1, c2, c1, c2, c3) = {c1, c2, c3} Forgets order of calls Combinations More generally – abstract using finite automata/regular expressions

Another example (|cs|=2)
void main() { int x ; c1: x = p(7);  : [x  16] c2: x = p(9) ;  : [x  20] } int p(int a) { c1:[a 7] c2:[a 9] return c3: p1(a + 1); c1:[a 7, $$ 16] c2:[a 9, $$ 20] } int p1(int b) { c1.c3:[b 8] c2.c3:[b 10] return 2 * b; c1.c3:[b 8,$$16] c2.c3:[b 10,$$20] }

Another example (|cs|=1)
void main() { int x ; c1: x = p(7);  : [x  ] c2: x = p(9) ; } int p(int a) { c1:[a 7] c2:[a 9] return c3: p1(a + 1); c1:[a 7, $$ ] c2:[a 9, $$ ] } int p1(int b) { (c1|c2)c3:[b ] return 2 * b; (c1|c2)c3:[b , $$] }

Handling recursion (|cs|=2)
void main() { c1: p(7);  : [x ] } int p(int a) { c1: [a  7] if (…) { a = a -1 ; c1: [a  6] c2: p (a); a = a + 1; } x = -2*a + 5;

void main() { c1: p(7);  : [x ] } int p(int a) { c1: [a  7] c1.c2: [a  6] if (…) { c1: [a  7] c1.c2: [a  6] a = a -1 ; c1: [a  6] c1.c2: [a  5] c2: p (a); a = a + 1; } x = -2*a + 5;

void main() { c1: p(7);  : [x ] } int p(int a) { c1: [a  7] c1.c2: [a  6] c2.c2: [a  5] if (…) { c1: [a  7] c1.c2: [a  6] c2.c2: [a  5] a = a -1 ; c1: [a  6] c1.c2: [a  5] c2.c2: [a  4] c2: p (a); a = a + 1; } x = -2*a + 5;

void main() { c1: p(7);  : [x ] } int p(int a) { c1: [a  7] c1.c2: [a  6] c2.c2: [a  5]  c2.c2: [a  4] if (…) { c1: [a  7] c1.c2: [a  6] c2.c2: [a  5] a = a -1 ; c1: [a  6] c1.c2: [a  5] c2.c2: [a  4] c2: p (a); a = a + 1; } x = -2*a + 5;

void main() { c1: p(7);  : [x ] } int p(int a) { c1: [a  7] c1.c2: [a  6] c2.c2: [a  ] if (…) { c1: [a  7] c1.c2: [a  6] c2.c2: [a  5] a = a -1 ; c1: [a  6] c1.c2: [a  5] c2.c2: [a  4] c2: p (a); a = a + 1; } x = -2*a + 5;

void main() { c1: p(7);  : [x ] } int p(int a) { c1: [a  7] c1.c2: [a  6] c2.c2: [a ] if (…) { c1: [a  7] c1.c2: [a  6] c2.c2: [a ] a = a -1 ; c1: [a  6] c1.c2: [a  5] c2.c2: [a ] c2: p (a); a = a + 1; } x = -2*a + 5;

void main() { c1: p(7);  : [x ] } int p(int a) { c1: [a  7] c1.c2: [a  6] c2.c2: [a ] if (…) { c1: [a  7] c1.c2: [a  6] c2.c2: [a ] a = a -1 ; c1: [a  6] c1.c2: [a  5] c2.c2: [a ] c2: p (a); a = a + 1; } x = -2*a + 5; c1: [a7, x9] c1.c2: [a6, x7] c2.c2: [a , x7]

void main() { c1: p(7);  : [x ] } int p(int a) { c1: [a  7] c1.c2: [a  6] c2.c2: [a ] if (…) { c1: [a  7] c1.c2: [a  6] c2.c2: [a ] a = a -1 ; c1: [a  6] c1.c2: [a  5] c2.c2: [a ] c2: p (a); a = a + 1; c1: [a  7] c1.c2: [a  6] c2.c2: [a ] } x = -2*a + 5; c1: [a7, x9] c1.c2: [a6, x7] c2.c2: [a , x7] What should the value of a be here?

Summary: call string Associate abstract state with each call string
Apply chaotic iterations to supergraph To guarantee termination apply abstraction to call string (e.g., suffix  2) Represents tails of calls Exponential in call-string length A kind of abstract inline Simple Often loses precision under recursion Although can still be precise in some cases

Functional approach The meaning of a procedure is mapping from states into states The abstract meaning of a procedure is function from abstract state to abstract states Procedure summary Usually implemented as a table and constructed on-the-fly

Interprocedural shape analysis for cutpoint-free programs
Noam Rinetzky Tel Aviv University Mooly Sagiv Tel Aviv University Eran Yahav Technion

How to handle procedures?
Pure functions Procedure  input/output relation No side-effects p ret 1 2 3 .. … main() { int w=0,x=0,y=0,z=0; w = inc(y); x = inc(z); assert: w+x is even } int inc(int p) { return 2 + p - 1; }

How to handle procedures?
Pure functions Procedure  input/output relation No side-effects p ret Even Odd Tabulation main() { int w=0,x=0,y=0,z=0; w = inc(y); x = inc(z); assert: w+x is even } int inc(int p) { return 2 + p - 1; } w x y z E O E O E

What about global variables?
p g ret g’ 1 … Procedures have side-effects Easy fix p g ret g’ Even E/O Odd int g = 0; g = p; int g = 0; main() { int w=0,x=0,y=0,z=0; w = inc(y); x = inc(z); assert: w+x+g is even } int inc(int p) { g = p; return 2 + p - 1; }

But what about pointers and heap?
Aliasing Destructive update Heap Global resource Anonymous objects x.n.n ~ y n n append(y,z) n y.n=z x z x.n.n.n ~ z y How to tabulate append?

How to tabulate procedures?
Procedure  input/output relation Not reachable  Not effected proc: local (reachable) heap  local heap main() { append(y,z); } p q append(List p, List q) { … } p q x n t x n t n y z y n z p q n n

How to handle sharing? External sharing may break the functional view
main() { append(y,z); } p q n append(List p, List q) { … } p q n n n y t t x z y n z p q n x n

What’s the difference? n n n n x y append(y,z); append(y,z); t t y z x
1st Example 2nd Example append(y,z); append(y,z); x n t n n n y y t z x z

Cutpoints An object is a cutpoint for an invocation
Reachable from actual parameters Not pointed to by an actual parameter Reachable without going through a parameter append(y,z) append(y,z) n n n n y y n n t t x z z

Cutpoint freedom Cutpoint-free Invocation: has no cutpoints
Execution: every invocation is cutpoint-free Program: every execution is cutpoint-free append(y,z) append(y,z) n n n n y x t x t y z z

Memory states A memory state encodes a local heap
Local variables of the current procedure invocation Relevant part of the heap Relevant  Reachable main append p q n n x t y z

Interprocedural shape analysis
Tabulation exits p p call f(x) y x x y

Analyze f p p Tabulation exits p p call f(x) y x x y

Procedure  input/output relation Input Output q q rq rq p q p q n rp rq rp rq rp p n n q p q n n n rp rp rq rp rp rp rq …

Reusable procedure summaries Heap modularity q p rp rq n y n z x append(y,z) rx ry rz y x z n rx ry rz append(y,z) g h i k n rg rh ri rk append(h,i)

Interprocedural analysis summary
Call string approach Simple Efficient for coarse abstractions of call stacks Not modular Functional approach More complex chaotic iteration algorithm (not shown in detail here) Often uses tabulation Modular Better handles recursion

We’ve covered all the course material Good luck with the home exam!

Spring 2016 Program Analysis and Verification

Similar presentations

Presentation on theme: "Spring 2016 Program Analysis and Verification"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Spring 2016 Program Analysis and Verification

Similar presentations

Presentation on theme: "Spring 2016 Program Analysis and Verification"— Presentation transcript:

Similar presentations

About project

Feedback