Program Analysis and Verification 0368-4479

Slides:



Advertisements
Similar presentations
Dataflow Analysis for Datarace-Free Programs (ESOP 11) Arnab De Joint work with Deepak DSouza and Rupesh Nasre Indian Institute of Science, Bangalore.
Advertisements

Abstract Interpretation Part II
Predicate Abstraction and Canonical Abstraction for Singly - linked Lists Roman Manevich Mooly Sagiv Tel Aviv University Eran Yahav G. Ramalingam IBM T.J.
Shape Analysis by Graph Decomposition R. Manevich M. Sagiv Tel Aviv University G. Ramalingam MSR India J. Berdine B. Cook MSR Cambridge.
Context-Sensitive Interprocedural Points-to Analysis in the Presence of Function Pointers Presentation by Patrick Kaleem Justin.
Interprocedural Analysis Mooly Sagiv Tel Aviv University Sunday Scrieber 8 Monday
Pointer Analysis – Part I Mayur Naik Intel Research, Berkeley CS294 Lecture March 17, 2009.
3-Valued Logic Analyzer (TVP) Tal Lev-Ami and Mooly Sagiv.
Bebop: A Symbolic Model Checker for Boolean Programs Thomas Ball Sriram K. Rajamani
1 Lecture 08(a) – Shape Analysis – continued Lecture 08(b) – Typestate Verification Lecture 08(c) – Predicate Abstraction Eran Yahav.
1 Lecture 06 – Inter-procedural Analysis Eran Yahav.
1 Operational Semantics Mooly Sagiv Tel Aviv University Textbook: Semantics with Applications.
Local Heap Shape Analysis Noam Rinetzky Tel Aviv University Joint work with Jörg Bauer Universität des Saarlandes Thomas Reps University of Wisconsin Mooly.
Interprocedural Analysis Noam Rinetzky Mooly Sagiv.
Establishing Local Temporal Heap Safety Properties with Applications to Compile-Time Memory Management Ran Shaham Eran Yahav Elliot Kolodner Mooly Sagiv.
Next Section: Pointer Analysis Outline: –What is pointer analysis –Intraprocedural pointer analysis –Interprocedural pointer analysis (Wilson & Lam) –Unification.
Program analysis Mooly Sagiv html://
1 Control Flow Analysis Mooly Sagiv Tel Aviv University Textbook Chapter 3
Interprocedural Analysis Noam Rinetzky Mooly Sagiv.
Abstract Interpretation Part I Mooly Sagiv Textbook: Chapter 4.
Modular Shape Analysis for Dynamically Encapsulated Programs Noam Rinetzky Tel Aviv University Arnd Poetzsch-HeffterUniversität Kaiserlauten Ganesan RamalingamMicrosoft.
Interprocedural Analysis Noam Rinetzky Mooly Sagiv Tel Aviv University Textbook Chapter 2.5.
Data Flow Analysis Compiler Design Nov. 8, 2005.
1 Program Analysis Systematic Domain Design Mooly Sagiv Tel Aviv University Textbook: Principles.
Modular Shape Analysis for Dynamically Encapsulated Programs Noam Rinetzky Tel Aviv University Arnd Poetzsch-HeffterUniversität Kaiserlauten Ganesan RamalingamMicrosoft.
Comparison Under Abstraction for Verifying Linearizability Daphna Amit Noam Rinetzky Mooly Sagiv Tom RepsEran Yahav Tel Aviv UniversityUniversity of Wisconsin.
Program Analysis Mooly Sagiv Tel Aviv University Sunday Scrieber 8 Monday Schrieber.
A Semantics for Procedure Local Heaps and its Abstractions Noam Rinetzky Tel Aviv University Jörg Bauer Universität des Saarlandes Thomas Reps University.
Static Program Analysis via Three-Valued Logic Thomas Reps University of Wisconsin Joint work with M. Sagiv (Tel Aviv) and R. Wilhelm (U. Saarlandes)
Pointer analysis. Pointer Analysis Outline: –What is pointer analysis –Intraprocedural pointer analysis –Interprocedural pointer analysis Andersen and.
Dagstuhl Seminar "Applied Deductive Verification" November Symbolically Computing Most-Precise Abstract Operations for Shape.
Program Analysis and Verification Noam Rinetzky Lecture 10: Shape Analysis 1 Slides credit: Roman Manevich, Mooly Sagiv, Eran Yahav.
1 Iterative Program Analysis Abstract Interpretation Mooly Sagiv Tel Aviv University Textbook:
TVLA: A system for inferring Quantified Invariants Tal Lev-Ami Tom Reps Mooly Sagiv Reinhard Wilhelm Greta Yorsh.
Program Analysis and Verification
Shape Analysis Overview presented by Greta Yorsh.
Symbolically Computing Most-Precise Abstract Operations for Shape Analysis Greta Yorsh Thomas Reps Mooly Sagiv Tel Aviv University University of Wisconsin.
Program Analysis and Verification
Program Analysis and Verification Spring 2015 Program Analysis and Verification Lecture 12: Abstract Interpretation IV Roman Manevich Ben-Gurion University.
Program Analysis Mooly Sagiv Tel Aviv University Sunday Scrieber 8 Monday Schrieber.
1 Shape Analysis via 3-Valued Logic Mooly Sagiv Tel Aviv University Shape analysis with applications Chapter 4.6
Interprocedural Analysis Noam Rinetzky Mooly Sagiv.
Program Analysis and Verification Spring 2015 Program Analysis and Verification Lecture 13: Abstract Interpretation V Roman Manevich Ben-Gurion University.
Operational Semantics Mooly Sagiv Tel Aviv University Textbook: Semantics with Applications Chapter.
Program Analysis via 3-Valued Logic Thomas Reps University of Wisconsin Joint work with Mooly Sagiv and Reinhard Wilhelm.
1 Iterative Program Analysis Abstract Interpretation Mooly Sagiv Tel Aviv University Textbook:
Roman Manevich Ben-Gurion University Program Analysis and Verification Spring 2015 Program Analysis and Verification Lecture 16: Shape Analysis.
Operational Semantics Mooly Sagiv Tel Aviv University Sunday Scrieber 8 Monday Schrieber.
Program Analysis Mooly Sagiv Tel Aviv University Sunday Scrieber 8 Monday Schrieber.
Interprocedural shape analysis for cutpoint-free programs Noam Rinetzky Tel Aviv University Joint work with Mooly Sagiv Tel Aviv University Eran Yahav.
Chaotic Iterations Mooly Sagiv Tel Aviv University Textbook: Principles of Program Analysis.
Operational Semantics Mooly Sagiv Reference: Semantics with Applications Chapter 2 H. Nielson and F. Nielson
Putting Static Analysis to Work for Verification A Case Study Tal Lev-Ami Thomas Reps Mooly Sagiv Reinhard Wilhelm.
Program Analysis and Verification Spring 2015 Program Analysis and Verification Lecture 8: Static Analysis II Roman Manevich Ben-Gurion University.
Program Analysis and Verification
Interprocedural shape analysis for cutpoint-free programs
Spring 2016 Program Analysis and Verification
Spring 2016 Program Analysis and Verification
G. Ramalingam Microsoft Research, India & K. V. Raghavan
Symbolic Implementation of the Best Transformer
Iterative Program Analysis Abstract Interpretation
Program Analysis and Verification
Program Analysis and Verification
Program Analysis and Verification
Program Analysis and Verification
Program Analysis and Verification
Pointer analysis.
Symbolic Characterization of Heap Abstractions
A Semantics for Procedure Local Heaps and its Abstractions
Program Analysis and Verification
Presentation transcript:

Program Analysis and Verification Noam Rinetzky Lecture 13: Interprocedural Shape Analysis Slides credit: Roman Manevich, Mooly Sagiv, Eran Yahav

Effect of procedures The effect of calling a procedure is the effect of executing its body call bar() foo() bar() 2

Interprocedural Analysis goal: compute the abstract effect of calling a procedure call bar() foo() bar() 3

A trivial treatment of procedure Analyze a single procedure After every call continue with conservative information – Global variables and local variables which “ may be modified by the call ” have unknown values Can be easily implemented Procedures can be written in different languages Procedure inline can help 4

Reduction to intraprocedural analysis Procedure inlining Naive solution: call-as-goto 5

Interprocedural analysis: Guiding light Exploit stack regime  Precision  Efficiency 6

Stack regime P() { … R(); … } R(){ … } Q() { … R(); … } call return call return P R 7

Simplifying Assumptions - Parameter passed by value - No procedure nesting - No concurrency Recursion is supported 8

The collecting lattice Lattice for a given control-flow node v: L v =(2 State, , , , , State) Lattice for entire control-flow graph with nodes V: L CFG = Map(V, L v ) We will use this lattice as a baseline for static analysis and define abstractions of its elements 9

paths(n) the set of paths from s to n – ( (s,n 1 ), (n 1,n 3 ), (n 3,n 1 ) ) Paths s n1n1 n3n3 n2n2 f1f1 f1f1 f3f3 f2f2 f1f1 e f3f3 10

Join Over All Paths (JOP) start n i JOP[v] =  {[[ e 1, e 2, …,e n ]](  ) | ( e 1, …, e n ) ∈ paths(v)} JOP  LFP – Sometimes JOP = LFP precise up to “symbolic execution” Distributive problem  L → L fk o... o f1 11

Join-Over-All-Paths (JOP) Let paths(v) denote the potentially infinite set paths from start to v (written as sequences of edges) For a sequence of edges [e 1, e 2, …, e n ] define f [e 1, e 2, …, e n ]: L → L by composing the effects of basic blocks f [e 1, e 2, …, e n ](l) = f(e n ) (… ( f(e 2 ) ( f(e 1 ) (l)) …) JOP[v] =  { f [ e 1, e 2, …,e n ](  ) | [ e 1, e 2, …, e n ] ∈ paths(v)} 12

Join-Over-All-Paths (JOP) 9 e1 e2 e3 e4 e5 e6 e7 e8 Paths transformers: f[e1,e2,e3,e4] f[e1,e2,e7,e8] f[e5,e6,e7,e8] f[e5,e6,e3,e4] f[e1,e2,e3,e4,e9, e1,e2,e3,e4] f[e1,e2,e7,e8,e9, e1,e2,e3,e4,e9,…] … JOP: f[e1,e2,e3,e4](initial)  f[e1,e2,e7,e8](initial)  f[e5,e6,e7,e8](initial)  f[e5,e6,e3,e4](initial)  … e9 Number of program paths is unbounded due to loops 13

The lfp computation approximates JOP JOP[v] =  { f [ e 1, e 2, …,e n ](  ) | [ e 1, e 2, …, e n ] ∈ paths(v)} LFP[v] =  { f [ e ](LFP[v’]) | e =(v ’,v)} JOP  LFP - for a monotone function – f(x  y)  f(x)  f(y) JOP = LFP - for a distributive function – f(x  y) = f(x)  f(y) JOP may not be precise enough for interprocedural analysis! LFP[v 0 ]  =  14

Interprocedural analysis sRsR n1n1 n3n3 n2n2 f1f1 f3f3 f2f2 f1f1 eReR f3f3 sPsP n5n5 n4n4 f1f1 ePeP f5f5 P R Call Q f c2e f x2r Call node Return node Entry node Exit node sQsQ n7n7 n6n6 f1f1 eQeQ f6f6 Q Call Q Call node Return node f c2e f x2r f1f1 Supergraph 15

Interprocedural Valid Paths () f1f1 f2f2 f k -1 fkfk f3f3 f4f4 f5f5 f k -2 f k -3 call q enter q exit q ret IVP: all paths with matching calls and returns – And prefixes 16

Interprocedural Valid Paths IVP set of paths – Start at program entry Only considers matching calls and returns – aka, valid Can be defined via context free grammar – matched ::= matched ( i matched ) i | ε – valid ::= valid ( i matched | matched paths can be defined by a regular expression 17

The Join-Over-Valid-Paths (JVP) vpaths(n) all valid paths from program start to n JVP[n] =  {[[ e 1, e 2, …, e ]](  ) ( e 1, e 2, …, e ) ∈ vpaths( n )} JVP  JOP – In some cases the JVP can be computed – (Distributive problem) 18

The Call String Approach The data flow value is associated with sequences of calls (call string) Use Chaotic iterations over the supergraph 19

Summary Call String Easy to implement Efficient for very small call strings Limited precision – Often loses precision for recursive programs – For finite domains can be precise even with recursion (with a bounded callstring) Order of calls can be abstracted Related method: procedure cloning 20

The Functional Approach The meaning of a procedure is mapping from states into states The abstract meaning of a procedure is function from an abstract state to abstract states Relation between input and output In certain cases can compute JVP 21

The Functional Approach Two phase algorithm – Compute the dataflow solution at the exit of a procedure as a function of the initial values at the procedure entry (functional values) – Compute the dataflow values at every point using the functional values 22

Phase 1 int p(int a) { if (…) { a = a -1 ; p (a); a = a + 1; } x = -2*a + 5; } void main() { p(7); } [a  a 0, x  ] [a  a 0, x  x 0 ] [a  a 0 -1, x  x 0 ] [a  a 0 -1, x  -2a 0 +7] [a  a 0, x  -2a 0 +7] [a  a 0, x  x 0 ] [a  a 0, x  -2*a 0 +5] p(a 0,x 0 ) = [a  a 0, x  -2a 0 + 5] 23

int p(int a) { if (…) { a = a -1 ; p (a); a = a + 1; } x = -2*a + 5; } Phase 2 void main() { p(7); } p(a 0,x 0 ) = [a  a 0, x  -2a 0 + 5] [a  7, x  0] [a  7, x  -9] [x  -9] [a  6, x  0] [a  6, x  -7] [a  7, x  -7] [a  6, x  0] [a , x  0] [a , x  ] 24

Summary Functional approach Computes procedure abstraction Sharing between different contexts Rather precise Recursive procedures may be more precise/efficient than loops But requires more from the implementation – Representing (input/output) relations – Composing relations 25

Issues in Functional Approach How to guarantee that finite height for functional lattice? – It may happen that L has finite height and yet the lattice of monotonic function from L to L do not Efficiently represent functions – Functional join – Functional composition – Testing equality 26

27 Special case: L is finite Data facts: d  L  L Initialization: – f start,start = ( ,  ) ; otherwise ( ,  ) – S[start,  ] =  Propagation of (x,y) over edge e = (n,n’) - Maintain summary: S[n’,x] = S[n’,x]   n  (y)) - n intra-node:  n’: (x,  n  (y)) - n call-node:  n’: (y,y) if S[n’,y] =  and n’= entry node  n’: (x,z) if S[exit(call(n),y] = z and n’= ret-site-of n - n return-node:  n’: (u,y) ; n c = call-site-of n’, S[n c,u]=x Tabulation

CFL-Graph reachability Special cases of functional analysis Finite distributive lattices Provides more efficient analysis algorithms Reduce the interprocedural analysis problem to finding context free reachability 28

IDFS / IDE IDFS Interprocedural Distributive Finite Subset Precise interprocedural dataflow analysis via graph reachability. Reps, Horowitz, and Sagiv, POPL’95 IDE Interprocedural Distributive Environment Precise interprocedural dataflow analysis with applications to constant propagation. Reps, Horowitz, and Sagiv, FASE’95, TCS’96 –More general solutions exist 29

Possibly Uninitialized Variables Startx = 3 if... y = x y = w w = 8 printf(y) {w,x,y} {w,y} {w} {w,y} {} {w,y} {} 30

IFDS Problems Finite subset distributive – Lattice L =  (D) –  is  –  is  – Transfer functions are distributive Efficient solution through formulation as CFL reachability 31

Encoding Transfer Functions Enumerate all input space and output space Represent functions as graphs with 2(D+1) nodes Special symbol “ 0 ” denotes empty sets (sometimes denoted  ) Example: D = { a, b, c } f(S) = (S – {a}) U {b} 207 0abc 0abc 32

Efficiently Representing Functions Let f:2 D  2 D be a distributive function Then: – f(X) = f( ∅ )  (  { f({z}) | z ∈ X }) – f(X) = f( ∅ )  (  { f({z}) \ f( ∅ ) | z ∈ X }) 33

Representing Dataflow Functions Identity Function Constant Function a bc a bc 34

“ Gen/Kill ” Function Non- “ Gen/Kill ” Function a bc a bc Representing Dataflow Functions 35

x = 3 p(x,y) return from p printf(y) start main exit main start p(a,b) if... b = a p(a,b) return from p printf(b) exit p xy a b 36

a bcbc a Composing Dataflow Functions bc a 37

x = 3 p(x,y) return from p start main exit main start p(a,b) if... b = a p(a,b) return from p exit p xy a b printf(y) Might b be uninitialized here? printf(b) NO! ( ] Might y be uninitialized here? YES! ( ) 38

The Tabulation Algorithm Worklist algorithm, start from entry of “ main ” Keep track of – Path edges: matched paren paths from procedure entry – Summary edges: matched paren call-return paths At each instruction – Propagate facts using transfer functions; extend path edges At each call – Propagate to procedure entry, start with an empty path – If a summary for that entry exits, use it At each exit – Store paths from corresponding call points as summary paths – When a new summary is added, propagate to the return node 39

Interprocedural Dataflow Analysis via CFL-Reachability Graph: Exploded control-flow graph L : L ( unbalLeft ) – unbalLeft = valid Fact d holds at n iff there is an L ( unbalLeft )-path from 40

Asymptotic Running Time CFL-reachability – Exploded control-flow graph: ND nodes – Running time: O ( N 3 D 3 ) Exploded control-flow graph Special structure Running time: O ( ED 3 ) Typically: E N, hence O ( ED 3 ) O ( ND 3 ) “ Gen/kill ” problems: O ( ED ) 41

IDE Goes beyond IFDS problems – Can handle unbounded domains Requires special form of the domain Can be much more efficient than IFDS 42

Example Linear Constant Propagation Consider the constant propagation lattice The value of every variable y at the program exit can be represented by: y =  {(a x x + b x )| x ∈ Var * }  c a x,c ∈ Z  { ,  } b x ∈ Z Supports efficient composition and “ functional ” join – [z := a * y + b] – What about [z:=x+y]? 43

Linear constant propagation Point-wise representation of environment transformers 44

IDE Analysis Point-wise representation closed under composition CFL-Reachability on the exploded graph Compose functions 45

46

47

Costs O(ED 3 ) Class of value transformers F  L  L – id  F – Finite height Representation scheme with (efficient) Application Composition Join Equality Storage 48

Conclusion Handling functions is crucial for abstract interpretation Virtual functions and exceptions complicate things But scalability is an issue – Small call strings – Small functional domains – Demand analysis 49

Challenges in Interprocedural Analysis Respect call-return mechanism Handling recursion Local variables Parameter passing mechanisms The called procedure is not always known The source code of the called procedure is not always available 50

Bibliography Textbook 2.5 Patrick Cousot & Radhia Cousot. Static determination of dynamic properties of recursive procedures In IFIP Conference on Formal Description of Programming Concepts, E.J. Neuhold, (Ed.), pages , St-Andrews, N.B., Canada, North-Holland Publishing Company (1978). Two Approaches to interprocedural analysis by Micha Sharir and Amir Pnueli IDFS Interprocedural Distributive Finite Subset Precise interprocedural dataflow analysis via graph reachability. Reps, Horowitz, and Sagiv, POPL’95 IDE Interprocedural Distributive Environment Precise interprocedural dataflow analysis with applications to constant propagation. Sagiv, Reps, Horowitz, and TCS’96 51

Disadvantages of the trivial solution Modular (object oriented and functional) programming encourages small frequently called procedures Almost all information is lost 52

A Semantics for Procedure Local Heaps and its Abstractions Noam Rinetzky Tel Aviv University Jörg Bauer Universität des Saarlandes Thomas Reps University of Wisconsin Mooly Sagiv Tel Aviv University Reinhard Wilhelm Universität des Saarlandes

Motivation Interprocedural shape analysis – Conservative static pointer analysis – Heap intensive programs Imperative programs with procedures Recursive data structures Challenge – Destructive update – Localized effect of procedures

Main idea Local heaps y t g x y t g call p(x); x x x

Main idea Local heaps Cutpoints y t g x y t g call p(x); x x x

Main Results Concrete operational semantics – Large step Functional analysis – Storeless Shape abstractions – Local heap – Observationally equivalent to “ standard ” semantics Java and “ clean ” C Abstractions – Shape analysis [Sagiv, Reps, Wilhelm, TOPLAS ‘ 02] – May-alias [Deutsch, PLDI ‘ 94] –…–…

Outline Motivating example – Local heaps – Cutpoints Why semantics Local heap storeless semantics Shape abstraction

static List reverse(List t) { } static void main() { } Example p n n t r n n n List x = reverse(p); return r; n n t List y = reverse(q); List z = reverse(x); … n n n t r n n n p x n n q n n q

static List reverse(List t) { } static void main() { } Example List y = reverse(q); return r;List z = reverse(x); List x = reverse(p); n n t t r n n n t r n n n n n n p x q y n n n n t q n n n n n p x n n n

static List reverse(List t) { } static void main() { } Example return r; n n t t r n n n t r n n n n n n p x x z n n n p x List z = reverse(x); List x = reverse(p); List y = reverse(q); q y n n n n n n t n n n t q y n n n p n n n

Separating objects – Not pointed-to by a parameter Cutpoints

Separating objects – Not pointed-to by a parameter Cutpoints p x n n n n n n proc(x) Stack sharing

Separating objects – Not pointed-to by a parameter x n n n n n n n y Cutpoints p x n n n n n n n proc(x) Stack sharing Heap sharing proc(x)

Separating objects – Not pointed-to by a parameter Capture external sharing patterns x n n n n n n n y Cutpoints p x n n n n n n n proc(x) Stack sharing Heap sharing proc(x)

static List reverse(List t) { } static void main() { } Example return r; r t n n n r t n n n n n n p x z x n n n p x List z = reverse(x); List x = reverse(p); List y = reverse(q); q y n n n n n n t q y n n n p n n n

Outline Motivating example Why semantics Local heap storeless semantics Shape abstraction

Abstract Interpretation [Cousot and Cousot, POPL ’ 77] Operational semantics Abstract transformer  

Introducing local heap semantics Operational semantics Abstract transformer Local heap Operational semantics ~ ’’ ’’

Outline Motivating example Why semantics Local heap storeless semantics Shape abstraction

Programming model Single threaded Procedures Value parameters Recursion Heap Recursive data structures Destructive update  No explicit addressing (&)  No pointer arithmetic

Simplifying assumptions No primitive values (only references) No globals Formals not modified

Storeless semantics No addresses Memory state: – Object: 2 Access paths – Heap: 2 Object Alias analysis y=x x nn x x.n x.n.n x=null x nn xyxy x.n y.n x.n.n y.n.n y y nn y y.n y.n.n

static void main() { } static List reverse(List t) { return r; } Example x List z = reverse(x); p x.n.n n n x.n.n.n p x x.n n y.n.n q n y y.n n y q y.n.n q n y y.n n y q t.n.n t.n.n.n t t.n t.n.n n n t.n.n.n t t.n n t t n n n List x = reverse(p); List y = reverse(q); r.n n n r t r.n.n.n r.n.n n t r r.n n n r t r.n.n.n r.n.n n t r z.n n n z x z.n.n.n z.n.n n z x p?

static void main() { } static List reverse(List t) { return r; } Example x List z = reverse(x); p x.n.n n n x.n.n.n p x x.n n y.n.n q n y y.n n y q y.n.n q n y y.n n y q t.n.n t.n.n.n L t t.n t.n.n n n t.n.n.n L t t.n n L t L t n n n List x = reverse(p); List y = reverse(q); L.n r.n n n LrLr t L.n.n.n r.n.n.n L.n.n r.n.n n L t r L.n r.n n n LrLr t L.n.n.n r.n.n.n L.n.n r.n.n n t L r p.n z.n n n pzpz x p.n.n.n z.n.n.n p.n.n z.n.n n z x p p.n pp.n.n p.n.n.n

Cutpoint labels Relate pre-state with post-state Additional roots Mark cutpoints at and throughout an invocation

Cutpoint labels Cutpoint label : the set of access paths that point to a cutpoint – when the invoked procedure starts L t.n.n t.n.n.n L t t.n t L  {t.n.n.n}

Sharing patterns Cutpoint labels encode sharing patterns L t t.n.n n n t.n.n.n L t t.n n L t t.n.n n n t.n.n.n L t t.n n p w n w w.n n L  {t.n.n.n} Stack sharing Heap sharing

Observational equivalence  L   L (Local-heap Storeless Semantics)  G   G (Global-heap Store-based Semantics)  L and  G observationally equivalent when for every access paths AP 1, AP 2  AP 1 = AP 2  (  L )   AP 1 = AP 2  (  G )

Main theorem: semantic equivalence  L   L (Local-heap Storeless Semantics)  G   G (Global-heap Store-based Semantics)  L and  G observationally equivalent  st,  L    ’ L   st,  G    ’ G  ’ L and  ’ G are observationally equivalent LSL GSB

Corollaries Preservation of invariants – Assertions: AP 1 = AP 2 Detection of memory leaks

Applications Develop new static analyses – Shape analysis Justify soundness of existing analyses

Related work Storeless semantics – Jonkers, Algorithmic Languages ‘ 81 – Deutsch, ICCL ‘ 92

Shape abstraction Shape descriptors represent unbounded memory states – Conservatively – In a bounded way Two dimensions – Local heap (objects) – Sharing pattern (cutpoint labels)

A Shape abstraction L r.n L.n rLrL t, r.n.n.n L.n.n.n r.n.n L.n.n t L={t.n.n.n} r n n n

A Shape abstraction L r.n L.n rLrL t, r.n.n.n L.n.n.n r.n.n L.n.n t L=* r n n n

L rLrL t, r.n L.n r.n L.n t L=* r n n n A Shape abstraction L r.n L.n rLrL t, r.n L.n r.n L.n t L=* r n n n

A Shape abstraction L rLrL t, r.n L.n r.n L.n t L=* r n n n

A Shape abstraction L rLrL t, r.n L.n r.n L.n t L=* r n n n L r.n L.n rLrL t, r.n.n.n L.n.n.n r.n.n L.n.n t L={t.n.n.n} r n n n

A Shape abstraction rLrL t, r.n L.n r.n L.n t L=* r n n n L1 r.n L1.n r L1 t, r.n.n.n L1.n.n.n r.n.n L1.n.n t L1={t.n.n.n} r n n n L2={g.n.n.n} L2 d.n L2.n d L2 g, d.n.n.n L2.n.n.n d.n.n L2.n.n g d n n n L dLdL t, d.n L.n d.n L.n t d n n n

Cutpoint-Freedom 91

main() { append(y,z); } Procedure  input/output relation – Not reachable  Not effected – proc: local (  reachable) heap  local heap How to tabulate procedures? append(List p, List q) { … } n x n t y n z n p q p q n y z p q n x n t

main() { append(y,z); } y n n z n How to handle sharing? External sharing may break the functional view append(List p, List q) { … } n n t z x n p q n n p q n n t y p q n n n x

append(y,z); What ’ s the difference? n n t z y x 1 st Example2 nd Example append(y,z); n x n t y z

Cutpoints An object is a cutpoint for an invocation – Reachable from actual parameters – Not pointed to by an actual parameter – Reachable without going through a parameter append(y,z) y x n n t z y n n t z n n

Cutpoint freedom Cutpoint-free – Invocation: has no cutpoints – Execution: every invocation is cutpoint-free – Program: every execution is cutpoint-free x y n n t z y n n t z x append(y,z)

Interprocedural shape analysis for cutpoint-free programs using 3-Valued Shape Analysis

Memory states: 2-Valued Logical Structure A memory state encodes a local heap – Local variables of the current procedure invocation – Relevant part of the heap Relevant  Reachable x y n n t z p q main append

Memory states Represented by first-order logical structures PredicateMeaning x(v)x(v)Variable x points to v n(v1,v2)n(v1,v2)Field n of object v 1 points to v 2

Memory states Represented by first-order logical structures p q p u1u1 1 u2u2 0 u1u1 u2u2 q u1u1 0 u2u2 1 nu1u1 u2u2 u1u1 01 u2u2 00 n PredicateMeaning x(v)x(v)Variable x points to v n(v1,v2)n(v1,v2)Field n of object v 1 points to v 2

Operational semantics Statements modify values of predicates Specified by predicate-update formulae – Formulae in FO-TC

append body Procedure calls p q p n q y z x n y z x n n 1. Verify cutpoint freedom 2Compute input … Execute callee … 3 Combine output append(y,z) append(p,q)

Procedure call: 1. Verifying cutpoint-freedom y x n t z y n t z n n Not Cutpoint free n n x Cutpoint free An object is a cutpoint for an invocation – Reachable from actual parameters – Not pointed to by an actual parameter – Reachable without going through a parameter append(y,z)

Procedure call: 1. Verifying cutpoint-freedom y x n t z y n t z n n (main ’ s locals: x,y,z,t) Cutpoint free Not Cutpoint free Invoking append(y,z) in main – R {y,z} (v)=  v 1 :y(v 1 )  n*(v 1,v)   v 1 :z(v 1 )  n*(v 1,v) – isCP main,{y,z} (v)= R {y,z} (v)  (  y(v)  z(v 1 ))  ( x(v)  t(v)   v 1 :  R {y,z} (v 1 )  n(v 1,v)) n n x

Procedure call: 2. Computing the input local heap Retain only reachable objects Bind formal parameters y n t z n n x Call state p n q n Input state

Procedure body: append(p,q) Input state p n q n Output state p n q n n

Procedure call: 3. Combine output Call state Output state y n t z n n p n q n n x

Procedure call: 3. Combine output y n z n p n q n inUc(v) inUx(v) Auxiliary predicates n t n x y n z n n x t n Call state Output state

Observational equivalence  CPF   CPF (Cutpoint free semantics)  GSB   GSB (Standard semantics)  CPF and  GSB observationally equivalent when for every access paths AP 1, AP 2  AP 1 = AP 2  (  CPF )   AP 1 = AP 2  (  GSB )

Observational equivalence For cutpoint free programs: –  CPF   CPF (Cutpoint free semantics) –  GSB   GSB (Standard semantics) –  CPF and  GSB observationally equivalent It holds that –  st,  CPF    ’ CPF   st,  GSB    ’ GSB –  ’ CPF and  ’ GSB are observationally equivalent

Introducing local heap semantics Operational semantics Abstract transformer Local heap Operational semantics ~ ’’ ’’

Shape abstraction Abstract memory states represent unbounded concrete memory states – Conservatively – In a bounded way – Using 3-valued logical structures

3-Valued logic 1 = true 0 = false 1/2 = unknown A join semi-lattice, 0  1 = 1/2

Canonical abstraction y z x t n n n n n n y z x n n t n n n n

Instrumentation predicates Record derived properties Refine the abstraction – Instrumentation principle [SRW, TOPLAS’02] Reachability is central! PredicateMeaning rx(v)rx(v)v is reachable from variable x r obj (v 1,v 2 )v 2 is reachable from v 1 ils(v)v is heap-shared c(v)v resides on a cycle

Abstract memory states (with reachability) y r x r x,r y r x n n z rzrz rzrz x n n rtrt t rtrt n rtrt n r x,r y r x rzrz rzrz rtrt rtrt rtrt rzrz n y r x,r y n n z rzrz rzrz x r x n n rtrt t rtrt n n n rzrz

The importance of reachability: Call append(y,z) y z r x r x,r y n n r z x r x n n r t t n n y z x n n t n n n r x r x,r y r x n n rzrz rzrz x n n rtrt t rtrt n rtrt n rzrz n y z n

Abstract semantics Conservatively apply statements on abstract memory states – Same formulae as in concrete semantics – Soundness guaranteed [SRW, TOPLAS’02]

append body Procedure calls p q p n q y z x n y z x n n 1. Verify cutpoint freedom 2 Compute input … Execute callee … 3 Combine output append(y,z) append(p,q)

append body Procedure calls p q p n q y z x n y z x n n 1. Verify cutpoint freedom 2Compute input … Execute callee … 3 Combine output append(y,z) append(p,q)

Conservative verification of cutpoint- freedom y t z y n t n n n ryry ryry ryry ryry ryry rxrx rtrt rtrt rtrt rtrt rzrz rzrz n n n n n z x Cutpoint free Not Cutpoint free Invoking append(y,z) in main – R {y,z} (v)=  v 1 :y(v 1 )  n*(v 1,v)   v 1 :z(v 1 )  n*(v 1,v) – isCP main,{y,z} (v)= R {y,z} (v)  (  y(v)  z(v 1 ))  ( x(v)  t(v)   v 1 :  R {y,z} (v 1 )  n(v 1,v))

Tabulation exits y Interprocedural shape analysis call f(x) p x y x p

p y Interprocedural shape analysis call f(x) x y p p Tabulation exits Analyze f p x

Interprocedural shape analysis Procedure  input/output relation p q n n rqrq rprp rprp q p n n rprp rqrq n rprp q q rqrq rqrq q p rprp q p rprp rqrq n rprp rqrq … Input Output rprp

Interprocedural shape analysis Reusable procedure summaries – Heap modularity q p rprp rqrq n rprp q p rprp rqrq g h i k n n g h i k n rgrg rgrg rgrg rhrh rhrh riri rkrk rhrh rgrg riri rkrk append(h,i) y x z n n nn rxrx ryry rzrz y x z n n n rzrz rxrx ryry rxrx rxrx rxrx ryry rxrx rxrx append(y,z) y n z x y z x rxrx ryry rzrz ryry rxrx ryry

Plan Cutpoint freedom Non-standard concrete semantics Interprocedural shape analysis Prototype implementation

TVLA based analyzer Soot-based Java front-end Parametric abstraction Data structureVerified properties Singly linked listCleanness, acyclicity Sorting (of SLL)+ Sortedness Unshared binary treesCleaness, tree-ness

Iterative vs. Recursive (SLL) 585

Inline vs. Procedural abstraction // Allocates a list of // length 3 List create3(){ … } main() { List x1 = create3(); List x2 = create3(); List x3 = create3(); List x4 = create3(); … }

Call string vs. Relational vs. CPF [Rinetzky and Sagiv, CC ’ 01] [Jeannet et al., SAS ’ 04]

Related Work Interprocedural shape analysis – Rinetzky and Sagiv, CC ’ 01 – Chong and Rugina, SAS ’ 03 – Jeannet et al., SAS ’ 04 – Hackett and Rugina, POPL ’ 05 – Rinetzky et al., POPL ‘ 05 Local Reasoning – Ishtiaq and O ’ Hearn, POPL ‘ 01 – Reynolds, LICS ’ 02 Encapsulation – Noble et al. IWACO ’ 03 –...

Future work Bounded number of cutpoints False cutpoints – Liveness analysis y n t n n ryry ryry ryry rxrx rtrt rtrt rzrz n n z x append(y,z); x = null;

Summary Cutpoint freedom Non-standard operational semantics Interprocedural shape analysis – Partial correctness of quicksort Prototype implementation

Application Properties proved – Absence of null dereferences – Listness preservation – API conformance Recursive  Iterative Procedural abstraction

Related work Interprocedural shape analysis – Rinetzky, Bauer, Reps, Sagiv, Wilhelm POPL’05 Cutpoints – Rinetzky and Sagiv, CC ’ 01 Global heap – Jeannet et al., SAS ’ 04 Local heap, relational – Chong and Rugina, SAS ’ 03 Local heap Local reasoning – Ishtiaq and O ’ Hearn, POPL ‘ 01 – Reynolds, LICS ’ 02

Summary Operational semantics – Storeless – Local heap – Cutpoints – Equivalence theorem Applications – Shape analysis – May-alias analysis

Project 1-2 Students in a group Theoretical + Practical Your choice of topic – Contact me in 2 weeks Submission – 15/Sep – Code + Examples – Document – 20 minutes presentation 137

Past projects JavaScript Dominator Analysis Attribute Analysis for JavaScript Simple Pointer Analysis for C Adding program counters to Past Abstraction Verification of Asynchronous programs Verifying SDNs using TVLA Verifying independent accesses to arrays in GO 138

Past projects Detecting index out of bound errors in C programs Lattice-Based Semantics for Combinatorial Models Evolution 139