Compile-Time Verification of Properties of Heap Intensive Programs Mooly Sagiv Thomas Reps Reinhard Wilhelm

Slides:



Advertisements
Similar presentations
Automated Theorem Proving Lecture 1. Program verification is undecidable! Given program P and specification S, does P satisfy S?
Advertisements

Abstract Interpretation Part II
Predicate Abstraction and Canonical Abstraction for Singly - linked Lists Roman Manevich Mooly Sagiv Tel Aviv University Eran Yahav G. Ramalingam IBM T.J.
Shape Analysis by Graph Decomposition R. Manevich M. Sagiv Tel Aviv University G. Ramalingam MSR India J. Berdine B. Cook MSR Cambridge.
Interprocedural Shape Analysis for Recursive Programs Noam Rinetzky Mooly Sagiv.
Abstraction and Modular Reasoning for the Verification of Software Corina Pasareanu NASA Ames Research Center.
Pointer Analysis – Part I Mayur Naik Intel Research, Berkeley CS294 Lecture March 17, 2009.
Heap Decomposition for Concurrent Shape Analysis R. Manevich T. Lev-Ami M. Sagiv Tel Aviv University G. Ramalingam MSR India J. Berdine MSR Cambridge Dagstuhl.
3-Valued Logic Analyzer (TVP) Tal Lev-Ami and Mooly Sagiv.
Bebop: A Symbolic Model Checker for Boolean Programs Thomas Ball Sriram K. Rajamani
A survey of techniques for precise program slicing Komondoor V. Raghavan Indian Institute of Science, Bangalore.
1 E. Yahav School of Computer Science Tel-Aviv University Verifying Safety Properties using Separation and Heterogeneous Abstractions G. Ramalingam IBM.
1 Lecture 07 – Shape Analysis Eran Yahav. Previously  LFP computation and join-over-all-paths  Inter-procedural analysis  call-string approach  functional.
1 Lecture 08(a) – Shape Analysis – continued Lecture 08(b) – Typestate Verification Lecture 08(c) – Predicate Abstraction Eran Yahav.
Static Program Analysis via Three-Valued Logic Thomas Reps University of Wisconsin Joint work with M. Sagiv (Tel Aviv) and R. Wilhelm (U. Saarlandes)
Local Heap Shape Analysis Noam Rinetzky Tel Aviv University Joint work with Jörg Bauer Universität des Saarlandes Thomas Reps University of Wisconsin Mooly.
Shape Analysis via 3-Valued Logic Mooly Sagiv Tel Aviv University
Establishing Local Temporal Heap Safety Properties with Applications to Compile-Time Memory Management Ran Shaham Eran Yahav Elliot Kolodner Mooly Sagiv.
Finite Differencing of Logical Formulas for Static Analysis Thomas Reps University of Wisconsin Joint work with M. Sagiv and A. Loginov.
3-Valued Logic Analyzer (TVP) Part II Tal Lev-Ami and Mooly Sagiv.
1 Motivation Dynamically allocated storage and pointers are an essential programming tools –Object oriented –Modularity –Data structure But –Error prone.
Model Checking of Concurrent Software: Current Projects Thomas Reps University of Wisconsin.
Modular Shape Analysis for Dynamically Encapsulated Programs Noam Rinetzky Tel Aviv University Arnd Poetzsch-HeffterUniversität Kaiserlauten Ganesan RamalingamMicrosoft.
Overview of program analysis Mooly Sagiv html://
Semantics with Applications Mooly Sagiv Schrirber html:// Textbooks:Winskel The.
Detecting Memory Errors using Compile Time Techniques Nurit Dor Mooly Sagiv Tel-Aviv University.
Modular Shape Analysis for Dynamically Encapsulated Programs Noam Rinetzky Tel Aviv University Arnd Poetzsch-HeffterUniversität Kaiserlauten Ganesan RamalingamMicrosoft.
1 Shape Analysis via 3-Valued Logic Mooly Sagiv Tel Aviv University Shape analysis with applications Chapter 4.6
Comparison Under Abstraction for Verifying Linearizability Daphna Amit Noam Rinetzky Mooly Sagiv Tom RepsEran Yahav Tel Aviv UniversityUniversity of Wisconsin.
A Semantics for Procedure Local Heaps and its Abstractions Noam Rinetzky Tel Aviv University Jörg Bauer Universität des Saarlandes Thomas Reps University.
Static Program Analysis via Three-Valued Logic Thomas Reps University of Wisconsin Joint work with M. Sagiv (Tel Aviv) and R. Wilhelm (U. Saarlandes)
CUTE: A Concolic Unit Testing Engine for C Technical Report Koushik SenDarko MarinovGul Agha University of Illinois Urbana-Champaign.
Dagstuhl Seminar "Applied Deductive Verification" November Symbolically Computing Most-Precise Abstract Operations for Shape.
Program Analysis and Verification Noam Rinetzky Lecture 10: Shape Analysis 1 Slides credit: Roman Manevich, Mooly Sagiv, Eran Yahav.
Thread Quantification for Concurrent Shape Analysis Josh BerdineMSR Cambridge Tal Lev-AmiTel Aviv University Roman ManevichTel Aviv University Mooly Sagiv.
June 27, 2002 HornstrupCentret1 Using Compile-time Techniques to Generate and Visualize Invariants for Algorithm Explanation Thursday, 27 June :00-13:30.
T. Lev-Ami, R. Manevich, M. Sagiv TVLA: A System for Generating Abstract Interpreters A. Loginov, G. Ramalingam, E. Yahav.
TVLA: A system for inferring Quantified Invariants Tal Lev-Ami Tom Reps Mooly Sagiv Reinhard Wilhelm Greta Yorsh.
1 Employing decision procedures for automatic analysis and verification of heap-manipulating programs Greta Yorsh under the supervision of Mooly Sagiv.
Shape Analysis Overview presented by Greta Yorsh.
Shape Analysis via 3-Valued Logic Mooly Sagiv Thomas Reps Reinhard Wilhelm
Symbolically Computing Most-Precise Abstract Operations for Shape Analysis Greta Yorsh Thomas Reps Mooly Sagiv Tel Aviv University University of Wisconsin.
Symbolic Execution with Abstract Subsumption Checking Saswat Anand College of Computing, Georgia Institute of Technology Corina Păsăreanu QSS, NASA Ames.
1 Shape Analysis via 3-Valued Logic Mooly Sagiv Tel Aviv University Shape analysis with applications Chapter 4.6
Schedule 27/12 Shape Analysis 3/1 Static Analysis in Soot 10/1 Static Analysis in LLVM 17/1 Advanced Topics: Concurrent programs and TAU research topics.
Data Structures and Algorithms for Efficient Shape Analysis by Roman Manevich Prepared under the supervision of Dr. Shmuel (Mooly) Sagiv.
CUTE: A Concolic Unit Testing Engine for C Koushik SenDarko MarinovGul Agha University of Illinois Urbana-Champaign.
1 Program Analysis via 3-Valued Logic Mooly Sagiv, Tal Lev-Ami, Roman Manevich Tel Aviv University Thomas Reps, University of Wisconsin, Madison Reinhard.
Program Analysis via 3-Valued Logic Thomas Reps University of Wisconsin Joint work with Mooly Sagiv and Reinhard Wilhelm.
Roman Manevich Ben-Gurion University Program Analysis and Verification Spring 2015 Program Analysis and Verification Lecture 16: Shape Analysis.
Shape & Alias Analyses Jaehwang Kim and Jaeho Shin Programming Research Laboratory Seoul National University
1 Simulating Reachability using First-Order Logic with Applications to Verification of Linked Data Structures Tal Lev-Ami 1, Neil Immerman 2, Tom Reps.
Interprocedural shape analysis for cutpoint-free programs Noam Rinetzky Tel Aviv University Joint work with Mooly Sagiv Tel Aviv University Eran Yahav.
Finding bugs with a constraint solver daniel jackson. mandana vaziri mit laboratory for computer science issta 2000.
Putting Static Analysis to Work for Verification A Case Study Tal Lev-Ami Thomas Reps Mooly Sagiv Reinhard Wilhelm.
Program Analysis and Verification Spring 2015 Program Analysis and Verification Lecture 8: Static Analysis II Roman Manevich Ben-Gurion University.
Interprocedural shape analysis for cutpoint-free programs
Partially Disjunctive Heap Abstraction
Compactly Representing First-Order Structures for Static Analysis
Spring 2016 Program Analysis and Verification
Program Analysis and Verification
Compile-Time Verification of Properties of Heap Intensive Programs
Spring 2016 Program Analysis and Verification
Symbolic Implementation of the Best Transformer
Program Analysis and Verification
Parametric Shape Analysis via 3-Valued Logic
Program Analysis and Verification
Program Analysis and Verification
Symbolic Characterization of Heap Abstractions
A Semantics for Procedure Local Heaps and its Abstractions
Presentation transcript:

Compile-Time Verification of Properties of Heap Intensive Programs Mooly Sagiv Thomas Reps Reinhard Wilhelm

... and also Tel-Aviv University –G. Arnold –I. Bogudlov –G. Erez –N. Dor –T. Lev-Ami –R. Manevich –R. Shaham –A. Rabinovich –N. Rinetzky –G. Yorsh –A. Warshavsky Universität des Saarlandes –Jörg Bauer –Ronald Biber University of Wisconsin –F. DiMaio –D. Gopan –A. Loginov IBM Research –J. Field –H. Kolodner –M. Rodeh –E. Yahav Microsoft Research –G. Ramalingam University of Massachusetts –N. Immerman –B. Hesse The Technical University of Denmark –H.R. Nielson –F. Nielson Weizmann Institute/NYU –A. Pnueli Inria –B. Jeannet

Shape Analysis Determine the possible shapes of a dynamically allocated data structure at given program point Relevant questions: –Does x.next point to a shared element? –Does a variable point p to an allocated element every time p is dereferenced –Does a variable point to an acyclic list? –Does a variable point to a doubly-linked list? –? –Can a procedure create a memory-leak

Problem Programs with pointers and dynamically allocated data structures are error prone Automatically prove correctness Identify subtle bugs at compile time

Interesting Properties of Heap Manipulating Programs No null dereference No memory leaks Preservation of data structure invariant Correct API usage Partial correctness Total correctness

Example rotate(List first, List last) { if ( first != NULL) { last  next = first; first = first  next; last = last  next; last  next = NULL; } last first n n n last first n n n n last first n n n n last first n n n n last first n n n

Interesting Properties rotate(List first, List last) { if ( first != NULL) { last  next = first; first = first  next; last = last  next; last  next = NULL; } No null-de references

Interesting Properties rotate(List first, List last) { if ( first != NULL) { last  next = first; first = first  next; last = last  next; last  next = NULL; } No null-de references No memory leaks

Interesting Properties rotate(List first, List last) { if ( first != NULL) { last  next = first; first = first  next; last = last  next; last  next = NULL; } No null-de references No memory leaks Returns an acyclic linked list Partially correct

Partial Correctness List InsertSort(List x) { List r, pr, rn, l, pl; r = x; pr = NULL; while (r != NULL) { l = x; rn = r  n; pl = NULL; while (l != r) { if (l  data > r  data) { pr  n = rn; r  n = l; if (pl = = NULL) x = r; else pl  n = r; r = pr; break; } pl = l; l = l  n; } pr = r; r = rn; } return x; } typedef struct list_cell { int data; struct list_cell *n; } *List;

Partial Correctness List quickSort(List p, List q) { if(p==q || q == NULL) return p; List h = partition(p,q); List x = p  n; p  n = NULL; List low = quickSort(h, p); List high = quickSort(x, NULL); p  n = high; return low; }

Challenges Specification –Desired properties –Program Semantics Automatic Verification –Program Semantics  Desired properties –Undecidable even for simple programs and prooperties

Plan Concrete Interpretation of Heap Canonical Heap Abstraction Abstract Interpretation using Canonical Abstraction The TVLA system Applications Techniques for scaling

Logical Structures (Labeled Graphs) Nullary relation symbols Unary relation symbols Binary relation symbols FO TC over TC,  express logical structure properties Logical Structures provide meaning for relations –A set of individuals (nodes) U –Interpretation of relation symbols in P p 0 ()  {0,1} p 1 (v)  {0,1} p 2 (u,v)  {0,1} Fixed

Representing Stores as Logical Structures Locations  Individuals Program variables  Unary relations Fields  Binary relations Example –U = {u1, u2, u3, u4, u5} –x = {u1}, p = {u3} –n = {,,, } u1u2u3u4u5 x n nn n p

Example: List Creation typedef struct node { int val; struct node *next; } *List; ✔ No null dereferences ✔ No memory leaks ✔ Returns acyclic list List create (…) { List x, t; x = NULL; while (…) do { t = malloc(); t  next=x; x = t ;} return x; }

Example: Concrete Interpretation x t n n t x n x t n x t n n x t n n x t t x n t t n t x t x t x empty return x x = t t =malloc(..); t  next=x; x = NULL T F

Concrete Interpretation Rules StatementUpdate formula x =NULLx’(v)= 0 x= malloc()x’(v) = IsNew(v) x=yx’(v)= y(v) x=y  nextx’(v)=  w: y(w)  n(w, v) x  next=y n’(v, w) = (  x(v)  n(v, w))  (x(v)  y(w))

Invariants No garbage  v:  {x  PVar}  w: x(w)  n*(w, v) Acyclic list(x)  v, w: x(v)  n*(v, w)   n + (w, v) Reverse (x)  v, w, r: x(v)  n*(v, w)  n(w, r)  n’(r, w)

Why use logical structures? Naturally model pointers and dynamic allocation No a priori bound on number of locations Use formulas to express semantics Indirect store updates using quantifiers Can model other features –Concurrency –Abstract fields

Example: Abstract Interpretation t x n x t n x t n n x t t x n t t n t x t x t x empty x t n n x t n n n x t n t n x n x t n n return x x = t t =malloc(..); t  next=x; x = NULL T F

3-Valued Logical Structures A set of individuals (nodes) U Relation meaning –Interpretation of relation symbols in P p 0 ()  {0,1, 1/2} p 1 (v)  {0,1, 1/2} p 2 (u,v)  {0,1, 1/2} A join semi-lattice: 0  1 = 1/2

Canonical Abstraction (  ) Partition the individuals into equivalence classes based on the values of their unary relations –Every individual is mapped into its equivalence class Collapse relations via  –p S (u’ 1,..., u’ k ) =  {p B (u 1,..., u k ) | f(u 1 )=u’ 1,..., f(u k )=u’ k ) } At most 2 A abstract individuals

Canonical Abstraction x = NULL; while (…) do { t = malloc(); t  next=x; x = t } u1 x t u2 u3 u1 x t u2,3 n n n n

x t n n u2 u1 u3 Canonical Abstraction x = NULL; while (…) do { t = malloc(); t  next=x; x = t } u1 x t u2,3 n n n   

Canonical Abstraction and Equality Summary nodes may represent more than one element (In)equality need not be preserved under abstraction Explicitly record equality Summary nodes are nodes with eq(u, u)=1/2

Canonical Abstraction and Equality x = NULL; while (…) do { t = malloc(); t  next=x; x = t } u1 x t u2 u3 u1 x t u2,3 eq n n n n  eq eq  u2,3  eq

Canonical Abstraction x = NULL; while (…) do { t = malloc(); t  next=x; x = t } u1 x t u2 u3 n n u1 x t u2,3 n n

Canonical Abstraction Partition the individuals into equivalence classes based on the values of their unary relations –Every individual is mapped into its equivalence class Collapse relations via  –p S (u’ 1,..., u’ k ) =  {p B (u 1,..., u k ) | f(u 1 )=u’ 1,..., f(u’ k )=u k ) } At most 2 A abstract individuals

Canonical Abstraction x = NULL; while (…) do { t = malloc(); t  next=x; x = t } u1 x t u2 u3 n n u1 x t u2,3 n n

Limitations Information on summary nodes is lost

Increasing Precision Global invariants –User-supplied, or consequence of the semantics of the programming language –Naturally expressed in FO TC Record extra information in the concrete interpretation –Tunes the abstraction –Refines the concretization

Cyclicity relation c[x]() =  v 1,v 2 : x(v 1 )  n * (v 1,v 2 )  n + (v 2, v 2 ) c[x]()=0 u1u1 x t u2u2 unun … u1 x t u 2..n n n n n n

Cyclicity relation c[x]() =  v 1,v 2 : x(v 1 )  n * (v 1,v 2 )  n + (v 2, v 2 ) c[x]()=1 u1u1 x t u2u2 unun … u1 x t u 2..n n n n n n n

Heap Sharing relation is(v)=0 u1u1 x t u2u2 unun … u1 x t u 2..n n n is(v) =  v 1,v 2 : n(v 1,v)  n(v 2,v)  v 1  v 2 is(v)=0 n n n

Heap Sharing relation is(v)=0 u1u1 x t u2u2 unun … is(v) =  v 1,v 2 : n(v 1,v)  n(v 2,v)  v 1  v 2 is(v)=1is(v)=0 n n n n u1 x t u2 n is(v)=0is(v)=1is(v)=0 n u 3..n n n

Concrete Interpretation Rules StatementUpdate formula x =NULLx’(v)= 0 x= malloc()x’(v) = IsNew(v) is’(v) =is(v)   IsNew(v) x=yx’(v)= y(v) x=y  nextx’(v)=  w: y(w)  n(w, v) x  next=NULLn’(v, w) =  x(v)  n(v, w) is’(v) = is(v)   v1, v2: n(v1, v)   x(v1)  n(v2, v)   x(v2)   eq(v1, v2)

Reachability relation t[n](v1, v2) = n * (v1,v2) u1u1 x t u2u2 unun n n n t[n] u1 x t u 2..n n n t[n]...

List Segments u1u1 x u2u2 u5u5 n u3u3 u4u4 u6u6 u7u7 u8u8 n n nn n n y u1u1 x u 2,3,4,6,7,8 u5u5 n n y

Reachability from a variable r[n,y](v) =  w: y(w)  n*(w, v) u1u1 x u2u2 u5u5 n u3u3 u4u4 u6u6 u7u7 u8u8 n n nn n n y r[n,y]=0 r[n,y]=1 u1u1 x u 2,3,4 u5u5 n n n y u 6,7,8

inOrder(v) =  w: n(v, w)  data(v)  data(w) c fb (v) =  w: f(v, w)  b(w, v) tree(v) dag(v) Weakest Precondition [Ramalingam, PLDI’02] Learned via Inductive Logic Programming [Loginov, CAV’05] Counterexample guided refinement Additional Instrumentation relations

Instrumentation (Summary) Refines the abstraction Adds global invariants But requires update-formulas (generated automatically in TVLA2) is(v) =  v 1,v 2 : n(v 1,v)  n(v 2,v)  v 1  v 2 is(v)   v 1,v 2 : n(v 1,v)  n(v 2,v)  v 1  v 2  (S # )={S : S  ,  (S)= S # }

Abstract Interpretation Best Transformers Kleene Evaluation Kleene Evaluation + semantic reduction –Focus Based Transformers

 y x y x Evaluate update formulas y x y x inverse canonical y x y x canonical abstraction x y Best Transformer Transformer (x = x  n)

Boolean Connectives [Kleene]

Embedding A logical structure B can be embedded into a structure S via an onto function f (B  f S) if the basic relations are preserved, i.e., p B (u 1,.., u k )  p S (f(u 1 ),..., f(u k )) S is a tight embedding of B with respect to f if: –S does not lose unnecessary information, i.e., –p S (u # 1,.., u # k ) =  {p B (u 1..., u k ) | f(u 1 )=u # 1,..., f(u k )=u # k } Canonical Abstraction is a tight embedding

Embedding and Concretization Two natural choices –B  1 (S) if B can be embedded into S via an onto function f (B  f S) –B  2 (S) if S is a tight embedding of B

Embedding Theorem Assume B  f S, p B (u 1,.., u k )  p S (f(u 1 ),..., f(u k )) Then every formula  is preserved: –If    = 1 in S, then    = 1 in B –If    = 0 in S, then    = 0 in B –If    = 1/2 in S, then    could be 0 or 1 in B

Embedding Theorem u1 x t u2,3 n n  v: x(v) 1=Yes  v: x(v)  t(v) 1=Yes  v: x(v)  y(v) 0=No  v 1,v 2 : x(v 1 )  n(v 1, v 2 ) ½=Maybe  v 1,v 2 : x(v 1 )  n(v 1, v 2 )  n*(v 2, v 1 ) 0=No  v 1,v 2 : x(v 1 )  n*(v 1,v 2 )  n+(v 2, v 2 ) 1/2=Maybe

x y Kleene Transformer (x = x  n)

Semantic Reduction Improve the precision of the analysis by recovering properties of the program semantics A Galois connection (L 1, , , L 2 ) An operation op:L 2  L 2 is a semantic reduction –  l  L 2 op(l)  l –  (op(l)) =  (l) Can be applied before and after basic operations l L1L1 L2L2   op

“Focus”-Based Transformer (x = x  n) Kleene Evaluation canonical Focus(x  n) “Partial  ” x y y x y x y x x x y y y x x y y y

The Focus Operation Focus: Formula  (P(3-Struct)  P(3-Struct)) Generalizes materialization For every formula  –Focus(  )(X) yields structure in which  evaluates to a definite values in all assignments –Only maximal in terms of embedding –Focus(  ) is a semantic reduction –But Focus(  )(X) may be undefined for some X

“Focus”-Based Transformer (x = x  n) Kleene Evaluation canonical Focus(x  n) “Partial  ” x y y x y x y x x x y y y x x y y y  w: x(w)  n(w, v)

The Coercion Principle Another Semantic Reduction Can be applied after Focus or after Update or both Increase precision by exploiting some structural properties possessed by all stores (Global invariants) Structural properties captured by constraints Apply a constraint solver

Apply Constraint Solver x y y r[n,y](v)=1 x x y is(v)=0 x y y

Sources of Constraints Properties of the operational semantics Domain specific knowledge –Instrumentation predicates User supplied

Example Constraints x(v1)  x(v2)  eq(v1, v2) n(v, v1)  n(v,v2)  eq(v1, v2) n(v1, v)  n(v2,v)   eq(v1, v2)  is(v) n*(v3, v4)  t[n](v1, v2)

Apply Constraint Solver y is(v)=0 x x(v1)  x(v2)  eq(v1, v2) y is(v)=0 x

Apply Constraint Solver y is(v)=0 x n(v1, v)  n(v2,v)   eq(v1, v2)  is(v) n(v1, v)   is(v)   eq(v1, v2)  n(v2, v) y is(v)=0 x

Summary Transformers Kleene evaluation yields sound solution Focus is statement specific implements partial concretization Coerce applies global constraints

Three Valued Logic Analysis (TVLA) T. Lev-Ami & R. Manevich Input (FO TC) –Concrete interpretation rules –Definition of instrumentation relations –Definition of safety properties –First Order Transition System (TVP) Output –Warnings (text) –The 3-valued structure at every node (invariants)

TVLA inputs TVP - Three Valued ProgramTVP - Three Valued Program –Predicate declarationPredicate declaration –Action definitions SOS Statements Conditions –Control flow graph TVS - Three Valued StructureTVS - Three Valued Structure Program independent

Null Dereferences Demo typedef struct element { int value; struct element  n; } Element bool search( int value, Element  x) { Element  c = x while ( x != NULL ) { if (c  val == value) return TRUE; c = c  n; } return FALSE; } 276

Proving Correctness of Sorting Implementations (Lev-Ami, Reps, S, Wilhelm ISSTA 2000) Partial correctness –The elements are sorted –The list is a permutation of the original list Termination –At every loop iterations the set of elements reachable from the head is decreased

Sortedness u1u1 x t u2u2 unun n n n dle u1 x t u 2..n n n dle...

Example: Sortedness inOrder(v) =  v1: n(v,v1)  dle(v, v1) u1u1 x t u2u2 unun n n dle u1 x t u 2..n n n dle inOrder = 1 n...

Example: InsertSort Run Demo List InsertSort(List x) { List r, pr, rn, l, pl; r = x; pr = NULL; while (r != NULL) { l = x; rn = r  n; pl = NULL; while (l != r) { if (l  data > r  data) { pr  n = rn; r  n = l; if (pl = = NULL) x = r; else pl  n = r; r = pr; break; } pl = l; l = l  n; } pr = r; r = rn; } return x; } typedef struct list_cell { int data; struct list_cell *n; } *List; pred.tvp actions.tvp

Example: InsertSort Run Demo List InsertSort(List x) { if (x == NULL) return NULL pr = x; r = x->n; while (r != NULL) { pl = x; rn = r->n; l = x->n; while (l != r) { pr->n = rn ; r->n = l; pl->n = r; r = pr; break; } pl = l; l = l->n; } pr = r; r = rn; } typedef struct list_cell { int data; struct list_cell *n; } *List; 14

Example: Mark and Sweep void Sweep() { unexplored = Universe collected =  while (unexplored   ) { x = SelectAndRemove(unexplored) if (x  marked) collected = collected  {x} } assert(collected = = Universe – Reachset(root) ) } void Mark(Node root) { if (root != NULL) { pending =  pending = pending  {root} marked =  while (pending   ) { x = SelectAndRemove(pending) marked = marked  {x} t = x  left if (t  NULL) if (t  marked) pending = pending  {t} t = x  right if (t  NULL) if (t  marked) pending = pending  {t} } assert(marked = = Reachset(root)) }

Example: Mark and Sweep void Mark(Node root) { if (root != NULL) { pending =  pending = pending  {root} marked =  while (pending   ) { x = SelectAndRemove(pending) marked = marked  {x} t = x  left if (t  NULL) if (t  marked) pending = pending  {t} t = x  right if (t  NULL) if (t  marked) pending = pending  {t} } assert(marked = = Reachset(root)) } Run Demo pred.tvp pred_set.tvp actions.tvp actions_set.tvp

Example: Mark and Sweep void Sweep() { unexplored = Universe collected =  while (unexplored   ) { x = SelectAndRemove(unexplored) if (x  marked) collected = collected  {x} } assert(collected = = Universe – Reachset(root) ) } void Mark(Node root) { if (root != NULL) { pending =  pending = pending  {root} marked =  while (pending   ) { x = SelectAndRemove(pending) marked = marked  {x} t = x  left if (t  NULL) if (t  marked) pending = pending  {t} /* t = x  right * if (t  NULL) * if (t  marked) * pending = pending  {t} */ } } assert(marked = = Reachset(root)) } Run Demo

Lightweight Specification  "correct usage" rules a client must follow  "call open() before read()" Certification does the client program satisfy the lightweight specification? Verification of Safety Properties (PLDI’02, 04) Component a library with cleanly encapsulated state Client a program that uses the library The Canvas Project (with IBM Watson) (Component Annotation, Verification and Stuff)

Prototype Implementation Applied to several example programs –Up to 5000 lines of Java Used to verify –Absence of concurrent modification exception –JDBC API conformance –IOStreams API conformance

Summary Canonical abstraction is powerful –Intuitive –Adapts to the property of interest –More instrumentation may mean more efficient Used to verify interesting program properties –Very few false alarms But scaling is an issue

Scaling for Larger Programs Staged Analyses Represent 3-valued structures with BDDs [Manevich SAS’02] Coercer Abstractions [Manevich SAS’04]Coercer Abstractions Reduce static costs Handling procedures Assume/Guarantee Reasoning –Use procedure specifications [Yorsh, TACAS’04] –Decision procedures for linked data structures [Immerman, CAV’04, Lev-Ami, CADE’05, Yorsh FOSSACS06]

Scaling Staged analysis Reduce static costs Controlled complexity –More coarse abstractions [Manevich SAS’04] –Counter example based refinement Exploit “good” program properties –Encapsulation & Data abstraction Handle procedures efficiently

Partially Disjunctive Heap Abstraction (Manevich, SAS’04) Use a heap-similarity criterion –We defined similarity by universe congruence Merge similar heaps Avoid merging dissimilar heaps The same concrete state can belong to more than one abstract value

x t n x t n n Partially Disjunctive Abstraction x t n

Running times

Interprocedural Analysis Noam Rinetzky

How to handle procedures? Pure functions –Procedure  input/output relation –No side-effects main() { int w=0,x=0,y=0,z=0; w = inc(y); x = inc(z); assert: w+x is even } int inc(int p) { return 2 + p - 1; } pret …

How to handle procedures? Pure functions –Procedure  input/output relation –No side-effects main() { int w=0,x=0,y=0,z=0; w = inc(y); x = inc(z); assert: w+x is even } int inc(int p) { return 2 + p - 1; } wxyz pret EvenOdd Even EEEE OEEE OOEE

What about global variables? Procedures have side-effects Easy fix int g = 0; main() { int w=0,x=0,y=0,z=0; w = inc(y); x = inc(z); assert: w+x+g is even } int inc(int p) { g = p; return 2 + p - 1; } int g = 0; g = p; pgretg’ 0010 ………… pgretg’ EvenE/OOddEven OddE/OEvenOdd

n append(y,z) n n But what about pointers and heap? Pointers Aliasing Destructive update Heap Global resource Anonymous objects How to tabulate append? z x.n.n ~ y x.n.n.n ~ z y.n=z x y

main() { append(y,z); } Procedure  input/output relation –Not reachable  Not effected –proc: local (  reachable) heap  local heap How to tabulate procedures? append(List p, List q) { … } n x n t y n z n p q p q n y z p q n x n t

main() { append(y,z); } y n n z n How to handle sharing? External sharing may break the functional view append(List p, List q) { … } n n t z x n p q n n p q n n t y p q n n n x

append(y,z); What’s the difference? n n t z y x 1 st Example2 nd Example append(y,z); n x n t y z

Cutpoints An object is a cutpoint for an invocation –Reachable from actual parameters –Not pointed to by an actual parameter –Reachable without going through a parameter append(y,z) y x n n t z y n n t z n n

Main Results(POPL’05) Concrete operational semantics –Sequential programs –Local heap –Track cutpoints –Storeless good for shape abstractions –Observational equivalent with “standard” global store-based heap semantics Java and “clean” C Abstractions –Shape Analysis of singly-linked lists –May-alias [Deutsch, PLDI 04]

Introducing local heap semantics Operational semantics Abstract transformer Local heap Operational semantics ~ ’’ ’’

Main results(SAS’05) Cutpoint freedom Non-standard concrete semantics –Verifies that an execution is cutpoint-free –Local heaps Interprocedural shape analysis –Conservatively verifies program is cutpoint free Desired properties –Partial correctness of quicksort –Procedure summaries Prototype implementation

Cutpoint freedom Cutpoint-free –Invocation: has no cutpoints –Execution: every invocation is cutpoint-free –Program: every execution is cutpoint-free x y n n t z y n n t z x append(y,z)

Programming model Single threaded Procedures Value parameters  Formal parameters not modified Recursion Heap Recursive data structures Destructive update  No explicit addressing (&)  No pointer arithmetic

Memory states A memory state encodes a local heap –Local variables of the current procedure invocation –Relevant part of the heap Relevant  Reachable x y n n t z p q main append

Abstract semantics Conservatively apply statements using 3-valued logic (with the non-standard semantics) –Use canonical abstraction –Reinterpret FO formulas using Kleene value

1. Verify cutpoint freedom 2 Compute input … Execute callee … 3 Combine output append body Procedure calls p q p n q y z x n y z x n n append(y,z) append(p,q)

Tabulation exists? y Interprocedural shape analysis call f(x) p x y x p

p y Interprocedural shape analysis call f(x) x y p p Tabulation exists? Analyze f p x

Interprocedural shape analysis Procedure  input/output relation p q n n rqrq rprp rprp q p n n rprp rqrq n rprp q q rqrq rqrq q p rprp q p rprp rqrq n rprp rqrq … Input Output rprp

Interprocedural shape analysis Reusable procedure summaries –Heap modularity q p rprp rqrq n rprp q p rprp rqrq g h i k n n g h i k n rgrg rgrg rgrg rhrh rhrh riri rkrk rhrh rgrg riri rkrk append(h,i) y x z n n nn rxrx ryry rzrz y x z n n n rzrz rxrx ryry rxrx rxrx rxrx ryry rxrx rxrx append(y,z) y n z x y z x rxrx ryry rzrz ryry rxrx ryry

Prototype implementation TVLA based analyzer Soot-based Java front-end Parametric abstraction Data structureVerified properties Singly linked listCleanness, acyclicity Sorting (of SLL)+ Sortedness Unshared binary treesCleaness, tree-ness

Iterative vs. Recursive (SLL) 585

Inline vs. Procedural abstraction // Allocates a list of // length 3 List create3(){ … } main() { List x1 = create3(); List x2 = create3(); List x3 = create3(); List x4 = create3(); … }

Call string vs. Relational vs. CPF [Rinetzky and Sagiv, CC’01] [Jeannet et al., SAS’04]

Summary Cutpoint freedom Non-standard operational semantics Interprocedural shape analysis –Partial correctness of quicksort Prototype implementation

Summary Reasoning about the heap is challenging [Parametric] Abstraction is necessary Canonical abstraction is powerful Useful for programs with arrays [Gopan POPL’05] Information lost by canonical abstraction –Correlations between list lengths