Shape Analysis via 3-Valued Logic Mooly Sagiv Tel Aviv University

Topics A new abstract domain for static analysis Abstract dynamically allocated memory TVLA: A system for generating abstract interpreters Applications

Motivation Dynamically allocated storage and pointers are essential programming tools –Object oriented –Modularity –Data structure But –Error prone –Inefficient Static analysis can be very useful here

A Pathological C Program
    a = malloc(…);
    b = a;
    free(a);
    c = malloc(…);
    if (b == c)
        printf("unexpected equality");

Dereference of NULL pointers
    typedef struct element {
        int value;
        struct element *next;
    } Elements;

    bool search(int value, Elements *c) {
        Elements *elem;
        /* potential null dereference: the loop tests c, not elem */
        for (elem = c; c != NULL; elem = elem->next)
            if (elem->value == value)
                return TRUE;
        return FALSE;
    }

Memory leakage
    typedef struct element {
        int value;
        struct element *next;
    } Elements;

    Elements* reverse(Elements *c) {
        Elements *h, *g;
        h = NULL;
        while (c != NULL) {
            g = c->next;
            h = c;
            c->next = h;
            c = g;
        }
        return h;
    }
Leakage of the address pointed to by h

Memory leakage
    typedef struct element {
        int value;
        struct element *next;
    } Elements;

    Elements* reverse(Elements *c) {
        Elements *h, *g;
        h = NULL;
        while (c != NULL) {
            g = c->next;
            c->next = h;   /* link the current cell to the reversed prefix first */
            h = c;
            c = g;
        }
        return h;
    }
✔ No memory leaks

Example: List Creation
    typedef struct node {
        int val;
        struct node *next;
    } *List;

    List create(…) {
        List x, t;
        x = NULL;
        while (…) do {
            t = malloc();
            t->next = x;
            x = t;
        }
        return x;
    }
✔ No null dereferences
✔ No memory leaks
✔ Returns acyclic list

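Example: Collecting Interpretation
[Figure: control-flow graph of create (x = NULL; loop test with T/F branches; t = malloc(..); t->next = x; x = t; return x), annotated at each program point with the concrete list structures over x, t and n that can arise, starting from the empty heap.]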

Example: Abstract Interpretation
[Figure: control-flow graph of create (x = NULL; loop test with T/F branches; t = malloc(..); t->next = x; x = t; return x), annotated at each program point with abstract structures over x, t and n, starting from the empty heap.]

Challenge 1 - Memory Allocation The number of allocated objects/threads is not known Concrete state space is infinite How to guarantee termination?

Challenge 2 - Destructive Updates The program manipulates states using destructive updates –e->next = t Hard to define concrete interpretation Harder to define abstract interpretation

Challenge 2 - Destructive Update
[Figure: heap graphs over p, x, y and n before and after y->next = NULL; the abstract treatment shown is labelled Unsound.]

Challenge 2 - Destructive Update
[Figure: heap graphs over p, x, y and n before and after y->next = NULL; the abstract treatment shown is labelled Imprecise.]

Challenge 3 – Re-establishing Data Structure Invariants Data-structure invariants typically only hold at the beginning and end of ADT operations Need to verify that data-structure invariants are re-established

Challenge 3 – Re-establishing Data Structure Invariants
    rotate(List first, List last) {
        if (first != NULL) {
            last->next = first;
            first = first->next;
            last = last->next;
            last->next = NULL;
        }
    }
[Figure: the list between first and last, with its n edges, shown after each statement of rotate.]

Plan Concrete interpretation Canonical abstraction Abstract interpretation using canonical abstraction The TVLA system

Traditional Heap Interpretation
States = Two level stores
–$Env : Var \to Values$
–$fields : Loc \to Values$
–$Values = Loc \cup Atoms$
Example
–$Env = [x \mapsto 30, p \mapsto 79]$
–$next = [30 \mapsto 40, 40 \mapsto 50, 50 \mapsto 79, 79 \mapsto 90]$
–$val = [30 \mapsto 1, 40 \mapsto 2, 50 \mapsto 3, 79 \mapsto 4, 90 \mapsto 5]$

Predicate Logic
Vocabulary
–A finite set of predicate symbols $P$, each with a fixed arity
Logical structures $S$ provide meaning for predicates
–A set of individuals (nodes) $U$
–$p^S : (U^S)^k \to \{0, 1\}$
$FO^{TC}$ (first-order logic with transitive closure), over $TC, \forall, \exists, \neg, \wedge, \vee$, expresses properties of logical structures

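Representing Stores as Logical Structures
Locations ≈ Individuals
Program variables ≈ Unary predicates
Fields ≈ Binary predicates
Example
–$U = \{u_1, u_2, u_3, u_4, u_5\}$
–$x = \{u_1\}$, $p = \{u_3\}$
–$n = \{\langle u_1, u_2 \rangle, \langle u_2, u_3 \rangle, \langle u_3, u_4 \rangle, \langle u_4, u_5 \rangle\}$
[Figure: the chain u1 → u2 → u3 → u4 → u5 linked by n edges, with x and p marking the nodes they point to.]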
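The encoding above can be sketched in C as follows. This is only an illustration, not TVLA's representation: the type name `structure`, the function `example_store` and the fixed size `N` are made up for the example, and the n relation is assumed to be the chain u1 → u2 → … → u5 drawn on the slide.

```c
#include <stdbool.h>

#define N 5  /* individuals u1..u5, stored at indices 0..4 */

/* A 2-valued logical structure for the vocabulary {x, p, n}. */
typedef struct {
    bool x[N];      /* unary predicate: does variable x point to u? */
    bool p[N];      /* unary predicate: does variable p point to u? */
    bool n[N][N];   /* binary predicate: does the n field of u point to w? */
} structure;

/* Build the example: x = {u1}, p = {u3}, n is the chain u1 -> ... -> u5
   (the chain is an assumption read off the slide's figure). */
structure example_store(void) {
    structure s = {0};
    s.x[0] = true;
    s.p[2] = true;
    for (int i = 0; i + 1 < N; i++)
        s.n[i][i + 1] = true;
    return s;
}
```

Note that no addresses appear anywhere: only which predicates hold of which individuals.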

Formal Semantics of First Order Formulae
For a structure $S = \langle U^S, p^S \rangle$, formulae $\varphi$ with free variables in $LVar$, and an assignment $z : LVar \to U^S$, the meaning is $[\![\varphi]\!]^S(z) \in \{0, 1\}$:
$[\![1]\!]^S(z) = 1$
$[\![p(v_1, v_2, \ldots, v_k)]\!]^S(z) = p^S(z(v_1), z(v_2), \ldots, z(v_k))$
$[\![0]\!]^S(z) = 0$

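Formal Semantics of First Order Formulae
For a structure $S = \langle U^S, p^S \rangle$, formulae $\varphi$ with free variables in $LVar$, and an assignment $z : LVar \to U^S$, the meaning is $[\![\varphi]\!]^S(z) \in \{0, 1\}$:
$[\![\varphi_1 \vee \varphi_2]\!]^S(z) = \max([\![\varphi_1]\!]^S(z), [\![\varphi_2]\!]^S(z))$
$[\![\varphi_1 \wedge \varphi_2]\!]^S(z) = \min([\![\varphi_1]\!]^S(z), [\![\varphi_2]\!]^S(z))$
$[\![\neg \varphi_1]\!]^S(z) = 1 - [\![\varphi_1]\!]^S(z)$
$[\![\exists v : \varphi_1]\!]^S(z) = \max \{ [\![\varphi_1]\!]^S(z[v \mapsto u]) : u \in U^S \}$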

Formal Semantics of Transitive Closure
For a structure $S = \langle U^S, p^S \rangle$, formulae $\varphi$ with free variables in $LVar$, and an assignment $z : LVar \to U^S$, the meaning is $[\![\varphi]\!]^S(z) \in \{0, 1\}$:
$[\![p^*(v_1, v_2)]\!]^S(z) = \max_{u_1, \ldots, u_k \in U^S,\ z(v_1) = u_1,\ z(v_2) = u_k} \ \min_{1 \le i < k} p^S(u_i, u_{i+1})$
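On a finite 2-valued structure, the max-over-paths / min-over-edges definition amounts to ordinary graph reachability. A minimal sketch in C, using a Warshall-style closure; the function name and the fixed size `N` are chosen for the example only:

```c
#include <stdbool.h>

#define N 4  /* number of individuals (illustrative) */

/* Reflexive transitive closure n* of a binary predicate n.
   star[u][w] is true iff some path u = u_1, ..., u_k = w has all its
   edges in n; the one-node path (k = 1) makes the relation reflexive. */
void reflexive_transitive_closure(bool n[N][N], bool star[N][N]) {
    for (int u = 0; u < N; u++)
        for (int w = 0; w < N; w++)
            star[u][w] = (u == w) || n[u][w];
    /* Warshall: allow paths through intermediate node k. */
    for (int k = 0; k < N; k++)
        for (int u = 0; u < N; u++)
            for (int w = 0; w < N; w++)
                if (star[u][k] && star[k][w])
                    star[u][w] = true;
}
```

The non-reflexive closure n+ used in later formulas would start from `n[u][w]` alone instead of `(u == w) || n[u][w]`.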

Concrete Interpretation Rules
Statement        Update formula
x = NULL         $x'(v) = 0$
x = malloc()     $x'(v) = IsNew(v)$
x = y            $x'(v) = y(v)$
x = y->next      $x'(v) = \exists w : y(w) \wedge n(w, v)$
x->next = y      $n'(v, w) = (\neg x(v) \wedge n(v, w)) \vee (x(v) \wedge y(w))$
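To make the update formulas concrete, here is a small C sketch that applies two of them to a 2-valued structure. The names `structure`, `assign_x_y_next` and `assign_x_next_y` and the size `N` are hypothetical; the formulas in the comments are the ones from the table.

```c
#include <stdbool.h>

#define N 3  /* number of individuals (illustrative) */

typedef struct {
    bool x[N], y[N];   /* unary predicates for program variables x and y */
    bool n[N][N];      /* binary predicate for the next field */
} structure;

/* x = y->next :  x'(v) = exists w: y(w) && n(w, v) */
structure assign_x_y_next(structure s) {
    structure r = s;
    for (int v = 0; v < N; v++) {
        r.x[v] = false;
        for (int w = 0; w < N; w++)
            if (s.y[w] && s.n[w][v])
                r.x[v] = true;
    }
    return r;
}

/* x->next = y :  n'(v, w) = (!x(v) && n(v, w)) || (x(v) && y(w)) */
structure assign_x_next_y(structure s) {
    structure r = s;
    for (int v = 0; v < N; v++)
        for (int w = 0; w < N; w++)
            r.n[v][w] = (!s.x[v] && s.n[v][w]) || (s.x[v] && s.y[w]);
    return r;
}
```

Each statement reads the old predicate values from `s` and writes the primed ones into `r`, so a destructive update never observes its own partial result.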

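Invariants
No memory leaks: $\forall v : \bigvee_{x \in PVar} \exists w : x(w) \wedge n^*(w, v)$
Acyclic list(x): $\forall v, w : x(v) \wedge n^*(v, w) \rightarrow \neg n^+(w, v)$
Reverse(x): $\forall v, w, r : x(v) \wedge n^*(v, w) \rightarrow (n(w, r) \leftrightarrow n'(r, w))$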
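As an illustration of checking such an invariant on one concrete structure, the sketch below evaluates the no-memory-leaks formula: every individual must be reachable through n* from an individual pointed to by some program variable. The function name, `N` and `NVARS` are assumptions of the example.

```c
#include <stdbool.h>

#define N 4      /* number of individuals (illustrative) */
#define NVARS 2  /* number of program variables (illustrative) */

/* No memory leaks: for all v, some variable x and some w satisfy
   x(w) && n*(w, v).  var[i][u] is true iff variable i points to u. */
bool no_memory_leaks(bool var[NVARS][N], bool n[N][N]) {
    bool reach[N] = {false};
    for (int i = 0; i < NVARS; i++)
        for (int u = 0; u < N; u++)
            if (var[i][u])
                reach[u] = true;
    /* Propagate reachability along n until nothing changes. */
    for (bool changed = true; changed; ) {
        changed = false;
        for (int u = 0; u < N; u++)
            for (int w = 0; w < N; w++)
                if (reach[u] && n[u][w] && !reach[w]) {
                    reach[w] = true;
                    changed = true;
                }
    }
    for (int v = 0; v < N; v++)
        if (!reach[v])
            return false;
    return true;
}
```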

Why use logical structures? Naturally model pointers and dynamic allocation No a priori bound on number of locations Use formulas to express semantics Indirect store updates using quantifiers Can model other features –Concurrency –Abstract fields

Why use logical structures? Behaves well under abstraction Enables automatic construction of abstract interpreters from concrete interpretation rules (TVLA)

Collecting Interpretation
The set of reachable logical structures in every program point
Statements operate on sets of logical structures
Cannot be directly computed for programs with unbounded store and loops
    x = NULL;
    while (…) do {
        t = malloc();
        t->next = x;
        x = t
    }
[Figure: the structures arising in the loop: the empty store, a single node u1 pointed to by x and t, a two-node list u1, u2, and so on up to lists u1, u2, …, un of unbounded length linked by n.]

Plan Concrete interpretation Canonical abstraction TVLA

Canonical Abstraction Convert logical structures of unbounded size into structures of bounded size Guarantees that the number of logical structures at every program point is finite Every first-order formula can be conservatively interpreted

Kleene Three-Valued Logic
1: True
0: False
1/2: Unknown
A join semi-lattice: $0 \sqcup 1 = 1/2$
Information order: $0 \sqsubseteq 1/2$ and $1 \sqsubseteq 1/2$
Logical order: $0 \le 1/2 \le 1$

Boolean Connectives [Kleene]
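Kleene's connectives extend the 2-valued min/max/1−x semantics to the value 1/2. A small sketch in C, storing truth values doubled so that 1/2 fits in an integer (the enum and function names are made up for the example):

```c
/* Kleene truth values, stored doubled: 0 -> 0, 1/2 -> 1, 1 -> 2. */
typedef enum { K_FALSE = 0, K_HALF = 1, K_TRUE = 2 } kleene;

/* Conjunction is the minimum in the logical order 0 <= 1/2 <= 1. */
kleene k_and(kleene a, kleene b) { return a < b ? a : b; }

/* Disjunction is the maximum in the logical order. */
kleene k_or(kleene a, kleene b) { return a > b ? a : b; }

/* Negation is 1 - a, so 1/2 is its own negation. */
kleene k_not(kleene a) { return (kleene)(K_TRUE - a); }

/* Join in the information order: agreeing values are kept,
   conflicting values collapse to 1/2. */
kleene k_join(kleene a, kleene b) { return a == b ? a : K_HALF; }
```

For instance, `k_and(K_FALSE, K_HALF)` is still `K_FALSE`: a definite 0 decides a conjunction whatever the unknown operand turns out to be.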

3-Valued Logical Structures
A set of individuals (nodes) $U$
Predicate meaning
–$p^S : (U^S)^k \to \{0, 1, 1/2\}$

Canonical Abstraction
Partition the individuals into equivalence classes based on the values of their unary predicates
–Every individual is mapped into its equivalence class
Collapse predicates via $\sqcup$
–$p^S(u'_1, \ldots, u'_k) = \bigsqcup \{ p^B(u_1, \ldots, u_k) \mid f(u_1) = u'_1, \ldots, f(u_k) = u'_k \}$
At most $2^{|A|}$ abstract individuals, where $A$ is the set of unary predicates used for the partition
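The two steps, partitioning by unary predicate values and collapsing n with the join, can be sketched in C as below. This is a toy illustration under stated assumptions, not TVLA's implementation: all type and function names are invented, there are only two unary predicates (x and t) and one binary predicate n, and an abstract individual is identified by the bit vector of its unary predicate values.

```c
#include <stdbool.h>

#define N 3          /* concrete individuals (illustrative) */
#define A 2          /* unary abstraction predicates: 0 = x, 1 = t */
#define M (1 << A)   /* at most 2^A abstract individuals */

typedef enum { K_FALSE = 0, K_HALF = 1, K_TRUE = 2 } kleene;
enum { UNSET = -1 };  /* marks entries of classes that do not occur */

typedef struct {
    bool present[M];  /* which canonical names actually occur */
    bool summary[M];  /* class stands for more than one individual */
    int  n[M][M];     /* abstract n: a kleene value, or UNSET */
} abstract_structure;

/* Canonical name of u: the bit vector of its unary predicate values. */
int canonical_name(bool unary[A][N], int u) {
    int name = 0;
    for (int p = 0; p < A; p++)
        if (unary[p][u])
            name |= 1 << p;
    return name;
}

abstract_structure canonical_abstraction(bool unary[A][N], bool n[N][N]) {
    abstract_structure s;
    int count[M] = {0};
    for (int a = 0; a < M; a++) {
        s.present[a] = s.summary[a] = false;
        for (int b = 0; b < M; b++)
            s.n[a][b] = UNSET;
    }
    for (int u = 0; u < N; u++) {
        int a = canonical_name(unary, u);
        s.present[a] = true;
        s.summary[a] = ++count[a] > 1;
    }
    for (int u = 0; u < N; u++)
        for (int w = 0; w < N; w++) {
            int a = canonical_name(unary, u), b = canonical_name(unary, w);
            int v = n[u][w] ? K_TRUE : K_FALSE;
            /* join: keep an agreeing value, collapse a conflict to 1/2 */
            s.n[a][b] = (s.n[a][b] == UNSET || s.n[a][b] == v) ? v : K_HALF;
        }
    return s;
}
```

For the three-cell list with x and t at the head, the two tail cells share the canonical name 0 and merge into one summary class, and the n edges into and inside that class become 1/2.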

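Canonical Abstraction
x = NULL; while (…) do { t = malloc(); t->next = x; x = t }
[Figure: a concrete list u1, u2, u3 linked by n, with x and t pointing to u1, and its canonical abstraction, in which u2 and u3 are merged into the node u2,3.]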

Canonical Abstraction
x = NULL; while (…) do { t = malloc(); t->next = x; x = t }
[Figure: the concrete list u1, u2, u3 with x and t at u1, and its abstraction with nodes u1 and u2,3 and their n edges.]

Canonical Abstraction and Equality Summary nodes may represent more than one element (In)equality need not be preserved under abstraction Explicitly record equality Summary nodes are nodes with eq(u, u)=1/2

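Canonical Abstraction and Equality
x = NULL; while (…) do { t = malloc(); t->next = x; x = t }
[Figure: the concrete list u1, u2, u3 and its abstraction with nodes u1 and u2,3; the eq predicate is drawn explicitly on the summary node u2,3.]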

Canonical Abstraction
x = NULL; while (…) do { t = malloc(); t->next = x; x = t }
[Figure: the concrete list u1, u2, u3 linked by n, with x and t at u1, next to its abstraction with nodes u1 and u2,3.]

Challenges: Heap & Concurrency [Yahav POPL’01] Concurrency with the heap is evil… Java threads are just heap allocated objects Data and control are strongly related –Thread-scheduling info may require understanding of heap structure (e.g., scheduling queue) –Heap analysis requires information about thread scheduling Thread t1 = new Thread(); Thread t2 = new Thread(); … t = t1; … t.start();

Configurations – Example
    l_0: while (true) {
    l_1:     synchronized(myLock) {
    l_C:         // critical actions
    l_2:     }
    l_3: }
[Figure: a configuration with thread nodes labelled at[l_C], at[l_1] and at[l_0]; rval[myLock], held_by and blocked edges relate the threads and the lock.]

Concrete Configuration
[Figure: a concrete configuration with one thread at[l_C], several threads at[l_1] and one at[l_0]; rval[myLock], held_by and blocked edges relate the threads and the lock.]

Abstract Configuration
[Figure: the abstract configuration, with one node each for at[l_C], at[l_1] and at[l_0], and rval[myLock], held_by and blocked edges.]

Examples Verified
Program                                       Property
twoLock Q                                     No interference; No memory leaks; Partial correctness
Producer/consumer                             No interference; No memory leaks
Apprentice Challenge                          Counter increasing
Dining philosophers with resource ordering    Absence of deadlock
Mutex                                         Mutual exclusion
Web Server                                    No interference

Summary Canonical abstraction guarantees a finite number of structures The concrete location of an object has no significance But what is the significance of 3-valued logic?

Topics Embedding Instrumentation Abstract Interpretation [Extensions]

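Embedding
[Figure: a structure with individuals u1 … u6, embedded into a structure with individuals u12, u34 and u56, which is in turn embedded into a structure with individuals u123 and u456; the predicate x is shown in each structure.]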

Embedding
$B \sqsubseteq^f S$ if $f$ is an onto function such that $p^B(u_1, \ldots, u_k) \sqsubseteq p^S(f(u_1), \ldots, f(u_k))$
$S$ is a tight embedding of $B$ with respect to $f$ if:
$p^S(u^\#_1, \ldots, u^\#_k) = \bigsqcup \{ p^B(u_1, \ldots, u_k) \mid f(u_1) = u^\#_1, \ldots, f(u_k) = u^\#_k \}$
Canonical Abstraction is a tight embedding
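The embedding condition can be checked mechanically for a given function f. The sketch below does so for a single binary predicate n; the names, the sizes `NB` and `NS`, and the restriction to one predicate are assumptions of the example, and f is taken to be onto without checking it.

```c
#include <stdbool.h>

#define NB 3  /* individuals of B (illustrative) */
#define NS 2  /* individuals of S (illustrative) */

typedef enum { K_FALSE = 0, K_HALF = 1, K_TRUE = 2 } kleene;

/* Information order: a is below b when they agree or b is 1/2. */
bool k_leq(kleene a, kleene b) { return a == b || b == K_HALF; }

/* Does f embed B into S with respect to the binary predicate n, i.e.
   n^B(u, w) below n^S(f(u), f(w)) in the information order for all u, w? */
bool embeds(kleene nb[NB][NB], kleene ns[NS][NS], int f[NB]) {
    for (int u = 0; u < NB; u++)
        for (int w = 0; w < NB; w++)
            if (!k_leq(nb[u][w], ns[f[u]][f[w]]))
                return false;
    return true;
}
```

A definite value in S must be matched exactly by every individual mapped to it, while a 1/2 in S permits either value in B.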

Embedding (cont)
$S_1 \sqsubseteq^f S_2 \Rightarrow$ every concrete state represented by $S_1$ is also represented by $S_2$
The set of nodes in $S_1$ and $S_2$ may be different
–No meaning for node names (abstract locations)
$\gamma(S^\#) = \{ S : S \text{ is a 2-valued structure},\ S \sqsubseteq^f S^\# \}$

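Embedding Theorem
Assume $B \sqsubseteq^f S$, that is, $p^B(u_1, \ldots, u_k) \sqsubseteq p^S(f(u_1), \ldots, f(u_k))$
Then every formula $\varphi$ is preserved:
–If $[\![\varphi]\!] = 1$ in $S$, then $[\![\varphi]\!] = 1$ in $B$
–If $[\![\varphi]\!] = 0$ in $S$, then $[\![\varphi]\!] = 0$ in $B$
–If $[\![\varphi]\!] = 1/2$ in $S$, then $[\![\varphi]\!]$ could be 0 or 1 in $B$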

Embedding Theorem
Every formula $\varphi$ is preserved:
–If $[\![\varphi]\!] = 1$ in $S$, then $[\![\varphi]\!] = 1$ for all $B \in \gamma(S)$
–If $[\![\varphi]\!] = 0$ in $S$, then $[\![\varphi]\!] = 0$ for all $B \in \gamma(S)$
–If $[\![\varphi]\!] = 1/2$ in $S$, then $[\![\varphi]\!]$ could be 0 or 1 in $\gamma(S)$

Challenge 2 - Destructive Update
Sound treatment of y->next = NULL: $n'(v, w) = \neg y(v) \wedge n(v, w)$
[Figure: heap graphs over p, x, y and n before and after the statement.]

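Embedding Theorem
[Figure: the abstract structure with node u1 (pointed to by x and t) and summary node u2,3, with n edges.]
Formula                                                        Value
$\exists v : x(v)$                                             1 = Yes
$\exists v : x(v) \wedge t(v)$                                 1 = Yes
$\exists v : x(v) \wedge y(v)$                                 0 = No
$\exists v, w : x(v) \wedge n(v, w)$                           1/2 = Maybe
$\exists v, w : x(v) \wedge n(v, w)$ […] $n(v, w)$             0 = No
$\exists v, w : x(v) \wedge n^*(v, w) \wedge n^+(w, w)$        1/2 = Maybe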
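One of the rows above can be reproduced by evaluating the formula with Kleene semantics directly on the abstract structure. The sketch assumes the two-node structure from the canonical abstraction slides (node u1 with x = 1, summary node u2,3 with x = 0, and 1/2-valued n edges into and within the summary node); the function name and encoding are invented for the example.

```c
#define U 2  /* abstract individuals: 0 = u1, 1 = summary node u2,3 */

typedef enum { K_FALSE = 0, K_HALF = 1, K_TRUE = 2 } kleene;

kleene k_and(kleene a, kleene b) { return a < b ? a : b; }
kleene k_or(kleene a, kleene b)  { return a > b ? a : b; }

/* [[ exists v, w: x(v) and n(v, w) ]]: the existential quantifier is a
   maximum over all assignments of individuals to v and w. */
kleene exists_x_with_successor(kleene x[U], kleene n[U][U]) {
    kleene result = K_FALSE;
    for (int v = 0; v < U; v++)
        for (int w = 0; w < U; w++)
            result = k_or(result, k_and(x[v], n[v][w]));
    return result;
}
```

With x = (1, 0) and every n edge into the summary node equal to 1/2, the result is 1/2: some represented stores have a successor of the x cell and some do not, so the abstract answer has to be "maybe".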

Summary The embedding theorem eliminates the need for proving near-commutativity Guarantees soundness Applied to arbitrary logics But can be imprecise

Limitations Information on summary nodes is lost Leads to useless verification

Increasing Precision User (Programming Language) supplied global invariants –Naturally expressed in $FO^{TC}$ Record extra information in the concrete interpretation –Tune the abstraction –Refine concretization

Cyclicity predicate
$c[x]() = \exists v_1, v_2 : x(v_1) \wedge n^*(v_1, v_2) \wedge n^+(v_2, v_2)$
[Figure: a list u1, u2, …, un with x and t at u1, and its abstraction with nodes u1 and u2..n; here c[x]() = 0.]

Cyclicity predicate
$c[x]() = \exists v_1, v_2 : x(v_1) \wedge n^*(v_1, v_2) \wedge n^+(v_2, v_2)$
[Figure: a list u1, u2, …, un with x and t at u1 and an additional n edge, and its abstraction with nodes u1 and u2..n; here c[x]() = 1.]

Heap Sharing predicate
$is(v) = \exists v_1, v_2 : n(v_1, v) \wedge n(v_2, v) \wedge v_1 \neq v_2$
[Figure: a list u1, u2, …, un with x and t at u1, and its abstraction with nodes u1 and u2..n; is(v) = 0 on every node.]

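Heap Sharing predicate
$is(v) = \exists v_1, v_2 : n(v_1, v) \wedge n(v_2, v) \wedge v_1 \neq v_2$
[Figure: a list u1, u2, …, un in which one node has two incoming n edges (is(v) = 1 there, is(v) = 0 elsewhere), and its abstraction with nodes u1, u2 and u3..n labelled is(v) = 0, is(v) = 1 and is(v) = 0.]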
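On a concrete structure, the defining formula of is(v) simply asks whether v has at least two distinct n-predecessors. A minimal C sketch (the function name and the size `N` are made up for the example):

```c
#include <stdbool.h>

#define N 4  /* number of individuals (illustrative) */

/* is(v) = exists v1, v2: n(v1, v) && n(v2, v) && v1 != v2,
   i.e. v is pointed to by the n field of two different individuals. */
bool is_shared(bool n[N][N], int v) {
    int predecessors = 0;
    for (int u = 0; u < N; u++)
        if (n[u][v])
            predecessors++;
    return predecessors >= 2;
}
```

Because is is a unary predicate, recording it also refines canonical abstraction: shared and unshared cells are no longer merged into the same summary node.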

Concrete Interpretation Rules
Statement        Update formula
x = NULL         $x'(v) = 0$
x = malloc()     $x'(v) = IsNew(v)$
x = y            $x'(v) = y(v)$
x = y->next      $x'(v) = \exists w : y(w) \wedge n(w, v)$
x->next = NULL   $n'(v, w) = \neg x(v) \wedge n(v, w)$
                 $is'(v) = is(v) \wedge \exists v_1, v_2 : n(v_1, v) \wedge n(v_2, v) \wedge \neg x(v_1) \wedge \neg x(v_2) \wedge \neg eq(v_1, v_2)$

Reachability predicate
$t[n](v_1, v_2) = n^*(v_1, v_2)$
[Figure: a list u1, u2, …, un with x and t at u1, drawn with both n and t[n] edges, and its abstraction with nodes u1 and u2..n.]

Additional Instrumentation predicates
reachable-from-variable-x(v)
$c_{fb}(v) = \forall v_1 : f(v, v_1) \rightarrow b(v_1, v)$
tree(v)
dag(v)
$inOrder(v) = \forall v_1 : n(v, v_1) \rightarrow dle(v, v_1)$
Weakest Precondition [Ramalingam PLDI 02]

Instrumentation (Summary)
Refines the abstraction
Adds global invariants
But requires update-formulas (generated automatically in TVLA2)
$is(v) = \exists v_1, v_2 : n(v_1, v) \wedge n(v_2, v) \wedge v_1 \neq v_2$
$is(v) \leftrightarrow \exists v_1, v_2 : n(v_1, v) \wedge n(v_2, v) \wedge v_1 \neq v_2$
$\gamma(S^\#) = \{ S : S \models \varphi,\ S \sqsubseteq^f S^\# \}$

Plan Embedding Theorem Instrumentation Abstract interpretation using canonical abstraction TVLA

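Best Conservative Interpretation (CC79)
[Figure: an abstract representation is mapped by concretization $\gamma$ to a concrete representation, the collecting interpretation $[\![st]\!]^c$ is applied to it, and abstraction $\alpha$ maps the result back to an abstract representation; the composite is the abstract interpretation $[\![st]\!]^\#$.]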

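Best Transformer (x = x->n)
[Figure: an abstract structure over x and y is expanded by inverse embedding into the concrete structures it represents, the update formulas are evaluated on each, and canonic abstraction maps the results back to abstract structures.]

“Focus”-Based Transformer (x = x->n)
[Figure: the same three stages (inverse embedding, evaluate update formulas, canonic abstraction) on structures over x and y.]

[Figure: Focus(x->n) acts as a “partial $\gamma$” on the structures over x and y; the update formulas are then evaluated with Kleene semantics, followed by canonic abstraction.]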

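Semantic Reduction
Improve the precision by recovering properties of the program semantics
A Galois connection $(L_1, \alpha, \gamma, L_2)$
An operation $op : L_2 \to L_2$ is a semantic reduction if
–$\forall l \in L_2 : op(l) \sqsubseteq l$
–$\gamma(op(l)) = \gamma(l)$
Can be applied before and after basic operations
[Figure: the lattices $L_1$ and $L_2$ connected by $\alpha$ and $\gamma$, with op mapping an element l of $L_2$ to a smaller element.]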

Three Valued Logic Analysis (TVLA) T. Lev-Ami & R. Manevich Input (FO TC) –Concrete interpretation rules –Definition of instrumentation predicates –Definition of safety properties –First Order Transition System (TVP) Output –Warnings (text) –The 3-valued structure at every node (invariants)

Null Dereferences Demo
    typedef struct element {
        int value;
        struct element *n;
    } Element;

    bool search(int value, Element *x) {
        Element *c = x;
        while (x != NULL) {
            if (c->value == value)
                return TRUE;
            c = c->n;
        }
        return FALSE;
    }

TVLA inputs TVP - Three Valued Program –Predicate declaration –Action definitions (SOS) –Control flow graph TVS - Three Valued Structure Program independent Demo

Challenge 1 Write a C procedure on which TVLA reports false null dereference

Proving Correctness of Sorting Implementations (Lev-Ami, Reps, Sagiv, Wilhelm ISSTA 2000) Partial correctness –The elements are sorted –The list is a permutation of the original list Termination –At every loop iteration, the set of elements reachable from the head decreases

Example: InsertSort
    typedef struct list_cell {
        int data;
        struct list_cell *n;
    } *List;

    List InsertSort(List x) {
        List r, pr, rn, l, pl;
        r = x;
        pr = NULL;
        while (r != NULL) {
            l = x;
            rn = r->n;
            pl = NULL;
            while (l != r) {
                if (l->data > r->data) {
                    pr->n = rn;
                    r->n = l;
                    if (pl == NULL)
                        x = r;
                    else
                        pl->n = r;
                    r = pr;
                    break;
                }
                pl = l;
                l = l->n;
            }
            pr = r;
            r = rn;
        }
        return x;
    }
TVLA inputs: pred.tvp, actions.tvp

Example: InsertSort List InsertSort(List x) { if (x == NULL) return NULL; pr = x; r = x->n; while (r != NULL) { pl = x; rn = r->n; l = x->n; while (l != r) { pr->n = rn; r->n = l; pl->n = r; r = pr; break; } pl = l; l = l->n; } pr = r; r = rn; } typedef struct list_cell { int data; struct list_cell *n; } *List;

Example: Reverse
    typedef struct list_cell {
        int data;
        struct list_cell *n;
    } *List;

    List reverse(List x) {
        List y, t;
        y = NULL;
        while (x != NULL) {
            t = y;
            y = x;
            x = x->n;   /* field is named n in list_cell */
            y->n = t;
        }
        return y;
    }

Challenge Write a sorting C procedure on which TVLA fails to prove sortedness or permutation

Example: Mark and Sweep
    void Sweep() {
        unexplored = Universe
        collected = ∅
        while (unexplored ≠ ∅) {
            x = SelectAndRemove(unexplored)
            if (x ∉ marked)
                collected = collected ∪ {x}
        }
        assert(collected == Universe – Reachset(root))
    }

    void Mark(Node root) {
        if (root != NULL) {
            pending = ∅
            pending = pending ∪ {root}
            marked = ∅
            while (pending ≠ ∅) {
                x = SelectAndRemove(pending)
                marked = marked ∪ {x}
                t = x->left
                if (t ≠ NULL)
                    if (t ∉ marked)
                        pending = pending ∪ {t}
                t = x->right
                if (t ≠ NULL)
                    if (t ∉ marked)
                        pending = pending ∪ {t}
            }
        }
        assert(marked == Reachset(root))
    }
TVLA input: pred.tvp

Challenge 2 Use TVLA to show termination of markAndSweep

The Canvas Project (with IBM Watson) (Component Annotation, Verification and Stuff)
Verification of Safety Properties (PLDI’02, 04)
Component: a library with cleanly encapsulated state
Client: a program that uses the library
Lightweight Specification
–"correct usage" rules a client must follow
–"call open() before read()"
Certification: does the client program satisfy the lightweight specification?

Prototype Implementation Applied to several example programs –Up to 5000 lines of Java Used to verify –Absence of concurrent modification exception –JDBC API conformance –IOStreams API conformance

Scaling Staged analysis Controlled complexity –More coarse abstractions [Manevich SAS’04] Handle libraries –Use procedure specifications [Yorsh, TACAS’04] –Decision procedures for linked data structures [Immerman, CAV’04, Lev-Ami, CADE’05] Handling procedures –Compute procedure summaries [Jeannet, SAS’04] –Local heaps [Rinetzky, POPL’05]

Local heaps [Rinetzky, POPL’05]
[Figure: the heap over x, y, t and g at the call p(x); the callee works only on the part of the heap reachable from x.]

Why is Heap Analysis Difficult? Destructive updating through pointers –p->next = q –Produces complicated aliasing relationships –Track aliasing on 3-valued structures Dynamic storage allocation –No bound on the size of run-time data structures –Canonical abstraction ⇒ finite-sized 3-valued structures Data-structure invariants typically only hold at the beginning and end of operations –Need to verify that data-structure invariants are re-established –Query the 3-valued structures that arise at the exit

Summary Canonical abstraction is powerful –Intuitive –Adapts to the property of interest Used to verify interesting program properties –Very few false alarms But scaling is an issue

Summary Effective Abstract Interpretation –Always terminates –Precise enough –But still expensive Can model –Heap –Unbounded arrays –Concurrency More instrumentation can mean more efficient But canonic abstraction is limited –Correlation between list lengths –Arithmetic –Partial heaps