Program Analysis and Verification 0368-4479 Noam Rinetzky Lecture 10: Shape Analysis + Numerical Analysis Slides credit: Roman Manevich, Mooly Sagiv, Eran Yahav
Abstract Interpretation [Cousot’77] Mathematical foundation of static analysis
Galois Connection: c ⊑ γ(α(c)). α(c) is the most precise (least) element in A representing c.
Galois Connection: α(γ(a)) ⊑ a. γ(a) is what a represents in C (its meaning).
Inducing along the connections: composing the Galois connections between the domains M, C, and A yields an induced abstraction a' of a given c (diagram).
Transformer soundness condition 1: ∀c: f(c) = c' ⇒ α(c') ⊑ f#(α(c)).
Transformer soundness condition 2: ∀a: f#(a) = a' ⇒ f(γ(a)) ⊑ γ(a').
Best (induced) transformer: f#(a) = α(f(γ(a))). Problem: incomputable directly.
Soundness theorem 1: if ∀a ∈ D_A: f(γ(a)) ⊑ γ(f#(a)), then ∀a ∈ D_A: fⁿ(γ(a)) ⊑ γ(f#ⁿ(a)), and hence lfp(f) ⊑ γ(lfp(f#)).
Soundness theorem 2: if ∀c ∈ D_C: α(f(c)) ⊑ f#(α(c)), then ∀c ∈ D_C: α(fⁿ(c)) ⊑ f#ⁿ(α(c)), and hence α(lfp(f)) ⊑ lfp(f#).
Pointer Analysis: Points-To Analysis (may-point-to, must-point-to) and Alias Analysis (may-alias, must-alias).
Applications: compiler optimizations (method de-virtualization, call graph construction, allocating objects on the stack via escape analysis); verification & bug finding (data race detection, use in preliminary phases, use in verification itself).
PWhile syntax: a primitive statement is of the form skip, x := null, x := y, x := *y, x := &y, or *x := y (where x and y are variables in Var). Omitted (for now): dynamic memory allocation, pointer arithmetic, structures and fields, procedures.
Destructive Update: *x = y Strong updates Weak Updates
Points-to analysis: a simple example. p = &x; q = &y; if (?) { q = p; } x = &a; y = &b; z = *q; Abstract states: {p=&x}, {p=&x ∧ q=&y}, {p=&x ∧ q=&x}, {p=&x ∧ (q=&y ∨ q=&x)}, {p=&x ∧ (q=&y ∨ q=&x) ∧ x=&a}, {p=&x ∧ (q=&y ∨ q=&x) ∧ x=&a ∧ y=&b}, {p=&x ∧ (q=&y ∨ q=&x) ∧ x=&a ∧ y=&b}. How would you construct an abstract domain to represent these abstract states?
PWhile operational semantics. State : (Var → Z) × (Var → Var ∪ {null}). ⟦x = y⟧ s = s[x ↦ s(y)]; ⟦x = *y⟧ s = s[x ↦ s(s(y))]; ⟦*x = y⟧ s = s[s(x) ↦ s(y)]; ⟦x = null⟧ s = s[x ↦ null]; ⟦x = &y⟧ s = s[x ↦ y].
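These update rules can be read as a tiny interpreter. The following is a minimal Python sketch (not from the lecture); the dictionary-based state, the statement encoding, and modeling the address &y simply by the variable name 'y' are simplifying assumptions.

def exec_stmt(stmt, s):
    """Apply one PWhile primitive statement to state s (a dict), returning a new state."""
    s = dict(s)
    op = stmt[0]
    if op == 'x=null':            # x := null
        _, x = stmt
        s[x] = None
    elif op == 'x=&y':            # x := &y
        _, x, y = stmt
        s[x] = y
    elif op == 'x=y':             # x := y
        _, x, y = stmt
        s[x] = s[y]
    elif op == 'x=*y':            # x := *y  (read through the pointer held by y)
        _, x, y = stmt
        s[x] = s[s[y]]
    elif op == '*x=y':            # *x := y  (write through the pointer held by x)
        _, x, y = stmt
        s[s[x]] = s[y]
    return s

# The example from the previous slide, on the branch where q = p is executed:
# p = &x; q = &y; q = p; x = &a; y = &b; z = *q
s = dict.fromkeys('pqxyzab')
for st in [('x=&y', 'p', 'x'), ('x=&y', 'q', 'y'), ('x=y', 'q', 'p'),
           ('x=&y', 'x', 'a'), ('x=&y', 'y', 'b'), ('x=*y', 'z', 'q')]:
    s = exec_stmt(st, s)
print(s['z'])   # prints 'a'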
Andersen: flow-insensitive analysis. L1: x = &a; L2: y = x; L3: x = &b; L4: z = x; One points-to graph describes all of L1-5: x, y, and z may each point to a and b.
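The constraints Andersen's analysis solves can be sketched in a few lines. This is a minimal Python sketch, not the algorithm as presented in the lecture; the statement encoding follows the interpreter sketch above and is an assumption.

from collections import defaultdict

def andersen(stmts):
    """Flow-insensitive, subset-based (Andersen-style) points-to analysis."""
    pts = defaultdict(set)                    # variable -> points-to set
    changed = True

    def add(dst, new):
        nonlocal changed
        if not new <= pts[dst]:
            pts[dst] |= new
            changed = True

    while changed:                            # iterate all constraints to a fixed point
        changed = False
        for kind, x, y in stmts:
            if kind == 'x=&y':                # pts(x) ⊇ {y}
                add(x, {y})
            elif kind == 'x=y':               # pts(x) ⊇ pts(y)
                add(x, set(pts[y]))
            elif kind == 'x=*y':              # pts(x) ⊇ pts(z) for every z in pts(y)
                for z in list(pts[y]):
                    add(x, set(pts[z]))
            elif kind == '*x=y':              # pts(z) ⊇ pts(y) for every z in pts(x)
                for z in list(pts[x]):
                    add(z, set(pts[y]))
    return dict(pts)

# L1: x = &a; L2: y = x; L3: x = &b; L4: z = x
result = andersen([('x=&y', 'x', 'a'), ('x=y', 'y', 'x'),
                   ('x=&y', 'x', 'b'), ('x=y', 'z', 'x')])
print(result['x'], result['y'], result['z'])   # each is {'a', 'b'} (printed in some order)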
Steensgaard's flow-insensitive analysis. L1: x = &a; L2: y = x; L3: y = &b; L4: b = &c; Result (one graph for L1-5): a and b are unified into a single node, pointed to by both x and y, and that node points to c.
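For contrast, a minimal Python sketch of Steensgaard-style unification: variables live in union-find classes, each class has at most one points-to target, and assignments unify targets. The helper names and the encoding are assumptions, not the lecture's notation.

parent, pointee = {}, {}                 # union-find forest and per-class points-to target

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]    # path compression
        x = parent[x]
    return x

def union(x, y):
    """Merge the classes of x and y; cascade the merge to their points-to targets."""
    rx, ry = find(x), find(y)
    if rx == ry:
        return rx
    parent[ry] = rx
    px, py = pointee.pop(rx, None), pointee.pop(ry, None)
    if px is not None and py is not None:
        pointee[rx] = union(px, py)
    elif px is not None or py is not None:
        pointee[rx] = px if px is not None else py
    return rx

def address_of(x, a):                    # x = &a: x's class points to a's class
    rx = find(x)
    pointee[rx] = union(pointee[rx], a) if rx in pointee else find(a)

def copy(x, y):                          # x = y: x and y must share one points-to class
    rx, ry = find(x), find(y)
    px, py = pointee.get(rx), pointee.get(ry)
    if px is not None and py is not None:
        union(px, py)
    elif py is not None:
        pointee[rx] = py
    elif px is not None:
        pointee[ry] = px

# L1: x = &a; L2: y = x; L3: y = &b; L4: b = &c
address_of('x', 'a'); copy('y', 'x'); address_of('y', 'b'); address_of('b', 'c')
print(find('a') == find('b'))                     # True: a and b were unified
print(find(pointee[find('x')]) == find('a'))      # True: x points to the {a, b} class
print(find(pointee[find('a')]) == find('c'))      # True: the {a, b} class points to c

Unification makes the analysis nearly linear time, but it is less precise than Andersen's subset constraints, which is exactly the trade-off the next slide summarizes.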
May-points-to analyses, from most precise to most efficient: Ideal-May-Point-To → (???) Algorithm A → Andersen's → Steensgaard's; each step is more efficient / less precise than the previous.
Handling memory allocation. s: x = new () / malloc(). Assume, for now, that an allocated object stores one pointer: s: x = malloc(sizeof(void*)). Introduce a pseudo-variable Vs to represent objects allocated at statement s, and use the previous algorithm: treat s as if it were "x = &Vs", and also track the possible values of Vs. This is the allocation-site based approach. Key aspect: Vs represents a set of objects (locations), not a single object; it is referred to as a summary object (node).
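Under this allocation-site abstraction, the Andersen sketch above needs no new machinery: "s: x = new" is simply encoded as x = &A_s for a pseudo-variable A_s (the name A_L1 below is a hypothetical choice). Because A_s stands for a whole set of objects, stores through pointers into it can only accumulate targets.

# L1: x = new O;  L2: y = x;  L3: *y = &b;  L4: *y = &a   (the example of the next slides)
stmts = [
    ('x=&y', 'x', 'A_L1'),    # L1: x = new O, treated as x = &A_L1 (summary object)
    ('x=y',  'y', 'x'),       # L2: y = x
    ('x=&y', 'tb', 'b'),      # temporary: tb = &b
    ('*x=y', 'y', 'tb'),      # L3: *y = &b
    ('x=&y', 'ta', 'a'),      # temporary: ta = &a
    ('*x=y', 'y', 'ta'),      # L4: *y = &a
]
print(andersen(stmts)['A_L1'])   # {'b', 'a'}: the store at L4 is only a weak update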
Dynamic memory allocation example. L1: x = new O; L2: y = x; L3: *y = &b; L4: *y = &a; (Points-to graph: x and y both point to the summary node L1.) How should we handle these statements?
Summary object update. L1: x = new O; L2: y = x; L3: *y = &b; L4: *y = &a; (Points-to graph: x and y point to the summary node L1, which may point to both b and a; the store at L4 is only a weak update.)
Object fields, field-insensitive analysis. class Foo { A* f; B* g; } L1: x = new Foo(); x->f = &b; x->g = &a; (Graph: x points to node L1, which points to both a and b; the fields f and g are merged.)
Object fields, field-sensitive analysis. class Foo { A* f; B* g; } L1: x = new Foo(); x->f = &b; x->g = &a; (Graph: x points to node L1, whose f field points to b and whose g field points to a.)
Other Aspects Context-sensitivity Indirect (virtual) function calls and call-graph construction Pointer arithmetic Object-sensitivity
Shape Analysis
Shape Analysis Automatically verify properties of programs manipulating dynamically allocated storage Identify all possible shapes (layout) of the heap
Analyzing Singly Linked Lists
Limitations of pointer analysis // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } // Singly-linked list // data type. class SLL { int data; public SLL n; // next cell SLL(int data) { this.data = data; this.n = null; } }
Flow&Field-sensitive Analysis // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; }
Flow&Field-sensitive Analysis // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } h null t
Flow&Field-sensitive Analysis // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } h null t n L1
Flow&Field-sensitive Analysis // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } h null t n L1 tmp
Flow&Field-sensitive Analysis // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } h null t n L1 n tmp L2
Flow&Field-sensitive Analysis // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } h null t n L1 n tmp L2
Flow&Field-sensitive Analysis // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } h null t n L1 n tmp L2
Flow&Field-sensitive Analysis h // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } null t n L1 n tmp != null tmp L2 h null t n L1 tmp == null tmp
Flow&Field-sensitive Analysis h // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } null t n L1 n tmp L2 h null t n L1 tmp
Flow&Field-sensitive Analysis h // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } null t n L1 n tmp L2 tmp == null ?
Flow&Field-sensitive Analysis h // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } null t n L1 n tmp L2
Flow&Field-sensitive Analysis h // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } null t n L1 n tmp L2 Possible null dereference!
Flow&Field-sensitive Analysis h // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } null t n L1 n tmp L2 Fixed-point for first loop
Flow&Field-sensitive Analysis h // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } null t n L1 n tmp L2
Flow&Field-sensitive Analysis h // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } null t n L1 n tmp L2
Flow&Field-sensitive Analysis h // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } null t n L1 n tmp L2 Possible null dereference!
What was the problem? Pointer analysis abstracts all objects allocated at the same program location into one summary object; however, objects allocated at the same program location may behave very differently (e.g., whether an object is the first/last one in the list). Remedy: assign extra predicates to distinguish between objects with different roles. The number of objects represented by a summary object may be greater than 1, which does not allow strong updates. Remedy: distinguish between concrete objects (# = 1) and abstract objects (# ≥ 1). The join operator is very coarse and abstracts away important distinctions (tmp = null / tmp != null). Remedy: apply disjunctive completion.
Improved solution. Pointer analysis abstracts all objects allocated at the same program location into one summary object; however, objects allocated at the same program location may behave very differently (e.g., whether an object is the first/last one in the list). Remedy: add extra instrumentation predicates to distinguish between objects with different roles. The number of objects represented by a summary object may be greater than 1, which does not allow strong updates. Remedy: distinguish between concrete objects (# = 1) and abstract objects (# ≥ 1). The join operator is very coarse and abstracts away important distinctions (tmp = null / tmp != null). Remedy: apply disjunctive completion.
Adding properties to objects. Let's first drop allocation-site information and instead define a unary predicate x(v) for each pointer variable x, meaning "x points to v". Each such predicate holds for at most one node. Merge together nodes with the same sets of predicates.
Flow&Field-sensitive Analysis // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } h null t
Flow&Field-sensitive Analysis // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } h null t n {h, t}
Flow&Field-sensitive Analysis // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } h null t n {h, t} tmp
Flow&Field-sensitive Analysis // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } h null t n {h, t} tmp n {tmp} No null dereference
Flow&Field-sensitive Analysis // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } h null t n {h, t} tmp n {tmp}
Flow&Field-sensitive Analysis // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } h null t n {t} tmp n {h, tmp}
Flow&Field-sensitive Analysis // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } h null n t {t} n tmp {h, tmp} h null t n {h, t} tmp
Flow&Field-sensitive Analysis // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } h null n t {t} n tmp {h, tmp} h null t n {h, t} tmp Do we need to analyze this shape graph again?
Flow&Field-sensitive Analysis // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } h null n t {t} n tmp {h, tmp} h null t n {h, t} tmp
Flow&Field-sensitive Analysis // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } h null n t {t} n tmp { h} n { tmp} No null dereference
Flow&Field-sensitive Analysis // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } h null n t {t} n tmp { h} n { tmp}
Flow&Field-sensitive Analysis // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } h null n t {t} n tmp { } n { h, tmp}
Flow&Field-sensitive Analysis // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } h null n t {t} n tmp { } n { h, tmp}
Flow&Field-sensitive Analysis // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } h null n t {t} n tmp { } n { h} n { tmp} No null dereference
Flow&Field-sensitive Analysis // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } h null n t {t} n tmp { } n { h} n { tmp}
Flow&Field-sensitive Analysis // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } h null n t {t} n tmp { } n { } n {h, tmp}
Flow&Field-sensitive Analysis // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } h null n t {t} n tmp { } n n {h, tmp} How many objects does it represent? Summarization
Flow&Field-sensitive Analysis // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } h null n t {t} n tmp { } n n {h} n { tmp} No null dereference
Flow&Field-sensitive Analysis // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } h null n t {t} n tmp { } n n {h} n { tmp}
Flow&Field-sensitive Analysis // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } h null n t {t} n tmp { } n n {h, tmp} Fixed-point for first loop
Flow&Field-sensitive Analysis // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } h null n t {t} n tmp { } n n {h, tmp}
Flow&Field-sensitive Analysis // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } h null n t {t} n tmp { } n n {h, tmp}
Flow&Field-sensitive Analysis // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } h null n t {t} n tmp { tmp?} n n { tmp}
Best (induced) transformer: f#(a) = α(f(γ(a))). Problem: incomputable directly.
Best transformer for tmp = tmp.n (diagrams): concretize the abstract shape graph into the (unboundedly many) concrete lists it represents, apply the concrete transformer for tmp = tmp.n to each of them, and re-abstract the results.
Handling updates on summary nodes. Transformers accessing only concrete nodes are easy; transformers accessing summary nodes are complicated. We cannot concretize a summary node, since it represents a potentially unbounded number of concrete nodes. Instead, we split into cases by "materializing" concrete nodes from the summary node, introducing a new temporary predicate tmp.n. This is partial concretization (see the sketch below).
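A toy Python sketch (not TVLA's actual machinery) of this case split for tmp = tmp.n: an abstract heap is a list of segments along the n-chain, each a pair (set of pointer variables on the node, is-summary flag); the summary successor is first materialized into the two cases listed on the next slide, after which the update of tmp is a strong update. Eliding the temporary tmp.n predicate and the segment representation itself are simplifying assumptions.

def tmp_advance(heap):
    """Abstract transformer for 'tmp = tmp.n' on a list-shaped abstract heap."""
    i = next(k for k, (vs, _) in enumerate(heap) if 'tmp' in vs)
    succ_vars, succ_is_summary = heap[i + 1]
    if succ_is_summary:
        cases = [
            heap[:i+1] + [(succ_vars, False)] + heap[i+2:],                      # exactly 1 object
            heap[:i+1] + [(succ_vars, False), (succ_vars, True)] + heap[i+2:],   # more than 1 object
        ]
    else:
        cases = [heap]
    results = []
    for case in cases:
        case = [(set(vs), summ) for vs, summ in case]       # copy before updating
        j = next(k for k, (vs, _) in enumerate(case) if 'tmp' in vs)
        case[j][0].discard('tmp')       # strong update: tmp leaves its old node...
        case[j + 1][0].add('tmp')       # ...and now points to the materialized node
        results.append(case)
    return results

# Pre-state from the earlier trace: node {h, tmp} -> summary {} -> node {t}
pre = [({'h', 'tmp'}, False), (set(), True), ({'t'}, False)]
for case in tmp_advance(pre):
    print([(sorted(vs), summ) for vs, summ in case])
# [(['h'], False), (['tmp'], False), (['t'], False)]
# [(['h'], False), (['tmp'], False), ([], True), (['t'], False)]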
Transformer for tmp = tmp.n via partial concretization: split the summary node into cases. Case 1: it represents exactly 1 object. Case 2: it represents more than 1 object.
Transformer for tmp=tmp.n: ’ null h t {t} tmp n {tmp.n } {h, tmp} null h n {t} ’ t n { } tmp n n {h, tmp} Summary node h null n t {t} spaghetti n { } tmp.n is a concrete node {tmp.n } n tmp {h, tmp}
Transformer tmp=tmp.n null h t {t} tmp n {tmp, tmp.n } {h} null h n {t} t n { } tmp n n {h, tmp} h null n t {t} n { } tmp {tmp, tmp.n } n {h}
Transformer for tmp=tmp.n: null h t {t} tmp n {tmp} {h} // Build a list SLL h=null, t = null; L1: h=t= new SLL(-1); SLL tmp = null; while (…) { int data = getData(…); L2: tmp = new SLL(data); tmp.n = h; h = tmp; } // Process elements tmp = h; while (tmp != t) { assert tmp != null; tmp.data += 1; tmp = tmp.n; } h null n t {t} n { } tmp {tmp} n {h}
Transformer via partial concretization: f#(a) = α'(f#'(γ'(a))), where A' is an intermediate, partially concretized domain between C and A.
Recap Adding more properties to nodes refines abstraction Can add temporary properties for partial concretization Materialize concrete nodes from summary nodes Allows turning weak updates into strong ones Focus operation in shape-analysis lingo Not trivial in general and requires more semantic reduction to clean up impossible edges General algorithms available via 3-valued logic and implemented in TVLA system
3-Valued Logic Based Shape Analysis
Sequential Stack. void push(int v) { Node *x = malloc(sizeof(Node)); x->d = v; x->n = Top; Top = x; } int pop() { if (Top == NULL) return EMPTY; Node *s = Top->n; int r = Top->d; Top = s; return r; } Want to verify: no null dereference; the underlying list remains acyclic after each operation.
Shape Analysis via 3-valued Logic: 1) Abstraction: 3-valued logical structures, canonical abstraction. 2) Transformers: via logical formulae; soundness by construction (embedding theorem [SRW02]).
Concrete State: represent a concrete state as a two-valued logical structure. Individuals = heap-allocated objects; unary predicates = object properties; binary predicates = relations; parametric vocabulary. (Example: Top points to u1, and u1 -n-> u2 -n-> u3.) Storeless: no heap addresses.
Concrete State: S = ⟨U, ι⟩ over a vocabulary P. U is the universe; ι is the interpretation, mapping each predicate p in P to its truth value in S. Example (Top -> u1 -n-> u2 -n-> u3): U = {u1, u2, u3}, P = {Top, n}; ι(n)(u1,u2) = 1, ι(n)(u1,u3) = 0, ι(n)(u2,u1) = 0, …; ι(Top)(u1) = 1, ι(Top)(u2) = 0, ι(Top)(u3) = 0.
Formulae for Observing Properties. void push(int v) { Node *x = malloc(sizeof(Node)); x->d = v; x->n = Top; Top = x; } On the structure Top -> u1 -> u2 -> u3: Top != null: ∃w: Top(w), evaluates to 1. No node precedes Top: ∀v1,v2: n(v1, v2) ⇒ ¬Top(v2), evaluates to 1. No cycles: ∀v1,v2: ¬(n(v1, v2) ∧ n*(v2, v1)), evaluates to 1.
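A minimal Python sketch of this structure and of checking the properties on it; encoding predicates as Python sets is an assumption, not the lecture's notation.

from itertools import product

# The structure from the slides: Top -> u1 -n-> u2 -n-> u3
U = ['u1', 'u2', 'u3']
Top = {'u1'}                              # Top(v) = 1 iff v is in the set Top
n = {('u1', 'u2'), ('u2', 'u3')}          # n(v1, v2) = 1 iff the pair is in n

def n_star(v1, v2):
    """Reflexive-transitive closure n*; fine for this tiny acyclic universe."""
    return v1 == v2 or any((v1, w) in n and n_star(w, v2) for w in U)

top_not_null   = any(v in Top for v in U)                                         # ∃w: Top(w)
no_pred_of_top = all(not ((v1, v2) in n and v2 in Top) for v1, v2 in product(U, U))
no_cycles      = all(not ((v1, v2) in n and n_star(v2, v1)) for v1, v2 in product(U, U))
print(top_not_null, no_pred_of_top, no_cycles)    # True True True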
Concrete Interpretation Rules (statement → update formula):
x = NULL: x'(v) = 0
x = malloc(): x'(v) = IsNew(v)
x = y: x'(v) = y(v)
x = y->next: x'(v) = ∃w: y(w) ∧ n(w, v)
x->next = y: n'(v, w) = (¬x(v) ∧ n(v, w)) ∨ (x(v) ∧ y(w))
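These update formulas translate directly into predicate updates. A minimal sketch reusing U, Top, and n from the sketch above; the malloc rule (IsNew) is omitted, and the function names are assumptions.

def assign_null():                       # x = NULL:      x'(v) = 0
    return set()

def assign_copy(y_pred):                 # x = y:         x'(v) = y(v)
    return set(y_pred)

def assign_load(y_pred, n_rel):          # x = y->next:   x'(v) = ∃w: y(w) ∧ n(w, v)
    return {v for (w, v) in n_rel if w in y_pred}

def store_next(x_pred, y_pred, n_rel, U):
    # x->next = y:   n'(v, w) = (¬x(v) ∧ n(v, w)) ∨ (x(v) ∧ y(w))
    return {(v, w) for v in U for w in U
            if (v not in x_pred and (v, w) in n_rel) or (v in x_pred and w in y_pred)}

print(assign_load(Top, n))   # {'u2'}: exactly the result of the example on the next slide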
Example: s = Top->n. Update formula: s'(v) = ∃v1: Top(v1) ∧ n(v1, v). Applied to the structure Top -> u1 -n-> u2 -n-> u3 (with s initially 0 everywhere), the new predicate s' holds exactly for u2, the node following the one Top points to.
Collecting Semantics:
CSS[v] = { ⟨∅, ∅⟩ } if v = entry; otherwise
  { ⟦st(w)⟧(S) | S ∈ CSS[w] }  for (w,v) ∈ E(G), w ∈ Assignments(G)
  ∪ { S | S ∈ CSS[w] }  for (w,v) ∈ E(G), w ∈ Skip(G)
  ∪ { S | S ∈ CSS[w] and S ⊨ cond(w) }  for (w,v) ∈ True-Branches(G)
  ∪ { S | S ∈ CSS[w] and S ⊨ ¬cond(w) }  for (w,v) ∈ False-Branches(G)
Collecting Semantics At every program point – a potentially infinite set of two-valued logical structures Representing (at least) all possible heaps that can arise at the program point Next step: find a bounded abstract representation
3-Valued Logic: 1 = true, 0 = false, 1/2 = unknown. A join semi-lattice under the information order, where 0 ⊔ 1 = 1/2 (0 and 1 lie below 1/2); the logical order is 0 ≤ 1/2 ≤ 1.
3-Valued Logical Structures: a set of individuals (nodes) U, plus an interpretation of the relation symbols in P: p0() ∈ {0, 1, 1/2}, p1(v) ∈ {0, 1, 1/2}, p2(u, v) ∈ {0, 1, 1/2}. A join semi-lattice: 0 ⊔ 1 = 1/2.
Boolean Connectives [Kleene]: ∧ is the minimum and ∨ the maximum in the logical order 0 ≤ 1/2 ≤ 1, and ¬1/2 = 1/2 (e.g., 1/2 ∧ 0 = 0, 1/2 ∨ 0 = 1/2, 1/2 ∨ 1 = 1).
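Kleene's connectives and the information-order join can be written down directly; using 0.5 for the value 1/2 is an encoding assumption.

HALF = 0.5                    # the "unknown" truth value 1/2

def k_and(a, b):              # Kleene conjunction: minimum in the logical order 0 <= 1/2 <= 1
    return min(a, b)

def k_or(a, b):               # Kleene disjunction: maximum in the logical order
    return max(a, b)

def k_not(a):                 # negation: swaps 0 and 1, leaves 1/2 unchanged
    return 1 - a

def join(a, b):               # information-order join: 0 ⊔ 1 = 1/2
    return a if a == b else HALF

print(k_and(HALF, 0), k_or(HALF, 0), k_not(HALF), join(0, 1))   # 0 0.5 0.5 0.5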
Property Space: 3-Struct[P] = the set of 3-valued logical structures over a vocabulary (set of predicates) P. The abstract domain is ℘(3-Struct[P]), i.e., sets of 3-valued structures.
Embedding Order: given two structures S = ⟨U, ι⟩ and S' = ⟨U', ι'⟩ and an onto function f : U → U' mapping individuals in U to individuals in U', we say that f embeds S in S' (denoted S ⊑f S') if for every predicate symbol p ∈ P of arity k and all u1, …, uk ∈ U: ι(p)(u1, ..., uk) ⊑ ι'(p)(f(u1), ..., f(uk)), and for all u' ∈ U': (|{ u | f(u) = u' }| > 1) ⊑ ι'(sm)(u'). We say that S can be embedded in S' (denoted S ⊑ S') if there exists a function f such that S ⊑f S'.
Tight Embedding: S' = ⟨U', ι'⟩ is a tight embedding of S = ⟨U, ι⟩ with respect to a function f if S' does not lose unnecessary information: ι'(p)(u'1, …, u'k) = ⊔{ ι(p)(u1, ..., uk) | f(u1) = u'1, ..., f(uk) = u'k }. One way to get a tight embedding is canonical abstraction.
Canonical Abstraction Top u1 u1 u2 u2 u3 u3 [Sagiv, Reps, Wilhelm, TOPLAS02]
Canonical Abstraction Top u1 u1 u2 u2 u3 u3 Top [Sagiv, Reps, Wilhelm, TOPLAS02]
Canonical Abstraction Top u1 u1 u2 u2 u3 u3 Top
Canonical Abstraction Top u1 u1 u2 u2 u3 u3 Top
Canonical Abstraction Top u1 u1 u2 u2 u3 u3 Top
Canonical Abstraction Top u1 u1 u2 u2 u3 u3 Top
Canonical Abstraction (β): merge all nodes with the same unary predicate values into a single summary node, and join the predicate values: ι'(p)(u'1, ..., u'k) = ⊔{ ι(p)(u1, ..., uk) | f(u1) = u'1, ..., f(uk) = u'k }. This converts a state of arbitrary size into a 3-valued abstract state of bounded size. α(C) = { β(c) | c ∈ C }.
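A minimal Python sketch of canonical abstraction for structures with unary and binary predicates: individuals are named by the set of unary predicates that hold for them, individuals with the same name are merged into one (possibly summary) node, and binary predicate values are joined, so 0 and 1 collapse to 1/2. The encoding is an assumption.

from itertools import product

HALF = 0.5

def canonical_abstraction(U, unary, binary):
    """unary: predicate -> set of individuals; binary: predicate -> set of pairs."""
    # Canonical name of an individual = the set of unary predicates that hold for it.
    name = {u: tuple(sorted(p for p, holds in unary.items() if u in holds)) for u in U}
    abs_U = sorted(set(name.values()))
    # A summary node stands for more than one concrete individual.
    summary = {a for a in abs_U if sum(1 for u in U if name[u] == a) > 1}
    abs_binary = {}
    for p, rel in binary.items():
        vals = {}
        for a, b in product(abs_U, abs_U):
            concrete = [1 if (u, v) in rel else 0
                        for u in U for v in U if name[u] == a and name[v] == b]
            vals[(a, b)] = concrete[0] if len(set(concrete)) == 1 else HALF   # join
        abs_binary[p] = vals
    return abs_U, summary, abs_binary

# Top -> u1 -n-> u2 -n-> u3: u2 and u3 carry the same unary values and get merged.
abs_U, summary, abs_bin = canonical_abstraction(
    ['u1', 'u2', 'u3'], {'Top': {'u1'}}, {'n': {('u1', 'u2'), ('u2', 'u3')}})
print(abs_U)                           # [(), ('Top',)]
print(summary)                         # {()}: the non-Top node is a summary node
print(abs_bin['n'][(('Top',), ())])    # 0.5: an indefinite n-edge into the summary node
print(abs_bin['n'][((), ())])          # 0.5: indefinite n-edge inside the summary node

Without further predicates, lists of length 2, 3, 4, … all abstract to this same structure, which is exactly the information loss discussed on the next slide.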
Information Loss: canonical abstraction maps different concrete lists to the same abstract structure, so distinctions between them are lost (diagram).
Instrumentation Predicates: record additional derived information via predicates, e.g., reachability from x, r_x(v) = ∃v1: x(v1) ∧ n*(v1, v), and cyclicity, c(v) = ∃v1: n(v1, v) ∧ n*(v, v1). With these predicates, canonical abstraction keeps apart structures that would otherwise be merged (diagram).
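The derived predicates can be computed on a concrete structure before abstracting. A minimal sketch; the helper names and encoding are assumptions.

def closure(n_rel, U):
    """Reflexive-transitive closure n* as a set of pairs."""
    star = {(u, u) for u in U} | set(n_rel)
    while True:
        step = {(a, d) for (a, b) in star for (c, d) in star if b == c}
        if step <= star:
            return star
        star |= step

def r_x(x_pred, n_rel, U):        # r_x(v) = ∃v1: x(v1) ∧ n*(v1, v)
    star = closure(n_rel, U)
    return {v for v in U if any(v1 in x_pred and (v1, v) in star for v1 in U)}

def c(n_rel, U):                  # c(v) = ∃v1: n(v1, v) ∧ n*(v, v1)
    star = closure(n_rel, U)
    return {v for v in U if any((v1, v) in n_rel and (v, v1) in star for v1 in U)}

# Top -> u1 -> u2 -> u3 (acyclic):
U, Top, n = ['u1', 'u2', 'u3'], {'u1'}, {('u1', 'u2'), ('u2', 'u3')}
print(sorted(r_x(Top, n, U)))     # ['u1', 'u2', 'u3']: every cell is reachable from Top
print(c(n, U))                    # set(): no cell lies on a cycle

Feeding r_Top and c into canonical abstraction as extra unary predicates is what lets the abstraction distinguish, e.g., cyclic from acyclic lists, as the next slide illustrates.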
Embedding Theorem: Conservatively Observing Properties. Evaluating a formula on an abstract structure is conservative for every concrete structure that embeds into it. Example: on the abstract list, the formula ∀v1,v2: ¬(n(v1, v2) ∧ n*(v2, v1)) (no cycles) evaluates only to 1/2, whereas the instrumentation-predicate version ∀v: ¬c(v) (no cycles, derived) evaluates to 1.
Operational Semantics. void push(int v) { Node *x = malloc(sizeof(Node)); x->d = v; x->n = Top; Top = x; } int pop() { if (Top == NULL) return EMPTY; Node *s = Top->n; int r = Top->d; Top = s; return r; } For s = Top->n the update formula is s'(v) = ∃v1: Top(v1) ∧ n(v1, v); on the concrete structure Top -> u1 -> u2 -> u3 it sets s to point to u2.
Abstract Semantics: for s = Top->n, evaluate the same update formula s'(v) = ∃v1: Top(v1) ∧ n(v1, v) directly on the abstract 3-valued structure (with its summary node and r_Top); what should the result be on the summary node? (Diagram.)
Best Transformer (s = Top->n): concretize the abstract structure, apply the concrete semantics s'(v) = ∃v1: Top(v1) ∧ n(v1, v) to every concrete structure it represents, and re-abstract via canonical abstraction; the abstract semantics must over-approximate this result (diagram).