Program Analysis and Verification Spring 2014 Program Analysis and Verification Lecture 10: Abstract Interpretation II Roman Manevich Ben-Gurion University.

Presentation transcript:

Program Analysis and Verification Spring 2014 Program Analysis and Verification Lecture 10: Abstract Interpretation II Roman Manevich Ben-Gurion University

Syllabus: Semantics, Natural Semantics, Structural Semantics, Axiomatic Verification, Static Analysis, Automating Hoare Logic, Control Flow Graphs, Equation Systems, Collecting Semantics, Abstract Interpretation fundamentals, Lattices, Fixed-Points, Chaotic Iteration, Galois Connections, Widening/Narrowing, Domain constructors, Interprocedural Analysis, Analysis Techniques, Numerical Domains, CEGAR, Alias analysis, Shape Analysis, Crafting your own, Soot, From proofs to abstractions, Systematically developing transformers

Previously
Semantic domains
– Preorders
– Partial orders (posets)
– Pointed posets
– Ascending/descending chains
– The height of a poset
– Join and Meet operators
– Complete lattices
Constructing new lattices from old
Abstract Interpretation package
– domains

Abstract domain types

A taxonomy of semantic domain types
– Complete lattice $(D, \sqsubseteq, \sqcup, \sqcap, \bot, \top)$: join and meet exist for every subset of $D$
– Lattice $(D, \sqsubseteq, \sqcup, \sqcap, \bot, \top)$: join and meet exist for every finite subset of $D$ (alternatively, binary join and meet)
– Join semilattice $(D, \sqsubseteq, \sqcup, \bot)$: $\bot$ is the join of the empty set
– Meet semilattice $(D, \sqsubseteq, \sqcap, \top)$: $\top$ is the meet of the empty set
– Complete partial order (CPO) $(D, \sqsubseteq, \bot)$: a poset with a LUB for all ascending chains
– Partial order (poset) $(D, \sqsubseteq)$: reflexive, transitive, and anti-symmetric ($d \sqsubseteq d'$ and $d' \sqsubseteq d$ implies $d = d'$)
– Preorder $(D, \sqsubseteq)$: reflexive ($d \sqsubseteq d$) and transitive ($d \sqsubseteq d'$ and $d' \sqsubseteq d''$ implies $d \sqsubseteq d''$)

Composing domains

Cartesian product of complete lattices
For two complete lattices $L_1 = (D_1, \sqsubseteq_1, \sqcup_1, \sqcap_1, \bot_1, \top_1)$ and $L_2 = (D_2, \sqsubseteq_2, \sqcup_2, \sqcap_2, \bot_2, \top_2)$, define the poset $L_{cart} = (D_1 \times D_2, \sqsubseteq_{cart}, \sqcup_{cart}, \sqcap_{cart}, \bot_{cart}, \top_{cart})$ as follows:
– $(x_1, x_2) \sqsubseteq_{cart} (y_1, y_2)$ iff $x_1 \sqsubseteq_1 y_1$ and $x_2 \sqsubseteq_2 y_2$
– $\sqcup_{cart} = ?$ $\sqcap_{cart} = ?$ $\bot_{cart} = ?$ $\top_{cart} = ?$
Lemma: $L_{cart}$ is a complete lattice
Define the Cartesian constructor $L_{cart} = Cart(L_1, L_2)$
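One natural way to fill in the question marks is to apply every operation component-wise. The sketch below illustrates this on two small powerset lattices. The `Lattice` class and the `powerset` and `cart` helpers are hypothetical names chosen for this example; they are not taken from the course package.

```python
class Lattice:
    """A lattice described by its operations (illustrative helper only)."""
    def __init__(self, leq, join, meet, bottom, top):
        self.leq, self.join, self.meet = leq, join, meet
        self.bottom, self.top = bottom, top

def powerset(universe):
    """Powerset lattice of a finite set, ordered by inclusion."""
    return Lattice(
        leq=lambda x, y: x <= y,
        join=lambda x, y: x | y,
        meet=lambda x, y: x & y,
        bottom=frozenset(),
        top=frozenset(universe),
    )

def cart(l1, l2):
    """Cart(L1, L2): order, join, meet, bottom and top are component-wise."""
    return Lattice(
        leq=lambda x, y: l1.leq(x[0], y[0]) and l2.leq(x[1], y[1]),
        join=lambda x, y: (l1.join(x[0], y[0]), l2.join(x[1], y[1])),
        meet=lambda x, y: (l1.meet(x[0], y[0]), l2.meet(x[1], y[1])),
        bottom=(l1.bottom, l2.bottom),
        top=(l1.top, l2.top),
    )

L = cart(powerset({"a", "b"}), powerset({1, 2, 3}))
x = (frozenset({"a"}), frozenset({1}))
y = (frozenset({"b"}), frozenset({1, 2}))
print(L.join(x, y))   # component-wise union
print(L.meet(x, y))   # component-wise intersection
```

The sketch only covers binary join and meet; for complete lattices the same component-wise idea extends to joins and meets of arbitrary subsets.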

Disjunctive completion
For a complete lattice $L = (D, \sqsubseteq, \sqcup, \sqcap, \bot, \top)$, define the powerset lattice $L_\vee = (2^D, \sqsubseteq_\vee, \sqcup_\vee, \sqcap_\vee, \bot_\vee, \top_\vee)$
$\sqsubseteq_\vee = ?$ $\sqcup_\vee = ?$ $\sqcap_\vee = ?$ $\bot_\vee = ?$ $\top_\vee = ?$
Lemma: $L_\vee$ is a complete lattice
$L_\vee$ contains all subsets of $D$, which can be thought of as disjunctions of the corresponding predicates
Define the disjunctive completion constructor $L_\vee = Disj(L)$

Relational product of lattices
$L_1 = (D_1, \sqsubseteq_1, \sqcup_1, \sqcap_1, \bot_1, \top_1)$
$L_2 = (D_2, \sqsubseteq_2, \sqcup_2, \sqcap_2, \bot_2, \top_2)$
$L_{rel} = (2^{D_1 \times D_2}, \sqsubseteq_{rel}, \sqcup_{rel}, \sqcap_{rel}, \bot_{rel}, \top_{rel})$ as follows:
– $L_{rel} = Disj(Cart(L_1, L_2))$
Lemma: $L_{rel}$ is a complete lattice

Finite maps
For a complete lattice $L = (D, \sqsubseteq, \sqcup, \sqcap, \bot, \top)$ and a finite set $V$, define the poset $L_{V \to L} = (V \to D, \sqsubseteq_{V \to L}, \sqcup_{V \to L}, \sqcap_{V \to L}, \bot_{V \to L}, \top_{V \to L})$ as follows:
– $f_1 \sqsubseteq_{V \to L} f_2$ iff for all $v \in V$, $f_1(v) \sqsubseteq f_2(v)$
– $\sqcup_{V \to L} = ?$ $\sqcap_{V \to L} = ?$ $\bot_{V \to L} = ?$ $\top_{V \to L} = ?$
Lemma: $L_{V \to L}$ is a complete lattice
Define the map constructor $L_{V \to L} = Map(V, L)$
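As with the Cartesian product, the remaining operations of the map lattice can be defined pointwise. The sketch below represents a map as a Python dict over a fixed key set and takes the order and join of the underlying lattice as parameters. The function names are assumptions made for this example only.

```python
def map_bottom(keys, bottom):
    """Bottom of Map(V, L): every key is mapped to the bottom of L."""
    return {v: bottom for v in keys}

def map_leq(f1, f2, leq):
    """f1 is below f2 iff f1(v) is below f2(v) for every key v."""
    return all(leq(f1[v], f2[v]) for v in f1)

def map_join(f1, f2, join):
    """Pointwise join of two maps over the same keys."""
    return {v: join(f1[v], f2[v]) for v in f1}

# Example: V is a set of CFG node names, L is a powerset lattice.
V = ["n0", "n1"]
subset = lambda a, b: a <= b
union = lambda a, b: a | b
f1 = {"n0": frozenset({1}), "n1": frozenset()}
f2 = {"n0": frozenset({2}), "n1": frozenset({3})}
joined = map_join(f1, f2, union)
print(joined)
print(map_leq(f1, joined, subset))
```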

The collecting lattice
Lattice for a given control-flow node $v$: $L_v = (2^{State}, \subseteq, \cup, \cap, \emptyset, State)$
Lattice for entire control-flow graph with nodes $V$: $L_{CFG} = Map(V, L_v)$
We will use this lattice as a baseline for static analysis and define abstractions of its elements

Implementation

Software package: paver142
Built on top of the Soot compiler framework for Java
Download from web-site
– Includes all necessary Soot jar files

[Figure: package structure, annotated with: infrastructure for implementing static analysis; example analyses; Soot-specific utilities]

Existing analyses

Implementing abstract domains

Variable equalities analysis

Today
– Solving monotone systems
– Fixed-points
– Vanilla static analysis algorithm
– Chaotic iteration

Abstract interpretation via abstraction
[Diagram: in the collecting semantics, statement S maps a set of states to a set of states; in the abstract semantics, statement S maps an abstract representation of sets of states to an abstract representation of sets of states; abstraction relates the two levels. Labels: {P} S {Q}, sp(S, P)]
Generalizes axiomatic verification

Abstract interpretation via concretization
[Diagram: in the collecting semantics, statement S maps a set of states to a set of states; in the abstract semantics, statement S maps an abstract representation of sets of states to an abstract representation of sets of states; concretization relates the two levels. Labels: {P} S {Q}, models(P), models(sp(S, P)), models(Q)]

Missing knowledge
– Collecting semantics
– Abstract semantics
– Connection between collecting semantics and abstract semantics
– Algorithm to compute abstract semantics

Review of collecting semantics

The collecting lattice (sets of states)
Lattice for a given control-flow node $v$: $L_v = (2^{State}, \subseteq, \cup, \cap, \emptyset, State)$
Lattice for entire control-flow graph with nodes $V$: $L_{CFG} = Map(V, L_v)$
We will use this lattice as a baseline for static analysis and define abstractions of its elements

Collecting semantics as equation system
A vector of variables R[0, 1, 2, 3, 4]
$R[0] = \{x \in \mathbb{Z}\}$ // established input
$R[1] = R[0] \cup R[4]$
$R[2] = R[1] \cap \{s \mid s(x) > 0\}$ // semantic function for assume x>0
$R[3] = R[1] \cap \{s \mid s(x) \le 0\}$
$R[4] = [\![x := x-1]\!]\, R[2]$ // semantic function for x:=x-1 lifted to sets of states
A (recursive) system of equations
[Figure: CFG with nodes entry, if x > 0, x := x-1, exit; edges labeled R[0], R[1], R[2], R[3], R[4]]
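To make the equation system concrete, the sketch below encodes the five right-hand sides in Python. A state is represented by the value of x alone, and the integers are replaced by the finite window -3..3, which is an assumption made here so that the sets stay finite. Iterating from empty sets until nothing changes anticipates the algorithm developed later in the lecture.

```python
# Assumption: a finite stand-in for Z so that every set is finite.
UNIVERSE = frozenset(range(-3, 4))

def F(R):
    """Apply the right-hand sides of the five equations once."""
    return [
        UNIVERSE,                               # R[0] = {x in Z}
        R[0] | R[4],                            # R[1] = R[0] union R[4]
        frozenset(x for x in R[1] if x > 0),    # R[2]: assume x > 0
        frozenset(x for x in R[1] if x <= 0),   # R[3]: assume x <= 0
        frozenset(x - 1 for x in R[2]),         # R[4]: x := x - 1, lifted to sets
    ]

R = [frozenset()] * 5
while F(R) != R:          # stop once R = F(R)
    R = F(R)

for i, states in enumerate(R):
    print(f"R[{i}] = {sorted(states)}")
```

At the solution, R[3] (the exit) contains exactly the non-positive values and R[2] (the loop body entry) exactly the positive ones, matching the intuition behind the two assume statements.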

General definition
A vector of variables R[0, …, k], one per input/output of a node
– R[0] is for entry
For node n with multiple predecessors add equation $R[n] = \bigcup \{R[k] \mid k \text{ is a predecessor of } n\}$
For an atomic operation node with input R[m], statement S, and output R[n], add equation $R[n] = [\![S]\!]\, R[m]$
Transform if b then $S_1$ else $S_2$ to (assume b; $S_1$) or (assume $\neg$b; $S_2$)
[Figure: CFG with nodes entry, if x > 0, x := x-1, exit; edges labeled R[0], R[1], R[2], R[3], R[4]]

Static analysis
Given a system of equations for the collecting semantics, a static analysis solves a corresponding system of equations over an abstract domain
Questions:
– What is the relation between the solutions? Next lecture
– How do you solve the second system? This lecture
Collecting semantics:
$R[0] = \{x \in \mathbb{Z}\}$ // established input
$R[1] = R[0] \cup R[4]$
$R[2] = [\![\text{assume } x > 0]\!]\, R[1]$
$R[3] = [\![\text{assume } x \le 0]\!]\, R[1]$
$R[4] = [\![x := x-1]\!]\, R[2]$
Abstract system:
$R[0]^\# = \{x \in \mathbb{Z}\}^\#$
$R[1]^\# = R[0]^\# \sqcup R[4]^\#$
$R[2]^\# = [\![\text{assume } x > 0]\!]^\#\, R[1]^\#$
$R[3]^\# = [\![\text{assume } x \le 0]\!]^\#\, R[1]^\#$
$R[4]^\# = [\![x := x-1]\!]^\#\, R[2]^\#$

Solving equation systems

Equation systems in general
Let $L$ be a complete lattice $(D, \sqsubseteq, \sqcup, \sqcap, \bot, \top)$
Let $R$ be a vector of analysis variables $R[0, …, n] \in D \times … \times D$
Let $F$ be a vector of functions, where each $F[i]$ maps the vector $R[0, …, n]$ to an element of $D$
A system of equations
$R[0] = F[0](R[0], …, R[n])$
…
$R[n] = F[n](R[0], …, R[n])$
In vector notation $R = F(R)$
Questions:
1. Does a solution always exist?
2. If so, is it unique?
3. If so, is it computable?
For $R[i] = F[i](R)$, usually $F[i]$ reads only a small subset of $R$, denoted $D[i]$. We say that $R[i]$ depends on $D[i]$
Example:
$R[0] = \{x \in \mathbb{Z}\}$ // established input
$R[1] = R[0] \cup R[4]$
$R[2] = R[1] \cap \{s \mid s(x) > 0\}$
$R[3] = R[1] \cap \{s \mid s(x) \le 0\}$
$R[4] = [\![x := x-1]\!]\, R[2]$

Equation systems in general (continued)
A solution of $R = F(R)$, if one exists, is a fixed point of $F$

Monotone systems

Monotone functions
Let $L_1 = (D_1, \sqsubseteq)$ and $L_2 = (D_2, \sqsubseteq)$ be two posets
A function $f : D_1 \to D_2$ is monotone if for every pair $x, y \in D_1$, $x \sqsubseteq y$ implies $f(x) \sqsubseteq f(y)$
A special case: $L_1 = L_2 = (D, \sqsubseteq)$ and $f : D \to D$

Monotone function
[Figure: f maps the elements $x \sqsubseteq y$ of $L_1$ to $f(x) \sqsubseteq f(y)$ in $L_2$]

Important cases of monotonicity
Join: $f(X, Y) = X \sqcup Y$ is monotone in each operand
– Prove it!
Set lifting function: for a set $X$ and any function $g$, $F(X) = \{g(x) \mid x \in X\}$ is monotone w.r.t. $\subseteq$
– Prove it!
Notice that the collecting semantics function is defined in terms of
– Join (set union)
– Semantic function for atomic statements lifted to sets of states
Conclusion: collecting semantics function is monotone
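The two "Prove it!" claims can at least be sanity-checked by brute force on a small powerset lattice. The sketch below is such a check, not a proof; `subsets`, `lift` and `is_monotone` are helper names introduced for this example.

```python
from itertools import combinations

def subsets(universe):
    """All subsets of a finite set, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def lift(g):
    """Lift a function on elements to a function on sets of elements."""
    return lambda X: frozenset(g(x) for x in X)

def is_monotone(f, domain):
    """Check x <= y implies f(x) <= f(y) for all pairs, w.r.t. set inclusion."""
    return all(f(x) <= f(y) for x in domain for y in domain if x <= y)

D = subsets(range(4))

decrement = lift(lambda x: x - 1)            # lifted x := x - 1
join_with = lambda Y: (lambda X: X | Y)      # join with a fixed operand
complement = lambda X: frozenset(range(4)) - X

print(is_monotone(decrement, D))                          # True
print(all(is_monotone(join_with(Y), D) for Y in D))       # True
print(is_monotone(complement, D))                         # False
```

Set complement is included as a counterexample: it reverses the inclusion order, so it fails the check.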

Fixed points

Extensive/reductive functions
Let $L = (D, \sqsubseteq)$ be a poset
A function $f : D \to D$ is extensive if for every $x \in D$, we have that $x \sqsubseteq f(x)$
A function $f : D \to D$ is reductive if for every $x \in D$, we have that $f(x) \sqsubseteq x$

Fixed points
$L = (D, \sqsubseteq, \sqcup, \sqcap, \bot, \top)$ and $f : D \to D$ monotone
$Fix(f) = \{d \mid f(d) = d\}$
$Red(f) = \{d \mid f(d) \sqsubseteq d\}$
$Ext(f) = \{d \mid d \sqsubseteq f(d)\}$
Theorem [Tarski 1955]
– $lfp(f) = \sqcap Fix(f) = \sqcap Red(f) \in Fix(f)$
– $gfp(f) = \sqcup Fix(f) = \sqcup Ext(f) \in Fix(f)$
[Figure: the lattice between $\bot$ and $\top$ with the regions Red(f), Ext(f) and Fix(f), the points lfp and gfp, and the iterates $f^n(\bot)$]
1. Does a solution always exist? Yes
2. If so, is it unique? No, but it has least/greatest solutions
3. If so, is it computable? Under some conditions…
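On a finite lattice the sets in Tarski's theorem can simply be enumerated. The sketch below does this for the powerset of {0, 1, 2, 3} and a monotone function f chosen for illustration (it is not from the lecture): f always adds 0, adds x+1 for every x < 2, and keeps 3 if it is already present. It therefore has two fixed points.

```python
from functools import reduce
from itertools import combinations

U = frozenset(range(4))
D = [frozenset(c) for r in range(len(U) + 1)
     for c in combinations(sorted(U), r)]

def f(X):
    """A monotone function on the powerset of U (illustrative choice)."""
    return frozenset({0}) | frozenset(x + 1 for x in X if x < 2) | (X & {3})

fix = [d for d in D if f(d) == d]     # fixed points
red = [d for d in D if f(d) <= d]     # f(d) below d
ext = [d for d in D if d <= f(d)]     # d below f(d)

# Meet is intersection and join is union in a powerset lattice.
lfp = reduce(frozenset.intersection, red, U)
gfp = reduce(frozenset.union, ext, frozenset())

print(sorted(lfp), sorted(gfp))
print(lfp in fix, gfp in fix)
```

Enumerating all of D is exactly what becomes infeasible for large or infinite domains, which motivates the constructive approach that follows.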

Fixed point example
$R[0] = \{x \in \mathbb{Z}\}$
$R[1] = R[0] \cup R[4]$
$R[2] = R[1] \cap \{s \mid s(x) > 0\}$
$R[3] = R[1] \cap \{s \mid s(x) \le 0\}$
$R[4] = [\![x := x-1]\!]\, R[2]$
[Figure: two copies of the CFG, one annotated with an assignment $d$ of sets of states and one with $F(d)$; here $F(d) = d$, so $d$ is a fixed point]

Pre-fixed point example
$R[0] = \{x \in \mathbb{Z}\}$
$R[1] = R[0] \cup R[4]$
$R[2] = R[1] \cap \{s \mid s(x) > 0\}$
$R[3] = R[1] \cap \{s \mid s(x) \le 0\}$
$R[4] = [\![x := x-1]\!]\, R[2]$
[Figure: two copies of the CFG, one annotated with an assignment $d$ and one with $F(d)$, illustrating a pre-fixed point]

Post-fixed point example
$R[0] = \{x \in \mathbb{Z}\}$
$R[1] = R[0] \cup R[4]$
$R[2] = R[1] \cap \{s \mid s(x) > 0\}$
$R[3] = R[1] \cap \{s \mid s(x) \le 0\}$
$R[4] = [\![x := x-1]\!]\, R[2]$
[Figure: two copies of the CFG, one annotated with an assignment $d$ and one with $F(d)$, illustrating a post-fixed point]

Recap
A system of equations of the form $R = F(R)$ where $R$ draws its elements from a complete lattice $L = (D, \sqsubseteq, \sqcup, \sqcap, \bot, \top)$
Tarski's fixed point theorem ensures that there exists a least fixed point: $lfp(F) = \sqcap Fix(F)$
However, it is not an algorithm since $D$ is often infinite
– Inefficient even when $D$ is finite
We need a more constructive way of computing $lfp(F)$

Computing the least fixed point

Continuous functions
Let $L = (D, \sqsubseteq, \sqcup, \bot)$ be a complete partial order
– Every ascending chain has a least upper bound
A function $f$ is continuous if for every increasing chain $Y \subseteq D^*$, $f(\sqcup Y) = \sqcup \{f(y) \mid y \in Y\}$
Lemma: if $f$ is continuous then $f$ is monotone
Proof: assume $x \sqsubseteq y$. Therefore $x \sqcup y = y$. Applying continuity to the chain $x \sqsubseteq y$ gives $f(y) = f(x \sqcup y) = f(x) \sqcup f(y)$, which means $f(x) \sqsubseteq f(y)$

Kleene’s fixed point theorem
Let $L = (D, \sqsubseteq, \sqcup, \bot)$ be a complete partial order and $f : D \to D$ a continuous function; then $lfp(f) = \bigsqcup_{n \in \mathbb{N}} f^n(\bot)$
That is, take the ascending chain $\bot \sqsubseteq f(\bot) \sqsubseteq f(f(\bot)) \sqsubseteq … \sqsubseteq f^n(\bot) \sqsubseteq …$ and return the supremum
– Why is this an ascending chain?
But how do you know if a function $f$ is continuous?

Continuity and ACC condition
Let $L = (D, \sqsubseteq, \sqcup, \bot)$ be a complete partial order
– Every ascending chain has a least upper bound
$L$ satisfies the ascending chain condition (ACC) if every ascending chain eventually stabilizes: $d_0 \sqsubseteq d_1 \sqsubseteq … \sqsubseteq d_n = d_{n+1} = d_{n+2} = …$
Lemma: Monotone functions on posets satisfying ACC are continuous
Proof: We need to show that $f(\sqcup Y) = \sqcup \{f(y) \mid y \in Y\}$
1. Every ascending chain $Y$ eventually stabilizes, $d_0 \sqsubseteq d_1 \sqsubseteq … \sqsubseteq d_n = d_{n+1} = …$, hence $d_n$ is the least upper bound of $\{d_0, d_1, …, d_n\}$, thus $f(\sqcup Y) = f(d_n)$
2. From monotonicity of $f$ we get that $f(d_0) \sqsubseteq f(d_1) \sqsubseteq … \sqsubseteq f(d_n) = f(d_{n+1}) = …$. Hence $f(d_n)$ is the least upper bound of $\{f(d_0), f(d_1), …, f(d_n)\}$, thus $\sqcup \{f(y) \mid y \in Y\} = f(d_n)$

Resulting algorithm
Kleene’s fixed point theorem gives a constructive method for computing lfp(f) over a poset with ACC when f is monotone
Mathematical definition: $lfp(f) = \bigsqcup_{n \in \mathbb{N}} f^n(\bot)$
Algorithm:
  d := $\bot$
  while f(d) $\neq$ d do
    d := f(d)
  return d
[Figure: the ascending chain $\bot$, $f(\bot)$, $f^2(\bot)$, …, $f^n(\bot)$ climbing to lfp, below $\top$]
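The algorithm translates almost literally into Python. The sketch below applies it to graph reachability, an example chosen here for illustration: the lattice is the powerset of the nodes (finite, so ACC holds) and the monotone function adds the start node and all successors of already reached nodes. The graph `EDGES` is made up for the example.

```python
def lfp(f, bottom):
    """Kleene iteration: bottom, f(bottom), f(f(bottom)), ... until stable.

    Terminates when f is monotone and the lattice satisfies ACC.
    """
    d = bottom
    while f(d) != d:
        d = f(d)
    return d

# Hypothetical example graph: nodes 4 and 5 are unreachable from node 1.
EDGES = {1: [2], 2: [3], 3: [1], 4: [5], 5: []}

def step(reached):
    """Start node plus all successors of the nodes reached so far."""
    return frozenset({1}) | frozenset(t for n in reached for t in EDGES[n])

print(sorted(lfp(step, frozenset())))
```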

Our very first generic static analysis algorithm

Vanilla algorithm
Problem Definition:
1. Lattice of properties L of finite height (ACC)
2. For each statement define a monotone transformer
Preparation:
1. Parse program into AST
2. Convert AST into CFG
3. Generate system of equations from CFG
Analysis:
1. Initialize each analysis variable with $\bot$
2. Update all analysis variables of each equation until reaching a fixed point
Drawback: non-incremental. Most variables don’t change.

Chaotic iteration

Chaotic iteration
Input:
– A cpo $L = (D, \sqsubseteq, \sqcup, \bot)$ satisfying ACC
– $L^n = L \times L \times … \times L$
– A monotone function $f : D^n \to D^n$
– A system of equations $\{X[i] = F[i](X) \mid 1 \le i \le n\}$
Output: lfp(f)
A worklist-based algorithm:
  for i := 1 to n do X[i] := $\bot$
  WL := {1, …, n}
  while WL $\neq \emptyset$ do
    j := pop WL // choose index non-deterministically
    N := F[j](X)
    if N $\neq$ X[j] then
      X[j] := N
      add all the indexes that directly depend on j to WL
      (X[i] depends on X[j] if F[i] contains X[j])
  return X
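A minimal Python sketch of the worklist algorithm follows. The equations are passed as a list of functions over the whole vector X, and the dependency information is passed explicitly as a dict. The parameter names and the three-equation example system are assumptions made for this illustration.

```python
def chaotic_iteration(equations, dependents, bottom):
    """Worklist solver for X[i] = equations[i](X).

    dependents[i] lists the indexes j whose equation reads X[i].
    Assumes monotone equations over a domain satisfying ACC.
    """
    X = [bottom] * len(equations)
    worklist = set(range(len(equations)))
    while worklist:
        i = worklist.pop()                   # arbitrary choice of index
        new = equations[i](X)
        if new != X[i]:
            X[i] = new
            worklist.update(dependents[i])   # re-examine equations that read X[i]
    return X

# Example system over sets of integers:
#   X[0] = {1}
#   X[1] = X[0] union X[2]
#   X[2] = {x + 1 | x in X[1], x < 3}
equations = [
    lambda X: frozenset({1}),
    lambda X: X[0] | X[2],
    lambda X: frozenset(x + 1 for x in X[1] if x < 3),
]
dependents = {0: [1], 1: [2], 2: [1]}

solution = chaotic_iteration(equations, dependents, frozenset())
print([sorted(s) for s in solution])
```

Because the equations are monotone, the order in which indexes are popped affects only the number of steps, not the result.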

Chaotic iteration for static analysis
Specialize chaotic iteration for programs
Create a CFG for program
Choose a cpo of properties for the static analysis to infer: $L = (D, \sqsubseteq, \sqcup, \bot)$
Define variables R[0, …, n] for input/output of each CFG node such that $R[i] \in D$
For each node $v$ let $v_{out}$ be the variable at the output of that node: $v_{out} = F[v](\sqcup \{u_{out} \mid (u, v) \text{ is a CFG edge}\})$
– Make sure each $F[v]$ is monotone
Variable dependence determined by outgoing edges in CFG

Static analysis example: constant propagation

Constant propagation example
Program: x := 4; while (y $\neq$ 5) do z := x; x := 4
[Figure: CFG with nodes entry, x := 4, if (*), assume y $\neq$ 5, z := x, x := 4, assume y=5, exit]

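Constant propagation lattice
For each variable x define L as the flat lattice of integer constants
[Figure: flat lattice with $\top$ (not-a-constant) at the top, the integer constants in the middle, and $\bot$ (no information yet) at the bottom]
For a set of program variables $Var = x_1, …, x_n$: $L^n = L \times L \times … \times L$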

Write down variables
Program: x := 4; while (y $\neq$ 5) do z := x; x := 4
[Figure: CFG with nodes entry, x := 4, if (*), assume y $\neq$ 5, z := x, x := 4, assume y=5, exit]

Write down equations
Program: x := 4; while (y $\neq$ 5) do z := x; x := 4
[Figure: the same CFG with its edges labeled $R_0$, $R_1$, $R_2$, $R_3$, $R_4$, $R_5$, $R_6$]

Collecting semantics equations
$R_0 = State$
$R_1 = [\![x := 4]\!]\, R_0$
$R_2 = R_1 \cup R_5$
$R_3 = [\![\text{assume } y \neq 5]\!]\, R_2$
$R_4 = [\![z := x]\!]\, R_3$
$R_5 = [\![x := 4]\!]\, R_4$
$R_6 = [\![\text{assume } y = 5]\!]\, R_2$
[Figure: CFG with its edges labeled $R_0$ to $R_6$]

Constant propagation equations
$R_0 = \top$
$R_1 = [\![x := 4]\!]^\#\, R_0$
$R_2 = R_1 \sqcup R_5$
$R_3 = [\![\text{assume } y \neq 5]\!]^\#\, R_2$
$R_4 = [\![z := x]\!]^\#\, R_3$
$R_5 = [\![x := 4]\!]^\#\, R_4$
$R_6 = [\![\text{assume } y = 5]\!]^\#\, R_2$
Each $[\![\cdot]\!]^\#$ is an abstract transformer
[Figure: CFG with its edges labeled $R_0$ to $R_6$]

Abstract operations for CP
Lattice elements have the form $(v_x, v_y, v_z)$
$[\![x := 4]\!]^\# (v_x, v_y, v_z) = (4, v_y, v_z)$
$[\![z := x]\!]^\# (v_x, v_y, v_z) = (v_x, v_y, v_x)$
$[\![\text{assume } y \neq 5]\!]^\# (v_x, v_y, v_z) = (v_x, v_y, v_z)$
$[\![\text{assume } y = 5]\!]^\# (v_x, v_y, v_z) =$ if $v_y = k \neq 5$ then $(\bot, \bot, \bot)$ else $(v_x, 5, v_z)$
$R_1 \sqcup R_5 = (a_1, b_1, c_1) \sqcup (a_5, b_5, c_5) = (a_1 \sqcup a_5, b_1 \sqcup b_5, c_1 \sqcup c_5)$
[Figure: CP lattice for a single variable, with $\top$ at the top and $\bot$ at the bottom]
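The abstract operations above can be sketched in Python as follows. The strings `"bot"` and `"top"` are hypothetical encodings of the bottom and top elements, and the function names are invented for this example. One deviation from the slide is labeled in the code: a state whose y-component is bottom is mapped to the all-bottom state by the `assume y = 5` transformer, an assumption made here to keep that transformer monotone.

```python
BOT, TOP = "bot", "top"   # hypothetical encodings of the bottom and top elements

def join_val(a, b):
    """Join in the flat CP lattice for a single variable."""
    if a == BOT:
        return b
    if b == BOT:
        return a
    return a if a == b else TOP

def join(s, t):
    """Component-wise join of two states (v_x, v_y, v_z)."""
    return tuple(join_val(a, b) for a, b in zip(s, t))

def assign_x_4(s):
    vx, vy, vz = s
    return (4, vy, vz)

def assign_z_x(s):
    vx, vy, vz = s
    return (vx, vy, vx)

def assume_y_ne_5(s):
    return s                      # identity, as on the slide

def assume_y_eq_5(s):
    vx, vy, vz = s
    if vy == BOT:                 # assumption of this sketch, not on the slide
        return (BOT, BOT, BOT)
    if vy != TOP and vy != 5:     # y is a constant k different from 5
        return (BOT, BOT, BOT)
    return (vx, 5, vz)

print(assign_z_x(assign_x_4((TOP, TOP, TOP))))
print(join((4, TOP, TOP), (4, TOP, 4)))
print(assume_y_eq_5((4, TOP, TOP)))
```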

Chaotic iteration for CP
The equations are those of the constant propagation system ($R_0 = \top$, …, $R_6 = [\![\text{assume } y = 5]\!]^\#\, R_2$) and each state is a triple $(v_x, v_y, v_z)$. The slides step through the worklist in index order:
– Initialization: $R_0 = … = R_6 = (\bot, \bot, \bot)$, WL = $\{R_0, R_1, R_2, R_3, R_4, R_5, R_6\}$
– Process $R_0$: $R_0 = (\top, \top, \top)$, WL = $\{R_1, R_2, R_3, R_4, R_5, R_6\}$
– Process $R_1$: $R_1 = (4, \top, \top)$, WL = $\{R_2, R_3, R_4, R_5, R_6\}$
– Process $R_2$: $R_2 = R_1 \sqcup R_5 = (4, \top, \top)$, WL = $\{R_3, R_4, R_5, R_6\}$
– Process $R_3$: $R_3 = (4, \top, \top)$, WL = $\{R_4, R_5, R_6\}$
– Process $R_4$: $R_4 = (4, \top, 4)$, WL = $\{R_5, R_6\}$
– Process $R_5$: $R_5 = (4, \top, 4)$, WL = $\{R_6\}$
– Process $R_6$: $R_6 = (4, 5, \top)$, WL = $\{\}$
Strictly, the change to $R_5$ puts $R_2$ back on the worklist; re-evaluating it gives $R_2 = (4, \top, \top) \sqcup (4, \top, 4) = (4, \top, \top)$, which is unchanged, so nothing further is propagated.

Chaotic iteration for CP – fixed point
$R_0 = (\top, \top, \top)$
$R_1 = (4, \top, \top)$
$R_2 = (4, \top, \top)$
$R_3 = (4, \top, \top)$
$R_4 = (4, \top, 4)$
$R_5 = (4, \top, 4)$
$R_6 = (4, 5, \top)$
WL = $\{\}$
In practice maintain a worklist of nodes
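The whole example can be replayed with a short script. The sketch below encodes the seven constant propagation equations directly as Python functions and solves them with a first-in-first-out worklist. The encodings `"bot"` and `"top"`, the `DEPENDENTS` table, and the treatment of a bottom y-component in `assume y = 5` are assumptions of this sketch rather than details taken from the course package.

```python
BOT, TOP = "bot", "top"   # hypothetical encodings of the bottom and top elements

def join_val(a, b):
    if a == BOT:
        return b
    if b == BOT:
        return a
    return a if a == b else TOP

def join(s, t):
    return tuple(join_val(a, b) for a, b in zip(s, t))

def assume_y_eq_5(s):
    vx, vy, vz = s
    # Bottom y (assumption of this sketch) or a constant other than 5: unreachable.
    if vy == BOT or (vy != TOP and vy != 5):
        return (BOT, BOT, BOT)
    return (vx, 5, vz)

EQUATIONS = [
    lambda R: (TOP, TOP, TOP),                  # R0 = top
    lambda R: (4, R[0][1], R[0][2]),            # R1 = x := 4 applied to R0
    lambda R: join(R[1], R[5]),                 # R2 = R1 join R5
    lambda R: R[2],                             # R3 = assume y != 5 (identity)
    lambda R: (R[3][0], R[3][1], R[3][0]),      # R4 = z := x applied to R3
    lambda R: (4, R[4][1], R[4][2]),            # R5 = x := 4 applied to R4
    lambda R: assume_y_eq_5(R[2]),              # R6 = assume y = 5 applied to R2
]
# DEPENDENTS[i]: equations that read R[i].
DEPENDENTS = {0: [1], 1: [2], 2: [3, 6], 3: [4], 4: [5], 5: [2], 6: []}

def solve():
    R = [(BOT, BOT, BOT)] * len(EQUATIONS)
    worklist = list(range(len(EQUATIONS)))
    while worklist:
        i = worklist.pop(0)
        new = EQUATIONS[i](R)
        if new != R[i]:
            R[i] = new
            worklist.extend(d for d in DEPENDENTS[i] if d not in worklist)
    return R

for i, state in enumerate(solve()):
    print(f"R{i} = {state}")
```

The analysis concludes that x is the constant 4 at every point after the first assignment, and that z is 4 after `z := x` but not a constant at the loop head, where the loop may not yet have run.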

Complexity of chaotic iteration
Parameters:
– n: number of CFG nodes
– k: maximum in-degree of a node
– h: height of lattice L
– c: maximum cost of applying $F_v$, computing $\sqcup$, and checking the fixed-point condition for lattice L
Complexity: $O(n \cdot h \cdot c \cdot k)$
Incremental (worklist) algorithm reduces the n factor
– Implement worklist by priority queue and order nodes by reversed topological order

Implementation

Major classes
[Figure: class list of the package, annotated as follows]
– Variable per CFG node
– Converts CFG to equation system
– An equation per CFG edge and join point
– A system of equations
– Chaotic iteration algorithm to compute fixed point
– A transformer for assume statements
– A transformer for non-assume statements
– Combines all sub-algorithms to get entire static analysis

Soot: a Java Optimization Framework
Developed at McGill University (Canada)
Supports several input languages
– Java source code
– Java bytecode
– Dalvik bytecode (Android)
– Jimple intermediate language
Supported output languages
– Java bytecode
– Dalvik bytecode (Android)
– Jimple intermediate language
Supports several intermediate languages
– Jimple (what we will be using)
– Shimple
– Baf
– Grimp
Supports static analysis: CFG, pointer-analysis, etc.
Eclipse plug-in (useful for giving demos and teaching)

Soot documentation and resources Soot survivor’s guide Soot tutorials Soot API Eric Bodden’s blog – Running Soot:

Jimple synopsis
TAC (three-address code) for Java: 15 statement types
Core (intra-procedural) statements
– NopStmt
– IdentityStmt (r0 := @this: Foo; i0 := @parameter0: int;)
– AssignStmt ($r1 = new Foo;)
Intra-procedural control-flow statements
– IfStmt
– GotoStmt
– TableSwitchStmt (JVM tableswitch instruction)
– LookupSwitchStmt (JVM lookupswitch instruction)
Inter-procedural control-flow statements
– InvokeStmt
– ReturnStmt
– ReturnVoidStmt
Monitor statements
– EnterMonitorStmt
– ExitMonitorStmt
Exceptions
– ThrowStmt
– RetStmt

Jimple expressions

Java source

Running Soot
Output .jimple files go in “sootOutput”

Jimple code
[Figure: Jimple listing annotated with: (default) static class initializer; Locals; IdentityStmts; two variables with same name (w)?]

Setting up for development
1. Set up Java
2. Set up Soot
3. Set up abstract interpretation package

Setting up Java
Make sure you have version 1.7
If you want to operate from command line make sure you have jdk 1.7
– Set environment variable JAVA_HOME to point to your jdk installation path

Setting up Soot
Download
– sootclasses.jar
– jasminclasses.jar
– polyglotclasses.jar
Recommended: Soot source (complete package)
[Figure annotations: add Soot jar files as External; attach Soot sources]

Example inputs
Store input files in a directory separate from the ones you use for implementing the analyses (otherwise, the front-end breaks)

Static analysis package
Written for this course in the last few days
– Not fully debugged
Implements
– Conversion of procedures to equation systems
– Abstract domain implementations. Some examples: variable equalities (VE), constant propagation (CP), more to come
– Chaotic iterations (includes debugging information)
– Domain combinators: Cartesian, Disjunctive completion, and Relational
– Code for displaying analysis results

Running the VE analysis
Example: variable equalities

Running the VE analysis
[Figure: driver code, annotated as follows]
Adds the analysis to Soot’s list of intra-procedural analyses
1. Creates the equation system
2. Runs chaotic iteration
3. Attaches results as StringTags

Running the VE analysis
Command-line options:
-cp . : add the current directory to Soot’s CLASSPATH
-pp : add Java’s CLASSPATH to Soot’s CLASSPATH
-f jimple : output jimple code
-p jb use-original-names : keep local variables names as they are
-keep-line-number : write source code line numbers in the resulting jimple code
-print-tags : write out tags for each jimple statement (analysis results)
TestClass : the class to analyze
[Figure annotations: enable assertions; which directory to run in]

Debug printout

Analysis results inlined into .jimple

Next lecture: abstract interpretation III