Flow Analysis Data-flow analysis, Control-flow analysis, Abstract interpretation, AAM.

Slides:



Advertisements
Similar presentations
Continuing Abstract Interpretation We have seen: 1.How to compile abstract syntax trees into control-flow graphs 2.Lattices, as structures that describe.
Advertisements

SSA and CPS CS153: Compilers Greg Morrisett. Monadic Form vs CFGs Consider CFG available exp. analysis: statement gen's kill's x:=v 1 p v 2 x:=v 1 p v.
Data-Flow Analysis II CS 671 March 13, CS 671 – Spring Data-Flow Analysis Gather conservative, approximate information about what a program.
Course Outline Traditional Static Program Analysis –Theory Compiler Optimizations; Control Flow Graphs Data-flow Analysis – today’s class –Classic analyses.
Lecture 11: Code Optimization CS 540 George Mason University.
Course Outline Traditional Static Program Analysis –Theory Compiler Optimizations; Control Flow Graphs Data-flow Analysis – today’s class –Classic analyses.
Control-Flow Graphs & Dataflow Analysis CS153: Compilers Greg Morrisett.
SSA.
Data-Flow Analysis Framework Domain – What kind of solution is the analysis looking for? Ex. Variables have not yet been defined – Algorithm assigns a.
Some Properties of SSA Mooly Sagiv. Outline Why is it called Static Single Assignment form What does it buy us? How much does it cost us? Open questions.
School of EECS, Peking University “Advanced Compiler Techniques” (Fall 2011) Dataflow Analysis Introduction Guo, Yao Part of the slides are adapted from.
(c) 2007 Mauro Pezzè & Michal Young Ch 6, slide 1 Dependence and Data Flow Models.
Foundations of Data-Flow Analysis. Basic Questions Under what circumstances is the iterative algorithm used in the data-flow analysis correct? How precise.
1 Basic abstract interpretation theory. 2 The general idea §a semantics l any definition style, from a denotational definition to a detailed interpreter.
Common Sub-expression Elim Want to compute when an expression is available in a var Domain:
Worklist algorithm Initialize all d i to the empty set Store all nodes onto a worklist while worklist is not empty: –remove node n from worklist –apply.
Program analysis Mooly Sagiv html://
Control Flow Analysis Mooly Sagiv Tel Aviv University Sunday Scrieber 8 Monday Schrieber.
Global optimization. Data flow analysis To generate better code, need to examine definitions and uses of variables beyond basic blocks. With use- definition.
1 Iterative Program Analysis Part I Mooly Sagiv Tel Aviv University Textbook: Principles of Program.
From last time: Lattices A lattice is a tuple (S, v, ?, >, t, u ) such that: –(S, v ) is a poset – 8 a 2 S. ? v a – 8 a 2 S. a v > –Every two elements.
Data Flow Analysis Compiler Design Nov. 3, 2005.
From last time: reaching definitions For each use of a variable, determine what assignments could have set the value being read from the variable Information.
1 Control Flow Analysis Mooly Sagiv Tel Aviv University Textbook Chapter 3
1 Iterative Program Analysis Mooly Sagiv Tel Aviv University Textbook: Principles of Program Analysis.
Range Analysis. Intraprocedural Points-to Analysis Want to compute may-points-to information Lattice:
Data Flow Analysis Compiler Design October 5, 2004 These slides live on the Web. I obtained them from Jeff Foster and he said that he obtained.
Abstract Interpretation Part I Mooly Sagiv Textbook: Chapter 4.
1 Program Analysis Mooly Sagiv Tel Aviv University Textbook: Principles of Program Analysis.
Administrative stuff Office hours: After class on Tuesday.
Data Flow Analysis Compiler Design Nov. 8, 2005.
Course Outline Traditional Static Program Analysis –Theory Compiler Optimizations; Control Flow Graphs, Data-flow Analysis Data-flow Frameworks --- today’s.
San Diego October 4-7, 2006 Over 1,000 women in computing Events for undergraduates considering careers and graduate school Events for graduate students.
Recap: Reaching defns algorithm From last time: reaching defns worklist algo We want to avoid using structure of the domain outside of the flow functions.
U NIVERSITY OF M ASSACHUSETTS, A MHERST D EPARTMENT OF C OMPUTER S CIENCE Advanced Compilers CMPSCI 710 Spring 2003 Data flow analysis Emery Berger University.
From last time: reaching definitions For each use of a variable, determine what assignments could have set the value being read from the variable Information.
Global optimization. Data flow analysis To generate better code, need to examine definitions and uses of variables beyond basic blocks. With use- definition.
Direction of analysis Although constraints are not directional, flow functions are All flow functions we have seen so far are in the forward direction.
Program Analysis Mooly Sagiv Tel Aviv University Sunday Scrieber 8 Monday Schrieber.
1 Program Analysis Mooly Sagiv Tel Aviv University Textbook: Principles of Program Analysis.
Pointer analysis. Pointer Analysis Outline: –What is pointer analysis –Intraprocedural pointer analysis –Interprocedural pointer analysis Andersen and.
Data-Flow Analysis. Approaches Static Analysis Inspections Dependence analysis Symbolic execution Software Verification Data flow analysis Concurrency.
Data Flow Analysis. 2 Source code parsed to produce AST AST transformed to CFG Data flow analysis operates on control flow graph (and other intermediate.
MIT Foundations of Dataflow Analysis Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology.
1 Iterative Program Analysis Abstract Interpretation Mooly Sagiv Tel Aviv University Textbook:
Program Analysis and Verification Spring 2015 Program Analysis and Verification Lecture 12: Abstract Interpretation IV Roman Manevich Ben-Gurion University.
1 Iterative Program Analysis Abstract Interpretation Mooly Sagiv Tel Aviv University Textbook:
CS 598 Scripting Languages Design and Implementation 9. Constant propagation and Type Inference.
Data Flow Analysis II AModel Checking and Abstract Interpretation Feb. 2, 2011.
Chaotic Iterations Mooly Sagiv Tel Aviv University Textbook: Principles of Program Analysis.
Optimization Simone Campanoni
Program Analysis and Verification Spring 2015 Program Analysis and Verification Lecture 8: Static Analysis II Roman Manevich Ben-Gurion University.
Lub and glb Given a poset (S, · ), and two elements a 2 S and b 2 S, then the: –least upper bound (lub) is an element c such that a · c, b · c, and 8 d.
DFA foundations Simone Campanoni
Data Flow Analysis Suman Jana
Spring 2016 Program Analysis and Verification
Textbook: Principles of Program Analysis
Dataflow analysis.
Simone Campanoni DFA foundations Simone Campanoni
Iterative Program Analysis Abstract Interpretation
Topic 10: Dataflow Analysis
Global optimizations.
University Of Virginia
Another example: constant prop
Dataflow analysis.
Data Flow Analysis Compiler Design
Lecture 20: Dataflow Analysis Frameworks 11 Mar 02
Pointer analysis.
Static Single Assignment
Live variables and copy propagation
Presentation transcript:

Flow Analysis Data-flow analysis, Control-flow analysis, Abstract interpretation, AAM

Helpful Reading: Sections 1.1-1.5, 2.1

Data-flow analysis (DFA) A framework for statically proving facts about program data. Focuses on simple, finite facts about programs. Necessarily over- or under-approximate (may or must). Conservatively considers all possible behaviors. Requires control-flow information; i.e., a control-flow graph. If imprecise, DFA may consider infeasible code paths! Examples: reaching defs, available expressions, liveness,…

Control-flow graphs (CFGs) fact(n) entry define fact(n : int) { s := 1; while (n > 1) s := s*n; n := n-1; } return s; s := 1; n > 1 s := s*n; return s; n := n-1; fact(n) exit

Control-flow graphs (CFGs) Intraprocedural CFGs may have a single entry/exit. Nodes can be a full basic block or a single statement. Each block or statement has 1+ predecessor and 1+ successor (stmt or block). A fork point is where paths diverge and a join point is where paths come together. fact(n) entry s := 1; n > 1 s := s*n; return s; n := n-1; fact(n) exit

Data-flow analysis Computed by propagating facts forward or backward. Computes may or must information. Reaching definitions/assignments (def-use info): which assignments may reach each variable reference (use). Liveness: which variables are still needed at each point. Available expressions: which expressions are already stored. Very busy expressions: which expressions are computed down all possible paths forward.

Reaching defs, worklist algorithm Reaching definitions is a forward may analysis. gen(s) yields facts generated by a statement s. kill(s) yields facts invalidated by a statement s. pred(s) and succ(s) yield sets of statements preceding/succ. entry(s) = ∪s’ ∈ pred(s)exit(s’); exit(s) = (entry(s) \ kill(s)) ∪ gen(s) gen, kill, pred, succ, are fixed for each s; exit is monotonic in entry (if entry(s) grows, exit(s) grows); entry for exits of preds. We iterate rules for entry/exit until reaching a fixed point.

Reaching defs, worklist algorithm Main idea: worklist-based fixed-point algorithm. All entry(s) and exit(s) are initialized to be empty. Add all statements to the worklist; all must be considered. Until the worklist is empty, remove an s from worklist: Compute entry(s) as union of all exit(s’), of predecessor s’ Compute exit(s) from gen(s), kill(s), and entry(s) If exit(s) was increased, add all succ(s) to the worklist (This version is for forward may kill/gen analyses.)

Reaching defs, worklist algorithm exit(s) = ∅ W = all statements s //worklist while W not empty: remove s from W entry(s) = ∪s’∈pred(s)exit(s’) update = (entry(s) \ kill(s)) ∪ gen(s) if update != exit(s): exit(s) = update W = W ∪ succ(s) Forward may analysis

Reaching definitions analysis fact(n)0 entry Stmt GEN KILL s := 1;1 (s,1) (s,*) n > 12 s := s*n;3 (s,3) (s,3) n := n-1;4 (n,4) (n,4) return s;5 s := 1;1 n > 12 s := s*n;3 return s;5 n := n-1;4 fact(n) exit

Reaching definitions analysis fact(n)0 entry (n,0) s := 1;1 (n,0) (s,1) (n,4) (s,3) n > 12 (n,0) (s,1) (n,4) (s,3) s := s*n;3 return s;5 (n,0) (s,3) (n,4) n := n-1;4 fact(n) exit (n,4) (s,3)

Reaching definitions analysis all(x) = {(x,ℓ) | ∀ℓ} fact(n)0 entry RDentry(1) = RDexit(0) RDentry(2) = RDexit(1) ∪ RDexit(4) RDentry(3) = RDexit(2) RDentry(4) = RDexit(3) RDentry(5) = RDexit(2) s := 1;1 n > 12 RDexit(0) = {(n,0)} RDexit(1) = RDentry(1)\all(s) ∪ {(s,1)} RDexit(2) = RDentry(2) RDexit(3) = RDentry(3)\all(s) ∪ {(s,3)} RDexit(4) = RDentry(4)\all(n) ∪ {(n,4)} RDexit(5) = RDentry(5) s := s*n;3 return s;5 n := n-1;4 fact(n) exit

Lattices Facts range over lattices: partial orders with joins (least upper bounds) and meets (greatest lower bounds). {(s,1),(s,3),(n,4)} ⊤ {(s,1),(s,3)} {(s,1),(n,4)} {(s,3),(n,4)} {(s,1)} {(s,3)} {(n,4)} ∅ ⟂

Lattices Facts range over lattices: partial orders with joins (least upper bounds) and meets (greatest lower bounds). A partial order is a set X and an ordering (X, ⊑) that is: Reflexive; ∀x. x ⊑ x Transitive; ∀x,y,z. x ⊑ y ∧ y ⊑ z ⇒ x ⊑ z Anti-symmetric; ∀x,y. x ⊑ y ∧ y ⊑ x ⇒ x = y Lattices must also have unique joins and meets for any 2 points. Complete lattices have unique joins/meets for any set. Cartesian product of lattices is a lattice. Map with a lattice co- domain is a lattice.

Reaching definitions analysis The 11 sets RDexit(0), RDentry(1), RDexit(1),…RDexit(5), are defined in terms of one another. Written as a vector of sets: Our set of equations can be turned into a monotonic F: So that a satisfying vector of reachable defs is a fixed point: For example, the join point would end up encoded as: RD RD0 ⊑ RD1 ⇒ F(RD0) ⊑ F(RD1) RD = F(RD) = Fn(⟂) for some n F(…,RDexit(1),…,RDexit(4),…) = (…,RDexit(1) ∪ RDexit(4),…)

Very busy expressions analysis Computes a set of expressions that are computed down all paths forward before any subexpressions change value. Assignments represent GEN for the right hand side and KILL for expressions containing the right hand side (assigned var). Is a backward must data-flow analysis: Propagates a set of computed expressions backward. Computes the meet (GLB, intersection) of entry(s’) in s’∈succ(s) at each fork point to obtain exit(s).

Very busy expressions analysis fact(n)0 entry Stmt GEN KILL s := 1;1 1 s*n n > 12 s := s*n;3 s*n n := n-1;4 n-1 s*n return s;5 s := 1;1 n > 12 s := s*n;3 return s;5 n := n-1;4 fact(n) exit

Very busy expressions analysis fact(n)0 entry 1 s := 1;1 ∅ n > 12 ∅ s*n, n-1 ∅ s := s*n;3 return s;5 n-1 ∅ n := n-1;4 fact(n) exit ∅

May/Must & Forward/Backward Forward, computes exit(s) from entry(s) Join (∪) at CFG join points e.g., Reaching Defs (use-def) (which assignments reach uses) Meet (∩) at CFG join points e.g., Available Expressions Backward Backward, computes entry(s) from exit(s) Join (∪) at CFG fork points e.g., Live Variables Meet (∩) at CFG fork points e.g., Very Busy Expressions

Forward may analysis exit(s) = ⟂ W = all statements s //worklist while W not empty: remove s from W entry(s) = ∪s’∈pred(s)exit(s’) update = (entry(s) \ kill(s)) ∪ gen(s) if update != exit(s): exit(s) = update W = W ∪ succ(s) Forward may analysis

Forward must analysis exit(s) = ⊤ // except ⟂ at function entry W = all statements s //worklist while W not empty: remove s from W entry(s) = ∩s’∈pred(s)exit(s’) update = (entry(s) \ kill(s)) ∪ gen(s) if update != exit(s): exit(s) = update W = W ∪ succ(s) Forward must analysis

Backward may analysis entry(s) = ⟂ W = all statements s //worklist while W not empty: remove s from W exit(s) = ∪s’∈succ(s)entry(s’) update = (exit(s) \ kill(s)) ∪ gen(s) if update != entry(s): entry(s) = update W = W ∪ succ(s) Backward may analysis

Backward must analysis entry(s) = ⊤ // except ⟂ at function exit W = all statements s //worklist while W not empty: remove s from W exit(s) = ∩s’∈succ(s)entry(s’) update = (exit(s) \ kill(s)) ∪ gen(s) if update != entry(s): entry(s) = update W = W ∪ succ(s) Backward must analysis

Abstract interpretation A general methodology for justifying or calculating sound analyses, given a precise semantics for the target language. Abstract interpretation establishes abstract semantic domains and a Galois connection between concrete and abstract. A function alpha (α: X → X) defines a notion of abstraction. A function gamma (γ: X → X) defines a corresponding notion of concretization (α implies γ and vice versa; more on this…). A concrete interpreter (F: X → X) and Galois connection can be used to justify or calculate an abstract interpretation: ^ ^ ^ α∘ F ∘ γ ⊑ F

Abstraction/Concretization (Galois) ^ α( x ) ⊑ x if and only if x ⊑ γ( x ) ^

Abstraction/Concretization (Galois conn.) γ γ( ) ^ x ^ x ⊑ ⊑ x α( x ) α ^ X X

Abstraction/Concretization (Galois conn.) γ Int {…,-1,0,1,…} γ ⊑ ⊑ {1,2,3,…} Pos-Int ⊑ ⊑ {1} {2} {3} … α( 2 ) = Pos-Int α Values Simple Types

Constant Propagation Forward must style of DFA. Or as an abstract interpretation: Uses a flat lattice of constants with top and bottom (C, ⊑): Facts become sets of pairs (Var x C) or a map (Env: Var → C). ⊤ … … 1 2 … “a” … #f void … … ⟂

Flow analysis Intraprocedural analysis: considers functions independently. Interprocedural analysis: considers multiple functions together. Whole-program analysis: considers an entire program at once. DFA is great for simple, local-variable-focused analyses. Analysis of heap-allocated data is much harder. The simple case is called pointer analysis (aliases, nullable). The general case is called shape analysis (full data-structures).

What about Scheme or ANF/CPS IRs?

Depends on call sites for the lambda that binds it. (lambda (k f x y) (let ([a (prim + x y)]) (f k a))) What value can f be? Data-flow depends upon control-flow.

Where does control propagate Depends on the possible values of parameter f. (lambda (k f x y) (let ([a (prim + x y)]) (f k a))) Where does control propagate from this call-site? Control-flow depends upon data-flow.

The higher-order control-flow problem: Data-flow and control-flow properties are thoroughly entangled and mutually dependent.

Control-flow analysis: The solution? Control-flow analysis: Simultaneously model control-flow behavior and data-flow behavior in a single analysis.

Control-flow analysis Use abstract interpretation to produce a single uniform analysis of all interdependent program properties. CFAs are whole-program interprocedural analyses. (Shivers 1991) k-CFA tracks k latest call-sites; bindings have context. (EXPTIME) 0-CFA produces a single model of each function. (O(n3)) Abstracting abstract machines (AAM): a unified methodology for deriving CFAs from concrete (precise) abstract machines! (Van Horn and Might 2010)

CPS lambda calculus e∈Exp ::= (ae ae …) ae∈AExp ::= x | lam lam∈Lam ::= (lambda (x …) e) x∈Var is a set of variables

CPS lambda calculus (semantics) DOMAINS ATOMIC EVAL State: Exp x Env Env: Var → Clo Clo: Lam x Env A(lam,env) = (lam,env) A(x,env) = env(x) SMALL-STEP TRANSITION ((ae0 … aej), env) → (e0, env’), where ((lambda (x1 … xj) e0),envc) = A(ae0) cloi = A(aei) env’ = envc[xi→cloi]

Collecting semantics (generates a trace) Tarski. (1955)

Might. Abstract interpreters for free. (2010) Exp x Env Env: Var → Clo Might. Abstract interpreters for free. (2010)

CPS store-passing semantics DOMAINS State: Exp x Env X Store Env: Var → Addr Store: Addr → Clo Clo: Lam x Env Addr: some infinite set ATOMIC EVAL A(lam,env,st) = (lam,env) A(x,env,st) = st(env(x))

CPS store-passing semantics SMALL-STEP TRANSITION ((ae0 … aej), env, st) → (e0, env’, st’), where ((lambda (x1 … xj) e0),envc) = A(ae0) cloi = A(aei) env’ = envc[xi→alloc(xi)] st’ = st[alloc(xi)→cloi] alloc(xi) = fresh address = xi (yields 0-CFA)

Now we may finitize the set Addr State: Exp x Env X Store Env: Var → Addr Store: Addr → Clo Clo: Lam x Env

Now we may finitize the set Addr State: Exp x Env X Store Env: Var → Addr Store: Addr → Clo: Lam x Env ℘( ) Clo

Cousot and Cousot (1977), Cousot (2000) … -3 … -2 -1 1 2 … 3 … abstraction Negative Zero Int Positive ⊑ Int Cousot and Cousot (1977), Cousot (2000)

abstraction … … ⊑

Abstract abstract machines [ax → {Int}, ay → {Int}, af → {(λ (w) e1), (λ (z) e2)}] [x → ax, y → ay, f → af] ( ) , , (f x)

Abstract-state transition [ax → {Int}, ay → {Int}, af → {(λ (w) e1), (λ (z) e2)}] [x → ax, y → ay, f → af] ( ) , , (f x)

e

Soundness concrete transition abstraction abstraction abstract transition

Exponential complexity { values (…, store) { ✔︎ ✔︎ addresses ✔︎ ✔︎ ✔︎

Global store widening (flow-sensitive) ✔︎ ✔︎ ✔︎ ✔︎ … ✔︎ … … … ✔︎ ✔︎ ✔︎ ✔︎ … … ✔︎ Control-flow graph per-point stores

Global store widening (flow-insensitive) … … ✔︎ ✔︎ ✔︎ ✔︎ … … , ✔︎ Control-flow graph Global store

Flow-insensitive (CPS) 0-CFA { ( ) lambdas { call call ✔︎ ✔︎ variables ✔︎ ✔︎ , call call ✔︎ Variables map to sets of lambdas Control-flow graph (of call-sites)

Let’s live-code this 0-CFA