Dataflow Analysis: Dataflow Frameworks

Outline of Today’s Class
- Catch up
- Dataflow frameworks
- Lattices
- Transfer functions
- Worklist algorithm
- Reading: Dragon Book, Chapters 9.2 and 9.3
Spring 19 CSCI 4450/6450, A Milanova

Dataflow Analysis
- Control-flow graph (CFG): G = (N, E, 1). Nodes are basic blocks; node 1 is the entry node.
- Data
- Dataflow equations: out(j) = (in(j) – kill(j)) U gen(j), where gen and kill are parameters
- Merge operator V: in(j) = V { out(i) | i is a predecessor of j }
[Figure: example CFG with nodes 1 through 10 and labeled entry and exit nodes]
How do we write a specific dataflow analysis? First, we define what kind of data it collects: available expressions, reaching definitions, or something else (pretty much anything!). Second, we define the dataflow equations. Each basic block j (i.e., each node in the CFG) has an equation out(j) = (in(j) – kill(j)) U gen(j), which reflects the effect of the block on the data being collected: each statement generates some new data and “kills” some incoming data. gen and kill are parameters to the framework. Third, we choose the merge operator V: how do we combine data coming from different paths at merge nodes? A second equation, in(j) = V { out(i) | i is a predecessor of j }, collects the incoming data at node j by merging the outgoing data of all of j’s predecessors.
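The two equations on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names, the node numbered 3, and its gen and kill sets are hypothetical, and set union is used as the merge operator (the may case).

```python
def transfer(in_set, gen, kill):
    """Apply out(j) = (in(j) - kill(j)) U gen(j) to a single node."""
    return (in_set - kill) | gen


def merge_may(pred_outs):
    """Merge for a may problem: union of the predecessors' out sets."""
    result = set()
    for s in pred_outs:
        result |= s
    return result


# Hypothetical node 3, "x = ...", whose two predecessors deliver the
# definitions (x,1) and (y,2). Node 3 kills (x,1) and generates (x,3).
in_3 = merge_may([{("x", 1)}, {("y", 2)}])
out_3 = transfer(in_3, gen={("x", 3)}, kill={("x", 1)})
print(sorted(out_3))  # [('x', 3), ('y', 2)]
```

Keeping gen and kill as plain parameters mirrors the slide: the same two functions serve any analysis whose facts can be stored in a set.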

Problem 1: Reaching Definitions
- Forward, may dataflow problem
- What are the primitive dataflow facts? Definitions, e.g., (x,1), (y,6)
- Equations act on sets of definitions
[Figure: node j with its incoming set in(j)]
Usually, when we define dataflow equations, we skip the “out” sets and define the equation for in(j) directly in terms of the in sets of j’s predecessors.

Problem 2. Live Uses of Variables (Live)
- We say that a variable x is “live on exit from node j” if there is a live use of x on exit from j (recall the definition of “live use of x on exit from j”).
- Problem statement: for each node n, compute the set of variables that may be live on exit from n.
  1. x=2;
  2. y=4;
  3. x=1;
  4. if (y>x) then
  5.   z=y;
     else
  6.   z=y*y;
  7. x=z;
- What variables are live on exit from statement 3? Statement 1?
In other words, x is live at the exit of node i if there is a path from i to some use of x at a node n, and that path is free of definitions of x. In the program above, x is not live at the exit of statement 1, so the first assignment is redundant. Both x and y are live at the exit of statement 3.

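Live Example
[Figure: CFG of the example program. 1. x=2 flows to 2. y=4, then 3. x=1, then the test 4. (y>x); the T branch of 4 goes to 5. z=y and the F branch goes to 6. z=y*y; both 5 and 6 flow to 7. x=z]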

Live Uses of Variables (Live)
- Data
  - Primitive facts: variables, e.g., x
  - Propagates sets of variables, e.g., {x,y,z}
- Dataflow equations. At j: x = y+z
  - killLV(j): {x}
  - genLV(j): {y,z}
- Merge operator: set union

Live Uses of Variables (Live)
- Problem statement: for each node n, compute the set of variables that may be live on exit from n.
- For j: x = y+z
  inLV(j) = (outLV(j) – killLV(j)) U genLV(j)
  outLV(j) = U { inLV(i) | i is a successor of j }
- Q: What are the primitive dataflow facts?
- Q: What is genLV(j)?
- Q: What is killLV(j)?
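As a rough check of the Live equations, the sketch below iterates them to a fixpoint on the seven-statement example program (x=2; y=4; x=1; if (y>x) ...). The CFG encoding, the dictionary names, and the simple round-robin iteration are assumptions made for illustration; gen is taken to be the variables used at a node and kill the variable defined there.

```python
def live_variables(succ, use, defs):
    """Iterate inLV/outLV until nothing changes (backward, may problem)."""
    nodes = list(succ)
    live_in = {n: set() for n in nodes}
    live_out = {n: set() for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in reversed(nodes):
            # outLV(n) = union of inLV(s) over all successors s of n
            out_n = set()
            for s in succ[n]:
                out_n |= live_in[s]
            # inLV(n) = (outLV(n) - killLV(n)) U genLV(n)
            in_n = (out_n - defs[n]) | use[n]
            if out_n != live_out[n] or in_n != live_in[n]:
                live_out[n], live_in[n] = out_n, in_n
                changed = True
    return live_in, live_out


# Assumed CFG: 1 -> 2 -> 3 -> 4, 4 -> 5 and 4 -> 6, 5 -> 7, 6 -> 7.
succ = {1: [2], 2: [3], 3: [4], 4: [5, 6], 5: [7], 6: [7], 7: []}
use = {1: set(), 2: set(), 3: set(), 4: {"x", "y"},
       5: {"y"}, 6: {"y"}, 7: {"z"}}
defs = {1: {"x"}, 2: {"y"}, 3: {"x"}, 4: set(),
        5: {"z"}, 6: {"z"}, 7: {"x"}}

live_in, live_out = live_variables(succ, use, defs)
print(sorted(live_out[3]))  # ['x', 'y']
print(sorted(live_out[1]))  # []
```

The result agrees with the discussion of this program: x is not live on exit from statement 1, while both x and y are live on exit from statement 3.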

Problem 2: Live Uses of Variables
- Backward, may dataflow problem
- What are the primitive dataflow facts? Variables, e.g., x, y, z
- Equations act on sets of variables
[Figure: node j with out(j), above its successors i1, i2, i3, labeled out(i1), out(i2), out(i3)]

Problem 3: Available Expressions (Avail)
An expression x op y is available at program point n if every path from entry to n evaluates x op y, and after every evaluation prior to reaching n, there are NO subsequent assignments to x or y.
[Figure: paths from entry node 1 to n; each path evaluates x op y and has no later assignment x = … or y = … before reaching n]

Avail Enables Global Common Subexpressions
[Figure: CFG containing the statements q=a*b, z=a*b, r=2*z, u=a*b, z=u/2, and w=a*b]
Callout in the figure: cannot be eliminated because a*b is not available on all paths.

Avail Enables Global Common Subexpressions
Can we eliminate w=a*b?
[Figure: transformed CFG containing the statements t1=a*b, z=t1, r=2*z, t1=a*b, q=t1, u=t1, z=u/2, and w=a*b]

Available Expressions (Avail)
- Data?
  - Primitive dataflow facts are expressions, e.g., x+y, a*b, a+2
  - Analysis propagates sets of expressions, e.g., {x+y,a*b}
- Dataflow equations at j: x = y op z?
  - outAE(j) = (inAE(j) – killAE(j)) U genAE(j)
  - killAE(j): all expressions with operand x: (x op _), (_ op x)
  - genAE(j): new expression: {(y op z)}

Available Expressions (Avail)
- Merge operator? For Avail, it is set intersection
- inAE(j) = ∩ { outAE(i) | i is a predecessor of j }

Example
  1. y=a+b
  2. x=a*b
  3. if y<=a*b
  4. a=a+1
  5. x=a*b
  6. goto 3
  7. …
- What data is propagated? Sets of expressions, e.g., {a+b, a*b}.
- Forward or backward? Forward.
- Equations: out(i) = (in(i) – kill(i)) U gen(i).
- What is gen(i)? All expressions computed at i whose operands are not defined at i. E.g., a*b is generated at 5, but a+1 is not generated at 4 because a is defined at 4.
- What is kill(i)? All expressions that have an operand defined at i. E.g., a+b and a*b are killed at 4.
- Is it a must or a may problem? A MUST problem: for an expression to be available at node n, it must be available on all paths to n. Thus in(i) is the intersection of out(j) over all predecessors j of i.
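The answers above can be tried out on this program with a small fixpoint iteration. The sketch below is illustrative only: it assumes that the test at 3 branches to 4 when true and to 7 when false, that D = {a+b, a*b, a+1}, and that every non-entry node starts from the full set D, as is usual for a must problem.

```python
D = frozenset({"a+b", "a*b", "a+1"})

# Assumed CFG: 1 -> 2 -> 3, 3 -> 4 and 3 -> 7, 4 -> 5 -> 6 -> 3.
pred = {1: [], 2: [1], 3: [2, 6], 4: [3], 5: [4], 6: [5], 7: [3]}
gen = {1: {"a+b"}, 2: {"a*b"}, 3: {"a*b"}, 4: set(),
       5: {"a*b"}, 6: set(), 7: set()}
kill = {n: set() for n in pred}
kill[4] = set(D)  # a=a+1 kills every expression that has a as an operand


def available_expressions(pred, gen, kill, universe, entry=1):
    """Iterate the Avail equations until nothing changes."""
    avail_in = {n: set() if n == entry else set(universe) for n in pred}
    avail_out = {n: (avail_in[n] - kill[n]) | gen[n] for n in pred}
    changed = True
    while changed:
        changed = False
        for n in pred:
            if n == entry:
                new_in = set()
            else:
                # in(n) = intersection of out(p) over all predecessors p
                new_in = set(universe)
                for p in pred[n]:
                    new_in &= avail_out[p]
            new_out = (new_in - kill[n]) | gen[n]
            if new_in != avail_in[n] or new_out != avail_out[n]:
                avail_in[n], avail_out[n] = new_in, new_out
                changed = True
    return avail_in, avail_out


avail_in, avail_out = available_expressions(pred, gen, kill, D)
print(sorted(avail_in[3]))  # ['a*b']
```

Under these assumptions a*b is available on entry to the test at 3 along both incoming paths, while a+b is not, because a=a+1 at 4 kills it inside the loop.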

Problem 3: Available Expressions
- Is it a forward or a backward problem? Is it a may problem or a must problem?
- Forward, must dataflow problem
- What are the primitive dataflow facts? Expressions, e.g., x+y, a*b
- Equations act on sets of expressions
[Figure: predecessors i1, i2, i3, labeled in(i1), in(i2), in(i3), flowing into node j: x=y+z with in(j)]

Problem 4: Very Busy Expressions (VeryB)
An expression x op y is very busy at node n if, along EVERY path from n to the end of the program, we come to a computation of x op y BEFORE any redefinition of x or y.
[Figure: paths leaving n; each path reaches t1=X op Y with no earlier assignment X = … or Y = …]

Very Busy Expressions (VeryB)
- Data?
  - Primitive dataflow facts are expressions, e.g., x+y, a*b
  - Analysis propagates sets of expressions, e.g., {x+y,a*b}
- Dataflow equations at j: x = y op z?
  - inVB(j) = (outVB(j) – killVB(j)) U genVB(j)
  - killVB(j): all expressions with operand x: (x op _), (_ op x)
  - genVB(j): new expression: {(y op z)}

Very Busy Expressions (VeryB)
- Merge operator? For VeryB, it is set intersection
- outVB(j) = ∩ { inVB(i) | i is a successor of j }

Very Busy Expressions
- Backward, must dataflow problem
[Figure: node j with outVB(j), above its successors i1 and i2, with labels outVB(i1), outVB(i2), outVB(i3)]

Another Example: Taint Analysis
- A definition (x,k) is tainted if k is designated as a taint source, or (x,k) is computed based on an operand that is tainted.
- Problem statement: for each node n, compute the set of tainted definitions that may reach n.

Example: Taint Analysis (explicit flow)
  1. x=read()
  2. y=1
  3. x>=2
  4. y=x*y
  5. x=x-1
  6. goto 3
  7. z=y-1
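One possible way to compute tainted definitions for this program is sketched below. The transfer function is an assumption, since the slides give only the definition of a tainted definition: a node defining v removes the earlier tainted definitions of v and adds (v,k) when k is a source or when some operand has a tainted definition reaching k. The sketch also assumes that read() at statement 1 is the only taint source and that the test at 3 branches to 4 when true and to 7 when false.

```python
def taint_analysis(stmts, succ, sources):
    """stmts maps node -> (defined variable or None, set of operand variables)."""
    pred = {n: [] for n in succ}
    for n, targets in succ.items():
        for t in targets:
            pred[t].append(n)
    t_in = {n: set() for n in succ}
    t_out = {n: set() for n in succ}
    worklist = list(succ)
    while worklist:
        n = worklist.pop()
        # May problem: union of the tainted definitions leaving predecessors.
        t_in[n] = set()
        for p in pred[n]:
            t_in[n] |= t_out[p]
        var, operands = stmts[n]
        out = set(t_in[n])
        if var is not None:
            out = {(v, k) for (v, k) in out if v != var}
            tainted_vars = {v for (v, _) in t_in[n]}
            if n in sources or operands & tainted_vars:
                out.add((var, n))
        if out != t_out[n]:
            t_out[n] = out
            worklist.extend(succ[n])
    return t_in, t_out


stmts = {1: ("x", set()), 2: ("y", set()), 3: (None, {"x"}),
         4: ("y", {"x", "y"}), 5: ("x", {"x"}), 6: (None, set()),
         7: ("z", {"y"})}
succ = {1: [2], 2: [3], 3: [4, 7], 4: [5], 5: [6], 6: [3], 7: []}

t_in, t_out = taint_analysis(stmts, succ, sources={1})
print(sorted(t_in[7]))  # [('x', 1), ('x', 5), ('y', 4)]
```

Unlike the four classical problems, the generated fact here depends on the incoming set, so the transfer function is not a fixed gen/kill pair.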

Outline of Today’s Class
- Catch up
- Dataflow frameworks
- Lattice
- Transfer functions
- Worklist algorithm
- Reading: Dragon Book, Chapters 9.2 and 9.3

Dataflow Problems
                     May Problems              Must Problems
Forward Problems     Reaching Definitions      Available Expressions
Backward Problems    Live Uses of Variables    Very Busy Expressions

Similarities
- Analyses operate over similar property spaces
- In all cases, the analysis operates over a finite set D of primitive dataflow facts
  - Reach: D is the set of all definitions in the program, e.g., {(x,1),(y,2),(x,4),(y,5)}
  - Avail and VeryB: D is the set of all arithmetic expressions, e.g., {a+b,a*b,a+1}
  - Live: D is the set of all variables, e.g., {x,y,z}
- The solution at node n is a subset of D (e.g., a definition either reaches n or it does not reach n)

Similarities
- Dataflow equations have the same form (from now on, we’ll focus on forward problems):
  out(j) = (in(j) – kill(j)) U gen(j) = (in(j) ∩ pres(j)) U gen(j)
  in(j) = V { out(i) | i is a predecessor of j }
  where pres(j) is the complement of kill(j)
- A note: what makes the 4 classical problems special is that the sets kill(j)/pres(j) and gen(j) do not depend on in(j)
- Thus, with sets represented as bit vectors, set union and set intersection can be implemented as logical OR and AND respectively
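The remark about logical OR and AND can be illustrated with Python integers used as bit vectors. The universe of facts and all names below are hypothetical; the point is only that (in ∩ pres) U gen becomes (in & pres) | gen.

```python
FACTS = ["a+b", "a*b", "a+1"]  # hypothetical universe D, one bit per fact
BIT = {fact: 1 << i for i, fact in enumerate(FACTS)}
ALL = (1 << len(FACTS)) - 1


def encode(facts):
    """Turn a set of facts into a bit vector."""
    vec = 0
    for fact in facts:
        vec |= BIT[fact]
    return vec


def decode(vec):
    """Turn a bit vector back into a set of facts."""
    return {fact for fact in FACTS if vec & BIT[fact]}


def transfer(in_vec, pres_vec, gen_vec):
    """out(j) = (in(j) AND pres(j)) OR gen(j)."""
    return (in_vec & pres_vec) | gen_vec


pres = ALL & ~encode({"a+1"})  # pres is the complement of kill = {a+1}
out = transfer(encode({"a+b", "a+1"}), pres, encode({"a*b"}))
may_merge = encode({"a+b"}) | encode({"a*b"})          # union
must_merge = encode({"a+b", "a*b"}) & encode({"a*b"})  # intersection
print(sorted(decode(out)))  # ['a*b', 'a+b']
```

Because gen and pres do not depend on in(j), they can be encoded once per node before the iteration starts.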

Similarities
The dataflow equation at node j is a transfer function. It takes in(j) as argument and produces out(j) as result: out(j) = fj(in(j))

Dataflow Frameworks
- We generalize and study the properties of the property space
  - Property space is a lattice
  - Choice settles merge operator
- We generalize and study the properties of the transfer function space
  - Functions are monotone or distributive
- We generalize and study the properties of the worklist algorithm that computes a solution

Lattice Theory
- Partial ordering (denoted by ≤ or ⊑)
  - Relation between pairs of elements
  - Reflexive: a ≤ a
  - Anti-symmetric: a ≤ b and b ≤ a ==> a = b
  - Transitive: a ≤ b and b ≤ c ==> a ≤ c
- Partially ordered set (poset): (set S, ≤)
- 0 element: 0 ≤ a, for every a in S
- 1 element: a ≤ 1, for every a in S
We don’t necessarily need the 0 and 1 elements.

Poset Example
- D = {a,b,c}
- The poset is 2^D, and ≤ is set inclusion
[Figure: Hasse diagram of the subsets of D, with {a,b,c} at the top, then the two-element subsets, then {a}, {b}, {c}, and {} at the bottom]
A canonical example of a poset is the poset of subsets. Let D be a finite set. The subsets of D form a poset under the set inclusion relation.
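The three poset axioms can be checked by brute force on this example. The helper names below are illustrative assumptions; the check simply enumerates all pairs and triples of subsets of D = {a,b,c}.

```python
from itertools import combinations, product


def powerset(universe):
    """All subsets of a finite set, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_partial_order(elements, leq):
    """Check reflexivity, anti-symmetry, and transitivity by enumeration."""
    reflexive = all(leq(a, a) for a in elements)
    antisymmetric = all(a == b for a, b in product(elements, repeat=2)
                        if leq(a, b) and leq(b, a))
    transitive = all(leq(a, c) for a, b, c in product(elements, repeat=3)
                     if leq(a, b) and leq(b, c))
    return reflexive and antisymmetric and transitive


subsets = powerset({"a", "b", "c"})
print(len(subsets))                                        # 8
print(is_partial_order(subsets, lambda s, t: s <= t))      # True
print(is_partial_order(subsets, lambda s, t: s < t))       # False
```

The last line fails only on reflexivity: strict inclusion is not a partial order in the sense used here because no set is a strict subset of itself.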

Lattice Theory
- Greatest lower bound (glb): for l1, l2 in poset S, a in S is the glb(l1,l2) iff
  1) a ≤ l1 and a ≤ l2
  2) for any b in S, b ≤ l1 and b ≤ l2 implies b ≤ a
  If the glb exists, it is unique. Why? It is called the meet (denoted by Λ or ⊓) of l1 and l2.
- Least upper bound (lub): for l1, l2 in poset S, c in S is the lub(l1,l2) iff
  1) c ≥ l1 and c ≥ l2
  2) for any d in S, d ≥ l1 and d ≥ l2 implies d ≥ c
  If the lub exists, it is unique. It is called the join (denoted by V or ⊔) of l1 and l2.

Definition of a Lattice (L, Λ, V)
- A lattice L is a poset under ≤, such that every pair of elements has a glb (meet) and a lub (join)
- A lattice need not contain a 0 or 1 element
- A finite lattice must contain 0 and 1 elements
- Not every poset is a lattice
- If there is an element a such that a ≤ x for every x in L, then a is the 0 element of L
- If there is an element a such that x ≤ a for every x in L, then a is the 1 element of L

A Poset but Not a Lattice
[Figure: Hasse diagram of a poset that includes the elements e3 and e4]
There is no lub(e3,e4) in this poset, so it is not a lattice. Suppose we add the lub(e3,e4); is it a lattice?

Is This Poset a Lattice
- D = {a,b,c}
- The poset is 2^D, and ≤ is set inclusion
[Figure: Hasse diagram of the subsets of D, with {a,b,c} at the top, then {a,b}, {b,c}, {a,c}, then {a}, {b}, {c}, and {} at the bottom]
A canonical example of a poset is the poset of subsets. Let D be a finite set. The subsets of D form a poset under the set inclusion relation.

Examples of Lattices
- H = (2^D, ∩, U) where D is a finite set
  - glb(s1,s2), denoted s1 Λ s2, is set intersection s1 ∩ s2
  - lub(s1,s2), denoted s1 V s2, is set union s1 U s2
- J = (N1, gcd, lcm) (N1 denotes the natural numbers starting at 1)
  - Partial order is integer divide on N1
  - lub(n1,n2), denoted n1 V n2, is lcm(n1,n2)
  - glb(n1,n2), denoted n1 Λ n2, is gcd(n1,n2)
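The claim that gcd and lcm are the meet and join under divisibility can be checked against the definitions of glb and lub. The sketch below is restricted, as an assumption for the sake of a finite search, to the divisors of 30; the three-element set {1,2,3} is a hypothetical example of a poset that is not a lattice.

```python
from math import gcd


def glb(elements, leq, x, y):
    """Greatest lower bound of x and y by direct search, or None."""
    lower = [a for a in elements if leq(a, x) and leq(a, y)]
    greatest = [a for a in lower if all(leq(b, a) for b in lower)]
    return greatest[0] if greatest else None


def lub(elements, leq, x, y):
    """Least upper bound of x and y by direct search, or None."""
    upper = [c for c in elements if leq(x, c) and leq(y, c)]
    least = [c for c in upper if all(leq(c, d) for d in upper)]
    return least[0] if least else None


def divides(a, b):
    return b % a == 0


def lcm(a, b):
    return a * b // gcd(a, b)


divisors_of_30 = [1, 2, 3, 5, 6, 10, 15, 30]
print(glb(divisors_of_30, divides, 6, 10), gcd(6, 10))  # 2 2
print(lub(divisors_of_30, divides, 6, 10), lcm(6, 10))  # 30 30

# Under divisibility, 2 and 3 have no upper bound inside {1, 2, 3},
# so this poset is not a lattice.
print(lub([1, 2, 3], divides, 2, 3))  # None
```

The same two functions work for the subset lattice H if `leq` is set inclusion and the elements are the subsets of D.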

Chain
- A poset C where for every pair of elements c1, c2 in C, either c1 ≤ c2 or c2 ≤ c1
  - E.g., {} ≤ {a} ≤ {a,b} ≤ {a,b,c}
  - E.g., from the lattice J as shown here: 1 ≤ 2 ≤ 6 ≤ 30 and 1 ≤ 3 ≤ 15 ≤ 30
- A lattice such that every ascending chain is finite is said to satisfy the Ascending Chain Condition
[Figure: Hasse diagram of the divisors of 30 under divisibility, with 30 at the top, then 6, 15, 10, then 2, 5, 3, and 1 at the bottom]
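For a finite poset, chains and the length of the longest chain can be found by exhaustive search. The sketch below uses the divisors of 30 from the example; the function names are assumptions, and the longest-chain search counts the number of elements on the chain.

```python
from functools import lru_cache


def divides(a, b):
    return b % a == 0


def is_chain(elements, leq):
    """True if every pair of elements is comparable."""
    return all(leq(a, b) or leq(b, a) for a in elements for b in elements)


def longest_chain_length(elements, leq):
    """Number of elements on the longest strictly ascending chain."""
    elems = list(elements)

    @lru_cache(maxsize=None)
    def longest_from(i):
        best = 1
        for j, b in enumerate(elems):
            if elems[i] != b and leq(elems[i], b):
                best = max(best, 1 + longest_from(j))
        return best

    return max(longest_from(i) for i in range(len(elems)))


divisors_of_30 = [1, 2, 3, 5, 6, 10, 15, 30]
print(is_chain([1, 2, 6, 30], divides))               # True
print(is_chain([2, 3], divides))                      # False
print(longest_chain_length(divisors_of_30, divides))  # 4
```

Any finite lattice has only finite chains, so it satisfies the Ascending Chain Condition automatically; the condition matters for infinite lattices.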

Lattices in Dataflow Analysis
- Lattices define the property space
- Lattices entail properties of the standard dataflow analysis solution procedure (the worklist algorithm, which we will study shortly)

Dataflow Lattices: Reach
- D = all definitions: {(x,1),(x,4),(a,3)}
- Poset is 2^D, ≤ is the subset relation
  1. x=a*b
  2. if y<=a*b
  3. a=a+1
  4. x=a*b
  5. goto 3
[Figure: Hasse diagram of 2^D. The 1 element {(x,1),(x,4),(a,3)} is at the top; below it are {(x,1),(x,4)}, {(x,4),(a,3)}, {(x,1),(a,3)}; then {(x,1)}, {(x,4)}, {(a,3)}; and {} is at the bottom]

Dataflow Lattices: Avail
- D = all expressions: {a*b,a+1,y*z}
- Poset is 2^D, ≤ is the superset relation
  1. x:=a*b
  2. if y*z<=a*b
  3. a:=a+1
  4. x:=a*b
  5. goto 2
[Figure: Hasse diagram of 2^D under superset. The 1 element {} is at the top; below it are {a*b}, {a+1}, {y*z}; then {a*b,y*z}, {a*b,a+1}, {a+1,y*z}; and {a*b,a+1,y*z} is at the bottom]

Dataflow Frameworks
- Equations:
  in(j) = V { out(i) | i in pred(j) }
  out(j) = fj(in(j))
  where:
  - in(j), out(j) are elements of a property space
  - fj is the transfer function associated with node j
  - V is the merge operator
To instantiate an analysis in the framework we must instantiate 1) the property space and 2) the transfer functions. There are requirements on the property space, transfer functions, and merge operator!

Dataflow Frameworks (cont.)
- The property space must be:
  1. A lattice (L, ≤)
  2. L satisfies the Ascending Chain Condition, which requires that all ascending chains are finite
- The merge operator V must be the join of L
- In dataflow, L is often the lattice of the subsets over a finite set of dataflow facts D
  - Choose universal set D (e.g., all definitions)
  - Choose ordering operation ≤. Since the merge operator must be the join of L, a may problem entails that ≤ is subset. Conversely, a must problem entails that ≤ is superset
The requirements on the property space and merge operator are listed above. The requirement on the transfer functions is that they must be monotone (we discuss monotonicity a little later).

Example: Reach Lattice
- Property space is the lattice of the subsets of D, where D is the set of all definitions in the program
- ≤ is the subset relation
- Join is set union, as needed for Reach, which is a may problem
- Lattice has 0 being {}, and 1 being D
- Lattice satisfies the Ascending Chain Condition

Reach Lattice
- D = all definitions: {(x,1),(x,4),(a,3)}
- Poset is 2^D, ≤ is the subset relation
  1. x=a*b
  2. if y<=a*b
  3. a=a+1
  4. x=a*b
  5. goto 3
[Figure: Hasse diagram of 2^D. The 1 element {(x,1),(x,4),(a,3)} is at the top; below it are {(x,1),(x,4)}, {(x,4),(a,3)}, {(x,1),(a,3)}; then {(x,1)}, {(x,4)}, {(a,3)}; and {} is at the bottom]

Example: Avail Lattice
- Property space is the lattice of the subsets of D, where D is the set of all expressions in the program
- ≤ is the superset relation
- Join of the lattice is set intersection, as needed for Avail, which is a must problem
- Lattice has 0 being D, and 1 being {}
- Lattice satisfies the Ascending Chain Condition


Transfer Functions
- The transfer functions: fj: L → L. Formally, the function space F is such that
  - F contains all fj
  - F contains the identity function id(x) = x
  - F is closed under composition
  - Each fj is monotone

Monotonicity F: L L is monotone if and only if: (1) a,b in L, f in F then a ≤ b f(a) ≤ f(b) or (equivalently): (2) x,y in L, f in F then f(x) V f(y) ≤ f(x V y) Theorem: Definitions (1) and (2) are equivalent. Show that (1) implies (2) Show that (2) implies (1) Spring 19 CSCI 4450/6450, A Milanova

Distributivity
- F: L → L is distributive if and only if for all x, y in L and f in F: f(x V y) = f(x) V f(y)
- Every distributive function is also monotone, but not the other way around
- Distributivity is a very nice property!

Monotonicity and Distributivity
- Is classical Reach distributive? Yes
- To show distributivity, show that for each j:
  ((in(j) U in’(j)) ∩ pres(j)) U gen(j) = ((in(j) ∩ pres(j)) U gen(j)) U ((in’(j) ∩ pres(j)) U gen(j))
- Derivation:
  ((in(j) U in’(j)) ∩ pres(j)) U gen(j)
  = ((in(j) ∩ pres(j)) U (in’(j) ∩ pres(j))) U gen(j)
  = ((in(j) ∩ pres(j)) U gen(j)) U ((in’(j) ∩ pres(j)) U gen(j))
  The first step distributes ∩ over U; the second uses gen(j) U gen(j) = gen(j).
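The distributivity of a Reach-style transfer function can also be confirmed by enumeration on a small lattice. In the sketch below the universe D, the pres and gen sets, and the threshold function are all hypothetical; the threshold function is included only as an example of a function that is monotone but not distributive.

```python
from itertools import combinations


def powerset(universe):
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_distributive(f, lattice, join):
    """Check f(x V y) == f(x) V f(y) for every pair in the lattice."""
    return all(f(join(x, y)) == join(f(x), f(y))
               for x in lattice for y in lattice)


def is_monotone(f, lattice, leq):
    return all(leq(f(a), f(b)) for a in lattice for b in lattice if leq(a, b))


D = frozenset({"d1", "d2", "d3"})
L = powerset(D)
join = lambda s, t: s | t
leq = lambda s, t: s <= t

pres = frozenset({"d2", "d3"})
gen = frozenset({"d3"})
reach = lambda s: (s & pres) | gen                           # (in ∩ pres) U gen
threshold = lambda s: D if len(s) >= 2 else frozenset()      # hypothetical

print(is_distributive(reach, L, join))                               # True
print(is_monotone(threshold, L, leq), is_distributive(threshold, L, join))
# True False
```

For the threshold function, joining {d1} and {d2} crosses the size limit, so f(x V y) = D while f(x) V f(y) = {}.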

Monotone Dataflow Frameworks
- A problem fits into the dataflow framework if
  - its property space is a lattice (L, ≤) that satisfies the Ascending Chain Condition,
  - its merge operator V is the join of L, and
  - its function space F: L → L is monotone
- Thus, we can make use of a generic solution procedure, known as the worklist algorithm, the maximal fixpoint algorithm, or the fixpoint iteration algorithm

Worklist Algorithm for Forward Dataflow Problems
Generic algorithm, with the Reach instantiation of each step shown in the comment on the right:
/* Initialize to initial values; 1 is entry node of CFG */
in(1) = InitialValue;                  /* inReach(1) = UNDEF (or {}) */
for m = 2 to n do in(m) = 0            /* inReach(m) = {} */
W = {1,2,…,n}                          /* put every node on the worklist */
while W ≠ Ø do {
  remove j from W
  out(j) = fj(in(j))                   /* outReach(j) = (inReach(j) ∩ pres(j)) U gen(j) */
  for i in successors(j)
    if not (out(j) ≤ in(i)) then {     /* if outReach(j) is not a subset of inReach(i) */
      in(i) = out(j) V in(i)           /* inReach(i) = outReach(j) U inReach(i) */
      W = W U { i }
    }
}

Worklist Algorithm for Forward Dataflow Problems (slightly different)
/* Initialize to initial values; 1 is entry node of CFG */
in(1) = InitialValue; out(1) = f1(in(1))
for m := 2 to n do { in(m) = 0; out(m) = fm(0) }
W := {2,…,n} /* put every node but 1 on the worklist */
while W ≠ Ø do {
  remove j from W
  in(j) = V { out(i) | i is predecessor of j }
  out(j) = fj(in(j))
  if out(j) changed then W = W U { k | k is successor of j }
}
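A generic forward worklist solver along the lines of the first version (propagate out(j) into each successor's in set) might look like the following in Python. This is a sketch under assumptions: the parameter names are invented, the lattice is passed in as `leq`, `join`, and `bottom`, and the demonstration instantiates Reach on the five-statement program from the Reach lattice slides with statement 5 assumed to jump back to the test at 2.

```python
def worklist_forward(nodes, succ, transfer, leq, join, bottom, entry, initial):
    """Return (in, out) maps satisfying the forward dataflow equations."""
    in_ = {n: bottom for n in nodes}
    in_[entry] = initial
    out = {}
    work = list(nodes)  # every node starts on the worklist
    while work:
        j = work.pop()
        out[j] = transfer[j](in_[j])
        for i in succ[j]:
            if not leq(out[j], in_[i]):
                in_[i] = join(out[j], in_[i])
                if i not in work:
                    work.append(i)
    return in_, out


def gen_kill(gen, kill):
    """Build a transfer function out = (in - kill) U gen."""
    gen, kill = frozenset(gen), frozenset(kill)
    return lambda s: (s - kill) | gen


# 1. x=a*b  2. if y<=a*b  3. a=a+1  4. x=a*b  5. goto 2 (assumed target)
nodes = [1, 2, 3, 4, 5]
succ = {1: [2], 2: [3], 3: [4], 4: [5], 5: [2]}
transfer = {
    1: gen_kill({("x", 1)}, {("x", 4)}),
    2: gen_kill(set(), set()),
    3: gen_kill({("a", 3)}, set()),
    4: gen_kill({("x", 4)}, {("x", 1)}),
    5: gen_kill(set(), set()),
}

in_reach, out_reach = worklist_forward(
    nodes, succ, transfer,
    leq=lambda s, t: s <= t, join=lambda s, t: s | t,
    bottom=frozenset(), entry=1, initial=frozenset())
print(sorted(in_reach[2]))   # [('a', 3), ('x', 1), ('x', 4)]
print(sorted(out_reach[4]))  # [('a', 3), ('x', 4)]
```

Switching to a must problem only requires different lattice arguments (superset for `leq`, intersection for `join`, and D for `bottom`); the loop itself is unchanged.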

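Termination Argument
- Why does the algorithm terminate?
- Sketch of proof: a node is put back on the worklist only when some out(j) changes, and, by monotonicity, each change moves out(j) strictly up in L. Since out(j) is in L and L satisfies the Ascending Chain Condition, each out(j) changes at most O(h) times, where h is the height of the lattice L. With n nodes there are at most O(n·h) changes in total, so the worklist eventually becomes empty.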

Correctness Argument
- Theorem: The worklist algorithm computes a solution that satisfies the dataflow equations
- Why? Sketch of proof: whenever j is processed, the algorithm sets out(j) = fj(in(j)). Whenever out(j) changes, the algorithm puts the successors of j on the worklist, so that for each successor k, in(k) = V { out(i) | i is a predecessor of k } is re-established. So the final solution satisfies the equations.

Precision Argument
- Theorem: The algorithm computes the least solution of the dataflow equations.
- Historically though, this solution is often called the maximal fixpoint solution (MFP)
- I.e., for every node j, the worklist algorithm computes a solution MFP(j) = {in(j),out(j)} such that every other solution {in’(j),out’(j)} of the dataflow equations satisfies in(j) ≤ in’(j) and out(j) ≤ out’(j). In other words, every other solution is “larger” than the MFP.
[Figure: example program 1. z = x+y; 2. while true; 3. skip]

Example
  1. z:=x+y
  2. if (z > 500)
  3. skip
Equation                                     Solution1    Solution2
inAvail(1) = Ø                               Ø            Ø
outAvail(1) = (inAvail(1) – Ez) U {x+y}      {x+y}        {x+y}
inAvail(2) = outAvail(1) V outAvail(3)       {x+y}        Ø
outAvail(2) = inAvail(2)                     {x+y}        Ø
inAvail(3) = outAvail(2)                     {x+y}        Ø
outAvail(3) = inAvail(3)                     {x+y}        Ø
The equation for inAvail(2) is equivalent to inAvail(2) = {x+y} V inAvail(2); recall that V is ∩ (i.e., set intersection).
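Both columns can be checked against the equations mechanically, and iterating from the 0 element of the Avail lattice (the full set D) shows which one the algorithm finds. The sketch below is an illustration under assumptions: D = {x+y}, node 3 flows back to node 2, and the Ez term is dropped because no expression in D contains z.

```python
D = frozenset({"x+y"})
EMPTY = frozenset()
pred = {1: [], 2: [1, 3], 3: [2]}


def f(n, s):
    """Transfer functions: node 1 computes x+y; nodes 2 and 3 are identity."""
    return s | {"x+y"} if n == 1 else s


def is_solution(in_, out):
    """Check every equation in(n) = ∩ out(preds) and out(n) = f(in(n))."""
    for n in pred:
        expected_in = EMPTY if n == 1 else frozenset.intersection(
            *(out[p] for p in pred[n]))
        if in_[n] != expected_in or out[n] != f(n, in_[n]):
            return False
    return True


def solve():
    """Iterate from in(n) = D (the 0 element under superset) for n != 1."""
    in_ = {n: EMPTY if n == 1 else D for n in pred}
    out = {n: f(n, in_[n]) for n in pred}
    changed = True
    while changed:
        changed = False
        for n in (2, 3):
            new_in = frozenset.intersection(*(out[p] for p in pred[n]))
            new_out = f(n, new_in)
            if (new_in, new_out) != (in_[n], out[n]):
                in_[n], out[n] = new_in, new_out
                changed = True
    return in_, out


solution1 = ({1: EMPTY, 2: D, 3: D}, {1: D, 2: D, 3: D})
solution2 = ({1: EMPTY, 2: EMPTY, 3: EMPTY}, {1: D, 2: EMPTY, 3: EMPTY})
print(is_solution(*solution1), is_solution(*solution2))  # True True
print(solve() == solution1)                              # True
```

Under the superset ordering {x+y} ≤ Ø, so Solution1 is the least solution, which matches the precision theorem.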

Many Applications!
- Static debugging
  - Memory errors in C/C++ programs
    - Memory leaks
    - Null pointer dereferences
    - Array-out-of-bound accesses
  - Concurrency errors in shared-memory apps
    - Data races, atomicity violations, deadlocks
  - Information flow (also known as taint analysis)

Many Applications!
- White-box testing: compute coverage
  - Control-flow-based testing
  - Data-flow-based testing
    - Intuitively, test each def-use chain
- Regression testing
  - Analyze changes and select regression tests that actually test changed code

Dataflow Analysis
- Classical technique
- Compared to Hoare logic, it captures state in a coarser way
- Still relevant: many interesting problems are phrased in dataflow terms

Next Class
- MOP vs MFP solutions
- Two classical non-distributive dataflow analyses: constant propagation and points-to analysis