1
Dataflow Analysis: Dataflow Frameworks
2
Outline of Today’s Class
Catch up
Dataflow frameworks
Lattices
Transfer functions
Worklist algorithm
Reading: Dragon Book, Chapters 9.2 and 9.3
Spring 19 CSCI 4450/6450, A Milanova
3
Dataflow Analysis
Control-flow graph (CFG): G = (N, E, 1), where the nodes are basic blocks and 1 is the entry node
Data: the kind of facts the analysis collects
Dataflow equations: out(j) = (in(j) – kill(j)) ∪ gen(j) (gen and kill are parameters)
Merge operator V: in(j) = V { out(i) | i is a predecessor of j }
[Figure: example CFG with nodes 1 to 10, with the entry and exit nodes marked.]
How do we write a specific dataflow analysis? First, we have to define what kind of data it will collect: available expressions, reaching definitions, or something else (pretty much anything!). Second, we have to define the dataflow equations. There is a dataflow equation out(j) = (in(j) – kill(j)) ∪ gen(j) associated with each basic block (i.e., node in the CFG). This equation reflects the effect of the basic block on the data we are collecting: each statement generates some new data and “kills” some incoming data. gen and kill are parameters to the framework. Third, we choose the merge operator V: how do we combine data coming from different paths at merge nodes? The second equation, in(j) = V out(i), collects the incoming data of node j by merging (appropriately) the outgoing data of all of j’s predecessors.
4
Problem 1: Reaching Definitions
Forward, may dataflow problem
[Figure: node j with its incoming set in(j).]
What are the primitive dataflow facts? Definitions, e.g., (x,1), (y,6). Equations act on sets of definitions.
Usually, when we define dataflow equations, we skip the “out” sets: we define the equation for in(j) in terms of the in sets of j’s predecessors.
5
Problem 2. Live Uses of Variables (Live)
We say that a variable x is “live on exit from node j” if there is a live use of x on exit from j (recall the definition of “live use of x on exit from j”). In other words, x is live on exit from j if there is a path from j to some use of x at a node n, and that path is free of definitions of x.
Problem statement: for each node n, compute the set of variables that may be live on exit from n.
1. x=2;
2. y=4;
3. x=1;
4. if (y>x) then
5.   z=y;
   else
6.   z=y*y;
7. x=z;
What variables are live on exit from statement 3? Statement 1?
Variable x is not live on exit from statement 1, so the first assignment is redundant. Both x and y are live on exit from statement 3.
6
Live Example
[Figure: CFG of the example program.]
1. x=2
2. y=4
3. x=1
4. (y>x), with the T edge to 5 and the F edge to 6
5. z=y
6. z=y*y
7. x=z
7
Live Uses of Variables (Live)
Data
  Primitive facts: variables, e.g., x
  Propagates sets of variables, e.g., {x,y,z}
Dataflow equations. At j: x = y+z
  killLV(j): {x}
  genLV(j): {y,z}
Merge operator: set union
8
Live Uses of Variables (Live)
Problem statement: for each node n, compute the set of variables that may be live on exit from n.
j: x = y+z
inLV(j) = (outLV(j) – killLV(j)) ∪ genLV(j)
outLV(j) = ∪ { inLV(i) | i is a successor of j }
Q: What are the primitive dataflow facts?
Q: What is genLV(j)?
Q: What is killLV(j)?
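The Live equations can be checked on the seven-statement example from the earlier slide. The following is a minimal Python sketch, not the lecture's code: the dictionary encoding of the CFG (`succ`, `gen`, `kill`) and the function name `live_variables` are assumptions made for illustration, and a simple round-robin loop stands in for a worklist.

```python
# CFG of the example; node numbers follow the statement labels on the slide.
succ = {1: [2], 2: [3], 3: [4], 4: [5, 6], 5: [7], 6: [7], 7: []}
# gen = variables used at the node, kill = variable defined at the node.
gen = {1: set(), 2: set(), 3: set(), 4: {"x", "y"}, 5: {"y"}, 6: {"y"}, 7: {"z"}}
kill = {1: {"x"}, 2: {"y"}, 3: {"x"}, 4: set(), 5: {"z"}, 6: {"z"}, 7: {"x"}}


def live_variables(succ, gen, kill):
    """Iterate the backward may equations until nothing changes."""
    live_in = {n: set() for n in succ}
    live_out = {n: set() for n in succ}
    changed = True
    while changed:
        changed = False
        # Visiting nodes in reverse order converges faster for a backward problem.
        for n in sorted(succ, reverse=True):
            out_n = set().union(*(live_in[s] for s in succ[n]))
            in_n = (out_n - kill[n]) | gen[n]
            if out_n != live_out[n] or in_n != live_in[n]:
                live_out[n], live_in[n] = out_n, in_n
                changed = True
    return live_in, live_out


live_in, live_out = live_variables(succ, gen, kill)
print(sorted(live_out[3]))  # ['x', 'y']
print(sorted(live_out[1]))  # []
```

The result agrees with the slide's claim: x and y are live on exit from statement 3, and nothing is live on exit from statement 1.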
9
Problem 2: Live Uses of Variables
Backward, may dataflow problem
[Figure: node j with set out(j), and its successors i1, i2, i3 with sets out(i1), out(i2), out(i3).]
What are the primitive dataflow facts? Variables, e.g., x, y, z. Equations act on sets of variables.
10
Problem 3: Available Expressions (Avail)
An expression x op y is available at program point n if every path from entry to n evaluates x op y, and after the last such evaluation prior to reaching n there are NO subsequent assignments to x or y.
[Figure: paths from entry node 1 to n, each containing an evaluation of x op y together with assignments x = … and y = ….]
11
Avail Enables Global Common Subexpressions
[Figure: CFG with the statements q=a*b, z=a*b, r=2*z, u=a*b, z=u/2, and a final w=a*b.]
w=a*b cannot be eliminated because a*b is not available on all paths.
12
Avail Enables Global Common Subexpressions
Can we eliminate w=a*b?
[Figure: CFG with the statements t1=a*b, z=t1, r=2*z, t1=a*b, q=t1, u=t1, z=u/2, and a final w=a*b.]
13
Available Expressions (Avail)
Data?
  Primitive dataflow facts are expressions, e.g., x+y, a*b, a+2
  Analysis propagates sets of expressions, e.g., {x+y, a*b}
Dataflow equations at j: x = y op z?
  outAE(j) = (inAE(j) – killAE(j)) ∪ genAE(j)
  killAE(j): all expressions with operand x: (x op _), (_ op x)
  genAE(j): new expression: {(y op z)}, provided x is not one of its operands
14
Available Expressions (Avail)
Merge operator? For Avail, it is set intersection:
inAE(j) = ∩ { outAE(i) | i is a predecessor of j }
15
Example
1. y=a+b
2. x=a*b
3. if y<=a*b
4. a=a+1
5. x=a*b
6. goto 3
7. …
What data is propagated? Sets of expressions, e.g., {a+b, a*b}.
Forward or backward? Forward.
Equations: out(i) = (in(i) – kill(i)) ∪ gen(i)
What is gen(i)? All expressions computed at i whose operands are not defined at i. E.g., a*b is generated at 5, but a+1 is not generated at 4 because a is defined at 4.
What is kill(i)? All expressions that have an operand defined at i. E.g., a+b and a*b are killed at 4.
Is it a must or a may problem? A must problem: for an expression to be available at node n, it must be available on all paths to n. Thus, in(i) is the intersection of out(j) over all predecessors j of i.
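To see the must equations at work, the example program can be solved mechanically. This is a hedged Python sketch: the encoding of the CFG as `pred`, the hand-written `gen` and `kill` sets, and the name `available_expressions` are illustrative assumptions. Every non-entry node starts from the full set of expressions, which is the usual starting point when the merge is intersection.

```python
EXPRS = {"a+b", "a*b", "a+1"}
# Predecessors in the example CFG: 3 is reached from 2 and from the goto at 6.
pred = {1: [], 2: [1], 3: [2, 6], 4: [3], 5: [4], 6: [5], 7: [3]}
gen = {1: {"a+b"}, 2: {"a*b"}, 3: {"a*b"}, 4: set(), 5: {"a*b"}, 6: set(), 7: set()}
kill = {n: set() for n in pred}
kill[4] = {"a+b", "a*b", "a+1"}  # a=a+1 kills every expression with operand a


def available_expressions(pred, gen, kill, universe, entry=1):
    """Round-robin iteration of the forward must equations."""
    avail_in = {n: set(universe) for n in pred}
    avail_in[entry] = set()  # nothing is available on entry to the program
    avail_out = {n: (avail_in[n] - kill[n]) | gen[n] for n in pred}
    changed = True
    while changed:
        changed = False
        for n in sorted(pred):
            if n != entry:
                avail_in[n] = set.intersection(*(avail_out[p] for p in pred[n]))
            new_out = (avail_in[n] - kill[n]) | gen[n]
            if new_out != avail_out[n]:
                avail_out[n] = new_out
                changed = True
    return avail_in, avail_out


avail_in, avail_out = available_expressions(pred, gen, kill, EXPRS)
print(sorted(avail_in[3]))  # ['a*b']
print(sorted(avail_in[5]))  # []
```

Only a*b survives at the loop head: a+b arrives from statement 2 but is killed along the back edge through statement 4.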
16
Problem 3: Available Expressions
Forward, must dataflow problem
[Figure: predecessors i1, i2, i3 with sets in(i1), in(i2), in(i3), leading to node j: x=y+z with set in(j).]
Is it a forward or a backward problem? Is it a may problem or a must problem?
What are the primitive dataflow facts? Expressions, e.g., x+y, a*b. Equations act on sets of expressions.
17
Problem 4: Very Busy Expressions (VeryB)
An expression x op y is very busy at node n if, along EVERY path from n to the end of the program, we come to a computation of x op y BEFORE any redefinition of x or y.
[Figure: paths leaving n, each containing t1=X op Y together with assignments X = … and Y = ….]
18
Very Busy Expressions (VeryB)
Data?
  Primitive dataflow facts are expressions, e.g., x+y, a*b
  Analysis propagates sets of expressions, e.g., {x+y, a*b}
Dataflow equations at j: x = y op z?
  inVB(j) = (outVB(j) – killVB(j)) ∪ genVB(j)
  killVB(j): all expressions with operand x: (x op _), (_ op x)
  genVB(j): new expression: {(y op z)}
19
Very Busy Expressions (VeryB)
Merge operator? For VeryB, it is set intersection:
outVB(j) = ∩ { inVB(i) | i is a successor of j }
20
Very Busy Expressions
Backward, must dataflow problem
[Figure: node j with set outVB(j), and its successors i1, i2, i3 with sets outVB(i1), outVB(i2), outVB(i3).]
21
Another Example: Taint Analysis
A definition (x,k) is tainted if k is designated as a taint source, or if (x,k) is computed from an operand that is tainted.
Problem statement: for each node n, compute the set of tainted definitions that may reach n.
22
Example: Taint Analysis (explicit flow)
1. x=read()
2. y=1
3. x>=2
4. y=x*y
5. x=x-1
6. goto 3
7. z=y-1
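One way to read the taint definition as a forward may analysis is sketched below in Python. This is an illustrative assumption about how the problem could be encoded, not the lecture's formulation: `stmts`, `pred`, `transfer`, and `taint_analysis` are hypothetical names, statement 1 is taken as the only taint source, the test at 3 is assumed to exit to 7, and only explicit flows are tracked. Unlike the classical gen/kill problems, the generated fact here depends on the incoming set.

```python
# node -> (variable defined or None, variables used, is this node a taint source?)
stmts = {
    1: ("x", set(), True),         # x = read()
    2: ("y", set(), False),        # y = 1
    3: (None, {"x"}, False),       # x >= 2
    4: ("y", {"x", "y"}, False),   # y = x*y
    5: ("x", {"x"}, False),        # x = x-1
    6: (None, set(), False),       # goto 3
    7: ("z", {"y"}, False),        # z = y-1
}
pred = {1: [], 2: [1], 3: [2, 6], 4: [3], 5: [4], 6: [5], 7: [3]}


def transfer(k, tainted_in):
    """Kill tainted definitions of the assigned variable; add (var, k) if it is tainted."""
    var, operands, is_source = stmts[k]
    if var is None:
        return set(tainted_in)
    out = {(v, n) for (v, n) in tainted_in if v != var}
    if is_source or any(v in operands for (v, _) in tainted_in):
        out.add((var, k))
    return out


def taint_analysis():
    t_in = {k: set() for k in stmts}
    t_out = {k: set() for k in stmts}
    changed = True
    while changed:
        changed = False
        for k in sorted(stmts):
            t_in[k] = set().union(*(t_out[p] for p in pred[k]))
            new_out = transfer(k, t_in[k])
            if new_out != t_out[k]:
                t_out[k] = new_out
                changed = True
    return t_in, t_out


t_in, t_out = taint_analysis()
print(sorted(t_in[7]))  # [('x', 1), ('x', 5), ('y', 4)]
```

Under these assumptions the definition (z,7) ends up tainted, because the tainted definition (y,4) may reach statement 7, while (y,2) is never tainted.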
23
Outline of Today’s Class
Catch up
Dataflow frameworks
Lattice
Transfer functions
Worklist algorithm
Reading: Dragon Book, Chapters 9.2 and 9.3
24
Dataflow Problems
                     May Problems             Must Problems
Forward Problems     Reaching Definitions     Available Expressions
Backward Problems    Live Uses of Variables   Very Busy Expressions
25
Similarities
Analyses operate over similar property spaces.
In all cases, the analysis operates over a finite set D of primitive dataflow facts:
  Reach: D is the set of all definitions in the program, e.g., {(x,1),(y,2),(x,4),(y,5)}
  Avail and VeryB: D is the set of all arithmetic expressions, e.g., {a+b, a*b, a+1}
  Live: D is the set of all variables, e.g., {x,y,z}
The solution at node n is a subset of D (e.g., a definition either reaches n or it does not reach n).
26
Similarities
Dataflow equations have the same form (from now on, we’ll focus on forward problems):
  out(j) = (in(j) – kill(j)) ∪ gen(j) = (in(j) ∩ pres(j)) ∪ gen(j)
  in(j) = V { out(i) | i is a predecessor of j }
pres(j) is the complement of kill(j).
A note: what makes the 4 classical problems special is that the sets kill(j)/pres(j) and gen(j) do not depend on in(j). Thus, with sets represented as bit vectors, set union and set intersection can be implemented as bitwise OR and AND respectively.
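The remark about OR and AND can be made concrete with bit vectors. The Python sketch below assumes one bit per definition, using the four definitions from the earlier Reach example; `encode`, `decode`, and `transfer` are hypothetical helper names, and the node being modeled (a definition of x at label 4 that kills (x,1)) is an assumed example.

```python
# One bit per primitive dataflow fact; a set of facts is an int used as a bit vector.
DEFS = [("x", 1), ("y", 2), ("x", 4), ("y", 5)]
BIT = {d: 1 << i for i, d in enumerate(DEFS)}
ALL = (1 << len(DEFS)) - 1


def encode(defs):
    """Turn a set of definitions into a bit vector."""
    bits = 0
    for d in defs:
        bits |= BIT[d]
    return bits


def decode(bits):
    """Turn a bit vector back into a set of definitions."""
    return {d for d in DEFS if bits & BIT[d]}


def transfer(in_bits, pres_bits, gen_bits):
    """out = (in AND pres) OR gen, the bit-vector form of (in ∩ pres) ∪ gen."""
    return (in_bits & pres_bits) | gen_bits


# Assumed node 4: x = ..., which kills (x,1) and generates (x,4).
kill_bits = encode({("x", 1)})
pres_bits = ALL & ~kill_bits          # pres is the complement of kill
gen_bits = encode({("x", 4)})
out_bits = transfer(encode({("x", 1), ("y", 2)}), pres_bits, gen_bits)
print(sorted(decode(out_bits)))  # [('x', 4), ('y', 2)]
```

Because pres and gen are constants per node, each transfer function costs two machine-word operations per word of the vector, which is why the classical problems are often called bit-vector problems.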
27
Similarities
The dataflow equation at node j is a transfer function. It takes in(j) as argument and produces out(j) as result: out(j) = fj(in(j))
28
Dataflow Frameworks
We generalize and study the properties of the property space
  Property space is a lattice
  The choice of lattice settles the merge operator
We generalize and study the properties of the transfer function space
  Functions are monotone or distributive
We generalize and study the properties of the worklist algorithm that computes a solution
29
Lattice Theory
Partial ordering (denoted by ≤ or ⊑): a relation between pairs of elements that is
  Reflexive: a ≤ a
  Anti-symmetric: a ≤ b and b ≤ a ==> a = b
  Transitive: a ≤ b and b ≤ c ==> a ≤ c
Partially ordered set (poset): (set S, ≤)
  0 element: 0 ≤ a, for every a in S
  1 element: a ≤ 1, for every a in S
A poset does not necessarily have a 0 or a 1 element.
30
Poset Example
D = {a,b,c}. The poset is 2^D, and ≤ is set inclusion.
A canonical example of a poset is the poset of subsets. Let D be a finite set. The subsets of D form a poset under the set inclusion relation.
[Figure: Hasse diagram of 2^D with {a,b,c} at the top, the two-element subsets below it, then {a}, {b}, {c}, and {} at the bottom.]
31
Lattice Theory
Greatest lower bound (glb)
  For l1, l2 in poset S, an element a in S is glb(l1,l2) iff
  1) a ≤ l1 and a ≤ l2
  2) for any b in S, b ≤ l1 and b ≤ l2 implies b ≤ a
  If the glb exists, it is unique. Why?
  Called the meet (denoted by Λ or ⊓) of l1 and l2.
Least upper bound (lub)
  For l1, l2 in poset S, an element c in S is lub(l1,l2) iff
  1) c ≥ l1 and c ≥ l2
  2) for any d in S, d ≥ l1 and d ≥ l2 implies d ≥ c
  If the lub exists, it is unique.
  Called the join (denoted by V or ⊔) of l1 and l2.
32
Definition of a Lattice (L, Λ, V)
A lattice L is a poset under ≤ such that every pair of elements has a glb (meet) and a lub (join).
A lattice need not contain a 0 or 1 element.
A finite lattice must contain 0 and 1 elements.
Not every poset is a lattice.
If there is an element a such that a ≤ x for every x in L, then a is the 0 element of L.
If there is an element a such that x ≤ a for every x in L, then a is the 1 element of L.
33
A Poset but Not a Lattice
There is no lub(e3,e4) in this poset, so it is not a lattice. Suppose we add lub(e3,e4): is it then a lattice?
34
Is This Poset a Lattice?
D = {a,b,c}. The poset is 2^D, and ≤ is set inclusion.
[Figure: Hasse diagram of 2^D with {a,b,c} at the top, then {a,b}, {b,c}, {a,c}, then {a}, {b}, {c}, and {} at the bottom.]
35
Examples of Lattices
H = (2^D, ∩, ∪) where D is a finite set
  glb(s1,s2), denoted s1 Λ s2, is set intersection s1 ∩ s2
  lub(s1,s2), denoted s1 V s2, is set union s1 ∪ s2
J = (N1, gcd, lcm), where N1 denotes the natural numbers starting at 1
  Partial order is integer divisibility on N1
  lub(n1,n2), denoted n1 V n2, is lcm(n1,n2)
  glb(n1,n2), denoted n1 Λ n2, is gcd(n1,n2)
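Both example lattices are easy to experiment with. The Python sketch below uses hypothetical helper names (`meet_sets`, `join_div`, `is_lub`, and so on) and restricts the divisibility lattice J to the divisors of 30 so that the least-upper-bound property can be checked by brute force.

```python
from math import gcd


# Lattice H: subsets of D ordered by inclusion.
def meet_sets(s1, s2):
    return s1 & s2  # glb is set intersection


def join_sets(s1, s2):
    return s1 | s2  # lub is set union


# Lattice J: positive integers ordered by divisibility.
def leq_div(a, b):
    return b % a == 0  # a ≤ b iff a divides b


def meet_div(n1, n2):
    return gcd(n1, n2)  # glb is gcd


def join_div(n1, n2):
    return n1 * n2 // gcd(n1, n2)  # lub is lcm


def is_lub(c, a, b, elems):
    """Check the definition: c is an upper bound of a, b and below every other one."""
    uppers = [d for d in elems if leq_div(a, d) and leq_div(b, d)]
    return c in uppers and all(leq_div(c, d) for d in uppers)


DIVISORS_OF_30 = [1, 2, 3, 5, 6, 10, 15, 30]
print(meet_div(6, 15), join_div(6, 15))               # 3 30
print(is_lub(join_div(2, 3), 2, 3, DIVISORS_OF_30))   # True
print(join_sets({"a"}, {"b"}), meet_sets({"a", "b"}, {"b", "c"}))
```

The `is_lub` check mirrors the two clauses of the lub definition rather than trusting the lcm formula, which is a useful habit when proposing a new property space.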
36
Chain
A poset C where for every pair of elements c1, c2 in C, either c1 ≤ c2 or c2 ≤ c1.
E.g., {} ≤ {a} ≤ {a,b} ≤ {a,b,c}
E.g., from the lattice J as shown in the figure: 1 ≤ 2 ≤ 6 ≤ 30 and 1 ≤ 3 ≤ 15 ≤ 30
A lattice in which every ascending chain is finite is said to satisfy the Ascending Chain Condition.
[Figure: Hasse diagram of the divisors of 30 under divisibility, with 30 at the top, then 6, 15, 10, then 2, 5, 3, and 1 at the bottom.]
37
Lattices in Dataflow Analysis
Lattices define the property space.
Lattices entail properties of the standard dataflow analysis solution procedure (the worklist algorithm, which we will study shortly).
38
Dataflow Lattices: Reach
D = all definitions: {(x,1),(x,4),(a,3)}
Poset is 2^D, ≤ is the subset relation
1. x=a*b
2. if y<=a*b
3. a=a+1
4. x=a*b
5. goto 3
[Figure: Hasse diagram of 2^D with {(x,1),(x,4),(a,3)} at the top, then {(x,1),(x,4)}, {(x,4),(a,3)}, {(x,1),(a,3)}, then {(x,1)}, {(x,4)}, {(a,3)}, and {} at the bottom.]
39
Dataflow Lattices: Avail
D = all expressions: {a*b, a+1, y*z}
Poset is 2^D, ≤ is the superset relation
1. x:=a*b
2. if y*z<=a*b
3. a:=a+1
4. x:=a*b
5. goto 2
[Figure: Hasse diagram of 2^D with {} at the top, then {a*b}, {a+1}, {y*z}, then {a*b,y*z}, {a*b,a+1}, {a+1,y*z}, and {a*b,a+1,y*z} at the bottom.]
40
Dataflow Frameworks
Equations:
  in(j) = V { out(i) | i in pred(j) }
  out(j) = fj(in(j))
where:
  in(j), out(j) are elements of a property space
  fj is the transfer function associated with node j
  V is the merge operator
To instantiate an analysis in the framework we must instantiate 1) the property space, 2) the transfer functions.
There are requirements on the property space, transfer functions and merge operator!
41
Dataflow Frameworks (cont.)
The property space must be:
  1. A lattice (L, ≤)
  2. L satisfies the Ascending Chain Condition, which requires that all ascending chains are finite
The merge operator V must be the join of L.
In dataflow, L is often the lattice of the subsets of a finite set of dataflow facts D:
  Choose the universal set D (e.g., all definitions)
  Choose the ordering ≤. Since the merge operator must be the join of L, a may problem entails that ≤ is subset. Conversely, a must problem entails that ≤ is superset.
The requirements on the property space and merge operator are listed above. The requirement on the transfer functions is that they must be monotone (we discuss monotonicity a little later).
42
Example: Reach Lattice
Property space is the lattice of the subsets of D, where
  D is the set of all definitions in the program
  ≤ is the subset relation
  Join is set union, as needed for Reach, which is a may problem
Lattice has 0 being {}, and 1 being D.
Lattice satisfies the Ascending Chain Condition.
43
Reach Lattice
D = all definitions: {(x,1),(x,4),(a,3)}
Poset is 2^D, ≤ is the subset relation
1. x=a*b
2. if y<=a*b
3. a=a+1
4. x=a*b
5. goto 3
[Figure: Hasse diagram of 2^D with {(x,1),(x,4),(a,3)} at the top, then {(x,1),(x,4)}, {(x,4),(a,3)}, {(x,1),(a,3)}, then {(x,1)}, {(x,4)}, {(a,3)}, and {} at the bottom.]
44
Example: Avail Lattice
Property space is the lattice of the subsets of D, where
  D is the set of all expressions in the program
  ≤ is the superset relation
  Join of the lattice is set intersection, as needed for Avail, which is a must problem
Lattice has 0 being D, and 1 being {}.
Lattice satisfies the Ascending Chain Condition.
45
Dataflow Lattices: Avail
D = all expressions: {a*b, a+1, y*z}
Poset is 2^D, ≤ is the superset relation
1. x:=a*b
2. if y*z<=a*b
3. a:=a+1
4. x:=a*b
5. goto 2
[Figure: Hasse diagram of 2^D with {} at the top, then {a*b}, {a+1}, {y*z}, then {a*b,y*z}, {a*b,a+1}, {a+1,y*z}, and {a*b,a+1,y*z} at the bottom.]
46
Transfer Functions
The transfer functions: fj: L → L. Formally, the function space F is such that
  F contains all fj
  F contains the identity function id(x) = x
  F is closed under composition
  Each fj is monotone
47
Monotonicity
F: L → L is monotone if and only if:
(1) for all a,b in L and f in F: a ≤ b ==> f(a) ≤ f(b)
or (equivalently):
(2) for all x,y in L and f in F: f(x) V f(y) ≤ f(x V y)
Theorem: Definitions (1) and (2) are equivalent.
  Show that (1) implies (2)
  Show that (2) implies (1)
48
Distributivity
F: L → L is distributive if and only if
for all x,y in L and f in F: f(x V y) = f(x) V f(y)
Every distributive function is also monotone, but not the other way around.
Distributivity is a very nice property!
49
Monotonicity and Distributivity
Is classical Reach distributive? Yes.
To show distributivity, show for each j:
  ((in(j) ∪ in’(j)) ∩ pres(j)) ∪ gen(j) = ((in(j) ∩ pres(j)) ∪ gen(j)) ∪ ((in’(j) ∩ pres(j)) ∪ gen(j))
Derivation:
  ((in(j) ∪ in’(j)) ∩ pres(j)) ∪ gen(j)
  = ((in(j) ∩ pres(j)) ∪ (in’(j) ∩ pres(j))) ∪ gen(j)
  = ((in(j) ∩ pres(j)) ∪ gen(j)) ∪ ((in’(j) ∩ pres(j)) ∪ gen(j))
The first step distributes ∩ over ∪; the second uses gen(j) ∪ gen(j) = gen(j).
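For a small universe, the algebraic argument can be cross-checked by exhaustive enumeration. The Python sketch below is only a sanity check under assumed inputs: the universe is the three-definition set from the Reach lattice slide, the `pres` and `gen` values model a hypothetical node that kills (x,1) and generates (x,4), and all function names are illustrative.

```python
from itertools import combinations


def powerset(universe):
    """All subsets of a finite universe, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def make_transfer(pres, gen):
    """Classical gen/kill transfer function: f(s) = (s ∩ pres) ∪ gen."""
    return lambda s: (s & pres) | gen


def is_distributive(f, universe):
    subsets = powerset(universe)
    return all(f(x | y) == f(x) | f(y) for x in subsets for y in subsets)


def is_monotone(f, universe):
    subsets = powerset(universe)
    return all(f(x) <= f(y) for x in subsets for y in subsets if x <= y)


D = frozenset({("x", 1), ("x", 4), ("a", 3)})
# Hypothetical node 4: x = a*b kills (x,1) and generates (x,4).
f4 = make_transfer(pres=D - {("x", 1)}, gen=frozenset({("x", 4)}))
print(is_distributive(f4, D), is_monotone(f4, D))  # True True
```

The enumeration is exponential in the size of D, so it is useful only as a test on toy universes; the derivation above is what establishes the property in general.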
50
Monotone Dataflow Frameworks
A problem fits into the dataflow framework if
  its property space is a lattice (L, ≤) that satisfies the Ascending Chain Condition,
  its merge operator V is the join of L, and
  its function space F: L → L is monotone.
Thus, we can make use of a generic solution procedure, known as the worklist algorithm, the maximal fixpoint algorithm, or the fixpoint iteration algorithm.
51
Worklist Algorithm for Forward Dataflow Problems
/* Initialize to initial values; 1 is entry node of CFG */
in(1) = InitialValue;                  /* Reach: inReach(1) = UNDEF (or {}) */
for m = 2 to n do in(m) = 0            /* Reach: inReach(m) = {} */
W = {1,2,…,n}                          /* put every node on the worklist */
while W ≠ Ø do {
  remove j from W
  out(j) = fj(in(j))                   /* Reach: outReach(j) = (inReach(j) ∩ pres(j)) ∪ gen(j) */
  for i in successors(j)
    if not (out(j) ≤ in(i)) then {     /* Reach: if outReach(j) ⊈ inReach(i) */
      in(i) = out(j) V in(i)           /* Reach: inReach(i) = outReach(j) ∪ inReach(i) */
      W = W ∪ { i }
    }
}
52
Worklist Algorithm for Forward Dataflow Problems (slightly different)
/* Initialize to initial values; 1 is entry node of CFG */
in(1) = InitialValue; out(1) = f1(in(1))
for m := 2 to n do { in(m) = 0; out(m) = fm(0) }
W := {2,…,n}   /* put every node but 1 on the worklist */
while W ≠ Ø do {
  remove j from W
  in(j) = V { out(i) | i is predecessor of j }
  out(j) = fj(in(j))
  if out(j) changed then
    W = W ∪ { k | k is successor of j }
}
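The first version of the worklist algorithm can be written generically and then instantiated. The Python sketch below is one possible rendering, not the course's reference implementation: `worklist_forward` and its parameter names are assumptions, and the Reach instance uses the five-statement program from the Reach lattice slide with the back edge assumed to return to the test at node 2.

```python
def worklist_forward(nodes, succ, entry, init, bottom, transfer, join, leq):
    """Generic forward worklist solver over a lattice given by join and leq."""
    in_ = {n: bottom for n in nodes}
    in_[entry] = init
    out = {}
    work = list(nodes)  # every node starts on the worklist
    while work:
        j = work.pop()
        out[j] = transfer(j, in_[j])
        for i in succ[j]:
            if not leq(out[j], in_[i]):  # out(j) adds information to in(i)
                in_[i] = join(out[j], in_[i])
                if i not in work:
                    work.append(i)
    return in_, out


# Reach instance: join is union, ≤ is subset, 0 is the empty set.
nodes = [1, 2, 3, 4, 5]
succ = {1: [2], 2: [3], 3: [4], 4: [5], 5: [2]}  # assumed back edge 5 -> 2
gen = {1: {("x", 1)}, 2: set(), 3: {("a", 3)}, 4: {("x", 4)}, 5: set()}
kill = {1: {("x", 4)}, 2: set(), 3: set(), 4: {("x", 1)}, 5: set()}

reach_in, reach_out = worklist_forward(
    nodes, succ, entry=1, init=frozenset(), bottom=frozenset(),
    transfer=lambda j, s: frozenset((s - kill[j]) | gen[j]),
    join=lambda a, b: a | b,
    leq=lambda a, b: a <= b,
)
print(sorted(reach_in[2]))  # [('a', 3), ('x', 1), ('x', 4)]
print(sorted(reach_in[5]))  # [('a', 3), ('x', 4)]
```

Swapping in intersection for `join`, superset for `leq`, and the full expression set for `bottom` would give an Avail solver, which is the point of phrasing the algorithm over an abstract lattice.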
53
Termination Argument
Why does the algorithm terminate?
Sketch of proof: Each iteration removes a node from the worklist, and nodes are added back only when some out(j) changes. Since out(j) is in L, and L satisfies the Ascending Chain Condition, each out(j) changes at most O(h) times, where h is the height of the lattice L. Hence nodes are added to the worklist only finitely many times, and the worklist eventually becomes empty.
54
Correctness Argument
Theorem: The worklist algorithm computes a solution that satisfies the dataflow equations.
Why? Sketch of proof: Whenever j is processed, the algorithm sets out(j) = fj(in(j)). Whenever out(j) changes, the algorithm puts the successors of j on the worklist, so their in sets are recomputed and in(j) = V { out(i) | i is a predecessor of j } holds at termination. So the final solution satisfies the equations.
55
Precision Argument
Theorem: The algorithm computes the least solution of the dataflow equations. Historically though, this solution is often called the maximal fixpoint solution (MFP).
I.e., for every node j, the worklist algorithm computes a solution MFP(j) = {in(j), out(j)} such that every other solution {in’(j), out’(j)} of the dataflow equations satisfies in(j) ≤ in’(j) and out(j) ≤ out’(j). In other words, every other solution is “larger” than the MFP.
1. z = x+y
2. while true
3. skip
56
Example
1. z:=x+y
2. if (z > 500)
3. skip

Equation                                   Solution1   Solution2
inAvail(1) = Ø                             Ø           Ø
outAvail(1) = (inAvail(1) – Ez) ∪ {x+y}    {x+y}       {x+y}
inAvail(2) = outAvail(1) V outAvail(3)     {x+y}       Ø
outAvail(2) = inAvail(2)                   {x+y}       Ø
inAvail(3) = outAvail(2)                   {x+y}       Ø
outAvail(3) = inAvail(3)                   {x+y}       Ø

The equation for inAvail(2) is equivalent to inAvail(2) = {x+y} V inAvail(2); recall that for Avail V is ∩ (i.e., set intersection).
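That both columns really are solutions, and that Solution1 is the smaller one in the Avail ordering, can be checked directly. In the Python sketch below the dictionary keys and the function `satisfies` are hypothetical names; Ez is taken to be empty because x+y, the only expression in play, has no operand z.

```python
XY = frozenset({"x+y"})
EMPTY = frozenset()
E_Z = EMPTY  # expressions with operand z; none among the tracked expressions


def satisfies(sol):
    """Check the six Avail equations of the example, with V taken as intersection."""
    return (
        sol["in1"] == EMPTY
        and sol["out1"] == (sol["in1"] - E_Z) | XY
        and sol["in2"] == sol["out1"] & sol["out3"]
        and sol["out2"] == sol["in2"]
        and sol["in3"] == sol["out2"]
        and sol["out3"] == sol["in3"]
    )


solution1 = {"in1": EMPTY, "out1": XY, "in2": XY, "out2": XY, "in3": XY, "out3": XY}
solution2 = {"in1": EMPTY, "out1": XY, "in2": EMPTY, "out2": EMPTY, "in3": EMPTY, "out3": EMPTY}

print(satisfies(solution1), satisfies(solution2))  # True True
# For Avail, ≤ is superset, so Solution1 ≤ Solution2 componentwise.
print(all(solution1[k] >= solution2[k] for k in solution1))  # True
```

Solution1 is the one the worklist algorithm produces: it is least under the superset ordering, which for a must problem means it reports the most available expressions.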
57
Many Applications!
Static debugging
  Memory errors in C/C++ programs
    Memory leaks
    Null pointer dereferences
    Array-out-of-bound accesses
  Concurrency errors in shared-memory apps
    Data-races, atomicity violations, deadlocks
Information flow (also known as taint analysis)
58
Many Applications!
White-box testing: compute coverage
  Control-flow-based testing
  Data-flow-based testing: intuitively, test each def-use chain
Regression testing
  Analyze changes and select regression tests that actually test changed code
59
Dataflow Analysis
Classical technique
Compared to Hoare logic, it captures state in a coarser way
Still relevant: many interesting problems are phrased in dataflow terms
60
Next Class
MOP vs MFP solutions
Two classical non-distributive dataflow analyses: constant propagation and points-to analysis