Simone Campanoni simonec@eecs.northwestern.edu DFA foundations Simone Campanoni simonec@eecs.northwestern.edu.

Slides:



Advertisements
Similar presentations
Continuing Abstract Interpretation We have seen: 1.How to compile abstract syntax trees into control-flow graphs 2.Lattices, as structures that describe.
Advertisements

Data-Flow Analysis II CS 671 March 13, CS 671 – Spring Data-Flow Analysis Gather conservative, approximate information about what a program.
Data-Flow Analysis Framework Domain – What kind of solution is the analysis looking for? Ex. Variables have not yet been defined – Algorithm assigns a.
School of EECS, Peking University “Advanced Compiler Techniques” (Fall 2011) Dataflow Analysis Introduction Guo, Yao Part of the slides are adapted from.
1 Data flow analysis Goal : collect information about how a procedure manipulates its data This information is used in various optimizations For example,
Foundations of Data-Flow Analysis. Basic Questions Under what circumstances is the iterative algorithm used in the data-flow analysis correct? How precise.
CSE 231 : Advanced Compilers Building Program Analyzers.
Worklist algorithm Initialize all d i to the empty set Store all nodes onto a worklist while worklist is not empty: –remove node n from worklist –apply.
1 Data flow analysis Goal : –collect information about how a procedure manipulates its data This information is used in various optimizations –For example,
Global optimization. Data flow analysis To generate better code, need to examine definitions and uses of variables beyond basic blocks. With use- definition.
From last time: Lattices A lattice is a tuple (S, v, ?, >, t, u ) such that: –(S, v ) is a poset – 8 a 2 S. ? v a – 8 a 2 S. a v > –Every two elements.
Data Flow Analysis Compiler Design Nov. 3, 2005.
From last time: reaching definitions For each use of a variable, determine what assignments could have set the value being read from the variable Information.
Data Flow Analysis Compiler Design October 5, 2004 These slides live on the Web. I obtained them from Jeff Foster and he said that he obtained.
CS 412/413 Spring 2007Introduction to Compilers1 Lecture 29: Control Flow Analysis 9 Apr 07 CS412/413 Introduction to Compilers Tim Teitelbaum.
Administrative stuff Office hours: After class on Tuesday.
Recap Let’s do a recap of what we’ve seen so far Started with worklist algorithm for reaching definitions.
Data Flow Analysis Compiler Design Nov. 8, 2005.
Course Outline Traditional Static Program Analysis –Theory Compiler Optimizations; Control Flow Graphs, Data-flow Analysis Data-flow Frameworks --- today’s.
San Diego October 4-7, 2006 Over 1,000 women in computing Events for undergraduates considering careers and graduate school Events for graduate students.
Recap: Reaching defns algorithm From last time: reaching defns worklist algo We want to avoid using structure of the domain outside of the flow functions.
1 Data-Flow Frameworks Lattice-Theoretic Formulation Meet-Over-Paths Solution Monotonicity/Distributivity.
U NIVERSITY OF M ASSACHUSETTS, A MHERST D EPARTMENT OF C OMPUTER S CIENCE Advanced Compilers CMPSCI 710 Spring 2003 Data flow analysis Emery Berger University.
1 CS 201 Compiler Construction Lecture 4 Data Flow Framework.
From last lecture We want to find a fixed point of F, that is to say a map m such that m = F(m) Define ?, which is ? lifted to be a map: ? = e. ? Compute.
Course Outline Traditional Static Program Analysis –Theory Compiler Optimizations; Control Flow Graphs Data-flow Analysis: Data-flow frameworks –Classic.
Λλ Fernando Magno Quintão Pereira P ROGRAMMING L ANGUAGES L ABORATORY Universidade Federal de Minas Gerais - Department of Computer Science P ROGRAM A.
1 CS 201 Compiler Construction Data Flow Analysis.
1 ECE 453 – CS 447 – SE 465 Software Testing & Quality Assurance Instructor Kostas Kontogiannis.
1 Data-Flow Analysis Proving Little Theorems Data-Flow Equations Major Examples.
Data Flow Analysis. 2 Source code parsed to produce AST AST transformed to CFG Data flow analysis operates on control flow graph (and other intermediate.
MIT Foundations of Dataflow Analysis Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology.
Solving fixpoint equations
Dataflow Analysis Topic today Data flow analysis: Section 3 of Representation and Analysis Paper (Section 3) NOTE we finished through slide 30 on Friday.
MIT Introduction to Program Analysis and Optimization Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology.
Jeffrey D. Ullman Stanford University. 2 boolean x = true; while (x) {... // no change to x }  Doesn’t terminate.  Proof: only assignment to x is at.
Formalization of DFA using lattices. Recall worklist algorithm let m: map from edge to computed value at edge let worklist: work list of nodes for each.
Compiler Principles Fall Compiler Principles Lecture 11: Loop Optimizations Roman Manevich Ben-Gurion University.
CS 598 Scripting Languages Design and Implementation 9. Constant propagation and Type Inference.
Course Outline Traditional Static Program Analysis –Theory Compiler Optimizations; Control Flow Graphs Data-flow Analysis: Data-flow frameworks –Classic.
Data Flow Analysis II AModel Checking and Abstract Interpretation Feb. 2, 2011.
Semilattices presented by Niko Simonson, CSS 548, Autumn 2012 Semilattice City, © 2009 Nora Shader.
Program Analysis Mooly Sagiv Tel Aviv University Sunday Scrieber 8 Monday Schrieber.
Iterative Dataflow Problems Taken largely from notes of Alex Aiken (UC Berkeley) and Martin Rinard (MIT) Dataflow information used in optimization Several.
Optimization Simone Campanoni
Compiler Principles Fall Compiler Principles Lecture 9: Dataflow & Optimizations 2 Roman Manevich Ben-Gurion University of the Negev.
Lub and glb Given a poset (S, · ), and two elements a 2 S and b 2 S, then the: –least upper bound (lub) is an element c such that a · c, b · c, and 8 d.
Code Optimization Data Flow Analysis. Data Flow Analysis (DFA)  General framework  Can be used for various optimization goals  Some terms  Basic block.
DFA foundations Simone Campanoni
Simone Campanoni SSA Simone Campanoni
Data Flow Analysis Suman Jana
Simone Campanoni Dependences Simone Campanoni
CSC D70: Compiler Optimization Dataflow-2 and Loops
CSC D70: Compiler Optimization Dataflow Analysis
Advanced Compiler Techniques
Topic 10: Dataflow Analysis
University Of Virginia
Another example: constant prop
Fall Compiler Principles Lecture 10: Global Optimizations
Data Flow Analysis Compiler Design
Lecture 20: Dataflow Analysis Frameworks 11 Mar 02
Topic-4a Dataflow Analysis 2019/2/22 \course\cpeg421-08s\Topic4-a.ppt.
Static Single Assignment
Formalization of DFA using lattices
Formalization of DFA using lattices
Live variables and copy propagation
Formalization of DFA using lattices
COMPILERS Liveness Analysis
Formalization of DFA using lattices
Presentation transcript:

Simone Campanoni simonec@eecs.northwestern.edu DFA foundations Simone Campanoni simonec@eecs.northwestern.edu

We have seen several examples of DFAs Are they correct? Are they precise? Will they always terminate? How long will they take to converge?

Outline Lattice and data-flow analysis DFA correctness DFA precision DFA complexity

Understanding DFAs We need to understand all of them Liveness analysis: is it correct? Precision? Convergence? Reaching definitions: is it correct? Precision? Convergence? … Idea: create a framework to help reasoning about them Provide a single formal model that describes all data-flow analyses Formalize the notions of “safe,” “conservative,” and “optimal” Correctness proof for DFAs Place bounds on time complexity of iterative DFAs

Lattices a Lattice L = (V, ≤): Properties of ≤: V is a (possible infinite) set of elements ≤ is a binary relation over elements of V Properties of ≤: ≤ is a partial order (reflexive, transitive, anti-symmetric) Every pair of elements in V has A unique greatest lower bound (a.k.a. meet) and A unique least upper bound (a.k.a. join) Top (T) = unique greatest element of V (if it exists) Bottom (⊥) = unique least element of V (if it exists) Height of L: longest path from T to ⊥ Infinite large lattice can still have finite height b c d

Lattices and DFA A lattice L = (V, ≤) describes all possible solutions of a given DFA A lattice for reaching definitions Another lattice for liveness analysis … The relation ≤ connects all solutions of its related DFA from the best one (T) to the worst one (⊥) Liveness analysis: T = no variable is alive = { } ⊥ = all variables are alive = V We traverse the lattice of a given DFA to find the correct solution in a given point of the CFG We repeat it for every point in the CFG

Lattice example T={ , , } { , } { , } { , } { } ≤ { , } { } { } { } How many apples I must have? V = sets of apples ≤ = set inclusion T = (best case) = all apples ⊥ = (worst case) no apples (empty set) Apples, definitions, variables, expressions … T={ , , } { , } { , } { , } { } ≤ { , } { } { } { } ⊥={ }

How can we use this mathematical framework , lattice, to study a DFA?

Use of lattice for DFA Define domain of program properties (flow values --- apple sets) computed by data-flow analysis, and organize the domain of elements as a lattice Define how to traverse this domain to compute the final solution using lattice operations Exploit lattice theory in achieving goals

Data-flow analysis and lattice Elements of the lattice (V) represent flow values (e.g., an IN[] set) e.g., Sets of apples T “best-case” information e.g., Empty set ⊥ “worst-case” information e.g., Universal set If x ≤ y, then x is a conservative approximation of y e.g., Superset T={ , , } { , } { , } { , } { } { } { } ⊥={ }

Data-flow analysis and lattice Elements of the lattice (V) represent flow values (e.g., an IN[] set) e.g., Sets of live variables for liveness ⊥ “worst-case” information e.g., Universal set T “best-case” information e.g., Empty set If x ≤ y, then x is a conservative approximation of y e.g., Superset T={ } { v1 } { v2 } { v3 } {v1,v2} {v1,v3} {v2,v3} ⊥={v1,v2,v3}

Data-flow analysis and lattice (reaching defs) Elements of the lattice (V) represent flow values (IN[], OUT[]) e.g., Sets of definitions T represents “best-case” information e.g., Empty set ⊥ represents “worst-case” information e.g., Universal set If x ≤ y, then x is a conservative approximation of y e.g., Superset

How do we choose which element in our lattice is the data-flow value of a given point of the input program?

We traverse the lattice { , } { , } { , } { } { } { } ⊥={ }

Merging information New information is found e.g., a new definition (d1) reaches a given point in the CFG New information is described as a point in the lattice e.g. {d1} We use the ”meet” operator (∧) of the lattice to merge the new information with the current one e.g., set union Current information: {d2} New information: {d1} Result: {d1} U {d2} = {d1, d2}

How can we find new facts/information to iterate over the lattice?

Computing a data-flow value (ideal) For a forward problem, consider all possible paths from the entry to a given program point, compute the flow values at the end of each path, and then meet these values together Meet-over-all-paths (MOP) solution at each program point It’s a correct solution Ventry Entry

Computing MOP solution for reaching definitions Ventry T={ } Entry d3 {d1} d1 d2 {d1,d2} {d1,d2,d3}

From ideal to practical solution Problem: all preceding paths must be analyzed Exponential blow-up Solution: compute meets early (at merge points) rather than at the end Maximum fixed-point (MFP) Questions: Is MFP correct? What’s the precision of MFP? d2 d1 d3

Outline Lattice and data-flow analysis DFA correctness DFA precision DFA complexity

Correctness T={ } { d1 } { d2 } { d3 } d1 d2 {d1,d2} {d1,d3} {d2,d3} Vcorrect ≤ VMOP Ventry T={ } Entry { d1 } { d2 } { d3 } d1 d2 {d1,d2} {d1,d3} {d2,d3} ⊥={d1,d2,d3}

fs is monotonic => MFP is correct! Correctness Key idea: “Is MFP correct?” iff VMFP ≤ VMOP Focus on merges: VMOP = fs(Vp1) ∧ fs(Vp2) VMFP = fs(Vp1 ∧ Vp2 ) VMFP ≤ VMOP iff fs(Vp1 ∧ Vp2 ) ≤ fs(Vp1) ∧ fs(Vp2) If fs is monotonic: X ≤ Y then fs(X) ≤ fs(Y) So fs(Vp1 ∧ Vp2 ) ≤ fs(Vp1) ∧ fs(Vp2) And therefore VMFP ≤ VMOP fs is monotonic => MFP is correct!

Monotonicity X ≤ Y then fs(X) ≤ fs(Y) If the flow function f is applied to two members of V, the result of applying f to the “lesser” of the two members will be under the result of applying f to the “greater” of the two More conservative inputs leads to more conservative outputs (never more optimistic outputs)

Convergence From lattice theory If fs is monotonic, then the maximum number of times fs can be applied w/o reaching a fixed point is Height(V) – 1 Iterative DFA is guaranteed to terminate if the fs is monotonic and the lattice has finite height

Outline Lattice and data-flow analysis DFA correctness DFA precision DFA complexity

Precision VMOP: the best solution VMFP ≤ VMOP fs(Vp1 ∧ Vp2 ) ≤ fs(Vp1) ∧ fs(Vp2) Distributive fs over ∧ fs(Vp1 ∧ Vp2 ) = fs(Vp1) ∧ fs(Vp2) VMFP = VMOP Is reaching definition fs distributive?

A new DFA example: reaching constants Goal Compute the value that a variable must have at a program point Flow values (V) Set of (variable,constant) pairs Merge function Intersection Data-flow equations Effect of node n x = c KILL[n] = {(x,k)| ∀k} GEN[n] = {(x,c)} Effect of node n x = y + z GEN[n] = {(x,c) | c=valy+valz, (y, valy) ∈ in[n], (z, valz) ∈ in[n]}

Reaching constants: characteristics ⊥ = ? IN = ? OUT = ? Let’s study this analysis Correctness? Convergence? is fs monotonic? Has the lattice a finite height? Precision? is fs distributive? { }

Outline Lattice and data-flow analysis DFA correctness DFA precision DFA complexity

Complexity OUT[ENTRY] = { }; for (each instruction i other than ENTRY) OUT[i] = { }; while (changes to any OUT occur){ for (each instruction i other than ENTRY) { IN[i] = ∪p a predecessor of i OUT[p]; OUT[i] = GEN[i] ∪ (IN[i] ─ KILL[i]); } }

Complexity N instructions (N definitions at most) N=500 Each IN/OUT set has at most N elements Each set-union operation takes O(N) time The for loop body constant # of set operations per node O(N) nodes ⇒ O(N2) time for the loop Each iteration of the repeat loop can only make the set larger Each iteration modifies in the worst case only one set ⇒ O(N3) N iterations to reach the fixed point at most Worst case: O(N4) Typical case: 2 to 3 iterations with good ordering & sparse sets O(N) to O(N2) N=500 Worst case: 62,500,000,000 Optimized average case: 500 – 250,000

Optimization: work list OUT[ENTRY] = { }; for (each basic block B other than ENTRY) OUT[B] = { }; workList = all basic blocks while (workList isn’t empty) B = pick and remove a block from workList oldOUT = OUT[B] IN[B] = ∪p a predecessor of B OUT[p]; OUT[B]= GEN[B]∪ (IN[B] ─ KILL[B]); if (oldOut != OUT[B]) workList = workList U {all successors of B} }