A Deeper Look at Data-flow Analysis Copyright 2011, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 512 at Rice University.


A Deeper Look at Data-flow Analysis
Comp 512, Spring 2011
Copyright 2011, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 512 at Rice University have explicit permission to make copies of these materials for their personal use. Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved.

Data-flow Analysis
Definition
- Data-flow analysis is a collection of techniques for compile-time reasoning about the run-time flow of values
- We use the results of DFA to prove safety & identify opportunities
  - Not an end unto itself
- Almost always involves building a graph
  - Control-flow graph, call graph, or derivatives thereof
  - Sparse evaluation graphs to model the flow of values (efficiency)
- Usually formulated as a set of simultaneous equations
  - Sets attached to nodes and edges
  - Often use sets with a lattice or semilattice structure
- Desired result is usually the meet-over-all-paths (MOP) solution
  - "What is true on every path from the entry?"
  - "Can this happen on any path from the entry?"
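The two questions in the last bullet can be answered by brute force on a small acyclic graph, which makes the meaning of a meet-over-all-paths answer concrete. This is an illustrative sketch only: it assumes a CFG given as a dict of successor lists, and the helper names (`all_paths`, `on_every_path`, `on_some_path`) are hypothetical. Enumerating paths takes exponential time and cannot handle cycles, which is why real analyses solve equations instead.

```python
def all_paths(succ, entry, target):
    """Yield every path (a list of nodes) from entry to target in an acyclic CFG."""
    if entry == target:
        yield [entry]
        return
    for s in succ.get(entry, []):
        for rest in all_paths(succ, s, target):
            yield [entry] + rest

def on_every_path(succ, entry, target):
    """Facts true on EVERY path: meet is intersection."""
    paths = [set(p) for p in all_paths(succ, entry, target)]
    return set.intersection(*paths)

def on_some_path(succ, entry, target):
    """Facts that hold on ANY path: meet is union."""
    paths = [set(p) for p in all_paths(succ, entry, target)]
    return set.union(*paths)

# Diamond: 0 -> {1, 2} -> 3; here the "fact" on a path is just the nodes it visits
cfg = {0: [1, 2], 1: [3], 2: [3], 3: []}
print(sorted(on_every_path(cfg, 0, 3)))  # [0, 3]
print(sorted(on_some_path(cfg, 0, 3)))   # [0, 1, 2, 3]
```

When the facts are the visited nodes themselves, the "every path" answer for node 3 is {0, 3}, the same set that the dominator equations compute for the join node of a diamond.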

Data-flow Analysis
We have seen two data-flow problems: DOM and LIVE
Computing Dominators
- Domain is the set of nodes in the flow graph being analyzed
- Simple set of data-flow equations
- Can solve them with any data-flow solver
Initializations:
  $\mathrm{DOM}(n_0) = \{n_0\}$
  $\mathrm{DOM}(n) = N, \ \forall n \ne n_0$
Fixed-point equation:
  $\mathrm{DOM}(n) = \{n\} \cup \bigcap_{p \in preds(n)} \mathrm{DOM}(p)$
N is the set of nodes in the flow graph.

Data-flow Analysis
Computing Live Variables
- Domain is the set of variable names in the procedure
- Data-flow equations are more complex
Initialization:
  $\mathrm{LIVEOUT}(n) = \emptyset, \ \forall n$
Fixed-point equations:
  $\mathrm{LIVEOUT}(b) = \bigcup_{s \in succ(b)} \mathrm{LIVEIN}(s)$
  $\mathrm{LIVEIN}(b) = \mathrm{UEVAR}(b) \cup (\mathrm{LIVEOUT}(b) \cap \overline{\mathrm{VARKILL}(b)})$
where UEVAR(b) is the set of names used in b before any definition in b, and VARKILL(b) is the set of names defined in b.
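The local sets and the fixed-point equations translate almost directly into code. This is a minimal sketch, not the course's implementation: it assumes each block is a list of statements written as hypothetical `(target, [operands])` pairs, computes UEVAR and VARKILL with one top-to-bottom scan, and then iterates the LIVEOUT equation round-robin until nothing changes.

```python
def local_sets(block):
    """Return (UEVAR, VARKILL) for one block by scanning it top to bottom."""
    uevar, varkill = set(), set()
    for target, operands in block:
        for v in operands:
            if v not in varkill:          # used before any definition in the block
                uevar.add(v)
        varkill.add(target)               # the definition happens after the uses
    return uevar, varkill

def live(blocks, succ):
    """Round-robin solver for LIVEOUT; `blocks` and `succ` are dicts keyed by block."""
    local = {b: local_sets(stmts) for b, stmts in blocks.items()}
    liveout = {b: set() for b in blocks}                 # LIVEOUT(n) = empty, for all n
    change = True
    while change:
        change = False
        for b in blocks:
            new = set()
            for s in succ[b]:
                uevar, varkill = local[s]
                new |= uevar | (liveout[s] - varkill)    # LIVEIN(s)
            if new != liveout[b]:
                liveout[b] = new
                change = True
    return liveout

# Hypothetical code (constants omitted from operand lists):
#   b0: a = 1; b = 2     b1: c = a + b; a = c     b2: d = a + c
blocks = {
    0: [("a", []), ("b", [])],
    1: [("c", ["a", "b"]), ("a", ["c"])],
    2: [("d", ["a", "c"])],
}
succ = {0: [1], 1: [1, 2], 2: []}                        # b1 loops back to itself
print({b: sorted(v) for b, v in live(blocks, succ).items()})
# {0: ['a', 'b'], 1: ['a', 'b', 'c'], 2: []}
```

Note that c is live on exit from b1 (b2 reads it) but not on entry to b1, because b1 defines c before using it.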

Classic Algorithm: Round-robin Iterative Algorithm
- Very simple algorithm
- Halts when the DOM sets stop changing
- Makes successive sweeps over the nodes in some fixed order
(Nodes are numbered n_0 through n_{|N|-1}; the sweep skips n_0, whose set is fixed.)

  DOM(n_0) ← { n_0 }
  for i ← 1 to |N| - 1
    DOM(n_i) ← N
  change ← true
  while (change)
    change ← false
    for i ← 1 to |N| - 1
      TEMP ← { n_i } ∪ ( ∩_{p ∈ preds(n_i)} DOM(p) )    // just the fixed-point equation
      if DOM(n_i) ≠ TEMP then
        change ← true
        DOM(n_i) ← TEMP
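The round-robin algorithm can be transcribed into Python as follows. It is a sketch under stated assumptions: nodes are numbered 0 to n-1, node 0 is the entry n_0, and the hypothetical argument `preds[i]` lists the CFG predecessors of node i. The sweep starts at node 1 because the entry's set is fixed at {n_0}.

```python
def dominators(preds):
    """Round-robin DOM solver; preds[i] lists the predecessors of node i."""
    n = len(preds)
    all_nodes = set(range(n))
    dom = [set(all_nodes) for _ in range(n)]    # DOM(n_i) <- N
    dom[0] = {0}                                # DOM(n_0) <- {n_0}
    change = True
    while change:
        change = False
        for i in range(1, n):                   # n_0's set never changes
            temp = set(all_nodes)               # identity for the meet (intersection)
            for p in preds[i]:
                temp &= dom[p]
            temp |= {i}                         # just the fixed-point equation
            if temp != dom[i]:
                change = True
                dom[i] = temp
    return dom

# Hypothetical CFG: 0 -> 1 -> 2 -> 3, with a back edge 2 -> 1
print([sorted(d) for d in dominators([[], [0, 2], [1], [2]])])
# [[0], [0, 1], [0, 1, 2], [0, 1, 2, 3]]
```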

Solving a Data-flow Problem
To compute dominator sets:
- We need to build the control-flow graph
  - It defines predecessors and successors
- Run the round-robin iterative algorithm
  - It initializes DOM(n) for each node n
  - It iterates until it reaches a fixed point (i.e., until the DOM sets stabilize)
To solve another data-flow problem:
- Replace the initialization step and the fixed-point equation
- The fixed-point equation includes the direction of propagation
  - Predecessors or successors, as needed
To explain data-flow analysis, Kildall introduced a lattice-theoretic model. Kam & Ullman (among others) developed specific formulations for iterative data-flow algorithms.
See J.B. Kam and J.D. Ullman, "Global Data Flow Analysis and Iterative Algorithms", JACM 23(1), January 1976, pp
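The recipe "replace the initialization step and the fixed-point equation" suggests a solver that takes both as parameters. The sketch below assumes two hypothetical callbacks, `init(node)` and `equation(node, values)`; the direction of propagation lives inside `equation`. It solves DOM (forward, meet over predecessors) and, as a second illustrative problem, the set of nodes reachable from each node (backward, union over successors).

```python
def build_preds(succ):
    """Hypothetical helper: derive predecessor lists from successor lists."""
    preds = {n: [] for n in succ}
    for n, targets in succ.items():
        for s in targets:
            preds[s].append(n)
    return preds

def round_robin(nodes, init, equation):
    """Sweep over `nodes` in a fixed order until no value changes."""
    values = {n: init(n) for n in nodes}
    change = True
    while change:
        change = False
        for n in nodes:
            new = equation(n, values)
            if new != values[n]:
                values[n] = new
                change = True
    return values

succ = {0: [1, 2], 1: [3], 2: [3], 3: []}   # a diamond; node 0 is the entry
preds = build_preds(succ)
N = set(succ)

def dom_init(n):                            # forward problem: dominators
    return {0} if n == 0 else set(N)

def dom_eq(n, v):
    if n == 0:
        return {0}
    meet = set(N)
    for p in preds[n]:
        meet &= v[p]
    return {n} | meet

def reach_eq(n, v):                         # backward problem: nodes reachable from n
    out = {n}
    for s in succ[n]:
        out |= v[s]
    return out

dom = round_robin(list(succ), dom_init, dom_eq)
reach = round_robin(list(succ), lambda n: set(), reach_eq)
print(sorted(dom[3]), sorted(reach[0]))     # [0, 3] [0, 1, 2, 3]
```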

Classic Algorithm: Round-robin Iterative Algorithm
Questions we must ask
- Termination: does it halt?
- Correctness: what answer does it produce?
- Speed: how quickly does it find that answer?

  DOM(n_0) ← { n_0 }
  for i ← 1 to |N| - 1
    DOM(n_i) ← N
  change ← true
  while (change)
    change ← false
    for i ← 1 to |N| - 1
      TEMP ← { n_i } ∪ ( ∩_{p ∈ preds(n_i)} DOM(p) )    // just the fixed-point equation
      if DOM(n_i) ≠ TEMP then
        change ← true
        DOM(n_i) ← TEMP

Data-flow Analysis
The basics
- Data-flow sets are drawn from a semilattice, L, of facts
- Sets are modified by transfer functions, f_i, that model the effect of code on the contents of the sets
  - The function space of all possible transfer functions is F
- Properties of L and F govern termination, correctness, & speed
To reason about the properties of a (proposed) data-flow problem, we cast it into a lattice-theoretic framework and prove some simple theorems about the problem.

Data-flow Analysis
Limitations
1. Precision: "up to symbolic execution"
   - Assume all paths are taken
2. Solution: cannot afford to compute the MOP solution
   - Large class of problems where MOP = MFP = LFP
   - Not all problems of interest are in this class
3. Arrays: treated naively in classical analysis
   - Represent the whole array with a single fact
4. Pointers: difficult (and expensive) to analyze
   - Imprecision rapidly adds up
   - Need to ask the right questions
Summary
For scalar values, we can quickly solve simple problems.
Good news: simple problems can carry us pretty far.

Data-flow Analysis
Semilattice
A semilattice is a set L and a meet operation $\wedge$ such that, $\forall a, b, c \in L$:
1. $a \wedge a = a$
2. $a \wedge b = b \wedge a$
3. $a \wedge (b \wedge c) = (a \wedge b) \wedge c$
$\wedge$ imposes an order on L, $\forall a, b, c \in L$:
1. $a \ge b \iff a \wedge b = b$
2. $a > b \iff a \ge b$ and $a \ne b$
A semilattice has a bottom element, denoted $\bot$:
1. $\forall a \in L, \ \bot \wedge a = \bot$
2. $\forall a \in L, \ a \ge \bot$
The meet operator combines the sets when two paths converge, or meet.
Sometimes we work with a lattice, which has a top element, denoted $\top$: $\forall a \in L, \ \top \wedge a = a$.
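For a small finite example, the meet laws and the induced order can be checked exhaustively. This sketch assumes the powerset of an arbitrary three-element universe {x, y, z} with intersection as the meet, the same kind of semilattice DOM uses; it confirms the three laws, the bottom and top elements, and that the induced order is superset inclusion.

```python
from itertools import combinations, product

U = {"x", "y", "z"}                          # arbitrary illustrative universe
L = [frozenset(c) for r in range(len(U) + 1) for c in combinations(sorted(U), r)]

def meet(a, b):
    return a & b                             # meet is intersection

def geq(a, b):
    return meet(a, b) == b                   # a >= b  iff  a meet b = b

assert all(meet(a, a) == a for a in L)                                  # idempotent
assert all(meet(a, b) == meet(b, a) for a, b in product(L, L))          # commutative
assert all(meet(a, meet(b, c)) == meet(meet(a, b), c)
           for a, b, c in product(L, L, L))                             # associative

bottom, top = frozenset(), frozenset(U)
assert all(meet(bottom, a) == bottom and geq(a, bottom) for a in L)     # bottom
assert all(meet(top, a) == a for a in L)                                # top
assert all(geq(a, b) == (a >= b) for a, b in product(L, L))             # order is superset
print(len(L), "elements checked")            # 8 elements checked
```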

Data-flow Analysis
How does this relate to data-flow analysis?
- Choose a semilattice to represent the facts
  - Attach a meaning to each $a \in L$
  - Each $a \in L$ is a distinct set of known facts
- With each node n, associate a function $f_n : L \to L$
  - $f_n$ models the behavior of the code in the block corresponding to n
  - Let F be the set of all functions that the code might generate
Example: DOM
- The semilattice is $(2^N, \wedge)$, where N is the set of nodes in the flow graph, $\wedge$ is $\cap$, and $\bot$ is $\emptyset$
- For a node n, $f_n$ has the form $f_n(x) = \{n\} \cup x$
  - World's simplest data-flow equation

Data-flow Analysis
How does this relate to data-flow analysis?
- Choose a semilattice to represent the facts
  - Attach a meaning to each $a \in L$
  - Each $a \in L$ is a distinct set of known facts
- With each node n, associate a function $f_n : L \to L$
  - $f_n$ models the behavior of the code in the block corresponding to n
  - Let F be the set of all functions that the code might generate
Example: LIVE
- The semilattice is $(2^{Vars}, \wedge)$, where Vars is the set of names in the code, $\wedge$ is $\cup$, and $\bot$ is Vars
- For a node n, $f_n$ has the form $f_n(x) = a \cup (x \cap b)$, where a & b are constants (UEVAR and the complement of VARKILL, respectively)
  - A common form for a data-flow equation
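The LIVE transfer function has the form $f(x) = \mathrm{UEVAR} \cup (x \cap \overline{\mathrm{VARKILL}})$, and on a small universe one can check by brute force that it distributes over the LIVE meet, which is union. The universe {x, y, z} and the local sets UEVAR = {x}, VARKILL = {y} below are hypothetical values for one block.

```python
from itertools import combinations, product

VARS = frozenset("xyz")                          # hypothetical Vars
UEVAR, VARKILL = frozenset("x"), frozenset("y")  # hypothetical local sets

def f(x):
    """LIVE transfer function: UEVAR ∪ (x ∩ complement of VARKILL)."""
    return UEVAR | (x & (VARS - VARKILL))

# Every subset of VARS, i.e., every element of the semilattice (2^Vars, ∪)
L = [frozenset(c) for r in range(len(VARS) + 1) for c in combinations(sorted(VARS), r)]

# f distributes over the meet (∪) ...
assert all(f(x | y) == f(x) | f(y) for x, y in product(L, L))
# ... and preserves ⊆, so it is monotone (the lattice order here is ⊇ on both sides)
assert all(f(x) <= f(x | y) for x, y in product(L, L))
print(sorted(f(frozenset("yz"))))                # ['x', 'z']
```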

Iterative Data-flow Analysis
Termination
- If every $f_n \in F$ is monotone, i.e., $x \le y \Rightarrow f(x) \le f(y)$, and
- If the lattice is bounded, i.e., every descending chain is finite
  - A chain is a sequence $x_1, x_2, \ldots, x_n$ where $x_i \in L$, $1 \le i \le n$
  - $x_i > x_{i+1}$, $1 \le i < n$ $\Rightarrow$ the chain is descending
Then
- The set at each node can only change a finite number of times
- The iterative algorithm must halt on an instance of the problem
Any finite semilattice is bounded. Some infinite semilattices are bounded:
[Figure: real constants, an infinite semilattice (…, 0, …, .001, …, .002, …) that is nonetheless bounded]
Both DOM & LIVE have monotone transfer functions & finite (bounded) semilattices.
Finite lattice, bounded descending chains, & monotone functions $\Rightarrow$ termination
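The termination argument can be observed directly on DOM. This sketch assumes nodes 0..n-1 with node 0 as the entry and a hypothetical `preds` list. Every non-entry set starts at N, and because the transfer functions are monotone, each update replaces a set with a strict subset; the values at a node therefore form a descending chain in $(2^N, \cap)$, so a set can change at most |N| times. The code counts the changes and asserts the strict descent.

```python
def dom_with_trace(preds):
    """Round-robin DOM that also counts how often each node's set changes."""
    n = len(preds)
    all_nodes = set(range(n))
    dom = [{0}] + [set(all_nodes) for _ in range(n - 1)]
    changes = [0] * n
    change = True
    while change:
        change = False
        for i in range(1, n):
            temp = set(all_nodes)
            for p in preds[i]:
                temp &= dom[p]
            temp |= {i}
            if temp != dom[i]:
                assert temp < dom[i], "value moved up the lattice"   # strict descent
                dom[i] = temp
                changes[i] += 1
                change = True
    return dom, changes

# Hypothetical CFG: 0 -> 1 -> 2 -> 3, with a back edge 2 -> 1
preds = [[], [0, 2], [1], [2]]
dom, changes = dom_with_trace(preds)
assert all(c <= len(preds) for c in changes)
print(changes)   # [0, 1, 1, 0]
```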

Iterative Data-flow Analysis
Correctness (What does it compute?)
- If every $f_n \in F$ is monotone, i.e., $x \le y \Rightarrow f(x) \le f(y)$, and
- If the semilattice is bounded, i.e., every descending chain is finite
  - A chain is a sequence $x_1, x_2, \ldots, x_n$ where $x_i \in L$, $1 \le i \le n$
  - $x_i > x_{i+1}$, $1 \le i < n$ $\Rightarrow$ the chain is descending
Given a bounded semilattice S and a monotone function space F:
- $\exists k$ such that $f^k(\bot) = f^j(\bot)$, $\forall j > k$
  - $f^k(\bot)$ is called the least fixed-point of f over S
- If L has a $\top$, then $\exists k$ such that $f^k(\top) = f^j(\top)$, $\forall j > k$, and $f^k(\top)$ is called the maximal fixed-point of f over S (optimism)
$f^k(x)$ is the application of f to x, k times.
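A tiny example shows why the starting point matters when a function has more than one fixed point. The sketch assumes a single monotone function on the powerset of {p, q} ordered by ⊆ (so ⊥ = ∅ and ⊤ = {p, q}); the function and the helper `iterate` are hypothetical. Iterating from ⊥ stops at the least fixed point, while iterating from ⊤ stops at the maximal one.

```python
def iterate(f, x):
    """Apply f repeatedly until f(x) == x; return that fixed point f^k(x)."""
    while True:
        y = f(x)
        if y == x:
            return x
        x = y

def f(x):
    return x | {"p"}          # monotone: x ⊆ y implies f(x) ⊆ f(y); fixed points {p} and {p, q}

bottom, top = frozenset(), frozenset({"p", "q"})
print(sorted(iterate(f, bottom)))   # ['p']        least fixed point
print(sorted(iterate(f, top)))      # ['p', 'q']   maximal fixed point
```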

Iterative Data-flow Analysis
Correctness
- If every $f_n \in F$ is monotone, i.e., $f(x \wedge y) \le f(x) \wedge f(y)$, and
- If the lattice is bounded, i.e., every descending chain is finite
  - A chain is a sequence $x_1, x_2, \ldots, x_n$ where $x_i \in L$, $1 \le i \le n$
  - $x_i > x_{i+1}$, $1 \le i < n$ $\Rightarrow$ the chain is descending
Then
- The round-robin algorithm computes a least fixed-point (LFP)
- The uniqueness of the solution depends on other properties of F
  - Unique solution $\Rightarrow$ it finds the one we want
  - Multiple solutions $\Rightarrow$ we need to know which one it finds

Iterative Data-flow Analysis
Correctness
Does the iterative algorithm compute the desired answer?
Admissible Function Spaces
1. $\forall f \in F, \ \forall x, y \in L, \ f(x \wedge y) = f(x) \wedge f(y)$
2. $\exists f_i \in F$ such that $\forall x \in L, \ f_i(x) = x$
3. $f, g \in F \Rightarrow \exists h \in F$ such that $h(x) = f(g(x))$
4. $\forall x \in L, \ \exists$ a finite subset $H \subseteq F$ such that $x = \bigwedge_{f \in H} f(\bot)$
If F meets these four conditions, then an instance of the problem will have a unique fixed-point solution (instance = graph + initial values)
- LFP = MFP = MOP
- Order of evaluation does not matter
Both DOM & LIVE meet all four criteria.
If meet does not distribute over function application, then the fixed-point solution may not be unique. The iterative algorithm will find a LFP.
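Conditions 2 and 3 can be checked concretely for gen/kill functions $f(x) = gen \cup (x - kill)$, the form LIVE uses with gen = UEVAR and kill = VARKILL. Representing each function by its (gen, kill) pair is an assumption made for illustration, as are the helper names `apply_fn` and `compose`. The identity is (∅, ∅), and composing two such functions yields another one: $f_2(f_1(x)) = (gen_2 \cup (gen_1 - kill_2)) \cup (x - (kill_1 \cup kill_2))$. The code verifies this extensionally over a three-name universe.

```python
from itertools import combinations, product

def apply_fn(fn, x):
    """Apply the gen/kill function represented by (gen, kill) to the set x."""
    gen, kill = fn
    return gen | (x - kill)

def compose(f2, f1):
    """Return the (gen, kill) pair representing f2 ∘ f1."""
    (g1, k1), (g2, k2) = f1, f2
    return (g2 | (g1 - k2), k1 | k2)

UNIVERSE = "xyz"                                  # hypothetical set of names
L = [frozenset(c) for r in range(len(UNIVERSE) + 1) for c in combinations(UNIVERSE, r)]
identity = (frozenset(), frozenset())

for g1, k1, g2, k2 in product(L, repeat=4):
    f1, f2 = (g1, k1), (g2, k2)
    h = compose(f2, f1)
    for x in L:
        assert apply_fn(h, x) == apply_fn(f2, apply_fn(f1, x))   # condition 3
        assert apply_fn(identity, x) == x                         # condition 2
print("composition closed over", len(L) ** 4, "pairs of functions")
```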

Iterative Data-flow Analysis
If a data-flow framework meets those admissibility conditions, then it has a unique fixed-point solution
- The iterative algorithm finds the (best) answer
- The solution does not depend on the order of computation
- The algorithm can choose an order that converges quickly
Intuition
- Choose an order that propagates changes as far as possible on each "sweep"
  - Process a node's predecessors before the node
- Cycles pose problems, of course
  - Ignore back edges when computing the order?
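Order independence and the payoff of a good order can both be seen on DOM. This sketch assumes a hypothetical CFG, 0 → 1 → 2 → 3 → 4 with a back edge 4 → 1, and a hypothetical helper `dom_in_order` that runs the round-robin solver in a caller-chosen order. Every order reaches the same fixed point, but the number of sweeps differs; the count includes the final sweep that only confirms nothing changed.

```python
import random

def dom_in_order(preds, order):
    """Round-robin DOM visiting nodes in `order`; returns (DOM sets, sweeps)."""
    N = set(range(len(preds)))
    dom = {n: ({0} if n == 0 else set(N)) for n in N}
    sweeps, change = 0, True
    while change:
        change, sweeps = False, sweeps + 1
        for i in order:
            if i == 0:
                continue                   # the entry's set is fixed
            temp = set(N)
            for p in preds[i]:
                temp &= dom[p]
            temp |= {i}
            if temp != dom[i]:
                dom[i], change = temp, True
    return dom, sweeps

preds = [[], [0, 4], [1], [2], [3]]        # 0 -> 1 -> 2 -> 3 -> 4 -> 1
forward, fwd_sweeps = dom_in_order(preds, [0, 1, 2, 3, 4])
backward, bwd_sweeps = dom_in_order(preds, [4, 3, 2, 1, 0])
assert forward == backward                 # same answer, different cost
print(fwd_sweeps, bwd_sweeps)              # 2 4

rng = random.Random(512)                   # any fixed seed; orders are arbitrary
for _ in range(20):
    order = list(range(5))
    rng.shuffle(order)
    assert dom_in_order(preds, order)[0] == forward
```

Visiting predecessors first (the forward order here) lets each change reach its successors within the same sweep, which is the intuition behind the ordering on the next slide.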

Ordering the Nodes to Maximize Propagation
[Figure panels: Postorder | Reverse Postorder]
- Reverse postorder visits a node's predecessors before visiting the node (except along back edges)
  - Reverse postorder number = N + 1 - postorder number
- For backward problems, use reverse postorder on the reverse CFG
  - This is not, in general, the same as reverse preorder on the CFG
See exercise 9.4 in EaC2e for an example.
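The orderings can be computed with one depth-first search. This is a sketch, assuming a CFG given as a dict of successor lists with a single entry (and, for the backward case, a single exit); the helper names are hypothetical. Postorder numbers run from 1 to N, so a node's reverse postorder (RPO) number is N + 1 minus its postorder number. For a backward problem, the same routine runs on the reverse CFG, starting from the exit.

```python
def postorder(succ, entry):
    """Return the nodes reachable from entry in DFS postorder."""
    seen, order = set(), []
    def dfs(n):
        seen.add(n)
        for s in succ[n]:
            if s not in seen:
                dfs(s)
        order.append(n)                 # a node finishes after all its successors
    dfs(entry)
    return order

def rpo_numbers(succ, entry):
    """Map each node to N + 1 - its postorder number (postorder numbers start at 1)."""
    order = postorder(succ, entry)
    N = len(order)
    return {n: N + 1 - post for post, n in enumerate(order, start=1)}

def reverse_cfg(succ):
    rev = {n: [] for n in succ}
    for n, targets in succ.items():
        for s in targets:
            rev[s].append(n)
    return rev

# Hypothetical CFG: 0 -> {1, 2} -> 3 -> 4, with a back edge 3 -> 1; 4 is the exit
cfg = {0: [1, 2], 1: [3], 2: [3], 3: [1, 4], 4: []}
rpo = rpo_numbers(cfg, 0)
print(sorted(cfg, key=rpo.get))          # [0, 2, 1, 3, 4]   forward problems
rev_rpo = rpo_numbers(reverse_cfg(cfg), 4)
print(sorted(cfg, key=rev_rpo.get))      # [4, 3, 2, 1, 0]   backward problems
```

The recursive search is fine for small examples; a production version would use an explicit stack to avoid Python's recursion limit on large CFGs.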