SSA and CPS CS153: Compilers Greg Morrisett

Monadic Form vs CFGs

Consider CFG available-expression analysis:

  statement        gen's            kill's
  x := v1 p v2     x := v1 p v2     {y := e | x = y or x in e}

When variables are immutable, this simplifies to:

  statement        gen's            kill's
  x := v1 p v2     x := v1 p v2     {}

(Assumes variable names are unique.)

Monadic Form vs CFGs

Almost all dataflow analyses simplify when variables are defined only once:
–no kill sets in the dataflow analysis
–the code can be interpreted as either functional or imperative
Our monadic form had this property, which made many of the optimizations simpler (e.g., we just keep a set of available definitions that only ever grows).
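A minimal sketch of that last point: for straight-line code in single-assignment form, available definitions only ever accumulate, so the analysis needs no kill sets at all. The (dest, rhs) pair representation below is a made-up mini-IR, purely for illustration.

```python
# Available-definitions analysis over straight-line single-assignment
# code: each statement gens its own definition and kills nothing, so
# the available set grows monotonically.
def available_definitions(stmts):
    avail = set()
    before = []                  # snapshot of the set before each stmt
    for dest, rhs in stmts:
        before.append(frozenset(avail))
        avail.add((dest, rhs))   # gen; there is never anything to kill
    return before

code = [("x0", "n + 1"), ("y0", "x0 * 2"), ("z0", "x0 + y0")]
```

With mutable variables, a redefinition of x0 would force us to remove every definition mentioning x0; here that case simply cannot arise.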

On the other hand… CFGs have their own advantages over monadic form: they support control-flow graphs, not just trees.

Monadic form:                CFG:
if b < c then                    if b < c goto L1
  let x1 = e1                    x1 := e1
      x2 = e2                    x2 := e2
      x3 = x1 + x2               goto L2
  in x3                      L1: x1 := e4
else                             x2 := e5
  let x1 = e4                L2: x3 := x1 + x2
      x2 = e5
      x3 = x1 + x2
  in x3

Best of both worlds… Static Single Assignment (SSA) –CFGs but with functional variables –A slight “hack” to make graphs work out –Now widely used (e.g., LLVM). –Intra-procedural representation only. Continuation Passing Style (CPS) –Inter-procedural representation. –So slightly more general. –Used by FP compilers (e.g., SML/NJ).

The idea behind SSA

Start with CFG code, give each definition a fresh name, and propagate the fresh name to subsequent uses.

x := n            x0 := n
y := m            y0 := m
x := x + y        x1 := x0 + y0
return x          return x1
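For straight-line code, this renaming step is a one-pass rewrite. The sketch below uses a hypothetical (dest, op, args) triple representation, not the course's actual IR: each definition gets a fresh index, and each use is rewritten to the latest index of its variable.

```python
# Rename straight-line code into SSA: every definition of v becomes a
# fresh v0, v1, ... and every use refers to the latest version.
def to_ssa(stmts):
    version = {}                          # variable -> latest index
    def use(v):
        # free variables (like n, m) have no definition and stay as-is
        return f"{v}{version[v]}" if v in version else v
    out = []
    for dest, op, args in stmts:
        args = [use(a) for a in args]     # rename uses first
        version[dest] = version.get(dest, -1) + 1
        out.append((f"{dest}{version[dest]}", op, args))
    return out

prog = [("x", "copy", ["n"]),
        ("y", "copy", ["m"]),
        ("x", "+", ["x", "y"])]
```

Running `to_ssa(prog)` reproduces the slide's example: x := n becomes x0 := n, and the second definition of x becomes x1 := x0 + y0.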

The problem… What do we do with this?

x := n
y := m
if x < y
  (then) x := x+1
         y := y-1
  (else) y := x+2
(join)   z := x * y
         return z

The problem… In particular, what happens at join points?

x0 := n
y0 := m
if x0 < y0
  (then) x1 := x0+1
         y1 := y0-1
  (else) y2 := x0+2
(join)   z0 := x? * y?
         return z0

The solution: “phony” nodes

x0 := n
y0 := m
if x0 < y0
  (then) x1 := x0+1
         y1 := y0-1
  (else) y2 := x0+2
(join)   x2 := φ(x1,x0)
         y3 := φ(y1,y2)
         z0 := x2 * y3
         return z0

The solution: “phony” nodes

x0 := n
y0 := m
if x0 < y0
  (then) x1 := x0+1
         y1 := y0-1
  (else) y2 := x0+2
(join)   x2 := φ(x1,x0)
         y3 := φ(y1,y2)
         z0 := x2 * y3
         return z0

A φ-node is a phony “use” of a variable. From an analysis standpoint, it’s as if an oracle chooses to set x2 to either x1 or x0 based on how control got here.

Variant: gated SSA

x0 := n
y0 := m
c0 := x0 < y0
if c0
  (then) x1 := x0+1
         y1 := y0-1
  (else) y2 := x0+2
(join)   x2 := φ(c0 ? x1 : x0)
         y3 := φ(c0 ? y1 : y2)
         z0 := x2 * y3
         return z0

Use a functional “if” based on the tests that brought you here.

Back to normal SSA

x0 := n
y0 := m
if x0 < y0
  (then) x1 := x0+1
         y1 := y0-1
  (else) y2 := x0+2
(join)   x2 := φ(x1,x0)
         y3 := φ(y1,y2)
         z0 := x2 * y3
         return z0

Compilers often build an overlay graph that connects definitions to uses (“def-use” chains). Then information really flows along these SSA def-use edges (e.g., constant propagation). There is some practical benefit to SSA def-use edges over the plain CFG.
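Building those def-use chains is particularly easy in SSA: since each variable has exactly one definition, the overlay graph is just a map from each name to the statements that use it. A sketch, using an illustrative (dest, args) mini-IR (not the course's actual representation):

```python
# Def-use chains for SSA code: each SSA name maps to the (indices of)
# statements that use it. No kill tracking is needed because a name
# is never redefined.
def def_use_chains(stmts):
    uses = {}
    for i, (dest, args) in enumerate(stmts):
        uses.setdefault(dest, [])     # every definition gets an entry
        for a in args:
            if a in uses:             # a use of an earlier SSA def
                uses[a].append(i)
    return uses

ssa = [("x0", []), ("y0", []), ("x1", ["x0", "y0"]), ("z0", ["x1", "y0"])]
```

A sparse analysis like constant propagation can then push facts directly along these edges instead of re-walking the whole CFG.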

Two Remaining Issues

x0 := n
y0 := m
if x0 < y0
  (then) x1 := x0+1
         y1 := y0-1
  (else) y2 := x0+2
(join)   x2 := φ(x1,x0)
         y3 := φ(y1,y2)
         z0 := x2 * y3
         return z0

How do we generate SSA from the CFG representation? How do we generate the CFG (or MIPS code) from the SSA?

SSA back to CFG

x0 := n
y0 := m
if x0 < y0
  (then) x1 := x0+1
         y1 := y0-1
  (else) y2 := x0+2
(join)   x2 := φ(x1,x0)
         y3 := φ(y1,y2)
         z0 := x2 * y3
         return z0

Just realize the assignments corresponding to the phi nodes on the edges.

SSA back to CFG

x0 := n
y0 := m
if x0 < y0
  (then) x1 := x0+1
         y1 := y0-1
         x2 := x1
         y3 := y1
  (else) y2 := x0+2
         x2 := x0
         y3 := y2
(join)   z0 := x2 * y3
         return z0

Just realize the assignments corresponding to the phi nodes on the edges.

SSA back to CFG

x0 := n
y0 := m
if x0 < y0
  (then) x1 := x0+1
         y1 := y0-1
         x2 := x1
         y3 := y1
  (else) y2 := x0+2
         x2 := x0
         y3 := y2
(join)   z0 := x2 * y3
         return z0

You can always rely upon either a copy-propagation pass or a coalescing register allocator to get rid of all of these copies. But this can blow up the size of the code considerably, so there are better algorithms that try to avoid this.
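The copy-propagation cleanup can be sketched for a single block: rewrite uses of each copy's target to its source, then drop the copy. This is safe only under the simplifying assumption that copy targets are not live outside the block; the (dest, op, args) mini-IR is again purely illustrative.

```python
# Copy propagation over one block: "copy" statements record a
# substitution and are removed; later uses see the source directly.
# Assumes copy targets are dead outside this block.
def propagate_copies(stmts):
    repl = {}                              # copy target -> its source
    out = []
    for dest, op, args in stmts:
        args = [repl.get(a, a) for a in args]
        if op == "copy":
            repl[dest] = args[0]           # remember, and drop the copy
        else:
            out.append((dest, op, args))
    return out

block = [("x2", "copy", ["x1"]),
         ("y3", "copy", ["y1"]),
         ("z0", "*", ["x2", "y3"])]
```

On the then-edge of the example above, this turns the three statements into the single z0 := x1 * y1.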

Naïve Conversion to SSA

Insert phi nodes in each basic block except the start node. Calculate the dominator tree. Then, traversing the dominator tree in a breadth-first fashion:
–give each definition of x a fresh index
–propagate that index to each use of x that’s not killed by a subsequent definition
–propagate the last definition of x to the successors’ phi nodes

Example:

B1: x := n
    y := m
    a := 0
    goto B3

B2: a := a + y
    x := x – 1
    goto B3

B3: if x > 0 then B2 else B4

B4: z := a + y
    return z

Insert phi

B1: x := n
    y := m
    a := 0
    goto B3

B2: x := φ(x)
    y := φ(y)
    a := φ(a)
    a := a + y
    x := x – 1
    goto B3

B3: x := φ(x,x)
    y := φ(y)
    a := φ(a,a)
    if x > 0 then B2 else B4

B4: x := φ(x)
    y := φ(y)
    a := φ(a)
    z := a + y
    return z

Dominators

(Dominator tree: B1 dominates everything; B3 dominates B2 and B4.)

B1: x := n
    y := m
    a := 0
    goto B3

B2: x := φ(x)
    y := φ(y)
    a := φ(a)
    a := a + y
    x := x – 1
    goto B3

B3: x := φ(x,x)
    y := φ(y)
    a := φ(a,a)
    if x > 0 then B2 else B4

B4: x := φ(x)
    y := φ(y)
    a := φ(a)
    z := a + y
    return z

Successors

B1: x0 := n
    y0 := m
    a0 := 0
    goto B3

B2: x := φ(x)
    y := φ(y)
    a := φ(a)
    a := a + y
    x := x – 1
    goto B3

B3: x := φ(x0,x)
    y := φ(y0)
    a := φ(a0,a)
    if x > 0 then B2 else B4

B4: x := φ(x)
    y := φ(y)
    a := φ(a)
    z := a + y
    return z

Next Block

B1: x0 := n
    y0 := m
    a0 := 0
    goto B3

B2: x := φ(x)
    y := φ(y)
    a := φ(a)
    a := a + y
    x := x – 1
    goto B3

B3: x1 := φ(x0,x)
    y1 := φ(y0)
    a1 := φ(a0,a)
    if x1 > 0 then B2 else B4

B4: x := φ(x)
    y := φ(y)
    a := φ(a)
    z := a + y
    return z

Successors

B1: x0 := n
    y0 := m
    a0 := 0
    goto B3

B2: x := φ(x1)
    y := φ(y1)
    a := φ(a1)
    a := a + y
    x := x – 1
    goto B3

B3: x1 := φ(x0,x)
    y1 := φ(y0)
    a1 := φ(a0,a)
    if x1 > 0 then B2 else B4

B4: x := φ(x1)
    y := φ(y1)
    a := φ(a1)
    z := a + y
    return z

Next Block

B1: x0 := n
    y0 := m
    a0 := 0
    goto B3

B2: x2 := φ(x1)
    y2 := φ(y1)
    a2 := φ(a1)
    a3 := a2 + y2
    x3 := x2 – 1
    goto B3

B3: x1 := φ(x0,x)
    y1 := φ(y0)
    a1 := φ(a0,a)
    if x1 > 0 then B2 else B4

B4: x := φ(x1)
    y := φ(y1)
    a := φ(a1)
    z := a + y
    return z

Successors

B1: x0 := n
    y0 := m
    a0 := 0
    goto B3

B2: x2 := φ(x1)
    y2 := φ(y1)
    a2 := φ(a1)
    a3 := a2 + y2
    x3 := x2 – 1
    goto B3

B3: x1 := φ(x0,x3)
    y1 := φ(y0)
    a1 := φ(a0,a3)
    if x1 > 0 then B2 else B4

B4: x := φ(x1)
    y := φ(y1)
    a := φ(a1)
    z := a + y
    return z

Last Block

B1: x0 := n
    y0 := m
    a0 := 0
    goto B3

B2: x2 := φ(x1)
    y2 := φ(y1)
    a2 := φ(a1)
    a3 := a2 + y2
    x3 := x2 – 1
    goto B3

B3: x1 := φ(x0,x3)
    y1 := φ(y0)
    a1 := φ(a0,a3)
    if x1 > 0 then B2 else B4

B4: x4 := φ(x1)
    y4 := φ(y1)
    a4 := φ(a1)
    z0 := a4 + y4
    return z0

Key Problem

B1: x0 := n
    y0 := m
    a0 := 0
    goto B3

B2: x2 := φ(x1)
    y2 := φ(y1)
    a2 := φ(a1)
    a3 := a2 + y2
    x3 := x2 – 1
    goto B3

B3: x1 := φ(x0,x3)
    y1 := φ(y0)
    a1 := φ(a0,a3)
    if x1 > 0 then B2 else B4

B4: x4 := φ(x1)
    y4 := φ(y1)
    a4 := φ(a1)
    z0 := a4 + y4
    return z0

Quadratic in the size of the original graph!

Key Problem

B1: x0 := n
    y0 := m
    a0 := 0
    goto B3

B2: x2 := φ(x1)
    y2 := φ(y1)
    a2 := φ(a1)
    a3 := a2 + y2
    x3 := x2 – 1
    goto B3

B3: x1 := φ(x0,x3)
    y1 := φ(y0)
    a1 := φ(a0,a3)
    if x1 > 0 then B2 else B4

B4: x4 := φ(x1)
    y4 := φ(y1)
    a4 := φ(a1)
    z0 := a4 + y4
    return z0

Could clean up using copy propagation and dead code elimination.

Key Problem

B1: x0 := n
    y0 := m
    a0 := 0
    goto B3

B2: a3 := a1 + y0
    x3 := x1 – 1
    goto B3

B3: x1 := φ(x0,x3)
    a1 := φ(a0,a3)
    if x1 > 0 then B2 else B4

B4: z0 := a1 + y0
    return z0

Could clean up using copy propagation and dead code elimination.

Smarter Algorithm

Compute the dominance frontier, and use it to place the phi nodes:
–if a block B defines x, then put a phi node in every block in the dominance frontier of B.
Then do a renaming pass using the dominator tree.

This isn’t optimal, but in practice it produces code that is linear in the size of the input and is efficient to compute.
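The placement step can be sketched as a worklist over the dominance frontier. Note the subtlety the loop handles: an inserted φ is itself a new definition of x, so its block's frontier must be processed too (the iterated dominance frontier). The block names and the DF map below are a hypothetical four-block loop (entry B1, body B2, header B3, exit B4), not taken from the course materials.

```python
# Place phi nodes: for each variable, insert a φ in every block of the
# iterated dominance frontier of its definition sites. DF is assumed
# to have been computed already (e.g., by computeDF).
def place_phis(def_sites, DF):
    phis = {}
    for var, sites in def_sites.items():
        work = list(sites)
        placed = set()
        while work:
            b = work.pop()
            for d in DF.get(b, ()):
                if d not in placed:
                    placed.add(d)
                    work.append(d)    # the new φ is itself a definition
        phis[var] = placed
    return phis

# Hypothetical loop CFG: B1 -> B3, B2 -> B3, B3 -> {B2, B4}
DF = {"B1": set(), "B2": {"B3"}, "B3": {"B3"}, "B4": set()}
```

With x defined in B1 and B2, the only φ for x lands in the loop header B3, exactly where two definitions of x can merge.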

Dominance Frontiers

Defn: d dominates n if every path from the start node to n must go through d.
Defn: if d dominates n and d ≠ n, we say d strictly dominates n.
Defn: the dominance frontier of x is the set of all nodes w such that:
1. x dominates a predecessor of w
2. x does not strictly dominate w

Example (Fig 19.5)

Node 5 dominates 5, 6, 7, and 8.

Example (Fig 19.5)

These are the CFG edges that cross out of the subtree of the dominator tree rooted at 5.

Example (Fig 19.5) This identifies nodes that satisfy the first condition: nodes that have some predecessor dominated by 5.

Example (Fig 19.5)

Node 5 does not strictly dominate any of the targets of these edges.

Computing Dominance Frontiers

local[n]: successors of n not strictly dominated by n.
up[n]: nodes in the dominance frontier of n that are not strictly dominated by n’s immediate dominator.

DF[n] = local[n] ∪ { up[c] | c in children[n] }

Algorithm

computeDF[n] =
  S := {}
  (* compute local[n] *)
  for each y in succ[n]
    if immediate_dominator(y) ≠ n
      S := S ∪ {y}
  for each child c of n in the dominator tree
    computeDF[c]
    (* compute up[c] *)
    for each w in DF[c]
      if n does not dominate w, or n = w
        S := S ∪ {w}
  DF[n] := S
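A direct Python transcription of computeDF, for concreteness. Here succ is the CFG successor map, idom maps each node to its immediate dominator, and children is the dominator tree; all three are assumed to have been computed by an earlier pass, and the diamond CFG at the bottom is just a small test case.

```python
# a dominates b iff a appears on b's path to the root of the
# dominator tree (every node dominates itself).
def dominates(a, b, idom):
    while b is not None:
        if a == b:
            return True
        b = idom.get(b)
    return False

def compute_df(n, succ, idom, children, DF):
    S = set()
    for y in succ.get(n, ()):            # local[n]
        if idom.get(y) != n:
            S.add(y)
    for c in children.get(n, ()):        # up[c] for each tree child c
        compute_df(c, succ, idom, children, DF)
        for w in DF[c]:
            if not dominates(n, w, idom) or n == w:
                S.add(w)                 # n does not strictly dominate w
    DF[n] = S

# Diamond CFG: A -> B, A -> C, B -> D, C -> D
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
idom = {"B": "A", "C": "A", "D": "A"}
children = {"A": ["B", "C", "D"]}
DF = {}
compute_df("A", succ, idom, children, DF)
```

In the diamond, the join node D is in the frontier of both B and C (each dominates a predecessor of D but not D itself), while A's frontier is empty.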

A few notes

The algorithm does work proportional to the number of edges in the control flow graph plus the size of the dominance frontiers.
–pathological cases can lead to quadratic behavior
–in practice, it is linear

Everything depends upon computing the dominator tree.
–the iterative dataflow algorithm is cubic in the worst case
–but Lengauer & Tarjan give an essentially linear-time algorithm
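For reference, the iterative dataflow formulation mentioned above is short enough to sketch directly: it is not Lengauer-Tarjan, but it is easy to write and fast on realistic CFGs. The set-based version below computes, for every node, the full set of its dominators as a greatest fixed point.

```python
# Iterative dominator computation:
#   dom[entry] = {entry}
#   dom[n]     = {n} ∪ (intersection of dom[p] over predecessors p)
# iterated to a fixed point, starting from the full node set.
def dominators(nodes, preds, entry):
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry:
                continue
            ps = [dom[p] for p in preds.get(n, [])]
            new = {n} | (set.intersection(*ps) if ps else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

# Same diamond CFG: A -> B, A -> C, B -> D, C -> D
preds = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
```

The immediate dominator of n is then the unique closest strict dominator; production compilers compute idom directly over a reverse-postorder walk instead of materializing these sets.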

CPS: Recall

x0 := n
y0 := m
if x0 < y0
  (then) x1 := x0+1
         y1 := y0-1
         x2 := x1
         y3 := y1
  (else) y2 := x0+2
         x2 := x0
         y3 := y2
(join)   z0 := x2 * y3
         return z0

Just realize the assignments corresponding to the phi nodes on the edges.

CPS

λn,m. let x0 = n in
      let y0 = m in
      if x0 < y0 then f1(x0,y0)
      else f2(x0,y0)

λx0,y0. let x1 = x0+1 in
        let y1 = y0-1 in
        f3(x1,y1)

λx0,y0. let y2 = x0+2 in
        f3(x0,y2)

λx2,y3. let z0 = x2 * y3 in
        return z0

CPS less compact than SSA

Can always encode SSA. But it requires us to match up a block’s output variables to its successor’s input variables: a call f(v1,…,x,…,vn) against a definition λx1,…,x,…,xn. It’s possible to avoid some of this threading, but not as much as in SSA.
–worst case is again quadratic in the size of the code
–CPS: tree-based scope for variables
–SSA: graph-based scope for variables

CPS more powerful than SSA On the other hand, CPS supports dynamic control flow graphs. –e.g., “goto x” where x is a variable, not a static label name. That makes it possible to encode strictly more than SSA can. –return addresses (no need for special return instruction – just goto return address.) –exceptions, function calls, loops, etc. all turn into just “goto f(x1,…,xn)”.

Core CPS language

op  ::= x | true | false | i | …
v   ::= op | λx1,…,xn.exp | prim(op1,…,opn)
exp ::= op(op1,…,opn)
      | if cond(op1,op2) exp1 else exp2
      | let x = v in exp
      | letrec x1 = v1,…, xn = vn in exp

CPS

let x0 = n in
let y0 = m in
if x0 < y0 then f1(x0,y0) else f2(x0,y0)

let f1 = λx0,y0.
  let x1 = x0+1 in
  let y1 = y0-1 in
  f3(x1,y1)

let f2 = λx0,y0.
  let y2 = x0+2 in
  f3(x0,y2)

let f3 = λx2,y3.
  let z0 = x2 * y3 in
  return z0
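The CPS program above can be transcribed almost literally into a language with first-class functions: each let-bound function becomes an ordinary function, and every control transfer is a (tail) call, so there is no CFG to speak of. A sketch in Python:

```python
# The CPS example, transcribed: f1/f2 are the two branch blocks and
# f3 is the join block; "goto f3(...)" is just a call to f3.
def cps_example(n, m):
    def f3(x2, y3):
        z0 = x2 * y3
        return z0                 # "return z0"
    def f1(x0, y0):
        x1 = x0 + 1
        y1 = y0 - 1
        return f3(x1, y1)         # goto f3(x1, y1)
    def f2(x0, y0):
        y2 = x0 + 2
        return f3(x0, y2)         # goto f3(x0, y2)
    x0, y0 = n, m
    return f1(x0, y0) if x0 < y0 else f2(x0, y0)
```

Note how the φ-node has disappeared: the join block f3 simply receives x2 and y3 as parameters, and each predecessor passes the values it computed.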

Dataflow SSA/CFG vs CPS

To solve dataflow equations for a CFG or SSA, we iterate over the control flow graph. But for CPS, we don’t know what the graph is (in general) at compile time.
–our “successors” and “predecessors” depend upon which values flow into the variables we jump to
–to figure out which functions we might jump to, we need to do dataflow analysis…
–oops!

Simultaneous Solution: CFA In general, we must simultaneously solve for (an approximation of) the dynamic control flow graph, and the set of values that a variable might take on. This is called control-flow analysis (CFA). The good news: if you solve this, then you don’t need lots of special cases in your dataflow analyses (e.g., for function calls/returns, dynamic method resolution, exceptions, threads, etc.) The bad news: must use very coarse approximations to scale.