Dataflow II: Finish Dataflow Analysis, Start on Classical Optimizations EECS 483 – Lecture 24 University of Michigan Wednesday, November 29, 2006.

Presentation transcript:


Announcements and Reading
- Project 3: you should have started work on this
- Schedule for the rest of the semester
  - Today: Dataflow analysis
  - Wed 11/29: Finish dataflow, optimizations
  - Mon 12/4: Optimizations, start on register allocation
  - Wed 12/6: Register allocation, Exam 2 review
  - Mon 12/11: Exam 2 in class
  - Wed 12/13: No class (Project 3 due)
- Reading for today's class
  - 10.5, , 10.11

Class Problem – From Last Time

Reaching definitions: calculate GEN/KILL, then calculate IN/OUT.

    1: r1 = 3
    2: r2 = r3
    3: r3 = r4
    4: r1 = r…
    5: r7 = r1 * r2
    6: r2 = 0
    7: r2 = r…
    8: r4 = r2 + r1
    9: r9 = r4 + r8

Results per basic block, identified by the ops it contains. Where a set changes between passes, an arrow separates the earlier value from the later one.

    Block ops | GEN   | KILL  | IN                             | OUT
    1,2,3     | 1,2,3 | 4,6,7 | ∅                              | 1,2,3
    4,5       | 4,5   | 1     | 1,2,3 -> 1,2,3,4,5,6,7,8       | 2,3,4,5 -> 2,3,4,5,6,7,8
    6         | 6     | 2,7   | 2,3,4,5 -> 2,3,4,5,6,7,8       | 3,4,5,6 -> 3,4,5,6,8
    7         | 7     | 2,6   | 2,3,4,5 -> 2,3,4,5,6,7,8       | 3,4,5,7 -> 3,4,5,7,8
    8         | 8     | ∅     | 3,4,5,6,7 -> 3,4,5,6,7,8       | 3,4,5,6,7,8 -> 3,4,5,6,7,8
    9         | 9     | ∅     | 3,4,5,6,7,8                    | 3,4,5,6,7,8,9

Some Things to Think About
- Liveness and reaching defs are basically the same thing!
  - All dataflow is basically the same, with a few parameters
    - Meaning of gen/kill (use/def)
    - Backward / forward
    - All paths / some paths (must/may)
      - So far, we have looked at may analysis algorithms
      - How do you adjust to do must algorithms?
- Dataflow can be slow
  - How to implement it efficiently? (Block traversal order can speed things up)
  - How to represent the info? (Bitvectors)

Generalizing Dataflow Analysis
- Transfer function
  - How information is changed by "something" (a BB)
  - OUT = GEN + (IN - KILL)   forward analysis
  - IN = GEN + (OUT - KILL)   backward analysis
- Meet function
  - How information from multiple paths is combined
  - IN = Union(OUT(predecessors))   forward analysis
  - OUT = Union(IN(successors))   backward analysis
  - Note: this is only for "any path" problems

Generalized Dataflow Algorithm

    change = true
    while (change)
        change = false
        for each BB
            apply meet function
            apply transfer function
            if any changes
                change = true
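The loop above can be sketched in Python for the forward, any-path case. This is a minimal illustration, not the course's implementation: the function name `solve_forward`, the dictionary-of-sets representation, and the three-block loop with definitions `d1`..`d3` are all assumptions made for the example.

```python
def solve_forward(blocks, preds, gen, kill):
    """Iterate meet (union) and transfer until no IN/OUT set changes."""
    in_sets = {b: set() for b in blocks}
    out_sets = {b: set() for b in blocks}
    change = True
    while change:
        change = False
        for b in blocks:
            # Meet: IN = Union(OUT(predecessors))
            new_in = set()
            for p in preds[b]:
                new_in |= out_sets[p]
            # Transfer: OUT = GEN + (IN - KILL)
            new_out = gen[b] | (new_in - kill[b])
            if new_in != in_sets[b] or new_out != out_sets[b]:
                in_sets[b], out_sets[b] = new_in, new_out
                change = True
    return in_sets, out_sets


# Hypothetical CFG: A -> B, B -> B (loop), B -> C.
blocks = ["A", "B", "C"]
preds = {"A": [], "B": ["A", "B"], "C": ["B"]}
# Hypothetical definitions: d1 and d2 define the same variable, d3 another.
gen = {"A": {"d1"}, "B": {"d2"}, "C": {"d3"}}
kill = {"A": {"d2"}, "B": {"d1"}, "C": set()}

reach_in, reach_out = solve_forward(blocks, preds, gen, kill)
print(reach_in["B"], reach_in["C"], reach_out["C"])
```

A backward analysis would swap the roles of IN/OUT and use successors instead of predecessors; an all-path analysis would replace the union with an intersection and change the initialization.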

Liveness Using GEN/KILL
- Liveness = upward exposed uses

    for each basic block in the procedure, X, do
        up_use_GEN(X) = 0
        up_use_KILL(X) = 0
        for each operation in reverse sequential order in X, op, do
            for each destination operand of op, dest, do
                up_use_GEN(X) -= dest
                up_use_KILL(X) += dest
            endfor
            for each source operand of op, src, do
                up_use_GEN(X) += src
                up_use_KILL(X) -= src
            endfor
        endfor
    endfor

Example - Liveness with GEN/KILL

    meet: OUT = Union(IN(succs))
    xfer: IN = GEN + (OUT - KILL)

    BB1:
        r1 = MEM[r2+0]
        r2 = r2 + 1
        r3 = r1 * r4
      up_use_GEN(1) = r2,r4      up_use_KILL(1) = r1,r3

    BB2:
        r1 = r1 + 5
        r3 = r5 - r1
        r7 = r3 * 2
      up_use_GEN(2) = r1,r5      up_use_KILL(2) = r3,r7

    BB3:
        r2 = 0
        r7 = 23
        r1 = 4
      up_use_GEN(3) = 0          up_use_KILL(3) = r1,r2,r7

    BB4 (sets shown after each op, walking in reverse order):
        r3 = r3 + r7     up_use_GEN(4.3) = r3,r7,r8   up_use_KILL(4.3) = r1
        r1 = r3 - r8     up_use_GEN(4.2) = r3,r8      up_use_KILL(4.2) = r1
        r3 = r1 * 2      up_use_GEN(4.1) = r1         up_use_KILL(4.1) = r3
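The reverse walk from the liveness pseudocode can be checked against BB1 and BB4 of this example with a short Python sketch. The `(dests, srcs)` tuple encoding and the function name `up_use_gen_kill` are assumptions chosen for illustration.

```python
def up_use_gen_kill(ops):
    """ops: list of (dests, srcs) in program order for one basic block."""
    gen, kill = set(), set()
    for dests, srcs in reversed(ops):
        # A definition hides any later use from the top of the block.
        for dest in dests:
            gen.discard(dest)
            kill.add(dest)
        # A use is upward exposed until an earlier definition is found.
        for src in srcs:
            gen.add(src)
            kill.discard(src)
    return gen, kill


# BB1: r1 = MEM[r2+0]; r2 = r2 + 1; r3 = r1 * r4
bb1 = [(["r1"], ["r2"]), (["r2"], ["r2"]), (["r3"], ["r1", "r4"])]
# BB4: r3 = r3 + r7; r1 = r3 - r8; r3 = r1 * 2
bb4 = [(["r3"], ["r3", "r7"]), (["r1"], ["r3", "r8"]), (["r3"], ["r1"])]

gen1, kill1 = up_use_gen_kill(bb1)
gen4, kill4 = up_use_gen_kill(bb4)
print(sorted(gen1), sorted(kill1))
print(sorted(gen4), sorted(kill4))
```

Note that r2 in BB1 ends up in GEN but not in KILL, because its use in `r1 = MEM[r2+0]` precedes its definition.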

Beyond Liveness (Upward Exposed Uses)
- Upward exposed defs
  - IN = GEN + (OUT - KILL)
  - OUT = Union(IN(successors))
  - Walk ops in reverse order
    - GEN += dest; KILL += dest
- Downward exposed uses
  - IN = Union(OUT(predecessors))
  - OUT = GEN + (IN - KILL)
  - Walk ops in forward order
    - GEN += src; KILL -= src
    - GEN -= dest; KILL += dest
- Downward exposed defs
  - IN = Union(OUT(predecessors))
  - OUT = GEN + (IN - KILL)
  - Walk ops in forward order
    - GEN += dest; KILL += dest

What About All Path Problems?
- Up to this point
  - Any path problems (maybe relations)
    - Definition reaches along some path
    - Some sequence of branches in which def reaches
    - Lots of defs of the same variable may reach a point
  - Use of Union operator in meet function
- All-path: Definition guaranteed to reach
  - Regardless of sequence of branches taken, def reaches
  - Can always count on this
  - Only 1 def of a given variable can be guaranteed to reach
  - Availability (as opposed to reaching)
    - Available definitions
    - Available expressions (could also have reaching expressions, but not that useful)

Reaching vs Available Definitions

    1: r1 = r2 + r3
    2: r6 = r4 - r5
    3: r4 = 4
    4: r6 = 8
    5: r6 = r2 + r3
    6: r7 = r4 - r5

Annotations from the figure (control flow graph not preserved):
- 1,2,3,4 reach; 1 available
- 1,2 reach; 1,2 available
- 1,3,4 reach; 1,3,4 available
- 1,2 reach; 1,2 available

Available Definition Analysis (Adefs)
- A definition d is available at a point p if along all paths from d to p, d is not killed
- Remember, a definition of a variable is killed between 2 points when there is another definition of that variable along the path
  - r1 = r2 + r3 kills previous definitions of r1
- Algorithm
  - Forward dataflow analysis, as propagation occurs from defs downwards
  - Use the Intersect function as the meet operator to guarantee the all-path requirement
  - GEN/KILL/IN/OUT similar to reaching defs
    - Initialization of IN/OUT is the tricky part

Compute Adef GEN/KILL Sets

    for each basic block in the procedure, X, do
        GEN(X) = 0
        KILL(X) = 0
        for each operation in sequential order in X, op, do
            for each destination operand of op, dest, do
                G = op
                K = {all ops which define dest - op}
                GEN(X) = G + (GEN(X) - K)
                KILL(X) = K + (KILL(X) - G)
            endfor
        endfor
    endfor

Exactly the same as reaching defs!

Compute Adef IN/OUT Sets

    U = universal set of all operations in the procedure
    IN(0) = 0
    OUT(0) = GEN(0)
    for each basic block in procedure, W, (W != 0), do
        IN(W) = 0
        OUT(W) = U - KILL(W)
    endfor
    change = 1
    while (change) do
        change = 0
        for each basic block in procedure, X, do
            old_OUT = OUT(X)
            IN(X) = Intersect(OUT(Y)) for all predecessors Y of X
            OUT(X) = GEN(X) + (IN(X) - KILL(X))
            if (old_OUT != OUT(X)) then
                change = 1
            endif
        endfor
    endwhile
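A minimal Python sketch of the two Adef steps follows, applied to the six operations from the "Reaching vs Available Definitions" slide. The control flow graph is an assumption (B0 = ops 1,2; B1 = ops 3,4; B2 = ops 5,6; edges B0->B1, B0->B2, B1->B2), since the figure is not available; under that assumption the result agrees with the slide's "1 available" annotation at the join. Function names and the data layout are also illustrative, and the sketch assumes every non-entry block has at least one predecessor.

```python
def adef_gen_kill(block_ops, dest_of):
    """block_ops: op ids in order; dest_of: op id -> dest register (whole procedure)."""
    gen, kill = set(), set()
    for op in block_ops:
        g = {op}
        k = {o for o, d in dest_of.items() if d == dest_of[op]} - g
        gen = g | (gen - k)
        kill = k | (kill - g)
    return gen, kill


def available_defs(blocks, preds, dest_of, entry):
    universe = set(dest_of)
    gen, kill = {}, {}
    for b, ops in blocks.items():
        gen[b], kill[b] = adef_gen_kill(ops, dest_of)
    # Optimistic initialization: everything not killed is assumed available.
    in_sets = {b: set() for b in blocks}
    out_sets = {b: universe - kill[b] for b in blocks}
    out_sets[entry] = set(gen[entry])
    change = True
    while change:
        change = False
        for b in blocks:
            if b == entry:
                continue  # IN(entry) stays empty
            in_sets[b] = set.intersection(*(out_sets[p] for p in preds[b]))
            new_out = gen[b] | (in_sets[b] - kill[b])
            if new_out != out_sets[b]:
                out_sets[b] = new_out
                change = True
    return in_sets, out_sets


dest_of = {1: "r1", 2: "r6", 3: "r4", 4: "r6", 5: "r6", 6: "r7"}
blocks = {"B0": [1, 2], "B1": [3, 4], "B2": [5, 6]}   # assumed block layout
preds = {"B0": [], "B1": ["B0"], "B2": ["B0", "B1"]}  # assumed edges

adef_in, adef_out = available_defs(blocks, preds, dest_of, "B0")
print(adef_in["B1"], adef_out["B1"], adef_in["B2"])
```

Starting non-entry OUT sets at U - KILL rather than at the empty set matters: with an intersection meet, an empty starting value would wrongly wipe out information flowing around loops.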

Available Expression Analysis (Aexprs)
- An expression is the RHS of an operation
  - In r2 = r3 + r4, r3 + r4 is an expression
- An expression e is available at a point p if along all paths from e to p, e is not killed
- An expression is killed between 2 points when one of its source operands is redefined
  - r1 = r2 + r3 kills all expressions involving r1
- Algorithm
  - Forward dataflow analysis
  - Use the Intersect function as the meet operator to guarantee the all-path requirement
  - Looks exactly like adefs, except GEN/KILL/IN/OUT are the RHSs of operations rather than the LHSs
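As a rough illustration of the per-block step for Aexprs, the sketch below generates each operation's RHS and then kills every expression that reads the destination. The ordering (generate first, then kill) is one common formulation and is assumed here, as are the `(opcode, src1, src2)` tuple encoding and the three hypothetical operations.

```python
def aexpr_gen_kill(ops, all_exprs):
    """ops: list of (dest, expr); expr is (opcode, src1, src2)."""
    gen, kill = set(), set()
    for dest, expr in ops:
        # The RHS is computed here, so it becomes available...
        gen.add(expr)
        kill.discard(expr)
        # ...unless redefining dest changes one of its source operands.
        killed = {e for e in all_exprs if dest in e[1:]}
        gen -= killed
        kill |= killed
    return gen, kill


# Hypothetical block: r1 = r2 + r3; r2 = r4 * r5; r3 = r3 * r4
ops = [
    ("r1", ("+", "r2", "r3")),
    ("r2", ("*", "r4", "r5")),
    ("r3", ("*", "r3", "r4")),
]
all_exprs = {expr for _, expr in ops}

gen, kill = aexpr_gen_kill(ops, all_exprs)
print(gen, kill)
```

The last operation shows the self-killing case: r3 * r4 is computed, but the same operation redefines r3, so the expression is not available afterwards.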

Class Problem - Aexprs Calculation

    1: r1 = r6 * r9
    2: r2 = r…
    3: r5 = r3 * r4
    4: r1 = r…
    5: r3 = r3 * r4
    6: r8 = r3 * 2
    7: r7 = r3 * r4
    8: r1 = r…
    9: r7 = r…
    10: r8 = r…
    11: r1 = r3 * r4
    12: r3 = r6 * r9

Compute the Aexpr IN/OUT sets for each BB.

Optimization – Put Dataflow To Work!
- Make the code run faster on the target processor
  - Anything goes
    - Look at benchmark kernels: what's the bottleneck?
    - Invent your own optis
- Classes of optimization
  - 1. Classical (machine independent)
    - Reducing operation count (redundancy elimination)
    - Simplifying operations
  - 2. Machine specific
    - Peephole optimizations
    - Take advantage of specialized hardware features
  - 3. ILP enhancing
    - Increasing parallelism
    - Possibly increase instructions

Types of Classical Optimizations
- Operation-level: 1 operation in isolation
  - Constant folding, strength reduction
  - Dead code elimination (global, but 1 op at a time)
- Local: pairs of operations in the same BB
  - May or may not use dataflow analysis
- Global: again pairs of operations
  - But, operations in different BBs
  - Dataflow analysis necessary here
- Loop: body of a loop

Caveat
- Traditional compiler class
  - Fancy implementations of optimizations, efficient algorithms
  - Bla bla bla
  - Spend entire class on 1 optimization
- For this class: go over concepts of each optimization
  - What it is
  - When it can be applied (set of conditions that must be satisfied)

Constant Folding
- Simplify an operation based on the values of its src operands
  - Constant propagation creates opportunities for this
- All constant operands
  - Evaluate the op, replace with a move
    - r1 = 3 * 4  ->  r1 = 12
    - r1 = 3 / 0  ->  ??? Don't evaluate excepting ops! What about FP?
  - Evaluate conditional branch, replace with BRU or noop
    - if (1 < 2) goto BB2  ->  BRU BB2
    - if (1 > 2) goto BB2  ->  convert to a noop
- Algebraic identities
  - r1 = r2 + 0, r2 - 0, r2 | 0, r2 ^ 0, r2 >> 0  ->  r1 = r2
  - r1 = 0 * r2, 0 / r2, 0 & r2  ->  r1 = 0
  - r1 = r2 * 1, r2 / 1  ->  r1 = r2
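A few of these rules can be sketched as a single rewrite function. The encoding is an assumption for illustration: an op is a `(dest, opcode, src1, src2)` tuple, literals are Python ints, registers are strings, and a folded op becomes a `"mov"`. Division of two literals is deliberately left alone so a possibly excepting op is never evaluated at compile time; integer overflow and floating point are ignored.

```python
import operator

# Opcodes that are safe to evaluate at compile time in this sketch.
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(op):
    """Return a simplified op, or the original op if no rule applies."""
    dest, opcode, a, b = op
    if isinstance(a, int) and isinstance(b, int):
        if opcode in FOLDABLE:
            return (dest, "mov", FOLDABLE[opcode](a, b), None)
        return op  # e.g. "/" may except: leave it for run time
    # Algebraic identities with one register operand.
    if opcode in ("+", "-") and b == 0:
        return (dest, "mov", a, None)
    if opcode == "*" and (a == 0 or b == 0):
        return (dest, "mov", 0, None)
    if opcode in ("*", "/") and b == 1:
        return (dest, "mov", a, None)
    return op


print(fold(("r1", "*", 3, 4)))
print(fold(("r1", "/", 3, 0)))
print(fold(("r1", "+", "r2", 0)))
print(fold(("r1", "*", 0, "r2")))
```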

Strength Reduction
- Replace expensive ops with cheaper ones
  - Constant propagation creates opportunities for this
- Power of 2 constants
  - Mpy by power of 2: r1 = r2 * 8  ->  r1 = r2 << 3
  - Div by power of 2: r1 = r2 / 4  ->  r1 = r2 >> 2
  - Rem by power of 2: r1 = r2 REM 16  ->  r1 = r2 & 15
  - As written, the div and rem rewrites assume r2 is unsigned or known to be non-negative
- More exotic
  - Replace multiply by constant with a sequence of shifts and adds/subs
    - r1 = r2 * 6
      - r100 = r2 << 2; r101 = r2 << 1; r1 = r100 + r101
    - r1 = r2 * 7
      - r100 = r2 << 3; r1 = r100 - r2
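The power-of-2 cases can be sketched as follows. The `(dest, opcode, src, const)` encoding and the opcode spellings are assumptions for the example, and the sketch assumes the source register holds an unsigned value, since shifting and masking do not match truncating division for negative numbers.

```python
def reduce_strength(op):
    """Rewrite *, /, rem by a power-of-2 literal into a shift or mask."""
    dest, opcode, src, c = op
    is_pow2 = isinstance(c, int) and c > 0 and (c & (c - 1)) == 0
    if is_pow2:
        shift = c.bit_length() - 1  # log2(c) for a power of 2
        if opcode == "*":
            return (dest, "<<", src, shift)
        if opcode == "/":
            return (dest, ">>", src, shift)
        if opcode == "rem":
            return (dest, "&", src, c - 1)
    return op


print(reduce_strength(("r1", "*", "r2", 8)))
print(reduce_strength(("r1", "/", "r2", 4)))
print(reduce_strength(("r1", "rem", "r2", 16)))
print(reduce_strength(("r1", "*", "r2", 6)))
```

Multipliers such as 6 or 7 are left untouched here; handling them needs the shift-and-add decomposition and fresh temporary registers.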

Dead Code Elimination
- Remove any operation whose result is never consumed
- Rules: X can be deleted if
  - X is not a store or branch
  - The DU chain of X is empty, or dest(X) is not live
- This misses some dead code!
  - Especially in loops
  - Critical operation = store or branch operation
  - Any operation that does not directly or indirectly feed a critical operation is dead
  - Trace UD chains backwards from critical operations
  - Any op not visited is dead

Example (ops from the figure; control flow graph not preserved):
    r1 = 3
    r2 = 10
    r4 = r4 + 1
    r7 = r1 * r4
    r2 = 0
    r3 = r3 + 1
    r3 = r2 + r1
    store (r1, r3)
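The "feeds a critical operation" idea can be sketched for straight-line code, where a single backward scan stands in for tracing UD chains. The op encoding, the function name, and the six-op sequence (borrowed loosely from the slide's ops, but arranged as one hypothetical block with nothing live on exit) are assumptions.

```python
def dead_ops(ops):
    """ops: list of (dest, opcode, srcs). Returns indices of dead ops."""
    needed = set()    # registers that feed a critical op, directly or not
    live_ops = set()
    for i in range(len(ops) - 1, -1, -1):
        dest, opcode, srcs = ops[i]
        critical = opcode in ("store", "branch")
        if critical or dest in needed:
            live_ops.add(i)
            needed.discard(dest)
            # Only register operands (strings) can carry a dependence.
            needed.update(s for s in srcs if isinstance(s, str))
    return [i for i in range(len(ops)) if i not in live_ops]


ops = [
    ("r1", "mov", [3]),            # 0
    ("r2", "mov", [10]),           # 1
    ("r4", "+", ["r4", 1]),        # 2
    ("r7", "*", ["r1", "r4"]),     # 3
    ("r3", "+", ["r2", "r1"]),     # 4
    (None, "store", ["r1", "r3"]), # 5
]
print(dead_ops(ops))
```

Op 2 is only consumed by op 3, which is itself dead, so a rule based purely on "DU chain empty" would need a second pass to remove it; working backward from the store finds both at once.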

Class Problem

    r1 = 0
    r4 = r1 | -1
    r7 = r1 * 4
    r6 = r1
    r3 = 8 / r6
    r3 = 8 * r6
    r3 = r3 + r2
    r2 = r2 + r1
    r6 = r7 * r6
    r1 = r1 + 1
    store (r1, r3)

Optimize this applying:
1. constant folding
2. strength reduction
3. dead code elimination

Constant Propagation
- Forward propagation of moves of the form
  - rx = L (where L is a literal)
  - Maximally propagate
  - Assume no instruction encoding restrictions
- When is it legal?
  - SRC: Literal is a hard coded constant, so never a problem
  - DEST: Must be available
    - Guaranteed to reach
    - May reach is not good enough

Example (ops from the figure; control flow graph not preserved):
    r1 = 5
    r2 = r1 + r3
    r1 = r1 + r2
    r7 = r1 + r4
    r8 = r1 + 3
    r9 = r1 + r11

Local Constant Propagation
- Consider 2 ops, X and Y, in a BB, where X is before Y
  - 1. X is a move
  - 2. src1(X) is a literal
  - 3. Y consumes dest(X)
  - 4. There is no definition of dest(X) between X and Y
    - Defn is locally available!
  - 5. Be careful if dest(X) is SP, FP or some other special register. If so, no subroutine calls between X and Y

    1: r1 = 5
    2: r2 = '_x'
    3: r3 = 7
    4: r4 = r4 + r1
    5: r1 = r1 + r2
    6: r1 = r…
    7: r3 = 12
    8: r8 = r1 - r2
    9: r9 = r3 + r5
    10: r3 = r…
    11: r10 = r3 - r1
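Rules 1 through 4 can be sketched as one forward pass over a block that remembers which registers currently hold a literal. The encoding is assumed for illustration (registers as strings, literals as ints, ops as `(dest, opcode, srcs)`), the five-op block is hypothetical, and the special-register check from rule 5 is omitted.

```python
def local_const_prop(ops):
    """Propagate literal moves forward within one basic block."""
    consts = {}   # register -> literal from a move that is still valid
    result = []
    for dest, opcode, srcs in ops:
        # Rule 3: Y consumes dest(X), so substitute the literal.
        new_srcs = [consts.get(s, s) if isinstance(s, str) else s for s in srcs]
        result.append((dest, opcode, new_srcs))
        # Rule 4: any new definition of dest ends the old move's reach.
        consts.pop(dest, None)
        # Rules 1 and 2: a move of a literal starts a new propagation.
        if opcode == "mov" and isinstance(new_srcs[0], int):
            consts[dest] = new_srcs[0]
    return result


block = [
    ("r1", "mov", [5]),
    ("r3", "mov", [7]),
    ("r4", "+", ["r4", "r1"]),
    ("r1", "+", ["r1", "r3"]),
    ("r9", "+", ["r3", "r1"]),
]
for op in local_const_prop(block):
    print(op)
```

After the fourth op redefines r1, the last op still receives the literal for r3 but keeps r1 as a register. The rewritten fourth op has two literal operands, which is the kind of opportunity constant folding then picks up.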

Global Constant Propagation
- Consider 2 ops, X and Y, in different BBs
  - 1. X is a move
  - 2. src1(X) is a literal
  - 3. Y consumes dest(X)
  - 4. X is in adef_IN(BB(Y))
  - 5. dest(X) is not modified between the top of BB(Y) and Y
    - Rules 4/5 guarantee X is available
  - 6. If dest(X) is SP/FP/..., no subroutine call between X and Y

Example (ops from the figure; control flow graph not preserved):
    r1 = 5
    r2 = '_x'
    r1 = r1 + r2
    r7 = r1 - r2
    r8 = r1 * r2
    r9 = r1 + r2

Note: checks for subroutine calls are required for all optis whenever SP/FP/etc. are involved. I will omit the check from here on!

Class Problem

    1: r1 = 0
    2: r2 = 10
    3: r4 = 1
    4: r7 = r1 * 4
    5: r6 = 8
    6: r2 = 0
    7: r3 = r2 / r6
    8: r3 = r4 * r6
    9: r3 = r3 + r2
    10: r2 = r2 + r1
    11: r6 = r7 * r6
    12: r1 = r…
    13: store (r1, r3)

Optimize this applying:
1. constant propagation
2. constant folding
3. strength reduction
4. dead code elimination

Forward Copy Propagation
- Forward propagation of the RHS of moves
  - X: r1 = r2
  - …
  - Y: r4 = r1 + 1  ->  r4 = r2 + 1
- Benefits
  - Reduce chain of dependences
  - Possibly eliminate the move
- Rules (ops X and Y)
  - X is a move
  - src1(X) is a register
  - Y consumes dest(X)
  - X.dest is an available def at Y
  - X.src1 is an available expr at Y

Example (ops from the figure; control flow graph not preserved):
    r1 = r2
    r3 = r4
    r2 = 0
    r6 = r3 + 1
    r5 = r2 + r3
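Within a single basic block, the two availability rules reduce to "neither dest(X) nor src1(X) has been redefined since X". The sketch below applies that local version; the op encoding, the function name, and the four-op block are assumptions, and the global case would need the Adef/Aexpr sets instead of the running dictionary.

```python
def local_copy_prop(ops):
    """Forward-propagate register-to-register moves within one basic block."""
    copies = {}   # dest -> source register of a move that is still valid
    result = []
    for dest, opcode, srcs in ops:
        new_srcs = [copies.get(s, s) for s in srcs]
        result.append((dest, opcode, new_srcs))
        # Redefining dest invalidates copies into dest and copies out of dest.
        copies = {d: s for d, s in copies.items() if d != dest and s != dest}
        if opcode == "mov" and isinstance(new_srcs[0], str) and new_srcs[0] != dest:
            copies[dest] = new_srcs[0]
    return result


block = [
    ("r1", "mov", ["r2"]),
    ("r4", "+", ["r1", 1]),
    ("r2", "mov", [0]),
    ("r5", "+", ["r1", "r3"]),
]
for op in local_copy_prop(block):
    print(op)
```

The second op is rewritten to read r2 directly, while the last op keeps r1 because r2 was redefined in between, so the copy's source is no longer available there.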

Backward Copy Propagation
- Backward prop. of the LHS of moves
  - X: r1 = r2 + r3  ->  r4 = r2 + r3
  - …
  - r5 = r1 + r6  ->  r5 = r4 + r6
  - …
  - Y: r4 = r1  ->  noop
- Rules (ops X and Y in same BB)
  - dest(X) is a register
  - dest(X) not live out of BB(X)
  - Y is a move
  - dest(Y) is a register
  - Y consumes dest(X)
  - dest(Y) not consumed in (X…Y)
  - dest(Y) not defined in (X…Y)
  - There are no uses of dest(X) after the first redefinition of dest(Y)

Example (ops from the figure):
    r1 = r8 + r9
    r2 = r9 + r1
    r4 = r2
    r6 = r2 + 1
    r9 = r1
    r10 = r6
    r5 = r6 + 1
    r4 = 0
    r8 = r2 + r7