Loop Invariant Code Motion
— classical approaches —

Comp 512, Rice University, Spring 2011

Copyright 2011, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 512 at Rice University have explicit permission to make copies of these materials for their personal use. Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved.

The Comp 512 Taxonomy (Machine Independent)

- Redundancy: LVN, SVN, DVNT, GCSE, redundant store elimination, constant folding
- Dead code: dead code elimination, Clean, simplification of algebraic identities
- Code motion: hoisting/sinking, loop invariant code motion  ← Today
- Create opportunities: loop unrolling, inline substitution, replication
- Specialization
- Consolidation: loop fusion (unroll & jam), inline substitution

Background

- Code in loops executes more often than code outside loops.
- An expression is loop invariant if it computes the same value, independent of the iteration in which it is computed.
- Loop invariant code can often be moved to a spot before the loop:
  - Execute once and reuse the value multiple times
  - Reduces the execution cost of the loop
- Techniques seem to fall into two categories:
  - Ad-hoc graph-based algorithms
  - Complete data-flow solutions

Think of a loop-invariant expression as redundant across iterations. Note that moving one always lengthens a live range.

Loop-invariant Code Motion

Example: a loop that computes x + y on each iteration, where neither x nor y changes in the loop.

    x,y ← …
    loop: … x + y …

- x + y is invariant in the loop:
  - Computed on each iteration
  - Its subexpressions don't change
- Move the evaluation out of the loop:
  - Need a block to hold it
  - Use an existing block or insert a new one
- Relationship to redundancy:
  - x + y is redundant along the back edge
  - x + y is not redundant from the loop entry
  - If we add an evaluation on the entry edge, x + y in the loop becomes redundant

Loop-invariant Code Motion

Original loop (neither x nor y changes inside it):

    x,y ← …
    loop: … x + y …

Option 1: move x + y into the predecessor block

    x,y ← …
    t ← x + y
    loop: … t …

Option 2: create a new block for t ← x + y on the loop's entry edge

What's the difference?

Loop-invariant Code Motion

In practice, many loops have a zero-trip test that determines whether to run the first iteration.

- Speculative placement: move t ← x + y above the zero-trip test. This lengthens the path around the loop.
- Conservative placement: move t ← x + y into a landing pad, a block inserted between the zero-trip test and the loop body.
- For this reason, LICM techniques often create a landing pad.
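To make the two placements concrete, here is a minimal sketch (source-level Python with illustrative names, not code from the lecture) of the conservative choice: the invariant evaluation sits in a landing pad guarded by the zero-trip test, so it runs only when the loop body runs at least once.

```python
# Conservative LICM placement: the hoisted evaluation lives in a
# landing pad that executes only when the zero-trip test succeeds,
# so nothing is evaluated speculatively.
def loop_with_landing_pad(x, y, n):
    total = 0
    if n > 0:           # zero-trip test
        t = x + y       # landing pad: loop-invariant evaluation
        for _ in range(n):
            total += t  # the body reuses the hoisted value
    return total
```

Hoisting `t = x + y` above the `if` would be the speculative version: it evaluates x + y even on the zero-trip path around the loop.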

Loop-invariant Code Motion

Another difference between methods is what gets moved for an invariant assignment z ← x + y:

- Some methods move the expression evaluation, but not the assignment: t ← x + y moves before the loop, and the copy z ← t stays in the body.
  - Easier safety conditions; easier to manage the transformation
  - Leaves a copy operation in the loop (which may coalesce away)
- Other methods move the entire assignment z ← x + y:
  - Eliminates the copy from the loop, as well

Loop-invariant Code Motion

Control flow inside the loop is another source of speculation. Consider a loop (none of x, y, or z changes in it) containing:

    if (y != 0)
      then z ← x / y
      else z ← x

- Move x / y out of the conditional and lengthen the other path, or don't move it at all.
- Divergence may be an issue: don't want to move the op if doing so introduces an exception.
- Might move it if that path is hot; would like branch probabilities.

Note that y is invariant in this classic example, so we could move the test, as well. See Cytron, Lowry, & Zadeck.

Loop-invariant Code Motion

Pedagogical plan:
1. A simple and direct algorithm
2. Lazy code motion, a data-flow solution based on availability
3. A structural approach based on SSA

Which is best? The authors of 2 and 3 would each argue for their own technique. In practice, each has strengths and weaknesses.

Loop Shape Is Important

Preferred loop shape:
- Evaluate the condition before the loop, if needed (pre-test)
- Loop body
- Evaluate the condition after the loop (post-test)
- Branch back to the top, if needed; the post-test merges with the last block of the loop body
- Fall through to the next block

while, for, do, and until loops all fit this basic model. For tail recursion, unroll the recursion once.

COMP 412 teaches this loop shape. It creates a natural place to insert a landing pad.

See the paper by Mueller & Whalley in SIGPLAN '92 PLDI about converting arbitrary loops to this form.

Loop-invariant Code Motion

Preliminaries:
- Need a list of loops, ordered inner to outer in each loop nest
  - Find loops from back edges, using the dominator tree
  - The DOM and IDOM sets can show nesting relationships
- Need a landing pad on each loop
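The back-edge test can be sketched directly from the standard definition: an edge u → v is a back edge when v dominates u. A minimal sketch, assuming dominator sets have already been computed and are held in a dict `dom` (my own representation, not from the lecture):

```python
# Assumed representation: `edges` is a list of (u, v) CFG edges and
# dom[b] is the set of blocks that dominate block b (including b).
def find_back_edges(edges, dom):
    """An edge u -> v is a back edge when v dominates u; each back
    edge identifies a natural loop with header v."""
    return [(u, v) for u, v in edges if v in dom[u]]
```

For a diamond-free CFG 0 → 1 → 2 with 2 → 1 and 2 → 3, the only back edge is (2, 1), since block 1 dominates block 2.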

Loop-invariant Code Motion

A Simple Algorithm

    for each l in LOOPLIST do
        FactorInvariants(l)

    FactorInvariants(loop)
        MarkInvariants(loop)
        for each op o ∈ loop (x ← y + z)
            if x is marked invariant then begin
                allocate a new name t
                replace each use of o with t
                insert t ← o in the landing pad for loop
            end

    MarkInvariants(loop)
        LOOPDEF ← ∪ over blocks b ∈ loop of DEF(b)
        for each op o ∈ loop (x ← y + z)
            mark x as invariant
            if y ∈ LOOPDEF then mark x as variant
            if z ∈ LOOPDEF then mark x as variant
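The algorithm above can be sketched on a toy IR. The representation is an assumption made for illustration, not the lecture's: a loop body is a list of ops `(dest, y, z)` meaning dest ← y op z, and each name is assigned at most once.

```python
from itertools import count

temp_names = count()  # fresh names for hoisted evaluations

def mark_invariants(loop):
    """MarkInvariants: an op is invariant when neither operand is
    defined inside the loop (i.e., neither operand is in LOOPDEF)."""
    loop_def = {dest for dest, _, _ in loop}            # LOOPDEF
    return {dest for dest, y, z in loop
            if y not in loop_def and z not in loop_def}

def factor_invariants(loop, landing_pad):
    """FactorInvariants: hoist each invariant evaluation into the
    landing pad, leaving a copy (dest <- t) in the loop body."""
    invariant = mark_invariants(loop)
    new_body = []
    for dest, y, z in loop:
        if dest in invariant:
            t = f"t{next(temp_names)}"        # allocate a new name
            landing_pad.append((t, y, z))     # t <- y op z, before the loop
            new_body.append((dest, t, None))  # copy stays in the loop
        else:
            new_body.append((dest, y, z))
    return new_body
```

For a body like x ← a + b; s ← s + x, only x ← a + b is hoisted: s and x are both in LOOPDEF, so the second op stays variant, exactly as the marking pass dictates.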

Example

Consider the following simple loop nest. Original code:

    do i ← 1 to 100
        do j ← 1 to 100
            do k ← 1 to 100
                a(i,j,k) ← i * j * k
            end

Cost: 3,000,000 dimensions addressed; 2,000,000 multiplies.

Example

After doing the inner loop:

    do i ← 1 to 100
        do j ← 1 to 100
            t1 ← addr(a(i,j))      — 20,000 dimensions addressed
            t2 ← i * j             — 10,000 multiplies
            do k ← 1 to 100
                t1(k) ← t2 * k     — 1,000,000 dimensions addressed and multiplies
            end

Example

After doing the middle loop:

    do i ← 1 to 100
        t3 ← addr(a(i))            — 100 dimensions addressed
        do j ← 1 to 100
            t1 ← addr(t3(j))       — 10,000 dimensions addressed
            t2 ← i * j             — 10,000 multiplies
            do k ← 1 to 100
                t1(k) ← t2 * k     — 1,000,000 dimensions addressed and multiplies
            end
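The multiply counts in this example can be checked by brute force. The sketch below is instrumentation of my own, not code from the lecture: it counts multiplies in the original nest and in the version with t2 ← i * j factored out of the k loop, reproducing the 2,000,000 vs. 10,000 + 1,000,000 figures.

```python
def multiplies_original(n=100):
    """Count multiplies in: a(i,j,k) <- i * j * k."""
    total = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                total += 2          # i * j, then (i*j) * k
    return total

def multiplies_factored(n=100):
    """Count multiplies after hoisting t2 <- i * j out of the k loop."""
    total = 0
    for i in range(n):
        for j in range(n):
            total += 1              # t2 <- i * j
            for k in range(n):
                total += 1          # t1(k) <- t2 * k
    return total
```

With n = 100, the original nest performs 2,000,000 multiplies and the factored nest 1,010,000, matching the slide's table.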

Safety of Loop-invariant Code Motion (all algorithms)

What happens if we move an expression that generates an exception when evaluated? The fault may happen at a different time in the execution, and the original code might not have faulted at all. Ideally, the compiler should delay the fault until the point where it would have occurred.

Options:
- Mask the fault & add a test to the loop body
- Replace the value with one that causes a fault when used
- Block read access to the faulted value
- Patch the executable with a bad opcode
- Generate two copies of the loop and rerun with the original code
- Never move an evaluation that can fault

Profitability of Loop-invariant Code Motion (all algorithms)

- Does the loop body always execute?
  - Placement of the landing pad is crucial
- Does the motion lengthen any paths?
  - Conservative code motion: not a problem
  - Speculative code motion: might be an issue
  - Rely on estimates of branch frequency?  (Backus' dilemma)
- Register pressure?
  - The target of the moved operation has a longer live range
  - Might shorten the live ranges of the operands of the moved op

Shortcomings of the Simple LICM Algorithm

- Moves code out of conditionals in a speculative fashion: it ignores control flow inside the loop.
- Moves only the evaluation, not the assignment: requires a carefully constructed name space to cure this (SSA?).
- Only finds first-order invariants. For example:

    for i ← 1 to n
        for j ← 1 to m
            …
            x ← a * b    — first-order invariant
            y ← x * c    — second-order invariant
            …
        end

  Need to repeat on loop l until it stabilizes, then move on.
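Iterating to a fixed point can be sketched on the same toy `(dest, y, z)` representation used earlier (my own illustration, with single assignment assumed): an operand now counts as invariant if it is defined outside the loop or its defining op has already been marked.

```python
def mark_all_invariants(loop):
    """Repeat invariant marking until it stabilizes, catching
    second- and higher-order invariants."""
    loop_def = {dest for dest, _, _ in loop}   # names defined in the loop
    invariant = set()
    changed = True
    while changed:
        changed = False
        for dest, y, z in loop:
            if dest in invariant:
                continue
            # an operand is invariant if it is defined outside the
            # loop or was itself already marked invariant
            if all(op not in loop_def or op in invariant
                   for op in (y, z)):
                invariant.add(dest)
                changed = True
    return invariant
```

On the body x ← a * b; y ← x * c; s ← s + y, the slide's single-pass MarkInvariants rejects y ← x * c outright because x ∈ LOOPDEF; the fixed-point version accepts it once x has been marked invariant.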

Next Lecture

Lazy code motion:
- Knoop, Rüthing, & Steffen's formulation
- Drechsler & Stadel's formulation