Advanced Compiler Techniques


Advanced Compiler Techniques Control Flow Analysis & Local Optimizations LIU Xianhua School of EECS, Peking University

Levels of Optimizations
- Local: inside a basic block
- Global (intraprocedural): across basic blocks; whole-procedure analysis
- Interprocedural: across procedures; whole-program analysis

The Golden Rules of Optimization: Premature Optimization is Evil
- Donald Knuth: "Premature optimization is the root of all evil."
- Optimization can introduce new, subtle bugs.
- Optimization usually makes code harder to understand and maintain.
- Get your code right first; then, if really needed, optimize it.
- Document optimizations carefully.
- Keep the non-optimized version handy, or even as a comment in your code.

The Golden Rules of Optimization: The 80/20 Rule
- In general, 80% of a program's execution time is spent executing 20% of the code (90%/10% for performance-hungry programs).
- Spend your time optimizing the important 10-20% of your program.
- Optimize the common case, even at the cost of making the uncommon case slower.

The Golden Rules of Optimization: Good Algorithms Rule
- The best and most important way of optimizing a program is using good algorithms, e.g. O(n log n) rather than O(n^2).
- However, we still need lower-level optimization to get more out of our programs.
- In addition, asymptotic complexity is not always an appropriate metric of efficiency: hidden constants may be misleading. E.g. a linear-time algorithm that runs in 100n+100 steps is slower than a cubic-time algorithm that runs in n^3+10 steps if the problem size is small (for n = 10, 100n+100 = 1100 but n^3+10 = 1010).

General Optimization Techniques
- Strength reduction: use the fastest version of an operation, e.g.

    x >> 2  instead of  x / 4
    x << 1  instead of  x * 2

- Common subexpression elimination: eliminate redundant calculations, e.g.

    double x = d * (lim / max) * sx;
    double y = d * (lim / max) * sy;

  becomes

    double depth = d * (lim / max);
    double x = depth * sx;
    double y = depth * sy;

General Optimization Techniques
- Code motion: invariant expressions should be executed only once, e.g.

    for (int i = 0; i < x.length; i++)
        x[i] *= Math.PI * Math.cos(y);

  becomes

    double picosy = Math.PI * Math.cos(y);
    for (int i = 0; i < x.length; i++)
        x[i] *= picosy;

General Optimization Techniques
- Loop unrolling: the overhead of the loop control code can be reduced by executing more than one iteration in the body of the loop, e.g.

    double picosy = Math.PI * Math.cos(y);
    for (int i = 0; i < x.length; i++)
        x[i] *= picosy;

  becomes (assuming x.length is even; an odd leftover element would need a cleanup iteration)

    double picosy = Math.PI * Math.cos(y);
    for (int i = 0; i < x.length; i += 2) {
        x[i]   *= picosy;
        x[i+1] *= picosy;
    }

Compiler Optimizations
- Compilers try to generate good (i.e. fast) code.
- Code improvement is challenging: many problems are NP-hard.
- Code improvement may slow down the compilation process; in some domains, such as just-in-time compilation, compilation speed is critical.

Phases of Compilation
- The first three phases are language-dependent.
- The last two are machine-dependent.
- The middle two depend on neither the language nor the machine.

Phases
(figure: diagram of the compilation phases; not reproduced in the transcript)

Control Flow
- Control transfer = branch (taken or fall-through).
- Control flow: the branching behavior of an application, i.e. what sequences of instructions can be executed.
- Execution → dynamic control flow: the direction of a particular instance of a branch; predict, speculate, squash, etc.
- Compiler → static control flow: the program is not being executed and the input is not known, so consider what could happen.
- Control flow analysis: determining properties of the program's branch structure; determining instruction execution properties.

Basic Blocks
A basic block is a maximal sequence of consecutive three-address instructions with the following properties:
- The flow of control can only enter the basic block through the first instruction in the block (no jumps into the middle of the block).
- Control will leave the block without halting or branching, except possibly at the last instruction in the block.
Basic blocks become the nodes of a flow graph, with edges indicating the order.

Example
Source:

    for i from 1 to 10 do
        for j from 1 to 10 do
            a[i, j] = 0.0
    for i from 1 to 10 do
        a[i, i] = 1.0

Three-address code (the numbering in parentheses is the target of the gotos):

    (1)  i = 1
    (2)  j = 1
    (3)  t1 = 10 * i
    (4)  t2 = t1 + j
    (5)  t3 = 8 * t2
    (6)  t4 = t3 - 88
    (7)  a[t4] = 0.0
    (8)  j = j + 1
    (9)  if j <= 10 goto (3)
    (10) i = i + 1
    (11) if i <= 10 goto (2)
    (12) i = 1
    (13) t5 = i - 1
    (14) t6 = 88 * t5
    (15) a[t6] = 1.0
    (16) i = i + 1
    (17) if i <= 10 goto (13)

Identifying Basic Blocks
- Input: sequence of instructions instr(i)
- Output: a list of basic blocks
- Method: identify leaders, the first instruction of each basic block; then iterate, adding subsequent instructions to the basic block until we reach another leader.

Identifying Leaders
Rules for finding leaders in code:
- The first instruction in the code is a leader.
- Any instruction that is the target of a (conditional or unconditional) jump is a leader.
- Any instruction that immediately follows a (conditional or unconditional) jump is a leader.

Basic Block Partition Algorithm

    leaders = {1}                              // start of program
    for i = 1 to n                             // all instructions
        if instr(i) is a branch
            leaders = leaders ∪ targets of instr(i) ∪ {i + 1}
    worklist = leaders
    while worklist not empty
        x = first instruction in worklist
        worklist = worklist - {x}
        block(x) = {x}
        for (i = x + 1; i <= n and i not in leaders; i++)
            block(x) = block(x) ∪ {i}
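A minimal Java sketch of this partition algorithm, assuming a hypothetical three-address IR in which each instruction records whether it branches and which instruction indices it may jump to (the Instr class below is an assumption for illustration, not part of the original slides):

```java
import java.util.*;

class BasicBlockPartition {
    static class Instr {
        boolean isBranch;         // conditional or unconditional jump
        boolean isUnconditional;  // goto/return: no fall-through
        int[] targets = {};       // indices of possible jump targets
    }

    // Maps each leader to the indices of the instructions in its block.
    static Map<Integer, List<Integer>> partition(Instr[] code) {
        SortedSet<Integer> leaders = new TreeSet<>();
        leaders.add(0);                           // rule 1: first instruction
        for (int i = 0; i < code.length; i++) {
            if (code[i].isBranch) {
                for (int t : code[i].targets)     // rule 2: jump targets
                    leaders.add(t);
                if (i + 1 < code.length)          // rule 3: instruction after a jump
                    leaders.add(i + 1);
            }
        }
        Map<Integer, List<Integer>> blocks = new LinkedHashMap<>();
        for (int leader : leaders) {
            List<Integer> block = new ArrayList<>();
            int i = leader;
            do {
                block.add(i++);                   // grow until the next leader
            } while (i < code.length && !leaders.contains(i));
            blocks.put(leader, block);
        }
        return blocks;
    }
}
```

Keeping the leaders in a TreeSet returns them in order, so each block simply grows from its leader until the next leader is reached.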

Basic Block Example

    A: (1)  i = 1
    B: (2)  j = 1
    C: (3)  t1 = 10 * i
       (4)  t2 = t1 + j
       (5)  t3 = 8 * t2
       (6)  t4 = t3 - 88
       (7)  a[t4] = 0.0
       (8)  j = j + 1
       (9)  if j <= 10 goto (3)
    D: (10) i = i + 1
       (11) if i <= 10 goto (2)
    E: (12) i = 1
    F: (13) t5 = i - 1
       (14) t6 = 88 * t5
       (15) a[t6] = 1.0
       (16) i = i + 1
       (17) if i <= 10 goto (13)

The leaders are instructions (1), (2), (3), (10), (12), and (13), giving the six basic blocks A-F.

Control-Flow Graphs
- Node: an instruction or sequence of instructions (a basic block). Two instructions i, j are in the same basic block iff execution of i guarantees execution of j.
- Directed edge: potential flow of control.
- Distinguished start node; Entry & Exit correspond to the first and last instructions in the program.

Control-Flow Edges
Basic blocks = nodes. Add a directed edge between P and S if:
- there is a jump/branch from the last statement of P to the first statement of S, or
- S immediately follows P in program order and P does not end with an unconditional branch (goto/return/call).
Definition of predecessor and successor: P is a predecessor of S; S is a successor of P.

Control-Flow Edge Algorithm
Input: block(i), sequence of basic blocks
Output: CFG where nodes are basic blocks

    for i = 1 to the number of blocks
        x = last instruction of block(i)
        if instr(x) is a branch/jump
            for each target y of instr(x), create edge (i -> y)
        if instr(x) is not an unconditional branch, create edge (i -> i+1)
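Continuing the hypothetical sketch above, the edges can be built as an adjacency list from each block's leader to the leaders of its successors (assuming, as the partition guarantees, that every branch target is itself a leader):

```java
// This method would live inside the BasicBlockPartition sketch above.
static Map<Integer, List<Integer>> successors(
        Instr[] code, Map<Integer, List<Integer>> blocks) {
    Map<Integer, List<Integer>> succ = new LinkedHashMap<>();
    List<Integer> leaders = new ArrayList<>(blocks.keySet());
    for (int b = 0; b < leaders.size(); b++) {
        List<Integer> insns = blocks.get(leaders.get(b));
        Instr last = code[insns.get(insns.size() - 1)];
        List<Integer> out = new ArrayList<>();
        if (last.isBranch)
            for (int t : last.targets)
                out.add(t);                        // edge to each jump target
        if (!last.isUnconditional && b + 1 < leaders.size())
            out.add(leaders.get(b + 1));           // fall-through edge
        succ.put(leaders.get(b), out);
    }
    return succ;
}
```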

Dominator
Definition: given a CFG (V, E, Entry, Exit), a node x dominates a node y if every path from the Entry block to y contains x. In the reverse direction, node x post-dominates node y if every path from y to the Exit has to pass through x.
Some properties of dominators:
- Reflexivity, transitivity, anti-symmetry.
- If x dominates z and y dominates z, then either x dominates y or y dominates x.
Intuition: given some basic block, which blocks are guaranteed to have executed prior to executing it?

Dominator Tree
A block x immediately dominates block y if x dominates y and there is no intervening block P such that x dominates P and P dominates y. In other words, x is the last dominator on all paths from entry to y. Each block other than the entry has a unique immediate dominator.
A dominator tree is a tree in which each node's children are those nodes it immediately dominates. Because the immediate dominator is unique, this is indeed a tree, with the start node as its root.
(figure: a five-node CFG and its dominator tree, with dominator sets {1}, {1,2}, {1,2,3}, {1,4}, {1,5})
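A minimal sketch of the classic iterative dominator computation, assuming the CFG is given as predecessor lists with node 0 as the entry; dom(n) converges to the full set of dominators of n, from which the immediate dominator (the last one on any path from the entry) and hence the dominator tree can be read off:

```java
import java.util.*;

class Dominators {
    // dom(entry) = {entry}; for every other node n,
    // dom(n) = {n} ∪ (intersection of dom(p) over all predecessors p).
    static List<Set<Integer>> compute(List<List<Integer>> pred) {
        int n = pred.size();
        Set<Integer> all = new HashSet<>();
        for (int i = 0; i < n; i++) all.add(i);
        List<Set<Integer>> dom = new ArrayList<>();
        for (int i = 0; i < n; i++)
            dom.add(new HashSet<>(i == 0 ? Set.of(0) : all));
        for (boolean changed = true; changed; ) {          // iterate to a fixed point
            changed = false;
            for (int i = 1; i < n; i++) {
                Set<Integer> d = new HashSet<>(all);
                for (int p : pred.get(i))
                    d.retainAll(dom.get(p));               // meet over predecessors
                d.add(i);                                  // a node dominates itself
                if (!d.equals(dom.get(i))) { dom.set(i, d); changed = true; }
            }
        }
        return dom;
    }
}
```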

Loops
- Loops come from while, do-while, for, goto, ...
- Many transformations depend on loops.
- Back edge: an edge is a back edge if its head dominates its tail.
- Loop definition: a set of nodes L in a CFG is a loop if
  - there is a node called the loop entry: no other node in L has a predecessor outside L, and
  - every node in L has a nonempty path (within L) to the entry of L.
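On top of the dominator sets above, back edges and their natural loops (the standard constructive companion to this definition) can be found with a short sketch: the natural loop of back edge t -> h is h together with every node that can reach t without passing through h.

```java
// Would live alongside the Dominators sketch above.
static Set<Integer> naturalLoop(int t, int h,
        List<List<Integer>> pred, List<Set<Integer>> dom) {
    if (!dom.get(t).contains(h))            // t -> h is a back edge
        throw new IllegalArgumentException("not a back edge");
    Set<Integer> loop = new HashSet<>();
    loop.add(h);                            // h is the unique loop entry
    Deque<Integer> work = new ArrayDeque<>();
    if (loop.add(t)) work.push(t);          // also handles a self-loop t == h
    while (!work.isEmpty())
        for (int p : pred.get(work.pop()))
            if (loop.add(p))                // walk predecessors, stopping at h
                work.push(p);
    return loop;
}
```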

Example: Back Edges
(figures: the five-node CFG from before, annotated with dominator sets {1}, {1,2}, {1,2,3}, {1,4}, {1,5}, and the directed acyclic graph obtained by removing its back edges)

Loop Examples
(figure: a CFG whose loops are {B3}, {B6}, and {B2, B3, B4})

Identifying Loops
Motivation: the majority of runtime is spent in loops, so focus optimization on loop bodies; removing redundant code and replacing expensive operations there speeds up the program.
Finding loops is easy for structured code:

    for i = 1 to 1000
        for j = 1 to 1000
            for k = 1 to 1000
                do something

or harder in the presence of GOTOs:

    i = 1; j = 1; k = 1;
    A1: if i > 1000 goto L1;
    A2: if j > 1000 goto L2;
    A3: if k > 1000 goto L3;
        do something
        k = k + 1; goto A3;
    L3: j = j + 1; goto A2;
    L2: i = i + 1; goto A1;
    L1: halt

Interval Analysis (T1/T2 Transformations)
(figures: a flow graph is repeatedly reduced by the T1 transformation, which removes a self-loop from a node, and the T2 transformation, which merges a node into its unique predecessor; in the example the five-node graph collapses step by step, e.g. into {1,4} and {2,3}, and finally into the single node {1,2,3,4,5}, showing that the graph is reducible)

Structure Analysis
Static features recorded for each control-flow edge of a typical substructure:
1. SS_No.: unique identifier of the typical substructure
2. Edge_No.: unique identifier of the control-flow edge within the substructure
3. I_last_of_head: opcode of the last instruction in the edge's head basic block
4. Br_direction: branch direction of the last instruction in the edge's head basic block
5. I_pre_last: opcode of the instruction preceding the last instruction in the edge's head basic block

Weighted CFG
- Profiling: run the application on one or more sample inputs and record some behavior.
  - Control flow profiling: edge profiles, block profiles, path profiling.
  - Cache profiling, memory dependence profiling.
- Annotate the control flow profile onto a CFG → a weighted CFG.
- Optimize more effectively with profile information: optimize for the common case; make educated guesses.
(figure: a weighted CFG from Entry through BB1-BB7 to Exit, with edge weights such as 20 and 10)

Local Optimization
Optimization of basic blocks (Dragon §8.5)

Transformations on Basic Blocks
- Eliminating local common subexpressions
- Eliminating dead code
- Reordering statements that do not depend on one another
- Applying algebraic laws to reorder operands of three-address instructions
All of the above require symbolic execution of the basic block to obtain def/use information.

Simple Symbolic Interpretation: Next-Use Information
- If x is computed in statement i and is an operand of statement j, j > i, its value must be preserved (in a register or in memory) until j.
- If x is recomputed at k, k > i, the value computed at i has no further use beyond its last use before k, and can be discarded (i.e. its register reused).
- Next-use information is annotated over the statements and the symbol table.
- It is computed in one backward pass over the statements.

Next-Use Information
Definition: statement j uses the value of x computed at statement i if:
- statement i assigns a value to x;
- statement j has x as an operand; and
- control can flow from i to j along a path with no intervening assignments to x.
In that case we also say x is live at statement i.

Computing Next-Use
- Use the symbol table to annotate the status of variables.
- Each operand in a statement carries additional information: operand liveness (boolean) and operand next use (a later statement).
- On exit from the block, all temporaries are dead (no next use).

Algorithm
INPUT: a basic block B
OUTPUT: at each statement i: x = y op z in B, liveness and next-use information for x, y, z
METHOD: for each statement in B (backward):
1. Retrieve the liveness and next-use info of x, y, z from a table and attach it to statement i.
2. Set x to "not live" and "no next use".
3. Set y and z to "live" and their next uses to i.
Note: steps 2 and 3 cannot be interchanged, e.g. for x = x + y.
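A compact Java sketch of this backward pass, assuming a hypothetical quadruple x = y op z (the Quad class is an assumption, not from the slides); the snapshot attached to statement i is the information retrieved in step 1, i.e. liveness and next use immediately after the statement:

```java
import java.util.*;

class NextUse {
    static class Quad { String dst, src1, src2; }

    // The table maps each live variable to the index of its next use.
    static List<Map<String, Integer>> annotate(Quad[] block, Set<String> liveOut) {
        Map<String, Integer> table = new HashMap<>();
        for (String v : liveOut)
            table.put(v, block.length);               // live on exit from the block
        List<Map<String, Integer>> annotation = new ArrayList<>();
        for (int i = 0; i < block.length; i++) annotation.add(null);
        for (int i = block.length - 1; i >= 0; i--) {
            annotation.set(i, new HashMap<>(table));  // step 1: retrieve
            Quad q = block[i];
            table.remove(q.dst);                      // step 2: kill the definition
            if (q.src1 != null) table.put(q.src1, i); // step 3: record the uses;
            if (q.src2 != null) table.put(q.src2, i); // this order keeps x = x + y correct
        }
        return annotation;
    }
}
```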

Example

    (1) x = 1
    (2) y = 1
    (3) x = x + y
    (4) z = y
    (5) x = y + z

On exit from the block: x is live (next use 6); y and z are not live.
Working backward, the information attached before each statement is:

    before (5): x: not live;  y: live, 5;   z: live, 5
    before (4): x: not live;  y: live, 4;   z: not live
    before (3): x: live, 3;   y: live, 3;   z: not live
    before (2): x: live, 3;   y: not live;  z: not live
    before (1): x: not live;  y: not live;  z: not live

Computing Dependencies in a Basic Block: the DAG
- Use a directed acyclic graph (DAG) to recognize common subexpressions and remove redundant quadruples.
- Intermediate code optimization: basic block => DAG => improved block => assembly.
- Leaves are labeled with identifiers and constants; internal nodes are labeled with operators and identifiers.

DAG Representation of Basic Blocks
Constructing a DAG for a basic block:
1. There is a node in the DAG for each of the initial values of the variables appearing in the basic block.
2. There is a node N associated with each statement s within the block. The children of N are those nodes corresponding to statements that are the last definitions, prior to s, of the operands used by s.
3. Node N is labeled by the operator applied at s, and also attached to N is the list of variables for which it is the last definition within the block.
4. Certain nodes are designated output nodes. These are the nodes whose variables are live on exit from the block; that is, their values may be used later, in another block of the flow graph.

DAG Construction
Forward pass over the basic block:
- For x = y op z:
  - Find a node labeled y, or create one; find a node labeled z, or create one.
  - Create a new node for op, or find an existing one with children y, z (using a hash scheme).
  - Add x to the list of labels for that node; remove x from the node on which it previously appeared.
- For x = y: add x to the list of labels of the node which currently holds y.
Example:

    a = b + c
    b = a - d
    c = b + c
    d = a - d

(figure: the resulting DAG, with leaves b0, c0, d0, a + node labeled a, a - node labeled b and d, and a second + node labeled c)
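The "hash scheme" mentioned above is essentially local value numbering: a node is identified by (op, left child, right child), so looking a node up before creating it is exactly common-subexpression detection. A minimal sketch, assuming only binary operations x = y op z:

```java
import java.util.*;

class BlockDag {
    private final List<String> label = new ArrayList<>();        // node id -> description
    private final Map<String, Integer> nodeFor = new HashMap<>(); // (op,l,r) -> node id
    private final Map<String, Integer> varNode = new HashMap<>(); // variable -> its node

    // Current node of a variable, or a fresh leaf for its initial value.
    private int node(String name) {
        return varNode.computeIfAbsent(name, n -> newNode(n + "0"));
    }
    private int newNode(String desc) { label.add(desc); return label.size() - 1; }

    // Process x = y op z; returns x's node (shared if the expression existed).
    int assign(String x, String y, String op, String z) {
        String key = op + "," + node(y) + "," + node(z);
        int n = nodeFor.computeIfAbsent(key, k -> newNode(op));
        varNode.put(x, n);            // x now labels this node; its old label is gone
        return n;
    }
}
```

Running the four-statement example through assign() makes d = a - d come back with the same node as b = a - d, which is exactly the common subexpression exploited on the next slide.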

Finding Local Common Subexpressions
Suppose b is not live on exit. In

    a = b + c
    b = a - d
    c = b + c
    d = a - d

the statements b = a - d and d = a - d compute the same value, so the DAG gives them a single node labeled b, d, and the block can be rewritten as

    a = b + c
    d = a - d
    c = d + c

If b were live on exit, a copy would be kept instead:

    a = b + c
    d = a - d
    b = d
    c = d + c

LCS: Another Example

    a = b + c
    b = b - d
    c = c + d
    e = b + c

Here e = b + c is not a common subexpression of a = b + c, because both b and c are redefined in between: over the leaves b0, c0, d0 the DAG builds a - node for b, a second + node for c, and a separate + node for e.

Dead Code Elimination
Delete any root that has no live variables attached. Repeated application of this transformation removes all nodes from the DAG that correspond to dead code. E.g. for

    a = b + c
    b = b - d
    c = c + d
    e = b + c

with a and b live on exit and c and e not live: the root for e can be deleted, which turns the node for c into a deletable root, leaving

    a = b + c
    b = b - d
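The same idea can be sketched as mark-and-sweep, assuming a hypothetical Node carrying its attached variable names and child nodes: everything not reachable from a root whose label is live on exit is dead.

```java
import java.util.*;

class DagDce {
    static class Node {
        Set<String> labels = new HashSet<>();   // variables attached to this node
        List<Node> children = new ArrayList<>();
    }

    // Returns the nodes to keep; all other nodes correspond to dead code.
    static Set<Node> liveNodes(Collection<Node> allNodes, Set<String> liveOnExit) {
        Deque<Node> work = new ArrayDeque<>();
        for (Node n : allNodes)
            if (!Collections.disjoint(n.labels, liveOnExit))
                work.push(n);                   // roots whose variables are live on exit
        Set<Node> keep = new HashSet<>(work);
        while (!work.isEmpty())
            for (Node child : work.pop().children)
                if (keep.add(child))            // operands of live nodes stay live
                    work.push(child);
        return keep;
    }
}
```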

The Use of Algebraic Identities
- Eliminate computations.
- Reduction in strength.
- Constant folding: 2 * 3.14 = 6.28, evaluated at compile time.
- Other algebraic transformations:
  - x * y => y * x
  - x > y => x - y > 0
  - a = b + c; e = c + d + b  =>  a = b + c; e = a + d

Representation of Array References
In the sequence

    x = a[i]
    a[j] = y
    z = a[i]

z = a[i] cannot reuse the node for x = a[i] (z = x would be wrong), because a[j] = y may have changed a[i] if i = j: the assignment kills the node for a[i].

Representation of Array References

    b = a + 12
    x = b[i]
    b[j] = y

Here a is an array and b is a position in the array a, so the node for x is killed by b[j] = y.

Pointer Assignments & Procedure Calls
The problem with the assignments x = *p and *q = y is that we do not know what p or q point to:
- x = *p is a use of every variable, so the load through a pointer must take all nodes currently associated with identifiers as arguments; this is relevant for dead-code elimination.
- *q = y is a possible assignment to every variable, so the store through a pointer kills all other nodes constructed in the DAG so far.
- Global pointer analyses can be used to limit the set of affected variables.
Procedure calls behave much like assignments through pointers: assume that a procedure uses and changes any data to which it has access. If variable x is in the scope of a procedure P, a call to P both uses the node with attached variable x and kills that node.

Reassembling Basic Blocks From DAGs
(figures: the example block regenerated from its DAG, once for b not live on exit and once for b live on exit)

Reassembling Basic Blocks From DAGs
The rules of reassembling:
- The order of instructions must respect the order of nodes in the DAG.
- Assignments to an array must follow all previous assignments to, or evaluations from, the same array.
- Evaluations of array elements must follow any previous assignments to the same array.
- Any use of a variable must follow all previous procedure calls or indirect assignments through a pointer.
- Any procedure call or indirect assignment through a pointer must follow all previous evaluations of any variable.

Peephole Optimization (Dragon §8.7)
- Introduction to peephole optimization
- Common techniques
- Algebraic identities
- An example

Peephole Optimization
- Simple compilers do not perform machine-independent code improvement; they generate naïve code.
- It is possible to take that naïve target code and optimize it: sub-optimal sequences of instructions that match an optimization pattern are transformed into optimal sequences. This technique is known as peephole optimization.
- Peephole optimization usually works by sliding a window of several instructions (a peephole) over the code.

Peephole Optimization
Goals: improve performance, reduce memory footprint, reduce code size.
Method:
1. Examine short sequences of target instructions.
2. Replace the sequence by a more efficient one:
- redundant-instruction elimination
- algebraic simplifications
- flow-of-control optimizations
- use of machine idioms
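A toy illustration of such a sliding window, assuming textual target instructions of the hypothetical form "ST Rn, x" / "LD Rn, x" (these mnemonics are an assumption for illustration): a load that immediately follows a store to the same location through the same register is redundant, provided no label falls between the two.

```java
import java.util.*;

class Peephole {
    static List<String> run(List<String> code) {
        List<String> out = new ArrayList<>();
        for (String insn : code) {
            if (!out.isEmpty() && insn.startsWith("LD ")) {
                String prev = out.get(out.size() - 1);
                if (prev.startsWith("ST ")
                        && prev.substring(3).equals(insn.substring(3))) {
                    continue;              // redundant load: the value is already there
                }
            }
            out.add(insn);
        }
        return out;
    }
}
```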

Peephole Optimization: Common Techniques
(four slides of figures showing before/after instruction patterns for the techniques above; images not reproduced in the transcript)

Algebraic Identities
Worth recognizing single instructions with a constant operand:
- Eliminate computations: A * 1 = A; A * 0 = 0; A / 1 = A.
- Reduce strength: A * 2 = A + A; A / 2 = A * 0.5.
- Constant folding: 2 * 3.14 = 6.28.
These are more delicate with floating-point arithmetic.

Is This Ever Helpful?
Why would anyone write X * 1? Why bother to correct such obvious junk code? In fact one might write

    #define MAX_TASKS 1
    ...
    a = b * MAX_TASKS;

Also, seemingly redundant code can be produced by other optimizations. This is an important effect.

Replace Multiply by Shift
- A := A * 4 can be replaced by a 2-bit left shift (signed or unsigned), but one must worry about overflow if the language defines overflow behavior.
- A := A / 4: if unsigned, this can be replaced by a shift right. For signed values, arithmetic shift right is a well-known problem (see below), though a language may allow it anyway (traditional C).

The Right Shift Problem
Arithmetic right shift: shift right and use the sign bit to fill the most significant bits:

    -5  =  111111...1111111011
    SAR -> 111111...1111111101

which is -3, not -2; in most languages -5 / 2 = -2.
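A small Java check of the problem, plus the usual compiler fix of biasing a negative dividend before shifting (Java, like C99, truncates integer division toward zero, while the arithmetic shift rounds toward negative infinity):

```java
public class ShiftVsDivide {
    public static void main(String[] args) {
        System.out.println(-5 / 2);          // -2 (truncating division)
        System.out.println(-5 >> 1);         // -3 (arithmetic shift)
        // Strength reduction done right: add 2^k - 1 before shifting
        // when the dividend is negative.
        int a = -5, k = 1;                   // divide by 2^k
        int bias = (a >> 31) >>> (32 - k);   // 2^k - 1 if a < 0, else 0
        System.out.println((a + bias) >> k); // -2, same as a / 2
    }
}
```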

Addition Chains for Multiplication
If multiply is very slow (or the machine has no multiply instruction, like the original SPARC), decomposing a constant operand into a sum of powers of two can be effective:

    x * 125 = x * 128 - x * 4 + x

i.e. two shifts, one subtract and one add, which may be faster than one multiply. Note the similarity with the efficient exponentiation method.
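The same decomposition as code (assuming ordinary wrap-around int semantics, under which the identity still holds):

```java
// 125 = 128 - 4 + 1, so one multiply becomes two shifts, a subtract and an add.
static int times125(int x) {
    return (x << 7) - (x << 2) + x;   // x*128 - x*4 + x
}
```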

Flow-of-Control Optimizations
Jump to an unconditional jump:

    goto L1                      goto L2
    ...                   =>     ...
    L1: goto L2                  L1: goto L2

Conditional jump to an unconditional jump:

    if a < b goto L1             if a < b goto L2
    ...                   =>     ...
    L1: goto L2                  L1: goto L2

Unconditional jump to a conditional jump:

    goto L1                      if a < b goto L2
    ...                   =>     goto L3
    L1: if a < b goto L2         ...
    L3:                          L3:
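A sketch of the first pattern as a pass over textual intermediate code like that in the example below: when a label is immediately followed by "goto M", every "goto L" jumping to that label is retargeted to M. Conditional jumps could be threaded the same way, and chains such as L -> M -> N would need the pass repeated.

```java
import java.util.*;

class JumpThreading {
    static List<String> run(List<String> code) {
        Map<String, String> alias = new HashMap<>();
        for (int i = 0; i + 1 < code.size(); i++)       // find labels holding only a goto
            if (code.get(i).endsWith(":") && code.get(i + 1).startsWith("goto "))
                alias.put(code.get(i).substring(0, code.get(i).length() - 1),
                          code.get(i + 1).substring(5));
        List<String> out = new ArrayList<>();
        for (String insn : code) {
            if (insn.startsWith("goto ")) {             // retarget through the alias
                String target = insn.substring(5);
                out.add("goto " + alias.getOrDefault(target, target));
            } else {
                out.add(insn);
            }
        }
        return out;
    }
}
```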

Peephole Optimization: an Example
Source code:

    debug = 0
    ...
    if (debug) {
        print debugging information
    }

Intermediate code:

    debug = 0
    ...
    if debug = 1 goto L1
    goto L2
    L1: print debugging information
    L2:

Eliminate Jump after Jump
Before:

    debug = 0
    ...
    if debug = 1 goto L1
    goto L2
    L1: print debugging information
    L2:

After:

    debug = 0
    ...
    if debug ≠ 1 goto L2
    print debugging information
    L2:

Constant Propagation
Before:

    debug = 0
    ...
    if debug ≠ 1 goto L2
    print debugging information
    L2:

After:

    debug = 0
    ...
    if 0 ≠ 1 goto L2
    print debugging information
    L2:

Unreachable Code (Dead Code Elimination)
Before:

    debug = 0
    ...
    if 0 ≠ 1 goto L2
    print debugging information
    L2:

After:

    debug = 0
    ...

Since 0 ≠ 1 always holds, the jump is always taken, so the print statement is unreachable and can be removed, after which the jump itself becomes redundant.

Peephole Optimization: Summary
- Peephole optimization is very fast: there is only a small overhead per instruction, since a small, fixed-size window is used.
- It is often easier to generate naïve code and run peephole optimization than to generate good code in the first place!

Summary
- Introduction to optimization
- Control flow analysis: basic knowledge, basic blocks, control-flow graphs
- Local optimizations
- Peephole optimizations

HW & Next Time
- Homework: Dragon EX 8.4.1, 8.5.1, 8.5.2
- Next time: dataflow analysis (Dragon §9.2)

If You Want to Get Started …
- Go to http://llvm.org
- Download and install LLVM on your favorite Linux box; read the installation instructions to help you (you will need gcc 4.x).
- Try to run it on a simple C program.